Re: Trying to decode text that is supposed to be ISO-8859-1 [message #175404 is a reply to message #175401] Wed, 14 September 2011 13:37
Peter H. Coffin
On Wed, 14 Sep 2011 14:07:27 +0200, Thomas 'PointedEars' Lahn wrote:
> Peter H. Coffin wrote:
>
>> On Tue, 13 Sep 2011 19:56:20 -0600, Bart Kastermans wrote:
>>> I have downloaded a file that claims to be ISO-8859-1. In it (among
>>> much other stuff) are the bytes shown here (the first column is the
>>> character, the second is ord(character), and the third and fourth are
>>> the binary and hexadecimal representations of the character):
>>>
>>> P / 80 / 01010000 / 50
>>> l / 108 / 01101100 / 6c
>>> z / 122 / 01111010 / 7a
>>> e / 101 / 01100101 / 65
>>> \303 / 195 / 11000011 / c3
>>> \205 / 133 / 10000101 / 85
>>> \313 / 203 / 11001011 / cb
>>> \206 / 134 / 10000110 / 86
>>>
>>> This is supposed to be ISO-8859-1 encoded, and should encode the
>>> character U+0148 (\v{n}; Latin small letter n with caron).
>>>
>>> Does anybody have any idea how I could decode this (or how it was
>>> encoded in the first place)? Any suggestions would be greatly
>>> appreciated.
>>
>> It's a UTF-8 encoded representation of a false ISO-8859-1(? probably
>> CP1251, actually) [...]
>
> Windows-125_2_ (Western) corresponds largely with ISO-8859-1. Windows-1251,
> which is the proper name for that character set and encoding, is Cyrillic
> above 0x7F, and corresponds largely with ISO-8859-5.

Yeah, I know that. But there are 0x8n values in the hex that aren't
printable characters in 8859-1 but are in CP1251. And there's a LOT more
charset-unaware stuff out there that assumes all the world is CP1251
than assumes everything is 8859-1.
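
For what it's worth, the double-encoding guess can be tested directly
against the bytes quoted above. Here is a minimal sketch in Python (used
only for illustration; the names RAW, undo_double_encoding and try_codecs
are made up for this example). It assumes the text started out as UTF-8,
was wrongly decoded with some single-byte code page, and was then
re-encoded as UTF-8. Which code page was involved is exactly the open
question, so the sketch simply tries a few candidates:

```python
# Bytes from the original post: "Plze" followed by c3 85 cb 86.
RAW = bytes.fromhex("506c7a65c385cb86")


def undo_double_encoding(raw: bytes, assumed_codec: str) -> str:
    """Reverse 'UTF-8 misread as assumed_codec, then re-encoded as UTF-8'.

    Raises UnicodeEncodeError or UnicodeDecodeError if the assumed
    code page cannot explain the bytes.
    """
    misread_text = raw.decode("utf-8")                   # text as stored now
    original_bytes = misread_text.encode(assumed_codec)  # undo the bad decode
    return original_bytes.decode("utf-8")                # decode as intended


def try_codecs(raw: bytes, codecs=("cp1252", "cp1251", "latin-1")) -> dict:
    """Try each candidate code page; None means no clean round trip."""
    results = {}
    for name in codecs:
        try:
            results[name] = undo_double_encoding(raw, name)
        except (UnicodeEncodeError, UnicodeDecodeError):
            results[name] = None
    return results


RESULTS = try_codecs(RAW)
```

For these particular bytes, only the cp1252 attempt appears to round-trip,
giving "Plze" followed by U+0148. The cp1251 and latin-1 attempts fail at
the encode step, because the stored text contains U+00C5 and U+02C6 and
neither code page has a byte for both of them. That only says something
about this one sample, not about the rest of the file.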

--
A government big enough to give you everything you want is a government
big enough to take from you everything you have.
-- Gerald Ford in an address to Congress on August 12, 1974