FUDforum: comp.lang.php » Trying to decode text that is supposed to be ISO-8859-1

Re: Trying to decode text that is supposed to be ISO-8859-1 [message #175421 is a reply to message #175404]

Tue, 20 September 2011 22:14

Thomas 'PointedEars'

Peter H. Coffin wrote:

> On Wed, 14 Sep 2011 14:07:27 +0200, Thomas 'PointedEars' Lahn wrote:
>> Peter H. Coffin wrote:
>>> On Tue, 13 Sep 2011 19:56:20 -0600, Bart Kastermans wrote:
>>>> I have downloaded a file that claims to be ISO-8859-1. In it (among
>>>> much other stuff) are the bytes shown here (the first column is the
>>>> character, the second is ord(character), the third and fourth are
>>>> its binary and hexadecimal representations, respectively).
>>>>
>>>> P / 80 / 01010000 / 50
>>>> l / 108 / 01101100 / 6c
>>>> z / 122 / 01111010 / 7a
>>>> e / 101 / 01100101 / 65
>>>> \303 / 195 / 11000011 / c3
>>>> \205 / 133 / 10000101 / 85
>>>> \313 / 203 / 11001011 / cb
>>>> \206 / 134 / 10000110 / 86
>>>>
>>>> This is supposed to be ISO-8859-1 encoded, and should encode the
>>>> character U+0148 (\v{n}; Latin small letter n with caron).
>>>>
>>>> Does anybody have any idea how I could decode this (or how it was
>>>> encoded in the first place)? Any suggestions would be greatly
>>>> appreciated.
>>> It's UTF-8 encoded representation of a false ISO-8859-1(? probably
>>> CP1251, actually) [???]
>> Windows-125_2_ (Western) corresponds largely with ISO-8859-1.
>> Windows-1251, which is the proper name for that character set and
>> encoding, is Cyrillic above 0x7F, and corresponds largely with
>> ISO-8859-5.
>
> Yeah, I know that. But there's 0x8n values in the hex that don't
> represent in 8859-1 but do in CP1251. And there's a LOT more
> charset-unaware stuff out there that assumes all the world is CP1251
> than assumes everything is 8859-1.

You are missing the point. Windows-125_1_ (or "CP1251" as you put it) is
not remotely the same as ISO-8859-1; Windows-125_2_ is the one that largely
corresponds with it.
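
To illustrate the difference, here is a minimal Python sketch using the
standard codecs `latin-1`, `cp1252` and `cp1251`. It also tries the OP's
bytes under one assumption that this thread does not confirm: that the text
was UTF-8, was misread as a single-byte code page, and was then encoded as
UTF-8 a second time. The names `upper_half_sample` and `undo_double_encoding`
are made up for this example.

```python
# Bytes from the OP's table: "Plze" followed by c3 85 cb 86.
RAW = bytes.fromhex("506c7a65c385cb86")


def upper_half_sample(codec: str) -> str:
    """Decode the bytes 0xC0..0xC5 with the given single-byte codec."""
    return bytes(range(0xC0, 0xC6)).decode(codec)


def undo_double_encoding(data: bytes, intermediate: str = "cp1252") -> str:
    """Reverse UTF-8 -> (misread as `intermediate`) -> UTF-8.

    Assumption: the file was produced this way; that is a guess.
    """
    return data.decode("utf-8").encode(intermediate).decode("utf-8")


if __name__ == "__main__":
    for codec in ("latin-1", "cp1252", "cp1251"):
        # Print code points only, so the output is plain ASCII.
        print(codec, [f"U+{ord(c):04X}" for c in upper_half_sample(codec)])
    print(ascii(undo_double_encoding(RAW)))
```

Under that assumption, `cp1252` as the intermediate step yields "Plze"
followed by U+0148, the character the OP expected. With `cp1251` the encode
step raises UnicodeEncodeError, because U+00C5 has no Windows-1251 byte.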

It is also misleading to state that 0x85 and 0x86 have no representation in
ISO-8859-1. That holds only for the rarely used ISO/IEC 8859-1, which leaves
0x80 to 0x9F unassigned, and that encoding is _not_ equivalent to ISO-8859-1,
which is what the OP stated and what you referred to. In ISO-8859-1, 0x85
represents NEL (ISO C1 Next Line, which marks end-of-line on some IBM
mainframes) and 0x86 represents SSA (ISO C1 Start of Selected Area, used by
block-oriented terminals).
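
A quick way to check this is the following Python sketch. It relies on
Python's `latin-1` codec, which maps every byte 0x00 to 0xFF onto U+0000 to
U+00FF and therefore includes the C1 controls; `describe` is a name chosen
for this example only.

```python
CODECS = ("latin-1", "cp1252", "cp1251")


def describe(byte_value: int) -> dict:
    """Return the code point one byte decodes to under each codec."""
    raw = bytes([byte_value])
    return {codec: f"U+{ord(raw.decode(codec)):04X}" for codec in CODECS}


for b in (0x85, 0x86):
    # latin-1 gives the C1 controls U+0085 (NEL) and U+0086 (SSA);
    # both Windows code pages give printable punctuation instead.
    print(f"0x{b:02X}", describe(b))
```

Both Windows code pages map 0x85 to U+2026 (horizontal ellipsis) and 0x86 to
U+2020 (dagger), so these two bytes alone cannot tell Windows-1251 from
Windows-1252.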

PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)
