comp.lang.php » Trying to decode text that is supposed to be ISO-8859-1
Re: Trying to decode text that is supposed to be ISO-8859-1 [message #175399 is a reply to message #175395] Wed, 14 September 2011 10:25
alvaro.NOSPAMTHANX
On 14/09/2011 3:56, Bart Kastermans wrote:
> I have downloaded a file that claims to be ISO-8859-1. In it (among
> much other stuff) are the bytes shown here (the first column is the
> character, the second is ord(character), and the third and fourth are the
> binary and hexadecimal representations of the character, respectively).
>
> P / 80 / 01010000 / 50
> l / 108 / 01101100 / 6c
> z / 122 / 01111010 / 7a
> e / 101 / 01100101 / 65
> \303 / 195 / 11000011 / c3
> \205 / 133 / 10000101 / 85
> \313 / 203 / 11001011 / cb
> \206 / 134 / 10000110 / 86
>
> This is supposed to be ISO-8859-1 encoded, and should encode the
> character U+0148 (\v{n}; Latin small letter n with caron).
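For comparison, the UTF-8 encoding of U+0148 can be worked out by hand from the two-byte pattern 110xxxxx 10xxxxxx. Here is a minimal Python sketch. Python is used only as a compact illustration, and the helper name `utf8_two_byte` is hypothetical. It builds the two bytes with shifts and masks and compares them with both Python's own codec and the bytes in the file:

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 length")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: the top 5 of the 11 payload bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: the low 6 payload bits
    return bytes([lead, cont])

n_caron = utf8_two_byte(0x0148)
print(n_caron.hex())                          # c588
print(n_caron == "\u0148".encode("utf-8"))    # True
print(bytes.fromhex("c385cb86") == n_caron)   # False
```

A plain UTF-8 "ň" would therefore be the two bytes c5 88. The file instead holds four bytes, c3 85 cb 86.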

Funny... I think that character (ň) does not even exist in ISO-8859-1:

http://www.fileformat.info/info/unicode/char/148/index.htm
http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout

And in fact the 0x85 and 0x86 positions hold no printable characters in ISO-8859-1: the whole 0x80 to 0x9F range is reserved for C1 control codes.
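Both points can be checked with a short Python sketch. Python's `latin-1` codec maps every byte 0x00 to 0xFF to the code point with the same value. As a result, 0x85 and 0x86 decode only to the control characters U+0085 and U+0086, and U+0148 cannot be encoded at all. The helper `fits_latin1` is a hypothetical name used for illustration:

```python
import unicodedata

def fits_latin1(text: str) -> bool:
    """Return True if every character of text has an ISO-8859-1 byte."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Bytes 0x85 and 0x86 do decode under ISO-8859-1, but only to C1 controls
# (Unicode general category "Cc").
for b in (0x85, 0x86):
    ch = bytes([b]).decode("latin-1")
    print(f"0x{b:02X} -> U+{ord(ch):04X}, category {unicodedata.category(ch)}")

print(fits_latin1("Plze"))     # True
print(fits_latin1("\u0148"))   # False: U+0148 has no ISO-8859-1 byte
```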

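The mb_detect_encoding() function suggests that the string is actually
in UTF-8 and contains two characters, encoded as the byte pairs 0xC3 0x85
(U+00C5, Å) and 0xCB 0x86 (U+02C6, ˆ). The "Åˆ" string is exactly what
you get if you encode "ň" in UTF-8 (0xC5 0x88) and try to display it as
ISO-8859-1 (strictly, Windows-1252, since 0x88 is a control code in
ISO-8859-1 proper). If that result is then saved as UTF-8 again, you get
the bytes in your file, so I guess that's what the data creator is doing.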
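The suspected chain can be reproduced in a minimal Python sketch. One assumption is worth flagging. The middle step uses Windows-1252 (`cp1252`) rather than strict ISO-8859-1. In ISO-8859-1, byte 0x88 is a control code, but in Windows-1252 it is the printable ˆ (U+02C6), and U+02C6 is exactly what the second pair in the file, 0xCB 0x86, encodes. As stated above, that this is how the data creator produced the file is only a guess.

```python
original = "Plze\u0148"  # "P l z e" followed by U+0148, as in the quoted dump

# Step 1: encode correctly as UTF-8 -> b"Plze\xc5\x88"
utf8_bytes = original.encode("utf-8")

# Step 2: misread those bytes as Windows-1252 -> "Plze" + "\u00c5\u02c6"
misread = utf8_bytes.decode("cp1252")

# Step 3: save the misread text as UTF-8 again, giving the bytes in the file
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))  # 50 6c 7a 65 c3 85 cb 86

# Repair: undo the three steps in reverse order.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```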


> Does anybody have any idea how I could decode this (or how it was
> encoded in the first place)? Any suggestions would be greatly
> appreciated.

To begin with, you cannot use ISO-8859-1 as the target encoding if you
want to use U+0148.

Now, if you decide to switch to UTF-8... well, I'll report back if I
find something more precise :)
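If UTF-8 is the target, the reversal can be wrapped in a helper that applies it only when it round-trips cleanly and otherwise returns the singly decoded text. This is a sketch under the same Windows-1252 assumption. The name `undo_double_utf8` is hypothetical, and a heuristic like this can misfire on text that legitimately contains sequences such as "Åˆ".

```python
def undo_double_utf8(data: bytes) -> str:
    """Decode bytes that may have been UTF-8 encoded twice via Windows-1252.

    Assumes the input is valid UTF-8; raises UnicodeDecodeError otherwise.
    """
    text = data.decode("utf-8")  # outer layer: the file itself is UTF-8
    try:
        # Inner layer: recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way, so the text was probably encoded only once.
        return text
```

For example, `undo_double_utf8(bytes.fromhex("506c7a65c385cb86"))` should return "Plzeň", while correctly encoded input such as `"Plzeň".encode("utf-8")` comes back unchanged, because "ň" has no Windows-1252 byte. In PHP the middle step should correspond to mb_convert_encoding($text, 'Windows-1252', 'UTF-8'). utf8_decode() is not a substitute: it targets ISO-8859-1 and would not map ˆ back to 0x88.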



--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My satin-finish humour website: http://www.demogracia.com
--