encoding [message #186326] Thu, 26 June 2014 01:50
Denis McMahon
Tear my hair out time.

I have a csv file that contains text strings that I wish to display in a
web page.

The csv file is utf-8, and the text strings include the British pound
symbol encoded as the two bytes 0xc2 0xa3.

I'm using

setlocale(LC_CTYPE|LC_COLLATE,"en_GB.UTF8");

before reading the csv file, which I hope means that the csv file is read
as utf-8.

Then I feed the string through htmlentities() before adding it to the web
page.

However, the web page that arrives at the client has Â£
instead of just £.

I'm not sure where it's going wrong, partly because right now I may be
too tired to work out where and how I can inspect the string without
character encodings getting in the way.

If I print_r the data that has been read in to the web page, that shows
ok, but at that point it's still utf-8, not an html entity.
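
One way to inspect the string without any encoding layer interfering is to dump its bytes as hexadecimal. The following is only a sketch: $field is a made-up stand-in for a value read from the csv file, not data from the original post.

```php
<?php
// Hypothetical stand-in for one field read from the csv file:
// the UTF-8 pound sign (0xC2 0xA3) followed by the ASCII digit "5".
$field = "\xc2\xa3" . "5";

// bin2hex() returns the raw bytes as ASCII hex digits, so neither the
// browser nor the terminal encoding can change what is displayed.
$hex = bin2hex($field);

// Put a space between bytes to make the dump easier to read.
$spaced = trim(chunk_split($hex, 2, ' '));

echo $hex, "\n";    // c2a335
echo $spaced, "\n"; // c2 a3 35
```

If the dump starts with c2 a3, the string still holds the UTF-8 pound sign at that point, and the damage happens in a later step.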

The following is at http://www.sined.co.uk/tmp/pound.php and seems to
demonstrate the issue:

<?php
setlocale(LC_CTYPE|LC_COLLATE, "en_GB.UTF8");
$str1 = "\xc2\xa3";
$str2 = htmlentities( "$str1" );
echo <<< EOT
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Broken Pound</title>
</head>
<body>
<pre>
Original: {$str1}
Entities: {$str2}
</pre>
</body>
</html>
EOT;

I'm not sure how to fix this. Ideas anyone?

--
Denis McMahon, denismfmcmahon(at)gmail(dot)com
Re: encoding [message #186327 is a reply to message #186326] Thu, 26 June 2014 03:04
Denis McMahon
On Thu, 26 Jun 2014 05:50:06 +0000, Denis McMahon wrote:

> I'm not sure how to fix this. Ideas anyone?

Fix was:

htmlentities( $string, ENT_COMPAT, "UTF-8" );

Not sure if I actually need the setlocale or not. Seems to work without
it.
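
As a rough illustration of the fix applied to the pound sign from the first post, the sketch below passes the encoding explicitly in both cases and makes no setlocale() call. The variable names are chosen here for illustration only.

```php
<?php
// Pound sign as the two UTF-8 bytes 0xC2 0xA3, as in the original post.
$pound = "\xc2\xa3";

// Explicit ISO-8859-1: each byte is treated as a character of its own,
// so 0xC2 becomes &Acirc; and 0xA3 becomes &pound;.
$wrong = htmlentities($pound, ENT_COMPAT, 'ISO-8859-1');

// Explicit UTF-8: the two bytes are decoded as one character.
$right = htmlentities($pound, ENT_COMPAT, 'UTF-8');

echo $wrong, "\n"; // &Acirc;&pound;
echo $right, "\n"; // &pound;
```

The first result renders in a browser as Â£, the second as £.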

ENT_HTML5 isn't supported in my server distro's current php (5.3) ...
mutter mutter

--
Denis McMahon, denismfmcmahon(at)gmail(dot)com
Re: encoding [message #186328 is a reply to message #186327] Thu, 26 June 2014 07:20
Christoph Michael Bec
Denis McMahon wrote:

> htmlentities( $string, ENT_COMPAT, "UTF-8" );
>
> Not sure if I actually need the setlocale or not. Seems to work without
> it.

In PHP < 5.4 the default of the 3rd parameter is 'ISO-8859-1', so
setting this parameter appropriately is important when $string may
contain non-ASCII characters. For instance:

htmlentities("\xC3\xA4", ENT_COMPAT, 'ISO-8859-1');
// => '&Atilde;&curren;'

htmlentities("\xC3\xA4", ENT_COMPAT, 'UTF-8');
// => '&auml;'
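
To see why the ISO-8859-1 call yields two entities, it may help to convert the bytes one at a time. This is only a sketch with illustrative variable names; it shows that treating the UTF-8 string as ISO-8859-1 gives the same result as converting each byte as a separate character.

```php
<?php
$utf8 = "\xC3\xA4"; // "ä" encoded as UTF-8 (two bytes)

// Convert each byte separately, as if it were a complete
// ISO-8859-1 character.
$perByte = '';
foreach (str_split($utf8) as $byte) {
    $perByte .= htmlentities($byte, ENT_COMPAT, 'ISO-8859-1');
}

// Convert the whole string in one call with the same encoding.
$whole = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

echo $perByte, "\n";           // &Atilde;&curren;
var_dump($perByte === $whole); // bool(true)
```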

--
Christoph M. Becker