Re: Processing accented characters submitted from forms [message #184503 is a reply to message #184500] Fri, 03 January 2014 16:08
Thomas 'PointedEars' Lahn wrote:

> JohnT wrote:
>> On Fri, 03 Jan 2014 13:53:04 +0100, Thomas 'PointedEars' Lahn wrote:
>>> JohnT wrote:
>>>> [UTF-8] is the PHP 5 default,
>>>
>>> How did you get this idea?
>>
>> http://uk1.php.net/manual/en/function.htmlentities.php
>>
>> says:
>> Like htmlspecialchars(), htmlentities() takes an optional third
>> argument encoding which defines encoding used in conversion. If omitted,
>> the default value for this argument is ISO-8859-1 in versions of PHP
>> prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
>
> […]
> That said, htmlentities() is insufficient to represent arbitrary Unicode
> characters, encoded with UTF-8 server-side, in an HTML document if the
> document encoding is not UTF-8; you would have to use htmlspecialchars()
> which has the same default parameter value since PHP 5.4.0.
>
> <http://php.net/htmlspecialchars>

Actually, it is worse. In a document that is not encoded as UTF-8, even to
refer to those Unicode characters for which there is *no* character entity
reference in HTML, you have to use mb_encode_numericentity():

$ php -r 'echo mb_encode_numericentity("∎", array(0x0, 0x10000, 0, 0xfffff),
"UTF-8") . PHP_EOL;'
&#8718;

$ locale
LANG=de_CH.UTF-8
LANGUAGE=
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=

----------

<http://php.net/mb_encode_numericentity>
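
For a whole string rather than a single character, the same call can be
combined with htmlspecialchars(). The following is only a sketch: the
function name to_ascii_html() is made up for illustration, and the conversion
map (0x80 to 0x10FFFF, so that ASCII is left alone) is my assumption, not the
map used in the one-liner above. It requires the mbstring extension.

```php
<?php
// Hypothetical helper (name is an assumption, not part of PHP):
// escape markup characters first, then turn every non-ASCII code point
// into a decimal numeric character reference.
function to_ascii_html($utf8)
{
    // htmlspecialchars() only handles &, <, > and quotes; input is UTF-8.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

    // Conversion map: start, end, offset, mask.
    // 0x80..0x10FFFF covers all non-ASCII code points, so the ASCII
    // output of htmlspecialchars() (including "&") is not touched again.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo to_ascii_html("café <∎>"), PHP_EOL;
// prints: caf&#233; &lt;&#8718;&gt;
```

The result is pure ASCII, so it should survive in a document served with any
ASCII-compatible encoding.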

None of this is necessary if you use UTF-8 throughout.
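
To illustrate the UTF-8 case: a minimal sketch, assuming the script source,
the data, and the declared document encoding are all UTF-8. The helper name
h() is hypothetical. Only the markup-significant characters need escaping
then; everything else can be sent as it is.

```php
<?php
// Hypothetical helper (name is an assumption): in a UTF-8 document
// only &, <, > and quotes are escaped; "é" and "∎" pass through as-is.
function h($utf8)
{
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// Declare the encoding in the HTTP header (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html; charset=UTF-8');
}

echo '<p>', h('café <∎> & "quotes"'), '</p>', PHP_EOL;
// prints: <p>café &lt;∎&gt; &amp; &quot;quotes&quot;</p>
```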


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)