Re: Processing accented characters submitted from forms [message #184502 is a reply to message #184498] Fri, 03 January 2014 15:11
Jerry Stuckle
On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:
>
>> On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:
>>
>>> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>>>> We're already using iso-8859-1 for the whole website. It will be a lot
>>>> of work to change all that, so I guess we'll have to put up with the
>>>> odd Turkish I causing problems.
>>>
>>> It's not clear (to me at least) what's happening to the data, but as far
>>> as any normal set of HTML pages is concerned (PHP generated or
>>> otherwise) you don't have to put up with a dotted I causing problems on
>>> an ISO-8859-1 encoded page. You can represent any Unicode character in
>>> a page using character entities (browser and font support is always an
>>> issue but not nowadays for anything as ordinary as İ).
>>
>> I think it must be the browser that is encoding the character because İ
>> is not supported by iso-8859-1.
>
> Note that the browser behaviour can be altered by form attributes
> (specifically accept-charset). You can have a form that accepts UTF-8
> on an ISO-8859-1 served page.
>
>> It arrives in the request data as the html numeric entity code, as that
>> is the only way it can be transmitted.
>>
>> This causes issues:
>>
>> As I always htmlencode user entered data before display, it means that it
>> gets encoded twice. I'll have to add the 'disable double encode' flag
>> throughout my code :-)
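
For anyone following along, here is a rough sketch of the double-encoding problem described above, assuming the "disable double encode" flag means the fourth argument of htmlspecialchars(). The sample string is made up for illustration.

```php
<?php
// Made-up value as it might arrive from an ISO-8859-1 form: the browser
// has already turned the unsupported character into a numeric entity.
$submitted = 'AT&T &#304;stanbul';

// Default behaviour: the ampersand of the existing entity is escaped again.
$double = htmlspecialchars($submitted, ENT_QUOTES, 'ISO-8859-1');

// double_encode = false: existing entities are left alone,
// while the bare ampersand in "AT&T" is still escaped.
$single = htmlspecialchars($submitted, ENT_QUOTES, 'ISO-8859-1', false);

echo $double, "\n"; // AT&amp;T &amp;#304;stanbul
echo $single, "\n"; // AT&amp;T &#304;stanbul
```

Note that this also leaves alone any entity a user typed in literally, which is the imperfection mentioned just below.
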
>
> Sure. One way or another you need to get the right encoding. This
> method is not perfect since a user typing &#304; into a form may not
> expect a dotted I to come out.
>
> The best method is (probably) to:
> (a) Give UTF-8 as the form's accept-charset.
> (b) Encode with htmlentities, giving UTF-8 as the encoding. This should
> leave the UTF-8 characters as UTF-8.
> (c) Use mb_convert_encoding($etext, "HTML-ENTITIES", "UTF-8") to make
> the string displayable in a page regardless of the page's character
> encoding.
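
A minimal sketch of steps (b) and (c) above, assuming the form already submits UTF-8 as in step (a) and that the mbstring extension is loaded. The function name to_page_safe() is hypothetical. For step (c) it swaps in mb_encode_numericentity(), because the "HTML-ENTITIES" target of mb_convert_encoding() is deprecated as of PHP 8.2; the intent is the same.

```php
<?php
// Hypothetical helper, not from the thread.
function to_page_safe(string $utf8Text): string
{
    // Step (b): escape markup characters, treating the input as UTF-8.
    // Characters without a named entity (such as U+0130) stay as UTF-8.
    $escaped = htmlentities($utf8Text, ENT_QUOTES, 'UTF-8');

    // Step (c): turn every remaining non-ASCII character into a numeric
    // entity, so the result is plain ASCII and survives any page encoding.
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo to_page_safe("\u{0130}stanbul <b>"), "\n"; // &#304;stanbul &lt;b&gt;
```

The value stored in the database would be the original UTF-8 string, not the output of this function; the conversion is only for display.
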
>
>> Secondly, it will be added to the database as the entity code, so this
>> will break searching the database etc...
>
> If you take the approach of accepting UTF-8 from the form, you can put
> that directly into the database.
>
>> I think the proper fix would be to convert to UTF-8.
>> But that's a lot of work. For now, I think I'll just manually
>> transliterate the codes that cause issues.
>
> You really only need UTF-8 in the database. The page encoding is not
> that important.
>

I beg to differ. Page encoding is important if you want the correct
characters displayed.
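
As a rough illustration (a sketch, assuming mbstring is available), this is what happens when raw UTF-8 bytes are sent on a page the browser reads as ISO-8859-1:

```php
<?php
// U+0130 (dotted capital I) encoded as UTF-8 is the two bytes C4 B0.
$utf8Bytes = "\xC4\xB0";

// Simulate a browser that decodes those bytes as ISO-8859-1:
// each byte becomes its own character.
$asSeen = mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');

echo $asSeen, "\n"; // Ä°
```
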

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================