Re: Processing accented characters submitted from forms [message #184502 is a reply to message #184498]
Fri, 03 January 2014 15:11
Jerry Stuckle
On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:
>
>> On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:
>>
>>> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>>>> We're already using iso-8859-1 for the whole website. It will be a lot
>>>> of work to change all that, so I guess we'll have to put up with the
>>>> odd Turkish I causing problems.
>>>
>>> It's not clear (to me at least) what's happening to the data, but as far
>>> as any normal set of HTML pages is concerned (PHP generated or
>>> otherwise) you don't have to put up with a dotted I causing problems on
>>> an ISO-8859-1 encoded page. You can represent any Unicode character in
>>> a page using character entities (browser and font support is always an
>>> issue but not nowadays for anything as ordinary as İ).
>>
>> I think it must be the browser that is encoding the character because İ
>> is not supported by iso-8859-1.
>
> Note that the browser behaviour can be altered by form attributes
> (specifically accept-charset). You can have a form that accepts UTF-8
> on an ISO-8859-1 served page.
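A minimal sketch of that setup, for illustration only. The helper names render_form() and is_valid_utf8(), the field name "city" and the action URL are made up here and appear nowhere in the thread; only the accept-charset attribute and mb_check_encoding() are real.

```php
<?php
// Hypothetical helper: builds a form whose submissions are encoded as UTF-8
// even if the page that contains it is served as ISO-8859-1.
function render_form(string $action): string
{
    $action = htmlspecialchars($action, ENT_QUOTES, 'ISO-8859-1');
    return '<form method="post" action="' . $action . '" accept-charset="UTF-8">'
         . '<input type="text" name="city">'
         . '<input type="submit" value="Send">'
         . '</form>';
}

// Hypothetical check on the receiving side: is the byte string valid UTF-8?
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

echo render_form('/submit.php'), "\n";
var_dump(is_valid_utf8("\xC4\xB0zmir")); // U+0130 as UTF-8 bytes
var_dump(is_valid_utf8("\xDDzmir"));     // a lone 0xDD lead byte is not valid UTF-8
```

Checking the submitted bytes is an extra precaution, not something the attribute requires; a client is free to ignore accept-charset.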
>
>> It arrives in the request data as the html numeric entity code, as that
>> is the only way it can be transmitted.
>>
>> This causes issues:
>>
>> As I always htmlencode user entered data before display, it means that it
>> gets encoded twice. I'll have to add the 'disable double encode' flag
>> throughout my code :-)
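The double encoding described above can be reproduced with a short sketch. The sample string is invented; it assumes the browser replaced the dotted I with a numeric character reference before submitting. The fourth argument of htmlspecialchars() is the double_encode flag.

```php
<?php
// Hypothetical submitted value: the browser could not encode U+0130 in
// ISO-8859-1 and sent a numeric character reference in its place.
$submitted = '&#304;stanbul & Ankara';

// Default: the "&" that starts the existing reference is escaped again.
$twice = htmlspecialchars($submitted, ENT_QUOTES, 'ISO-8859-1');

// double_encode = false: existing entities are left alone,
// while the bare "&" is still escaped.
$once = htmlspecialchars($submitted, ENT_QUOTES, 'ISO-8859-1', false);

echo $twice, "\n"; // &amp;#304;stanbul &amp; Ankara
echo $once, "\n";  // &#304;stanbul &amp; Ankara
```

Note that with the flag off, a user who literally types an entity into the form also gets it passed through unescaped.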
>
> Sure. One way or another you need to get the right encoding. This
> method is not perfect since a user typing &#304; into a form may not
> expect a dotted I to come out.
>
> The best method is (probably) to:
> (a) Give UTF-8 as the form's accept-charset.
> (b) Encode with htmlentities(), giving UTF-8 as the encoding. This should leave
> the UTF-8 characters as UTF-8.
> (c) Use mb_convert_encoding($etext, "HTML-ENTITIES", "UTF-8") to make
> the string displayable in a page regardless of the page's character
> encoding.
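Steps (b) and (c) can be sketched roughly as follows. The input string stands in for a value submitted as UTF-8 under step (a) and is invented for the example; the variable names other than $etext are assumptions too. Be aware that PHP 8.2 and later report the "HTML-ENTITIES" target of mb_convert_encoding() as deprecated, so treat this as an illustration of the idea, not a recommendation for new code.

```php
<?php
// Simulated form value, submitted as UTF-8: U+0130 followed by ASCII text
// and some markup that must be escaped.
$text = "\xC4\xB0zmir <b>";

// (b) Escape markup, telling PHP the input is UTF-8. U+0130 has no named
// entity in the default HTML 4.01 table, so its bytes pass through unchanged.
$etext = htmlentities($text, ENT_QUOTES, 'UTF-8');

// (c) Turn the remaining non-ASCII characters into character references,
// leaving pure ASCII that displays the same under any page encoding.
$safe = mb_convert_encoding($etext, 'HTML-ENTITIES', 'UTF-8');

echo $safe, "\n"; // &#304;zmir &lt;b&gt;
```

The untouched $text is what would go into the database; $safe is only for output.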
>
>> Secondly, it will be added to the database as the entity code, so this
>> will break searching the database etc...
>
> If you take the approach of accepting UTF-8 from the form, you can put
> that directly into the database.
>
>> I think the proper fix would be to convert to UTF-8.
>> But that's a lot of work. For now, I think I'll just manually transliterate the
>> codes that cause issues.
>
> You really only need UTF-8 in the database. The page encoding is not
> that important.
>
I beg to differ. Page encoding is important if you want the correct
characters displayed.
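
A rough sketch of the difference, assuming raw UTF-8 bytes from the database are written into a page declared as ISO-8859-1. The variable names are made up for the example. Reinterpreting the bytes with mb_convert_encoding() mimics what the browser would do with them.

```php
<?php
// U+0130 (dotted capital I) stored as UTF-8: two bytes, 0xC4 0xB0.
$utf8 = "\xC4\xB0";

// A browser told the page is ISO-8859-1 reads each byte as one character.
// Converting "from ISO-8859-1" to UTF-8 shows what it would display.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($misread), "\n"; // c384c2b0, i.e. U+00C4 and U+00B0

// A numeric character reference is plain ASCII, so the declared page
// encoding does not change how it is read.
$reference = '&#304;';
var_dump(mb_check_encoding($reference, 'ASCII')); // bool(true)
```

So raw bytes need a matching page encoding, while entity-encoded output does not.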
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================