Processing accented characters submitted from forms [message #184469] Thu, 02 January 2014 10:55
JohnT
One of the websites that I am working on is getting a lot of interest
from countries that make a lot of use of accented characters.

Usually accented characters come through fine.

However, some are replaced by the character codes. e.g. &#304;
This seems to be occurring for some Turkish and Romanian characters.

Is PHP doing this ?
ie: is it something I can fix with settings ?

I need to send the data in non-html emails, so I need the original
characters.

html_entity_decode() doesn't seem to work for me.

How do people usually handle this ?

Thanks
JohnT
Re: Processing accented characters submitted from forms [message #184471 is a reply to message #184469] Thu, 02 January 2014 12:15
J.O. Aho
On 02/01/14 16:55, JohnT wrote:
>
> One of the websites that I am working on is getting a lot of interest
> from countries that make a lot of use of accented characters.
>
> Usually accented characters come through fine.
>
> However, some are replaced by the character codes. e.g. &#304;
> This seems to be occurring for some Turkish and Romanian characters.
>
> Is PHP doing this ?

It's not doing it by magic, something in your code does it.

> ie: is it something I can fix with settings ?

You decode it, see
http://www.php.net/manual/en/function.htmlspecialchars-decode.php

You need to have a charset which supports all of the characters, like UTF-8.
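
The point about the charset can be illustrated with a minimal sketch. It assumes the submitted value contains the numeric entity &#304; and that the rest of the application can cope with UTF-8 byte strings; the sample value and variable names are made up for illustration. Note that htmlspecialchars_decode() only handles the few special characters (&amp;, &lt;, &gt; and the quotes), so html_entity_decode() with an explicit UTF-8 target is used here instead.

```php
<?php
// Value as it might arrive from the browser (hypothetical sample).
$submitted = 'T&#304;RE';

// Target charset ISO-8859-1: U+0130 has no byte in that charset,
// so the entity is left untouched.
$latin1 = html_entity_decode($submitted, ENT_QUOTES, 'ISO-8859-1');

// Target charset UTF-8: the entity becomes the real character.
$utf8 = html_entity_decode($submitted, ENT_QUOTES, 'UTF-8');

echo $latin1, PHP_EOL;        // T&#304;RE
echo bin2hex($utf8), PHP_EOL; // 54c4b05245
```

This is probably why html_entity_decode() appeared not to work: on PHP versions before 5.4.0 the default target charset was ISO-8859-1.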

> I need to send the data in non-html emails, so I need the original
> characters.
>
> html_entity_decode() doesn't seem to work for me.
>
> How do people usually handle this ?

You see to it that everything is stored in UTF-8 from the beginning and
do not encode strings.

--

//Aho
Re: Processing accented characters submitted from forms [message #184474 is a reply to message #184469] Thu, 02 January 2014 14:52
Christoph Michael Becker
JohnT wrote:

> One of the websites that I am working on is getting a lot of interest
> from countries that make a lot of use of accented characters.
>
> Usually accented characters come through fine.
>
> However, some are replaced by the character codes. e.g. &#304;
> This seems to be occurring for some Turkish and Romanian characters.
>
> Is PHP doing this ?

It seems to me that this might already have been done by the browser.
Unfortunately, I was not able to find any normative reference, so out of
curiosity I set up a simple form with accept-charset=ISO-8859-1, and
entered İ (U+0130) in the contained textarea. Firefox 26 and Chrome
31.0 send it as the HTML entity &#304;; IE 11, however, sends it as
Ý (U+00DD).

A solution is to use UTF-8 encoding, as J.O. already mentioned.

--
Christoph M. Becker
Re: Processing accented characters submitted from forms [message #184485 is a reply to message #184471] Fri, 03 January 2014 06:37
JohnT
On Thu, 02 Jan 2014 18:15:50 +0100, J.O. Aho wrote:

> On 02/01/14 16:55, JohnT wrote:
>>
>> One of the websites that I am working on is getting a lot of interest
>> from countries that make a lot of use of accented characters.
>>
>> Usually accented characters come through fine.
>>
>> However, some are replaced by the character codes. e.g. &#304; This
>> seems to be occurring for some Turkish and Romanian characters.
>>

>
> You need to have a charset which supports all of the characters, like
> UTF-8.

Currently using ISO-8859-1.
It all displays OK, so I assumed that the characters are supported, but
it appears the browser must be doing some magic to display them as
further investigation shows that those characters are not supported by
the character set.

>>
>> How do people usually handle this ?
>
> You see to it that everything is stored in UTF-8 from the beginning and
> do not encode strings.

We're already using iso-8859-1 for the whole website.
It will be a lot of work to change all that, so I guess we'll have to put
up with the odd Turkish I causing problems.


Thanks
JohnT
Re: Processing accented characters submitted from forms [message #184486 is a reply to message #184474] Fri, 03 January 2014 06:40
JohnT
On Thu, 02 Jan 2014 20:52:37 +0100, Christoph Michael Becker wrote:

> JohnT wrote:
>
>> One of the websites that I am working on is getting a lot of interest
>> from countries that make a lot of use of accented characters.
>>
>> Usually accented characters come through fine.
>>
>> However, some are replaced by the character codes. e.g. &#304; This
>> seems to be occurring for some Turkish and Romanian characters.
>>
>> Is PHP doing this ?
>
> It seems to me that this might already have been done by the browser.
> Unfortunately, I was not able to find any normative reference, so out of
> curiosity I set up a simple form with accept-charset=ISO-8859-1, and
> entered İ (U+0130) in the contained textarea. Firefox 26 and Chrome
> 31.0 send it as the HTML entity &#304;; IE 11, however, sends it as Ý
> (U+00DD).
>
> A solution is to use UTF-8 encoding, as J.O. already mentioned.

Changing to UTF-8 is not an option, but I had already decided to use UTF-8
for any future websites as that is the PHP 5 default, and makes the
programming a lot easier.

Regards
JohnT
Re: Processing accented characters submitted from forms [message #184487 is a reply to message #184485] Fri, 03 January 2014 07:37
Ben Bacarisse
JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:
<snip>
> We're already using iso-8859-1 for the whole website.
> It will be a lot of work to change all that, so I guess we'll have to put
> up with the odd Turkish I causing problems.

It's not clear (to me at least) what's happening to the data, but as far
as any normal set of HTML pages are concerned (PHP generated or
otherwise) you don't have to put up with a dotted I causing problems on
an ISO-8859-1 encoded page. You can represent any Unicode character in
a page using character entities (browser and font support is always an
issue but not nowadays for anything as ordinary as İ).

Can you make a cut-down page that exhibits the problem? Can you provide
a URL? Can you at least describe the path the data takes and what
happens to it?

--
Ben.
Re: Processing accented characters submitted from forms [message #184488 is a reply to message #184487] Fri, 03 January 2014 07:52
Tim Streater
In article
<0(dot)cb1d1865c94178664953(dot)20140103123727GMT(dot)87mwjdz1fc(dot)fsf(at)bsb(dot)me(dot)uk>,
Ben Bacarisse <ben(dot)usenet(at)bsb(dot)me(dot)uk> wrote:

> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:
> <snip>
>> We're already using iso-8859-1 for the whole website.
>> It will be a lot of work to change all that, so I guess we'll have to put
>> up with the odd Turkish I causing problems.
>
> It's not clear (to me at least) what's happening to the data, but as far
> as any normal set of HTML pages are concerned (PHP generated or
> otherwise) you don't have to put up with a dotted I causing problems on
> an ISO-8859-1 encoded page. You can represent any Unicode character in
> a page using character entities (browser and font support is always an
> issue but not nowadays for anything as ordinary as İ).
>
> Can you make a cut-down page that exhibits the problem? Can you provide
> a URL? Can you at least describe the path the data takes and what
> happens to it?

Wasn't this to do with user-entered data? Why not make *those* pages
UTF-8?

--
"The idea that Bill Gates has appeared like a knight in shining armour to
lead all customers out of a mire of technological chaos neatly ignores
the fact that it was he who, by peddling second-rate technology, led them
into it in the first place." - Douglas Adams
Re: Processing accented characters submitted from forms [message #184489 is a reply to message #184486] Fri, 03 January 2014 07:53
Thomas 'PointedEars' Lahn
JohnT wrote:

> On Thu, 02 Jan 2014 20:52:37 +0100, Christoph Michael Becker wrote:
>> JohnT wrote:
>>> One of the websites that I am working on is getting a lot of interest
>>> from countries that make a lot of use of accented characters.
>>>
>>> Usually accented characters come through fine.
>>>
>>> However, some are replaced by the character codes. e.g. &#304; This
>>> seems to be occurring for some Turkish and Romanian characters.
>>>
>>> Is PHP doing this ?
>>
>> It seems to me that this might already have been done by the browser.
>> Unfortunately, I was not able to find any normative reference, so out of
>> curiosity I set up a simple form with accept-charset=ISO-8859-1, and
>> entered İ (U+0130) in the contained textarea. Firefox 26 and Chrome
>> 31.0 send it as HTML entity &#304;, IE 11, however, sends it as Ý
>> (U+00DD).
>>
>> A solution is to use UTF-8 encoding, as J.O. already mentioned.
>
> Changing to UTF-8 is not an option,

Why not?

> but I had already decided to use UTF-8 for any future websites

Good idea.

> as that is the PHP 5 default,

How did you get this idea?

> and makes the programming a lot easier.

If only it were so. PHP 5 still is oblivious as to character encoding.


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)
Re: Processing accented characters submitted from forms [message #184490 is a reply to message #184487] Fri, 03 January 2014 07:53
JohnT
On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:

> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>> We're already using iso-8859-1 for the whole website. It will be a lot
>> of work to change all that, so I guess we'll have to put up with the
>> odd Turkish I causing problems.
>
> It's not clear (to me at least) what's happening to the data, but as far
> as any normal set of HTML pages are concerned (PHP generated or
> otherwise) you don't have to put up with a dotted I causing problems on
> an ISO-8859-1 encoded page. You can represent any Unicode character in
> a page using character entities (browser and font support is always an
> issue but not nowadays for anything as ordinary as İ).

I think it must be the browser that is encoding the character because İ
is not supported by iso-8859-1.

It arrives in the request data as the html numeric entity code, as that
is the only way it can be transmitted.

This causes issues:

As I always htmlencode user entered data before display, it means that it
gets encoded twice. I'll have to add the 'disable double encode' flag
throughout my code :-)
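
The 'disable double encode' flag mentioned here is the fourth argument of htmlspecialchars(). A minimal sketch of the difference, assuming the value already contains a numeric entity added by the browser (the sample string is hypothetical):

```php
<?php
// Input as it arrives from an ISO-8859-1 form: the dotted capital I
// has already been turned into a numeric entity by the browser.
$fromBrowser = 'T&#304;RE & <b>';

// Default behaviour: the "&" of the existing entity is escaped again.
$twice = htmlspecialchars($fromBrowser, ENT_QUOTES, 'ISO-8859-1');

// Fourth argument false: existing entities are left alone, while a
// bare "&" and the angle brackets are still escaped.
$once = htmlspecialchars($fromBrowser, ENT_QUOTES, 'ISO-8859-1', false);

echo $twice, PHP_EOL; // T&amp;#304;RE &amp; &lt;b&gt;
echo $once, PHP_EOL;  // T&#304;RE &amp; &lt;b&gt;
```

The trade-off is that text a user deliberately typed as an entity is no longer shown literally.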

Secondly, it will be added to the database as the entity code, so this
will break searching the database etc...

I think the proper fix would be to convert to UTF-8.
But that's a lot of work. For now, I think I'll just manually translit the
codes that cause issues.


JohnT
Re: Processing accented characters submitted from forms [message #184495 is a reply to message #184488] Fri, 03 January 2014 08:00
JohnT
On Fri, 03 Jan 2014 12:52:50 +0000, Tim Streater wrote:

>
> Wasn't this to do with user-entered data? Why not make *those* pages
> UTF-8?

It is not as simple as that as all the pages are created on-the-fly using
shared code.

The easiest solution is to catch the user data before it gets used, and
translit it.
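
One way to catch the user data before it gets used could look like the sketch below. It assumes the form is submitted as ISO-8859-1 and that the browser has replaced unsupported characters with numeric entities; form_value_to_utf8() is a hypothetical helper name, and the mbstring extension is assumed to be available.

```php
<?php
/**
 * Hypothetical helper: turn a value submitted from an ISO-8859-1 form
 * into a UTF-8 string, resolving numeric entities added by the browser.
 */
function form_value_to_utf8(string $raw): string
{
    // Step 1: re-encode the ISO-8859-1 bytes as UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

    // Step 2: resolve entities such as "&#304;" into real characters.
    return html_entity_decode($utf8, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// "café TİRE" as the browser would send it from a Latin-1 form.
$raw = "caf\xE9 T&#304;RE";
echo bin2hex(form_value_to_utf8($raw)), PHP_EOL;
```

The result is plain UTF-8 that can be searched, transliterated or stored. Entities that the user typed literally are decoded as well, which may or may not be wanted.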

Regards
JohnT
Re: Processing accented characters submitted from forms [message #184496 is a reply to message #184489] Fri, 03 January 2014 08:05
JohnT
On Fri, 03 Jan 2014 13:53:04 +0100, Thomas 'PointedEars' Lahn wrote:

> JohnT wrote:
>

>>
>> Changing to UTF-8 is not an option,
>
> Why not?

It's a big site.
It would take too much work to rebuild it all.


>
>> as that is the PHP 5 default,
>
> How did you get this idea?

http://uk1.php.net/manual/en/function.htmlentities.php

says:
Like htmlspecialchars(), htmlentities() takes an optional third
argument encoding which defines encoding used in conversion. If omitted,
the default value for this argument is ISO-8859-1 in versions of PHP
prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
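
Because the default changed in 5.4.0, it may be safer to name the encoding explicitly on a site that still stores ISO-8859-1 data. A small sketch of what can go wrong otherwise, using a made-up sample string:

```php
<?php
// "café" with the é as a single ISO-8859-1 byte, as a Latin-1 site stores it.
$latin1 = "caf\xE9";

// Encoding named explicitly: the result does not depend on the PHP version.
$ok = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// Same bytes treated as UTF-8 (the default from 5.4.0 onwards): the lone
// 0xE9 byte is not valid UTF-8, so an empty string comes back unless
// ENT_SUBSTITUTE or ENT_IGNORE is also passed.
$bad = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

echo $ok, PHP_EOL;          // caf&eacute;
echo strlen($bad), PHP_EOL; // 0
```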


>
>> and makes the programming a lot easier.
>
> If only it were so. PHP 5 still is oblivious as to character encoding.

http://uk1.php.net/manual/en/book.iconv.php
Re: Processing accented characters submitted from forms [message #184498 is a reply to message #184490] Fri, 03 January 2014 09:30
Ben Bacarisse
JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:

> On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:
>
>> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>>> We're already using iso-8859-1 for the whole website. It will be a lot
>>> of work to change all that, so I guess we'll have to put up with the
>>> odd Turkish I causing problems.
>>
>> It's not clear (to me at least) what's happening to the data, but as far
>> as any normal set of HTML pages are concerned (PHP generated or
>> otherwise) you don't have to put up with a dotted I causing problems on
>> an ISO-8859-1 encoded page. You can represent any Unicode character in
>> a page using character entities (browser and font support is always an
>> issue but not nowadays for anything as ordinary as İ).
>
> I think it must be the browser that is encoding the character because İ
> is not supported by iso-8859-1.

Note that the browser behaviour can be altered by form attributes
(specifically accept-charset). You can have a form that accepts UTF-8
on an ISO-8859-1 served page.
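
A rough sketch of such a form, together with a server-side sanity check. The field name "comment" and the helper is_valid_utf8() are made up for illustration, and the mbstring extension is assumed to be available.

```php
<?php
// Form fragment for a page that is itself served as ISO-8859-1.
// accept-charset asks the browser to submit the fields as UTF-8.
$form = <<<'HTML'
<form method="post" action="" accept-charset="UTF-8">
  <textarea name="comment"></textarea>
  <input type="submit">
</form>
HTML;

/**
 * Hypothetical check: did the submitted value really arrive as UTF-8?
 */
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

var_dump(is_valid_utf8("\xC4\xB0")); // İ encoded as UTF-8: bool(true)
var_dump(is_valid_utf8("\xDD"));     // lone Latin-1 byte: bool(false)
```

Checking the encoding on arrival guards against clients that ignore the attribute.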

> It arrives in the request data as the html numeric entity code, as that
> is the only way it can be transmitted.
>
> This causes issues:
>
> As I always htmlencode user entered data before display, it means that it
> gets encoded twice. I'll have to add the 'disable double encode' flag
> throughout my code :-)

Sure. One way or another you need to get the right encoding. This
method is not perfect since a user typing &#304; into a form may not
expect a dotted I to come out.

The best method is (probably) to:
(a) Give UTF-8 as the form's accept-charset.
(b) Encode htmlentities giving UTF-8 as the encoding. This should leave
the UTF-8 characters as UTF-8.
(c) Use mb_convert_encoding($etext, "HTML-ENTITIES", "UTF-8") to make
the string displayable in a page regardless of the page's character
encoding.
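
Steps (b) and (c) can be sketched as follows. This is a variation and not exactly the calls named above: it uses htmlspecialchars() plus mb_encode_numericentity(), because the "HTML-ENTITIES" pseudo-encoding of mb_convert_encoding() is deprecated as of PHP 8.2. The helper name utf8_to_ascii_html() is hypothetical, and the mbstring extension is assumed.

```php
<?php
/**
 * Hypothetical helper: make a UTF-8 string safe to print in an HTML
 * page of any ASCII-compatible encoding (ISO-8859-1 included).
 */
function utf8_to_ascii_html(string $utf8): string
{
    // Escape the HTML special characters first; UTF-8 bytes pass through.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

    // Then replace every non-ASCII code point with a numeric entity.
    // Convmap: [first code point, last code point, offset, bitmask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo utf8_to_ascii_html("T\xC4\xB0RE <b>"), PHP_EOL; // T&#304;RE &lt;b&gt;
```

The output is pure ASCII, so it displays the same whatever encoding the page declares.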

> Secondly, it will be added to the database as the entity code, so this
> will break searching the database etc...

If you take the approach of accepting UTF-8 from the form, you can put
that directly into the database.

> I think the proper fix would be to convert to UTF-8.
> But that's a lot of work. For now, I think I'll just manually translit the
> codes that cause issues.

You really only need UTF-8 in the database. The page encoding is not
that important.

--
Ben.
Re: Processing accented characters submitted from forms [message #184500 is a reply to message #184496] Fri, 03 January 2014 10:03
Thomas 'PointedEars' Lahn
JohnT wrote:
^^^^^
Please fix.

> On Fri, 03 Jan 2014 13:53:04 +0100, Thomas 'PointedEars' Lahn wrote:
>> JohnT wrote:
>>> Changing to UTF-8 is not an option,
>> Why not?
>
> It's a big site.
> It would take too much work to rebuild it all.

Looks like an inherent design flaw to me. It is rather easy to switch a
properly developed site to UTF-8. BTDT.

>>> as that is the PHP 5 default,
>>
>> How did you get this idea?
>
> http://uk1.php.net/manual/en/function.htmlentities.php
>
> says:
> Like htmlspecialchars(), htmlentities() takes an optional third
> argument encoding which defines encoding used in conversion. If omitted,
> the default value for this argument is ISO-8859-1 in versions of PHP
> prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
>
>>> and makes the programming a lot easier.
>> If only it were so. PHP 5 still is oblivious as to character encoding.
>
> http://uk1.php.net/manual/en/book.iconv.php

That is interesting (I did not know about the new htmlentities() default),
but it does not refute my argument. First, there have been versions of
PHP 5 *before* 5.4.0. Second, so far you have to *tell* PHP 5 what encoding
you use; there is no automatism or assumed default encoding for source code
(as in some other recent programming languages) – *only* in the PHP 5.*4*
case *with* htmlentities() the default suffices. (Such an automatism is
considered for PHP *6*.)

That said, htmlentities() is insufficient to represent arbitrary Unicode
characters, encoded with UTF-8 server-side, in an HTML document if the
document encoding is not UTF-8; you would have to use htmlspecialchars()
which has the same default parameter value since PHP 5.4.0.

<http://php.net/htmlspecialchars>


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)
Re: Processing accented characters submitted from forms [message #184501 is a reply to message #184495] Fri, 03 January 2014 10:09
Jerry Stuckle
On 1/3/2014 8:00 AM, JohnT wrote:
> On Fri, 03 Jan 2014 12:52:50 +0000, Tim Streater wrote:
>
>>
>> Wasn't this to do with user-entered data? Why not make *those* pages
>> UTF-8?
>
> It is not as simple as that as all the pages are created on-the-fly using
> shared code.
>
> The easiest solution is to catch the user data before it gets used, and
> translit it.
>
> Regards
> JohnT
>

If all of the pages are created by the same code, it should be a
relatively simple matter to change all of the pages.

Isn't that the major reason for having common code?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Processing accented characters submitted from forms [message #184502 is a reply to message #184498] Fri, 03 January 2014 10:11
Jerry Stuckle
On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:
>
>> On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:
>>
>>> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>>>> We're already using iso-8859-1 for the whole website. It will be a lot
>>>> of work to change all that, so I guess we'll have to put up with the
>>>> odd Turkish I causing problems.
>>>
>>> It's not clear (to me at least) what's happening to the data, but as far
>>> as any normal set of HTML pages are concerned (PHP generated or
>>> otherwise) you don't have to put up with a dotted I causing problems on
>>> an ISO-8859-1 encoded page. You can represent any Unicode character in
>>> a page using character entities (browser and font support is always an
>>> issue but not nowadays for anything as ordinary as İ).
>>
>> I think it must be the browser that is encoding the character because İ
>> is not supported by iso-8859-1.
>
> Note that the browser behaviour can be altered by form attributes
> (specifically accept-charset). You can have a form that accepts UTF-8
> on an ISO-8859-1 served page.
>
>> It arrives in the request data as the html numeric entity code, as that
>> is the only way it can be transmitted.
>>
>> This causes issues:
>>
>> As I always htmlencode user entered data before display, it means that it
>> gets encoded twice. I'll have to add the 'disable double encode' flag
>> throughout my code :-)
>
> Sure. One way or another you need to get the right encoding. This
> method is not perfect since a user typing &#304; into a form may not
> expect a dotted I to come out.
>
> The best method is (probably) to:
> (a) Give UTF-8 as the form's accept-charset.
> (b) Encode htmlentities giving UTF-8 as the encoding. This should leave
> the UTF-8 characters as UTF-8.
> (c) Use mb_convert_encoding($etext, "HTML-ENTITIES", "UTF-8") to make
> the string displayable in a page regardless of the page's character
> encoding.
>
>> Secondly, it will be added to the database as the entity code, so this
>> will break searching the database etc...
>
> If you take the approach of accepting UTF-8 from the form, you can put
> that directly into the database.
>
>> I think the proper fix would be to convert to UTF-8.
>> But that's a lot of work. For now, I think I'll just manually translit the
>> codes that cause issues.
>
> You really only need UTF-8 in the database. The page encoding is not
> that important.
>

I beg to differ. Page encoding is important if you want the correct
characters displayed.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Processing accented characters submitted from forms [message #184503 is a reply to message #184500] Fri, 03 January 2014 11:08
Thomas 'PointedEars' Lahn
Thomas 'Ingrid' Lahn wrote:

> JohnT wrote:
>> On Fri, 03 Jan 2014 13:53:04 +0100, Thomas 'PointedEars' Lahn wrote:
>>> JohnT wrote:
>>>> [UTF-8] is the PHP 5 default,
>>>
>>> How did you get this idea?
>>
>> http://uk1.php.net/manual/en/function.htmlentities.php
>>
>> says:
>> Like htmlspecialchars(), htmlentities() takes an optional third
>> argument encoding which defines encoding used in conversion. If omitted,
>> the default value for this argument is ISO-8859-1 in versions of PHP
>> prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
>
> […]
> That said, htmlentities() is insufficient to represent arbitrary Unicode
> characters, encoded with UTF-8 server-side, in an HTML document if the
> document encoding is not UTF-8; you would have to use htmlspecialchars()
> which has the same default parameter value since PHP 5.4.0.
>
> <http://php.net/htmlspecialchars>

Actually, it is worse. In such a document, to refer to even those Unicode
characters for which there is *not* a character entity reference in HTML,
you have to use mb_encode_numericentity():

$ php -r 'echo mb_encode_numericentity("∎", array(0x0, 0x10000, 0, 0xfffff),
"UTF-8") . PHP_EOL;'
&#8718;

$ locale
LANG=de_CH.UTF-8
LANGUAGE=
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=

----------

<http://php.net/mb_encode_numericentity>

None of this is necessary if you use UTF-8 throughout.


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)
Re: Processing accented characters submitted from forms [message #184505 is a reply to message #184495] Fri, 03 January 2014 12:57
J.O. Aho
On 03/01/14 14:00, JohnT wrote:
> On Fri, 03 Jan 2014 12:52:50 +0000, Tim Streater wrote:
>
>>
>> Wasn't this to do with user-entered data? Why not make *those* pages
>> UTF-8?
>
> It is not as simple as that as all the pages are created on-the-fly using
> shared code.
>
> The easiest solution is to catch the user data before it gets used, and
> translit it.

You have three options:

1. Switch to UTF-8 for all sites; you will then future-proof your sites.

2. Switch to ISO-8859-9 and get issues when you have to add French.

3. Just live with the fact that you can't support any languages other than
those supported by ISO-8859-1.

You can't catch the data before you get it, so you will be stuck with
what you got.

--

//Aho
Re: Processing accented characters submitted from forms [message #184506 is a reply to message #184502] Fri, 03 January 2014 15:28
Ben Bacarisse
Jerry Stuckle <jstucklex(at)attglobal(dot)net> writes:
> On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
<snip>
>> You really only need UTF-8 in the database. The page encoding is not
>> that important.
>
> I beg to differ. Page encoding is important if you want the correct
> characters displayed.

It's important, but not *that* important. The OP says that changing it
is not an option, so I gave an example of how one can finesse the page
encoding entirely -- by converting data you take from the data base to
ASCII.

--
Ben.
Re: Processing accented characters submitted from forms [message #184509 is a reply to message #184506] Fri, 03 January 2014 16:54
Thomas 'PointedEars' Lahn
Ben Bacarisse wrote:

> Jerry Stuckle <jstucklex(at)attglobal(dot)net> writes:
>> On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
> <snip>
>>> You really only need UTF-8 in the database. The page encoding is not
>>> that important.
>>
>> I beg to differ. Page encoding is important if you want the correct
>> characters displayed.
>
> It's important, but not *that* important. The OP says that changing it
> is not an option, so I gave an example of how one can finesse the page
> encoding entirely -- by converting data you take from the data base to
> ASCII.

You can do that – at the risk of increasing the server-side runtime and
memory usage of the PHP program, the size of the output, the loading and
rendering time, and memory usage of the document client-side, considerably,
to no substantial advantage. It is the *World Wide* Web, and for that
reason alone UTF-8 support is ubiquitous there since years.


PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Re: Processing accented characters submitted from forms [message #184510 is a reply to message #184469] Fri, 03 January 2014 16:58
Tim Streater
In article <2539330(dot)QTdvZdVxQr(at)PointedEars(dot)de>, Thomas 'PointedEars'
Lahn <PointedEars(at)web(dot)de> wrote:

> reason alone UTF-8 support is ubiquitous there since years.

.... UTF-8 support has been ubiquitous there for years.

--
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
Re: Processing accented characters submitted from forms [message #184511 is a reply to message #184469] Fri, 03 January 2014 17:01
Ben Bacarisse
Thomas 'PointedEars' Lahn <PointedEars(at)web(dot)de> writes:

> Ben Bacarisse wrote:
>
>> Jerry Stuckle <jstucklex(at)attglobal(dot)net> writes:
>>> On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
>> <snip>
>>>> You really only need UTF-8 in the database. The page encoding is not
>>>> that important.
>>>
>>> I beg to differ. Page encoding is important if you want the correct
>>> characters displayed.
>>
>> It's important, but not *that* important. The OP says that changing it
>> is not an option, so I gave an example of how one can finesse the page
>> encoding entirely -- by converting data you take from the data base to
>> ASCII.
>
> You can do that – at the risk of increasing the server-side runtime and
> memory usage of the PHP program, the size of the output, the loading and
> rendering speed, and memory usage of the document client-side, considerably,
> to no substantial advantage. It is the *World Wide* Web, and for that
> reason alone UTF-8 support is ubiquitous there since years.

Yes, it's very far from ideal in general, but the OP seems resolved to
switch to UTF-8 pages in the long run so a temporary solution might be
acceptable in this case. For one thing, switching the database to UTF-8
will be needed for the long-term solution, so some of the temporary fix
won't be wasted work.

--
Ben.
Re: Processing accented characters submitted from forms [message #184512 is a reply to message #184506] Fri, 03 January 2014 19:59
Jerry Stuckle
On 1/3/2014 3:28 PM, Ben Bacarisse wrote:
> Jerry Stuckle <jstucklex(at)attglobal(dot)net> writes:
>> On 1/3/2014 9:30 AM, Ben Bacarisse wrote:
> <snip>
>>> You really only need UTF-8 in the database. The page encoding is not
>>> that important.
>>
>> I beg to differ. Page encoding is important if you want the correct
>> characters displayed.
>
> It's important, but not *that* important. The OP says that changing it
> is not an option, so I gave an example of how one can finesse the page
> encoding entirely -- by converting data you take from the data base to
> ASCII.
>

No, it's not important if you don't care if characters are displayed
correctly, or if characters are sent in from the client correctly. Just
storing in the database as UTF-8 and converting to/from ASCII will not
solve these problems.

But if you DO care about such mundane things, then you need to be using
an encoding which supports those characters.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================