Re: windows-1251 (Cyr) & MySQL 4.1 [message #32441 is a reply to message #32333]
Thu, 29 June 2006 07:51
aigumnov
Ilia wrote on Wed, 21 June 2006 19:12:
"A single forum can use multiple languages while a table charset is for all data, so we have to use a neutral setting."
'latin1' is not a neutral setting. Here are a few notes to make this clear.
As I understand it, it works as follows:
1) By default, PHP acts as a 'latin1' client: the PHP application and the MySQL server agree on this charset during mysql_connect. Note that the negotiated charset can be changed later with 'SET NAMES' or 'SET CHARACTER SET'.
2) In that case, if the table's charset is also 'latin1', the MySQL server sees that the client charset (latin1) matches the table charset (latin1) and performs no text conversion on input. The effect of this trick is that all records are stored without any transformation, 'as is'.
3) Later, when the data is read, another PHP script connects to MySQL with the 'latin1' charset. Again, because the client charset matches the table's, no conversion is made on read, and 'Everything is fine, because FUDFORUM makes the right conversion while taking data from the database' (actually there is no conversion at all; everything relies on the web page having the right charset).
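The pass-through behaviour in steps 1-3 can be modelled with plain Python codecs. This is a simplified sketch under stated assumptions, not MySQL's actual code: the helper `server_convert` is hypothetical and only stands in for the server's charset conversion, and MySQL's 'latin1' is approximated by Python's 'cp1252' codec (MySQL documents its latin1 as Windows cp1252).

```python
def server_convert(data: bytes, from_cs: str, to_cs: str) -> bytes:
    """Hypothetical model of MySQL's client <-> column charset conversion.

    If both charsets are the same, the bytes pass through untouched.
    Otherwise they are decoded in the source charset and re-encoded in the
    target one; characters missing from the target become '?'.
    """
    if from_cs == to_cs:
        return data  # no conversion at all: stored "as is"
    text = data.decode(from_cs, errors="replace")
    return text.encode(to_cs, errors="replace")


# Python codec names standing in for MySQL charset names (an assumption).
LATIN1 = "cp1252"  # MySQL's 'latin1' behaves like Windows cp1252
CP1251 = "cp1251"

# The browser submits a Russian post as cp1251 bytes, because the page is cp1251.
posted = "Привет".encode(CP1251)

# Write: client charset (latin1) == column charset (latin1) -> no conversion.
stored = server_convert(posted, from_cs=LATIN1, to_cs=LATIN1)

# Read: another script connects as latin1 again -> still no conversion.
fetched = server_convert(stored, from_cs=LATIN1, to_cs=LATIN1)

# The cp1251 web page decodes the untouched bytes correctly.
print(fetched.decode(CP1251))  # Привет
```

The point of the model is that nothing in the chain ever interprets the bytes: the column label says latin1, but the content is cp1251, and only the web page's charset makes it readable.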
If you set another client charset, for example with 'SET NAMES cp1251', the MySQL server will convert 'latin1' -> 'cp1251'. If the data in your 'latin1' table is actually cp1251, this conversion is lossy, and you'll get '?' and other garbled characters.
That is why phpMyAdmin cannot display the data correctly by default: the recommended phpMyAdmin configuration sets the 'cp1251' charset.
To read any data stored this way, one must set a client charset that matches the table charset. To read the default FUDForum tables in phpMyAdmin, just run 'SET NAMES latin1' and it will read all data 'as is'.
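A small sketch of the two read paths, under the same simplifying assumptions: `server_convert` is a hypothetical stand-in for the server's conversion (pass-through when charsets match, otherwise decode and re-encode with '?' for unmappable characters), and MySQL's latin1 is approximated by Python's 'cp1252' codec. It reads cp1251 bytes stored in a latin1 column once through a cp1251 client (as after 'SET NAMES cp1251') and once through a latin1 client (as after 'SET NAMES latin1').

```python
def server_convert(data: bytes, from_cs: str, to_cs: str) -> bytes:
    # Hypothetical model: pass-through when charsets match, otherwise
    # decode/re-encode with unmappable characters replaced by '?'.
    if from_cs == to_cs:
        return data
    return data.decode(from_cs, errors="replace").encode(to_cs, errors="replace")


LATIN1 = "cp1252"  # stand-in for MySQL's latin1
CP1251 = "cp1251"

# cp1251 bytes stored "as is" in a latin1 column.
stored = "Привет".encode(CP1251)

# SET NAMES cp1251: the server converts latin1 -> cp1251. The bytes are read
# as Latin letters (Ï, ð, è, ...) that do not exist in cp1251, so they become '?'.
garbled = server_convert(stored, from_cs=LATIN1, to_cs=CP1251)
print(garbled.decode(CP1251))  # ??????

# SET NAMES latin1: the charsets match, so the bytes come back untouched.
raw = server_convert(stored, from_cs=LATIN1, to_cs=LATIN1)
print(raw.decode(CP1251))  # Привет
```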
This behaviour causes further problems when someone manages the forum data outside the forum's PHP application. For example, on Russian Windows the mysql command-line client uses the cp866 charset by default. A 'cp866'<->'cp1251' conversion would not be lossy; however, if the data goes into the 'latin1' table as cp866 bytes, those records then come out 'as is' (still in cp866) on a cp1251 web page, which makes the changes unreadable.
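The cp866 scenario can be sketched with Python codecs as well. The assumptions here: the console sends cp866 bytes while the connection is still declared as latin1, so nothing is converted; `server_convert` is a hypothetical model of the server's conversion, not MySQL code; and 'cp1252' stands in for MySQL's latin1.

```python
def server_convert(data: bytes, from_cs: str, to_cs: str) -> bytes:
    # Hypothetical model of server-side conversion: pass-through when the
    # charsets match, otherwise decode and re-encode ('?' for unmappable chars).
    if from_cs == to_cs:
        return data
    return data.decode(from_cs, errors="replace").encode(to_cs, errors="replace")


LATIN1 = "cp1252"  # stand-in for MySQL's latin1
CP866 = "cp866"    # Russian Windows console code page
CP1251 = "cp1251"  # charset of the forum's web pages

# An edit typed in the mysql console arrives as cp866 bytes, but the
# connection says latin1, so nothing is converted on the way in.
typed = "Привет".encode(CP866)
stored = server_convert(typed, from_cs=LATIN1, to_cs=LATIN1)

# The forum reads the row as latin1 (no conversion) and shows it on a cp1251 page.
shown = server_convert(stored, from_cs=LATIN1, to_cs=LATIN1).decode(CP1251)
print(shown == "Привет")  # False: cp866 bytes rendered as cp1251

# A real cp866 -> cp1251 conversion, by contrast, keeps the Cyrillic text intact.
converted = server_convert(typed, from_cs=CP866, to_cs=CP1251).decode(CP1251)
print(converted == "Привет")  # True
```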
Also, some server configurations may override this trick, in which case FUDForum will not work at all.
I don't know how FUDForum will work with a Unicode (UTF-8) charset in the database, but if one's forums are truly multilingual, Unicode is the only option. All web pages must then be Unicode too, since it is impossible to show, for example, one message in Chinese and another in Russian at the same time without Unicode web pages.
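A quick illustration of the single-page limitation, at least for the cp1251 pages discussed here: cp1251 has no code points for Chinese characters, while UTF-8 encodes both scripts. The helper `can_encode` and the sample strings are illustrative only.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


page = "Привет / 你好"  # one Russian and one Chinese greeting on the same page

print(can_encode(page, "cp1251"))  # False: no Chinese characters in cp1251
print(can_encode(page, "utf-8"))   # True: UTF-8 covers both scripts
print(page.encode("utf-8").decode("utf-8") == page)  # True: lossless round trip
```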
If the forums use a single language that 'latin1' does not cover, I personally think it's worth making some changes to the server setup and/or FUDForum's configuration and source code, just to make the table structure more manageable and consistent with the other data in the database. The decision must be made by the forum's owner/administrator.
[Updated on: Thu, 29 June 2006 08:52]