Storing multiple character set types (or a representation of em) in a table column [message #172691] |
Fri, 25 February 2011 13:27 |
bizt
Messages: 2 Registered: February 2011
Karma: 0
Junior Member |
Hi,
I'm currently building an online tool designed to help users learn a
foreign language. Its aim is to let users enter their own vocabulary
and then quiz themselves on it. This involves entering a word in their
native language and then its foreign-language equivalent. It's a kind
of flashcard system, but restricted to the words that the user has
previously entered - perhaps specific to the textbook they're
currently using.
One of the things I want to do is support multiple character sets
(e.g. alphanumeric, Japanese, Chinese...). I don't want the system to
be specific to a single character set (I'm currently learning Japanese
and want to use the system myself, so I want to be able to support
English and Japanese character sets). When designing my database table
columns I have to specify which character set the text is in, as I
don't know any better. I understand why this is, but this is the first
time I've ever wanted a single text field to support multiple
character sets - or at least some representation of them, so I can
turn them back into the original format when required.
What's the best way to do this? One idea is to store the text in
standard abc123... characters and specify in another field what
character set it is. So, when inserting, my script will detect which
character set the text is in, take a note of it, encode the text to an
abc123... representation and then do the INSERT. Alongside the
abc123... entry, another field will state the original character set,
so when I need the data I can decode the abc123... representation back
to its original form.
Anyway, I've posted this in the PHP forum because the technique above
doesn't require me to do anything different in the database; instead,
the encoding and decoding is handled in the PHP script. Is that the
best technique? How might this be done in PHP? If anyone has any
suggestions I'd really appreciate it if you could reply.
Thanks
Burnsy
Re: Storing multiple character set types (or a representation of em) in a table column [message #172692 is a reply to message #172691] |
Fri, 25 February 2011 13:38 |
alvaro.NOSPAMTHANX
Messages: 277 Registered: September 2010
Karma: 0
Senior Member |
On 25/02/2011 14:27, bizt wrote:
[...]
> One of the things I want to do is support multiple character sets
> (e.g. alphanumeric, Japanese, Chinese...). I don't want the system to
> be specific to a single character set (I'm currently learning Japanese
> and want to use the system myself, so I want to be able to support
> English and Japanese character sets).
I think you are confusing alphabets with computer character sets. You
need to support many alphabets but I don't think you want to deal with
more than one encoding; it'd be crazy and unnecessary. Just pick a
Unicode encoding (UTF-8 is a popular option but it's not the only one)
and make your life easier.
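
To illustrate the point, here is a minimal sketch showing that a single Unicode encoding carries English, Japanese and Chinese text alike, with no per-row record of "which character set" being needed. It is written in Python purely for illustration (the same idea applies in PHP), and the helper name `utf8_round_trip` is made up for this example.

```python
def utf8_round_trip(word: str) -> tuple[int, int]:
    """Encode a word to UTF-8 and back; return (characters, bytes)."""
    encoded = word.encode("utf-8")     # the bytes that would be stored
    decoded = encoded.decode("utf-8")  # the text read back later
    if decoded != word:
        raise ValueError("round trip changed the text")
    return len(word), len(encoded)


# English, Japanese (kanji and hiragana) and Chinese in one list,
# all handled by the same encoding.
for word in ["dog", "犬", "いぬ", "狗"]:
    chars, size = utf8_round_trip(word)
    print(f"{word}: {chars} character(s), {size} byte(s) in UTF-8")
```

The only thing that varies between scripts is how many bytes each character takes, and that is a storage detail rather than something the application has to track.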
> When designing my database table columns I have to specify which
> character set the text is in, as I don't know any better.
Not all database engines handle encodings the same way. E.g., in Oracle
you must use the same encoding all around your app.
> I understand why this is, but this is the first time I've ever wanted
> a single text field to support multiple character sets - or at least
> some representation of them, so I can turn them back into the
> original format when required.
If you are talking about storing different encodings *in the same
column*, well, it can be done (you just need to store it as binary data)
but you won't be able to use any of the text handling features of your
DB engine. For instance, a search for "foo" will never find "Foo".
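
A rough sketch of that limitation, assuming SQLite (via Python's built-in `sqlite3` module) as a stand-in for whatever engine you actually use; the table and column names are made up for the example, and other engines differ in the details:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocab ("
    " as_text TEXT COLLATE NOCASE,"  # text column: engine applies text rules
    " as_blob BLOB)"                 # binary column: opaque bytes
)

# Store the same word twice: once as text, once as raw UTF-8 bytes.
conn.execute(
    "INSERT INTO vocab (as_text, as_blob) VALUES (?, ?)",
    ("Foo", "Foo".encode("utf-8")),
)

# Case-insensitive match works on the text column...
text_hits = conn.execute(
    "SELECT COUNT(*) FROM vocab WHERE as_text = ?", ("foo",)
).fetchone()[0]

# ...but the binary column is compared byte by byte.
blob_hits = conn.execute(
    "SELECT COUNT(*) FROM vocab WHERE as_blob = ?", (b"foo",)
).fetchone()[0]

print("text column hits:", text_hits)
print("binary column hits:", blob_hits)
```

In this setup the text column should report one hit and the binary column none, because the engine has no idea the bytes are text and so cannot apply a case-insensitive collation to them.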
> What's the best way to do this? One idea is to store the text in
> standard abc123... characters and specify in another field what
> character set it is. So, when inserting, my script will detect which
> character set the text is in, take a note of it, encode the text to an
> abc123... representation and then do the INSERT. Alongside the
> abc123... entry, another field will state the original character set,
> so when I need the data I can decode the abc123... representation back
> to its original form.
>
> Anyway, I've posted this in the PHP forum because the technique above
> doesn't require me to do anything different in the database; instead,
> the encoding and decoding is handled in the PHP script. Is that the
> best technique? How might this be done in PHP? If anyone has any
> suggestions I'd really appreciate it if you could reply.
I suggest you read the famous "The Absolute Minimum Every Software
Developer Absolutely, Positively Must Know About Unicode and Character
Sets (No Excuses!)" article:
http://www.joelonsoftware.com/articles/Unicode.html
Then give a second thought to your specs.
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My site of satin-finish humor: http://www.demogracia.com
--