FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Codes to judge Simplified Chinese characters (GB2312) [message #7961] Fri, 03 January 2003 08:01
swinemania   China
Messages: 2
Registered: January 2003
Location: Beijing
Karma: 0
Junior Member
Hi, all!
I wrote some PHP code to check whether a string contains any Simplified Chinese characters. It works fine with the GB2312 character set.

I hope it will be of some use to FUDforum.


/**
* Checks whether a string contains Simplified Chinese (GB2312) characters.
*
* @param $string the string to inspect
* @return boolean returns true if it has GB2312 chars.
*
*/
function hasChnGBWord($string){

    if (!($len=strlen($string))) return false;

    // Stop at $len-1 so that $string[$i+1] never reads past the end of the string.
    for($i=0;$i<$len-1;$i++) {
        if (isChnGBWord($string[$i],$string[$i+1])) return true;
    }

    return false;
}


/**
* Checks whether a pair of bytes forms a Simplified Chinese (GB2312) character.
*
* Each GB2312 character consists of two bytes, the higher byte and the lower byte,
* and both bytes have their highest bit set.
*
* @param $higherByte
* @param $lowerByte
* @return boolean returns true if the pair is a GB2312 character
*
*/
function isChnGBWord($higherByte,$lowerByte){
    $higher=char2u_int($higherByte);
    $lower=char2u_int($lowerByte);

    // 1: both bytes have the high bit set (a Chinese character)
    // -1: only the higher byte has it set (an error: half of a Chinese character)
    // 0: the higher byte is plain ASCII
    $val=($higher & 0x80)?(($lower & 0x80)?1:-1):0;

    return (1==$val);
}

/**
* Converts a character (a single byte) into an unsigned int.
*
* @param $char the input character
* @return int the byte value (0-255)
*
*/
function char2u_int($char){
    // ord() returns the value of the first byte directly, which is what the
    // earlier bin2hex()-based digit loop computed by hand.
    return ord($char);
}
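
The check above treats any two consecutive bytes with the high bit set as a Chinese character, so it also matches other multi-byte encodings. Below is a minimal, stricter sketch. It assumes the input is GB2312 in its usual EUC-CN form, where Chinese characters (hanzi) use lead bytes 0xB0-0xF7 and trail bytes 0xA1-0xFE. The function name containsGb2312Hanzi is hypothetical and not part of FUDforum.

```php
<?php
/**
 * Returns true if $string contains at least one byte pair inside the
 * GB2312 hanzi range (lead 0xB0-0xF7, trail 0xA1-0xFE).
 * Assumption: the input is EUC-CN encoded GB2312 text.
 */
function containsGb2312Hanzi($string)
{
    $len = strlen($string);
    $i = 0;

    // A character needs two bytes, so stop one byte before the end.
    while ($i < $len - 1) {
        $lead = ord($string[$i]);

        // Plain ASCII byte: move on to the next byte.
        if ($lead < 0x80) {
            $i++;
            continue;
        }

        $trail = ord($string[$i + 1]);
        if ($lead >= 0xB0 && $lead <= 0xF7 && $trail >= 0xA1 && $trail <= 0xFE) {
            return true;
        }

        // Two-byte symbol outside the hanzi rows: skip both bytes.
        // Broken pair (high byte followed by ASCII): skip only the lead byte.
        $i += ($trail >= 0x80) ? 2 : 1;
    }

    return false;
}
```

Unlike the byte-by-byte loop, this version advances two bytes after a non-hanzi pair, so the trail byte of one character is never paired with the lead byte of the next.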



Re: Codes to judge Simplified Chinese characters (GB2312) [message #7962 is a reply to message #7961] Fri, 03 January 2003 08:05
swinemania   China
Messages: 2
Registered: January 2003
Location: Beijing
Karma: 0
Junior Member
The attached txt file is easier to read!
  • Attachment: codes.TXT
    (Size: 1.57KB, Downloaded 1169 times)
Re: Codes to judge Simplified Chinese characters (GB2312) [message #8237 is a reply to message #7961] Sat, 18 January 2003 14:13
laser   China
Messages: 9
Registered: January 2003
Karma: 0
Junior Member
Hmm, I think it's better to let the DB do this job.
I'm using PostgreSQL, which has excellent encoding conversion tools.
We can convert the encoding between the client (PHP CGI) and the server (DB) with a simple query.
In my installation, I created my forum DB with UNICODE encoding.
Then I added a global variable to GLOBAL.php:

$DBCLIENT_ENCODING = "GBK";

Then I added a few lines to index.php, right after the connection has been made:

if(__dbtype__ == 'pgsql' && __db_connection_ok_ ){
    // Ask PostgreSQL to convert between the DB encoding and the client encoding.
    $query = "SET CLIENT_ENCODING TO '".$GLOBALS['DBCLIENT_ENCODING']."'";
    pg_query($GLOBALS['__DB_INC__']['SQL_LINK'], $query);
}

Now we can use PostgreSQL's encoding conversion to support GBK/GB2312 while the DB still uses UNICODE. If we later switch
the client encoding to another scheme, we can keep the same DB, because PostgreSQL does the conversion for us automatically.
I don't know much about MySQL, but I think we could add some kind
of switch to use the DB's own feature where one is available.
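
Since the encoding name is interpolated directly into the SQL text, it may be worth validating it first. A minimal sketch, assuming the name comes from configuration; buildClientEncodingQuery is a hypothetical helper and not part of FUDforum:

```php
<?php
/**
 * Builds a "SET CLIENT_ENCODING" statement for PostgreSQL.
 * The name is checked against a conservative pattern because it is
 * placed directly into the SQL text.
 */
function buildClientEncodingQuery($encoding)
{
    if (!preg_match('/^[A-Za-z0-9_-]+\z/', $encoding)) {
        throw new InvalidArgumentException('Unexpected encoding name: ' . $encoding);
    }

    return "SET CLIENT_ENCODING TO '" . $encoding . "'";
}

// Hypothetical usage, where $link is a connection returned by pg_connect():
// pg_query($link, buildClientEncodingQuery('GBK'));
//
// PHP's pgsql extension also provides pg_set_client_encoding($link, 'GBK'),
// which avoids building the statement by hand.
```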
Re: Codes to judge Simplified Chinese characters (GB2312) [message #23218 is a reply to message #8237] Wed, 09 March 2005 02:36
linxiaoming   China
Messages: 6
Registered: March 2005
Karma: 0
Junior Member
1. Modify Apache's httpd.conf:
#AddDefaultCharset ISO-8859-1
#AddDefaultCharset utf-8
AddDefaultCharset gb2312

2. Modify PostgreSQL's postgresql.conf:
#client_encoding = sql_ascii # actually, defaults to database encoding
client_encoding = GBK

3. Modify the default theme in Theme Management:
Name = default
Theme = default
Language = chinese
Locale = zh_CN


4. Result: see the attached screenshot.
  • Attachment: firstpage.PNG
    (Size: 12.82KB, Downloaded 2955 times)
Multilingual support solved for good [message #23650 is a reply to message #23218] Fri, 25 March 2005 05:56
linxiaoming   China
Messages: 6
Registered: March 2005
Karma: 0
Junior Member
I have found a better way. Don't do this: $DBCLIENT_ENCODING = "GBK";

1. Modify Apache's httpd.conf:
#AddDefaultCharset ISO-8859-1
AddDefaultCharset utf-8
#AddDefaultCharset gb2312

2. Modify PostgreSQL's postgresql.conf:
client_encoding = sql_ascii # actually, defaults to database encoding
#client_encoding = GBK

3. Back up your theme -> chinese files by copying them to another directory.

4. In Theme Management, add a theme:
Name = chinatest
Theme = default
Language = chinese
Locale = C


5. Change the encoding:
Open the file "msg" and save it as "msg" again, but change the encoding to utf-8.
Open the file "charset" and change zh_CN to C.


6. Register a new user with the theme chinatest.

7. Log in as the newly registered user. Now you can see Chinese characters in the utf-8 encoding.

8. But I still don't understand the charset "C".


??????????????????????????
I still don't know what this "C" is all about!
In step 4, changing C to utf-8 also works!
Also, the page I am typing into right now reports its character set as gb2312. What is going on?
So frustrating!
??????????????????????????
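
Step 5 above re-saves the "msg" file as UTF-8 in an editor. The same conversion can be sketched in PHP. This assumes the original file is GBK/GB2312 encoded and that the iconv extension is available with GBK support; gbkToUtf8 is a hypothetical helper name.

```php
<?php
/**
 * Converts GBK/GB2312 encoded bytes to UTF-8.
 * Returns the converted string, or false if iconv cannot convert the input.
 */
function gbkToUtf8($bytes)
{
    // The @ suppresses the notice iconv raises on invalid input;
    // callers should check for a false return value instead.
    return @iconv('GBK', 'UTF-8', $bytes);
}

// Hypothetical usage for the "msg" file (work on a copy, as in step 3):
// $converted = gbkToUtf8(file_get_contents('msg'));
// if ($converted !== false) {
//     file_put_contents('msg', $converted);
// }
```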
