FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

about chinese WordSegment [message #27158] Mon, 29 August 2005 09:58
u0u0 (China)
Messages: 9
Registered: August 2005
Karma: 0
Junior Member
I posted a topic containing the following text:
Quote:


PHP开发小组宣布了PHP 4.4.0版本的发布,这是一个专注bug修复的发行版,没有什么新的功能添加到其中。主要是解决了一个严重的PHP相关引用导致的内存崩溃问题。假如引用 出错,PHP将创建一个内存崩溃事件,并且是可以被任意访问的。版本号中间一位的修改,主要是因为这个bug修复涉及到了PHP内部API函数的改动。

(English translation: The PHP development team has announced the release of PHP 4.4.0. This is a bug-fix release with no new features added. It mainly fixes a serious memory corruption problem caused by references in PHP. If a reference goes wrong, PHP creates a memory corruption event that can be accessed arbitrarily. The middle digit of the version number changed mainly because this bug fix involved changes to PHP's internal API functions.)


Then I queried the search table:
SELECT * FROM fud26_search;

+-----+----------------------+
| id  | word                 |
+-----+----------------------+
| 328 | PHP开发小组宣布了PHP |
| 329 | 版本的发布           |
+-----+----------------------+


The words in the search table make no sense.
Could you improve Chinese word segmentation?
Re: about chinese WordSegment [message #27162 is a reply to message #27158] Mon, 29 August 2005 13:36
Ilia (Canada)
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
What version of the forum are you using?

FUDforum Core Developer
Re: about chinese WordSegment [message #27223 is a reply to message #27158] Tue, 30 August 2005 14:45
u0u0 (China)
FUDforum 2.7.1

Re: about chinese WordSegment [message #27224 is a reply to message #27223] Tue, 30 August 2005 15:23
Ilia (Canada)
Do you have any of the following extensions installed:
mbstring, iconv, recode?
Re: about chinese WordSegment [message #27243 is a reply to message #27158] Wed, 31 August 2005 08:59
u0u0 (China)
Quote:


mbstring
Multibyte Support enabled
Japanese support enabled
Simplified chinese support enabled
Traditional chinese support enabled
Korean support enabled
Russian support enabled
Multibyte (japanese) regex support enabled



I have the mbstring extension; the above is part of the phpinfo output.

The forum's default theme is Chinese.

[Updated on: Sat, 03 September 2005 01:05]
Re: about chinese WordSegment [message #27244 is a reply to message #27158] Wed, 31 August 2005 09:26
u0u0 (China)
I found a class: MP Chinese Word Segmentation
http://www.phpn.org/?action=jump&ID_ITEM=19664&ID_LINK=17047

Maybe it's useful.
Re: about chinese WordSegment [message #27262 is a reply to message #27244] Thu, 01 September 2005 18:47
Ilia (Canada)
What character set is your forum using?
Re: about chinese WordSegment [message #27288 is a reply to message #27262] Sat, 03 September 2005 01:07
u0u0 (China)
Ilia wrote on Thu, 01 September 2005 14:47

What character set is your forum using?

GB2312
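
The character set matters here because GB2312 stores each Chinese character as two bytes, so an indexer has to decode the text before it can work on characters. The following is only a minimal sketch in Python, not FUDforum code; it assumes the text is plain GB2312 and reuses the sample sentence from later in this thread.

```python
text = "中文分词测试"

# Bytes as a GB2312-encoded forum would store them (assumption: plain GB2312).
raw = text.encode("gb2312")
print(len(text), len(raw))  # 6 characters, 12 bytes

# Decode first, then work on characters rather than on raw bytes.
chars = list(raw.decode("gb2312"))
print(chars)
```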
Re: about chinese WordSegment [message #27301 is a reply to message #27288] Sat, 03 September 2005 19:16
Ilia (Canada)
http://cvs.prohost.org/c/index.cgi/FUDforum/chngview?cn=7293

Apply this patch and rebuild the theme. Then post a few messages and see if they are indexed any better than they were previously.
Re: about chinese WordSegment [message #27310 is a reply to message #27158] Sun, 04 September 2005 00:45
u0u0 (China)
I applied the patch, but there is no improvement.

For instance, take the Chinese sentence "中文分词测试".
The words in the sentence are "中文", "分词", and "测试".

As we know, there are no spaces between Chinese words.
Generally, there are two approaches to Chinese word segmentation.

1> Assume that every two adjacent Chinese characters form a word.
The sentence above is cut into "中文" "文分" "分词" "词测" "测试".
The words "文分" and "词测" are of no use to us; that is the shortcoming of this approach.

2> Look the sentence up in a Chinese dictionary and find the words it contains.
This approach has the least redundancy, but may be slow.
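
The two approaches can be sketched roughly as follows. This is an illustration in Python, not FUDforum code. For approach 2 it assumes forward maximum matching (take the longest dictionary word at each position), which is one common way to do a dictionary lookup; the post does not say which method is meant. The function names and the three-word dictionary are hypothetical.

```python
def bigram_segment(text):
    """Approach 1: treat every pair of adjacent characters as a word."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dictionary_segment(text, dictionary, max_len=4):
    """Approach 2 (assumed variant): forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


sentence = "中文分词测试"
toy_dictionary = {"中文", "分词", "测试"}  # hypothetical three-word dictionary

print(bigram_segment(sentence))
# ['中文', '文分', '分词', '词测', '测试']
print(dictionary_segment(sentence, toy_dictionary))
# ['中文', '分词', '测试']
```

The bigram version needs no dictionary but stores the useless pairs "文分" and "词测"; the dictionary version avoids them at the cost of one lookup per candidate length at every position.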


I have found that some Chinese users run FUDforum, but most of them have search problems.
If the Chinese search problem is solved, I think more Chinese users will choose FUDforum.
Thanks.
Re: about chinese WordSegment [message #27311 is a reply to message #27310] Sun, 04 September 2005 01:29
Ilia (Canada)
Right now the forum indexes text by individual Chinese characters.
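
As a rough illustration of what indexing by individual characters implies for search, here is a minimal sketch in Python. It is not the forum's actual indexer: the function names, the message ids, and the restriction to the CJK Unified Ideographs block are all assumptions made for the example.

```python
from collections import defaultdict


def build_char_index(messages):
    """Map each CJK character to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for ch in text:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block only
                index[ch].add(msg_id)
    return index


def search(index, query):
    """Return ids of messages that contain every character of the query."""
    sets = [index.get(ch, set()) for ch in query]
    return set.intersection(*sets) if sets else set()


messages = {  # hypothetical message ids and bodies
    1: "中文分词测试",
    2: "测试中文",
    3: "分词",
    4: "文中",
}
index = build_char_index(messages)
print(search(index, "中文"))  # {1, 2, 4}
```

Message 4 ("文中") matches the query "中文" even though it does not contain that word, because a character-level index records which characters occur but not their order. Every word is findable this way, but results can include such false matches unless the hits are checked against the message text afterwards.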
Re: about chinese WordSegment [message #33655 is a reply to message #27158] Tue, 12 September 2006 02:37
hightman (China)
Messages: 4
Registered: February 2003
Location: China
Karma: 0
Junior Member

You can get a Chinese word segmenter here:

http://cws.twomice.net/ (uses PHP + dba (cdb))

Or use MySQL 4.0.*: I have patched MySQL to add Chinese full-text support. All you need is the following SQL:

SELECT * FROM ... WHERE MATCH(title) AGAINST ('...');

For more info, please visit http://myft.twomice.net/
Re: about chinese WordSegment [message #33656 is a reply to message #33655] Tue, 12 September 2006 03:02
u0u0 (China)
Thanks, hightman.
I will set aside some time to study your Chinese word segmenter;
maybe it can be ported to FUDforum.