FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

no results when a search includes a numerical character [message #25469] Thu, 09 June 2005 03:10
kymw
Messages: 5
Registered: June 2005
Location: Sydney, Australia
Karma: 0
Junior Member
Hey Guys,

We have just upgraded to "FUDforum 2.6.13".
Every time we run a search that contains a numerical value like "version-7" or "t02001" we get no results.

I've rebuilt the search index and still get the same result....
What am I doing wrong?

Thanks in advance
Kym
Re: no results when a search includes a numerical character [message #25483 is a reply to message #25469] Thu, 09 June 2005 09:35
Ilia
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
Numbers are not being indexed, only textual entries.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #25497 is a reply to message #25483] Thu, 09 June 2005 20:43
kymw
Is there a way I can set it up so numbers are indexed?
Re: no results when a search includes a numerical character [message #25507 is a reply to message #25497] Fri, 10 June 2005 14:06
Ilia
Modify the isearch.inc.t word splitting logic.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #25776 is a reply to message #25507] Tue, 21 June 2005 02:58
kymw
Thanks,

Where would I look for information on how to do that?
I'd really like to be able to index numbers.

Some of the posts in our forum contain error codes like "t02001", which it would be great to be able to search for.

Thanks in advance
Kym
Re: no results when a search includes a numerical character [message #25777 is a reply to message #25776] Tue, 21 June 2005 02:59
kymw
Or, if you could just point me in the right direction.
Re: no results when a search includes a numerical character [message #25783 is a reply to message #25777] Tue, 21 June 2005 09:12
Ilia
The indexing of the text is done by code inside isearch.inc.t, specifically the text_to_worda() function.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #27093 is a reply to message #25783] Thu, 25 August 2005 19:28
langer
Messages: 19
Registered: August 2005
Location: Sydney, Australia
Karma: 0
Junior Member
This seems like a must-have... Imagine if you needed information regarding a specific version of FUDBB, or a Linux package. You would not be able to find it amongst all the posts for the same package in all versions!

Any chance of having this as standard?
Re: no results when a search includes a numerical character [message #27094 is a reply to message #27093] Thu, 25 August 2005 19:33
kymw
I never did get this to work, as I don't code much and am unable to get someone to help me with it.

Sigh.

In answer to your idea, I think it would be great to have this, or at least the option to enable it, in an out-of-the-box install of FUD.

Thanks
Kym
Re: no results when a search includes a numerical character [message #27099 is a reply to message #27093] Thu, 25 August 2005 23:00
Ilia
Numbers are not being indexed.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40018 is a reply to message #25783] Fri, 04 January 2008 06:41
srchild
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
Ilia wrote on Tue, 21 June 2005 14:12

The indexing of the text is done by code inside isearch.inc.t, specifically the text_to_worda() function.


Is it just a matter of changing

default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1));

to

default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1,1234567890));

(and then rebuild the search index)

TIA



Simon Child
Re: no results when a search includes a numerical character [message #40019 is a reply to message #40018] Fri, 04 January 2008 07:22
srchild
srchild wrote on Fri, 04 January 2008 11:41

Is it just a matter of changing

default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1));

to

default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1,1234567890));


Hmm, maybe this requires PHP >= 5.1?

From the php manual:

http://uk2.php.net/manual/en/function.str-word-count.php

Quote:

Description
mixed str_word_count ( string $string [, int $format [, string $charlist ]] )

ChangeLog
Version Description
5.1.0 Added the charlist parameter
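
To illustrate the charlist parameter quoted above, here is a minimal sketch. The sample string and variable names are made up for illustration and are not taken from FUDforum. By default str_word_count() treats digits as separators, so a token such as "t02001" is cut down to "t"; passing the digits as the third argument should keep such tokens whole. It assumes PHP 5.1.0 or later.

```php
<?php
// Illustrative sample text; not taken from a real forum post.
$text = strtolower('Error T02001 in GP2GP');

// Default behaviour: digits are not word characters, so they split tokens.
$plain = str_word_count($text, 1);

// With the digits supplied as additional word characters (PHP >= 5.1.0).
$withDigits = str_word_count($text, 1, '0123456789');

print_r(array_unique($plain));      // "t02001" only contributes "t" here
print_r(array_unique($withDigits)); // "t02001" and "gp2gp" stay whole
```

Only tokens that survive this splitting step can end up in the search index, which is why the index has to be rebuilt after the change.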






Simon Child
Re: no results when a search includes a numerical character [message #40031 is a reply to message #40019] Sun, 06 January 2008 12:14
Ilia
You will need PHP 5.1.0 or greater since the feature was introduced (by me BTW ;) ) in that release.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40854 is a reply to message #40031] Sat, 19 April 2008 19:12
srchild
Ilia wrote on Sun, 06 January 2008 17:14

You will need PHP 5.1.0 or greater since the feature was introduced (by me BTW ;) ) in that release.


OK, I'm ready to make use of your feature now :)

The server is scheduled for an upgrade to PHP 5 next week, then I want to enable searching on numbers as described above.

The part that concerns me is the rebuilding of the search index, which is said to take a 'long time'. So how long is a 'long time'? I know you can't answer that, but... :)

My msg_1 file is 120 MB. The current fud_index table has 4,500,000 rows. The server is lightly loaded, with loadav usually below 1, and I have just upgraded the RAM to 1.5 GB so it is not needing the pagefile.

Are we talking 30 minutes, or 10 hours, or what? It's a managed server and has a PHP timeout which I can't increase (I think it is 100 minutes). Can I run this reindex from the command line so it runs faster?

What happens if the index is only partially rebuilt? Then I have no useable index at all... Would it be a case then of reinstalling the old fud_index table from backup?

I assume the forum is unavailable during the rebuild?

Thanks


Simon Child
Re: no results when a search includes a numerical character [message #40858 is a reply to message #40854] Sun, 20 April 2008 09:54
Ilia
It would probably take a few hours; running it via the command line is therefore recommended. The forum will be available during this time, but be quite slow.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40873 is a reply to message #40858] Tue, 22 April 2008 12:09
srchild
Ilia wrote on Sun, 20 April 2008 14:54

The forum will be available during this time, but be quite slow.


The manual:

http://fudforum.org/doc/d/html/admin.rebuild_search_index.html

says:

"Warning

While this process is running your forum will be disabled."

....?


Simon Child
Re: no results when a search includes a numerical character [message #40876 is a reply to message #40873] Tue, 22 April 2008 13:54
Ilia
Well, the speed will be pretty minimal, so you may as well consider it disabled...

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40897 is a reply to message #40876] Sat, 26 April 2008 06:30
srchild
Ilia wrote on Tue, 22 April 2008 18:54

Well, the speed will be pretty minimal, so you may as well consider it disabled...


Well...

It only took 40 minutes. I ran it from the command line, niced, and it didn't bring down the server :) (though the forum wouldn't load during this time).

However, it hasn't indexed numbers.

Before doing the reindexing, having updated the template and rebuilt the theme, I waited a couple of days to check that new posts were being indexed including numbers, and they were. Some new posts, with strings including numbers, could be found by searching those strings, e.g. GP2GP

After rebuilding the index, the search no longer finds those posts.

The rebuild appears to be successful. It cleared the original index, and after the reindex fud_index contains the same number of records as before, and searches for standard strings (text-only, e.g. 'test') still work. It just didn't index numbers.

So, does the index rebuild script not make use of text_to_worda()? Do I have to make changes somewhere else as well?

Using PHP 5.2.5 and MySQL 5.

Thanks


Simon Child
Re: no results when a search includes a numerical character [message #40919 is a reply to message #40897] Mon, 28 April 2008 17:49
Ilia
It does use the text_to_worda() function, but perhaps the numbers were shorter than the minimum word length?

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40925 is a reply to message #40919] Mon, 28 April 2008 19:17
srchild
Ilia wrote on Mon, 28 April 2008 22:49

It does use the text_to_worda() function, but perhaps the numbers were shorter than the minimum word length?


That specific 'word' (GP2GP) was not being indexed before I rebuilt the theme.

After I made the changes to text_to_worda(), some posts containing that were indexed and could be found by searching on GP2GP (GP2GP may not mean anything to you, but it is of interest to my forum visitors!)

After I rebuilt the index, no posts including numbers were findable in a search, including those containing GP2GP which were indexed before I rebuilt.

Since I rebuilt, two new posts containing GP2GP have been indexed and can be found in a search for GP2GP, but that is all.

Does the index rebuild use a different minimum word length? Where is the word length set? Incidentally, if you mean the MySQL fulltext search word length, I have that set to three characters.

Thanks



Simon Child
Re: no results when a search includes a numerical character [message #40938 is a reply to message #40925] Tue, 29 April 2008 19:41
Ilia
The minimum word length is defined inside search.inc

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40947 is a reply to message #40938] Wed, 30 April 2008 16:22
srchild
Ilia wrote on Wed, 30 April 2008 00:41

The minimum word length is defined inside search.inc


I guess you mean isearch.inc, I can't find a search.inc

I can't see where word length is defined in there, but in any case the words I'm interested in (e.g. GP2GP) are getting indexed for new posts. What is not working is for these same posts to be indexed when I run indexdb.php in command-line mode to rebuild the search index: even though indexdb.php does indeed rebuild the index, the new index does not include these terms.

So somewhere the rebuild of the index is using different parameters to the routine indexing?



Simon Child
Re: no results when a search includes a numerical character [message #40951 is a reply to message #40947] Wed, 30 April 2008 19:38
Ilia
In your case you need to change the call to function str_word_count()

making it into

array_unique(str_word_count(strtolower($text), 1, '0123456789'));


FUDforum Core Developer
Re: no results when a search includes a numerical character [message #40973 is a reply to message #40951] Sat, 03 May 2008 07:02
srchild
Ilia wrote on Thu, 01 May 2008 00:38

In your case you need to change the call to function str_word_count()

making it into

array_unique(str_word_count(strtolower($text), 1, '0123456789'));



Hmm, I'd missed off the quotes:

array_unique(str_word_count(strtolower($text),1,1234567890));

But putting them back, rebuilding the theme, checking that include/theme/default/isearch.inc has been updated - it has

array_unique(str_word_count(strtolower($text),1,'1234567890'));

then rebuilding the search index... still not indexing 'GP2GP' in the rebuilt index, even though new posts with that are being indexed as they arrive, and the index rebuild does work otherwise.

I see that indexdb.php does include isearch.inc, but it must be doing something different with it?
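
A side note on the missing quotes: they are unlikely to have been the cause. In PHP's default (non-strict) typing mode an integer argument passed where a string is expected is converted to its decimal string, so the unquoted 1234567890 should reach str_word_count() as '1234567890'. That would fit the observation above that new posts were already being indexed with numbers before the quotes were added. A small check, assuming no declare(strict_types=1) is in effect and using a made-up sample string:

```php
<?php
// Assumes default coercive typing (no declare(strict_types=1)).
// Made-up sample text for illustration.
$text = 'gp2gp t02001';

$unquoted = str_word_count($text, 1, 1234567890);   // int is coerced to "1234567890"
$quoted   = str_word_count($text, 1, '1234567890');

var_dump($unquoted === $quoted);
```

The quoted form is still the safer spelling, since an integer literal that starts with 0 (as in 0123456789) would be read as octal rather than as a list of digits.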


Simon Child
Re: no results when a search includes a numerical character [message #41003 is a reply to message #40973] Mon, 05 May 2008 19:09
Ilia
After making the change did you rebuild the theme?

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #41006 is a reply to message #41003] Tue, 06 May 2008 04:51
srchild
Ilia wrote on Tue, 06 May 2008 00:09

After making the change did you rebuild the theme?


Yes:

%more FUDforum/include/theme/default/isearch.inc

...
 default:
   $t1 = array_unique(str_word_count(strtolower($text),1,'1234567890'));
   if ($text && !$t1) { /* fall through to split by special chars */
      $GLOBALS['usr']->lang = 'latvian';
      continue;
   }
 break;
...


It seems that the command line use of indexdb.php, whilst it appears to include this file, is not using it??

But 'find' doesn't find any other copies of isearch.inc in either the FUDforum or forum directories.

Confused


Simon Child
Re: no results when a search includes a numerical character [message #41009 is a reply to message #41006] Tue, 06 May 2008 19:15
Ilia
It definitely does, since it uses index_text(), defined inside isearch.inc, to index the message text.

FUDforum Core Developer
Re: no results when a search includes a numerical character [message #41011 is a reply to message #41009] Tue, 06 May 2008 20:21
srchild
Ilia wrote on Wed, 07 May 2008 00:15

It definitely does, since it uses index_text(), defined inside isearch.inc, to index the message text.


Looking at the code for indexdb.php, I can see that, as you say, it does call index_text(), which in turn calls my modified text_to_worda().

However, the behaviour of text_to_worda() is influenced by some global variables:

        /* if no good locale, default to splitting by spaces */
        if (!$GLOBALS['good_locale']) {
                $GLOBALS['usr']->lang = 'latvian';
        }


Might it be that calling it from command line it is not setting a locale?

What would be a good way to fake an appropriate locale?




Simon Child
Re: no results when a search includes a numerical character [message #41018 is a reply to message #41011] Wed, 07 May 2008 19:47
Ilia
Set locale to C.

FUDforum Core Developer
Re: no results when a search includes a numerical character - FIXED [message #41036 is a reply to message #41018] Sun, 11 May 2008 06:32
srchild
Ilia wrote on Thu, 08 May 2008 00:47

Set locale to C.


I found it was already detecting the locale as C.

However, stepping through the code I have found the cause: a bug in the function text_to_worda().

function text_to_worda($text)
{
 $a = array();

 /* if no good locale, default to splitting by spaces */
 if (!$GLOBALS['good_locale']) {
   $GLOBALS['usr']->lang = 'latvian';
 }

 $text = strip_tags(reverse_fmt($text));
 while (1) {
  switch ($GLOBALS['usr']->lang) {
   case 'chinese_big5':
   case 'chinese':
   case 'japanese':
   case 'korean':
    return mb_word_split($text, $GLOBALS['usr']->lang);
    break;

   case 'latvian':
   case 'russian-1251':
    $t1 = array_unique(preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY));
    break;

   default:
    $t1 = array_unique(str_word_count(strtolower($text),1,'1234567890'));
    if ($text && !$t1) { /* fall through to split by special chars */
     $GLOBALS['usr']->lang = 'latvian';
     continue;
    }
    break;
  }


The first time through, it finds the locale as C and the language as English, and so, as desired, goes to 'default':

array_unique(str_word_count(strtolower($text),1,'1234567890'));


However, if any message makes it fall through this:

if ($text && !$t1) { /* fall through to split by special chars */
     $GLOBALS['usr']->lang = 'latvian';
     continue;
    }


then $GLOBALS['usr']->lang is set to Latvian and this persists for the rest of the reindex, affecting parsing of every subsequent message.

When indexing a single message it wouldn't matter that $GLOBALS['usr']->lang gets set to Latvian, since the next message would be a fresh start with it set to English once more. But with the reindex running through all messages in one script, every subsequent message is processed as though the language is Latvian.
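
Why the 'latvian' branch loses numbers: its pattern splits on every byte from \x00 to \x40, and the ASCII digits (\x30 to \x39) lie inside that range, so digits act as separators there regardless of the change made to the 'default' branch. A minimal sketch, using the pattern from the listing above and a made-up sample string:

```php
<?php
// Same pattern as the 'latvian' / 'russian-1251' case in text_to_worda().
$pattern = '![\x00-\x40]+!';

// Made-up sample text for illustration.
$text = 'gp2gp error t02001';

// Digits fall inside \x00-\x40, so "gp2gp" is split and "02001" is dropped.
$words = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

So once the language has been switched to 'latvian', a term such as GP2GP can no longer reach the index as a single word.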

So I just changed three lines like this:

function text_to_worda($text)
{
 $a = array();

 /* if no good locale, default to splitting by spaces */
 if (!$GLOBALS['good_locale']) {
   $GLOBALS['usr']->lang = 'latvian';
 }

// use local variable for message language
$thismessagelang = $GLOBALS['usr']->lang;

 $text = strip_tags(reverse_fmt($text));
 while (1) {

//  switch ($GLOBALS['usr']->lang) {
//  switch on message language
  switch ($thismessagelang) {

   case 'chinese_big5':
   case 'chinese':
   case 'japanese':
   case 'korean':
    return mb_word_split($text, $GLOBALS['usr']->lang);
    break;

   case 'latvian':
   case 'russian-1251':
    $t1 = array_unique(preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY));
    break;

   default:
    $t1 = array_unique(str_word_count(strtolower($text),1,'1234567890'));
    if ($text && !$t1) { /* fall through to split by special chars */

//   if resetting language, do it locally not globally
//   $GLOBALS['usr']->lang = 'latvian';
     $thismessagelang = 'latvian';

     continue;
    }
    break;
  }


This seems to have fixed it for me, my index now includes numbers as required.
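
The effect of the change can be reproduced in isolation. The sketch below is a simplified stand-in, not FUDforum's code: split_words() and its $useLocalCopy flag are hypothetical names, $GLOBALS['demo_lang'] stands in for $GLOBALS['usr']->lang, and the CJK cases are left out. It uses `continue 2` because in PHP a bare `continue` inside a `switch` behaves like `break`.

```php
<?php
// Hypothetical stand-in for $GLOBALS['usr']->lang.
$GLOBALS['demo_lang'] = 'english';

// $useLocalCopy = false mimics the original code (fallback written to the global),
// $useLocalCopy = true mimics the patched code (fallback kept in a local variable).
function split_words($text, $useLocalCopy)
{
    $lang = $GLOBALS['demo_lang'];
    while (1) {
        switch ($lang) {
            case 'latvian':
                $t1 = preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY);
                return array_values(array_unique($t1));

            default:
                $t1 = array_unique(str_word_count(strtolower($text), 1, '1234567890'));
                if ($text && !$t1) {
                    $lang = 'latvian';
                    if (!$useLocalCopy) {
                        $GLOBALS['demo_lang'] = 'latvian'; // sticks for later calls
                    }
                    continue 2; // retry the while loop with the fallback splitter
                }
                return array_values($t1);
        }
    }
}

// Original behaviour: a punctuation-only message flips the global.
split_words('???', false);
$sticky = split_words('gp2gp error', false);

// Patched behaviour: reset the global, then repeat with a local copy.
$GLOBALS['demo_lang'] = 'english';
split_words('???', true);
$fixed = split_words('gp2gp error', true);

print_r($sticky); // digits were treated as separators
print_r($fixed);  // "gp2gp" survives as one word
```

In this sketch a single message without any word characters is enough to change how every later message is split, which matches the behaviour seen during the full reindex.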

Thanks


Simon Child
Re: no results when a search includes a numerical character - FIXED [message #158940 is a reply to message #158789] Fri, 17 April 2009 10:45
Peter Vendike
Messages: 65
Registered: February 2009
Location: Denmark
Karma: 0
Member
Translator
kerryg wrote on Tue, 24 March 2009 03:51
Hi Frank - the ability to search for numerical strings (including the slash character "/" as in "error 1/1" or "7/7" or "1324123g/12341234f") would be an *extremely* useful function for folks like myself whose forums often discuss software error messages - it's one of the most important things to be able to search for. Do you have any plans to commit this patch? I'd understand if it was best to have it default to "off" for most folks, but it would be a killer feature, well worth some slowdown in searching.



As I read the docs, searching would not get slower, only the save-message (edit) process, since the indexing is done there.

Isn't PHP >= 5.1 standard these days?

I'm for committing that hack soon, if it works for more than just Latvian.

