Re: I Need to search over 100 largeish text documents efficiently. What's the best approach? [message #184744 is a reply to message #184743] Mon, 27 January 2014 12:23
Denis McMahon
On Mon, 27 Jan 2014 10:58:42 +0100, Arno Welzel wrote:

> On 27.01.2014 02:43, Denis McMahon wrote:
>
>> On Sun, 26 Jan 2014 05:34:21 -0800, rob.bradford2805 wrote:
>>
>>> What is the best/fastest approach to scan 100+ largish text files for
>>> word strings
>>
>> A quick googling finds:
>>
>> http://sourceforge.net/projects/php-grep/
>> http://net-wrench.com/download-tools/php-grep.php
>>
>> Claims to be able to search 1000 files in under 10 secs
>
> Under ideal conditions - maybe. But if each file is more than 1 MB, it
> is barely possible to even read this amount of data in just 10 seconds
> (assuming around 80 MB/s and 1000 MB of data to be searched).
>
> Even using a simple word index (word plus the name of the file(s) and
> the position(s) where the word is located) would be the better solution.

Indeed, the fastest solution would be to index each file when it changes,
and keep the indexes in a database.
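
As a rough illustration of that idea, here is a minimal sketch of a word index (word, file, position) kept in SQLite, which only re-indexes a file when its modification time has changed. It is written in Python for brevity; the table layout and the names open_index, index_file and lookup are my own assumptions, not taken from any existing tool, and the same structure should map onto PHP with PDO.

```python
import os
import re
import sqlite3

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def open_index(db_path=":memory:"):
    """Open (or create) the index database. Hypothetical schema."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS files "
               "(path TEXT PRIMARY KEY, mtime_ns INTEGER)")
    db.execute("CREATE TABLE IF NOT EXISTS postings "
               "(word TEXT, path TEXT, pos INTEGER)")
    db.execute("CREATE INDEX IF NOT EXISTS postings_word ON postings (word)")
    return db


def index_file(db, path):
    """Re-index `path` only if its mtime changed. Returns True if re-indexed."""
    mtime_ns = os.stat(path).st_mtime_ns
    row = db.execute("SELECT mtime_ns FROM files WHERE path = ?",
                     (path,)).fetchone()
    if row is not None and row[0] == mtime_ns:
        return False  # unchanged since the last run, keep the old entries

    with open(path, encoding="utf-8", errors="replace") as fh:
        words = WORD_RE.findall(fh.read().lower())

    # Drop stale entries for this file, then store word positions.
    db.execute("DELETE FROM postings WHERE path = ?", (path,))
    db.executemany("INSERT INTO postings VALUES (?, ?, ?)",
                   [(w, path, i) for i, w in enumerate(words)])
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, mtime_ns))
    db.commit()
    return True


def lookup(db, word):
    """Return (path, word position) pairs for one word."""
    return db.execute(
        "SELECT path, pos FROM postings WHERE word = ? ORDER BY path, pos",
        (word.lower(),)).fetchall()
```

A search then touches only the index, not the 100+ files themselves; the cost of reading each file is paid once per change rather than once per query.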

Perhaps there are common words you wouldn't index; in English these might
include:

a the in on an this that then ....

Then, if you have a search phrase, remove the common words and look for the
uncommon words in close proximity to each other.
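
To make the stop-word and proximity idea concrete, here is a small in-memory sketch in Python. The stop-word list is just the handful of words mentioned above, and the function names and the default window of 5 words are arbitrary assumptions for illustration.

```python
import re
from itertools import product

# Only the example words from this post; a real list would be longer.
STOP_WORDS = {"a", "the", "in", "on", "an", "this", "that", "then"}
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case the text and split it into words."""
    return WORD_RE.findall(text.lower())


def build_positions(text):
    """Map each uncommon word to the word positions where it occurs."""
    positions = {}
    for i, word in enumerate(tokenize(text)):
        if word not in STOP_WORDS:
            positions.setdefault(word, []).append(i)
    return positions


def proximity_match(positions, phrase, window=5):
    """True if all uncommon words of `phrase` occur within `window` words."""
    terms = {w for w in tokenize(phrase) if w not in STOP_WORDS}
    if not terms or any(t not in positions for t in terms):
        return False
    # Brute force over every combination of occurrences: fine for a
    # sketch, but too slow for very frequent words in large files.
    candidates = product(*(positions[t] for t in terms))
    return any(max(c) - min(c) <= window for c in candidates)
```

For example, with positions = build_positions("The cat sat on the mat then the dog barked in the garden"), the phrase "the cat on the mat" matches, while "cat in the garden" does not unless the window is widened.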

It might help to know more about the grep too: is this using complex
regexps, or is it a simple string search done externally using grep?

--
Denis McMahon, denismfmcmahon(at)gmail(dot)com