Re: I Need to search over 100 largeish text documents efficiently. What's the best approach? [message #184741 is a reply to message #184736] Sun, 26 January 2014 21:34
Michael Vilain
In article <SG8Fu.46700$vG7(dot)15374(at)en-nntp-03(dot)dc1(dot)easynews(dot)com>,
Richard Damon <Richard(at)Damon-Family(dot)org> wrote:

> On 1/26/14, 8:34 AM, rob(dot)bradford2805(at)gmail(dot)com wrote:
>> As part of my hosting provider's re-platforming cycle my site has moved
>> server, on the new server and all new servers php exec() and equivalents
>> are blocked, this has taken out my fast document search that used exec() to
>> call grep then awk. I now need to do the grep part as effectively as
>> possible in PHP as I can no longer access the shell from the scripts. The
>> awk part is easily sorted.
>>
>> What is the best/fastest approach to scan 100+ largish text files for word
>> strings, I really don't wish to index each file into a database as the
>> documents change quite frequently. My grep-awk scan was around one second
>> to begin rendering the results page, I know I can't match that but I can't
>> afford too much of a delay.
>>
>> Any ideas appreciated whilst I look for a new hosting provider, I feel that
>> any hosting set up that makes such a change without notification really has
>> no respect for its clients.
>>
>> Rob
>>
>
> If you can't call grep from the command line via exec, the best solution
> may be to write a version of grep in your program. Read the files (or
> chunks of them in sequence) and use the PHP string search functions on
> the data block. If reading chunks, make sure to do any needed overlap
> between chunks so you don't miss matches across chunk breaks.

If you're grepping multiple large files from an exec, grep will produce
the results for each file as it's processed. You can easily replicate
this behavior from within PHP.

Loop through an array containing the filenames you want to grep.
Open each file and read it into memory as an array of lines (file() does this).
Use preg_grep to scan the entire array for matching lines (preg_match_all
works on a single string, not an array, so use it only if you read the file
in as one string).
Do whatever you want with the resultant array of matching results.
Process the next file. That seems fairly straightforward.
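
The loop described above might be sketched roughly as follows. This is only an illustration, not tested against the original poster's documents: the function name grep_files is made up for the example, it assumes PHP 7 or later, and it assumes each file fits comfortably in memory.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the matching
// lines of each file, keyed by filename.
function grep_files(array $filenames, string $pattern): array
{
    $results = [];
    foreach ($filenames as $filename) {
        // Read the whole file as an array of lines, newlines stripped.
        $lines = @file($filename, FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // unreadable or missing file: skip it
        }
        // preg_grep preserves the array keys, so key + 1 is the line number.
        $matches = preg_grep($pattern, $lines);
        if ($matches) {
            $results[$filename] = $matches;
        }
    }
    return $results;
}
```

It could be called as grep_files(glob('/path/to/docs/*.txt'), '/some words/i'), where the path and pattern are placeholders. If the search words come from a user, running them through preg_quote before building the pattern is probably wise.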

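If these files are HUGE (e.g. GB), you may have to do your own I/O with
fopen/fread, convert the string buffer into an array with explode (split
is deprecated), grep it, and get more data. The problem there is you may
pull in a partial line at the end of each buffer, so hold that last piece
back and prepend it to the next read.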
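
For the huge-file case, one way to cope with the partial line is to carry the incomplete tail of each buffer over to the next read. A minimal sketch, assuming PHP 7 or later and "\n" line endings; the function name grep_large_file and the 1 MB default chunk size are arbitrary choices for the example:

```php
<?php
// Hypothetical helper (name invented for this sketch): scans one large file
// in fixed-size chunks and returns the lines that match $pattern.
function grep_large_file(string $filename, string $pattern, int $chunkSize = 1048576): array
{
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return [];
    }
    $matches = [];
    $carry = ''; // incomplete line left over from the previous chunk
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $lines = explode("\n", $carry . $chunk);
        // The last element may be a partial line; hold it back for the next read.
        $carry = array_pop($lines);
        foreach (preg_grep($pattern, $lines) as $line) {
            $matches[] = $line;
        }
    }
    fclose($handle);
    // Whatever is left is the final line of a file with no trailing newline.
    if ($carry !== '' && preg_match($pattern, $carry)) {
        $matches[] = $carry;
    }
    return $matches;
}
```

Memory use stays around one chunk plus the longest line, at the cost of losing line numbers unless you count them yourself. Files with "\r\n" endings would keep a trailing "\r" on each line, which matters only if the pattern anchors on end of line.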

--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]