Re: I Need to search over 100 largeish text documents efficiently. What's the best approach? [message #184741 is a reply to message #184736] Sun, 26 January 2014 21:34
From: Michael Vilain
In article <SG8Fu.46700$vG7(dot)15374(at)en-nntp-03(dot)dc1(dot)easynews(dot)com>,
Richard Damon <Richard(at)Damon-Family(dot)org> wrote:

> On 1/26/14, 8:34 AM, rob(dot)bradford2805(at)gmail(dot)com wrote:
>> As part of my hosting provider's re-platforming cycle my site has moved
>> server. On the new server (and all new servers) PHP exec() and its
>> equivalents are blocked, which has taken out my fast document search that
>> used exec() to call grep and then awk. I now need to do the grep part as
>> efficiently as possible in PHP, since I can no longer access the shell
>> from my scripts. The awk part is easily sorted.
>>
>> What is the best/fastest approach to scanning 100+ largish text files for
>> word strings? I really don't want to index each file into a database, as
>> the documents change quite frequently. My grep-awk scan took around one
>> second to begin rendering the results page; I know I can't match that,
>> but I can't afford too much of a delay.
>>
>> Any ideas appreciated whilst I look for a new hosting provider. I feel
>> that any hosting setup that makes such a change without notification
>> really has no respect for its clients.
>>
>> Rob
>>
>
> If you can't call grep from the command line via exec, the best solution
> may be to write a version of grep in your program. Read the files (or
> chunks of them in sequence) and use the PHP string search functions on
> the data block. If reading chunks, make sure to do any needed overlap
> between chunks so you don't miss matches across chunk breaks.
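
A minimal sketch of that chunk-with-overlap idea, assuming the search term is a plain word rather than a regex (the function name chunk_search and the chunk size are my own choices, not anything from the post above):

<?php
// Scan one file for $needle in fixed-size chunks, keeping an overlap of
// strlen($needle) - 1 bytes so a match spanning a chunk boundary is found.
function chunk_search($filename, $needle, $chunkSize = 1048576)
{
    $overlap = strlen($needle) - 1;
    $fh = fopen($filename, 'rb');
    if ($fh === false) {
        return false;
    }
    $tail = '';            // carry-over bytes from the previous chunk
    $offset = 0;           // absolute file offset of the current buffer
    $positions = array();
    while (($chunk = fread($fh, $chunkSize)) !== false && $chunk !== '') {
        $buffer = $tail . $chunk;
        $pos = 0;
        while (($pos = strpos($buffer, $needle, $pos)) !== false) {
            $positions[] = $offset + $pos;
            $pos++;
        }
        // Keep only the last $overlap bytes for the next iteration; a full
        // match can never fit entirely inside them, so nothing is counted twice.
        $tail = ($overlap > 0) ? substr($buffer, -$overlap) : '';
        $offset += strlen($buffer) - strlen($tail);
    }
    fclose($fh);
    return $positions;     // byte offsets of every match
}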

If you're grepping multiple large files from an exec, grep will produce
the results for each file as it's processed. You can easily replicate
this behavior from within PHP.

Loop through an array containing the filenames you want to grep. For
each file, open it and read it into memory as an array of lines.
Use preg_grep to scan that array for matches (preg_match_all works on a
single string rather than an array). Do whatever you want with the
resultant array of matching lines, then process the next file. That
seems fairly straightforward; a sketch follows.
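
Something like this minimal sketch, assuming the files fit comfortably in memory ($searchTerm and the glob path are illustrative):

<?php
// Grep-like scan over a list of files, one file at a time,
// emitting matches as each file is processed.
$searchTerm = 'needle';                          // the word to look for
$pattern = '/' . preg_quote($searchTerm, '/') . '/i';
$files = glob('/path/to/docs/*.txt');            // illustrative path

foreach ($files as $file) {
    $lines = file($file, FILE_IGNORE_NEW_LINES); // one array element per line
    if ($lines === false) {
        continue;                                // unreadable file; skip it
    }
    $matches = preg_grep($pattern, $lines);      // keys are 0-based line numbers
    foreach ($matches as $lineNo => $line) {
        // grep -n style output: file:line:text
        printf("%s:%d:%s\n", $file, $lineNo + 1, $line);
    }
}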

If these files are HUGE (e.g. GB), you may have to do your own I/O with
fopen/fread, convert the string buffer into an array of lines with
explode(), grep it, and then read more data. The problem there is that
you may pull in a partial line, which the sketch below has to handle.
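
A rough sketch of that huge-file case, carrying the trailing partial line over to the next read (the function name grep_large_file and the buffer size are illustrative; it assumes newline-terminated lines):

<?php
// Line-oriented grep over an arbitrarily large file using bounded memory.
// Whatever follows the last newline in each buffer is prepended to the
// next read, so a line split across two reads is still matched whole.
function grep_large_file($filename, $pattern, $bufSize = 1048576)
{
    $fh = fopen($filename, 'rb');
    if ($fh === false) {
        return array();
    }
    $remainder = '';
    $results = array();
    while (!feof($fh)) {
        $buffer = $remainder . fread($fh, $bufSize);
        $lastNl = strrpos($buffer, "\n");
        if ($lastNl === false) {
            $remainder = $buffer;                 // no complete line yet
            continue;
        }
        $remainder = substr($buffer, $lastNl + 1); // partial last line
        $lines = explode("\n", substr($buffer, 0, $lastNl));
        foreach (preg_grep($pattern, $lines) as $line) {
            $results[] = $line;
        }
    }
    // The file may not end with a newline; check the final fragment too.
    if ($remainder !== '' && preg_match($pattern, $remainder)) {
        $results[] = $remainder;
    }
    fclose($fh);
    return $results;
}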

--
DeeDee, don't press that button! DeeDee! NO! Dee...
[I filter all Goggle Groups posts, so any reply may be automatically ignored]