in_array performance in unsorted vs sorted array [message #178204] Wed, 23 May 2012 02:20
William Gill
I am reading transaction records from files. Each record has an
alphanumeric GUID but that record may be repeated in more than one file
(because of overlapping samples). I don't want to process duplicate
records, so I am considering a simple flat file to store the GUIDs of
previously processed records.

To keep things simple I plan to use $done=file() to read the flat file,
and a simple if in_array to see if the current GUID has already been
processed; if not, process the current record and add its GUID to $done.

Does anyone know if sorting an array has any significant impact on
in_array, or can I simply push values into $done?

Also is there a better way than foreach() write() to get $done back into
the flat file?
Re: in_array performance in unsorted vs sorted array [message #178205 is a reply to message #178204] Wed, 23 May 2012 08:16
Jerry Stuckle
Arrays in PHP are associative; their keys are handled as hash values.
So I suspect it makes no difference on whether the array is sorted or not.

Also, what's wrong with foreach() write()? That's how you get arrays
back into a file. Ensure you lock the file so that you don't have two
scripts running against it at the same time.

But it sounds like you should be using a database. That would solve a
lot of your problems.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: in_array performance in unsorted vs sorted array [message #178211 is a reply to message #178205] Wed, 23 May 2012 10:57
William Gill
On 5/23/2012 8:16 AM, Jerry Stuckle wrote:
>
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference on whether the array is sorted or not.
>
OK, that seems to be in sync with what I'm (not) finding in Google.

> Also, what's wrong with foreach() write()? That's how you get arrays
> back into a file. Ensure you lock the file so that you don't have two
> scripts running against it at the same time.
>
Nothing. I just wanted to be sure I wasn't unaware of something better,
like a function inverse to file(). Probably looking at fewer than 100k
records at any time, so I guess performance won't be a problem.

> But it sounds like you should be using a database. That would solve a
> lot of your problems.
>
Yes it does, and that's where this is heading. Right now it is just
summarizing some information, but eventually this will become a
pre-processor for a db.
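
On the "inverse to file()" question, here is a minimal sketch, assuming the GUIDs are kept one per array element without trailing newlines. file_put_contents() writes a string in one call and its LOCK_EX flag takes an exclusive lock for the duration of the write, which covers the locking advice quoted above. The helper names load_done() and save_done() and the temp-file path are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helpers; the names and the file path are illustrative only.

/** Read GUIDs, one per line, dropping newlines and blank lines. */
function load_done(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

/** Write the whole array back in one call, holding an exclusive lock. */
function save_done(string $path, array $done): bool
{
    $data = $done ? implode("\n", $done) . "\n" : '';
    return file_put_contents($path, $data, LOCK_EX) !== false;
}

// Round trip through a temporary file.
$path = sys_get_temp_dir() . '/done_guids_example.txt';
save_done($path, ['a1b2', 'c3d4']);
$roundTrip = load_done($path);
unlink($path);
```

Note that file() keeps the trailing newline on each element unless FILE_IGNORE_NEW_LINES is passed, so a plain in_array() check against a bare GUID would otherwise never match. LOCK_EX here protects only the write itself; two scripts that both read, modify, and write the file could still overwrite each other's additions.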
Re: in_array performance in unsorted vs sorted array [message #178212 is a reply to message #178211] Wed, 23 May 2012 11:01
William Gill
Pulled the trigger too soon. Forgot to say thanks.
Re: in_array performance in unsorted vs sorted array [message #178237 is a reply to message #178204] Wed, 23 May 2012 05:36
Captain Paralytic
Why not load the files into a database table which has a primary key
of the GUID? Then you have one record for each GUID.
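
A rough sketch of that idea, assuming PDO with the SQLite driver is available. The in-memory database, the table name processed, and the column name guid are all hypothetical stand-ins. INSERT OR IGNORE is SQLite-specific syntax; other databases spell the same thing differently.

```php
<?php
// Sketch only: in-memory SQLite stands in for whatever database is used;
// the table and column names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE processed (guid TEXT PRIMARY KEY)');

// The primary key rejects duplicates; OR IGNORE skips them without an error.
$insert = $db->prepare('INSERT OR IGNORE INTO processed (guid) VALUES (?)');

$incoming = ['a1b2', 'c3d4', 'a1b2']; // 'a1b2' appears twice
$handled  = [];
foreach ($incoming as $guid) {
    $insert->execute([$guid]);
    // rowCount() reports rows changed by the last statement:
    // 1 for a new GUID, 0 when the insert was ignored as a duplicate.
    if ($insert->rowCount() === 1) {
        $handled[] = $guid; // process the record here
    }
}

$stored = (int) $db->query('SELECT COUNT(*) FROM processed')->fetchColumn();
```

With this layout the database does the duplicate check, so no in-memory list of GUIDs is needed at all.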
Re: in_array performance in unsorted vs sorted array [message #178248 is a reply to message #178205] Thu, 24 May 2012 16:20
Thomas Mlynarczyk
Jerry Stuckle wrote:

>> [Performance of in_array()]
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference on whether the array is sorted or not.

The /keys/ are hash values, yes. But in_array() searches through the
/values/, not the keys, so I suppose (haven't tested it) the performance
is O(n) rather than O(1). On the other hand, I doubt if PHP keeps track
of whether the array is sorted or not, so it probably makes no
difference indeed, as you said.

Greetings,
Thomas

--
Just because many people are wrong doesn't make them right!
(Coluche)
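
To illustrate the keys-versus-values point: in_array() scans the values one by one whether or not they are sorted, while isset() on a key is a hash lookup. A minimal sketch, assuming the GUIDs are stored as array keys instead of values; the sample GUIDs and variable names are made up.

```php
<?php
// Sketch: store GUIDs as array keys so membership is a hash lookup.
$lines = ['a1b2', 'c3d4'];               // e.g. what file() returned, newlines stripped
$done  = array_fill_keys($lines, true);  // ['a1b2' => true, 'c3d4' => true]

$incoming  = ['c3d4', 'e5f6', 'e5f6'];
$processed = [];
foreach ($incoming as $guid) {
    if (isset($done[$guid])) {
        continue;                        // already seen, skip the duplicate
    }
    $processed[] = $guid;                // process the record here
    $done[$guid] = true;                 // remember it for later records
}

// Back to a plain list of GUIDs for writing to the flat file.
$toSave = array_keys($done);
```

One caveat: PHP converts keys that look like decimal integers (such as '123') to int, so array_keys() would return them as integers. For GUIDs that could be all digits, casting the keys back with array_map('strval', ...) before saving is one way around it.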