FUDforum

Newsgroup: comp.lang.php (imported messages)
use of array_key_exists() to prevent duplicates? [message #173013] Thu, 17 March 2011 15:35
William Gill
Messages: 31
Registered: March 2011
Karma: 0
Member
I have a very simple app that I am working on. I have input files
containing records with a unique serial number followed by a short (<255
char) memo. The source program duplicates records from previous runs,
along with any new records. I want to create a result file w/o
duplicates. I am thinking of reading the files into an array where
serial number becomes the key and memo the value, using
array_key_exists() to filter out duplicates. The source files are no
more than a couple hundred records each, and the master should never
exceed a couple thousand.

Does anybody see any drawbacks to this, or have a better approach?

Thanks,

Bill
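A minimal PHP sketch of the approach Bill describes, assuming (hypothetically) that each input line holds the serial number and the memo separated by a single tab. The function name `mergeRecords()` and the line format are illustrative assumptions, not taken from the post. It keeps the first copy of each serial it encounters.

```php
<?php
// Hypothetical sketch: merge several input files into one array keyed by
// serial number, skipping any serial that has already been seen.
// Assumed line format: "<serial>\t<memo>" (an assumption, not from the post).
function mergeRecords(array $files): array
{
    $records = [];
    foreach ($files as $file) {
        $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            throw new RuntimeException("Cannot read $file");
        }
        foreach ($lines as $line) {
            // Split into at most two parts so tabs inside the memo survive.
            $parts = explode("\t", $line, 2);
            $serial = $parts[0];
            $memo = $parts[1] ?? '';
            // Keep the first copy of each serial; later duplicates are ignored.
            if (!array_key_exists($serial, $records)) {
                $records[$serial] = $memo;
            }
        }
    }
    return $records;
}
```

Note that PHP converts decimal-integer strings such as "1001" into integer keys; this does not affect the duplicate check, since the same conversion is applied on every lookup.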
Re: use of array_key_exists() to prevent duplicates? [message #173014 is a reply to message #173013] Thu, 17 March 2011 15:58
Captain Paralytic
Messages: 204
Registered: September 2010
Karma: 0
Senior Member
On Mar 17, 3:35 pm, William Gill <nos...@domain.invalid> wrote:
> I have a very simple app that I am working on.  I have input files
> containing records with a unique serial number followed by a short (<255
> char) memo.  The source program duplicates records from previous runs,
> along with any new records.  I want to create a result file w/o
> duplicates.  I am thinking of reading the files into an array where
> serial number becomes the key and memo the value, using
> array_key_exists() to filter out duplicates.   The source files are no
> more than a couple hundred records each, and the master should never
> exceed a couple thousand.
>
> Does anybody see any drawbacks to this, or have a better approach?
>
> Thanks,
>
> Bill

Alternative 1) Just assign the values to an associative array with the
serial number as the key. At the end a foreach will produce a list of
unique values. No need for array_key_exists() at all.

Alternative 2) If you are planning on storing the data in a database
just use INSERT IGNORE.
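Alternative 1 can be sketched as follows, assuming the records have already been parsed into hypothetical `[serial, memo]` pairs. Plain assignment overwrites any earlier entry with the same serial, so the last copy wins, and a foreach over the result lists each serial once. The name `dedupeLastWins()` is illustrative.

```php
<?php
// Sketch of "Alternative 1": plain assignment keyed by serial number.
// A repeated serial simply overwrites the earlier entry, so the LAST copy wins.
// $rows is a hypothetical list of [serial, memo] pairs already parsed from the files.
function dedupeLastWins(array $rows): array
{
    $unique = [];
    foreach ($rows as [$serial, $memo]) {
        $unique[$serial] = $memo;
    }
    return $unique;
}

$rows = [
    ['1001', 'old memo'],
    ['1002', 'another memo'],
    ['1001', 'new memo'],
];

// A foreach over the result yields each serial exactly once.
foreach (dedupeLastWins($rows) as $serial => $memo) {
    echo "$serial\t$memo\n";
}
```

An overwritten key keeps its original position in the array, so the output order follows the first appearance of each serial.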
Re: use of array_key_exists() to prevent duplicates? [message #173015 is a reply to message #173014] Thu, 17 March 2011 17:09
William Gill
On 3/17/2011 11:58 AM, Captain Paralytic wrote:
>
> Alternative 1) Just assign the values to an associative array with the
> serial number as the key. At the end a foreach will produce a list of
> unique values. No need for array_key_exists() at all.
>
Is this any more efficient than if (array_key_exists()) ?
On the minus side, if I edit or modify the memo, it will be overwritten
using this alternative. I could avoid this, but it seems unnecessarily
complicated.

> Alternative 2) If you are planning on storing the data in a database
> just use INSERT IGNORE.

Had considered this, but at present a flat file seems adequate.
Re: use of array_key_exists() to prevent duplicates? [message #173031 is a reply to message #173015] Fri, 18 March 2011 10:05
Captain Paralytic
On Mar 17, 5:09 pm, William Gill <nos...@domain.invalid> wrote:
> On 3/17/2011 11:58 AM, Captain Paralytic wrote:
>
>> Alternative 1) Just assign the values to an associative array with the
>> serial number as the key. At the end a foreach will produce a list of
>> unique values. No need for array_key_exists() at all.
>
> Is this any more efficient than if (array_key_exists()) ?
> On the minus side, if I edit or modify the memo, it will be overwritten
> using this alternative.  I could avoid this, but it seems unnecessarily
> complicated.
Not complicated at all. My original suggestion keeps the last copy. If
they are "duplicates" then they will all be the same. If instead you
want the first copy you just add an if test as in
    if (!isset($myarray[$mykey])) {
        $myarray[$mykey] = $myvalue;
    }
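A minimal sketch of the isset() guard in a complete loop: only the first copy of each serial is stored, so an edited first record is not replaced by a later repeat. `dedupeFirstWins()` and the sample rows are hypothetical.

```php
<?php
// Sketch of the "keep the first copy" variant: assign only when the serial
// has not been seen yet. $rows is a hypothetical list of [serial, memo] pairs.
function dedupeFirstWins(array $rows): array
{
    $unique = [];
    foreach ($rows as [$serial, $memo]) {
        // isset() is false for an unseen key, so only the first copy is stored.
        // Caveat: it is also false for a key whose stored value is null.
        if (!isset($unique[$serial])) {
            $unique[$serial] = $memo;
        }
    }
    return $unique;
}

$rows = [
    ['1001', 'edited memo'],   // first copy, e.g. carrying a local edit
    ['1002', 'another memo'],
    ['1001', 'original memo'], // repeated by the source program
];

print_r(dedupeFirstWins($rows));
```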
Re: use of array_key_exists() to prevent duplicates? [message #173038 is a reply to message #173031] Fri, 18 March 2011 15:06
William Gill
On 3/18/2011 6:05 AM, Captain Paralytic wrote:
> On Mar 17, 5:09 pm, William Gill<nos...@domain.invalid> wrote:
>> On 3/17/2011 11:58 AM, Captain Paralytic wrote:
>>
>>> Alternative 1) Just assign the values to an associative array with the
>>> serial number as the key. At the end a foreach will produce a list of
>>> unique values. No need for array_key_exists() at all.
>>
>> Is this any more efficient than if (array_key_exists()) ?
>> On the minus side, if I edit or modify the memo, it will be overwritten
>> using this alternative. I could avoid this, but it seems unnecessarily
>> complicated.
> Not complicated at all. My original suggestion keeps the last copy. If
> they are "duplicates" then they will all be the same. If instead you
> want the first copy you just add an if test as in
> if(!isset($myarray[$mykey]))
> $myarray[$mykey] = $myvalue;
>
Yes, after I spoke I realized that it didn't have to be too complicated,
but I still have to ask: is there any advantage to testing for isset()
as opposed to testing for array_key_exists() (besides the treatment of
NULL values), or are you just posing an equally viable alternative?
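The NULL difference mentioned here can be shown directly; the sample array is hypothetical. If memos are always stored as strings (an empty memo as '' rather than null), isset() and array_key_exists() give the same answer for this use, because isset() is true for a key holding ''.

```php
<?php
// The one behavioural difference between the two tests:
// isset() treats a key whose value is null as absent, array_key_exists() does not.
$memos = [
    '1001' => 'some memo',
    '1002' => null, // hypothetical record whose memo is stored as null
    '1003' => '',   // hypothetical record with an empty-string memo
];

var_dump(isset($memos['1002']));            // bool(false)
var_dump(array_key_exists('1002', $memos)); // bool(true)
var_dump(isset($memos['1003']));            // bool(true)
var_dump(isset($memos['9999']));            // bool(false)
var_dump(array_key_exists('9999', $memos)); // bool(false)
```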
Re: use of array_key_exists() to prevent duplicates? [message #173039 is a reply to message #173038] Fri, 18 March 2011 16:15
Captain Paralytic
On Mar 18, 3:06 pm, William Gill <nos...@domain.invalid> wrote:
> On 3/18/2011 6:05 AM, Captain Paralytic wrote:
>> On Mar 17, 5:09 pm, William Gill<nos...@domain.invalid>  wrote:
>>> On 3/17/2011 11:58 AM, Captain Paralytic wrote:
>
>>>> Alternative 1) Just assign the values to an associative array with the
>>>> serial number as the key. At the end a foreach will produce a list of
>>>> unique values. No need for array_key_exists() at all.
>
>>> Is this any more efficient than if (array_key_exists()) ?
>>> On the minus side, if I edit or modify the memo, it will be overwritten
>>> using this alternative.  I could avoid this, but it seems unnecessarily
>>> complicated.
>> Not complicated at all. My original suggestion keeps the last copy. If
>> they are "duplicates" then they will all be the same. If instead you
>> want the first copy you just add an if test as in
>> if(!isset($myarray[$mykey]))
>>    $myarray[$mykey] = $myvalue;
>
> Yes, after I spoke I realized that it didn't have to be too complicated,
> but I still have to ask: is there any advantage to testing for isset()
> as opposed to testing for array_key_exists() (besides the treatment of
> NULL values), or are you just posing an equally viable alternative?

Well actually I'm not clear from your OP what you really need. If the
source program produces duplicates of old records as well as new ones,
then provided that the new records come after the old ones, I would
assume that the one you wanted to end up with was the latest one and
so you would not want to use either array_key_exists() or isset().

If you want the first occurrence then all the benchmark tests show
isset() being much more efficient than array_key_exists().
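Whether isset() is faster on a particular setup is easy to check locally. The following is a rough micro-benchmark sketch, not a rigorous measurement: the data set, key format and iteration count are arbitrary assumptions, the timings will vary with PHP version and hardware, and `hrtime()` requires PHP 7.3 or later. The names `buildRecords()` and `benchmark()` are hypothetical.

```php
<?php
// Rough micro-benchmark sketch: isset() versus array_key_exists() on an
// array about the size mentioned in the thread. Timings vary with PHP
// version and hardware, so treat them as indicative only.

// Build hypothetical serial => memo pairs.
function buildRecords(int $count): array
{
    $records = [];
    for ($i = 0; $i < $count; $i++) {
        $records["SN$i"] = "memo $i";
    }
    return $records;
}

// Perform $lookups successful lookups with each test; returns elapsed
// nanoseconds for each and the hit counts (which should be equal).
function benchmark(array $records, int $lookups): array
{
    $keys = array_keys($records);
    $n = count($keys);

    $start = hrtime(true);
    $issetHits = 0;
    for ($i = 0; $i < $lookups; $i++) {
        if (isset($records[$keys[$i % $n]])) {
            $issetHits++;
        }
    }
    $issetNs = hrtime(true) - $start;

    $start = hrtime(true);
    $akeHits = 0;
    for ($i = 0; $i < $lookups; $i++) {
        if (array_key_exists($keys[$i % $n], $records)) {
            $akeHits++;
        }
    }
    $akeNs = hrtime(true) - $start;

    return [
        'isset_ns' => $issetNs,
        'array_key_exists_ns' => $akeNs,
        'hits' => [$issetHits, $akeHits],
    ];
}

$result = benchmark(buildRecords(2000), 1000000);
printf("isset():            %.2f ms\n", $result['isset_ns'] / 1e6);
printf("array_key_exists(): %.2f ms\n", $result['array_key_exists_ns'] / 1e6);
```

At the record counts in this thread (a few thousand keys), either test is likely to be fast enough that the choice matters less than whether the first or the last copy should be kept.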
Re: use of array_key_exists() to prevent duplicates? [message #173040 is a reply to message #173039] Fri, 18 March 2011 16:34
William Gill
On 3/18/2011 12:15 PM, Captain Paralytic wrote:
> Well actually I'm not clear from your OP what you really need. If the
> source program produces duplicates of old records as well as new ones,
> then provided that the new records come after the old ones, I would
> assume that the one you wanted to end up with was the latest one and
> so you would not want to use either array_key_exists() or isset().
To answer your question: it wasn't mentioned in the OP, but as I mentioned
subsequently, edits to the memo field should be maintained.
>
> If you want the first occurrence then all the benchmark tests show
> isset() being much more efficient than array_key_exists().
...and this answers mine. It's easy enough to take an isset() approach
rather than array_key_exists() and still come up with what I need.

Thanks.
Re: use of array_key_exists() to prevent duplicates? [message #173041 is a reply to message #173040] Fri, 18 March 2011 16:46
Captain Paralytic
On Mar 18, 4:34 pm, William Gill <nos...@domain.invalid> wrote:
> On 3/18/2011 12:15 PM, Captain Paralytic wrote:
>> Well actually I'm not clear from your OP what you really need. If the
>> source program produces duplicates of old records as well as new ones,
>> then provided that the new records come after the old ones, I would
>> assume that the one you wanted to end up with was the latest one and
>> so you would not want to use either array_key_exists() or isset().
>
> To answer your question: it wasn't mentioned in the OP, but as I mentioned
> subsequently, edits to the memo field should be maintained.

I did read that, but it doesn't say whether the edited record is the
first one or the last one. It is this that decides which one you wish
to keep.
Re: use of array_key_exists() to prevent duplicates? [message #173043 is a reply to message #173041] Fri, 18 March 2011 17:17
William Gill
On 3/18/2011 12:46 PM, Captain Paralytic wrote:
> On Mar 18, 4:34 pm, William Gill<nos...@domain.invalid> wrote:
>> On 3/18/2011 12:15 PM, Captain Paralytic wrote:
>>> Well actually I'm not clear from your OP what you really need. If the
>>> source program produces duplicates of old records as well as new ones,
>>> then provided that the new records come after the old ones, I would
>>> assume that the one you wanted to end up with was the latest one and
>>> so you would not want to use either array_key_exists() or isset().
>>
>> To answer your question: it wasn't mentioned in the OP, but as I mentioned
>> subsequently, edits to the memo field should be maintained.
>
> I did read that, but it doesn't say whether the edited record is the
> first one or the last one. It is this that decides which one you wish
> to keep.

I'm taking info from reports outside of my control, or I would edit the
records directly and this exercise would be unnecessary. I take the
flat-file output (report) and create a new flat-file db with my
edits/additions for tracking and analysis. I can easily add additional
fields like comments or status. The problem is that the "snapshots" from
the source repeat some old records along with any new records. I need to
capture only the new ones, massage them and update my file. Fortunately,
source records have unique serial numbers (transaction numbers), so I can
easily determine whether a record has been seen/processed or not.

It sounded like a good candidate for an SQL db, but after initial analysis
I decided it would be a single-table db and was overkill at this time.
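The workflow described above (keep a local master file with extra fields, then append only snapshot records whose serial number has not been seen) might be sketched like this. The tab-separated layouts, the extra status column, the default status 'new', and the names `readLines()` and `updateMaster()` are all hypothetical. Because existing serials are never rewritten, local edits to a memo or status survive repeated snapshots.

```php
<?php
// Sketch of the workflow described above, under assumed file layouts:
//   snapshot lines: "<serial>\t<memo>"            (from the source reports)
//   master lines:   "<serial>\t<memo>\t<status>"  (local file with extra fields)
// Only snapshot records whose serial is not already in the master are appended,
// so local edits to existing master records are never overwritten.
function readLines(string $path): array
{
    if (!file_exists($path)) {
        return []; // no master yet: treat as empty
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $lines;
}

function updateMaster(string $masterPath, string $snapshotPath, string $defaultStatus = 'new'): int
{
    // Index serials already in the master; only the key matters, not the value.
    $seen = [];
    foreach (readLines($masterPath) as $line) {
        $serial = explode("\t", $line, 2)[0];
        $seen[$serial] = true;
    }

    $newLines = [];
    foreach (readLines($snapshotPath) as $line) {
        [$serial, $memo] = array_pad(explode("\t", $line, 2), 2, '');
        if (isset($seen[$serial])) {
            continue; // captured on an earlier run, or repeated in this snapshot
        }
        $seen[$serial] = true;
        $newLines[] = "$serial\t$memo\t$defaultStatus\n";
    }

    if ($newLines !== []) {
        // Append the new records in one write, holding an exclusive lock.
        file_put_contents($masterPath, implode('', $newLines), FILE_APPEND | LOCK_EX);
    }
    return count($newLines); // number of records added
}
```

Running `updateMaster()` a second time with the same snapshot adds nothing, which is the property the thread is after.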