FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Home » Imported messages » comp.lang.php » FYI performance of hash algorithms
Show: Today's Messages :: Polls :: Message Navigator
Return to the default flat view Create a new topic Submit Reply
Re: FYI performance of hash algorithms [message #177853 is a reply to message #177852] Mon, 23 April 2012 20:16 Go to previous messageGo to previous message
Jerry Stuckle is currently offline  Jerry Stuckle
Messages: 2598
Registered: September 2010
Karma:
Senior Member
On 4/23/2012 3:42 PM, M. Strobel wrote:
> I am just asking myself which of the many hash algorithms performs best, so I can use
> it to index my translation strings.
>
> So I wrote a test script. It iterates over an array of 3.917.116 strings calculating
> the hash of each.
>
> Here you have the results, with run time, and run time relative to hash length
> (smaller is better).
> Interesting that the tiger hashes seem to perform well.
>
> There were only collisions with adler32 (many), crc32 and crc32b.
>
> /Str.
>

<snip>

Interesting, but only applicable on your system with your libraries and
your data. A different system could have different versions of the
libraries and completely different performance figures.

As for collisions - completely dependent on your data. While you can
say some things with certainty (i.e. hashing 2^16+1 values with CRC-16
is guaranteed to have at least one collision), by picking different
strings to be hashed your collisions will come out differently.

For instance, it's not really surprising that adler32 had a lot of
collisions. It was designed for CRC and works better with binary data;
it was never meant to be a general purpose hashing algorithm. Rather it
was designed for speed. It's also very short.

The same is true but to a lesser extent with the CRC-32 and CRC-32b
algorithms, so again I'm not surprised about the collisions.

But the real question here is - how much does it matter if hashing a
string takes 2 microseconds or 2.5 microseconds? Unless you're doing
millions of them at one time, no one is going to notice the difference.
Personally, when looking at such things I always elect the fewest
collisions over the fastest time.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
[Message index]
 
Read Message
Read Message
Read Message
Previous Topic: Stats comp.lang.php (last 7 days)
Next Topic: Download now
Goto Forum:
  

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ]

Current Time: Sat Sep 28 09:27:38 GMT 2024

Total time taken to generate the page: 0.04661 seconds