Re: FYI performance of hash algorithms [message #177854 is a reply to message #177853] Mon, 23 April 2012 20:44
M. Strobel
On 23.04.2012 22:16, Jerry Stuckle wrote:
> On 4/23/2012 3:42 PM, M. Strobel wrote:
>> I am just asking myself which of the many hash algorithms performs best, so I can use
>> it to index my translation strings.
>>
>> So I wrote a test script. It iterates over an array of 3,917,116 strings calculating
>> the hash of each.
>>
>> Here you have the results, with run time, and run time relative to hash length
>> (smaller is better).
>> Interesting that the tiger hashes seem to perform well.
>>
>> There were only collisions with adler32 (many), crc32 and crc32b.
>>
>> /Str.
>>
>
> <snip>
>
> Interesting, but only applicable on your system with your libraries and your data. A
> different system could have different versions of the libraries and completely
> different performance figures.

Agreed. But it gives you hints.
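
To collect the same kind of hints on another system, a timing loop like the one described above can be sketched as follows. This is only a minimal sketch, written in Python rather than PHP; the `ALGORITHMS` table, the `benchmark` helper and the generated sample strings are assumptions for illustration, not the original test script, and the algorithm list is much shorter than what hash_algos() returns.

```python
import hashlib
import time
import zlib

# Hypothetical algorithm table: each entry maps bytes to a hex digest string.
ALGORITHMS = {
    "adler32": lambda d: format(zlib.adler32(d) & 0xFFFFFFFF, "08x"),
    "crc32": lambda d: format(zlib.crc32(d) & 0xFFFFFFFF, "08x"),
    "md5": lambda d: hashlib.md5(d).hexdigest(),
    "sha1": lambda d: hashlib.sha1(d).hexdigest(),
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
}


def benchmark(strings, algorithms=ALGORITHMS):
    """Return (name, seconds, seconds per hex digit, hex length) rows, fastest first."""
    rows = []
    for name, fn in algorithms.items():
        hex_len = len(fn(b""))  # digest length in hex digits
        start = time.perf_counter()
        for s in strings:
            fn(s)
        elapsed = time.perf_counter() - start
        rows.append((name, elapsed, elapsed / hex_len, hex_len))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Made-up sample data; real translation strings will behave differently.
    sample = [("message %d" % i).encode("utf-8") for i in range(100_000)]
    for name, secs, per_digit, hex_len in benchmark(sample):
        print(f"{name:8s} {hex_len:3d} hex digits  {secs:.4f}s  {per_digit:.6f}s/digit")
```

The absolute numbers mean little on their own; the ranking on your own data and libraries is the useful part.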

> As for collisions - completely dependent on your data. While you can say some things
> with certainty (e.g. hashing 2^16+1 values with CRC-16 is guaranteed to have at least
> one collision), by picking different strings to be hashed your collisions will come
> out differently.
>
> For instance, it's not really surprising that adler32 had a lot of collisions. It
> was designed as a checksum, like CRC, and works better with binary data; it was never
> meant to be a general purpose hashing algorithm. Rather it was designed for speed.
> It's also very short.
>
> The same is true but to a lesser extent with the CRC-32 and CRC-32b algorithms, so
> again I'm not surprised about the collisions.

Neither am I. I just did not trim the list; these are all the algorithms returned by hash_algos().
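
A rough estimate shows why collisions in the 32-bit hashes are to be expected with this many strings. The sketch below is in Python rather than PHP and assumes an ideal hash with uniformly distributed output, which no real checksum quite is; the function names are made up for illustration. For 3,917,116 distinct strings and 32 bits it gives about 1,786 colliding pairs, while for 128 bits or more the expectation is practically zero.

```python
def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n distinct inputs,
    assuming an ideal hash with `bits` uniformly distributed output bits."""
    return n * (n - 1) / 2 / 2 ** bits


def collision_guaranteed(n: int, bits: int) -> bool:
    """Pigeonhole principle: more inputs than possible hash values."""
    return n > 2 ** bits


if __name__ == "__main__":
    n = 3_917_116  # number of strings in the test above
    for bits in (32, 128, 160, 256):
        print(bits, expected_colliding_pairs(n, bits))
    print(collision_guaranteed(2 ** 16 + 1, 16))  # the CRC-16 case
```

A hash that collides far more often than this estimate, as adler32 apparently did here, is not spreading these particular strings evenly over its 32 bits.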

> But the real question here is - how much does it matter if hashing a string takes 2
> microseconds or 2.5 microseconds? Unless you're doing millions of them at one time,
> no one is going to notice the difference. Personally, when looking at such things I
> always elect the fewest collisions over the fastest time.

Even for this simple test I had to bump up memory_limit to 3G. And a real collision
test is difficult.
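
One way to keep the memory use of such a test down is to stream the strings and keep only the raw digests. The following is a minimal sketch under assumptions: it is Python rather than PHP, `count_collisions` is a made-up helper, and it assumes the input strings are all distinct (a repeated string would be counted as a collision).

```python
import hashlib


def count_collisions(strings, hash_fn):
    """Count inputs whose digest was already produced by an earlier input.

    `strings` may be any iterable (for example a generator reading a file),
    so only the set of digests is held in memory, not the strings themselves.
    Assumes all inputs are distinct.
    """
    seen = set()
    collisions = 0
    for s in strings:
        digest = hash_fn(s)
        if digest in seen:
            collisions += 1
        else:
            seen.add(digest)
    return collisions


if __name__ == "__main__":
    def sample():
        # Made-up test data, generated lazily.
        for i in range(300):
            yield ("string %d" % i).encode("utf-8")

    # Deliberately weak one-byte hash, only to show that collisions are found.
    print(count_collisions(sample(), lambda s: hashlib.md5(s).digest()[:1]))
    print(count_collisions(sample(), lambda s: hashlib.sha256(s).digest()))
```

Raw digests take half the space of their hex form, which adds up over a few million entries.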

The real problem is that hashes are often used without collision detection; I am
running the app without it as well. I ran this test to get an indication of how much
longer one of the longer hashes, for example md5 or sha1, takes to compute.
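
For comparison, collision detection in a hash-keyed index does not have to cost much: store the source string next to the value and compare on insert and lookup. This is a minimal sketch in Python rather than PHP; the `TranslationIndex` class and its methods are hypothetical and not how the app mentioned above works.

```python
import hashlib


class HashCollisionError(Exception):
    """Two different source strings produced the same hash key."""


class TranslationIndex:
    """Hypothetical index of translations keyed by a hash of the source string."""

    def __init__(self, hash_fn=None):
        # Default key function is md5; any str -> key function can be passed in.
        self._hash = hash_fn or (lambda s: hashlib.md5(s.encode("utf-8")).hexdigest())
        self._entries = {}  # key -> (source string, translation)

    def add(self, source, translation):
        key = self._hash(source)
        existing = self._entries.get(key)
        if existing is not None and existing[0] != source:
            raise HashCollisionError(f"{source!r} collides with {existing[0]!r}")
        self._entries[key] = (source, translation)

    def lookup(self, source):
        entry = self._entries.get(self._hash(source))
        # Compare the stored source so a colliding string is never mistranslated.
        if entry is None or entry[0] != source:
            return None
        return entry[1]


if __name__ == "__main__":
    index = TranslationIndex()
    index.add("Hello", "Hallo")
    print(index.lookup("Hello"))
    print(index.lookup("Goodbye"))
```

The price is storing every source string once more, which is exactly the cost a hash-only index is trying to avoid.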

For me the winner is ripemd256: a very long hash, running relatively fast.

/Str.