
Re: svn commit: r1333326 - in /subversion/trunk/subversion: include/private/svn_hash_private.h libsvn_fs_fs/temp_serializer.c libsvn_subr/hash.c

From: Greg Stein <gstein_at_gmail.com>
Date: Wed, 23 May 2012 01:25:06 -0400

On Wed, May 23, 2012 at 12:44 AM, Branko Čibej <brane_at_apache.org> wrote:
>...
> I'd really like to see you explain why this change of yours (33 -> 33^4)
> is relevant in practice. It's not at all clear that this multiplier
> gives a better key distribution than the time-honoured 33.

Actually, there are some reasoned/studied arguments for 33 ("it works
well, but nobody knows why"). And 33^4 is likely a poor replacement
:-P

For PoCore's hash table[1], I did a survey of the research around
hashing functions. I selected the FNV-1 hash function:
  http://www.isthe.com/chongo/tech/comp/fnv/

Comparisons of functions are here:
  http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

The 33 variety is known as the "Bernstein hash".
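
For reference, here is a minimal sketch of the two functions being
compared: the "times 33" (Bernstein) hash and 32-bit FNV-1. The names
hash_times33 and hash_fnv1_32 are made up for illustration, and this is
not the APR, Subversion, or PoCore code. The seed is left as a parameter
because implementations differ (the widely circulated "djb2" form starts
from 5381; others start from 0). The FNV constants are the published
32-bit offset basis and prime.

```c
#include <stddef.h>
#include <stdint.h>

/* "Times 33" (Bernstein) hash: h = h * 33 + byte, for each byte.
 * The seed is a parameter because implementations disagree on it. */
uint32_t hash_times33(const void *key, size_t len, uint32_t seed)
{
    const unsigned char *p = key;
    uint32_t h = seed;
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 33u + p[i];   /* unsigned arithmetic wraps modulo 2^32 */

    return h;
}

/* 32-bit FNV-1: multiply by the FNV prime, then XOR in the byte.
 * (FNV-1a performs the same two steps in the opposite order.) */
uint32_t hash_fnv1_32(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;     /* FNV 32-bit offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h *= 16777619u;           /* FNV 32-bit prime */
        h ^= p[i];
    }

    return h;
}
```

Both loops do one multiply and one add/XOR per byte, so any difference
in practice comes from how well the resulting values spread across the
buckets, which is something to measure on real keys rather than assume.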

> It's my considered opinion that this fiddling around with hash function
> implementations is way overboard. Just use apr_hashfunc_default already.
> Unless you can prove that using your "optimized" version results in
> significant savings in space and/or time, anything else is just piling
> on more lines of code that need to be maintained for no good reason.

I'm assuming Stefan ran some tests, and (iirc) saw a few percent
increase. For that, maybe a new hash function is okay. (it isn't like
he built a whole new type; just a new func)

Cheers,
-g

[1] http://pocore.googlecode.com/svn/trunk/src/hash.c
Received on 2012-05-23 07:25:45 CEST

This is an archived mail posted to the Subversion Dev mailing list.