[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Fwd: hash in subversions vdelta.c

From: Branko Čibej <brane_at_xbc.nu>
Date: 2002-08-21 19:12:04 CEST

Mikkel Fahnøe Jørgensen wrote:

>Forward, I originally sent this comment to Brane who
>asked to forward to this list. I'm not on the list.
>The comment relates to a discussion in the forum late
>2000 I found searching on vdelta. (FYI: I just
>discovered zdelta http://cis.poly.edu/zdelta/ , but it
>is more complex than vdiff).
>Regards Mikkel
> --- Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
>skrev: > Dato: Sat, 17 Aug 2002 12:42:33 +0200 (CEST)
>>Fra: Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
>>Emne: hash in subversions vdelta.c
>>Til: brane@xbc.nu
>>Regarding subversion, vdelta source
>>I saw your discussion on hash functions in vdelta.c
>>(see snippet below)
>>I assume the h = h * 127 + *k++ is the faster
>>(it's also in the current source).
>>The C-compiler optimizes this hash to h = (h << 8) -
>>+ *k++,

Actually, this should be "h = (h << 7) - h + *k++". It would be faster
to multiply by 129, then it would become "h += (h << 7) + *k++".

>> whereas the 97 version probably needs a
Interestingly, the multiplication that converts to a shift-plus-add can
be faster even without optimization, because (presumably) executing the
instruction itself takes less time.

>> For a tight loop this may make a
>>difference. Since the hash loops 4 times, the
>>may even unroll the loop.
I fully expect the compiler to unroll this.

>> Depending on processor and
>>byte alignment issues, this may even reduce to
>>the 4 bytes into memory in one operation and
>>subtracting a constant from the result. This is
>>of magnitudes faster than looping 4 times doing
>>multiplications with 97 and adding a constant each
"Orders of magniture" may be right for the hash function itself, but
that's just a small part of the total calculation.

>>I have also attached my 32bit version of Bob Jenkins
>>hash function. It is slower but generates fewer
>>conflicts (not important here as the conflicts
>>mostly come from looking only at a prefix).
Interesting. I'll definitely look into this if hashing turns out to be a
bottleneck. For now, though I think clarity is more important.

>> It does not
>>require that hashtable to be a multiple of a prime
>>(you can just take modulo a power of two).
No good hash function requires a prime modulo. Most 2-universal
multiplicative hash functions (i.e., those with an odd multiplier --
which is what we're using now) definitely don't.


BTW, did you do any performance comparisons between the current code and
the mix hash? I'd be very interested in seeing some numbers. (Our
randm-test.c could probably be used for that.)

Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Aug 21 19:12:41 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.