[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Fwd: hash in subversions vdelta.c

From: Branko Čibej <brane_at_xbc.nu>
Date: 2002-08-21 19:12:04 CEST

Mikkel Fahnøe Jørgensen wrote:

>Forward, I originally sent this comment to Brane who
>asked to forward to this list. I'm not on the list.
>The comment relates to a discussion in the forum late
>2000 I found searching on vdelta. (FYI: I just
>discovered zdelta http://cis.poly.edu/zdelta/ , but it
>is more complex than vdiff).
>
>Regards Mikkel
>
> --- Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
>skrev: > Dato: Sat, 17 Aug 2002 12:42:33 +0200 (CEST)
>
>
>>Fra: Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
>>Emne: hash in subversions vdelta.c
>>Til: brane@xbc.nu
>>
>>Regarding subversion, vdelta source
>>
>>I saw your discussion on hash functions in vdelta.c
>>(see snippet below)
>>
>>I assume the h = h * 127 + *k++ is the faster
>>version
>>(it's also in the current source).
>>The C-compiler optimizes this hash to h = (h << 8) -
>>1
>>+ *k++,
>>

Actually, this should be "h = (h << 7) - h + *k++". It would be faster
to multiply by 129, then it would become "h += (h << 7) + *k++".

>> whereas the 97 version probably needs a
>>multiplication.
>>
Interestingly, the multiplication that converts to a shift-plus-add can
be faster even without optimization, because (presumably) executing the
instruction itself takes less time.

>> For a tight loop this may make a
>>difference. Since the hash loops 4 times, the
>>compiler
>>may even unroll the loop.
>>
I fully expect the compiler to unroll this.

>> Depending on processor and
>>byte alignment issues, this may even reduce to
>>loading
>>the 4 bytes into memory in one operation and
>>subtracting a constant from the result. This is
>>orders
>>of magnitudes faster than looping 4 times doing
>>multiplications with 97 and adding a constant each
>>time.
>>
"Orders of magniture" may be right for the hash function itself, but
that's just a small part of the total calculation.

>>I have also attached my 32bit version of Bob Jenkins
>>hash function. It is slower but generates fewer
>>conflicts (not important here as the conflicts
>>mostly come from looking only at a prefix).
>>
Interesting. I'll definitely look into this if hashing turns out to be a
bottleneck. For now, though I think clarity is more important.

>> It does not
>>require that hashtable to be a multiple of a prime
>>(you can just take modulo a power of two).
>>
No good hash function requires a prime modulo. Most 2-universal
multiplicative hash functions (i.e., those with an odd multiplier --
which is what we're using now) definitely don't.

[snip]

BTW, did you do any performance comparisons between the current code and
the mix hash? I'd be very interested in seeing some numbers. (Our
randm-test.c could probably be used for that.)

-- 
Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Aug 21 19:12:41 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.