Mikkel FahnÃ¸e JÃ¸rgensen wrote:

*>Forward, I originally sent this comment to Brane who
*

*>asked to forward to this list. I'm not on the list.
*

*>The comment relates to a discussion in the forum late
*

*>2000 I found searching on vdelta. (FYI: I just
*

*>discovered zdelta http://cis.poly.edu/zdelta/ , but it
*

*>is more complex than vdiff).
*

*>
*

*>Regards Mikkel
*

*>
*

*> --- Mikkel FahnÃ¸e JÃ¸rgensen <mikkel_f_j@yahoo.dk>
*

*>skrev: > Dato: Sat, 17 Aug 2002 12:42:33 +0200 (CEST)
*

*>
*

*>
*

*>>Fra: Mikkel FahnÃ¸e JÃ¸rgensen <mikkel_f_j@yahoo.dk>
*

*>>Emne: hash in subversions vdelta.c
*

*>>Til: brane@xbc.nu
*

*>>
*

*>>Regarding subversion, vdelta source
*

*>>
*

*>>I saw your discussion on hash functions in vdelta.c
*

*>>(see snippet below)
*

*>>
*

*>>I assume the h = h * 127 + *k++ is the faster
*

*>>version
*

*>>(it's also in the current source).
*

*>>The C-compiler optimizes this hash to h = (h << 8) -
*

*>>1
*

*>>+ *k++,
*

*>>
*

Actually, this should be "h = (h << 7) - h + *k++". It would be faster

to multiply by 129, then it would become "h += (h << 7) + *k++".

*>> whereas the 97 version probably needs a
*

*>>multiplication.
*

*>>
*

Interestingly, the multiplication that converts to a shift-plus-add can

be faster even without optimization, because (presumably) executing the

instruction itself takes less time.

*>> For a tight loop this may make a
*

*>>difference. Since the hash loops 4 times, the
*

*>>compiler
*

*>>may even unroll the loop.
*

*>>
*

I fully expect the compiler to unroll this.

*>> Depending on processor and
*

*>>byte alignment issues, this may even reduce to
*

*>>loading
*

*>>the 4 bytes into memory in one operation and
*

*>>subtracting a constant from the result. This is
*

*>>orders
*

*>>of magnitudes faster than looping 4 times doing
*

*>>multiplications with 97 and adding a constant each
*

*>>time.
*

*>>
*

"Orders of magniture" may be right for the hash function itself, but

that's just a small part of the total calculation.

*>>I have also attached my 32bit version of Bob Jenkins
*

*>>hash function. It is slower but generates fewer
*

*>>conflicts (not important here as the conflicts
*

*>>mostly come from looking only at a prefix).
*

*>>
*

Interesting. I'll definitely look into this if hashing turns out to be a

bottleneck. For now, though I think clarity is more important.

*>> It does not
*

*>>require that hashtable to be a multiple of a prime
*

*>>(you can just take modulo a power of two).
*

*>>
*

No good hash function requires a prime modulo. Most 2-universal

multiplicative hash functions (i.e., those with an odd multiplier --

which is what we're using now) definitely don't.

[snip]

BTW, did you do any performance comparisons between the current code and

the mix hash? I'd be very interested in seeing some numbers. (Our

randm-test.c could probably be used for that.)

--
Brane ÄŒibej <brane_at_xbc.nu> http://www.xbc.nu/brane/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Wed Aug 21 19:12:41 2002