Forward, I originally sent this comment to Brane who

asked to forward to this list. I'm not on the list.

The comment relates to a discussion in the forum late

2000 I found searching on vdelta. (FYI: I just

discovered zdelta http://cis.poly.edu/zdelta/ , but it

is more complex than vdiff).

Regards Mikkel

--- Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>

skrev: > Dato: Sat, 17 Aug 2002 12:42:33 +0200 (CEST)

*> Fra: Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
*

*> Emne: hash in subversions vdelta.c
*

*> Til: brane@xbc.nu
*

*>
*

*> Regarding subversion, vdelta source
*

*>
*

*> I saw your discussion on hash functions in vdelta.c
*

*> (see snippet below)
*

*>
*

*> I assume the h = h * 127 + *k++ is the faster
*

*> version
*

*> (it's also in the current source).
*

*> The C-compiler optimizes this hash to h = (h << 8) -
*

*> 1
*

*> + *k++, whereas the 97 version probably needs a
*

*> multiplication. For a tight loop this may make a
*

*> difference. Since the hash loops 4 times, the
*

*> compiler
*

*> may even unroll the loop. Depending on processor and
*

*> byte alignment issues, this may even reduce to
*

*> loading
*

*> the 4 bytes into memory in one operation and
*

*> subtracting a constant from the result. This is
*

*> orders
*

*> of magnitudes faster than looping 4 times doing
*

*> multiplications with 97 and adding a constant each
*

*> time.
*

*>
*

*> I have also attached my 32bit version of Bob Jenkins
*

*> hash function. It is slower but generates fewer
*

*> conflicts (not important here as the conflicts
*

*> mostly
*

*> come from looking only at a prefix). It does not
*

*> require that hashtable to be a multiple of a prime
*

*> (you can just take modulo a power of two).
*

*>
*

*> Regards Mikkel
*

*>
*

*> ---
*

*> [brane@lux tests]$ perl stat.pl < stats-1 # h
*

*> =
*

*> h * 127 + *k++
*

*> Average load: 51.21
*

*> Average collisions: 59728.10
*

*> Corrected collisions: 1158.74
*

*> [brane@lux tests]$ perl stat.pl < stats-2 # h
*

*> =
*

*> h * 97 + *k++ + 41
*

*> Average load: 51.22
*

*> Average collisions: 59716.55
*

*> Corrected collisions: 1157.68
*

*>
*

*> Now if someone will volunteer to add some /real/
*

*> analysis code
*

*> (average chain length, deviation, etc., etc.), I'd
*

*> be
*

*> interested
*

*> to see the results. :-)
*

*> ---
*

*>
*

*>
*

*> /*
*

*> Based on Bob Jenkins lookup2.c file, hash2
*

*> function,
*

*> modified to specifically handle 32bit keys
*

*> i.e. hash2 with length=1, initval=0
*

*> note that according to Jenkins, the hashtable
*

*> should be a power
*

*> of two with excessive bits masked. That is, no
*

*> prime number required.
*

*>
*

*> http://burtleburtle.net/bob/c/lookup2.c
*

*> http://burtleburtle.net/bob/hash/hashfaq.html
*

*> */
*

*> typedef unsigned long ub4; /* unsigned 4 byte value
*

*> (32 bit) */
*

*> inline ub4 HashId(ub4 k)
*

*> {
*

*> ub4 a,b,c;
*

*>
*

*> /* Set up the internal state */
*

*>
*

*> a = b = 0x9e3779b9; /* the golden ratio; an
*

*> arbitrary value */
*

*> c = 1;
*

*> a += k;
*

*>
*

*> /* This is Bob Jenkins mix(a,b,c) macro embedded
*

*> */
*

*> a -= b; a -= c; a ^= (c>>13);
*

*> b -= c; b -= a; b ^= (a<<8);
*

*> c -= a; c -= b; c ^= (b>>13);
*

*> a -= b; a -= c; a ^= (c>>12);
*

*> b -= c; b -= a; b ^= (a<<16);
*

*> c -= a; c -= b; c ^= (b>>5);
*

*> a -= b; a -= c; a ^= (c>>3);
*

*> b -= c; b -= a; b ^= (a<<10);
*

*> c -= a; c -= b; c ^= (b>>15);
*

*>
*

*> /*--------------------------------------------
*

*> report the result */
*

*> return c;
*

*> }
*

*>
*

*>
*

*>
*

*> Få den nye Yahoo! Messenger på
*

*> www.yahoo.dk/messenger
*

*> Nu med webkamera, talechat, interaktive baggrunde og
*

*> meget mere!
*

*>
*

Få den nye Yahoo! Messenger på www.yahoo.dk/messenger

Nu med webkamera, talechat, interaktive baggrunde og meget mere!

---------------------------------------------------------------------

To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org

For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Sun Aug 18 22:00:05 2002