[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Fwd: hash in subversions vdelta.c

From: Mikkel Fahnøe Jørgensen <mikkel_f_j_at_yahoo.dk>
Date: 2002-08-17 15:29:15 CEST

Forward, I originally sent this comment to Brane who
asked to forward to this list. I'm not on the list.
The comment relates to a discussion in the forum late
2000 I found searching on vdelta. (FYI: I just
discovered zdelta http://cis.poly.edu/zdelta/ , but it
is more complex than vdiff).

Regards Mikkel

 --- Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
skrev: > Dato: Sat, 17 Aug 2002 12:42:33 +0200 (CEST)
> Fra: Mikkel Fahnøe Jørgensen <mikkel_f_j@yahoo.dk>
> Emne: hash in subversions vdelta.c
> Til: brane@xbc.nu
>
> Regarding subversion, vdelta source
>
> I saw your discussion on hash functions in vdelta.c
> (see snippet below)
>
> I assume the h = h * 127 + *k++ is the faster
> version
> (it's also in the current source).
> The C-compiler optimizes this hash to h = (h << 8) -
> 1
> + *k++, whereas the 97 version probably needs a
> multiplication. For a tight loop this may make a
> difference. Since the hash loops 4 times, the
> compiler
> may even unroll the loop. Depending on processor and
> byte alignment issues, this may even reduce to
> loading
> the 4 bytes into memory in one operation and
> subtracting a constant from the result. This is
> orders
> of magnitudes faster than looping 4 times doing
> multiplications with 97 and adding a constant each
> time.
>
> I have also attached my 32bit version of Bob Jenkins
> hash function. It is slower but generates fewer
> conflicts (not important here as the conflicts
> mostly
> come from looking only at a prefix). It does not
> require that hashtable to be a multiple of a prime
> (you can just take modulo a power of two).
>
> Regards Mikkel
>
> ---
> [brane@lux tests]$ perl stat.pl < stats-1 # h
> =
> h * 127 + *k++
> Average load: 51.21
> Average collisions: 59728.10
> Corrected collisions: 1158.74
> [brane@lux tests]$ perl stat.pl < stats-2 # h
> =
> h * 97 + *k++ + 41
> Average load: 51.22
> Average collisions: 59716.55
> Corrected collisions: 1157.68
>
> Now if someone will volunteer to add some /real/
> analysis code
> (average chain length, deviation, etc., etc.), I'd
> be
> interested
> to see the results. :-)
> ---
>
>
> /*
> Based on Bob Jenkins lookup2.c file, hash2
> function,
> modified to specifically handle 32bit keys
> i.e. hash2 with length=1, initval=0
> note that according to Jenkins, the hashtable
> should be a power
> of two with excessive bits masked. That is, no
> prime number required.
>
> http://burtleburtle.net/bob/c/lookup2.c
> http://burtleburtle.net/bob/hash/hashfaq.html
> */
> typedef unsigned long ub4; /* unsigned 4 byte value
> (32 bit) */
> inline ub4 HashId(ub4 k)
> {
> ub4 a,b,c;
>
> /* Set up the internal state */
>
> a = b = 0x9e3779b9; /* the golden ratio; an
> arbitrary value */
> c = 1;
> a += k;
>
> /* This is Bob Jenkins mix(a,b,c) macro embedded
> */
> a -= b; a -= c; a ^= (c>>13);
> b -= c; b -= a; b ^= (a<<8);
> c -= a; c -= b; c ^= (b>>13);
> a -= b; a -= c; a ^= (c>>12);
> b -= c; b -= a; b ^= (a<<16);
> c -= a; c -= b; c ^= (b>>5);
> a -= b; a -= c; a ^= (c>>3);
> b -= c; b -= a; b ^= (a<<10);
> c -= a; c -= b; c ^= (b>>15);
>
> /*--------------------------------------------
> report the result */
> return c;
> }
>
>
>
> Få den nye Yahoo! Messenger på
> www.yahoo.dk/messenger
> Nu med webkamera, talechat, interaktive baggrunde og
> meget mere!
>

Få den nye Yahoo! Messenger på www.yahoo.dk/messenger
Nu med webkamera, talechat, interaktive baggrunde og meget mere!

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Aug 18 22:00:05 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.