[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Fwd: hash in subversions vdelta.c

From: Mikkel Fahn°e J°rgensen <mikkel_f_j_at_yahoo.dk>
Date: 2002-08-21 20:48:05 CEST

  --- Branko ─Œibej <brane@xbc.nu> skrev: > Mikkel
Fahn├Şe J├Şrgensen wrote:

> >>The C-compiler optimizes this hash to h = (h << 8)
> -
> >>1
> >>+ *k++,
> >>
> Actually, this should be "h = (h << 7) - h + *k++".
> It would be faster
> to multiply by 129, then it would become "h += (h <<
> 7) + *k++".

Yes, I realized this shortly after posting the mail
but recognized you'd figure it out. It also has the
benefit of not overflowing the result during shift the
last shift. It does invalidate my argument that it can
trivially load 4 bytes at once.

> >> whereas the 97 version probably needs a
> >>multiplication.
> >>
> Interestingly, the multiplication that converts to a
> shift-plus-add can
> be faster even without optimization, because
> (presumably) executing the
> instruction itself takes less time.

That was the idea.

> "Orders of magniture" may be right for the hash
> function itself, but
> that's just a small part of the total calculation.

But apparently a significant part. The speed
significance of changing hash function is interesting.

> Interesting. I'll definitely look into this if
> hashing turns out to be a
> bottleneck. For now, though I think clarity is more
> important.


> >> It does not
> >>require that hashtable to be a multiple of a prime
> >>(you can just take modulo a power of two).
> >>
> No good hash function requires a prime modulo. Most
> 2-universal
> multiplicative hash functions (i.e., those with an
> odd multiplier --
> which is what we're using now) definitely don't.

This is a different kind of hash. I do not claim to be
an expert on this topic, I only cite what Bob Jenkins
claims. He designed the hash to not require taking
modulo subsequently. He did not argue that it wouldn't
be better taking modulo though - but it was a good
tradeoff that would provide a fast convenient and
efficient hash. I suspect the prime nature is somehow
coded into the hash - he did a brute force search
through a large space to find the hash function
meeting his criteria. You should also weigh the cost
of modulo prime against the cost of slightly increased
level of conflicts.

BTW the version I posted was optimized for 32bit
words. But since the input of VDelta is not aligned,
the byte aligned variable length hash may be a better
starting point. (See Jenkins homepage

> BTW, did you do any performance comparisons between
> the current code and
> the mix hash? I'd be very interested in seeing some
> numbers. (Our
> randm-test.c could probably be used for that.)

No. A long time ago I replaced a hash not unlike yours
with the hash in the Compiler Dragon book for use in a
symbol table. It dramatically improved performance due
to reduced conflicts. Yet the classic Dragon book hash
is not considered to be very efficient at avoiding
conflicts. This is for variable length strings and
your case is for 32bit values, so it cannot be
compared directly. It is cheap to test conflicts for

If you are doing a modulo prime on the hash result,
the mix hash could well turn out to be faster because
it only requires an And operation to clamp the hash
key into range. Otherwise it clearly has more
instructions than unrolling 4 shift/add. My best guess
is that the mix hash will at the latest perform better
when conflicts become significant. However, depending
on the avarage shared run in the data, the fact that
only 4 chars are being hashed may generate the bulk of
conflicts, independently of the hash function in use.

If VDelta is using a 64K window (or whatever) you get
an efficient upper limit on the hashtable which
affects the choice of indexing method. In particular
you can oversize a hash table to reduce conflicts and
you do not need to grow the hash nor risc conflicts
with virtual memory.


Fň den nye Yahoo! Messenger pň www.yahoo.dk/messenger
Nu med webkamera, talechat, interaktive baggrunde og meget mere!

To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Aug 21 20:48:37 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.