--- Branko ÄŒibej <brane@xbc.nu> skrev: > Mikkel

FahnÃ¸e JÃ¸rgensen wrote:

*> >>The C-compiler optimizes this hash to h = (h << 8)
*

*> -
*

*> >>1
*

*> >>+ *k++,
*

*> >>
*

*>
*

*> Actually, this should be "h = (h << 7) - h + *k++".
*

*> It would be faster
*

*> to multiply by 129, then it would become "h += (h <<
*

*> 7) + *k++".
*

Yes, I realized this shortly after posting the mail

but recognized you'd figure it out. It also has the

benefit of not overflowing the result during shift the

last shift. It does invalidate my argument that it can

trivially load 4 bytes at once.

*>
*

*> >> whereas the 97 version probably needs a
*

*> >>multiplication.
*

*> >>
*

*> Interestingly, the multiplication that converts to a
*

*> shift-plus-add can
*

*> be faster even without optimization, because
*

*> (presumably) executing the
*

*> instruction itself takes less time.
*

That was the idea.

*> "Orders of magniture" may be right for the hash
*

*> function itself, but
*

*> that's just a small part of the total calculation.
*

But apparently a significant part. The speed

significance of changing hash function is interesting.

*> Interesting. I'll definitely look into this if
*

*> hashing turns out to be a
*

*> bottleneck. For now, though I think clarity is more
*

*> important.
*

Certainly.

*> >> It does not
*

*> >>require that hashtable to be a multiple of a prime
*

*> >>(you can just take modulo a power of two).
*

*> >>
*

*> No good hash function requires a prime modulo. Most
*

*> 2-universal
*

*> multiplicative hash functions (i.e., those with an
*

*> odd multiplier --
*

*> which is what we're using now) definitely don't.
*

This is a different kind of hash. I do not claim to be

an expert on this topic, I only cite what Bob Jenkins

claims. He designed the hash to not require taking

modulo subsequently. He did not argue that it wouldn't

be better taking modulo though - but it was a good

tradeoff that would provide a fast convenient and

efficient hash. I suspect the prime nature is somehow

coded into the hash - he did a brute force search

through a large space to find the hash function

meeting his criteria. You should also weigh the cost

of modulo prime against the cost of slightly increased

level of conflicts.

BTW the version I posted was optimized for 32bit

words. But since the input of VDelta is not aligned,

the byte aligned variable length hash may be a better

starting point. (See Jenkins homepage

http://burtleburtle.net/bob/hash/doobs.html).

*> BTW, did you do any performance comparisons between
*

*> the current code and
*

*> the mix hash? I'd be very interested in seeing some
*

*> numbers. (Our
*

*> randm-test.c could probably be used for that.)
*

No. A long time ago I replaced a hash not unlike yours

with the hash in the Compiler Dragon book for use in a

symbol table. It dramatically improved performance due

to reduced conflicts. Yet the classic Dragon book hash

is not considered to be very efficient at avoiding

conflicts. This is for variable length strings and

your case is for 32bit values, so it cannot be

compared directly. It is cheap to test conflicts for

example.

If you are doing a modulo prime on the hash result,

the mix hash could well turn out to be faster because

it only requires an And operation to clamp the hash

key into range. Otherwise it clearly has more

instructions than unrolling 4 shift/add. My best guess

is that the mix hash will at the latest perform better

when conflicts become significant. However, depending

on the avarage shared run in the data, the fact that

only 4 chars are being hashed may generate the bulk of

conflicts, independently of the hash function in use.

If VDelta is using a 64K window (or whatever) you get

an efficient upper limit on the hashtable which

affects the choice of indexing method. In particular

you can oversize a hash table to reduce conflicts and

you do not need to grow the hash nor risc conflicts

with virtual memory.

Mikkel

Få den nye Yahoo! Messenger på www.yahoo.dk/messenger

Nu med webkamera, talechat, interaktive baggrunde og meget mere!

---------------------------------------------------------------------

To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org

For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Wed Aug 21 20:48:37 2002