[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Further diff optimization ideas

From: Branko ─îibej <brane_at_e-reka.si>
Date: Tue, 15 Feb 2011 09:55:28 +0100

On 15.02.2011 01:42, Johan Corveleyn wrote:
> 2) Faster hash function for hashing lines. [ Quick win ]
>
> - Currently, we use adler32 (after the line has been normalized).
>
> - GNU diff uses some simple bit shifting scheme, which seems to be a
> lot faster (maybe it has more hash-collisions, but that doesn't seem
> to matter that much).

Is it (x << 5) + x? If so, that's the (in)famous "times 33" hash
function, also see the essay in apr_hashfunc_default in
http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?view=markup

I prefer APR's implementation, because a reasonably good compiler these
days /should/ be able to optimize (x * 33) to ((x << 5) + x), depending
on the characteristics of the target architecture.

In other news, why do we need our own hash function anyway? Or our own
table or tree or whatever? NIH syndrome?

-- Brane
Received on 2011-02-15 09:56:12 CET

This is an archived mail posted to the Subversion Dev mailing list.