Re: Further diff optimization ideas

From: Branko ÄŒibej <brane_at_e-reka.si>
Date: Tue, 15 Feb 2011 09:55:28 +0100

On 15.02.2011 01:42, Johan Corveleyn wrote:
> 2) Faster hash function for hashing lines. [ Quick win ]
>
> - Currently, we use adler32 (after the line has been normalized).
>
> - GNU diff uses some simple bit shifting scheme, which seems to be a
> lot faster (maybe it has more hash-collisions, but that doesn't seem
> to matter that much).

Is it (x << 5) + x? If so, that's the (in)famous "times 33" hash
function, also see the essay in apr_hashfunc_default in
http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?view=markup

I prefer APR's implementation, because a reasonably good compiler these
days /should/ be able to optimize (x * 33) to ((x << 5) + x), depending
on the characteristics of the target architecture.

In other news, why do we need our own hash function anyway? Or our own
table or tree or whatever? NIH syndrome?

-- Brane
Received on 2011-02-15 09:56:12 CET

This message: [ Message body ]
Next message: Daniel Becroft: "Re: Patches pending review"
Previous message: Noorul Islam K M: "[PATCH] Fix for issue 3799 - exporting file should not overwrite"
In reply to: Johan Corveleyn: "Further diff optimization ideas"
Next in thread: Johan Corveleyn: "Re: Further diff optimization ideas"
Reply: Johan Corveleyn: "Re: Further diff optimization ideas"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]