[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Further improve xdelta

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2006-03-27 22:07:17 CEST

Justin Erenkrantz writes:
> On 3/23/06, Peter N. Lundblad <peter@famlundblad.se> wrote:
> > Nope, but that's not in APR 0.9 anyway AFAICT.
>
> I would really like to avoid adding a custom data structure if we
> don't have to do so. If it means that only users with APR 1.0+ see
> the performance benefit, so be it as it means that the code stays much
> simpler. -- justin
>

OK, folks. I've been fooling around with this some more the last days.
I tried Justin's suggestion of a custom hash function and it gave less
than half the speed improvement.

Then, I discovered an interesting fact about the adler32 checksum
algorithm (*). In my initial tests, I used big .wav files, which
turns out to be pretty "random". I decided to try some more realistic
data and used uncompressed .tar files of source code (svn 1.2.0, 1.2.3
and 1.3.0). This shows a dramatic slowdown compared with the original
code.

The problem is that the adler32 checksum doesn't make use of all bits
for small block sizes. Some tests I did indicate that the 11th to
16th bit of the checksum are not well distributed and since my first
hash function typically used the 12 low-order bits, this didn't work
out very well.

I've cured this using the following formula:
h = sum ^ (sum >> 12)
to include some higher-order bits in the mess.

The end result is that my patch gives about 15% speed improvements for
some differnt kinds of test data and Justin's suggestion gives us 5%.
(To be fair, I changed my implementation of Justin's suggestion to use
the same hash function that I changed to, but that lead to an
improvement of less than 1%.)

I think the small amount of increased complexity is justified by the
performance win, so I'm going to commit this unless someone
objects.

Thanks,
//Peter

*) I don't claim to be the first who discover this fact. See RFC 3309
 for an explanation.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Mar 27 22:08:08 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.