Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

From: Niels Werensteijn <n.werensteijn_at_student.utwente.nl>
Date: Tue, 22 Jan 2008 16:24:13 +0100

David Glasser wrote:
> 2008/1/22 Niels Werensteijn <n.werensteijn_at_student.utwente.nl>:
>> Summary so far:
>>
>> New files, or new chunks of files, have no source stream (chunk) to
>> compare them to in order to make a diff. At the moment, the vdelta
>> routine is used on these streams. This takes a lot of CPU power. Even
>> worse, since the diffs are compressed with zlib later on (both on disk
>> and during transmission), vdelta is actually making it harder for zlib
>> to compress the stream, resulting in larger streams.
>>
>> This patch:
>> In this patch I replace the call to vdelta with a single call to
>> svn_txdelta__insert_op and just insert the whole chunk as new data. This
>> costs almost no cpu time, and lets zlib compress it better.
>>
>> My test results on two repositories confirmed that the on-disk size
>> shrank a little, and that CPU time was greatly reduced.
>>
>> My timing results:
>> Export Repository1 (delphi source code): from 58.6 to 42.0 seconds
>> Export Repository2 (big compressed bin): from 10.71 to 5.47 seconds
>
> This seems like a reasonable idea, but note that we don't always send
> txdeltas compressed. Specifically, only the serialization of txdeltas
> called "svndiff1" is compressed, not "svndiff0". Some examples of
> when each can be used:
>
> * In FSFS repositories, svndiff1 is only used in repositories of DB
> format 2 and above.
>
> * In the svnserve protocol, svndiff1 is only used if the client and
> server both support it (and declare so in their header).

True, although there has been some speculation as to how many
clients/servers still use svndiff0. Also note that this patch is still
a performance gain in CPU time. But if the stream is not compressed,
we do indeed lose some size efficiency.

> I'm not sure what controls the use of svndiff1 in the DAV protocol and
> BDB repositories off the top of my head.
>
> My guess (not based on actual experimentation) is that your patch
> would be a performance regression for cases that svndiff0 is
> transmitted. If that's the case (can you test it? One way is to
> create an FSFS repository with --pre-1.4.x-compatible and compare
> sizes)

OK, I did this.

Sizes for repository 1 (using du -s, on ext3):

"protocol"   without patch   with patch
svndiff0             49720        79384
svndiff1             31920        30692

Sizes for repository 2:

"protocol"   without patch   with patch
svndiff0             37416        37520
svndiff1             37416        37400

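Conclusion: Yes, it makes a big difference on normal text
repositories: with svndiff0, repository 1 grows from 49720 to 79384
(about 60% larger) with the patch, while with svndiff1 it shrinks
slightly. On repositories with already-compressed data it makes no
real difference either way.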
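
For anyone who wants to check the arithmetic behind that conclusion, here is a small Python sketch that computes the relative size change for each row of the two tables. The numbers are copied from the tables; the names SIZES, percent_change and summarize are made up for this illustration and are not part of Subversion.

```python
# Sizes copied from the two tables above: (without patch, with patch),
# in whatever units `du -s` printed.
SIZES = {
    ("repository 1", "svndiff0"): (49720, 79384),
    ("repository 1", "svndiff1"): (31920, 30692),
    ("repository 2", "svndiff0"): (37416, 37520),
    ("repository 2", "svndiff1"): (37416, 37400),
}


def percent_change(before: int, after: int) -> float:
    """Relative size change in percent; positive means the patch made it bigger."""
    return (after - before) / before * 100.0


def summarize(sizes):
    """Return {(repository, format): percent change} for every table row."""
    return {key: percent_change(*pair) for key, pair in sizes.items()}


if __name__ == "__main__":
    for (repo, fmt), change in summarize(SIZES).items():
        print(f"{repo}, {fmt}: {change:+.2f}%")
```

A positive value means the repository got bigger with the patch, a negative value means it got smaller.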

BTW: Is there some sort of official suite of benchmark repositories? I
feel a little stupid drawing conclusions on the basis of just 2
repositories, although they do confirm the theory and logical thinking :)

>, we'd probably want to rev a bunch of functions to add a
> boolean flag "output_will_be_compressed" or something which gets
> passed all the way down to your one-line change.
Well, my admittedly simple test does seem to support that, given that
there are still a significant number of clients/servers using svndiff0.
(I don't know the policies regarding this.)
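
To make the idea concrete, here is a rough Python sketch of how such a flag could choose between the two strategies. It is only an illustration, not the actual Subversion code (which is C): the op model, delta_without_source, single_insert and stub_self_delta are all hypothetical names, and the stub merely stands in for vdelta. Only the flag name output_will_be_compressed comes from the suggestion quoted above.

```python
from typing import Callable, List, Tuple

# Hypothetical op model: (kind, offset, length).
# "new" inserts literal data, "copy" copies bytes already produced.
Op = Tuple[str, int, int]


def single_insert(chunk: bytes) -> List[Op]:
    """Emit the whole chunk as one 'new data' op, as the patch does."""
    return [("new", 0, len(chunk))] if chunk else []


def stub_self_delta(chunk: bytes) -> List[Op]:
    """Stand-in for vdelta (NOT the real algorithm): it only detects a
    chunk whose second half repeats the first half exactly."""
    half = len(chunk) // 2
    if half and len(chunk) == 2 * half and chunk[:half] == chunk[half:]:
        return [("new", 0, half), ("copy", 0, half)]
    return single_insert(chunk)


def delta_without_source(
    chunk: bytes,
    output_will_be_compressed: bool,
    self_delta: Callable[[bytes], List[Op]] = stub_self_delta,
) -> List[Op]:
    """Pick a strategy for a chunk that has no source to diff against."""
    if output_will_be_compressed:
        # svndiff1 case: let zlib find the redundancy later.
        return single_insert(chunk)
    # svndiff0 case: nothing compresses the output, so self-delta here.
    return self_delta(chunk)


if __name__ == "__main__":
    print(delta_without_source(b"abcabc", output_will_be_compressed=True))
    print(delta_without_source(b"abcabc", output_will_be_compressed=False))
```

The point of passing the flag all the way down is that the svndiff1 path keeps the CPU savings of the patch, while the svndiff0 path keeps the smaller output of the current code.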

Regards,
Niels

Received on 2008-01-22 16:25:03 CET

This is an archived mail posted to the Subversion Dev mailing list.