
Re: Best practise for long term repository size management

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2003-05-14 06:53:51 CEST

On Tuesday, May 13, 2003, at 10:06 PM, Daniel Patterson wrote:

> On Wed, 2003-05-14 at 11:54, cmpilato@collab.net wrote:
>> Switch to RTF instead of native Word docs? At least you have a
>> fighting chance of worthwhile deltification. :-)
>
> *sigh* that's kind of what I figured. Has anyone investigated using
> xdelta/xdelta2 to do binary diffs (although I'm not sure that it'd help
> for most binary formats)....

It won't really help for any format.
Our encoding is the output of a vdelta algorithm, written into what is
basically a subset of VCDIFF.
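
To make "delta output as copy/insert instructions" concrete, here is a
minimal sketch in Python. It is not vdelta, xdelta, or the svndiff code;
it is a simple greedy matcher, and the names make_delta, apply_delta and
MIN_MATCH are made up for illustration. It only shows the general shape
these encoders share: runs found in the source become "copy" instructions
and everything else becomes literal "insert" data.

```python
from typing import Dict, List, Tuple, Union

# An instruction is either ("copy", offset, length), which refers to bytes
# in the source, or ("insert", data), which carries literal bytes.
Instruction = Union[Tuple[str, int, int], Tuple[str, bytes]]

MIN_MATCH = 4  # hypothetical threshold: shorter matches are not worth a copy


def make_delta(source: bytes, target: bytes) -> List[Instruction]:
    """Greedy copy/insert delta (illustrative only, not vdelta or xdelta)."""
    # Index the first position of every MIN_MATCH-byte block in the source.
    index: Dict[bytes, int] = {}
    for i in range(len(source) - MIN_MATCH + 1):
        index.setdefault(source[i:i + MIN_MATCH], i)

    delta: List[Instruction] = []
    literal = bytearray()
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + MIN_MATCH])
        if start is None:
            # No match in the source: this byte is new data.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the match as far as source and target keep agreeing.
        length = MIN_MATCH
        while (start + length < len(source)
               and pos + length < len(target)
               and source[start + length] == target[pos + length]):
            length += 1
        if literal:
            delta.append(("insert", bytes(literal)))
            literal.clear()
        delta.append(("copy", start, length))
        pos += length
    if literal:
        delta.append(("insert", bytes(literal)))
    return delta


def apply_delta(source: bytes, delta: List[Instruction]) -> bytes:
    """Rebuild the target from the source plus the instruction list."""
    out = bytearray()
    for ins in delta:
        if ins[0] == "copy":
            _, offset, length = ins
            out += source[offset:offset + length]
        else:
            out += ins[1]
    return bytes(out)
```

The choice of matching algorithm changes which copies get found; how the
instructions are then written out as bytes is a separate question, and
that is where the encoding format comes in.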

xdelta3 uses an xdelta-style algorithm, which will do better (though I
can't imagine by more than maybe 10%) but will be slower. It also
outputs a real VCDIFF-based encoding, which will be a bit more compact.

There is an issue tracking svndiff version 1, which I wrote. It made up
just about all of the difference by doing VCDIFF-style address encoding
and range-encoding compression of the string data (it's been so long
that I might not remember the exact details of what we are range
encoding anymore). The remaining VCDIFF encoding pieces aren't worth
the cost for our purposes, and they would complicate our code
incredibly.
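
As a rough illustration of why VCDIFF-style address encoding saves
space, here is a small Python sketch. It follows the variable-length
integer format and the first two address modes (absolute, and relative
to the current position) as described in the VCDIFF specification, RFC
3284. It is not taken from the svndiff code, it leaves out the VCDIFF
address caches, and the function names are made up for this example.

```python
from typing import Tuple

VCD_SELF = 0  # address is stored as an absolute offset
VCD_HERE = 1  # address is stored as a distance back from the current position


def encode_varint(n: int) -> bytes:
    """Base-128 integer, most significant group first; the high bit is set
    on every byte except the last."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def encode_address(addr: int, here: int) -> Tuple[int, bytes]:
    """Pick whichever of the two modes yields the shorter encoding."""
    if not 0 <= addr < here:
        raise ValueError("a copy address must lie before the current position")
    absolute = encode_varint(addr)
    relative = encode_varint(here - addr)
    if len(relative) < len(absolute):
        return VCD_HERE, relative
    return VCD_SELF, absolute


# A copy from just behind the current position costs one byte in relative
# mode, where the absolute offset would need three.
mode, encoded = encode_address(1_000_000, here=1_000_100)
print(mode, len(encoded), len(encode_varint(1_000_000)))
```

Copies tend to refer to data near the current position, so the relative
form is often the shorter one, and that adds up over a large delta.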

Algorithmically, you aren't likely to get more than 10% smaller diffs
using the xdelta algorithm. I remember running all kinds of tests
against it and vdelta when working on svndiff 1.

Received on Wed May 14 06:54:42 2003

This is an archived mail posted to the Subversion Dev mailing list.
