
Re: Size of revs file when deleting lines in a big text file

From: Martin Scharrer <mailinglists_at_madmarty.de>
Date: 2006-12-12 01:38:17 CET

Hi Malcolm,

thanks for the detailed explanation. It makes most things much clearer
for me.

On Monday 11 December 2006 21:33, Malcolm Rowe wrote:
> You can see this if you remove the calls to the other_revs function
> (there's no change in the output) or if you initially commit your
> 'short' file and then continue 'complete', 'short', ... (you'll see then
> that the _added_ files now have larger revisions: everything is now
> delta'd against the short file).
Yes, I checked that, and that is what happens. I also made a script with
three slightly different files, which resulted in different revs sizes
depending on the

> In order that it can deal with arbitrarily large files, Subversion's
> delta algorithm deals with each source file as a series of windows, each
> of 100k. The delta algorithm reads 100k of source data and 100k of
> target data and constructs the delta.
>
> [...]
>
> You have a 'short' file that's 5451960 bytes, or 54 100k windows. The
> first 27 windows will be simple copies (about 500 bytes in total), and
> the second 27 will be dominated by the 9105 bytes that were brought
> forward from the next source window. 27 x 9105 is 245835 bytes,
> slightly below what I see for each revision's size in an svndiff0
> repository.
>
> How can we improve this? The best way would be to increase the delta
> window size significantly, though we have some backward-compatibility
> issues (and probably memory constraints) to work around in order to do
> that. Even so, unless we were able to change the delta logic so that we
> use variable-sized windows and resynchronise where possible, we're always
> going to take a hit on this kind of change with a window-oriented delta
> algorithm.
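
To make the arithmetic in the quoted explanation concrete, here is a rough
Python sketch. It assumes that "100k" means 102400 bytes (which matches the
54 windows quoted above) and that a single contiguous chunk was deleted. The
names window_count and carried_bytes_per_window are hypothetical helpers; the
model only counts target bytes that lie outside the same-numbered source
window and does not run Subversion's actual delta code.

```python
WINDOW_SIZE = 100 * 1024  # assumption: "100k" = 102400 bytes


def window_count(file_size, window_size=WINDOW_SIZE):
    """Number of fixed-size windows needed to cover file_size bytes."""
    return -(-file_size // window_size)  # ceiling division


def carried_bytes_per_window(target_size, delete_offset, deleted,
                             window_size=WINDOW_SIZE):
    """Toy model of a fixed-window delta after one contiguous deletion.

    For each target window, count the bytes whose original position in the
    source lies beyond the end of the same-numbered source window. Those
    bytes cannot be copied from that source window and must be stored.
    """
    counts = []
    for start in range(0, target_size, window_size):
        end = min(start + window_size, target_size)
        source_window_end = start + window_size
        # Target offset j maps to source offset j + deleted once j is past
        # the deletion point; it falls outside the source window when
        # j + deleted >= source_window_end.
        first_outside = max(start, delete_offset, source_window_end - deleted)
        counts.append(max(0, end - first_outside))
    return counts


# Figures taken from the quoted explanation.
print(window_count(5451960))  # -> 54
print(27 * 9105)              # -> 245835

# Small illustration: 10 windows of 100 bytes, 10 bytes deleted at offset 500.
print(carried_bytes_per_window(1000, 500, 10, window_size=100))
```

In the small illustration, the windows before the deletion point cost
nothing, while every window after it has to carry the shifted bytes. That is
the effect that makes the revision size grow with the number of misaligned
windows rather than with the size of the change.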
OK, I now understand how the window-oriented delta algorithm operates and
why it is needed for binary files. But is it not possible to fall back on a
diff-like patch for small changes to a large text file, or is this
not compatible with the overall delta format?
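
To illustrate what is meant by a diff-like patch, here is a minimal Python
sketch using the standard difflib module. It only shows that a line-oriented
patch for a few deleted lines stays small regardless of the file size; it says
nothing about whether such a patch would fit into Subversion's delta format.
The helper name line_patch_size and the generated test data are assumptions
made for this example.

```python
import difflib


def line_patch_size(old_lines, new_lines):
    """Size in bytes of a unified diff (no context lines) between two files."""
    patch = "".join(
        difflib.unified_diff(old_lines, new_lines, "old", "new", n=0)
    )
    return len(patch.encode("utf-8"))


# A "large" text file of 2000 distinct lines, with 10 lines deleted from
# the middle.
old = [f"line {i}\n" for i in range(2000)]
new = old[:1000] + old[1010:]

print("file size :", sum(len(line) for line in old))
print("patch size:", line_patch_size(old, new))
```

The patch contains only the headers, one hunk marker, and the ten deleted
lines, so its size depends on the change and not on where in the file the
change happens.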

> > I now wrote a test script in perl so you can reproduce this easily. The
> > script generates a test repository and a testfile and then makes a couple
> > of check-ins and prints the resulting sizes
>
> Thank you _very_ much for giving us a reproduction recipe: it made it
> significantly easier to see what was going on.
You are welcome. I know from experience how hard it is to reproduce something
like this properly without a script or similar.

Thanks a lot,
Martin

Received on Tue Dec 12 01:38:42 2006

This is an archived mail posted to the Subversion Dev mailing list.