Re: Very large rev files

From: Roy Franz <roy.lists_at_gmail.com>
Date: 2007-03-23 00:16:35 CET

On 3/21/07, Rick Yorgason <rick@ldagames.com> wrote:
> Ryan Schmidt wrote:
> > I assume this is because Subversion does not necessarily delta against
> > the immediately previous revision. See this document describing
> > "skip-deltas":
> >
> > http://svn.collab.net/repos/svn/trunk/notes/skip-deltas
>
> Okay, but wouldn't that imply that at least *some* of the revs would
> have ideal file sizes?
>
> Here's the sizes of all the revs that this file touches in a
> real-world repository. In most of these cases, (*all* of the cases that
> are 37/38M) the commit contains this 55M file, with 20 files that are 1M
> big, and no other files.
>
> Because of the data set these extra 20 files should also delta compress
> exceedingly well (they're 512x512 uncompressed R8G8B8A8 images,
> containing greyscale data which is mostly white, and completely white
> for the last two thirds of the file, in every revision). However, even
> if you subtract 20M from each of these revisions, almost all of them are
> still much larger than the ideal size.
>
> By the explanation provided in the notes on skip-deltas, you would
> expect there to be a wide variety of file sizes that are logarithmically
> smaller, with every odd-numbered one being ideal (with a delta that's
> less than 1M), but you clearly don't see that.
>
> The few that are closer to ideal are r2011, r2091, and r2143. If it
> matters at all, these three revisions happened when we still used
> svnserve instead of Apache (those are all dated 'Nov 1' because I dumped
> and reloaded the repository for some reason or other). All the other
> small revs are lazy copies.
>
> For interest's sake, revs 1, 2 and 3 in my last email were created from
> revs 5682 5691 and 5700 in the real-world repository.
>
> > Administrator@bender /cygdrive/e/Server/SVN/db/revs
> > $ ls -lh `svn log -q file:///e:/Server/SVN/Hegemony/trunk/Release/Resources/Data/Greece_Full.map | grep -v '^-' | sed 's/r\([0-9]*\).*/\1/' | tr '\n' ' '`
> > -rwxrwxrwx 1 Administrators None 207M Nov 1 14:51 105
> > -rwxrwxrwx 1 Administrators None 3.6K Nov 1 14:51 1698
> > -rwxrwxrwx 1 Administrators None 37M Nov 1 14:52 1766
> > -rwxrwxrwx 1 Administrators None 40M Nov 1 14:53 1774
> > -rwxrwxrwx 1 Administrators None 37M Nov 1 14:53 1776
snipped....

I just did some experiments along these lines to see how well the
xdelta compression works for large files. (We have some large binary
files, and we want to know what the impact on the repository size
would be if we check them in.)

In general, Subversion seemed to do an OK job, but the standalone
xdelta program did a much better one. I used a test repository: I
added the file, then checked in one (or more) updates, so that I could
easily tell which revision each Subversion delta should be based on.
If I understand the skip-delta documentation correctly, the first
update is always a delta against the original checkin.
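
To make the expected delta base concrete, here is a minimal sketch of
the selection rule. It assumes my reading of the skip-deltas notes for
FSFS is right: take the file's predecessor count (how many earlier
revisions of that file exist) and clear its lowest set bit to get the
predecessor count of the base. The function names are hypothetical and
are not Subversion APIs.

```python
def skip_delta_base(predecessor_count: int) -> int:
    """Predecessor count of the delta base, ASSUMING the rule in the
    skip-deltas notes: clear the lowest set bit of the count."""
    if predecessor_count < 0:
        raise ValueError("predecessor count must be non-negative")
    return predecessor_count & (predecessor_count - 1)


def delta_chain(predecessor_count: int) -> list[int]:
    """Bases that must be read, in order, to rebuild the full text."""
    chain = []
    n = predecessor_count
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


if __name__ == "__main__":
    # Count 0 is the original checkin, stored without a delta base.
    for n in range(1, 9):
        print(n, "-> base", skip_delta_base(n), "chain", delta_chain(n))
```

Under that assumption the first update (count 1) is based on count 0,
the original checkin, and every odd count is based on the file revision
immediately before it, which is why the odd-numbered ones would be
expected to come out small.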

For one of the files, xdelta -0 -p produced a 7 MB delta file, while
the corresponding Subversion revision file was 27 MB. The other
revisions that I checked were 2-3 times as big as the xdelta output.

Rick - you should try using xdelta with the -0 option, as this
prevents zlib compression of the output. Since your data sets sound
very compressible, this could be part of what is making xdelta produce
much smaller results.
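
As a rough illustration of how much zlib alone can change a size
comparison on data like this, the sketch below builds a synthetic
512x512 R8G8B8A8 buffer and compresses it with Python's zlib module.
The pixel contents are an assumption made up for the example (random
greyscale in the first third, pure white in the last two thirds), not
Rick's actual images, and the function name is hypothetical.

```python
import random
import zlib

WIDTH = HEIGHT = 512
BYTES_PER_PIXEL = 4  # R8G8B8A8


def make_synthetic_image(seed: int = 0) -> bytes:
    """Made-up stand-in: greyscale noise, then two thirds pure white."""
    rng = random.Random(seed)
    total_pixels = WIDTH * HEIGHT
    noisy_pixels = total_pixels // 3
    data = bytearray()
    for _ in range(noisy_pixels):
        g = rng.randrange(256)
        data += bytes((g, g, g, 255))  # grey value with opaque alpha
    white_pixels = total_pixels - noisy_pixels
    data += b"\xff" * (white_pixels * BYTES_PER_PIXEL)
    return bytes(data)


if __name__ == "__main__":
    raw = make_synthetic_image()
    packed = zlib.compress(raw)
    print("raw bytes:       ", len(raw))
    print("zlib-compressed: ", len(packed))
```

If one of the two sizes being compared has been through zlib and the
other has not, a gap of this kind shows up whether or not the delta
algorithms themselves differ, so it is worth making sure both sides of
the comparison are in the same state.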

Since my test data was a big tar file, I also found that the -p option
(makes xdelta not peek inside gzip compressed data) made my deltas
much bigger. (I am comparing the subversion revision size to the
worst case output from the xdelta program.)

I also tried a few experiments with gzipped files, and found that gzip
with the --rsyncable flag did better than uncompressed by a small
margin, and that normal gzipping did much worse.

It looks like there may be (significant) room for improvement in the
binary delta handling in Subversion. Do any lurking developers know
what xdelta and Subversion might be doing differently?

Roy

Received on Fri Mar 23 00:16:57 2007
