Re: Binary differencing performs poorly (erratically) on very large text file

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Fri, 15 Feb 2008 19:50:36 -0600

Karl Fogel wrote:
> Raman Gupta <rocketraman_at_fastmail.fm> writes:
>>> I have a large text file (around 47 MB) which is a database dump
>>> (created by msqldump). I periodically commit it to an SVN repo.
>>> Sometimes the binary differencing works just fine and I get a small
>>> sized revision in the repo. Other times I get a "full" sized revision
>>> in the repo, that is, a revision that is compressed but essentially the
>>> same size as what I get when committing the file to a virgin repo.
>>>
>>> Doing a "diff" on the client side files always generates a "relatively"
>>> small set of differences.
>> First, why are you talking about "binary" differencing if this is a
>> text file? See the FAQ entry at [1] although for the purposes of this
>> question I don't think it really matters.
>
> No, Charles is right to use that term -- the binary differencing (that
> the repository uses to store revisions) doesn't know or care whether a
> file is text: it just takes differences on the raw bits. He's not
> talking about "svn diff", he's talking about repository storage.

I think the cause of the large diffs is our delta windowing scheme, well
explained in this thread:
http://svn.haxx.se/dev/archive-2006-12/0158.shtml

To make a long story short, Subversion uses a fixed window size[1] when
storing deltas in the repository. If a file is larger than this size,
even if the overall changes to the file are small, the change could
touch several different windows, creating a larger delta.

For example, imagine adding 100 bytes at the start of a 50 MB file. The
first window grows by 100 bytes and pushes its last 100 bytes into the
next window. The next window's delta is then expressed as adding these
100 bytes and losing 100 bytes from its end, and so on for every window
after it. Thus a fairly small change can actually create large-ish
deltas in the repository.
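
To make the shifting effect concrete, here is a rough Python sketch. It is
only a toy model, not Subversion's actual delta code: it cuts two versions
of a file into fixed-size windows and counts how many windows differ from
the window at the same index in the old version. The window size is the
guess from my footnote, the file is scaled down to ten windows, and the
names split_windows and changed_windows are made up for illustration.

```python
WINDOW = 100 * 1024  # assumed window size, taken from the footnote's guess


def split_windows(data: bytes, size: int = WINDOW) -> list:
    """Cut data into consecutive fixed-size windows (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_windows(old: bytes, new: bytes, size: int = WINDOW) -> int:
    """Count windows of `new` that differ from the same-index window of `old`."""
    old_w = split_windows(old, size)
    new_w = split_windows(new, size)
    changed = 0
    for i, window in enumerate(new_w):
        # A window with no counterpart in the old file also counts as changed.
        if i >= len(old_w) or window != old_w[i]:
            changed += 1
    return changed


# Deterministic stand-in for a large file: ten full windows of patterned bytes.
old = bytes(i % 251 for i in range(10 * WINDOW))

# Case 1: 100 new bytes added at the start, so everything after them shifts.
inserted = b"\xff" * 100 + old

# Case 2: the first 100 bytes overwritten in place, so nothing shifts.
overwritten = b"\xff" * 100 + old[100:]

print("windows touched by insert:   ", changed_windows(old, inserted))
print("windows touched by overwrite:", changed_windows(old, overwritten))
```

In this model the insert touches every window (plus one new short window
at the end), while the in-place overwrite touches only the first. Note that
this counts touched windows only; it says nothing about how large each
window's delta ends up being after the real delta algorithm and compression.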

-Hyrum

[1] I think the current window size is 100k, so files larger than this
will be more likely to see this behavior (the larger, the more so).

Received on 2008-02-16 02:51:25 CET

This is an archived mail posted to the Subversion Users mailing list.
