
Re: Binary differencing performs poorly (erratically) on very large text file

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Fri, 15 Feb 2008 19:38:54 -0500

Raman Gupta <rocketraman_at_fastmail.fm> writes:
>> I have a large text file (around 47 MB) which is a database dump
>> (created by msqldump). I periodically commit it to an SVN repo.
>> Sometimes the binary differencing works just fine and I get a small
>> sized revision in the repo. Other times I get a "full" sized revision
>> in the repo, that is, a revision that is compressed but essentially the
>> same size as what I get when committing the file to a virgin repo.
>>
>> Doing a "diff" on the client side files always generates a "relatively"
>> small set of differences.
>
> First, why are you talking about "binary" differencing if this is a
> text file? See the FAQ entry at [1] although for the purposes of this
> question I don't think it really matters.

No, Charles is right to use that term -- the binary differencing (that
the repository uses to store revisions) doesn't know or care whether a
file is text: it just takes differences on the raw bits. He's not
talking about "svn diff", he's talking about repository storage.

> Second, FSFS uses a skip-delta approach when storing diffs in the
> repository. This is at least partly to improve performance for files
> whose revision histories are long. This is probably the cause of what
> you are seeing as "inconsistent" diff sizes. See [2].
>
> [2] http://svn.collab.net/repos/svn/trunk/notes/skip-deltas

I think you're probably right, though I wouldn't swear to it (I
haven't investigated closely).
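
To illustrate why skip-deltas could produce this pattern, here is a
minimal sketch. It assumes the FSFS rule as described in the notes at
[2]: to pick the delta base for the Nth change to a file, write N in
binary and clear its rightmost 1 bit. N here counts changes to that one
file, not repository revision numbers. The function names are
hypothetical and this is not Subversion's actual code.

```python
def skip_delta_base(n: int) -> int:
    """Delta base for the n-th change to a file (n >= 1), assuming the
    'clear the rightmost 1 bit' rule from notes/skip-deltas."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return n & (n - 1)  # clears the lowest set bit of n


def delta_chain(n: int) -> list[int]:
    """Bases that must be read, in order, to reconstruct change n."""
    chain = []
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


# Show how far back each change reaches for its delta base.
for n in range(1, 17):
    base = skip_delta_base(n)
    print(f"change {n:2d}: delta against change {base:2d} (distance {n - base})")
```

Under this assumption, odd-numbered changes are stored against their
immediate predecessor, while a change whose count is a power of two is
stored against the very first version. If the dump has drifted a lot
since then, that delta could come out close to full size, which would
look like the "sometimes small, sometimes full" behaviour described.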

-Karl

Received on 2008-02-16 01:39:14 CET
