
Re: A simple (?) suggestion from a svn fan :)

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2003-02-24 20:09:39 CET

On Monday, February 24, 2003, at 01:27 PM, Sander Striker wrote:

>> From: sussman@collab.net [mailto:sussman@collab.net]
>> Sent: Monday, February 24, 2003 8:55 PM
>
>> Alessandro Polverini <polverini@nibbles.it> writes:
>>
>>> So, I'm wondering: would it be possible to implement this behaviour in
>>> svn, when checking in files that have a certain property set:
>>> - gunzip the file
>>> - diff it with the previous (gunzipped) file
>>> - store differences (diff works well enough with xml files)
>>
>> Internally, Subversion uses a binary diff algorithm to express all
>> file differences, regardless of whether a file contains text or binary
>> data. So when you store successive versions of a binary file in a
>> Subversion repository, you *are* getting differential (compressed)
>> storage.
>
> Yes, but the size of the diff is increased tremendously because gzip
> messes it up. Try gzipping one file, make a small change to the file,
> gzip again and compare. This isn't something we can trivially solve
> I think.

Errr, actually, it's not as simple as saying "the diff size is increased
tremendously".
In the case of gzip, it depends on whether the changes cause more
matches to occur within the window size.
gzip limits match distances to 2^windowbits, and zlib actually limits
match distances to 2^windowbits - 262.
If the changes don't produce more matches, the majority of the two
gzipped files should look the same.
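
A rough way to see the window limit in action, sketched with Python's
zlib module (the helper name and the block sizes are made up for
illustration). It compresses a block of random bytes followed by an
exact copy of itself: when the copy starts closer than the window
size, deflate can encode it as back-references, and when it starts
further back it cannot.

```python
import random
import zlib


def repeated_block_size(block_len, seed=0):
    """Compress an incompressible block followed by an exact copy of itself."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    # wbits=15 selects a 32 KB window, the largest that deflate allows.
    comp = zlib.compressobj(9, zlib.DEFLATED, 15)
    return len(comp.compress(block + block) + comp.flush())


# The copy starts 20000 bytes back: inside the 32 KB window.
near = repeated_block_size(20000)
# The copy starts 40000 bytes back: outside the window, so no match is possible.
far = repeated_block_size(40000)
print("copy inside window: ", near, "bytes")
print("copy outside window:", far, "bytes")
```

With the copy inside the window the output should be only a little
larger than one block; with the copy outside it should be about as
large as both blocks together.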

Thus, in a large original file, with small changes, a diff between the
gzip'd files shouldn't be much larger (if at all) than a diff between
the non-gzipped files.
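
This is easy enough to measure for a given file rather than argue
about. A minimal sketch, assuming Python's zlib module and a made-up
XML-ish document (make_xml and common_prefix are hypothetical helpers,
and this is not a real gnumeric file). It uses raw zlib streams rather
than gzip so that no header timestamp gets in the way, overwrites one
byte about 90% of the way in, and counts how many leading bytes the
two compressed streams share.

```python
import random
import zlib


def make_xml(rows, seed=1):
    """Build a made-up spreadsheet-like XML document for the experiment."""
    rng = random.Random(seed)
    lines = ["<sheet>"]
    for i in range(rows):
        lines.append('<cell row="%d" col="%d">%d</cell>'
                     % (i, rng.randrange(50), rng.randrange(10**6)))
    lines.append("</sheet>")
    return "\n".join(lines).encode("ascii")


def common_prefix(a, b):
    """Return the number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


original = make_xml(50000)
pos = int(len(original) * 0.9)
# Overwrite a single byte; '#' never occurs in the generated text.
edited = original[:pos] + b"#" + original[pos + 1:]

z_old = zlib.compress(original)
z_new = zlib.compress(edited)
shared = common_prefix(z_old, z_new)
print("compressed sizes:", len(z_old), len(z_new))
print("identical leading bytes:", shared)
```

Only the shared prefix is measured here, which is a crude stand-in for
what a binary diff would find, and the result depends on where the
edit falls: moving pos towards the start of the file is the
interesting case to try.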

As to how often this occurs, if the XML files in question weren't
large, why would gnumeric/whatever be gzipping them?

Received on Mon Feb 24 20:10:30 2003

This is an archived mail posted to the Subversion Dev mailing list.