On Monday, February 24, 2003, at 01:27 PM, Sander Striker wrote:
>> From: sussman@collab.net [mailto:sussman@collab.net]
>> Sent: Monday, February 24, 2003 8:55 PM
>
>> Alessandro Polverini <polverini@nibbles.it> writes:
>>
>>> So, I'm wondering: would it be possible to implement this behaviour in
>>> svn, when checking in files that have a certain property set:
>>> - gunzip the file
>>> - diff it with the previous (gunzipped) file
>>> - store differences (diff works well enough with xml files)
>>
>> Internally, Subversion uses a binary diff algorithm to express all
>> file differences, regardless of whether a file contains text or binary
>> data. So when you store successive versions of a binary file in a
>> Subversion repository, you *are* getting differential (compressed)
>> storage.
>
> Yes, but the size of the diff is increased tremendously because gzip
> messes it up. Try gzipping one file, make a small change to the file,
> gzip again and compare. This isn't something we can trivially solve
> I think.

Errr, actually, it's not as simple as saying "the diff size is increased
tremendously". In the case of gzip, it depends on whether the changes
cause more matches to occur within the window size.
The deflate format that gzip uses limits match distances to 2^windowbits
(32 KB for the usual windowbits of 15), and zlib actually limits match
distances to 2^windowbits - 262.
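
For concreteness, the 262 is zlib's lookahead reserve: as I read zlib's
sources (zutil.h, deflate.h), MIN_LOOKAHEAD = MAX_MATCH + MIN_MATCH + 1
= 258 + 3 + 1, and MAX_DIST is the window size minus MIN_LOOKAHEAD. A
small sketch of the two limits, assuming those constants (the function
names are just illustrative):

```python
import zlib

# Constants as defined in zlib's sources (zutil.h and deflate.h).
MAX_MATCH = 258                            # longest match deflate can emit
MIN_MATCH = 3                              # shortest match deflate can emit
MIN_LOOKAHEAD = MAX_MATCH + MIN_MATCH + 1  # 262 bytes kept in reserve


def window_size(window_bits: int) -> int:
    """Window size implied by windowbits: 2^windowbits bytes."""
    return 1 << window_bits


def zlib_max_dist(window_bits: int) -> int:
    """zlib's MAX_DIST: the window size minus MIN_LOOKAHEAD."""
    return window_size(window_bits) - MIN_LOOKAHEAD


# 15 (zlib.MAX_WBITS) is the largest and default windowbits.
for bits in (9, 12, zlib.MAX_WBITS):
    print(f"windowbits={bits}: window={window_size(bits)}, "
          f"zlib max distance={zlib_max_dist(bits)}")
```

With the default windowbits of 15 this gives a 32768-byte window and a
maximum match distance of 32506 bytes in zlib.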
If the changes don't produce more matches, most of the two gzipped files
should look the same. Thus, for a large original file with small
changes, a diff between the gzipped files shouldn't be much larger (if
at all) than a diff between the uncompressed files.
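
Sander's experiment (compress, make a small edit, compress again,
compare) is easy to script if you want numbers for your own files. The
sketch below is only a rough check under several assumptions: it uses
raw deflate from Python's zlib module as a stand-in for gzip (same
compression, but without gzip's header and checksum trailer, which would
always differ), synthetic XML-like data in place of real Gnumeric files,
and the bytes left over after stripping the longest common prefix and
suffix as a crude proxy for what a delta must store. A real binary delta
can also reuse matching data in the middle, so this proxy can overstate
the difference.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Raw deflate (no zlib/gzip header or trailer), max window, level 9."""
    c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return c.compress(data) + c.flush()


def changed_span(a: bytes, b: bytes) -> int:
    """Bytes of b not covered by the common prefix/suffix shared with a."""
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    limit = min(len(a), len(b)) - p  # don't let the suffix overlap the prefix
    s = 0
    while s < limit and a[-1 - s] == b[-1 - s]:
        s += 1
    return len(b) - p - s


def make_xml(n_rows: int) -> bytes:
    """Hypothetical spreadsheet-like XML, just for the experiment."""
    rows = [f'<row id="{i}"><cell>{i * 7 % 1000}</cell></row>\n'
            for i in range(n_rows)]
    return ("<sheet>\n" + "".join(rows) + "</sheet>\n").encode()


original = make_xml(20000)
# A one-byte edit in the middle of the file.
edited = original.replace(b'<row id="10000"><cell>',
                          b'<row id="10000"><cell>9', 1)

co, ce = deflate_raw(original), deflate_raw(edited)
print(f"uncompressed: {len(edited)} bytes, changed span "
      f"{changed_span(original, edited)}")
print(f"deflated:     {len(ce)} bytes, changed span {changed_span(co, ce)}")
```

Swapping make_xml() for the bytes of two real revisions of a
decompressed file would give a more relevant answer than the synthetic
data.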
As to how often this case occurs: if the XML files in question weren't
large, why would Gnumeric (or whatever) be gzipping them?
Received on Mon Feb 24 20:10:30 2003