
Re: Autoexpanding ZIP archives?

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2005-12-07 18:26:10 CET

On Wed, 2005-12-07 at 10:17 -0600, Ben Collins-Sussman wrote:
> On 12/7/05, Hadmut Danisch <hadmut@danisch.de> wrote:
>
> > If I have a presentation with 10 MB of graphics, and enter just a
> > single word, SVN wastes another 10 MB just for every little change.
>
> I think you're exaggerating, this is the extreme worst case it could
> possibly ever be. It's what CVS does, in fact; it doesn't try to
> examine differences between binary files, it just stores each one in
> full.

The compressed forms of two similar plaintexts generally don't look
similar after the first difference, and sometimes not even before it.
So I would expect at least a 5MB delta on average; quite likely there
would be some changed metadata at the beginning which would make the
entire new zipfile dissimilar to the old one.
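
A rough way to see this effect, as a sketch rather than a measurement of
any real presentation file: the snippet below compresses two plaintexts
that differ by one byte in the middle and reports how much of each pair
is shared before the first difference. It assumes zlib's deflate as a
stand-in for the compression inside a ZIP archive; the vocabulary, the
sizes, and the helper names are made up for illustration.

```python
import random
import zlib


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def matching_fraction(a: bytes, b: bytes, start: int) -> float:
    """Fraction of same-offset bytes that are equal from `start` onward."""
    pairs = list(zip(a[start:], b[start:]))
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def make_text(n_words: int, seed: int = 0) -> bytes:
    """Build deterministic pseudo-prose from a small, made-up vocabulary."""
    rng = random.Random(seed)
    vocab = ["slide", "graph", "chart", "delta", "store", "commit",
             "binary", "archive", "figure", "title", "note", "layout"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


original = make_text(50_000)
edit_at = len(original) // 2
# Flip one bit of one byte in the middle: a stand-in for a tiny edit.
edited = (original[:edit_at]
          + bytes([original[edit_at] ^ 0x01])
          + original[edit_at + 1:])

packed_original = zlib.compress(original)
packed_edited = zlib.compress(edited)

plain_shared = common_prefix_len(original, edited)
packed_shared = common_prefix_len(packed_original, packed_edited)

print(f"plaintext:  {plain_shared} of {len(original)} bytes shared "
      f"before the first difference")
print(f"compressed: {packed_shared} of {len(packed_original)} bytes shared "
      f"before the first difference")
print("same-offset matches after the divergence: "
      f"plaintext {matching_fraction(original, edited, plain_shared):.3f}, "
      f"compressed {matching_fraction(packed_original, packed_edited, packed_shared):.3f}")
```

The exact numbers depend on the input and the zlib version, so none are
quoted here. The point is only that a byte-oriented delta has much less
to work with on the compressed side than on the plaintext side.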

Still, Hadmut, just because it's possible to do better here doesn't mean
it's worthwhile. How would you feel if Subversion corrupted your data
because it went through an obscure, poorly-tested code path to optimize
storing it? Some version control systems are moving in the direction of
not delta-compressing data at all, because disk space is cheap and code
complexity is expensive. I think that's too extreme (and git, the most
widely used system that initially did this, has to the best of my
knowledge backtracked and started doing delta compression), but
"optimize every use case and damn the code size" is not the best
philosophy.

Received on Wed Dec 7 18:30:16 2005

This is an archived mail posted to the Subversion Dev mailing list.