Re: Delta performance

From: Bryan O'Sullivan <bos_at_serpentine.com>
Date: 2000-10-07 03:30:58 CEST

k> We know, for sure, that we can do better than that anyway. Diff
k> format includes data deleted from the src string, therefore it
k> includes more data than we want. Gzipping makes it smaller, but
k> then we could gzip whatever we produce too

On the other hand, diff has the advantage of being a familiar format,
which may make it somewhat easier for people to read or wallop if they
need to.

As far as gzipping is concerned, I recently had occasion to pay
attention to the file format of a repository for a project I'm working
on. I have my own seriously bummed binary format for the repository,
along with a textual representation that contains considerably more
redundancy. The binary version of a large repository is considerably
smaller than the text version (about 20% of the size), but if I gzip
both files, the *difference* between the compressed file sizes drops
to less than 20%.

In other words, gzip is good at eliding redundant data. I don't think
the fact that a diff contains unnecessary information will affect its
overall size significantly.

(Where the real difference showed up between the two file formats was
in read/write performance, in which the binary format had the
advantage by almost an order of magnitude. I don't know if this is
applicable to the SVN diff format.)

k> Diff performs unacceptably on binary data.

This is definitely a more compelling argument for some kinds of binary
data.

        <b
Received on Sat Oct 21 14:36:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
