Re: Binary diffs: real-world differencing?

From: Konrad Rosenbaum <konrad_at_silmor.de>
Date: 2006-02-04 14:10:27 CET

On Friday 03 February 2006 19:08, Daniel Griscom wrote:
> Using xdelta3, I've done some simple tests on differencing various
> types of binary files. The results so far:
> - A 45kB MSWord document compressed quite well (three different
> trivial changes each resulted in <2kB of diff).

That is logical: MSWord stores blocks of data in distinct streams, so the
data you did not change should yield identical streams.

> - An 85kB JPEG file didn't compress at all, even when I saved at 100%
> quality, opened the copy, saved THAT at 100% quality, and then
> compared the first and second copies.

Either your program did not save at 100%, or you used two different JPEG
implementations for this.

> - An 8kB GIF file also didn't compress at all.

What did you change? GIF compresses the image data as a single LZW stream,
so a change near the start of the image alters the whole stream after it.

> - Adding a character to a text file in a ZIP archive compressed
> badly, with a diff almost as large as the archive itself.

If it was the only file in there, this should be about correct: the
compressor's state changes at the changed character and yields a different
stream from there on.
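
A minimal sketch of this effect, using Python's zlib module (deflate, the
method ZIP normally uses) on a bare stream rather than a real ZIP archive.
The sample text, the vocabulary and the insertion offset are made up for
illustration; only the comparison matters.

```python
import random
import zlib


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical sample text: 5000 words drawn from a small vocabulary.
rng = random.Random(0)
vocab = ["binary", "diff", "delta", "stream", "archive", "commit", "file", "change"]
original = " ".join(rng.choice(vocab) for _ in range(5000)).encode()

# Insert a single character near the start of the text.
modified = original[:100] + b"X" + original[100:]

packed_a = zlib.compress(original)
packed_b = zlib.compress(modified)

# Uncompressed: everything before and after the inserted byte is shared.
raw_head = common_prefix_len(original, modified)
raw_tail = common_prefix_len(original[::-1], modified[::-1])

# Compressed: measure how much of the two streams still lines up.
packed_head = common_prefix_len(packed_a, packed_b)
packed_tail = common_prefix_len(packed_a[::-1], packed_b[::-1])

print("raw:    shared head", raw_head, "shared tail", raw_tail, "of", len(original))
print("packed: shared head", packed_head, "shared tail", packed_tail, "of", len(packed_a))
```

The uncompressed inputs differ by one byte, while the compressed streams
stop matching shortly after the stream header, which is why a binary diff
finds little to reuse.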

> - A 500kB ZIP archive of 19 files, compared with an archive of 18 of
> the same 19 files, compressed extremely well, yielding a diff even
> smaller than a ZIP archive of the removed file.

Also very logical: ZIP compresses each file independently. The overhead in a
ZIP file is quite big, so it is possible for the diff to be much smaller
than a ZIP of the added file.
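
The per-file independence can be checked with a small sketch, assuming
Python's zipfile module and made-up file names and contents. It builds one
archive with 19 members and one with 18 of them, then compares the raw
compressed bytes of each shared member. The fixed timestamp is only there to
keep the archives reproducible.

```python
import io
import struct
import zipfile


def build_zip(files: dict) -> bytes:
    """Build an in-memory ZIP archive with deflate-compressed members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=(2006, 2, 4, 14, 10, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def member_payloads(archive: bytes) -> dict:
    """Map each member name to its raw compressed bytes inside the archive."""
    payloads = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            # The local file header is 30 bytes; name and extra-field
            # lengths are stored at offsets 26 and 28.
            name_len, extra_len = struct.unpack("<HH", archive[start + 26:start + 30])
            data_start = start + 30 + name_len + extra_len
            payloads[info.filename] = archive[data_start:data_start + info.compress_size]
    return payloads


# Hypothetical contents: 19 small text files.
files = {f"file{i:02d}.txt": (f"contents of file {i}\n" * 200).encode() for i in range(19)}
fewer = {name: data for name, data in files.items() if name != "file07.txt"}

full_payloads = member_payloads(build_zip(files))
fewer_payloads = member_payloads(build_zip(fewer))

# Every member present in both archives has identical compressed bytes.
unchanged = [name for name in fewer_payloads if fewer_payloads[name] == full_payloads[name]]
print(len(unchanged), "of", len(fewer_payloads), "shared members are byte-identical")
```

Because the shared members are stored as identical byte runs, a binary
differ only has to encode the one member that differs plus the changed
offsets in the central directory.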

> - Adding a single character to a string of a 64kB executable
> compressed very well, giving a diff file of a few hundred bytes.
> Exchanging two lines of source code had the same results.

Also very clear.


Received on Sat Feb 4 14:14:37 2006

This is an archived mail posted to the Subversion Users mailing list.
