
Re: Binary diffs: real-world differencing?

From: Daniel Griscom <griscom_at_suitable.com>
Date: 2006-02-03 19:08:49 CET

Using xdelta3, I've done some simple tests on differencing various
types of binary files. The results so far:

- A 45kB MSWord document differenced quite well (three different
trivial changes each resulted in <2kB of diff).

- An 85kB JPEG file didn't difference at all, even when I saved it at
100% quality, opened the copy, saved THAT at 100% quality, and then
compared the first and second copies.

- An 8kB GIF file also didn't difference at all.

- Adding a character to a text file inside a ZIP archive differenced
badly, with a diff almost as large as the archive itself.

- A 500kB ZIP archive of 19 files, compared with an archive of 18 of
the same 19 files, differenced extremely well, yielding a diff even
smaller than a ZIP archive of the removed file.

- Adding a single character to a string in a 64kB executable
differenced very well, giving a diff file of a few hundred bytes.
Exchanging two lines of source code had the same result.

- Adding three pixels to a multi-layer 2MB Photoshop document (no
compression, I believe) differenced fairly well; the delta was about

So, I'll take a stab at generalizing what happens when you change
various types of binary file (again, with xdelta3):

- Image files that use compression will difference badly or not at all

- Image files that don't use compression will difference fairly well

- Compiled executables may difference well (the tested changes
differenced very well, but others may not do as well)

- MSWord documents will difference well

- ZIP archives with changes to most/all their files will difference badly

- ZIP archives with a small subset of their files changed or
added/removed will difference very well
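
One likely reason the two ZIP cases behave so differently is that each
member of a ZIP archive is compressed on its own, so an unchanged file
should produce the same compressed bytes no matter what else is in the
archive. The sketch below only illustrates that point in Python; it is
not what xdelta3 does internally and it does not measure delta sizes.
It uses the standard zipfile module, and the helper names (build_zip,
member_payloads, compare) and the generated test files are made up for
the example.

```python
import io
import struct
import zipfile


def build_zip(files):
    """Return the bytes of a deflate-compressed ZIP holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def member_payloads(archive):
    """Map each member name to its raw compressed bytes inside `archive`."""
    payloads = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            # A local file header is 30 fixed bytes followed by the file name
            # and extra field; their lengths sit at offsets 26 and 28.
            start = info.header_offset
            name_len, extra_len = struct.unpack_from("<HH", archive, start + 26)
            data_start = start + 30 + name_len + extra_len
            payloads[info.filename] = archive[data_start:data_start + info.compress_size]
    return payloads


def compare(old_archive, new_archive):
    """Return (unchanged, changed_or_added, removed) lists of member names."""
    old, new = member_payloads(old_archive), member_payloads(new_archive)
    unchanged = [n for n in new if old.get(n) == new[n]]
    changed = [n for n in new if old.get(n) != new[n]]
    removed = [n for n in old if n not in new]
    return unchanged, changed, removed


# Hypothetical test data: 19 small text files, loosely mirroring the test above.
files = {f"file{i:02d}.txt": (f"contents of file {i}\n" * 200).encode()
         for i in range(19)}
original = build_zip(files)

# Variant 1: the same archive with one file left out.
fewer = {n: d for n, d in files.items() if n != "file18.txt"}

# Variant 2: one character appended to one member.
edited = dict(files)
edited["file03.txt"] += b"x"

for label, variant in (("one file removed", fewer), ("one file edited", edited)):
    unchanged, changed, removed = compare(original, build_zip(variant))
    print(label, "-> unchanged:", len(unchanged),
          "changed:", changed, "removed:", removed)
```

Members reported as unchanged are byte ranges a delta encoder can copy
straight from the old archive. The edited member's compressed bytes
differ from the original's, which is consistent with the bad result
when most or all of the files in an archive change.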

Any other thoughts?


Daniel T. Griscom             griscom@suitable.com
Suitable Systems              http://www.suitable.com/
1 Centre Street, Suite 204    (781) 665-0053
Wakefield, MA  01880-2400
Received on Fri Feb 3 19:25:53 2006

This is an archived mail posted to the Subversion Users mailing list.