
Optimizing files for binary diffs

From: Koert van der Veer <kvdveer_at_playlogicgames.com>
Date: Mon, 9 Mar 2009 09:55:22 +0100

For our current project, we store a lot of binary data in a Subversion
repository. The files average 10 MB each and contain only binary data.
As the team grows, the daily commit volume grows with it, and at the
moment we spend up to 15 minutes a day just updating our working
copies.

We hope to improve this by optimizing our binary file format for the SVN
native binary diff format.


The header of the file contains a table of 200-1000 entries of 10-50
bytes each. These entries are mostly invariant, except for additions and
deletions. An average commit has 3 insertions and 2 deletions. Apart
from these insertions and deletions, relocations inside these tables
never happen (tables are sorted to guarantee that).

The bulk of the content consists of large invariant chunks (1 KB to
10 MB) that are copied unchanged from the previous version of the file.
Because of changes in the tables and other content, these large chunks
are frequently relocated inside the file. The chunks are interleaved
with references to the tables mentioned above.
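
To illustrate why table changes can ripple into the rest of the file, here is a minimal Python sketch. It assumes the references are stored as positional indices into a sorted table; the entry names, the `refs` list, and both helper functions are hypothetical and only stand in for the real format.

```python
import bisect

def insert_entry(table, name):
    """Insert `name` into the sorted table in place and return its index."""
    idx = bisect.bisect_left(table, name)
    table.insert(idx, name)
    return idx

def shifted_references(refs, inserted_at):
    """Rewrite positional references after one insertion at `inserted_at`."""
    return [r + 1 if r >= inserted_at else r for r in refs]

# Hypothetical data: a tiny sorted table and references scattered in the body.
table = ["alpha", "delta", "omega"]
refs = [0, 2, 1, 2]

at = insert_entry(table, "beta")
new_refs = shifted_references(refs, at)
changed = sum(1 for old, new in zip(refs, new_refs) if old != new)
print(at, new_refs, changed)
```

In this toy case a single insertion changes every reference that points at or past the insertion point, so small edits end up sprinkled between the otherwise invariant chunks. References that use stable identifiers instead of positions would not shift this way.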


Question 1:

Is svndiff-test.c a good tool to test the diffing characteristics of our
files?


Question 2:

I'm assuming that svndiffs are used for client-to-server and
server-to-client transfers. Is this assumption correct?


Question 3:

I obviously need to stabilize the indices into the tables, so that they
don't change (too much) between versions. Once I've done that, will the
binary differ be able to find the relocated binary chunks?
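
One rough way to explore this question offline is a toy measurement, sketched below in Python. It is not Subversion's delta algorithm and does not model how Subversion windows its data. It only estimates an upper bound: the fraction of a new file that a greedy copy/insert encoder could copy from anywhere in the old file. The function name `copyable_fraction` and the `block` size are assumptions made for illustration, and the index is built naively, so it only suits small samples.

```python
def copyable_fraction(old: bytes, new: bytes, block: int = 16) -> float:
    """Fraction of `new` a greedy copy/insert encoder could copy from `old`."""
    if not new:
        return 1.0
    # Index every block-sized substring of the old data by its first offset.
    index = {}
    for off in range(len(old) - block + 1):
        index.setdefault(old[off:off + block], off)

    copied = 0
    pos = 0
    while pos + block <= len(new):
        src = index.get(new[pos:pos + block])
        if src is None:
            pos += 1  # no match here: this byte would be sent literally
            continue
        # Extend the match forward for as long as both buffers agree.
        length = block
        while (pos + length < len(new) and src + length < len(old)
               and new[pos + length] == old[src + length]):
            length += 1
        copied += length
        pos += length
    return copied / len(new)
```

For example, if the old file is a short header followed by chunks A and B, and the new file is a changed header followed by B and then A, this function reports that almost everything is copyable. How close the real binary differ gets to that figure is exactly what would need to be measured with Subversion's own tools.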


Received on 2009-03-10 06:21:12 CET

This is an archived mail posted to the Subversion Users mailing list.
