Optimizing files for binary diffs

From: Koert van der Veer <kvdveer_at_playlogicgames.com>
Date: Mon, 9 Mar 2009 09:55:22 +0100

For our current project, we store a lot of binary data in a Subversion
repository. The files average 10 MB each and contain only binary data.
As the team grows, the daily commit volume grows with it, and at the
moment we spend up to 15 minutes a day just updating our working
copies.

We hope to improve this by optimizing our binary file format for
Subversion's native binary diff format.

The header of the file contains a table of 200-1000 entries of 10-50
bytes each. These entries are mostly invariant, except for additions and
deletions. An average commit has 3 insertions and 2 deletions. Apart
from these insertions and deletions, relocations inside these tables
never happen (tables are sorted to guarantee that).

The bulk of the content consists of large invariant chunks (1 KB to
10 MB) that are copied unchanged from the previous version of the file.
Because of changes in the tables and other content, these chunks are
frequently relocated within the file. They are interleaved with
references to the tables mentioned above.
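To make the relocation problem concrete, here is a minimal sketch of how
one might estimate, offline, how much of a new file version could be
expressed as copies from the old version. It is an assumption-laden
stand-in: it uses a generic block-matching scheme, not Subversion's
actual delta algorithm, and the name copyable_fraction and the 64-byte
BLOCK size are hypothetical choices made only for illustration.

```python
import os

# Hypothetical match granularity; NOT Subversion's real block or window size.
BLOCK = 64


def copyable_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Estimate the fraction of `new` that could be copied from `old`.

    Every block-aligned slice of `old` is indexed, then `new` is scanned
    at every byte offset for a slice that also occurs in `old`. This is
    a generic block-matching heuristic, used here only as a rough proxy
    for a copy-based binary differ.
    """
    if not new:
        return 1.0
    index = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    covered = 0
    pos = 0
    while pos + block <= len(new):
        if new[pos:pos + block] in index:
            covered += block   # whole block found in the old version
            pos += block
        else:
            pos += 1           # slide one byte and try again
    return covered / len(new)


if __name__ == "__main__":
    # Synthetic stand-ins for a header table and two large chunks.
    header = os.urandom(200)
    chunk_a = os.urandom(4096)
    chunk_b = os.urandom(4096)
    old = header + chunk_a + chunk_b
    # New version: a few bytes inserted in the header, chunks swapped.
    new = header[:50] + b"NEW" + header[50:] + chunk_b + chunk_a
    print(f"copyable fraction: {copyable_fraction(old, new):.2f}")
```

A result close to 1.0 on real file pairs would only suggest that the
chunks are findable in principle; whether Subversion's differ actually
finds them depends on its own matching and windowing, which this sketch
does not model.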

Question 1:

Is svndiff-test.c a good tool to test the diffing characteristics of our
files?

Question 2:

I'm assuming that svndiffs are used for client-to-server and
server-to-client transfers. Is this assumption correct?

Question 3:

I obviously need to stabilize the indices into the tables, so that they
don't change (too much) between versions. Once I've done that, will the
binary differ be able to find the relocated binary chunks?
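
As an illustration of the index problem, the following minimal sketch
compares references stored as positions in a sorted table with
references stored as ids that are assigned once and never change. The
entry names and the functions positional_refs, assign_ids and
stable_refs are hypothetical; our real format may need a different
scheme, and the handling of deleted entries is left out.

```python
def positional_refs(table, used):
    """Reference each used entry by its position in the sorted table."""
    position = {name: i for i, name in enumerate(sorted(table))}
    return [position[name] for name in used]


def assign_ids(ids, table):
    """Give each new entry the next unused id; existing ids never change."""
    next_id = max(ids.values(), default=-1) + 1
    for name in sorted(table):
        if name not in ids:
            ids[name] = next_id
            next_id += 1
    return ids


def stable_refs(ids, used):
    """Reference each used entry by its permanently assigned id."""
    return [ids[name] for name in used]


if __name__ == "__main__":
    used = ["delta", "foxtrot", "delta"]      # references in the body
    v1 = ["bravo", "delta", "foxtrot"]
    v2 = v1 + ["alpha", "charlie"]            # two insertions in a commit

    print("positional:", positional_refs(v1, used), "->",
          positional_refs(v2, used))

    ids = assign_ids({}, v1)
    before = stable_refs(ids, used)
    ids = assign_ids(ids, v2)
    print("stable ids:", before, "->", stable_refs(ids, used))
```

With positional indices, a single insertion near the top of the sorted
table shifts every later index, so the references scattered between
the large chunks all change; with stable ids they stay byte-identical
across versions.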

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1294787

Received on 2009-03-10 06:21:12 CET
