Hi, I wanted to summarize some discussions I've been having with some
of the SVN developers offline as well as the discussion we've started
to have here.
The issue:
Subversion's handling of large binary files can be very slow.
The Subversion assumption:
Subversion assumes that network bandwidth is the limiting factor when
handling large binary files, so that it's more efficient to diff the
current version against the previous one (current-1) and transmit only
the delta. The argument is that if you're on a slow modem connection
or a flaky US-India cable, you'd rather pay the diff compute time than
the time taken to transmit the large files.
Why this doesn't make sense in many situations:
1. Unfortunately, a lot of binary databases don't diff well, even when
the user input changes only incrementally. The diffs can often be as
large as the original file.
2. Often all the users are on a local network, or the repositories are
mirrored between sites, and the available network bandwidth is very
good.
3. For large files the diffs can take an extremely long time, far
longer than it takes to transmit the entire file, even under high
network load.
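
To make the trade-off in the three points above concrete, here is a
rough back-of-the-envelope model in Python. It is only a sketch: the
function names, the diff throughput, and the delta ratio are all
assumptions for illustration, not measurements of Subversion.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to push size_bytes over a link of the given bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def delta_is_worthwhile(file_size, delta_ratio, diff_rate, bandwidth):
    """Return True if diffing and sending the delta beats sending the full file.

    file_size   -- size of the new version in bytes
    delta_ratio -- delta size as a fraction of file_size (assumed, 0..1)
    diff_rate   -- bytes per second the diff can process (assumed)
    bandwidth   -- link bandwidth in bytes per second
    """
    diff_time = file_size / diff_rate
    delta_time = diff_time + transfer_seconds(file_size * delta_ratio, bandwidth)
    full_time = transfer_seconds(file_size, bandwidth)
    return delta_time < full_time


MB = 1000 * 1000

# Hypothetical LAN case: 1000 MB file that diffs badly, fast network.
lan = delta_is_worthwhile(1000 * MB, 0.9, 5 * MB, 100 * MB)

# Hypothetical modem case: 100 MB file that diffs well, very slow link.
modem = delta_is_worthwhile(100 * MB, 0.01, 5 * MB, 5000)
```

With these made-up numbers the delta loses on the LAN (the diff time
dominates) and wins on the modem, which is the distinction the current
behaviour doesn't take into account.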
Another complication:
After talking to various developers, it seems that Subversion is
actually doing a binary diff at both the client and the server end,
which is redundant. So we have twice the number of (time-consuming)
diffs we need.
The fix:
a) We need to remove the redundant diff operations that currently
occur.
b) One of the developers needs to profile Subversion and determine
the bottlenecks under the following scenarios:
i) doing an initial import of a large binary file into a fresh
repository
ii) committing a new version of a large binary file to an existing
repository
This will likely throw up a list of other possible improvements.
c) On the user group list, Talden suggested adding a new property to
Subversion that lets users designate files that shouldn't be diffed.
"svn:diffasnew" was his suggested keyword; it would instruct both
client and server to treat each new version of the file as a complete
replacement. This seems like a good suggestion.
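
As a rough illustration of item c), the decision could look something
like the Python sketch below. Note that "svn:diffasnew" is only the
name proposed in this thread, not an existing Subversion property, and
the function and the dictionary of properties are hypothetical; this
is not how Subversion is implemented.

```python
# Property name proposed in this thread; NOT an existing Subversion property.
DIFF_AS_NEW = "svn:diffasnew"


def choose_transfer_mode(props):
    """Pick how to send a new version of a file.

    props is a hypothetical dict of the file's properties.
    Returns "fulltext" when the file is marked with the proposed
    property (skip the diff, send the whole file), else "delta".
    """
    return "fulltext" if DIFF_AS_NEW in props else "delta"


# A design database marked by the user, and an ordinary source file.
marked = choose_transfer_mode({DIFF_AS_NEW: "*", "svn:mime-type": "application/octet-stream"})
unmarked = choose_transfer_mode({"svn:eol-style": "native"})
```

The point of making it a per-file property is that the user, who
knows which files diff badly, makes the call, and both client and
server could apply the same rule.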
I'm willing to pay a bounty to developers who are interested in
working on this. Please contact me for more details.
Subversion is my preferred tool for my software activities, and I'd
love to be able to use it for all my design data, but unfortunately
it's just not an option at the moment.
Received on Sun Aug 27 14:46:28 2006