
Re: large binary data repositories with many files

From: Jared Hardy <jhardy_at_highimpactgames.com>
Date: 2006-04-12 03:53:23 CEST

ListMan wrote:
> raw file size = 2.1G
> # files(dirs) = 5331(2061)
> unix copy = 6m
> commit = 68mins
> repos size = 1.2G
> workArea size = 4.2G
> # files(dirs) = 44011(22671)

        Do these stats correspond to a commit of the whole repository from
zero, or the size changes from a single isolated commit? I'm having
trouble reading the context.
        We have been working with multiple large binary files in our repository
(multimedia project) for just short of a year. Commits are relatively
fast, even over https connections. The working copies are big (I think
Subversion stores two copies of each file -- the one you edit and a
pristine base copy used for diff and revert operations), but disk is
cheap. Extra workstation disk is much cheaper than high-speed network
connections, network components, and other server parts.
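
For anyone who wants to check the two-copies guess on their own machine, here is a rough Python sketch that totals the bytes inside and outside the administrative directories of a working copy. The function name wc_sizes and the default ".svn" directory name are assumptions for illustration; this is not part of any Subversion tooling.

```python
import os

def wc_sizes(root: str, admin_dir: str = ".svn") -> tuple[int, int]:
    """Return (working_bytes, admin_bytes) for the tree under root.

    admin_dir is assumed to be the name of the administrative
    directory that holds the pristine base copies.
    """
    working = admin = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # A file counts as "admin" if any path component below root
        # is the administrative directory.
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        in_admin = admin_dir in rel_parts
        for name in filenames:
            # lstat avoids following (possibly broken) symlinks.
            size = os.lstat(os.path.join(dirpath, name)).st_size
            if in_admin:
                admin += size
            else:
                working += size
    return working, admin
```

If the admin total comes out close to the working total, that fits the pristine-copy explanation. ListMan's numbers (2.1G raw, 4.2G work area) look consistent with it.
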
        We are at about 7000 revisions, with 3 branches, yet the whole
repository store (8GB) is still smaller than the working copy area
(9GB). I think Subversion made the right trade-off here. Though maybe
they should use FSFS in the working copy cache? :)
        The first optimization to go for is always client disk speed -- under
Windows we turn off "Fast Indexing" and automatic virus scanning for WC
directories. Frequent defragmentation passes also help on NTFS.

> are there certain binary characteristics of files that lend themselves
> to efficient diff generation?

From what little I know of xdelta, I would say always avoid storing
compressed files in the repository. Raw formats like BMP are bigger to
start from, but deltas between multiple versions of the same BMP image
are much smaller than deltas between JPEGs of the same image. A small
edit to a compressed file tends to change most of the bytes after it,
so the delta has little to reuse. Same goes for AIFF vs MP3. I think
there are even merge tools out there for raw image formats like BMP
(http://www.ionforge.com/products/imagediff/).
Definitely avoid checking in zip or gz files -- just keep anything in
the repository expanded/raw. Compression of any form should be
considered a "build" step that happens outside the repository store.
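
A toy Python sketch of why this happens. It does not use xdelta: delta_estimate is a hypothetical, much cruder stand-in that counts the bytes of the new version sitting in fixed-size blocks not present in the old version, and zlib stands in for any compressed format. The synthetic "image" data and the 64-byte block size are made up for illustration.

```python
import random
import zlib

BLOCK = 64  # assumed block size for this toy estimate

def delta_estimate(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Bytes of `new` lying in aligned blocks that never occur in `old`."""
    old_blocks = {old[i:i + block] for i in range(0, len(old), block)}
    missing = 0
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        if chunk not in old_blocks:
            missing += len(chunk)
    return missing

rng = random.Random(0)

# Stand-in for a raw image: 256 KiB drawn from four "pixel" values.
raw_v1 = bytes(rng.choices(b"\x00\x10\x20\x30", k=256 * 1024))

# Version 2: overwrite a 100-byte region near the start.
edited = bytearray(raw_v1)
edited[1000:1100] = b"\xff" * 100
raw_v2 = bytes(edited)

# The same two versions, stored compressed instead of raw.
packed_v1 = zlib.compress(raw_v1)
packed_v2 = zlib.compress(raw_v2)

raw_delta = delta_estimate(raw_v1, raw_v2)
packed_delta = delta_estimate(packed_v1, packed_v2)

print("raw size:", len(raw_v2), "estimated delta:", raw_delta)
print("packed size:", len(packed_v2), "estimated delta:", packed_delta)
```

The raw files are larger, but only the few blocks touched by the edit show up as new. In the compressed pair the change ripples through the rest of the stream, so the estimated delta is much larger even though the files themselves are smaller.
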
        When you have the choice between a binary format and a text one
(the text one is usually XML: SVG rather than AI or PDF, COLLADA rather
than FBX), go for the text one when you can. Text diffs are easier, and
you open the option for future merge capabilities, even if no specific
merge tool *currently* exists.

        - Jared

Received on Wed Apr 12 03:54:37 2006

This is an archived mail posted to the Subversion Users mailing list.
