
A text-base penalty solution (without a working copy rewrite)

From: Chris Frost <chris_at_frostnet.net>
Date: 2007-05-24 04:40:59 CEST

Preface: this email summarizes my understanding of the text-base penalty,
introduces an (implemented, http://scord.sf.net) approach that solves the
problem now and without rewriting the working copy API, and asks for any
feedback people may have.

There has been a fair amount of discussion of the subversion working
copy's text-base penalty on this list and the users list since at least
October 2001. For what it is worth, I have tried to compile a list of
related discussions: <http://scord.sf.net/#why>.

At a high level, the text-base saves network bandwidth and supports
offline operation of several commands, at the cost of roughly doubling
disk usage (a pristine copy of every working file). Subversion is
certainly upfront about this tradeoff, explicitly stating its assumption
of plentiful disk space, but the tradeoff does unfortunately limit
subversion's use cases. It hurts large source code repos in particular
(e.g. when SCO wanted to move all their source code from SCCS to
subversion and check out all code on each developer's computer), as well
as media repos. Disks may grow in size, but people fill them with more
and/or larger files. (Additionally, disk bandwidth tends to grow more
slowly than disk capacity.) This tradeoff is certainly well understood
here; I believe that it remains because any solution appears to require
a working copy API rewrite (breaking backwards compatibility) and (to a
lesser extent?) because no single approach has emerged as the obvious
end solution (don't store any pristine files? compress them? follow SVK
and store the entire repo? and others).

I do feel that subversion has had fair discussions on this topic and
that the long-term plan will probably lead to a solid client metadata
storage plan. However, given that this issue has persisted since at
least 2001, I have a suggestion for (at least) an interim solution:

Do not modify subversion. Instead, enhance the underlying filesystem
to detect and exploit file redundancies in working copy-style layouts.
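
To get a feel for how much a redundancy-exploiting filesystem could
reclaim, here is a minimal Python sketch (not scord's code) that walks a
working copy in the pre-1.7 layout, where the pristine copy of
`dir/foo.c` lives at `dir/.svn/text-base/foo.c.svn-base`, and totals the
bytes of working files that are byte-identical to their text-base. The
function name `reclaimable_bytes` is hypothetical.

```python
import filecmp
import os


def reclaimable_bytes(wc_root):
    """Sum the sizes of working files identical to their text-base.

    Assumes the pre-1.7 working copy layout, in which the pristine copy
    of DIR/NAME is stored at DIR/.svn/text-base/NAME.svn-base.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(wc_root):
        # Do not descend into the administrative areas themselves.
        if ".svn" in dirnames:
            dirnames.remove(".svn")
        base_dir = os.path.join(dirpath, ".svn", "text-base")
        for name in filenames:
            working = os.path.join(dirpath, name)
            pristine = os.path.join(base_dir, name + ".svn-base")
            if not os.path.isfile(pristine):
                continue  # unversioned file, or no text-base present
            # shallow=False forces a byte-by-byte comparison.
            if filecmp.cmp(working, pristine, shallow=False):
                total += os.path.getsize(working)
    return total
```

Every byte counted here is stored twice on an ordinary filesystem; a
layer that notices the duplication only needs to store it once.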

I have implemented this approach, calling it "scord" (Subversion Check Out,
Reduced Disk), and believe it is now ready for use: <http://scord.sf.net>.
I am announcing scord here, as v0.9.0, in the hope of hearing feedback
from interested subversion developers; I would very much appreciate any
thoughts on scord's design or its implementation! scord
is currently implemented as a userspace overlay filesystem (using FUSE)
and runs on Linux. scord uses the same file manipulation techniques as
libsvn_wc to avoid working copy corruption if scord or the system
crashes (details in HACKING).
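
The core of that crash safety is the write-to-a-temporary-file, then
rename pattern that libsvn_wc uses for its administrative files: a
rename within one filesystem is atomic on POSIX, so a crash leaves either
the old file or the new one, never a torn mix. The Python sketch below
illustrates the general technique only; it is not scord's implementation
(see HACKING for that), and `atomic_write` is a hypothetical name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace PATH with DATA so readers see either the old or new file.

    The new content goes to a temporary file in the same directory
    (rename is only atomic within one filesystem), is flushed to disk,
    and is then renamed over the target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is durable before the rename
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        # On any failure, remove the temporary file; PATH is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry change as well (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```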

The biggest drawback I see with a filesystem-based approach is its
cross-platform limitations. scord currently supports only Linux. With a
little work scord should be portable to Mac OS X or FreeBSD. But
subversion runs on scores of platforms that scord does not. An NFS
server or preloaded library-based solution might support more (unix)
platforms, but afaik neither is a good match with Windows. Nonetheless,
Linux (plus hopefully OS X and FreeBSD) makes up a sizable portion of
subversion's install base.

From my list reading, it seems that many subversion developers may see
scord's use of an LGPLed library (FUSE) to be unfortunate. Though not
much of a suggestion, perhaps a future user for whom this is an issue
could rewrite the (fairly thin) FUSE library.

Note that scord restricts some file operations; keep these limitations
in mind when using scord: <http://scord.sf.net/#why>. However, all but
one of the limitations exist only because they simplified development
(and they do not affect my own use of scord).

The scord release also includes a small libsvn_wc patch that enhances
libsvn_wc's file modification detection to query scord before doing a
file diff if the working file's timestamp differs from its entry record.
This is helpful for projects with many and/or large build-generated files
that are included in the repo. The patch invokes a system call directly
(APR does not abstract extended attributes), so I do not expect there to
be interest in importing it into subversion proper.
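
The control flow of such a check might look like the following Python
sketch. It is not the actual patch: the attribute name
`user.scord.modified` and its value encoding are hypothetical stand-ins
for whatever scord exposes, `os.getxattr` is Linux-only, and when the
attribute is unavailable the sketch falls back to the byte comparison
libsvn_wc would do anyway.

```python
import filecmp
import os

# Hypothetical attribute name; scord's real interface may differ.
SCORD_XATTR = "user.scord.modified"


def is_modified(working, text_base, recorded_mtime_ns):
    """Decide whether WORKING differs from its pristine TEXT_BASE."""
    # Fast path: an unchanged timestamp is taken to mean the file is
    # unmodified, roughly the shortcut libsvn_wc takes.
    if os.stat(working).st_mtime_ns == recorded_mtime_ns:
        return False
    # Timestamp differs: ask the filesystem before paying for a diff.
    try:
        answer = os.getxattr(working, SCORD_XATTR)
        return answer != b"0"  # hypothetical encoding: b"0" = unmodified
    except (AttributeError, OSError):
        # No xattr support (non-Linux) or not on scord: compare bytes.
        return not filecmp.cmp(working, text_base, shallow=False)
```

For a build-generated file that was rewritten with identical contents,
the timestamp changes but the answer is still "unmodified", and with
scord's help that answer comes without reading either file.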

It may be interesting for scord to track the working copy's modified
file set so that libsvn_wc need not crawl the entire working copy for
each status query, checkin, etc.
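
As a rough illustration of that idea, a filesystem layer could keep the
set of paths it has seen written or renamed and answer status queries
from it. The Python class below is purely hypothetical; it models only
the bookkeeping, not the FUSE callbacks that would feed it.

```python
class ModifiedSetTracker:
    """Hypothetical bookkeeping for files changed since the last sync."""

    def __init__(self):
        self._dirty = set()

    def on_write(self, path):
        # Called from the filesystem's write/truncate handlers.
        self._dirty.add(path)

    def on_rename(self, old, new):
        # Both names changed from the working copy's point of view.
        self._dirty.update((old, new))

    def on_commit(self, paths):
        # After a commit or revert, these files match their text-base again.
        self._dirty.difference_update(paths)

    def candidates(self):
        # A status crawl only needs to examine these paths.
        return sorted(self._dirty)
```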

Also, I would be grateful for any porting assistance from interested
OS X or FreeBSD developers. (I believe the work will primarily involve
porting extended attribute and sub-second time resolution support.)

Lastly, while on the topic of saving disk I/O and space, I wanted to say
thanks for subversion 1.4's property file improvements! My personal
motivation for scord was to store my photo album in subversion (6GB in
22k image files, plus metadata files). Cached props significantly speed
up common subversion commands, and storing only a single copy of each
working file's property file saves 90MB.

Chris Frost  |  <http://www.frostnet.net/chris/>
PGP: <http://www.frostnet.net/chris/about/pgp_key.txt>
Received on Thu May 24 04:41:11 2007

This is an archived mail posted to the Subversion Dev mailing list.