[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

A forward-thinking thought about deltas in FS design

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2003-07-05 10:40:19 CEST

Tonight I had a fascinating thought. We've always taken it as
axiomatic that the most recent rev of a file should be stored in plain
text, and earlier revs should be stored as deltas against it, so that
checkouts of the most recent rev are fast. That's what CVS does (on
the mainline, anyway), and it was certainly important back when we had
no delta combiner and no skip-deltas.

But maybe that's not necessary, with those advanced techniques. Maybe
we should just store new revs as deltas against older revs, using the
skip-delta technique to ensure that you only need to go through
O(lg(n)) deltas to get to any revision. (The simplest application of
the technique is: rev 0 is plain-text or self-compressed; for any
other rev, flip the least significant 1 bit in the revnum and store as
a delta against the resulting rev.)

Pros:

  * Beautiful, sweet simplicity. All data entered into the database
    stays there for all time (modulo "svn obliterate"), and is never
    replaced. There is no post-commit step where you go back and
    redeltify stuff. This simplicity is particularly interesting if
    you're trying to write an FS back end which doesn't use a
    database--for instance (not necessarily the best idea), you could
    imagine implementing the FS as a single file which is only
    appended to.

  * Commits becomes faster, though our commits are already pretty fast
    in theory. Commit performance is also more predictable for large
    files. (You don't have this redelfication step which might be
    slow because some earlier rev of the file was big.)

Cons:
 
  * Checkouts become slower--not by a huge factor, but maybe by a
    noticeable constant factor. For most repositories, checkouts are
    much more common than checkins, so this is a big deal. (However,
    it's easier to make a simpler system perform well, so the tradeoff
    may not be so obvious.)

  * Balancing space and time is harder for the all-important first few
    revs. But I think it's not too bad for normal workloads.

I'm not seriously suggesting that anyone run out and try implementing
this (although it would probably be quite easy, and performance
numbers would be intriguing), especially since it is not much more
painful to try after 1.0. But it's an interesting idea and I wanted
to get it into the list archives for future reference. (In
particular, I think the benefits may be highly compelling for
libsvn_fs_fs, if that ever happens.)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Jul 5 10:41:11 2003

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.