Tonight I had a fascinating thought. We've always taken it as
axiomatic that the most recent rev of a file should be stored in plain
text, and earlier revs should be stored as deltas against it, so that
checkouts of the most recent rev are fast. That's what CVS does (on
the mainline, anyway), and it was certainly important back when we had
no delta combiner and no skip-deltas.
But maybe that's not necessary, with those advanced techniques. Maybe
we should just store new revs as deltas against older revs, using the
skip-delta technique to ensure that you only need to go through
O(lg(n)) deltas to get to any revision. (The simplest application of
the technique is: rev 0 is plain-text or self-compressed; for any
other rev, flip the least significant 1 bit in the revnum and store as
a delta against the resulting rev.)
Pros:
* Beautiful, sweet simplicity. All data entered into the database
stays there for all time (modulo "svn obliterate"), and is never
replaced. There is no post-commit step where you go back and
redeltify stuff. This simplicity is particularly interesting if
you're trying to write an FS back end which doesn't use a
database--for instance (not necessarily the best idea), you could
imagine implementing the FS as a single file which is only
appended to.
* Commits becomes faster, though our commits are already pretty fast
in theory. Commit performance is also more predictable for large
files. (You don't have this redelfication step which might be
slow because some earlier rev of the file was big.)
Cons:
* Checkouts become slower--not by a huge factor, but maybe by a
noticeable constant factor. For most repositories, checkouts are
much more common than checkins, so this is a big deal. (However,
it's easier to make a simpler system perform well, so the tradeoff
may not be so obvious.)
* Balancing space and time is harder for the all-important first few
revs. But I think it's not too bad for normal workloads.
I'm not seriously suggesting that anyone run out and try implementing
this (although it would probably be quite easy, and performance
numbers would be intriguing), especially since it is not much more
painful to try after 1.0. But it's an interesting idea and I wanted
to get it into the list archives for future reference. (In
particular, I think the benefits may be highly compelling for
libsvn_fs_fs, if that ever happens.)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Jul 5 10:41:11 2003