Re: Whither delta combiner

From: Zack Weinberg <zack_at_codesourcery.com>
Date: 2002-02-12 08:43:20 CET

On Mon, Feb 11, 2002 at 09:37:18PM -0500, Scott Lenser wrote:
> > I suspect we'll want to implement weave storage of revisions
> > eventually, but that's a much bigger job than will fit just now.
>
> For those of us that don't know what weave storage is, could one of the
> gurus please enlighten us with a brief description or a link to more
> details?

It's the format used by SCCS (hence Bitkeeper) and I think ClearCase
for storage of text files. The basic principle is that instead of
storing fulltext of the latest revision and a chain of deltas, you
"weave" all the revisions together. There is embedded control
information which tells you how to unpack any given revision. It
looks something like this (the lines with a leading ^ are control
information; obviously there is an escape mechanism).

^I 2
line added in rev 2
^E 2
^I 1
^D 3
line added in rev 1 and taken out in rev 3
^E 3
line added in rev 1
^E 1

Elsewhere is information which tells the system that the text of
revision 1 belongs in revision 2 as well. That part gets hairy.
I do not understand the complete details and I haven't been able
to find online documentation of the algorithm. I do know why it
is a good idea: it has two primary advantages over fulltext plus
deltas. First, no matter how complicated the file history gets,
any revision at all can be extracted in one linear pass over the
storage file. Second, the graph structure of the history can be
manipulated without any modification to the weave text; only the
higher-level metadata needs to change. For instance, when files
have been edited by two people simultaneously, their changes can
both be committed to micro-branches created on the fly. If they
do not conflict (as calculated by diff3) they can be merged with
only another metadata entry. The example here is line oriented,
but there's nothing preventing it from working byte by byte.

There are disadvantages as well. The most obvious is that the time to
extract any revision grows with the number of revisions already
stored. I suspect this of being a nonissue; the algorithm should be
dominated by I/O cost, and in a highly branched environment RCS style
deltas wind up being more expensive anyway.

There's a paper by Marc J. Rochkind on SCCS which should detail the
algorithm: IEEE Trans. on Software Engineering, SE-1(4):364--370,
December 1975. It doesn't seem to be online anywhere.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:06 2006

This message: [ Message body ]
Next message: Jay Freeman \(saurik\): "debugging from test logs (was: [PATCH] Win32 File Permission Fixes)"
Previous message: Greg Stein: "Re: [PATCH]: Revised cvs2svn.py patches"
In reply to: Scott Lenser: "Re: Whither delta combiner"
Next in thread: Zack Weinberg: "Re: Whither delta combiner"
Reply: Zack Weinberg: "Re: Whither delta combiner"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]