Re: [subversion-dev] Subversion design document up
From: Zack Weinberg <zack_at_wolery.cumb.org>
Date: 2000-06-03 23:03:11 CEST
On Sat, Jun 03, 2000 at 12:28:14AM -0500, Karl Fogel wrote:
I have a number of suggestions. Some of them may already be covered
There are two separate merge schemes described. One handles merges
A better arrangement would be to cause an implicit branch to occur
B --- Z
Then merge Y into Z using the branch-merge algorithm. There are
B --- Z -- M
then I can pick up Y2 using the ancestor-set algorithm to avoid
------ Z2 .... Zn
Obviously a scheme like this is going to use lots of implicit
You may or may not have noticed that this requires "working
Another reason to do working repositories immediately is that it gets
linus
... All the lower nodes being people who have maintained diffs
4.4
which doesn't convey either the exact history or the extent of the
And another reason why you want working repositories is it cuts the
Optionally, source can send back a list of nodes that destination
Creating a mirror or personal repository is just the special case
---

There isn't a lot of discussion of how data and metadata are going to be stored on disk. (One offhand mention of Berkeley DB and/or a full-blown SQL database - either one seems like overkill to me, frankly.) I would suggest that each node in the tree be two files. One contains metadata and is only ever appended to. The other contains the file data (if any - directory nodes don't need this) and is updated by writing a new copy and renaming it into place. You may or may not need a per-file locking protocol. Everything should be checksummed.

Exercise: pick any large CVS repository and scan through all of its files. I guarantee you will find at least one file which has lost at least one disk block: 2^n bytes will have been overwritten with nulls. Odds are that happened so long ago that no backup records the original contents of that block. No one has noticed because no one has wanted a revision back that far - yet.

For storing file data, I'd recommend looking at the "weave" algorithm used by SCCS. It stores the file as an array of lines, with annotations saying whether or not a given line belongs in a given revision. This has two major wins over the current-plus-reverse-diffs format used by RCS. First, all revisions take the same time to extract, and you do not have to edit the working file in memory: a single linear scan over the archive file, selecting lines and writing them out, suffices. This is particularly desirable if you have many branches. Second, it provides the concept of included and excluded deltas, which lets you do many merge operations just by adding an entry to the file's index - no modification to the archive body is needed.

The trouble with the weave algorithm is that it's line-oriented and therefore won't play nice with vcdiff. SCCS itself handles binary files horribly (it uuencodes them and then tries to diff the results, yuck), but SCCS' real problem with binaries is that it uses \n\001 as an escape sequence.
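The linear-scan extraction and the included/excluded-delta trick described above can be sketched as follows. The annotation scheme here (a per-line inserted-by / deleted-by revision id) is a deliberate simplification invented for illustration - SCCS's real interleaved-delta format uses nested ^A control lines - but it shows why every revision costs one pass and why a merge can be just an index change:

```python
# Sketch of a weave store (hypothetical layout, not SCCS's actual
# file format).  Each archive line carries the revision that
# inserted it and, optionally, the revision that later deleted it.
# Extracting any revision is a single linear scan.

def extract(weave, included, excluded=frozenset()):
    """Return the lines visible in a revision whose view is the
    set of `included` deltas minus the `excluded` deltas."""
    view = set(included) - set(excluded)
    out = []
    for line, inserted_by, deleted_by in weave:
        if inserted_by in view and deleted_by not in view:
            out.append(line)
    return out

# A tiny archive: rev 1 wrote two lines, rev 2 replaced the second.
weave = [
    ("hello\n", 1, None),
    ("world\n", 1, 2),      # deleted by rev 2
    ("there\n", 2, None),   # inserted by rev 2
]

print(extract(weave, {1}))      # rev 1's view: hello / world
print(extract(weave, {1, 2}))   # rev 2's view: hello / there
```

Note that a merge which "takes" rev 2's change needs no rewrite of the archive body: the new revision's index entry simply lists delta 2 as included (or, to back it out, as excluded).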
You could trivially avoid that, and with a bit more effort I bet the weave could be made to work on arbitrary byte runs instead of lines.

zw

Received on Sat Oct 21 14:36:05 2006
This is an archived mail posted to the Subversion Dev mailing list.
This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.