The revision model of Subversion

From: Bjarke Dahl Ebert <bebert_at_worldonline.dk>
Date: 2002-01-27 14:42:58 CET

Hello,

A friend of mine pointed me to the Subversion project page a couple of weeks ago.

Earlier, we had both thought about how a replacement for CVS could look, especially with respect to the data model / revision model. We even thought about starting a "proof-of-concept" implementation of the data storage (probably what corresponds to the Subversion filesystem) and maybe also a "transport" protocol.

I was very happy to see that another related project was already (long) under way, and of course I immediately compared it to the "design" that my friend and I had been building up in our heads :-).

Therefore, in this mail I'll try to explain what we had in mind and compare it to Subversion.

First, let me describe our envisioned data storage model:

The data store should contain only immutable values. You cannot change a value, only store a new one. This is of course a fundamental requirement of a Revision Control System (but nevertheless, CVS doesn't provide it!).
"Values" are meant to be very general/abstract items - not necessarily representing files or directories. We thought about having only two kinds of primitive values: (1) byte arrays ("raw data"), and (2) aggregates/compositions, which should be a dictionary from name to value (a list of name-value pairs). This is an inductive definition: the aggregates contain values themselves, thus forming trees.
This model was thought to be general enough to represent files and directories (and they could also capture the Subversion notion of "property lists"). A third datatype, tuple of values, could either be a primitive type in its own right, or be emulated by aggregates (just mapping "0", "1", "2", etc. to the corresponding values).
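To make the value model concrete, here is a minimal Python sketch of it, assuming a frozen dataclass is an acceptable stand-in for an immutable aggregate. The names `Aggregate`, `Aggregate.of` and `make_tuple` are hypothetical and come neither from Subversion nor from any finished IDO-store code. It shows the two primitive kinds (byte arrays and name-to-value aggregates) and a tuple emulated by the keys "0", "1", "2".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# A value is either raw bytes or an Aggregate; both are immutable.
Value = Union[bytes, "Aggregate"]


@dataclass(frozen=True)
class Aggregate:
    """Immutable name -> value mapping, kept as sorted (name, value) pairs."""

    entries: tuple[tuple[str, Value], ...]

    @staticmethod
    def of(mapping: dict[str, Value]) -> Aggregate:
        # Sorting by name makes equal mappings compare (and hash) equal,
        # regardless of the order in which they were built.
        return Aggregate(tuple(sorted(mapping.items())))

    def get(self, name: str) -> Value:
        for key, value in self.entries:
            if key == name:
                return value
        raise KeyError(name)

    def names(self) -> list[str]:
        return [key for key, _ in self.entries]


def make_tuple(*items: Value) -> Aggregate:
    """Emulate a tuple by mapping "0", "1", "2", ... to the items."""
    return Aggregate.of({str(i): item for i, item in enumerate(items)})


# A small directory tree: files are byte arrays, directories are aggregates.
src = Aggregate.of({"bar.c": b"int main(void) { return 0; }\n"})
project = Aggregate.of({"src": src, "README": b"Hello\n"})
```

Because `Aggregate` is frozen, "changing" a directory can only mean building a new aggregate; the old one stays valid as a historical value.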
We even had a name for this storage model: IDO Store (Immutable Data Objects) :-). And we even thought about support for mutable data by the use of a designated root tree and what Subversion calls the bubble-up method. We also planned to use Berkeley DB as low-level storage. This was before we discovered Subversion, and it seems that our planned IDO-Store would be similar to Subversion-FS in many ways :-).
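The bubble-up idea can be sketched in a few lines of Python, assuming plain dicts stand in for immutable aggregates (the code never mutates a dict after creating it). `bubble_up` and `RootedStore` are hypothetical names for illustration only, not Subversion's API. Changing one file copies just the aggregates on the path up to the root, shares every untouched subtree, and then moves the single mutable root pointer.

```python
from __future__ import annotations

# Aggregates are plain dicts here, but the code never mutates one after
# creation; "changing" a tree always builds a new one.


def bubble_up(tree: dict, path: list[str], leaf: bytes) -> dict:
    """Return a new tree in which `path` leads to `leaf`.

    Only the aggregates on the path from the root down to the changed item
    are copied; every other subtree is shared with the old tree, and the
    old tree itself is left untouched.
    """
    name, rest = path[0], path[1:]
    new_tree = dict(tree)  # shallow copy of this one level
    new_tree[name] = bubble_up(tree[name], rest, leaf) if rest else leaf
    return new_tree


class RootedStore:
    """Hypothetical store whose only mutable state is the designated root."""

    def __init__(self, root: dict) -> None:
        self.history = [root]  # every root ever installed, oldest first

    @property
    def root(self) -> dict:
        return self.history[-1]

    def commit(self, path: list[str], leaf: bytes) -> None:
        # Build the new tree first, then move the root pointer in one step.
        self.history.append(bubble_up(self.root, path, leaf))


store = RootedStore({"foo": {"bar.c": b"v1", "README": b"r"},
                     "docs": {"a.txt": b"a"}})
store.commit(["foo", "bar.c"], b"v2")
old, new = store.history
```

Here `new["docs"] is old["docs"]` holds: only `foo` and the root were rebuilt.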

The data objects were meant to be uniquely identifiable, even in distributed systems, simply by using a cryptographic hash algorithm (e.g., SHA-1). Then, two different processes on the network would identify the same value by the same ID (hash), even when the value was obtained through different channels or calculated in different ways. So one process receiving an aggregate from another process could say "Oh, wait, I've got that one already - don't send it". And then we thought about all kinds of applications of this: documents should really be aggregates, containing their own fonts, etc., and when sending such an aggregated document, you only have to send the fonts if the receiver doesn't already have them (even when named differently, or obtained from a different URL, and so on). The cryptographic hash will ensure that you've got the correct font. This would also, in general, be useful for web caching: don't rely on the URL, only on the SHA-1.
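A minimal Python sketch of the hash-based IDs and the "don't send it" check follows. The byte encoding fed to SHA-1 (a "blob"/"tree" prefix, length-prefixed names, children sorted by name) is an assumption made up for this example, loosely in the spirit of Git's object format; `ido_id` and `values_to_send` are hypothetical names. Sorting the names is what makes two aggregates built in different orders get the same ID.

```python
from __future__ import annotations

import hashlib
from typing import Union

Value = Union[bytes, dict]  # a dict maps names to Values (an aggregate)


def ido_id(value: Value) -> str:
    """Hex SHA-1 of a canonical encoding of the value (hypothetical format)."""
    h = hashlib.sha1()
    if isinstance(value, bytes):
        h.update(b"blob %d\0" % len(value))
        h.update(value)
    else:
        h.update(b"tree %d\0" % len(value))
        for name in sorted(value):  # order-independent: sort the names
            encoded = name.encode("utf-8")
            h.update(b"%d:" % len(encoded) + encoded)  # length-prefixed name
            h.update(bytes.fromhex(ido_id(value[name])))
    return h.hexdigest()


def values_to_send(value: Value, receiver_has: set[str]) -> list[str]:
    """IDs the receiver is missing; subtrees it already holds are skipped."""
    missing: list[str] = []

    def walk(v: Value) -> None:
        vid = ido_id(v)
        if vid in receiver_has or vid in missing:
            return  # "Oh, wait, I've got that one already"
        missing.append(vid)
        if isinstance(v, dict):
            for name in sorted(v):
                walk(v[name])

    walk(value)
    return missing


font = b"...glyph data..."
doc = {"text": b"Hello", "fonts": {"Serif.ttf": font}}
# The receiver obtained the same font elsewhere, under another name or URL.
to_send = values_to_send(doc, receiver_has={ido_id(font)})
```

Only the document, its `fonts` aggregate and the text end up in `to_send`; the font itself is recognized by its hash and skipped.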
Oh well, enough of this IDO-store and IDO-transport.

On top of this data storage model, one could build a revision control system. A particular revision of a versioned directory tree would just be a value in the IDO-store. Some meta-data would be needed to relate different versions to each other. We thought that each "versioned item" should correspond to a DAG of revisions (IDOs), not necessarily a linear line-of-development. In this way, a file would for example have its own DAG of revisions, independent of the different projects that it might be part of. So when you change foo/bar.c, you actually change both the independent 'bar.c' and 'foo'. In this way, another project containing "the same" 'bar.c' (e.g. baz/bar.c) would also be offered the possibility of updating its bar.c to the new revision (thereby also updating baz). This should of course not depend on the naming (bar.c), but on common ancestry (foo/bar.c and baz/bar.c sharing some history), if only because names can change.
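The ancestry-based relation between revisions can be sketched as a small Python class, assuming revisions are identified by plain strings (in the IDO design they would be hashes). `RevisionDAG`, `share_history` and `descendants` are hypothetical names. The point is that "the same file" is decided by common ancestry, and candidate updates for baz/bar.c are simply the revisions descending from the one baz currently contains.

```python
from __future__ import annotations


class RevisionDAG:
    """Hypothetical revision graph: revision -> its parent revisions."""

    def __init__(self) -> None:
        self.parents: dict[str, tuple[str, ...]] = {}

    def add(self, rev: str, *parents: str) -> None:
        self.parents[rev] = parents

    def ancestors(self, rev: str) -> set[str]:
        """`rev` itself plus every revision it descends from."""
        seen: set[str] = set()
        stack = [rev]
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(self.parents.get(r, ()))
        return seen

    def share_history(self, a: str, b: str) -> bool:
        """Relatedness is decided by common ancestry, never by file name."""
        return bool(self.ancestors(a) & self.ancestors(b))

    def descendants(self, rev: str) -> set[str]:
        """Revisions built on top of `rev`: candidate updates to offer."""
        return {r for r in self.parents if r != rev and rev in self.ancestors(r)}


history = RevisionDAG()
history.add("r1")        # the bar.c that both foo/ and baz/ contain
history.add("r2", "r1")  # bar.c changed while working in foo/
history.add("x1")        # some other file, perhaps also named bar.c
```

With this, `history.descendants("r1")` is `{"r2"}`, so baz can be offered r2, while "x1" shares no history with "r1" whatever its name.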

Since we don't have any linear line-of-development, but only DAGs of revisions, everyone can commit new versions of any value. There's never any conflict (except when merging). If Alice and Bob both change foo.c, they can both commit it. Then we just have two different updates of foo.c (and therefore also two different updates of its containing project!), and anyone asking for an update must be asked which one of them they want. An update is actually a merge of two IDO values: your local revision and some repository revision, and it should be possible to commit your local revision before merging. I can see that Subversion also sees an update as a special kind of merge - but why not allow the user to commit before merging, thereby even allowing someone else to actually perform the merge?
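The symmetric commit model can be sketched in Python, assuming the revision graph of foo.c is just a dict from revision name to parent revisions. `commit` and `heads` are hypothetical helpers. A commit is never rejected for being out of date; concurrent commits simply produce several heads, and a later merge revision with two parents, made by anyone, brings them back to one.

```python
from __future__ import annotations


def commit(history: dict[str, tuple[str, ...]], rev: str, *based_on: str) -> None:
    """Record a new revision. It is never rejected for being out of date."""
    missing = [p for p in based_on if p not in history]
    if missing:
        raise KeyError(f"unknown parent revision(s): {missing}")
    history[rev] = based_on


def heads(history: dict[str, tuple[str, ...]]) -> set[str]:
    """Revisions that no other revision builds on."""
    has_children = {p for parents in history.values() for p in parents}
    return set(history) - has_children


foo_c = {}  # revision graph of foo.c: revision -> parent revisions
commit(foo_c, "base")
commit(foo_c, "alice", "base")  # Alice commits her change of foo.c
commit(foo_c, "bob", "base")    # Bob commits his, too: accepted as well
two_heads = heads(foo_c)        # an updater must now choose (or merge)
commit(foo_c, "merge", "alice", "bob")  # anyone, not only Bob, may merge
```

Before the merge `two_heads` is `{"alice", "bob"}`; afterwards `heads(foo_c)` is `{"merge"}`.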

This is, I think, the biggest difference in the revision models of our envisioned "IDO Revision Control" (IDORC ;-) and Subversion. We see all changes as symmetrical (anyone can commit their change), while Subversion only allows the first lucky person to commit; the others have to merge first.

Also, in our IDO-model, we can check out foo/bar.c, change bar.c, and then commit only bar.c without committing foo. Then the new bar.c would be a "dangling revision" without a name (path) in any repository root. I.e., you cannot name the revision without including it in some project, but of course it has some ID in the system (e.g., its hash), and the GUI can offer to update your baz/bar.c because it can see that the new one has baz/bar.c as an ancestor.
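Dangling revisions can be sketched in Python, assuming objects are stored under a SHA-1 of a made-up encoding, projects are named roots pointing at tree objects, and derivation links ("parents") are kept as separate meta-data. `IdoRepo`, `put`, `dangling` and `offers_for` are hypothetical names. A revision is dangling when no root reaches it, yet the GUI can still find it through its recorded parent.

```python
from __future__ import annotations

import hashlib


class IdoRepo:
    """Hypothetical IDO-rc repository: content-addressed objects plus roots."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes | dict[str, str]] = {}  # id -> blob/tree
        self.parents: dict[str, set[str]] = {}  # id -> ids it was derived from
        self.roots: dict[str, str] = {}  # project name -> tree id

    def put(self, content: bytes | dict[str, str], *parents: str) -> str:
        """Store a blob (bytes) or a tree (name -> child id); return its id."""
        if isinstance(content, bytes):
            oid = hashlib.sha1(b"blob\0" + content).hexdigest()
        else:
            listing = "".join(
                f"{name}\0{child}\n" for name, child in sorted(content.items()))
            oid = hashlib.sha1(b"tree\0" + listing.encode("utf-8")).hexdigest()
        self.objects[oid] = content
        self.parents.setdefault(oid, set()).update(parents)
        return oid

    def reachable(self) -> set[str]:
        """Every object that has a path from some project root."""
        seen: set[str] = set()
        stack = list(self.roots.values())
        while stack:
            oid = stack.pop()
            if oid not in seen:
                seen.add(oid)
                content = self.objects[oid]
                if isinstance(content, dict):
                    stack.extend(content.values())
        return seen

    def dangling(self) -> set[str]:
        """Committed objects that no project root names."""
        return set(self.objects) - self.reachable()

    def offers_for(self, oid: str) -> set[str]:
        """Dangling revisions whose direct parents include `oid`."""
        return {d for d in self.dangling() if oid in self.parents[d]}


repo = IdoRepo()
bar_v1 = repo.put(b"int x;\n")
repo.roots["foo"] = repo.put({"bar.c": bar_v1})
repo.roots["baz"] = repo.put({"bar.c": bar_v1, "README": repo.put(b"baz\n")})
bar_v2 = repo.put(b"int x = 1;\n", bar_v1)  # commit bar.c alone, not foo
```

Here `bar_v2` has no path in any root, but `repo.offers_for(bar_v1)` still finds it, which is what lets the GUI offer it to baz/bar.c. This sketch only follows direct parents; a real tool would presumably search the whole ancestry.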

To conclude my comparison, I would like to hear your opinions on the following:

Could anything like what we were planning be "emulated" by the Subversion model?
For example, if I check out MyProject and make some changes, and am then not allowed to commit them because someone else committed before me, could I then commit to some other part of the repository (e.g. /localrevisions/users/MyProject)?
Is this something that the planned "svn switch" can be used for?

If so, then it might also enable me to do something like the above commit of foo/bar.c without committing foo?

Will there be any client command that enables me to ask "what branches/updates of foo/bar.c exist (in other projects)?" - like we had in mind for IDO-rc?

If it is possible to emulate "IDO-rc" using Subversion, I can see a clear advantage over IDO-rc: every revision will have a name in some revision of the repository. In IDO-rc, one could imagine a nightmare of identifying the different "dangling revisions" and managing multiple updates to the same item.

Finally, I want to say that after discovering Subversion, I have put my private IDO-rc project in the background, because Subversion is a giant step in the right direction. If possible, I would like to contribute to the project. This *has* to succeed. A replacement for CVS is long overdue :-)

Kind regards,
Bjarke Ebert from Denmark
Received on Sat Oct 21 14:37:00 2006
