The revision model of Subversion
From: Bjarke Dahl Ebert <bebert_at_worldonline.dk>
Date: 2002-01-27 14:42:58 CET
Hello,
A friend of mine pointed me to the Subversion project page a couple of weeks ago.
Earlier, we had both thought about how a replacement for CVS could look, especially with respect to the data model / revision model. We even thought about beginning some "proof-of-concept" implementation of the data storage (probably what corresponds to the Subversion filesystem) and maybe also some "transport" protocol.
I was very happy to see that another related project was already (long) under way, and of course I immediately compared it to the "design" that my friend and I had been building up in our heads :-).
Therefore, in this mail I'll try to explain what we had in mind and try to compare it to Subversion.
First, let me describe our envisioned data storage model:
The data store should contain only immutable values. You cannot change a value, only store a new one. This is of course a fundamental requirement of a Revision Control System (but nevertheless, CVS doesn't provide it!).
The data objects were meant to be uniquely identifiable, even in distributed systems, simply by using a cryptographic hash algorithm (e.g., SHA-1). Then, two different processes on the network would identify the same value by the same ID (hash), even when the value was obtained by different channels or calculated in different ways. So one process receiving an aggregate from another process could say "Oh, wait, I've got that one already - don't send it". And then we thought about all kinds of applications of this: documents should really be aggregates, containing their own fonts, etc., and when sending such an aggregated document, you only have to send the fonts if the receiver doesn't already have them (even when named differently, or obtained from a different URL, and so on). The cryptographic hash will ensure that you've got the correct font. This would also, in general, be useful for web-caching: Don't rely on the URL, only on the SHA-1.
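A minimal Python sketch of such a store, to make the "I've got that one already" exchange concrete. The `Store` class and the `send` helper are hypothetical names chosen for illustration, not part of any existing system, and the in-memory dictionary stands in for real storage.

```python
import hashlib


class Store:
    """Toy content-addressed store: values are immutable and keyed by SHA-1."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()
        # Storing the same bytes twice is a no-op; nothing is ever overwritten.
        self._objects.setdefault(oid, data)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def has(self, oid: str) -> bool:
        return oid in self._objects


def send(sender: Store, receiver: Store, oids):
    """Transfer only the objects the receiver lacks; return the IDs sent."""
    sent = []
    for oid in oids:
        if not receiver.has(oid):
            receiver.put(sender.get(oid))
            sent.append(oid)
    return sent


alice, bob = Store(), Store()
font = alice.put(b"font bytes")
text = alice.put(b"document text")
bob.put(b"font bytes")  # Bob obtained the same font through another channel
transferred = send(alice, bob, [font, text])  # only the text is transferred
```

Because the ID is derived from the bytes alone, it does not matter under which name or from which URL Bob obtained the font.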
On top of this data storage model, one could build a revision control system. A particular revision of a versioned directory tree would just be a value in the IDO-store. Some meta-data would be needed to relate different versions to each other. We thought that each "versioned item" should correspond to a DAG of revisions (IDOs), not necessarily a linear line-of-development. In this way, a file would for example have its own DAG of revisions, independent of the different projects that it might be part of. So when you change foo/bar.c, you actually change both the independent 'bar.c', as well as 'foo'. In this way, another project containing "the same" 'bar.c' (e.g. baz/bar.c) would also be offered the possibility of updating its bar.c to the new revision (thereby updating also baz). This should of course not depend on the naming (bar.c), but on common ancestry (foo/bar.c and baz/bar.c sharing some history), at least because names can change.
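One way to picture "changing bar.c also changes foo" is to let a directory be a value that lists the IDs of its children. The sketch below assumes that encoding; the `blob_id` and `tree_id` helpers and the "blob"/"tree" prefixes are illustrative choices, not something the design above prescribes.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """ID of a file value."""
    return hashlib.sha1(b"blob\0" + data).hexdigest()


def tree_id(entries: dict) -> str:
    """ID of a directory value; entries maps name -> child ID.

    Entries are sorted so the ID does not depend on insertion order.
    """
    body = "".join(f"{name}\0{oid}\n" for name, oid in sorted(entries.items()))
    return hashlib.sha1(b"tree\0" + body.encode()).hexdigest()


bar_v1 = blob_id(b"int main() { return 0; }\n")
bar_v2 = blob_id(b"int main() { return 1; }\n")

# Two projects contain "the same" bar.c, identified by ID rather than by name.
foo_v1 = tree_id({"bar.c": bar_v1})
baz_v1 = tree_id({"bar.c": bar_v1, "README": blob_id(b"baz\n")})

# A new bar.c necessarily yields a new value for foo; baz is untouched
# until someone chooses to update it to bar_v2 as well.
foo_v2 = tree_id({"bar.c": bar_v2})
```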
Since we don't have any linear line-of-development, but only DAGs of revisions, everyone can commit new versions of any value. There's never any conflict (except when merging). If Alice and Bob both change foo.c, they can both commit it. Then we just have two different updates of foo.c (and therefore also two different updates of its containing project!), and anyone asking for an update must be asked which one of them they want. An update is actually a merge of two IDO values: Your local revision and some repository revision, and it should be possible to commit your local revision before merging. I can see that Subversion also sees an update as a special kind of merge - but why not allow the user to commit before merging - thereby even allowing someone else to actually perform the merge?
This is, I think, the biggest difference in the revision models of our envisioned "IDO Revision Control" (IDORC ;-) and Subversion. We see all changes as symmetrical (anyone can commit their change), while Subversion only allows the first lucky person to commit; the others have to merge first.
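The Alice-and-Bob scenario can be sketched as a tiny revision graph in Python. The `RevisionGraph` class and the revision labels are hypothetical; real revisions would be identified by their hashes rather than by readable names.

```python
class RevisionGraph:
    """Toy DAG of revisions for a single versioned item."""

    def __init__(self):
        self.parents = {}  # revision ID -> tuple of parent revision IDs

    def commit(self, rev, parents=()):
        # A commit never fails: it only adds a node to the graph.
        self.parents[rev] = tuple(parents)

    def heads(self):
        """Revisions that no other revision names as a parent."""
        referenced = {p for ps in self.parents.values() for p in ps}
        return sorted(r for r in self.parents if r not in referenced)


g = RevisionGraph()
g.commit("foo.c@base")
g.commit("foo.c@alice", ["foo.c@base"])  # Alice commits her change
g.commit("foo.c@bob", ["foo.c@base"])    # Bob commits too, without merging
two_heads = g.heads()                    # an updater must choose between these

# Anyone (not necessarily Alice or Bob) can perform the merge later.
g.commit("foo.c@merged", ["foo.c@alice", "foo.c@bob"])
one_head = g.heads()
```

In this picture a merge is just a revision with two parents, which is why it can be deferred and delegated.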
Also, in our IDO-model, we can check out foo/bar.c, change bar.c, and then commit only bar.c without committing foo. Then the new bar.c would be a "dangling revision" without a name (path) in any repository root. I.e., you cannot name the revision without including it in some project, but of course it has some ID in the system (e.g., its hash), and the GUI can offer to update your baz/bar.c because it can see that the new one has baz/bar.c as ancestor.
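A rough Python sketch of how a client might find such a dangling revision by ancestry alone. The `ancestors` and `offered_updates` functions, the `parents` map, and the revision labels are all assumptions made for illustration.

```python
def ancestors(parents, rev):
    """All revisions reachable from rev by following parent links."""
    seen, stack = set(), [rev]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def offered_updates(parents, current):
    """Revisions that descend from `current`: candidates to update to."""
    return sorted(r for r in parents if current in ancestors(parents, r))


parents = {
    "bar.c@1": (),
    "bar.c@2": ("bar.c@1",),  # dangling: committed without committing foo
    "other.c@1": (),          # unrelated item, shares no history
}

# Paths only exist inside projects; bar.c@2 has no path anywhere yet.
paths = {"foo/bar.c": "bar.c@1", "baz/bar.c": "bar.c@1"}

offers = offered_updates(parents, paths["baz/bar.c"])
```

The lookup never consults the file name, so it would keep working if baz had renamed its copy.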
To conclude my comparison, I would like to hear your opinions on the following:
Could anything like what we were planning be "emulated" by the Subversion model?
If so, then it might also enable me to do something like the above commit of foo/bar.c without committing foo?
Will there be any client command that enables me to ask "what branches/updates of foo/bar.c exist (in other projects)?" - like we had in mind for IDO-rc?
If it is possible to emulate "IDO-rc" using Subversion, I can see a clear advantage over IDO-rc: Every revision will have a name in some revision of the repository. In IDO-rc, one could imagine a nightmare identifying the different "dangling revisions", and managing multiple updates to the same item.
Finally, I want to say that after discovering Subversion, I have put my private IDO-rc project in the background, because Subversion is a giant step in the right direction. If possible, I would like to contribute to the project. This *has* to succeed. A replacement for CVS is long overdue :-)
Kind regards,
This is an archived mail posted to the Subversion Dev mailing list.