Subversion features queries (long!)

From: Stephen Warren <swarren_at_paramanet.com>
Date: 2004-03-30 01:19:56 CEST

Hi.

I've just installed 1.0.1 and been very impressed - simple to set up and looks rock solid. Very nicely done!

Now, I have some questions on the way to do things in Subversion, coming from a CCM/Continuus/CMSynergy background (and StarTeam and Visual Sourcesafe).

Symbolic links.

I want to create a work-area (working copy) using symlinks instead of copying the files. Note that I'm *not* talking about symlink storage in the repository itself (although that would be very useful too)

The background is as follows: The repository is huge. Even a single copy of the trunk is huge; the size is not just down to the large number of repository versions or branches. It is so huge that giving each developer a copy of the part of the repository they need access to is prohibitive, even with today's cheap disks.

Continuus has a system called link-based work-areas. Your working copy consists of the normal directory structure. However, for each file in those directories, you get a symbolic link instead. This symlink points to a centralized (probably network-shared) directory that can potentially have a copy of any revision of any file in the repository. This centralized store is called a cache, and is periodically purged of files that no work area includes a copy of.
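
As a rough illustration of the link-based work-area idea described above (this is not something Subversion or CCM provides in this form), here is a minimal Python sketch. The function names, the `manifest` mapping and the `name@version` cache naming are all hypothetical, chosen only for the example.

```python
import os
import tempfile
from pathlib import Path


def populate_link_work_area(cache_dir, work_area, manifest):
    """Create one symlink per file in work_area, each pointing into cache_dir.

    manifest maps a relative path in the work area to the name of a cached
    file version (hypothetical naming scheme, e.g. "main.c@3").
    """
    for rel_path, cached_name in manifest.items():
        link = Path(work_area) / rel_path
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            link.unlink()
        # Use an absolute target so the link works from any directory.
        os.symlink((Path(cache_dir) / cached_name).resolve(), link)


def locally_modified(work_area, manifest):
    """A path counts as modified once its symlink was replaced by a real file."""
    result = []
    for rel_path in manifest:
        path = Path(work_area) / rel_path
        if path.exists() and not path.is_symlink():
            result.append(rel_path)
    return sorted(result)
```

A status check in this model only needs to look for paths that are no longer symlinks; unmodified files never have to be read or compared.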

Is anything like this planned for Subversion? I believe it would not affect the repository itself too much - just whatever part of the client API performs working copy management. That code would create a symlink instead of copying file data, and "svn status" etc. would have to expect symlinks in place of unmodified files.

Of course, the repository would probably have to grow a property giving the location of the cache directory, and there would have to be some server-side process to copy data into the cache, and somehow clean the cache of unused files periodically.

Note that the cache directory is not writable by users at all - just the CCM daemons. Also note, that CCM actually models the user's working copies as objects in its database, so it always has metadata indicating which versions of which files are in which working copies, so it knows when it's safe to delete files from the cache.
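
The purge decision described above can be sketched as a set difference, assuming the server keeps a record of which file versions each work area references. The data layout and the function name are hypothetical, not CCM's or Subversion's.

```python
def purgeable(cached_files, work_areas):
    """Return cached file versions that no work area references.

    cached_files: iterable of cached version names (hypothetical format).
    work_areas:   dict mapping a work-area name to the versions it links to.
    """
    in_use = set()
    for versions in work_areas.values():
        in_use.update(versions)
    return sorted(set(cached_files) - in_use)
```

For example, with `foo.c@1`, `foo.c@2` and `bar.c@5` in the cache and work areas that link only to `foo.c@2` and `bar.c@5`, the function returns `["foo.c@1"]`.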

CCM users will either over-write the symlink with new file data, or the "checkout" operation will do this (note: CCM has a reserved-checkout style model)

Pre-emptive tinderbox.

Tinderbox, as I've seen it implemented on CVS, Perforce and, I believe, also Subversion, has some limitations relative to our implementation on CCM.

With a CVS-style SCM, tinderbox will work in one of two ways:

Developers check in directly to the trunk. Periodically (e.g. hourly, triggered by commit, in a loop), tinderbox will check out the trunk and attempt to build it.

Advantage: Tinderbox quickly catches build/test failures (assuming builds/tests are quick). Simple developer model (no need to know about anything beyond regular SCM usage)

Disadvantage: When a developer checks in something broken, any other developer who simply picks up the head revision of the trunk will pick up the breakage. People have to be pretty careful about what they check in to avoid pissing off other developers and/or developers have to watch tinderbox and only update to revisions that tinderbox has tested if they want some level of safety.

Even if you have good confidence in your developers' abilities, it is guaranteed that everyone will screw up now and then. This will cause other developers to lose time. If you have sloppy developers, you're pretty screwed!

Every developer gets their own branch. Tinderbox periodically picks up changes from a developer's branch and merges them into the tinderbox branch. If build/test passes, the changes are also merged into the integration branch, from which all developer branches are branched.

Advantage: Tinderbox gates promotions to the integration branch, so broken checkins don't propagate. It's much safer to blindly update to the head revision, since it's been built/tested.

Disadvantage: An extra level of branching - whenever a developer wishes to update their working copy, they must first merge changes from the integration branch into their developer branch, then update their working copy to the head of their own branch. This requires developers to perform more operations *and* understand a lot more about SCM usage.

Even if you have good developers who can understand SCM, it's still a hassle to perform the extra steps each time you want to update. If you have poor developers, they're going to be pretty damn confused about all this merging/branching going on.
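
The gating behaviour of this second scheme can be sketched as follows. This is only a model under stated assumptions: changes are opaque identifiers, "merging" is appending to a list, and `build_and_test` is a hypothetical callback standing in for the real build and test run.

```python
def gate_promotions(integration, pending, build_and_test):
    """Promote pending changes one at a time, keeping only those that pass.

    integration:    list of change ids already on the integration branch.
    pending:        list of change ids waiting on developer branches.
    build_and_test: callable taking a list of change ids, returning True/False.
    Returns the new integration list and the list of rejected changes.
    """
    integration = list(integration)
    rejected = []
    for change in pending:
        candidate = integration + [change]  # merge into the tinderbox branch
        if build_and_test(candidate):
            integration = candidate         # promote to the integration branch
        else:
            rejected.append(change)         # broken change never propagates
    return integration, rejected
```

Because each candidate is built on top of everything already promoted, the head of the integration branch has always passed build/test.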

CCM allows us to merge the best parts of both the above approaches, due to its use of tasks... Hmmm. CCM is complicated to explain in a short email, but if you have questions later, I can answer them in depth if you ask!

I read in the Subversion documentation that during a commit operation, a temporary transaction is created that derives a new version of updated files. This temporary transaction is then rolled into the trunk when it is committed. One could easily imagine creating 10 temporary transactions in parallel, all starting with the same initial head revision of the repository. This is pretty much the equivalent of creating tasks relative to a baseline in CCM. The difference is that these tasks are intended to be developer-visible, long-lived entities in CCM, but are a temporary implementation detail in Subversion.

In CCM, each work area has a set of "reconfigure properties" telling the client which tasks to pull into the working copy on an update. This is a list of tasks. What tinderbox does is take the previous tested release, pick a newly committed task, update its working copy with the merge of the two, build/test, and, if that passes, add the new task number into each developer's reconfigure properties. Thus, a regular update of a developer's working copy will see the new changes. (NOTE: The developer doesn't have to merge the integration branch down, because there are no two branches to merge between... Tinderbox essentially merged the commit which passed to all users that wanted to be at the head.)
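
To make the task-list mechanism concrete, here is a small Python model of it. It is a sketch only: `WorkArea`, `tinderbox_step` and `build_and_test` are hypothetical names for this example, not CCM commands or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class WorkArea:
    owner: str
    # Stand-in for the "reconfigure properties": tasks pulled in on update.
    tasks: list = field(default_factory=list)


def tinderbox_step(tested_tasks, new_task, work_areas, build_and_test):
    """Test one newly committed task on top of the previously tested set.

    On success the task is added to every work area's task list, so each
    developer's next ordinary update picks it up. On failure nothing changes.
    Returns the (possibly extended) list of tested tasks.
    """
    candidate = list(tested_tasks) + [new_task]
    if not build_and_test(candidate):
        return list(tested_tasks)
    for area in work_areas:
        if new_task not in area.tasks:
            area.tasks.append(new_task)
    return candidate
```

The developer-side operation stays a plain update; all the gating lives in the tinderbox step.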

This way, no change gets seen by any developer until it has been verified by tinderbox, *and* the presence of tinderbox doesn't mean that developers need greater SCM knowledge to do work.

The Subversion documentation hints that in the future, transactions could be longer lived (i.e. the completion of a commit operation doesn't get rid of the transaction, nor automatically make the new transaction the head release). If this were true, we could do the same tinderbox implementation - tinderbox would pick up newly committed transactions, and if they pass build/test, then make the tested transaction the new head.

Note: Yes, with this, it is entirely possible to create "parallel conflicts" - two developers commit against the same release and edit the same file. The first tinderbox run makes developer A's transaction the head. Now, a merge is required on developer B's checkin before it can be made head, since there are essentially two branched versions of the same file, causing a "conflict". This is already part of the CCM tinderbox we have created, and CCM has built-in commands for "conflict detection", which we run before performing the working copy update stage in the tinderbox script.
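
A parallel conflict of this kind can be detected by comparing each transaction's base revision with the revision at which each of its files last changed on the head. The sketch below assumes a simple model in which a transaction records its base revision and the set of paths it touches; `Txn`, `conflicting_paths` and `promote` are hypothetical names, not Subversion or CCM APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    base_rev: int      # head revision the developer started from
    paths: frozenset   # files edited in this transaction


def conflicting_paths(txn, last_changed):
    """Paths changed on the head after the transaction's base revision."""
    return sorted(p for p in txn.paths if last_changed.get(p, 0) > txn.base_rev)


def promote(txn, head_rev, last_changed):
    """Make txn the new head, or raise if a merge is needed first."""
    conflicts = conflicting_paths(txn, last_changed)
    if conflicts:
        raise ValueError(f"{txn.name} needs a merge for: {', '.join(conflicts)}")
    new_rev = head_rev + 1
    for path in txn.paths:
        last_changed[path] = new_rev
    return new_rev
```

Running the check before the build/test stage means a conflicting transaction is sent back for a merge instead of wasting a tinderbox cycle.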

So, my question is (finally!):

a) Is there a better way to implement tinderbox in today's Subversion than either of the two schemes I mentioned above (i.e. one that doesn't have those disadvantages)?

b) The Subversion documentation hints at longer-running transactions, with some script/trigger/... gating the promotion of a transaction to being head. Is this a feature that's likely to be implemented? If so, is it a short-term or long-term goal?

Thanks for any responses!

--
Stephen Warren, Software Engineer, Parama Networks, San Jose, CA
http://www.wwwdotorg.org/work_contact/

Received on Tue Mar 30 01:20:29 2004
