Philosophical question: Tagging & Structure

From: Robert Dailey <rcdailey_at_gmail.com>
Date: Thu, 2 Jul 2009 16:29:45 -0500

Hey everyone,
I've been using Subversion for a really long time and I've always wondered
what the best structure would be for tagging. Ever since sparse checkouts
came around in 1.5, a couple more options have become available. Granted,
being able to fully recommend a specific structure has a lot to do with the
requirements one has on version control, but hopefully there is a pattern
that is more commonly accepted than the others. So far here are a couple of
scenarios I see. We'll assume a very simple use case:

We have a project (Project1) that is fully contained in a single directory
(it may also have subdirectories to further categorize the different parts
of that project). Secondly, we must be able to easily tag this project at
various intervals in the development cycle. Project1 also has 2
dependencies: Library1 and Library2. In order for Project1 to be compiled,
linked, or otherwise represent a complete product, it must know where
Library1 and Library2 are so it may have access to their source. Furthermore,
Library1 and Library2 are also used by several other projects in the
repository not mentioned here. One of our requirements is that we do not
duplicate Library1 or Library2 in the repository itself in any way.

Before I go any further, I want to discuss what requirements I have on
tagging. Technically, a tag is nothing more than a copy or a branch.
However, conceptually, a tag should be much more. A tag may not necessarily
be a copy of a single branch, but perhaps multiple branches. A tag must
capture the full dependency tree of a project in order to become a true tag.
While (technically) tags in Subversion behave in a very low-level and
primitive way, we can structure our repository in such a way that makes
tagging a bit less painful. Additionally, we can also create scripts that
assist in creating tags (More on this below).

Having described what is expected of a tag (conceptually) and what our
particular use case is, I can think of a couple of options (listed below).
Note that some of these may be inferior to other solutions. My goal is
simply to explore all of the possible ways to solve the problem. The point of
this discussion is to figure out which of those solutions is the most
reasonable given a certain set of requirements.

   1. As we discussed earlier, Project1 has two dependencies: Library1 and
   Library2. One way to make sure Library1 and Library2 are available to
   Project1 (without duplicating them in the repository) is to use externals
   (Relative externals). We can create an svn:externals property and apply it
   to Project1's root directory, so when we do a checkout we pull down both
   Library1 and Library2.
   2. Sparse Checkouts. We can keep all projects and dependencies in the
   repository in one "flat" structure. In other words, at the root of the
   repository we would have Project1, Library1, Library2, and any other
   projects or other libraries. When we do a checkout and we only want Project1
   and its dependencies, we rely on checkout depth to choose only the parts of
   the repository we want. This would not have been possible in pre-1.5 builds
   of Subversion.
   3. Scripts. We can check out Project1 and each of its dependencies
   independently. We could use a script for this, which would make sure that
   each subsequent checkout is placed in a correct relative location. This
   requires each project to reach outside of its own workspace (Outside of the
   root of that working copy) in order to step into other working copies for
   the files it needs.
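
For option 1, the externals definition might look like the sketch below. It
assumes Library1 and Library2 sit at the repository root and that the client
is 1.5 or later, since it uses the "^/" repository-root-relative URL form.
The helper names are hypothetical, and the functions only build the property
value and the command line; they do not run svn.

```python
# Sketch only: builds the svn:externals value and the propset command for
# option 1. Assumes Library1 and Library2 live at the repository root.

def externals_value(dependencies):
    """Return an svn:externals value in the 1.5+ 'URL LOCALPATH' form."""
    # '^/' means "relative to the repository root", so the libraries are
    # referenced in place rather than duplicated.
    return "\n".join(f"^/{name} {name}" for name in dependencies)


def propset_command(project_dir, dependencies):
    """Return the argument list that sets the property on a working copy dir."""
    return ["svn", "propset", "svn:externals",
            externals_value(dependencies), project_dir]
```

Passing the result of propset_command("Project1", ["Library1", "Library2"])
to subprocess.run would apply the property to a checked-out Project1; the
change still has to be committed before other checkouts pick it up.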

These are the 3 main techniques I've had to use in the past. #1 was used
mostly before 1.5 was released. After 1.5, I had been using #2. Here are
some issues I find with all 3 of the above methods:

*Technique #1:*

   - Externals are slow. They actually have a noticeable effect on the
   performance of various working copy operations, such as updates.
   - It complicates tagging. Creating a tag of Project1 alone isn't enough,
   since the current state (current revision) of Library1 and Library2 is not
   captured. One remedy for this I've used in the past is a "tagging script",
   which essentially turned each external into a physical copy of that
   dependency. It then branched the working copy into a tag. However, for
   externals not local to the repository, it needlessly increases the size of
   the repository. Additionally, this method has no way of knowing which
   externals are supposed to be turned into copies or not (We may not want all
   externals in the entire working copy to be converted!)
   - Commits are complicated. When you change some items in Library1 and
   Project1, you cannot commit them all simultaneously. You will require 1
   commit per external. This breaks the "atomic" nature of the commit and
   leaves the history of the repository in a broken state at various points.
   Additionally it causes the global revision number to increase faster for no
   good reason at all.
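
A different approach from the copy-based tagging script described above is
to pin each external to a fixed revision on the tag, so the tag records the
state of Library1 and Library2 without copying them. The sketch below is not
that script; it is a minimal illustration that only handles the simple
"URL LOCALPATH" form, and the function name and the pin_only parameter are
hypothetical. The pin_only set is one possible way to say which externals
should be touched and which should be left floating.

```python
# Sketch only: rewrites an svn:externals value so that selected entries
# carry a peg revision (URL@REV). Handles just the 'URL LOCALPATH' form.

def pin_externals(value, revision, pin_only=None):
    """Return `value` with externals pinned to `revision`.

    pin_only: optional set of local paths to pin; other entries are kept
    unchanged. When None, every simple entry is pinned.
    """
    pinned = []
    for line in value.splitlines():
        parts = line.split()
        # Leave comments, blank lines, and more complex forms untouched.
        if len(parts) != 2 or line.lstrip().startswith("#"):
            pinned.append(line)
            continue
        url, local_path = parts
        if pin_only is not None and local_path not in pin_only:
            pinned.append(line)
            continue
        # An '@' in the last path segment means the entry is already pegged.
        if "@" in url.rsplit("/", 1)[-1]:
            pinned.append(line)
            continue
        pinned.append(f"{url}@{revision} {local_path}")
    return "\n".join(pinned)
```

The pinned value would be set on the tagged copy of Project1 rather than on
the development line. This does nothing for the commit problem in the last
point above; it only addresses what a tag captures.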

*Technique #2:*

   - I can't just tag Project1 and its dependencies in this case. If there
   exists a Project2 and a Project3, neither of which I am working on or intend
   to tag, those will show up in the tag regardless. This complicates the tags.
   - I am not aware of how this technique will work if I'm using a client at
   version 1.6 and a server at 1.4. I've had issues of updates grabbing over a
   gigabyte of data (With no status information being output to the user) when
   I do an update from a directory with sparse checkouts. I expect anomalies
   and weird behavior to happen in this case.
   - It's harder to tell people how to do checkouts. If a new programmer
   joins my team, I can't just send him a repository URL and tell him to check
   that out. I have to tell him HOW to use sparse checkouts (If he's not
   familiar with them) and also tell him which parts of the repository to
   explicitly grab.
   - Dependencies are not automatically managed (This also plays a big part
   in the previous point). You can forget to grab dependencies and you may not
   find out till much later. This is error-prone.
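
The last two points could be softened by recording the dependency list in a
small script instead of explaining it to each new programmer. The sketch
below assumes the flat layout from option 2 and a 1.5 or later client; the
MANIFEST table and the function name are hypothetical, and the function only
returns the svn command lines rather than running them.

```python
# Sketch only: lists the svn commands for a sparse checkout of one project
# plus its dependencies, assuming everything sits at the repository root.

# Hypothetical manifest: top-level directories each project depends on.
MANIFEST = {
    "Project1": ["Library1", "Library2"],
}


def sparse_checkout_commands(repo_url, wc_dir, project, manifest=MANIFEST):
    """Return argument lists that check out only `project` and its deps."""
    # Start with an empty working copy of the repository root...
    commands = [["svn", "checkout", "--depth", "empty", repo_url, wc_dir]]
    # ...then deepen only the directories this project needs.
    for name in [project] + manifest[project]:
        commands.append(
            ["svn", "update", "--set-depth", "infinity", f"{wc_dir}/{name}"])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True). A
project missing from the manifest raises KeyError, which at least surfaces a
forgotten dependency list early instead of much later.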

*Technique #3:*

   - This requires additional work. Instead of having the repository manage
   the dependency links for me, I have to write a (potentially) complicated
   script to do that work.
   - This still doesn't help make tagging a one-step process.
   - It requires no consistent repository structure. It actually could
   result in an arbitrary repository structure, which could potentially be bad.

Basically, there are serious negatives to all 3 solutions. Furthermore, none
of the 3 techniques makes tagging a project and its dependencies intuitive,
simple, and functional.
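
For the flat layout, one possibility (offered as a sketch, not a
recommendation) is a multi-source "svn copy", which 1.5 and later accept, so
that Project1 and its two libraries land under one tag directory in a single
commit and nothing else from the repository root comes along. The "tags"
directory, the tag name, and the function name below are hypothetical, and
whether --parents creates the tag directory for a URL-to-URL copy with
several sources is an assumption worth verifying against your client
version.

```python
# Sketch only: builds one 'svn copy' command that tags a project together
# with its dependencies at a single revision.

def tag_command(repo_url, tag_name, components, revision, message):
    """Return the argument list for a single-commit, multi-source tag."""
    # Peg every source at the same revision so the tag is one snapshot.
    sources = [f"{repo_url}/{name}@{revision}" for name in components]
    # With several sources, each one becomes a child of the destination.
    destination = f"{repo_url}/tags/{tag_name}"
    return ["svn", "copy", "--parents", *sources, destination, "-m", message]
```

Called with ["Project1", "Library1", "Library2"], this would produce a tag
whose children mirror the relative layout a Project1 build expects.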

I realize there are some real Subversion gurus out there. What would
you recommend for this particular example?

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2367609

Received on 2009-07-02 23:30:41 CEST
