Hey everyone,
I've been using Subversion for a really long time and I've always wondered
what the best structure would be for tagging. Ever since sparse checkouts
came around in 1.5, a couple more options have become available. Granted,
being able to fully recommend a specific structure has a lot to do with the
requirements one has on version control, but hopefully there is a pattern
that is more commonly accepted than the others. So far here are a couple of
scenarios I see. We'll assume a very simple use case:
We have a project (Project1) that is fully contained in a single directory
(it may also have subdirectories to further categorize the different parts
of that project). Secondly, we must be able to easily tag this project at
various intervals in the development cycle. Project1 also has 2
dependencies: Library1 and Library2. In order for Project1 to be compiled,
linked, or otherwise represent a complete product, it must know where
Library1 and Library2 are so it may have access to their source. Furthermore,
Library1 and Library2 are also used by several other projects in the
repository not mentioned here. One of our requirements is that we do not
duplicate Library1 or Library2 in the repository itself in any way.
Before I go any further, I want to discuss what requirements I have on
tagging. Technically, a tag is nothing more than a copy or a branch.
However, conceptually, a tag should be much more. A tag may not necessarily
be a copy of a single branch, but perhaps multiple branches. A tag must
capture the full dependency tree of a project in order to become a true tag.
While (technically) tags in Subversion behave in a very low-level and
primitive way, we can structure our repository in such a way that makes
tagging a bit less painful. Additionally, we can also create scripts that
assist in creating tags (More on this below).
Having described what is expected of a tag (conceptually) and what our
particular use case is, I can think of a couple of options (listed below).
Note that some of these may be inferior to other solutions. My goal is
simply to list all of the possible ways to solve the problem. The point of
this discussion is to figure out which of those solutions is the most
reasonable given a certain set of requirements.
1. As we discussed earlier, Project1 has two dependencies: Library1 and
Library2. One way to make sure Library1 and Library2 are available to
Project1 (without duplicating them in the repository) is to use externals
(Relative externals). We can create an svn:externals property and apply it
to Project1's root directory, so when we do a checkout we pull down both
Library1 and Library2.
2. Sparse Checkouts. We can keep all projects and dependencies in the
repository in one "flat" structure. In other words, at the root of the
repository we would have Project1, Library1, Library2, and any other
projects or other libraries. When we do a checkout and we only want Project1
and its dependencies, we rely on checkout depth to choose only the parts of
the repository we want. This would not have been possible in pre-1.5 builds
of Subversion.
3. Scripts. We can check out Project1 and each of its dependencies
independently. We could use a script for this, which would make sure that
each subsequent checkout is placed in a correct relative location. This
requires each project to reach outside of its own workspace (Outside of the
root of that working copy) in order to step into other working copies for
the files it needs.
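To make techniques 1 and 2 a bit more concrete, here is a rough Python sketch that only builds the property value and the command lines involved (it does not run svn). The function names, the working copy name, and the assumption that every library sits directly under the repository root are mine, purely for illustration. The "^/" prefix is the repository-root-relative URL form available since 1.5.

```python
from typing import List


def externals_definition(libraries: List[str]) -> str:
    """Build an svn:externals value, one "URL LOCALPATH" line per library.

    Assumption: each library lives directly under the repository root,
    so "^/Library1" resolves to <repository root>/Library1.
    """
    return "\n".join(f"^/{name} {name}" for name in libraries)


def sparse_checkout_commands(repo_url: str, wc: str, wanted: List[str]) -> List[List[str]]:
    """Commands for technique 2: an empty top-level checkout, then
    deepen only the directories we actually want."""
    commands = [["svn", "checkout", "--depth", "empty", repo_url, wc]]
    for name in wanted:
        commands.append(["svn", "update", "--set-depth", "infinity", f"{wc}/{name}"])
    return commands
```

The externals value could be written to a file and applied with something like "svn propset svn:externals -F externals.txt Project1". The sparse checkout commands are exactly what a new team member would otherwise have to be told to type by hand.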
These are the 3 main techniques I've had to use in the past. #1 was used
mostly before 1.5 was released. After 1.5, I have been using #2. Here are
some issues I find with all 3 of the above methods:
*Technique #1:*
- Externals are slow. They actually have a noticeable effect on the
performance of various working copy operations, such as updates.
- It complicates tagging. Creating a tag of Project1 alone isn't enough,
since the current state (current revision) of Library1 and Library2 is not
captured. One remedy for this I've used in the past is a "tagging script",
which essentially turned each external into a physical copy of that
dependency. It then branched the working copy into a tag. However, for
externals not local to the repository, it needlessly increases the size of
the repository. Additionally, this method has no way of knowing which
externals are supposed to be turned into copies or not (We may not want all
externals in the entire working copy to be converted!)
- Commits are complicated. When you change some items in Library1 and
Project1, you cannot commit them all simultaneously. You will require 1
commit per external. This breaks the "atomic" nature of the commit and
leaves the history of the repository in a broken state at various points.
Additionally it causes the global revision number to increase faster for no
good reason at all.
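A different variation on the tagging script (not what my script did, which made physical copies) would be to pin each external to a fixed revision in the tag instead of copying it. The Python sketch below rewrites an svn:externals value in the 1.5+ "URL LOCALPATH" form so that selected externals carry a peg revision. The function name and the revisions mapping are hypothetical; the revision numbers would have to come from somewhere else, for example "svn info" run on each external. Externals not listed in the mapping are left floating, which is one way to say which ones should be touched and which should not.

```python
from typing import Dict


def pin_externals(definition: str, revisions: Dict[str, int]) -> str:
    """Return the externals definition with a peg revision added to
    each external whose local path appears in `revisions`.

    definition: svn:externals value using "URL LOCALPATH" lines.
    revisions:  local path -> revision to pin (hypothetical input).
    """
    pinned = []
    for line in definition.splitlines():
        parts = line.split()
        # Only touch simple two-field lines; blank lines, comments, and
        # lines with extra fields (such as -r) are passed through as-is.
        if len(parts) == 2 and not line.lstrip().startswith("#"):
            url, local_path = parts
            # Skip URLs that already contain "@" (likely already pegged).
            if local_path in revisions and "@" not in url:
                line = f"{url}@{revisions[local_path]} {local_path}"
        pinned.append(line)
    return "\n".join(pinned)
```

The pinned value would be set on the working copy before branching it into the tag. This avoids growing the repository for non-local externals, but it does nothing about the commit and performance problems listed here.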
*Technique #2:*
- I can't just tag Project1 and its dependencies in this case. If there
exists a Project2 and a Project3, neither of which I am working on or intend
to tag, those will show up in the tag regardless. This complicates the tags.
- I am not aware of how this technique will work if I'm using a client at
version 1.6 and a server at 1.4. I've had issues of updates grabbing over a
gigabyte of data (With no status information being output to the user) when
I do an update from a directory with sparse checkouts. I expect anomalies
and weird behavior to happen in this case.
- It's harder to tell people how to do checkouts. If a new programmer
joins my team, I can't just send him a repository URL and tell him to check
that out. I have to tell him HOW to use sparse checkouts (If he's not
familiar with them) and also tell him which parts of the repository to
explicitly grab.
- Dependencies are not automatically managed (This also plays a big part
in the previous point). You can forget to grab dependencies and you may not
find out till much later. This is error prone.
*Technique #3:*
- This requires additional work. Instead of having the repository manage
the dependency links for me, I have to write a (potentially) complicated
script to do that work.
- This still doesn't help make tagging a one-step process.
- It requires no consistent repository structure. It actually could
result in an arbitrary repository structure, which could potentially be bad.
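For reference, the kind of script I mean for technique 3 could be as small as the Python sketch below. It only builds the list of "svn checkout" commands from a manifest that maps repository paths to directories relative to a common workspace. The manifest format, function name, and example URL are assumptions for illustration, not an existing tool.

```python
import posixpath
from typing import Dict, List


def checkout_plan(repo_url: str, workspace: str, manifest: Dict[str, str]) -> List[List[str]]:
    """Build one "svn checkout" command per manifest entry.

    manifest maps a path inside the repository to the directory,
    relative to `workspace`, where that working copy should land.
    """
    plan = []
    for repo_path, rel_dir in sorted(manifest.items()):
        url = f"{repo_url.rstrip('/')}/{repo_path.strip('/')}"
        target = posixpath.join(workspace, rel_dir)
        plan.append(["svn", "checkout", url, target])
    return plan
```

Called with a manifest such as {"Project1": "Project1", "libs/Library1": "Library1"}, it yields two independent working copies side by side under the workspace. Nothing in it records which revisions belong together, so tagging still needs separate handling.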
Basically, there are serious negatives to all 3 solutions. Furthermore, none
of the 3 techniques makes tagging a project and its dependencies intuitive,
simple, and functional.
I realize there are some real Subversion gurus out there. What would
you recommend for this particular example?
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2367609
Received on 2009-07-02 23:30:41 CEST