Re: SCM, Content-Management and cherry-picking in big project

From: David Weintraub <qazwart_at_gmail.com>
Date: Mon, 1 Mar 2010 11:49:06 -0500

Considering the popularity of Subversion, I'd definitely count it as
one of the big boys. Unlike most other packages, Subversion versions
changes to the repository as a whole rather than individual files.
This can seem confusing at first. You have a file that's been changed
only twice, but somehow you can refer to revision 5694 of that file.

Then again, Subversion's revision method makes rollback much, much
easier. For example, suppose I checked in a change that involved four
different files. In most revision systems, each file is revisioned
independently. If I get a log of the changes, I'd get a log of each
file that was changed. I could locate one file, then using the date
stamp, try to locate the other files changed at the same time with the
same revision comment by the same user and assume they're all part of
the same set of changes.

In Subversion, the log is in the order of revision. I can quickly
track down the change, and immediately see all the files that were
changed by that user at the same time. Rolling back is simple. A
single command will roll back all of the changes at once.
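
To make that concrete, here is a rough sketch of what the rollback
looks like: a reverse merge of the one revision, followed by a commit.
The helper below only builds the command lines and does not run them.
The function name and the default working-copy path are made up for
illustration.

```python
import shlex


def rollback_commands(revision, working_copy="."):
    """Build the svn commands that undo one committed revision.

    Hypothetical helper for illustration: it returns the argument
    lists and never executes anything.
    """
    if revision < 1:
        raise ValueError("revision must be a positive integer")
    return [
        # A negative change number asks svn to apply that revision in reverse.
        ["svn", "merge", "-c", f"-{revision}", working_copy],
        # The reverse merge only changes the working copy; a commit publishes it.
        ["svn", "commit", "-m", f"Roll back r{revision}", working_copy],
    ]


for cmd in rollback_commands(5694):
    print(shlex.join(cmd))
```

Every file touched by r5694 is reverted together, which is the whole
point: there is no per-file hunting by date stamp and comment.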

So, when it comes to rolling back changes, there's nothing that can do
it better and easier than Subversion.

Now come some of the "issues" with Subversion that you'll have. First
of all, it is not standard practice to store binary files in
Subversion, especially files that were built from the source code.
One of the issues is that you tend to do this when you have sloppy
build management practices and are afraid of not being able to
duplicate your build.

The other issue (which I feel is more pertinent) is that you cannot
(easily) remove a revision from your source repository. For source
code, this isn't an issue since it is usually stored in delta format.
Removing revision 5 of foo.cpp doesn't save you anything in terms
of storage. However, if you are storing binary files, delta format
doesn't save a whole lot of space, and removing obsolete revisions of
binaries can save a ton of space. Since there is no command to remove
a revision from the source archive, storing binary files can quickly
take up a lot of space that cannot be recovered.

The truth is that revision control systems are a terrible way to store
release binaries. You spend an awful amount of time trying to keep
your binary releases trimmed from your source repository while at the
same time trying not to destroy important information. And, in the
end, you rarely get any advantage from using your source control
system for storing releases. Non-development teams have a hard time
trying to find the release because they have to know how a version
control system works and then have to have permission to use the
system even if it's only for reading.

When it comes to storing binary releases, you are much better off
using a release management system. We use Nexus, which is a Maven
repository manager, but we use it even for our non-Maven projects. We
also store our built releases in Hudson, our build system. This makes
it much easier for tech services and QA to find the releases they're
looking for.

So, although I was once critical of Subversion's poor ability to store
releases, I'm now a convert to this type of thinking. Even with other
version control systems like Perforce and ClearCase, which can remove
obsolete stored binary files, I've taken to introducing a release
repository system. This even includes things like intermediate
libraries that one project builds and other projects use, but that
are not themselves released to the customer.

Now comes tagging. Tagging in Subversion is awful. It is terrible. It
stinks. In Subversion, a "tag" is just a special branch that you call
a "tag". It is very easy to check out a "tag", modify the files
involved in this "tag", and then commit those changes without anyone
else realizing it. Most version control systems have a way of locking
a tag to prevent any files from being changed, but not Subversion.
(NOTE: The pre-commit Python permission script does allow you to set
up a permission that lets users create a tag but not modify it
afterwards. This does make tagging much more useful.)
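
The idea behind such a hook fits in a few lines. This is not the
permission script mentioned above, just a minimal sketch that assumes a
conventional tags/ directory at the repository root and input lines in
the style of "svnlook changed" output (status letters, then a path).
The function name is hypothetical.

```python
def tag_violations(changed_lines, tags_root="tags/"):
    """Return the changed paths that modify an existing tag.

    Creating a new tag shows up as the addition of a directory directly
    under tags_root (for example "A   tags/REL-1.0/"), which is allowed.
    Any other change at or below tags_root is reported.
    """
    violations = []
    for line in changed_lines:
        # Assumed layout: four status columns, then the path.
        status, path = line[:4].strip(), line[4:].strip()
        if not path.startswith(tags_root):
            continue  # outside the tags area, so not our concern
        inside = path[len(tags_root):]
        creates_tag = (
            status.startswith("A")
            and inside.endswith("/")
            and inside.count("/") == 1
        )
        if not creates_tag:
            violations.append(path)
    return violations


sample = [
    "A   tags/REL-1.0/",          # new tag: allowed
    "U   tags/REL-0.9/bar.cpp",   # edit inside an existing tag: rejected
    "U   trunk/foo.cpp",          # ordinary trunk work: ignored
]
print(tag_violations(sample))
```

A real pre-commit hook would feed this from svnlook for the pending
transaction, print the offending paths to stderr, and exit non-zero to
reject the commit.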

The other problem with tags is that a tag is merely a copy of another
branch. For example, I am working off the trunk, like what I have, and
I copy that trunk revision to the "tag" branch. In most other revision
control systems, a tag is a common label attached to a group of files,
each at its own revision, so that I can refer to them as a group. File
foo.cpp might have the tag REL-1.0 on revision 6 of that file while
bar.cpp might have the REL-1.0 tag on revision 13 of that file. This
makes it easy to change a single file in a tag. For example, I could
easily move the REL-1.0 tag from revision 13 of bar.cpp to revision 12
of bar.cpp. There is no easy way to do this in Subversion.
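
The difference between the two models is easy to see as data. What
follows is a toy illustration only, not how either kind of system
stores things internally. The file names and the REL-1.0 label come
from the example above; the repository revision number is just a
placeholder.

```python
# Per-file labelling: each label maps every file to its own revision.
labels = {"REL-1.0": {"foo.cpp": 6, "bar.cpp": 13}}


def move_label(labels, label, filename, revision):
    """Point `label` at a different revision of a single file."""
    labels[label][filename] = revision


# Moving the label on one file leaves every other file untouched.
move_label(labels, "REL-1.0", "bar.cpp", 12)
print(labels["REL-1.0"])

# Subversion-style: a tag is a copy of one tree at one repository
# revision, so a single (source path, revision) pair pins every file.
svn_tag = {"name": "REL-1.0", "copied_from": "trunk", "revision": 5694}
print(svn_tag)
```

With only one revision number for the whole tree, there is nothing
per-file to move, which is why the per-file trick has no direct
equivalent.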

That being said, tagging in Subversion is extremely quick. What used to
take us 30 to 40 minutes in CVS can be done in less than a second in
Subversion. That's because copying a branch or trunk in Subversion to
a tag takes a single operation internally in the repository itself.
You only need a single reference (the URL and revision you're tagging)
and you don't have to mark thousands of individual files. That makes
tagging dirt cheap.

Another thing is that tagging itself is not as necessary as it used to
be because Subversion tracks changes as revisions of the entire
repository and not of individual files. If you know the repository
revision, you automatically know how to pull up all of those files as
a set. In other words, the repository revision itself becomes a tag.
Many sites no longer have build tags, but instead refer to the
repository revision number itself. Sites can now talk about revision
43483 instead of the tag BUILD_0453.

It is possible, since tags are branches, to change one or two files on
a "tag" if you need to. The advantage is that you do have a history of
the changes that took place on a "tag". Under most revision control
systems, you'd have to check some sort of system log (which might
limit the length of the history you can pull), or you simply wouldn't
have a record of tag changes. So, this is a nice feature in Subversion.

However, complete picking and choosing (I choose revision #4 of
file foo.cpp and revision #9 of file bar.cpp and revision #38 of file
foobar.cpp...) is very difficult to do in Subversion.

The truth is that picking and choosing individual files for a release
without having those files together as a testable revision is not a
very good way to do builds anyway. I remember an early tool called
Sablime (used by AT&T) that was a version control system with a
built-in defect tracking system. Every change in your repository had
to be tied to an MR (Modification Request) number. In Sablime, a new
release was an old release plus all the MRs that you wanted included
in the next release.

This sounds wonderful in theory, but became a nightmare in practice.
One bug fix would actually be dependent upon an earlier bug fix, but
that earlier fix might not have been something I wanted to release. It
would take us several days to figure out what was releasable and what
wasn't. If we weren't careful, we would actually check out a file that
contained source code that a developer never wrote. Hilarity ensued.
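
The dependency problem is easy to sketch. Assuming you somehow knew
which MR was written on top of which (in practice that was exactly the
hard part), picking one MR drags in everything it builds on. The MR
numbers and the function name below are invented for illustration and
have nothing to do with Sablime's actual interface.

```python
def required_mrs(selected, depends_on):
    """Return every MR that must ship if the `selected` MRs are released.

    depends_on maps an MR id to the MR ids whose changes it builds on.
    """
    needed = set()
    stack = list(selected)
    while stack:
        mr = stack.pop()
        if mr in needed:
            continue  # already accounted for
        needed.add(mr)
        # Anything this MR was written on top of has to ship as well.
        stack.extend(depends_on.get(mr, ()))
    return needed


# Hypothetical history: MR-103 was written on top of MR-102, which was
# written on top of MR-101. MR-104 is independent.
depends_on = {"MR-103": ["MR-102"], "MR-102": ["MR-101"]}

print(sorted(required_mrs(["MR-103"], depends_on)))
print(sorted(required_mrs(["MR-104"], depends_on)))
```

Asking for MR-103 alone pulls in MR-101 and MR-102 too. Leave either
one out and you end up with a file that no developer ever wrote or
tested.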

You're much better off thinking of a release as a layer. Certain fixes
are in one layer while other fixes are in the next layer. If you
release the second layer, it has to include all the fixes in the first
layer. You can pick the layer you're releasing, but it would have to
include all the previous layers. You can't pick and choose layers.
(NOTE: You can back out a change, but this isn't the same as choosing
to leave out a layer).

So, the short answer is: No, what you want to do isn't easy to do in
Subversion, but would pretty much be a mess in any version control
system. You shouldn't pick and choose bug fixes and hope they create a
releasable project. It's messy and it rarely works well. You're much
better off prioritizing defects and features before your developers
work on them than afterwards.

Subversion is an excellent tool, but it doesn't work for everyone.
However, the way you want to work might be difficult for almost any
version control system. You're spreading changes across multiple
branches (if I understand you correctly), trying to pick and choose
changes, and you'll get lost in a forest of branches.

If you really want to work this way, I highly recommend you look at
Sablime (http://bit.ly/aDAQkK), a revision control system that is
specifically designed to let you pick and choose what you want to
release.

However, this is probably the first time in my long history of
configuration management that I am recommending Sablime to anyone.
You're really
better off rethinking your release strategy.

-- 
David Weintraub
qazwart_at_gmail.com

Received on 2010-03-01 17:49:44 CET
