We're about 90% done with our migration from using the SourceOffSite
client with a Visual SourceSafe backend to Subversion. I figured it was
a good time to jot down some of the ups and downs of the process.
Our VSS repository is about 20GB in size with 193,000 files across
16,000 folders. We've been using SOS+VSS for 6+ years. Most of our
files only have a handful of revisions due to the rapid nature of how we
move from project to project (lifespans of projects can be measured in
days, but they're then left in the repository for future reference).
1) VSS2SVN didn't work for us. Either the export process would die, or
the import file would fail to load into SVN. VSS has slowly accumulated
enough errors over the years (even with using only SOS which prevents
99% of the glitches) that VSS2SVN had a rough go of it. After spending
a week on attempting the migration, we chose to go with Plan B.
Plan B is to leave the old SOS/VSS repository up in a read-only mode and
simply start with a fresh snapshot of the current version of all files.
We also created an "archive" repository where we restored older
snapshots of the VSS repository (roughly one per year) which can be
referred to if needed.
The advantage of starting fresh and doing a clean import was that we
took the time to restructure portions of our repository. Some projects
were split off into side repositories and we now have around a dozen
repositories instead of one massive repository.
2) Hardware: We're running SVN inside a Xen DomU on a dual-core
Athlon64 X2 AM2 CPU, 4GB ECC, with (4) 750GB SATA drives in a RAID10
configuration with a hot-spare. Very low-end hardware, but we'll be
achieving fault-tolerance by setting up 2 of these machines with DRBD,
heartbeats, Xen DomU migration and other tricks. The Xen DomU for
Subversion was only granted 1 CPU and 512MB of RAM, which is plenty to
keep up with activity over a T1 line.
If our users were all in the office, we'd have wanted to dedicate at
least 2 CPUs to the DomU and possibly upped the memory grant to 1GB.
But the current bottleneck is the networking issues in stock Xen 3.0.2
which we haven't tackled yet. Our SVN DomU would run at 50% CPU time
and max out around 2-3MB/s when using SVN+SSH. It's fast enough for our
small 4-person group, but I'll need to fix the networking issue and beef
up the CPU power a bit before expanding to a 40-person setup.
Sometime in 2007/2008, we plan on moving to 10k or 15k SATA/SAS drives
in a 4 to 8 drive RAID10 configuration with at least 4 CPU cores for the
Xen host OS so we can give the SVN DomU OS more cores to play with.
3) Importing the old repository. Because of how I did the migration,
we were able to slowly lock down the SOS/VSS repository to read-only
mode a tree at a time, push the latest to a fast hard drive and then do
an in-place import into the SVN repository. Other than being tedious,
it went fairly well. The approximate cycle was:
- Create the base folder structure for the repository. In the case of
our Jobs repository, this meant creating folders for each client, along
with creating sub-folders. We have dozens of client folders and
hundreds of job folders so we ended up with a slightly more complex
structure than other SVN users might encounter. An example of our base
layout is:
repos-jobs/A/AcmeWidgets/ACWI010xx/ACWI01030
Where "AcmeWidgets" is the client name, and the individual job folders
are clustered into groups of ~100. Otherwise we would end up with
folders that contained hundreds of other folders, which becomes a
usability issue when browsing the file system. By limiting folders to
only having ~100 sub-folders, it's still possible to use graphical,
tree-based file managers with some efficiency.
That folder structure ended up as revision #1, with zero files but a few
hundred folders. Whenever we set up a new user's working copy, we always
check out revision #1 so that the user can more easily access
sub-projects without having to create all the folders by hand.
Note: Under SourceOffSite / VisualSourceSafe, it was extremely easy to
quickly grab the contents of a project that is 3+ folders deep in a
hierarchy without having to create all of the intermediate folders.
SOS/VSS would simply walk back up the tree until it knew the base
working folder for a particular branch, then would walk back down the
tree and create the intermediate folders.
- Step #2 was to pull across a single job letter (e.g. all clients that
start with the letter "G") into the new working copy for SVN.
- Step #3 would be to add all of the files under that job letter to SVN
(but not commit the changes).
- Step #4 would be to use TortoiseSVN to commit only the empty folders
under that job letter to the SVN repository. This gives us a revision
with just folder structure that we can use to build a new user's working
copy. (Eventually, we'll write a batch script that will have a list of
all of those "folder-only" revisions and perform the SVN
Update-to-Revision within each job letter.)
- Step #5 was to commit the rest of the files in small batches. My
threshold for each commit was ~150 files or ~25-50MB. Any large 50+ MB
files were committed as individual revisions and any projects with > 500
files were broken down into smaller commits.
- Rinse-repeat for the rest of the repository. We're up to a few
hundred revisions in the repository, and will probably reach 1000+
before the end of the migration.
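
As an aside, the clustered folder layout from the first step of the
cycle above can be sketched in a few lines of Python. This is only an
illustration: the function name job_path is made up, and the grouping
rule (mask the last two characters of the job number with "xx") is
inferred from the single example path shown earlier, so treat it as an
assumption rather than our exact rule.

```python
from pathlib import PurePosixPath


def job_path(client: str, job_id: str, root: str = "repos-jobs") -> str:
    """Build the clustered repository path for one job folder.

    Hypothetical helper: the masking rule is inferred from one example.
    """
    # Top-level bucket: first letter of the client name, upper-cased.
    letter = client[0].upper()
    # Group bucket: the job ID with its last two characters masked as "xx",
    # so each group folder holds at most ~100 job folders.
    group = job_id[:-2] + "xx"
    return str(PurePosixPath(root, letter, client, group, job_id))


print(job_path("AcmeWidgets", "ACWI01030"))
# prints: repos-jobs/A/AcmeWidgets/ACWI010xx/ACWI01030
```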
4) Concerns about size / file count within a single revision.
I do worry a bit about some of the larger commits that I made while
doing the in-place import of the latest snapshot. I think I chose a
middle road between having too many revisions versus packing too much
into a single revision. I ended up with a few revisions in the
1000-2000 file count range where I was too lazy to break them down
further.
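
For anyone who wants to automate the batching instead of doing it by
hand as I did, here is a rough Python sketch of the thresholds from
step #5 (~150 files or ~25-50MB per commit, with 50+ MB files committed
on their own). The names and the exact cut-offs are assumptions chosen
for illustration; it only groups paths and does not call svn itself.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed cut-offs, loosely based on the thresholds described in step #5.
MAX_FILES = 150                   # files per commit
MAX_BYTES = 50 * 1024 * 1024      # upper end of the ~25-50MB range
LARGE_FILE = 50 * 1024 * 1024     # files this big get their own commit


def batch_commits(files: Iterable[Tuple[str, int]]) -> Iterator[List[str]]:
    """Group (path, size_in_bytes) pairs into commit-sized batches."""
    batch: List[str] = []
    batch_bytes = 0
    for path, size in files:
        if size >= LARGE_FILE:
            # Large files become a single-file commit; the batch in
            # progress is left untouched and keeps filling up.
            yield [path]
            continue
        # Close the current batch if adding this file would exceed a limit.
        if batch and (len(batch) >= MAX_FILES or batch_bytes + size > MAX_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        yield batch
```

Each list that comes out could then be handed to a single commit of
those paths. Capping on both count and size is what keeps a folder full
of tiny files from turning into one of those 1000+ file revisions.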
5) Things we like / dislike.
- Efficiency in storage of binary files. We do a lot of work with
MS-Office documents. Under VSS/SOS, every time you would check-in a
binary file, the entire thing would travel over the wire and the
repository size would grow because VSS simply stored the binary files as
a new snapshot. We would ZIP up MSAccess databases before storing them
in the VSS repository to save space.
SVN simply does a better job than VSS. Between compressed storage of
binary files (saving us the step of ZIP'ing up large databases) and
transmitting only the deltas, it's a lot easier and faster to work with
these files now. That's a big win for us and lessens the strain on our
T1 line.
- The repository-wide revision # allows us to peek at the root of the
repository to see where activity is occurring across all projects.
- We're still evaluating locks. At the start, we're going to try to get
used to working without them.
- The command-line client and the TortoiseSVN client make it difficult
to grab an arbitrary folder that is 2-4 levels deep in the hierarchy
without wasting time pulling down other projects that we don't need on
our local working copies. We're waiting for RapidSVN to release their
1.4 version to see if that does a better job for that particular usage
scenario.
The alternative would've been to create a few hundred repositories (one
per job#) which is a management nightmare and even worse from the user's
perspective. Plus, we're operating under the constraint that some of
the tools that we use require working copy folder locations to be
identical across each user's machine. (This happens on the Mac side of
the business as well. There are many tools that use absolute paths and
cause issues when files change location between working copies.)
- Not having the ability to share individual files between projects on
non-unix systems is going to cause issues. We're still evaluating how
to deal with that issue and whether "externals" can meet our needs.
- Being able to use multiple SVN clients on the same working copy is a
big bonus. For some operations, the command-line client wins out over
TortoiseSVN and it's nice to have choice.
Received on Fri Oct 20 15:26:10 2006