My observations and experience with cvs2svn.py on Windows

From: Bern McCarty <Bern.McCarty_at_bentley.com>
Date: 2004-02-17 19:11:00 CET

I thought I'd share the results of my first experience with cvs2svn.py which
happened to be on Windows.

I found RCS 5.7 binaries for the PC someplace and grabbed them because I
didn't want to build it. The first problem that I encountered related to
the way that cvs2svn.py uses the RCS 'co' command. co.exe was unable to
locate the RCS file even though it was being given a fully qualified path to
the file. After reading the RCS doc I could see why. Changing cvs2svn.py
to add -x,v to the co.exe command-line seemed to fix the problem for me.
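
For anyone who wants to see the shape of that change, here is a minimal
sketch of how the co command line could be assembled with the suffix option
included. The helper name build_co_command and the example path are made up
for illustration and are not taken from cvs2svn.py; only the RCS options
themselves (-q, -p and -x) are real.

```python
def build_co_command(rcs_path, revision, co_executable="co"):
    """Build an argument list for RCS 'co' (hypothetical helper, not cvs2svn.py code).

    -q     suppress diagnostics
    -x,v   tell RCS that RCS files are identified by the ",v" suffix
    -pREV  print revision REV to standard output instead of writing a file
    """
    return [co_executable, "-q", "-x,v", "-p" + revision, rcs_path]


# Example with an assumed repository path:
command = build_co_command(r"C:\cvsroot\proj\foo.c,v", "1.4")
print(command)
```

Passing the arguments as a list rather than one string avoids quoting
problems with paths that contain spaces.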

The other change I had to make to cvs2svn.py to get it to work on Windows
involved binary files. I changed the access-mode string from 'r' to 'rb'
where cvs2svn.py opens the pipe to read from the launched RCS co.exe
process. Prior to making this change I got IOError exceptions when the pipe
was closed on a co.exe that was extracting a binary file from RCS. These
were the only two changes I had to make to get it to run to the end.
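
The binary-mode point can be illustrated with a small sketch. It does not
reproduce the code in cvs2svn.py; it uses the subprocess module, whose pipes
carry bytes unless text mode is requested, and a child Python process stands
in for co.exe so that the example runs without RCS installed. The names
read_child_output, PAYLOAD and CHILD are assumptions for the example.

```python
import subprocess
import sys


def read_child_output(argv):
    """Run argv and return everything it wrote to stdout, as raw bytes."""
    # No text=True / universal_newlines here, so the pipe is read in binary
    # mode and no newline translation is applied to the data.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    data, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("child exited with status %d" % proc.returncode)
    return data


# Bytes that a text-mode read on Windows could alter: CR LF pairs and Ctrl-Z.
PAYLOAD = b"\x00\r\n\x1a\xff"

# Stand-in for co.exe: a child process that writes PAYLOAD to its stdout.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(%r)" % PAYLOAD]

print(read_child_output(CHILD) == PAYLOAD)
```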

There were some things that I observed that I didn't expect. I used the
--trunk-only option, whose usage help says "convert only trunk commits, not
tags nor branches". But in the end I got the expected /trunk and /tags
directories but no /branches directory at all, even though I definitely
have branches in my CVS repository. I figured maybe the behavior implied by
this option's name was more accurate than its usage help, until I looked in
the source; from what I can tell this command-line option is ignored
completely. Am I wrong? I have revision 8628 of cvs2svn.py. Then I took a
look into my /tags directory, and it looks to me like some of the tags in
there are indeed branch tags, but there are branches that I know I have
that are not there (or anywhere, since I didn't get a /branches directory
at all). Is there a probable explanation for why the script failed to
recognize any of my CVS branches as branches?

I knew that my CVS repository was littered with tons of tags that had
accumulated over the years. I expected that the resulting Subversion
repository (in particular the strings file) would be smaller than the total
size of all of the RCS files in CVS, partly due to what I figured would be a
much more efficient way to represent tags. What I observed was quite the
opposite. My db/strings file ended up being nearly 5 times larger than the
sum total size of my CVS repository files. Seeing that all of the tag
related transactions appeared to be at the end of the DUMPFILE, I cracked it
open in a binary file editor, found the very beginning of the tag related
stuff and truncated it. Then I ran svnadmin load on it to another new
repository. It worked fine. The resulting strings file now was only
slightly larger than the sum total size of my CVS repository. Apparently
about 80% of the size of my first Subversion repository was tag related. In
trying to sell the idea of CVS to Subversion migration in my organization I
used the "tags are so much more efficient" argument among many others. They
do not appear to be at all space efficient. The other thing that I noted
during this experiment was that my "svnadmin load" execution time was
thoroughly dominated by processing the couple of thousand tags found.
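
Rather than hunting for the cut point in a binary editor, the same thing
could be done with a short script. The sketch below walks the dumpfile record
by record (a block of "Key: value" headers, a blank line, then Content-length
bytes of body) and reports the byte offset of the first revision that touches
a path under tags/. It assumes that the tags live under a top-level tags/
directory and that every revision from that point on is tag related, which is
what I observed in my dumpfile. The function name is made up.

```python
import io


def find_first_tag_revision_offset(stream, tags_prefix=b"tags/"):
    """Return the byte offset of the first revision record that touches a
    path under tags_prefix, or None if there is no such revision."""
    rev_start = None
    while True:
        record_start = stream.tell()
        line = stream.readline()
        if not line:
            return None              # end of file
        if line.strip() == b"":
            continue                 # blank separator between records
        headers = {}
        while line.strip() != b"":   # also stops at end of file
            key, _, value = line.rstrip(b"\r\n").partition(b": ")
            headers[key] = value
            line = stream.readline()
        if b"Revision-number" in headers:
            rev_start = record_start
        path = headers.get(b"Node-path")
        if path is not None and rev_start is not None and path.startswith(tags_prefix):
            return rev_start
        # Skip the record body so file contents are never parsed as headers.
        stream.seek(int(headers.get(b"Content-length", b"0")), io.SEEK_CUR)
```

Copying only the bytes before the returned offset into a new file gives a
dumpfile without the trailing tag revisions. The stream has to be opened in
binary mode ('rb') for the offsets to be meaningful.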

A nice enhancement to cvs2svn.py would be an option to ignore tags
separately from ignoring branches, and/or to be able to provide a list of
the tags/branches that you want migrated (and to ignore the rest).
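
To make the suggestion concrete, here is one way such a filter might look.
This is purely a sketch of the idea, not an existing cvs2svn.py feature: the
function select_symbols and the use of shell-style patterns are assumptions.

```python
from fnmatch import fnmatchcase


def select_symbols(symbols, include_patterns):
    """Keep only the tag/branch names that match at least one pattern.

    Patterns use shell-style wildcards, e.g. "REL_*". Matching is case
    sensitive, since CVS symbol names are.
    """
    return [name for name in symbols
            if any(fnmatchcase(name, pattern) for pattern in include_patterns)]


all_symbols = ["REL_1_0", "nightly_20030101", "REL_2_0", "dev_branch"]
print(select_symbols(all_symbols, ["REL_*"]))
```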

cvs2svn.py execution time (for me at least) was thoroughly dominated by pass
4, to the point where the ability to restart as of a given pass did not
seem helpful. My guess is that the repeated launching of co.exe as an
external process is responsible for the bulk of the total execution
time. I wonder whether it would be substantially faster if someone took the
RCS source and made a Python module out of it, so that RCS co could be
merely an in-process call from cvs2svn.py?

Several times I had to restart the whole migration because bad/corrupt RCS
files were encountered. I realize that full-blown incremental execution has
already been suggested but, short of that, I would have loved to just have
the option to tell cvs2svn.py to pretend that bad/corrupt RCS files weren't
there, log their names someplace, and continue on. I'm not suggesting that
it would be trivial to implement (I don't know); I'm just saying that it
would have been useful to me. It turns out the problematic files in my case
were ones that I could have lived without or dealt with specially, provided
everything else turned out as desired.
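
The skip-and-log behavior I have in mind would look roughly like the sketch
below. Nothing here is cvs2svn.py code: convert_all, the convert_one callback
and fake_convert are hypothetical names, and the real script would need to
decide which exceptions actually indicate a corrupt RCS file.

```python
import io


def convert_all(rcs_paths, convert_one, skipped_log):
    """Call convert_one(path) for each path; on failure, log and carry on.

    Returns the list of paths that converted without raising.
    """
    converted = []
    for path in rcs_paths:
        try:
            convert_one(path)
        except Exception as exc:   # assumption: any failure means "skip it"
            skipped_log.write("%s\t%s\n" % (path, exc))
            continue
        converted.append(path)
    return converted


def fake_convert(path):
    """Stand-in for the real per-file work; fails on one made-up file."""
    if path.endswith("broken.c,v"):
        raise ValueError("unexpected end of RCS file")


log = io.StringIO()
done = convert_all(["a.c,v", "broken.c,v", "b.c,v"], fake_convert, log)
print(done)
print(log.getvalue(), end="")
```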

Bern McCarty
Bentley Systems, Inc.

Received on Tue Feb 17 19:41:35 2004

This is an archived mail posted to the Subversion Users mailing list.
