Re: A failed attempt to introduce Subversion

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2004-03-21 07:11:39 CET

Thanks for the feedback, lengthy reply follows.

On Sat, 2004-03-20 at 15:40, Alexander L. Belikoff wrote:

> The very first showstopper was the import. Cvs2svn import process for the
> repository to a local SVN repository has been running for 4 (sic!) days after
> which period it was mercilessly put to death. By the kill time, the target
> SVN repository grew to 1.8Gb (again, coming from 40Mb), of which about 1.5Gb
> was taken by a single 'strings' DB.

cvs2svn isn't yet up-to-speed; it's now a separate project under very
active development. Your report sounds like other reports we've heard
about cvs2svn's lack of scalability. Things should be getting better
in the very near future.

Still, you said your project was fairly "tool neutral"... yet somehow
there was a requirement that your SCM tool be able to convert all
CVS/RCS history? Just curious about this. That doesn't sound
particularly neutral to me... it's a very hefty prerequisite. :-)

>
> The second annoyance was the fact that SVN effectively mandates the user
> performing the merge to explicitly specify two revisions, the diff between
> which would be applied to the target revision. While merge is no trivial
> operation by any means, having a number of usable strategies implemented
> would definitely help.

Did you read chapter 4 in the book? Granted, it's occasionally annoying
to have to figure out the exact 2 revisions to compare in order to
perform a merge, but no more so than in CVS. ('cvs up -j N -j M
foo.c'). When I do a merge in CVS (a backport of a bugfix, for
example), it's actually *harder*, because I need to run 'cvs up -j -j'
on every single file in my non-atomic changeset(s), and discover the
exact revisions of every file to compare. With SVN, you find 2 versions
of the entire branch-tree, and a single merge command gives you all the
changes at once. It's much easier than CVS in that regard. (See
example further down this mail.)

> In this particular case, at least having an ability to
> merge the changes from another branch using common ancestor as a starting
> point would help.

Absolutely. This is a planned feature. The book explicitly discusses
this fact, and explicitly discusses strategies for working around it.

>
> Finally, there was also an issue of exaggerated expectations. For example, a
> very large number of CVS users I had to deal with were confident that
> Subversion would track merge operations.

Where on earth did they get that idea? Subversion's global revnums
certainly make it *easier* to perform CVS-style merges, but we've never
had merge tracking or ever made such claims. I've never seen any
document making those claims either. Can you ask your users where they
got this idea? It would be useful to know.

> I heard numerous times "Well, if
> Subversion doesn't track merges, it is basically a CVS with a DB and atomic
> commits."

Right. Atomic commits. And versioning of directories. And copies and
renames. And efficient compression of binary data. And O(1)
branching and tagging. And versioned metadata. I wouldn't exactly call
that "just CVS with atomic commits." :-)

>
> 1. Import process seems to be absolutely inadequate. I don't even want to
> estimate how much time it would take for the entire body of our source code
> (about 35 Mil. LOC). Unless it is a clear bug, fixing which would make the
> process several orders of magnitude faster (ClearCase import from CVS for the
> same code takes about 45 minutes) I am not sure it is usable.

Please be careful here. cvs2svn is not "import", and cvs2svn is a
separate project from Subversion, with known scalability problems.
Please don't judge Subversion by the failure of a 3rd party tool; it
has nothing to do with Subversion.

If you don't need history, a straight 'svn import' of your dataset
should prove easy and fast. But it's not clear whether you tried that.
Maybe that's just not an option for your project, dunno.

>
> 2. Repository size seems to be too big. 40Mb of RCS data resulted in 1.8Gb of
> SVN data.

Again, you're describing a bug with cvs2svn, not Subversion. cvs2svn
sometimes needs to make a lot of extra copies when deducing and
re-creating heavy CVS branching and tagging operations. A 1.5GB strings
table for a 40MB dataset seems a bit crazy, I agree. I recommend you
report this problem to the dev@cvs2svn.tigris.org list. It sounds
similar to other cvs2svn bug reports.

> Also, I am not sure how well BDB will scale for databases
> that large. I am afraid to think how big this file would be for our entire
> source code ;-) ClearCase has learned it over the time so now they have
> guidelines for splitting the source code across multiple VOBs.

Actually, there are SVN repositories out there that are 30GB+, no
problem. It shouldn't be something you need to worry about.

>
> 3. Another troubling thought is the one about tags. As I said, we had a lot of
> tags (about 150 per file). I wonder it was the tags that resulted in all that
> bloat, then something is wrong with the implementation of lazy
> tagging/branching.

Nothing's wrong with Subversion's implementation of O(1) copies. But
cvs2svn might be heavily abusing it, or rather, not making *enough* use
of it -- copying whole trees (really duplicating data) rather than
making lazy links. Again, another thing to discuss on the cvs2svn list.

>
> 3. Merging needs some enhancements
> Immediate lack of ability to merge the entire branch into the current revision
> makes it very painful to handle merging and in fact it actually makes SVN
> *worse* than CVS for merging.

Huh? I'm getting the distinct impression that you're misunderstanding
merging. If you created your branch in revision 100, then to merge the
"entire branch" to your working copy of trunk, you simply run

   svn merge -r100:HEAD MyBranchURL

One command, that's it. The only "pain" here is that the user has to
discover the number 100. But that's easily done by running 'svn log -v'
on the branch URL or working copy. Again, this is explicitly discussed
in chapter 4.
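
As a rough sketch of how that number could be picked out without reading
the log by eye: this assumes your client's 'svn log' supports the
--stop-on-copy option, so that the oldest entry listed for the branch is
the revision that created it. The helper name branch_start_rev and the
BRANCH_URL variable are made up for illustration.

```shell
# Hypothetical helper: reads `svn log -q --stop-on-copy` output on stdin
# and prints the oldest revision number it finds. The log lists newest
# entries first, so the last "rNNN | ..." line is the oldest one.
branch_start_rev() {
    awk '/^r[0-9]+ \|/ { rev = substr($1, 2) } END { if (rev != "") print rev }'
}

# Intended use (assumed URL, not executed here):
#   start=$(svn log -q --stop-on-copy "$BRANCH_URL" | branch_start_rev)
#   svn merge -r "$start":HEAD "$BRANCH_URL"
```

The helper only parses text, so it is worth checking the number it
prints against 'svn log -v' before trusting it in a merge.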

> Anyway, we are now moving the project to CVS. I am still an avid SVN user at
> home but it will take some time before we try using it at work. ;-)

Sorry to hear about your frustrations, and we appreciate your feedback.

My overall impression is that somehow you've discovered that cvs2svn is
not scaling properly, and somehow confused it with SVN itself. I'm not
trivializing the importance of cvs2svn here; I understand that for many
people it's an absolute requirement to make the switch to SVN. I just
want to make it clear that cvs2svn's current problems are not those of
Subversion. If you had been using Subversion from day 1 to make all
those zillions of branches and tags, I guarantee you wouldn't see a
multi-gigabyte sized repository!

And there also seems to be a bit of confusion about SVN's merging
abilities as well. I recommend you take a close read of chapter 4. We
have all sorts of plans for 'smart merging' features in SVN someday in
the future, but I'll argue any day that SVN's merging is easier than
CVS's.

In the future, I might recommend you ask for help on the users@ list
when you're actually having problems, rather than sending us a
post-mortem report. :-)

Received on Sun Mar 21 07:12:06 2004