Re: A failed attempt to introduce Subversion

From: Alexander L. Belikoff <abel_at_vallinor4.com>
Date: 2004-03-21 21:25:32 CET

On Sunday 21 March 2004 01:11 am, Ben Collins-Sussman wrote:
> Thanks for the feedback, lengthy reply follows.
>
> On Sat, 2004-03-20 at 15:40, Alexander L. Belikoff wrote:
> > The very first showstopper was the import. Cvs2svn import process for the
> > repository to a local SVN repository has been running for 4 (sic!) days
> > after which period it was mercilessly put to death. By the kill time, the
> > target SVN repository grew to 1.8Gb (again, coming from 40Mb), of which
> > about 1.5Gb was taken by a single 'strings' DB.
>
> cvs2svn isn't yet up-to-speed; it's now a separate project under very
> active development. Your report sounds like some similar reports we've
> heard regarding bugs in cvs2svn's lack of scalability. Things should be
> getting better in the very near future.
>

I understand. The issue here is that it is usually the first activity new SVN
users are subjected to. I went over the book several times in the past - is
there another way to import changes (obviously, other than doing manual
checkin for each revision)? Making it work is fairly important to prevent the
"version 1 syndrome" where people try version 1, immediately find a
showstopper, say "It sucks", and never try this program again. ;-)

> Still, you said your project was fairly "tool neutral"... yet somehow
> there was a requirement that your SCM tool be able to convert all
> CVS/RCS history? Just curious about this. That doesn't sound
> particularly neutral to me... it's a very hefty prerequisite. :-)

I said, the SCM *process* was neutral. It is in fact very common sense and
documented in various places (see for example the IEEE article "The
importance of branching models in SCM" at
http://www.unilog-itservices.ch/Seapine/Documents/SCMBranchingModels.pdf )

As for the requirement to transfer the revision history, well, it seems so
natural to me that there is no reason to even list it. In fact I doubt
any serious company would consider a tool that cannot import the history
of the product, especially if this company is under SEC requirements and
such :-) After all, we are not talking about some proprietary system like DEC
CMS, importing from which could be a major pain. We are talking about the most
open and widely documented format - RCS.

>
> > The second annoyance was the fact that SVN effectively mandates the user
> > performing the merge to explicitly specify two revisions, the diff
> > between which would be applied to the target revision. While merge is no
> > trivial operation by any means, having a number of usable strategies
> > implemented would definitely help.
>
> Did you read chapter 4 in the book? Granted, it's occasionally annoying

Several times :-)

> to have to figure out the exact 2 revisions to compare in order to
> perform a merge, but no more so than in CVS. ('cvs up -j N -j M
> foo.c'). When I do a merge in CVS (a backport of a bugfix, for
> example), it's actually *harder*, because I need to run 'cvs up -j -j'
> on every single file in my non-atomic changeset(s), and discover the
> exact revisions of every file to compare. With SVN, you find 2 versions
> of the entire branch-tree, and a single merge command gives you all the
> changes at once. It's much easier than CVS in that regard. (See
> example further down this mail.)

In many cases, *especially* where two branches are directly related,
cvs up -j <BRANCH2> would do the right thing.

>
> > In this particular case, at least having an ability to
> > merge the changes from another branch using common ancestor as a starting
> > point would help.
>
> Absolutely. This is a planned feature. The book explicitly discusses
> this fact, and explicitly discusses strategies for working around it.
>

Hmm... Is it possible I was reading an old version of the book? Gotta
check it out.

> > Finally, there was also an issue of exaggerated expectations. For
> > example, a very large number of CVS users I had to deal with were
> > confident that Subversion would track merge operations.
>
> Where on earth did they get that idea? Subversion's global revnums
> certainly make it *easier* to perform CVS-style merges, but we've never
> had merge tracking or ever made such claims. I've never seen any
> document making those claims either. Can you ask your users where they
> got this idea? It would be useful to know.
>
> > I heard numerous times "Well, if
> > Subversion doesn't track merges, it is basically a CVS with a DB and
> > atomic commits."
>
> Right. Atomic commits. And versioning of directories. And copies and
> renames. And efficient compression of binary data. And order 1
> branching and tagging. And versioned metadata. I wouldn't exactly call
> that "just CVS with atomic commits." :-)
>

Beauty is in the eye of the beholder. Unless people are working on a special
project using binary files, they wouldn't care for efficient binfile storage.
Similarly, if people work in a static directory structure (like us :-) they
will consider directory versioning a useful, but not critical feature. Same
with versioned metadata. That's the egoistic nature of people - they don't
want to use Subversion because it's a huge step overall over CVS. They want
to use it because it could solve their real puny needs. :-)

> Please be careful here. cvs2svn is not "import", and cvs2svn is a
> separate project from Subversion, with known scalability problems.
> Please don't judge Subversion by the failure of a 3rd party tool; it
> has nothing to do with Subversion.

You are wrong here. cvs2svn *is* part of SVN :-))))) It is perceived as such
by most of the people I've dealt with. Otherwise, we are left with SVN as a tool
unable to migrate an existing repository into itself. If I proposed migrating
to SVN without being able to import the entire repository, the tool would've
been rejected before I had managed to finish the sentence.

I was not the one doing the design of SVN and I have not [yet, hopefully]
contributed to SVN progress, so I really don't feel like telling you guys how
to handle the project. I could only share something from my past experience
w/ commercial software. One of the most crucial things in my experience was
to stay compatible. Clients *absolutely* *hate* when things change. They want
changes to be absolutely seamless. Most of our clients didn't come from the
cold, they were already using competitors' products and they were not very
eager to change. If we were interested in getting these people on board, we had
to do miracles to ensure that the migration was as seamless as possible.
Saying "you cannot migrate yet" in such an environment was simply not an
option - they would simply not take us seriously. Migration from a competitor's
product is usually feature #1 in any commercial product out there. Just
imagine if MS Money came out with no Quicken conversion built in. Or, in an even
more drastic case, if MS Word didn't offer reading of Wordstar and
WordPerfect, we wouldn't be using Microsoft Office today. ;-)

> If you don't need history, a straight 'svn import' of your dataset
> should prove easy and fast. But it's not clear whether you tried that.
> Maybe that's just not an option for your project, dunno.

We do need history. As do most of the development groups out there. Cases
where people can happily toss their history and start afresh are fairly
exotic and should be treated as such. Just ask some development group
whether they would be comfortable pruning their repository down to the current
version only. :-)

>
> > 2. Repository size seems to be too big. 40Mb of RCS data resulted in
> > 1.8Gb of SVN data.
>
> Again, you're describing a bug with cvs2svn, not Subversion. cvs2svn
> sometimes needs to make a lot of extra copies when deducing and
> re-creating heavy CVS branching and tagging operations. A 1.5GB strings
> table for a 40MB dataset seems a bit crazy, I agree. I recommend you
> report this problem to the dev@cvs2svn.tigris.org list. It sounds
> similar to other cvs2svn bug reports.

Got it. So it all goes back to the general problem w/ conversion. I'll see if
there is a similar bug report already. If there is none, I'll try to
replicate the problem so that I can report it.

> > Also, I am not sure how well BDB will scale for databases
> > that large. I am afraid to think how big this file would be for our
> > entire source code ;-) ClearCase has learned it over the time so now
> > they have guidelines for splitting the source code across multiple VOBs.
>
> Actually, there are SVN repositories out there that are 30GB+, no
> problem. It shouldn't be something you need to worry about.

That's comforting to know. Seriously. I generally become restless when dealing
with files more than 100Mb in size, esp. when all our assets are in this
file. :-)))

> > 3. Merging needs some enhancements
> > Immediate lack of ability to merge the entire branch into the current
> > revision makes it very painful to handle merging and in fact it actually
> > makes SVN *worse* than CVS for merging.
>
> Huh? I'm getting the distinct impression that you're misunderstanding
> merging. If you created your branch in revision 100, then to merge the
> "entire branch" to your working copy of trunk, you simply run
>
> svn merge -r100:HEAD MyBranchURL
>
> One command, that's it. The only "pain" here is that the user has to
> discover the number 100. But that's easily done by running 'svn log -v'
> on the branch URL or working copy. Again, this is explicitly discussed
> in chapter 4.

I've got to re-read that chapter. I remember we had some problems using info
from the log... BTW, I still consider having a symbolic name START or BIRTH
designating the start revision of the branch a good idea. :-) I'll try to
hack SVN to provide a patch for it. Ditto for $Header$ which I am missing a
lot :-)
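
For what it's worth, here is a rough sketch of how such a START lookup could
be approximated with a small shell helper. It assumes the --stop-on-copy
option of 'svn log', so that the oldest entry shown is the revision in which
the branch was created. The function name branch_start and the BRANCH_URL
variable are made up for illustration; this is not an existing SVN feature.

```shell
# branch_start: read 'svn log' output on stdin and print the oldest
# revision number found in it (hypothetical helper, not part of SVN).
# Log header lines look like: "r100 | author | date ...".
branch_start() {
    sed -n 's/^r\([0-9][0-9]*\) | .*/\1/p' | tail -n 1
}

# Intended use (not run here; BRANCH_URL is a placeholder):
#   start=$(svn log -q --stop-on-copy "$BRANCH_URL" | branch_start)
#   svn merge -r"$start":HEAD "$BRANCH_URL"
```

Using -q keeps log messages out of the output, so a message line that
happens to start with "r123 | " cannot be mistaken for a header.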

>
> > Anyway, we are now moving the project to CVS. I am still an avid SVN user
> > at home but it will take some time before we try using it at work. ;-)
>
> Sorry to hear about your frustrations, and we appreciate your feedback.
>
> My overall impression is that somehow you've discovered that cvs2svn is
> not scaling properly, and somehow confused it with SVN itself. I'm not

I'm very glad to hear such a balanced and logical response to my feedback. The
fact that we failed now doesn't preclude us from trying it later on.

> In the future, I might recommend you ask for help on the users@ list
> when you're actually having problems, rather than sending us a
> post-mortem report. :-)

I shall definitely do that. I hope I'll be able to help at least a bit w/
patches and bug reports as well.

Thank you again for the most informative response.

-- 
Alexander L. Belikoff                      GPG f/pr: 0D58 A804 1AB1 4CD8 8DA9
Bloomberg L.P.                                       424B A86E CD0D 8424 2701
abel *at* vallinor4 *dot* com             (http://pgp5.ai.mit.edu for the key)

Received on Sun Mar 21 21:25:47 2004
