
Re: cvs2svn and branches (long)

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2003-03-10 21:16:55 CET

Greg Stein <gstein@lyra.org> writes:
> > At the end of this pass, we have every branch name in memory, and we
> > have a dbm file for each branch, indicating, for each filepath in the
> > repository, the range of Subversion revisions from which that branch
> > *could* be copied.
> >
> > Now we just have to loop over the branches, finding the "ideal"
> > Subversion revision, that is, the revision which if used to create the
> > branch, will necessitate the smallest number of manual secondary
> > copies, perhaps even zero.
>
> Yup, and this algorithm is independent of the file names, which is why I
> suggest losing that part of the data structure and the need for the dbm.

We don't need the paths at this final stage; but we need them in order
to build up the information to get to this stage. To determine the
range of Subversion revisions in which a given CVS file revision
lives, we have to keep track of the individual file changes -- because
it's the arrival of the "next" CVS commit on that file that allows us
to fix the endpoint of the range.

Sure, after we have all these ranges, we can determine the "best"
Subversion revision for the copy. But to get the ranges, we need to
essentially use the paths as keys in a hash table.
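
Concretely, the bookkeeping looks something like the following. (This
is just an illustrative Python sketch with made-up names, not the
actual cvs2svn code -- the real data lives in a dbm file per branch,
as described above.)

    # Each filepath is a hash key; its value is the range of Subversion
    # revisions from which the branch could copy that file.
    open_ranges = {}    # filepath -> svn rev where the range starts (still open)
    closed_ranges = {}  # filepath -> (start_rev, end_rev), endpoint now fixed

    def branch_source_committed(path, svn_rev):
        """The CVS revision this branch sprouts from, for this file,
        just landed in Subversion revision svn_rev: open the range."""
        open_ranges[path] = svn_rev

    def next_cvs_commit(path, svn_rev):
        """The *next* CVS commit on this file has arrived, which is
        what lets us fix the endpoint of the range."""
        if path in open_ranges:
            start = open_ranges.pop(path)
            closed_ranges[path] = (start, svn_rev - 1)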

(I can explain the process in more detail, if it's not clear.)

> Your algorithm increases the count in all appropriate ranges for a given
> file. In mine, a file would only go into one range. This implies that the
> algorithm might select the "wrong" one (it went into a range referring to 20
> items rather than the big 1000 item range). The second pass would shift
> items into the larger range. Your algorithm does it in one pass.

Yah. The two-pass might still be faster than the one-pass, though.
Can you describe it briefly in writing? That will probably be enough
to stimulate my memory. I tried to re-derive it, but I was unable to
come up with something that would work correctly. Probably I was
forgetting some crucial step.
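
In the meantime, for concreteness, the one-pass counting I have in
mind is roughly this (again, just a sketch, not the real code):

    def best_copy_revision(ranges):
        """ranges: an iterable of (start_rev, end_rev) pairs, one per
        file.  Return the revision covered by the most ranges, i.e.
        the copy source needing the fewest secondary fix-up copies."""
        counts = {}
        for start, end in ranges:
            for rev in range(start, end + 1):
                counts[rev] = counts.get(rev, 0) + 1
        return max(counts, key=counts.get)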

> Let's also say that /trunk/some/dir is a directory with 11 items in it --
> one item from bucket (A) (named "fname0") and the other ten ("fname1"
> through "fname10") are all (B). The algorithm would perform the following
> operations:
>
> svn cp /trunk@103 /branches/BRANCH
> svn cp /trunk/some/dir/fname1@107 /branches/BRANCH/some/dir
> ...
> svn cp /trunk/some/dir/fname10@107 /branches/BRANCH/some/dir
>
> But the ideal behavior is:
>
> svn cp /trunk@103 /branches/BRANCH
> svn cp /trunk/some/dir@107 /branches/BRANCH/some/dir
> svn cp /trunk/some/dir/fname0@103 /branches/BRANCH/some/dir
>
> Beats the crap out of me how to do *that* algorithm :-), but I wanted to
> describe the scenario so that we can document the potential occurrence.

Yup :-). There will probably always be incremental improvements
possible. But in the meantime, a few extra copies won't hurt so much,
as long as we get the big initial copy right.
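
For what it's worth, the straightforward emission step is easy to
sketch: pick the best revision, do the one big copy, then fix up each
file whose range misses that revision. (Illustrative only -- it
assumes every path starts with /trunk, and it only does per-file
fix-ups, not the per-directory grouping your ideal sequence would
need.)

    def emit_copies(branch, best_rev, ranges):
        """ranges: dict of filepath -> (start_rev, end_rev)."""
        print("svn cp /trunk@%d /branches/%s" % (best_rev, branch))
        for path, (start, end) in sorted(ranges.items()):
            if not (start <= best_rev <= end):
                # This file's branched revision isn't in /trunk@best_rev,
                # so copy it individually from a revision in its own range.
                dest = "/branches/%s%s" % (branch, path[len("/trunk"):])
                print("svn cp %s@%d %s" % (path, start, dest))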

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Mar 10 21:53:58 2003
