Re: cvs2svn patch in the works...

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-10-24 04:36:04 CEST

On Mon, Oct 21, 2002 at 08:27:53PM +0200, Marko Macek wrote:
>...
> - the CONFLICTS: output is strange.

Yah... I'm seeing that now when I run the tool, too. I think something in
the bindings changed and broke it (meaning something in the .i files needs
to be fixed).

> - needs an option to convert a single branch only, to a specified PATH.
> - it would be nice to have an option to specify a module, resulting in
> paths like $MODULES/{trunk,tags,branches}

My vision for this is to define a .conf file that can be used to fine-tune
the cvs2svn process. The tool should be able to provide defaults for
everything, but allow a human to refine what happens via a .conf file.
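
As a rough illustration of that "defaults plus overrides" idea, here is a
minimal sketch in present-day Python. Nothing like this exists in cvs2svn;
the section name, the option names, and load_options() are all made up
for the example.

```python
import configparser

# Hypothetical defaults; these option names are illustrative only and are
# not real cvs2svn settings.
DEFAULTS = {
    "trunk_base": "trunk",
    "tags_base": "tags",
    "branches_base": "branches",
    "module": "",
}

def load_options(conf_text=""):
    """Return conversion options: built-in defaults, overridden by conf_text."""
    parser = configparser.ConfigParser()
    # Seed the parser with the defaults, then let the .conf text override them.
    parser.read_dict({"cvs2svn": DEFAULTS})
    parser.read_string(conf_text)
    return dict(parser["cvs2svn"])

# A human only has to spell out what differs from the defaults.
options = load_options("[cvs2svn]\nmodule = myproject\n")
```

With no .conf text at all, load_options() simply returns the defaults, so
the tool still runs unattended.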

>...
> +def get_key(f, r):
> +  key = f + "--" + r
> +  if g_keys.has_key(key):
> +    return g_keys[key]
> +  g_keys[key] = key
> +  return key

What is this about? Why not:

  def get_key(f, r):
    return f + '--' + r

?? Are you trying to do string interning? That a given key will use the same
object? If so, then you could just as well do:

  def get_key(f, r):
    return intern(f + '--' + r)

which works all the way back to Python 1.5.
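
For anyone trying this on a current interpreter: in Python 3 the intern()
builtin lives in the sys module as sys.intern(). A small sketch of the
same idea, with made-up file names, showing that two equal keys built
separately end up as one object:

```python
import sys

def get_key(f, r):
    # sys.intern() is the Python 3 spelling of the old intern() builtin.
    return sys.intern(f + '--' + r)

# Build the same key twice from separately constructed strings
# (the path and revision are example values only).
a = get_key('/cvsroot/proj/foo.c,v', '1.2')
b = get_key('/cvsroot/proj/' + 'foo.c,v', '1.' + '2')

# Interning guarantees that equal keys share a single string object.
same_object = a is b
```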

However, neither approach will scale properly. It seems that you'll be
creating keys for practically every file *and* revision that is found in the
conversion process. That isn't going to work :-)

>...
> @@ -315,12 +386,14 @@
> for f, r in self.changes:
> # compute a repository path. ensure we have a leading "/" and drop
> # the ,v from the file name
> - repos_path = '/' + relative_name(ctx.cvsroot, f[:-2])
> + base_name = get_branch_path(ctx, f, r)
> + repos_path = base_name + '/' + relative_name(ctx.cvsroot, f[:-2])
> print ' changing %s : %s' % (r, repos_path)
> for f, r in self.deletes:
> # compute a repository path. ensure we have a leading "/" and drop
> # the ,v from the file name
> - repos_path = '/' + relative_name(ctx.cvsroot, f[:-2])
> + base_name = get_branch_path(ctx, f, r)
> + repos_path = base_name + '/' + relative_name(ctx.cvsroot, f[:-2])

These changes, where the base paths are set, would be good to have as a
separate patch; we could get that applied straight away.

>...
> @@ -444,6 +519,87 @@
> print ' CONFLICTS:', `conflicts`
> print ' new revision:', new_rev
>
> + # make a new transaction for the tags
> + rev = fs.youngest_rev(t_fs, c_pool)
> + txn = fs.begin_txn(t_fs, rev, c_pool)
> + root = fs.txn_root(txn, c_pool)
> +
> + for f, r in self.changes:
> + tag_key = get_key(f, r)
> + if g_tags.has_key(tag_key):
> + for tag in g_tags[tag_key]:
>...
> + fs.copy(t_root, repos_path, root, tag_path, f_pool)

This algorithm implies that each and every file within a specified tag
directory originates as a copy from somewhere else. That is, if you have
100,000 files in your CVS repository, and you have 100 tags, then you are
constructing 10 million copies in the SVN repository.

While SVN might (technically) be all right with that, I find it to be a bit
less than ideal... (I have this bad feeling that certain types of log
operations, or tracking change history, or something might be adversely
affected).

I believe the right approach to generate a tag is to find the single SVN
revision that most closely matches the set of files/revisions specified by
the CVS tag. Hopefully, that would be one revision. But since it is possible
that a CVS tag can be arbitrarily composed of files/revs from various points
in time, this won't always happen. For these cases, the ideal is to
find the smallest number of copies of files/dirs/revs needed to construct
the tag, and to commit that.

I'm not entirely sure what the algorithm is, but part of the reason for the
multiple passes is to enable analytic processes like this, yet avoid a huge
monolithic operation.
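
To make the goal concrete, here is a deliberately naive sketch, not a
proposed algorithm. It assumes we somehow have, for each SVN revision, a
map of which CVS revision of each file it contains (the "snapshots"
argument, which is hypothetical and would itself be far too big to keep
around for a real repository). It picks the revision with the fewest
mismatches against the tag and reports the files that would still need an
individual copy or removal. It works per file only and ignores directories.

```python
def plan_tag(tag_files, snapshots):
    """Pick the SVN revision closest to a CVS tag, plus the leftover fix-ups.

    tag_files: {path: cvs_rev} as named by the CVS tag.
    snapshots: {svn_rev: {path: cvs_rev}} (hypothetical per-revision state).
    Returns (best_svn_rev, fixups), where fixups lists the paths that
    still differ from the tag after copying best_svn_rev.
    """
    def mismatches(state):
        # Paths whose revision differs, or that exist on only one side.
        paths = set(tag_files) | set(state)
        return sorted(p for p in paths if tag_files.get(p) != state.get(p))

    # Fewest mismatches wins; ties go to the earliest revision.
    best_rev = min(snapshots,
                   key=lambda rev: (len(mismatches(snapshots[rev])), rev))
    return best_rev, mismatches(snapshots[best_rev])

# Example data (invented): three SVN revisions of a two-file project.
snapshots = {
    1: {'a.c': '1.1', 'b.c': '1.1'},
    2: {'a.c': '1.2', 'b.c': '1.1'},
    3: {'a.c': '1.2', 'b.c': '1.2'},
}
clean = plan_tag({'a.c': '1.2', 'b.c': '1.1'}, snapshots)
mixed = plan_tag({'a.c': '1.1', 'b.c': '1.2'}, snapshots)
```

In the "clean" case the tag matches revision 2 exactly, so one copy is
enough. In the "mixed" case no single revision matches, so the result is
one base copy plus a single per-file fix-up, rather than one copy per file.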

>...
> + for f, r in self.changes:
> + branch_key = get_key(f, r)
> + if g_branches.has_key(branch_key):

Note that the branches have the same pattern of tons of copies.

>...
> + if g_tags != {}:
> + print "warning: remaining tags: ", repr(g_tags)
> + if g_branches != {}:

The comparison isn't necessary. Simply saying "if g_tags:" means "is this
dictionary non-empty?"
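
A small sketch of the truthiness form. The helper name and the "remaining
branches" message are assumptions for the example; the branch message is
cut off in the quoted patch.

```python
def leftover_warnings(tags, branches):
    """Collect warnings for any tags/branches that were never consumed."""
    warnings = []
    if tags:        # an empty dict is false, a non-empty one is true
        warnings.append("warning: remaining tags: " + repr(tags))
    if branches:    # message text assumed; it is elided in the patch
        warnings.append("warning: remaining branches: " + repr(branches))
    return warnings

nothing_left = leftover_warnings({}, {})
one_left = leftover_warnings({}, {'foo.c,v--1.2.2.1': ['BRANCH_1']})
```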

I'm glad you're tackling this! It is a very difficult problem... :-(

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Thu Oct 24 04:36:06 2002
