
Re: cvs2svn: raw or cooked?

From: Greg Stein <gstein_at_lyra.org>
Date: 2000-09-26 16:29:53 CEST

I've got a Python script that pulls apart a ,v file, and Jay Painter has
been threatening to write one in C. These are BSD licensed.

But given that Bob has already written a bunch of Perl, I don't see that the
Python code will be too handy :-)

btw, I agree with Jim: crawl the ,v files to get all the data; using cvs
commands just doesn't feel right and it doesn't seem like you could really
get at all the data. (but I'll admit I don't know that for sure)

It might also be nice to take this moment to define the XML interchange
format for an SVN repository. If the tool were cvs2xml, then we
could do "svn import-xml my-cvs-repository.xml". And anybody could do "svn
export-xml ..." followed by "svn import-xml" somewhere else.

Just to make your job harder... :-)

Cheers,
-g

On Mon, Sep 25, 2000 at 11:38:39AM -0500, Karl Fogel wrote:
> And note that the copyright on cvs2svn doesn't have to be the same as
> the copyright on SVN itself, because it's an independent program on
> which the rest of Subversion does not depend. If it helps a lot to
> use GPL'd code, such as the librarified RCS in CVS, then that should
> be okay.
>
> -K
>
> Jim Blandy <jimb@savonarola.red-bean.com> writes:
> > I remember that RCS was "librarified" for use in CVS. Perhaps you
> > could use that library to parse the RCS files. I don't know how rich
> > its interfaces are, though.
> >
> > It seems to me that the process should be something like:
> >
> > 1) Extract the log from every ,v file in every directory in the CVS
> > repository.
> >
> > 2) Sort all those entries by commit time, preserving the filename,
> > revision, and log entry.
> >
> > There's no way to avoid building this huge sorted list, if you want
> > to be able to recognize commits made across several directories.
> > But it'll be big. If you don't want to keep it all in memory, you
> > could certainly put them in any database that supports in-order
> > traversal. Berkeley DB does, and it has a Perl interface.
> >
> > 3) Working your way from oldest to youngest, look at commits that
> > occur at approximately the same time that have approximately the
> > same log message --- each such group constitutes a single commit.
> >
> > Figuring out exactly what "approximately" means will be an
> > interesting challenge. I think a time fuzz of at least twenty
> > minutes would be good, or however long a commit can take. Your log
> > entry fuzz should refuse to draw any comparison between trivial log
> > entries (empty or very short), to avoid grouping things into
> > commits that don't belong together. It should probably ignore
> > whitespace differences, etc. Cvs2cl has logic for this that people
> > like.
> >
> > I'd suggest operating directly on the Subversion repository, using the
> > FS library. It'll be faster, and you'll have fewer components to
> > provide extraneous errors.
> >
> > You'll need to recognize branches by comparing branch tags.
> > Challenging.
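
A rough sketch of the sort-and-group steps Jim describes (steps 2 and 3),
written in Python only for illustration. Nothing here comes from an existing
cvs2svn tool: the FileRevision record, the MIN_LOG_LENGTH cutoff, and the
rule that one file appears at most once per commit are all assumptions.
"Approximately the same log message" is simplified to equality after
collapsing whitespace, and the time fuzz is measured from the most recent
revision already in a group.

```python
from dataclasses import dataclass
from typing import Iterable, List

TIME_FUZZ_SECONDS = 20 * 60  # "at least twenty minutes", per the message
MIN_LOG_LENGTH = 10          # assumption: shorter logs are "trivial"


@dataclass(frozen=True)
class FileRevision:
    """One log entry pulled from a ,v file (hypothetical record type)."""
    path: str       # path of the ,v file
    revision: str   # RCS revision number, e.g. "1.4"
    timestamp: int  # commit time, seconds since the epoch
    log: str        # log message


def normalize_log(log: str) -> str:
    """Collapse runs of whitespace so only the words are compared."""
    return " ".join(log.split())


def same_commit(group: List[FileRevision], rev: FileRevision) -> bool:
    """Decide whether rev belongs to the commit represented by group."""
    if rev.timestamp - group[-1].timestamp > TIME_FUZZ_SECONDS:
        return False
    ours = normalize_log(group[0].log)
    theirs = normalize_log(rev.log)
    # Refuse to draw any comparison between trivial log entries.
    if len(ours) < MIN_LOG_LENGTH or len(theirs) < MIN_LOG_LENGTH:
        return False
    # Assumption: a single commit touches each file at most once.
    if any(member.path == rev.path for member in group):
        return False
    return ours == theirs


def group_commits(revisions: Iterable[FileRevision]) -> List[List[FileRevision]]:
    """Sort entries oldest to youngest and merge them into commit groups."""
    groups: List[List[FileRevision]] = []
    open_groups: List[List[FileRevision]] = []
    ordered = sorted(revisions, key=lambda r: (r.timestamp, r.path, r.revision))
    for rev in ordered:
        # Groups that fell outside the time fuzz can no longer grow.
        open_groups = [
            g for g in open_groups
            if rev.timestamp - g[-1].timestamp <= TIME_FUZZ_SECONDS
        ]
        for group in open_groups:
            if same_commit(group, rev):
                group.append(rev)
                break
        else:
            new_group = [rev]
            groups.append(new_group)
            open_groups.append(new_group)
    return groups
```

Keeping several groups open at once lets interleaved commits from different
people sort themselves out. This version sorts in memory; for a large
repository the sorted list would instead live in a database with in-order
traversal, as suggested above.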

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:09 2006

This is an archived mail posted to the Subversion Dev mailing list.