Re: cvs2svn: raw or cooked?

From: Jim Blandy <jimb_at_savonarola.red-bean.com>
Date: 2000-09-23 19:09:45 CEST

I remember that RCS was "librarified" for use in CVS. Perhaps you
could use that library to parse the RCS files. I don't know how rich
its interfaces are, though.

It seems to me that the process should be something like:

1) Extract the log from every ,v file in every directory in the CVS
repository.

2) Sort all those entries by commit time, preserving the filename,
revision, and log entry.

   There's no way to avoid building this huge sorted list, if you want
   to be able to recognize commits made across several directories.
   But it'll be big. If you don't want to keep it all in memory, you
   could certainly put the entries in any database that supports
   in-order traversal. Berkeley DB does, and it has a Perl interface.

3) Working your way from oldest to youngest, look for revisions that
occur at approximately the same time and have approximately the
same log message --- each such group constitutes a single commit.

   Figuring out exactly what "approximately" means will be an
   interesting challenge. I think a time fuzz of at least twenty
   minutes would be good, or however long a commit can take. Your log
   entry fuzz should refuse to draw any comparison between trivial log
   entries (empty or very short), to avoid grouping things into
   commits that don't belong together. It should probably ignore
   whitespace differences, etc. Cvs2cl has logic for this that people
   like.
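
As a rough illustration of steps 2 and 3, here is a minimal sketch in
Python. Everything in it is an assumption made for the example, not part
of any existing tool: the Rev record, the names TIME_FUZZ and
MIN_LOG_LEN, and the threshold of ten characters for a "trivial" log
entry. Exact comparison after collapsing whitespace stands in for
"approximately the same log message", the time fuzz is measured from the
latest revision already in the group, and a file is never added twice to
one group.

```python
from dataclasses import dataclass
from typing import Dict, List

TIME_FUZZ = 20 * 60  # seconds; twenty minutes, as suggested above
MIN_LOG_LEN = 10     # assumed cutoff: shorter logs are never compared


@dataclass
class Rev:
    time: int      # commit time, seconds since the epoch
    path: str      # path of the ,v file
    revision: str  # RCS revision number, e.g. "1.4"
    log: str       # log entry for this revision


def normalize(log: str) -> str:
    """Collapse whitespace so formatting differences are ignored."""
    return " ".join(log.split())


def group_commits(revs: List[Rev]) -> List[List[Rev]]:
    """Group per-file revisions into probable multi-file commits."""
    groups: List[List[Rev]] = []
    # Normalized log message -> most recent group with that message.
    open_groups: Dict[str, List[Rev]] = {}
    for rev in sorted(revs, key=lambda r: r.time):  # oldest first
        key = normalize(rev.log)
        comparable = len(key) >= MIN_LOG_LEN
        group = open_groups.get(key) if comparable else None
        if (group is not None
                and rev.time - group[-1].time <= TIME_FUZZ
                and all(r.path != rev.path for r in group)):
            group.append(rev)
        else:
            # Start a new commit; trivial logs always stand alone.
            group = [rev]
            groups.append(group)
            if comparable:
                open_groups[key] = group
    return groups
```

For a large repository the sorted() call could be replaced by an
in-order traversal of whatever database holds the entries; the grouping
loop only needs them oldest first.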

I'd suggest operating directly on the Subversion repository, using the
FS library. It'll be faster, and you'll have fewer components that
could introduce extraneous errors.

You'll need to recognize branches by comparing branch tags.
Challenging.
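
One possible starting point, sketched in Python: CVS records a branch
tag in a file's RCS symbols as a "magic" revision number with a 0 in the
second-to-last position (1.4.0.2), and a vendor branch as a number with
an odd count of components (1.1.1). The function names and the
dictionary layout below are hypothetical, and the sketch assumes the
symbols have already been parsed out of each ,v file. It only collects
the files that carry each branch tag name; deciding which of those
really form one branch is the hard part and is not attempted here.

```python
from typing import Dict, List


def is_branch_number(number: str) -> bool:
    """True if an RCS symbol's number names a branch, not a revision."""
    parts = number.split(".")
    if len(parts) % 2 == 1:
        # Odd component count, e.g. the vendor branch 1.1.1.
        return len(parts) >= 3
    # CVS magic branch number, e.g. 1.4.0.2.
    return len(parts) >= 4 and parts[-2] == "0"


def files_by_branch_tag(
        files: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each branch tag name to the sorted files that carry it.

    `files` maps a ,v path to that file's symbols (tag name -> number).
    """
    result: Dict[str, List[str]] = {}
    for path, symbols in files.items():
        for name, number in symbols.items():
            if is_branch_number(number):
                result.setdefault(name, []).append(path)
    return {name: sorted(paths) for name, paths in result.items()}
```
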
Received on Sat Oct 21 14:36:08 2006

Maybe in reply to: Bob Miller: "cvs2svn: raw or cooked?"