
Re: cvs2svn: raw or cooked?

From: Jim Blandy <jimb_at_savonarola.red-bean.com>
Date: 2000-09-23 19:09:45 CEST

I remember that RCS was "librarified" for use in CVS. Perhaps you
could use that library to parse the RCS files. I don't know how rich
its interfaces are, though.

It seems to me that the process should be something like:

1) Extract the log from every ,v file in every directory in the CVS
   repository.

2) Sort all those entries by commit time, preserving the filename,
   revision, and log entry.

   There's no way to avoid building this huge sorted list, if you want
   to be able to recognize commits made across several directories.
   But it'll be big. If you don't want to keep it all in memory, you
   could certainly put them in any database that supports in-order
   traversal. Berkeley DB does, and it has a Perl interface.

3) Working your way from oldest to youngest, look at commits that
   occur at approximately the same time that have approximately the
   same log message --- each such group constitutes a single commit.

   Figuring out exactly what "approximately" means will be an
   interesting challenge. I think a time fuzz of at least twenty
   minutes would be good, or however long a commit can take. Your log
   entry fuzz should refuse to draw any comparison between trivial log
   entries (empty or very short), to avoid grouping things into
   commits that don't belong together. It should probably ignore
   whitespace differences, etc. Cvs2cl has logic for this that people
   like.
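To make steps 2 and 3 concrete, here is a rough Python sketch of the
sort-and-group pass. Everything in it is illustrative rather than a
real interface: FileRev, normalize, group_commits, and the
MIN_LOG_LEN cutoff are made-up names and values, and it assumes step 1
has already produced one record per file revision. The time fuzz is
measured against the most recent entry in a group, and as an extra
assumption a file is not allowed to appear twice in one commit.

```python
from dataclasses import dataclass

TIME_FUZZ = 20 * 60   # seconds; the twenty-minute fuzz suggested above
MIN_LOG_LEN = 10      # assumed cutoff: shorter logs count as "trivial"


@dataclass
class FileRev:
    """One log entry extracted from a ,v file (hypothetical record type)."""
    time: int   # commit time, seconds since the epoch
    path: str   # path of the ,v file
    rev: str    # RCS revision number, e.g. "1.4"
    log: str    # log message as recorded


def normalize(log):
    """Collapse runs of whitespace so whitespace-only differences are ignored."""
    return " ".join(log.split())


def group_commits(entries):
    """Return lists of FileRev, each list being one guessed commit."""
    groups = []
    latest = {}  # normalized log -> most recent group carrying that log
    for e in sorted(entries, key=lambda e: e.time):   # oldest first
        key = normalize(e.log)
        trivial = len(key) < MIN_LOG_LEN
        g = None if trivial else latest.get(key)
        if (g is not None
                and e.time - g[-1].time <= TIME_FUZZ
                and all(x.path != e.path for x in g)):
            g.append(e)            # same log, close in time: same commit
        else:
            g = [e]                # start a new commit
            groups.append(g)
            if not trivial:
                latest[key] = g
    return groups
```

This version sorts in memory; with a Berkeley DB (or any store that
offers in-order traversal) the sorted() call would be replaced by a
cursor walk over keys ordered by commit time. Entries with trivial logs
always end up as single-file commits here, which is the conservative
choice.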

I'd suggest operating directly on the Subversion repository, using the
FS library. It'll be faster, and you'll have fewer components that
could introduce extraneous errors.

You'll need to recognize branches by comparing branch tags across
files, which will be challenging.
Received on Sat Oct 21 14:36:08 2006

This is an archived mail posted to the Subversion Dev mailing list.