
Re: cvs2svn

From: Bob Miller <kbob_at_jogger-egg.com>
Date: 2001-04-16 23:58:11 CEST

Greg Stein wrote:

> I believe the sorting of individual revisions into groups of commits will be
> the slowest part. I'm sure they've optimized GNU sort quite a bit, but I've
> got to believe it will shudder when fed a file hundreds of megabytes in
> length. However, the primary key for that is a (hash, userid, time) tuple.
> We can do a preliminary bin-sort on the hash, using an arbitrary number of
> digits from it. For large repositories, you could end up dividing the
> average log size using three hex digits, which maps to 4096 bins. Your
> 400meg log file is now just a bunch of 100k files. Pump each through
> sort(1). The log scan process can then, effectively, do an insertion sort as
> it reads the N log files for processing.
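
For reference, the binning scheme quoted above might be sketched roughly as follows. This is only an in-memory illustration: the record layout (hash_hex, userid, time), the function name bin_sort, and the digits parameter are assumptions made for the example, and a real run would write each bin to its own file and pass it through sort(1).

```python
from collections import defaultdict


def bin_sort(records, digits=3):
    """Sort (hash_hex, userid, time) tuples by first binning on a hash prefix.

    Hypothetical helper: assumes every hash is a hex string in a consistent
    case, so that ordering by prefix agrees with ordering by the full hash.
    """
    bins = defaultdict(list)
    for rec in records:
        # Three hex digits give at most 16**3 = 4096 bins.
        bins[rec[0][:digits]].append(rec)

    ordered = []
    # Bins are visited in prefix order, so sorting each bin on its own
    # and concatenating the results yields a globally sorted sequence.
    for prefix in sorted(bins):
        ordered.extend(sorted(bins[prefix]))
    return ordered
```

Because the bin key is a prefix of the primary sort key, no merge step is needed in this sketch; the bins only have to be read back in prefix order.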

Sort by time as the primary key. You want to build the SVN repository in
chronological order anyway. As you traverse the sequence of CVS
commits in chronological order, group those that match the grouping
heuristic into a single SVN commit.
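
A minimal sketch of that traversal, under stated assumptions: the grouping heuristic is not spelled out here, so the example assumes that revisions belong together when they share a userid and log-message hash and each follows the previous one in its group within a fixed time window. The function name group_commits, the tuple layout, and the 300-second window are all hypothetical.

```python
def group_commits(revisions, window=300):
    """Group CVS revisions into candidate SVN commits in time order.

    revisions: iterable of (time, userid, log_hash, path) tuples.
    Assumed heuristic: same (userid, log_hash), and no more than
    `window` seconds after the last revision already in the group.
    """
    open_groups = {}  # (userid, log_hash) -> most recent group for that key
    commits = []      # groups, ordered by the time of their first revision

    # Time is the first tuple field, so a plain sort is chronological.
    for rev in sorted(revisions):
        time, userid, log_hash, _path = rev
        key = (userid, log_hash)
        group = open_groups.get(key)
        if group is not None and time - group[-1][0] <= window:
            group.append(rev)
        else:
            # No open group for this key, or the gap is too large.
            group = [rev]
            open_groups[key] = group
            commits.append(group)
    return commits
```

With time as the primary key, only the most recent group per (userid, log_hash) pair has to stay open, so a single pass over the sorted revisions is enough.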

-- 
Bob Miller                              K<bob>
kbobsoft software consulting
http://kbobsoft.com                     kbob_at_jogger-egg.com
Received on Sat Oct 21 14:36:28 2006

This is an archived mail posted to the Subversion Dev mailing list.
