Re: cvs2svn

From: Bob Miller <kbob_at_jogger-egg.com>
Date: 2001-04-17 00:26:57 CEST

Greg Stein wrote:

> When you're sorting the information into bins, you have no concept of a
> group at that point, so you cannot see something spanning bins. You would
> need to detect the case in the next step, where you're identifying groups.
> But then you have the nasty situation that your changeset is in two
> different bins.

Nah. Simply fetch all the commits through an iterator that knows how
to move on to the next bin. The commit grouping logic won't even
be aware that there's more than one bin.
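
A rough sketch of that idea in Python, not actual cvs2svn code. It assumes the bins are ordered by time and that each bin is already sorted; the record fields ("key", "time"), the function names, and the 300-second window are all made up for illustration.

```python
def iter_commits(bins):
    """Yield commits from each bin in turn, hiding the bin boundaries."""
    for one_bin in bins:
        for commit in one_bin:
            yield commit


def group_commits(commits, window=300):
    """Group consecutive commits that share a key and are close in time.

    `commits` is any iterable of dicts with "key" and "time" entries
    (hypothetical fields). This function never sees the bins.
    """
    group = []
    for commit in commits:
        if group:
            last = group[-1]
            if commit["key"] != last["key"] or commit["time"] - last["time"] > window:
                yield group
                group = []
        group.append(commit)
    if group:
        yield group


# Two bins; the first changeset straddles the boundary between them.
bins = [
    [{"key": "a", "time": 0}, {"key": "a", "time": 10}],
    [{"key": "a", "time": 20}, {"key": "b", "time": 30}],
]
groups = list(group_commits(iter_commits(bins)))
```

Because the grouping code only ever pulls from a single iterator, a changeset that straddles two bins comes out as one group with no special handling.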

> But all of this is probably moot. It is premised on gsort rolling over on a
> file hundreds of megabytes in size. 1) we don't know what final log sizes
> will be for large repositories, 2) we don't know if gsort truly barfs (in
> fact, jimb just posted that he doesn't think it will).

My experience was that the logs were similar in size to the repository
they came from. But:

        a) I was doing it in memory, with overhead for pointers,
           malloc cruft, perl strings, etc.

        b) I was storing both the user names and the log messages
           in plain text. Using hashes will make a big difference.
           (comparisons will be faster, too.)
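
To illustrate point (b), here is a minimal sketch of replacing the plain-text user name and log message with a fixed-size digest. The choice of SHA-1, the NUL separator, and the name commit_key are assumptions for the example, not anything the conversion script is known to do.

```python
import hashlib


def commit_key(author, log_message):
    """Return a 20-byte digest standing in for (author, log message).

    The NUL byte between the fields keeps ("ab", "c") and ("a", "bc")
    from producing the same input to the hash.
    """
    h = hashlib.sha1()
    h.update(author.encode("utf-8"))
    h.update(b"\0")
    h.update(log_message.encode("utf-8"))
    return h.digest()


# The key has the same size no matter how long the log message is.
short_key = commit_key("kbob", "Fix typo.")
long_key = commit_key("kbob", "A much longer log message. " * 100)
```

Each record then carries 20 bytes instead of the full text, and comparing two commits means comparing two short byte strings.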

-- 
Bob Miller                              K<bob>
kbobsoft software consulting
http://kbobsoft.com                     kbob_at_jogger-egg.com
Received on Sat Oct 21 14:36:28 2006

This is an archived mail posted to the Subversion Dev mailing list.
