
Re: Speeding up cvs2svn (was Re: cvs2svn takes very long time to execute (days!))

From: Tobias Ringström <tobias_at_ringstrom.mine.nu>
Date: 2004-02-16 01:44:47 CET

Roland Dreier wrote:
> kfogel> I feel funny about using in-memory hashes. cvs2svn.py
> kfogel> should scale well by default. Do you plan to
> kfogel> automagically switch to a disk database if the hash count
> kfogel> exceeds a certain magic number?
> Yes, I agree. Keeping the disk database but using a much larger cache
> gives a huge performance boost while still handling big repositories.

Sure, but you used a ram disk to store the databases, which is *a lot*
less efficient than avoiding the db altogether. Using on-disk bsddb
with a large cache makes a lot of sense, of course. Here are some numbers
from my smallish test case:

        Plain trunk cvs2svn    : 558 s
        bsddb with large cache : 347 s
        In-memory hash         : 110 s
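
As a quick check on what those timings mean in relative terms, the small
sketch below computes each variant's speedup over plain trunk. Only the
three timings come from the measurements above; the function name
`speedups` and the dictionary labels are made up for illustration.

```python
# Timings in seconds, copied from the table above.
TIMINGS = {
    "plain trunk": 558,
    "bsddb, large cache": 347,
    "in-memory hash": 110,
}


def speedups(timings, baseline):
    """Return how many times faster each variant is than `baseline`."""
    base = timings[baseline]
    return {name: round(base / secs, 2) for name, secs in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS, "plain trunk").items():
        print(f"{name:20s} {factor:.2f}x")
```

On this one test case the large cache is roughly 1.6 times faster than
trunk, and the in-memory hash roughly 5 times faster.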

This is just one repository. As soon as I'm convinced that the
in-memory code is correct, I'll post it so you can try it out yourself.
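
For illustration only, the automatic fallback asked about above (switch
to a disk database once the hash grows past some limit) could look
roughly like the sketch below. It is not the in-memory code mentioned
above and not anything taken from cvs2svn: the class name
`SpillingStore`, the `limit` parameter, and the use of Python's standard
`dbm.dumb` module in place of bsddb are all assumptions made to keep the
example self-contained.

```python
import dbm.dumb
import os
import tempfile


class SpillingStore:
    """Hypothetical store: a dict up to `limit` entries, then an on-disk db."""

    def __init__(self, path, limit=100000):
        self.path = path    # base file name for the on-disk database
        self.limit = limit  # the "magic number" of entries kept in memory
        self.mem = {}       # in-memory hash used while the data is small
        self.disk = None    # opened only once the limit is exceeded

    def _spill(self):
        # Copy every in-memory entry to a fresh disk database,
        # then stop using the dict.
        self.disk = dbm.dumb.open(self.path, "n")
        for key, value in self.mem.items():
            self.disk[key] = value
        self.mem = None

    def __setitem__(self, key, value):
        if self.disk is not None:
            self.disk[key] = value
            return
        self.mem[key] = value
        if len(self.mem) > self.limit:
            self._spill()

    def __getitem__(self, key):
        if self.disk is not None:
            return self.disk[key]
        return self.mem[key]

    def on_disk(self):
        return self.disk is not None

    def close(self):
        if self.disk is not None:
            self.disk.close()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    store = SpillingStore(os.path.join(workdir, "revisions"), limit=2)
    for key in (b"a", b"b", b"c"):
        store[key] = key.upper()
    print(store.on_disk(), store[b"a"])
    store.close()
```

Keys and values are bytes in both modes so that lookups behave the same
before and after the switch. Small conversions never touch the disk, and
large ones pay the database cost only after crossing the limit.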

> I sort of feel that refinecvs is the "keep everything in memory" tool,
> and cvs2svn is the "handle repositories too big for memory" tool.

For clarity, it's just the repository structure that you need to keep in
memory, not the file contents.


Received on Mon Feb 16 01:45:03 2004

This is an archived mail posted to the Subversion Dev mailing list.
