
Re: Speeding up cvs2svn (was Re: cvs2svn takes very long time to execute (days!))

From: Tobias Ringström <tobias_at_ringstrom.mine.nu>
Date: 2004-02-16 01:44:47 CET

Roland Dreier wrote:
> kfogel> I feel funny about using in-memory hashes. cvs2svn.py
> kfogel> should scale well by default. Do you plan to
> kfogel> automagically switch to a disk database if the hash count
> kfogel> exceeds a certain magic number?
>
> Yes, I agree. Keeping the disk database but using a much larger cache
> gives a huge performance boost while still handling big repositories.

Sure, but you used a RAM disk to store the databases, which is *a lot*
less efficient than avoiding the db altogether. Using on-disk bsddb
with a large cache makes a lot of sense, of course. Here are some numbers
from my smallish test case:

        Plain trunk cvs2svn    : 558 s
        bsddb with large cache : 347 s
        In-memory hash         : 110 s

This is just one repository. As soon as I'm convinced that the
in-memory code is correct, I'll post it so you can try it out yourself.
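
For illustration only, here is a rough sketch of the automatic fallback
kfogel asks about: keep the hash in memory and move it to a disk
database once it grows past a limit. This is not the cvs2svn code. The
class name SpillingStore, the limit parameter and the use of Python's
dbm.dumb module (in place of bsddb) are all assumptions made to keep the
example self-contained.

```python
import dbm.dumb
import os
import tempfile


class SpillingStore:
    """Dict-like store (hypothetical): in memory up to `limit` keys, then on disk."""

    def __init__(self, path, limit=100_000):
        self._path = path      # where the disk database is created if needed
        self._limit = limit    # the "magic number" of keys held in memory
        self._mem = {}
        self._db = None        # opened lazily, only after the limit is exceeded

    @property
    def on_disk(self):
        return self._db is not None

    def __setitem__(self, key, value):
        if self._db is not None:
            self._db[key] = value
            return
        self._mem[key] = value
        if len(self._mem) > self._limit:
            self._spill()

    def __getitem__(self, key):
        if self._db is not None:
            return self._db[key]
        return self._mem[key]

    def _spill(self):
        # Copy everything collected so far into the disk database,
        # then release the in-memory hash.
        self._db = dbm.dumb.open(self._path, "c")
        for key, value in self._mem.items():
            self._db[key] = value
        self._mem = {}

    def close(self):
        if self._db is not None:
            self._db.close()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    store = SpillingStore(os.path.join(workdir, "revisions"), limit=2)
    for key in (b"r1", b"r2", b"r3"):
        store[key] = b"metadata for " + key
    print(store.on_disk, store[b"r1"])
    store.close()
```

Keys and values are bytes in both modes so that lookups behave the same
before and after the switch. Small repositories never touch the disk,
and large ones still complete, at the cost of one copy when the limit is
crossed.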

> I sort of feel that refinecvs is the "keep everything in memory" tool,
> and cvs2svn is the "handle repositories too big for memory" tool.

For clarity, it's just the repository structure that you need to keep in
memory, not the file contents.

/Tobias

Received on Mon Feb 16 01:45:03 2004

This is an archived mail posted to the Subversion Dev mailing list.
