I'm currently looking at converting our large (~350 MB) CVS repository to
Subversion, learning Subversion along the way.
cvs2svn happily produces a dump file containing ~14000 transactions:
> -rw-r--r-- 1 oliver ocstaff 708319348 Jul 16 18:53 cvs2svn-dump
Loading it with 'svnadmin load' is hideously slow, taking over nine hours:
> oliver@cyclone:~/svn-test$ svnadmin create repo-sync
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-sync <cvs2svn-dump
> real 561m59.668s
> user 14m1.379s
> sys 2m4.799s
OK, so I tried --bdb-txn-nosync:
> oliver@cyclone:~/svn-test$ svnadmin create --bdb-txn-nosync repo-no-sync
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-no-sync <cvs2svn-dump
> real 146m49.972s
> user 13m3.273s
> sys 1m36.818s
Better, but still very disk-bound. Some digging with lsof and strace showed
that fsync() is still being called on the Berkeley DB log files.
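The `time` figures above make "disk-bound" concrete: user plus sys CPU time is only a small fraction of the wall-clock time, so the process spent most of the load not running on a CPU, which is consistent with waiting on the disk. A small sketch that computes that fraction from the quoted `time` output (the helper names `to_seconds` and `cpu_share` are hypothetical):

```shell
#!/bin/sh
# Convert a `time` field such as "561m59.668s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print (user + sys) as a percentage of real, to one decimal place.
# Arguments: REAL USER SYS, each in the "XmY.Zs" form printed by `time`.
cpu_share() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", 100 * (u + s) / r }'
}

echo "default load:     $(cpu_share 561m59.668s 14m1.379s 2m4.799s)% CPU"
echo "--bdb-txn-nosync: $(cpu_share 146m49.972s 13m3.273s 1m36.818s)% CPU"
```

This prints 2.9% for the default load and 10.0% for the --bdb-txn-nosync load, so roughly 97% and 90% of the elapsed time, respectively, was spent off the CPU.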
I experimented a bit with other DB options and ended up with this:
> oliver@cyclone:~/svn-test$ svnadmin create --bdb-txn-nosync repo-no-log
> oliver@cyclone:~/svn-test$ echo "set_flags DB_TXN_NOT_DURABLE" >>repo-no-log/db/DB_CONFIG
> oliver@cyclone:~/svn-test$ svnadmin recover repo-no-log
> Please wait; recovering the repository may take some time...
> Recovery completed.
> The latest repos revision is 0.
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-no-log <cvs2svn-dump
> real 26m40.620s
> user 12m40.711s
> sys 1m9.318s
That's more like what I originally expected!
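The create / DB_CONFIG / recover / load sequence above can be wrapped into a single function that also takes the flag out again once the load has finished. This is a minimal sketch, not an svnadmin feature: `fast_load`, `enable_not_durable`, `disable_not_durable` and the `SVNADMIN` override are hypothetical names. It assumes, without having checked this against Berkeley DB 4.2 or Subversion 1.0.5, that DB_CONFIG changes take effect on `svnadmin recover` and that deleting the DB_TXN_NOT_DURABLE line and recovering again returns the repository to normal, durable logging. Like the commands above, it keeps the --bdb-txn-nosync setting.

```shell
#!/bin/sh
# Sketch only: turn off Berkeley DB log durability for the duration of an
# initial `svnadmin load`, then remove the flag again.
# ASSUMPTIONS (unverified): flags in db/DB_CONFIG take effect after
# `svnadmin recover`, and removing the flag plus another recover restores
# normal durable logging.

SVNADMIN=${SVNADMIN:-svnadmin}   # hypothetical override, e.g. for a dry run
FLAG="set_flags DB_TXN_NOT_DURABLE"

# Append the flag to REPO/db/DB_CONFIG unless an identical line exists.
enable_not_durable() {
    grep -qx "$FLAG" "$1/db/DB_CONFIG" || echo "$FLAG" >> "$1/db/DB_CONFIG"
}

# Drop every line that is exactly the flag; other settings are kept.
disable_not_durable() {
    grep -vx "$FLAG" "$1/db/DB_CONFIG" > "$1/db/DB_CONFIG.tmp"
    mv "$1/db/DB_CONFIG.tmp" "$1/db/DB_CONFIG"
}

# fast_load REPO DUMPFILE (hypothetical helper, not an svnadmin option).
# Stops at the first failing step; after a failure the half-loaded
# repository is meant to be deleted and the load restarted from scratch.
fast_load() {
    "$SVNADMIN" create --bdb-txn-nosync "$1" &&
    enable_not_durable "$1" &&
    "$SVNADMIN" recover "$1" &&
    "$SVNADMIN" load -q "$1" < "$2" &&
    disable_not_durable "$1" &&
    "$SVNADMIN" recover "$1"
}
```

Usage would be `fast_load repo-no-log cvs2svn-dump`. Running `svnadmin verify` on the result before putting it into service is a sensible extra check.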
The system these all ran on (cyclone) is a dual Athlon MP 2800+ with 2 GB of
RAM. The OS is Debian stable with a 2.6.5 Linux kernel, and Subversion
is 1.0.5 as packaged in Debian unstable:
> ||/ Name Version Description
> ii subversion 1.0.5-1 Advanced version control system (aka. svn)
> ii libdb4.2 4.2.52-16 Berkeley v4.2 Database Libraries [runtime]
The Subversion repositories are on an ext3 filesystem on a commodity IDE
disk with the disk's write caching disabled.
So, some questions:
1) Is using DB_TXN_NOT_DURABLE during the initial load a sane thing to
do? I don't care about recovering from failures during the load at all;
I'd just restart from scratch if something did go wrong.
2) Is it normal for fsync() to still be called when --bdb-txn-nosync is in use?
3) Would an option to use DB_TXN_NOT_DURABLE for the duration of an
'svnadmin load' be a good idea?
Received on Sat Jul 17 03:32:39 2004