
svn log slowness

From: Johan Corveleyn <johan.corveleyn_at_uz.kuleuven.ac.be>
Date: Wed, 29 Apr 2009 16:28:24 +0200

I know that this isn't earth-shattering news, and it's not really a show-stopper, but it's bugging me: "svn log" is slow.

We're in the process of migrating from CVS to SVN (1.5.4 on Solaris 10), and this is one of the issues that caught my attention. Don't get me wrong, I really like SVN, and I'm convinced it's a Good Thing for us to migrate. It's just one of those things that might annoy some people, especially since we request the log of some files quite often (in particular via "Show history" in IntelliJ).

Now, before you all bombard me with "use svnserve, it's much faster" or "make sure SVNPathAuthz is off in your Apache config": yes, I've read the mailing list archives, and no, that doesn't help. I have tried both svnserve and mod_dav_svn, both with SVNPathAuthz on and off (and with/without SSL). I have also tried local access via the file:// protocol. There are some small differences, but nothing remotely serious.

More concretely: I'm asking for the log of a particular file that we change a lot during development. It's an XML file of about 2 MB that has had about 5500 changes (!) over the years. It's been there since r1210 and had its last change in r95848; it is still changed several times a day. Asking for the log of a file with a small number of changes (say < 100) is fine; the problem only shows up when a file has thousands of changes.

Some info about my repo:
SVN 1.5.4 on Solaris 10 (used package from sunfreeware.com)
Server hardware: Sun SPARC-Enterprise-T5120 with 32 processors (I don't think that matters in this test, since we'll exercise only 1 processor)
FSFS backend mounted via NFS from a NetApp device (supposedly very high-end equipment)
~95000 revisions
~70000 files
~3.5 GB disk usage by the repository ("du -rsh <repo>")
Some statistics from a couple of tests (all executed on the SVN server itself with its svn command line client, or via SlikSVN command line client from my remote windows machine; doesn't make much of a difference):
1) file://
   3m45s ("warmup"; I'm guessing some of the I/O gets cached hereafter)
2) http (SVNPathAuthz on)
   (executed shortly after the file:// test, so we see no warmup effect)
3) https (SVNPathAuthz off)
   (executed shortly after the http test, so we see no warmup effect)
4) svn+ssh
   3m30s (warmup again; I tested this an hour later)
5) CVS log for the same file (for comparison)
- In each test, I redirected all output to /dev/null, to eliminate any performance impact of writing to stdout.
- About the "warmup": I guess this is because the operation is mainly I/O-bound on the server. The first time, all the revisions (or revprops?) need to be read from disk; the second time, they are (at least partly) in the disk cache.
- I'm guessing that the SVNPathAuthz accounts for the extra 12-15 seconds in the case of the http test (compared to the https test without SVNPathAuthz).
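For reference, the timings above can be reproduced with commands along these lines (the repository URL and file path below are placeholders, not the actual ones from my setup):

```shell
# Time "svn log" over each access method, discarding output so that
# writing to stdout doesn't skew the numbers. Run twice to see the
# warmup effect (first run cold, second run with warm disk caches).
time svn log file:///var/svn/repo/trunk/big-file.xml > /dev/null
time svn log https://svn.example.com/repo/trunk/big-file.xml > /dev/null

# The CVS equivalent for comparison, run inside a CVS working copy:
time cvs log big-file.xml > /dev/null
```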
So the SVN record is 1m23s, over https (no SVNPathAuthz), and only after the "warmup" (i.e. if you're lucky that someone else requested the same log less than an hour ago). If you take the "warmup" hit, you have to wait at least 3.5 minutes for the full log. Compare that to the 5 seconds max we were used to with CVS. Just to reiterate: that doesn't mean we're ditching SVN in favor of CVS (quite the contrary), but it still hurts :(.
Are these normal numbers? Is anyone else seeing these sorts of figures for files with thousands of changes in a large repo? Is there anything I can do about this? Any suggestions for further diagnosis or options I can try?
I guess I can still try BDB, or FSFS with the repo on a local disk (not really an option for us in the longer term), to eliminate some potential bottlenecks. Or test with 1.6.1 and packed shards (or maybe memcached; does anyone have experience with that?). However, I feel these will give me only minor improvements at best (certainly not the factor of 50 I'd like). Gut feeling tells me this is really a limitation of the way SVN currently works. Compared to CVS, which just has to extract and send part of the RCS file, I guess SVN has to crawl the entire repository to get all the info.
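If I do go down that route, I assume the local-disk and packed-shards experiments would look roughly like this (a sketch; the repository paths are placeholders, and "svnadmin pack" requires the repository to be at the 1.6 FSFS format):

```shell
# Copy the repository off NFS onto local disk for a comparison run.
svnadmin hotcopy /nfs/svn/repo /local/svn/repo-test

# With a 1.6 client/server: upgrade the format, then pack completed
# shards into single files to reduce per-revision file opens.
svnadmin upgrade /local/svn/repo-test
svnadmin pack /local/svn/repo-test
```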
I seem to remember reading some discussion about caching some of this per-file metadata in the SVN repository to speed up operations like this (sorry, I can't find it again, but I think it was in the comments of some bug report). However, I think the caching idea was rejected. If such caching would solve this issue, I'd really like to see it appear in SVN. Or are there other dev ideas that could improve this?
Oh, and one more thing: some "workarounds" I've kind of eliminated:
- Using "--limit 100" and the like: not really an option for me, because we use it via the IntelliJ IDE (its Subversion plugin uses SVNKit), so we have no direct access to the commands IntelliJ executes.
- I guess we could rename ("svn mv") the current file to "file.old.2009" and start a clean history with a new file. But this would be a pain: you'd sometimes have to look at the log of file.old.2009 to get the right info, and after a year the new file would again have 1000 revisions, so its "svn log" would start to slow down again.
Thanks for reading, and sorry for the long post (I felt I could just as well give most details immediately).
Received on 2009-04-29 16:29:45 CEST

This is an archived mail posted to the Subversion Users mailing list.