[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

svn log on FSFS - disk IO for revs/revprops

From: Johan Corveleyn <johan.corveleyn_at_uz.kuleuven.ac.be>
Date: Fri, 26 Jun 2009 01:42:50 +0200

Hi all,

Disclaimer: if this isn't the right forum to discuss this, please let me know. But I don't think this is a "users" kind of question, so I'm addressing the dev list. Also: I don't know much about svn internals, so forgive me for making any false assumptions or stupid suggestions ...

I have looked at some of the disk IO that the svn server performs when it's asked for an "svn log" (I wanted to see what we could do on OS/disk/NFS level to improve it). For this, we used rwsnoop (http://www.manpagez.com/man/1/rwsnoop/), while running "svn log" for a file with lots of changes.

Server:
1.5.4 on Solaris 10 (binaries from sunfreeware.com)
FSFS backend on NFS mounted volume

Exact client command: "svn log URL/file_with_lots_of_changes"
The file has had 5500 revisions. It takes ~4 minutes to retrieve its entire log.

Here is an excerpt of rwsnoop output:
------
UID PID CMD D BYTES FILE
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 64 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 276 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 276 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 412 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 572 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 3699 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 3871 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 131 /path/to/repos/db/revprops/58/58301
24435 9896 httpd R 64 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 275 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 275 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 411 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 570 /path/to/repos/db/revs/58/58258
24435 9896 httpd R 3694 /path/to/repos/db/revs/58/58258
...
24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 118 /path/to/repos/db/revprops/58/58258
24435 9896 httpd R 64 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 275 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 275 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 411 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 570 /path/to/repos/db/revs/58/58255
24435 9896 httpd R 3694 /path/to/repos/db/revs/58/58255
...
------

I have some questions about this:
- Why are the rev files read (multiple times over -> see below), although the log info is contained in the reprops files (which are also read)? These reads must be very costly. Is it because that's the only way for the server to reconstruct the line of history (find out which revision to look up next)? Note that we didn't run "svn log -v", so no need to look up related paths.

- I guess first some headers are read, because the first couple of reads seem small (apart from the very first one) and more or less fixed in size. Right? Maybe this can be optimized?

- It seems the read buffer is limited to 4K. Is there a way to increase this buffer (apart from patching source code and building my own custom server)? Is this somehow configurable? Note: NFS mount options were set to rsize=8192 - other commands (e.g. grep) read blocks of 8192 bytes.

- And then the biggest mystery for me: much more bytes are read from the rev files than necessary. I guess the rev files must be read multiple times (>= 5 times)??
Total bytes read for rev 58301, according to rwsnoop: 62418
-----
$ ls -l /path/to/repos/db/revs/58/58301
-rw-rw---- 1 owner grp 12137 May 9 08:30 58301
-----

Thanks for any insight anyone can offer.

Some context: I'm very much interested in improving performance of "svn log" (and also blame, but that's for another post). IMHO, the context/history of changes (especially with a large and long-lived codebase) is extremely important, and should be as accessible as possible. In my view, that's one of the most important responsibilities of a VCS. All in the spirit of "developers spend much more of their time reading (and understanding) code, than writing it".

Regards,
Johan

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365532
Received on 2009-06-26 08:11:39 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.