[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn log on FSFS - disk IO for revs/revprops

From: Hyrum K. Wright <hyrum_at_hyrumwright.org>
Date: Fri, 26 Jun 2009 10:15:12 -0500

On Jun 25, 2009, at 6:42 PM, Johan Corveleyn wrote:

> Hi all,
>
> Disclaimer: if this isn't the right forum to discuss this, please
> let me know. But I don't think this is a "users" kind of question,
> so I'm addressing the dev list. Also: I don't know much about svn
> internals, so forgive me for making any false assumptions or stupid
> suggestions ...
>
> I have looked at some of the disk IO that the svn server performs
> when it's asked for an "svn log" (I wanted to see what we could do
> on OS/disk/NFS level to improve it). For this, we used rwsnoop (http://www.manpagez.com/man/1/rwsnoop/
> ), while running "svn log" for a file with lots of changes.
>
> Server:
> 1.5.4 on Solaris 10 (binaries from sunfreeware.com)
> FSFS backend on NFS mounted volume
>
> Exact client command: "svn log URL/file_with_lots_of_changes"
> The file has had 5500 revisions. It takes ~4 minutes to retrieve its
> entire log.
>
> Here is an excerpt of rwsnoop output:
> ------
> UID PID CMD D BYTES FILE
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 64 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 276 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 276 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 412 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 572 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 3699 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 3871 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58301
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 131 /path/to/repos/db/revprops/
> 58/58301
> 24435 9896 httpd R 64 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 275 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 275 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 411 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 570 /path/to/repos/db/revs/58/58258
> 24435 9896 httpd R 3694 /path/to/repos/db/revs/58/58258
> ...
> 24435 9896 httpd R 4096 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 118 /path/to/repos/db/revprops/
> 58/58258
> 24435 9896 httpd R 64 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 275 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 275 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 411 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 570 /path/to/repos/db/revs/58/58255
> 24435 9896 httpd R 3694 /path/to/repos/db/revs/58/58255
> ...
> ------
>
> I have some questions about this:
> - Why are the rev files read (multiple times over -> see below),
> although the log info is contained in the reprops files (which are
> also read)? These reads must be very costly. Is it because that's
> the only way for the server to reconstruct the line of history (find
> out which revision to look up next)? Note that we didn't run "svn
> log -v", so no need to look up related paths.
>
> - I guess first some headers are read, because the first couple of
> reads seem small (apart from the very first one) and more or less
> fixed in size. Right? Maybe this can be optimized?
>
> - It seems the read buffer is limited to 4K. Is there a way to
> increase this buffer (apart from patching source code and building
> my own custom server)? Is this somehow configurable? Note: NFS mount
> options were set to rsize=8192 - other commands (e.g. grep) read
> blocks of 8192 bytes.
>
> - And then the biggest mystery for me: much more bytes are read from
> the rev files than necessary. I guess the rev files must be read
> multiple times (>= 5 times)??
> Total bytes read for rev 58301, according to rwsnoop: 62418
> -----
> $ ls -l /path/to/repos/db/revs/58/58301
> -rw-rw---- 1 owner grp 12137 May 9 08:30 58301
> -----
>
> Thanks for any insight anyone can offer.
>
> Some context: I'm very much interested in improving performance of
> "svn log" (and also blame, but that's for another post). IMHO, the
> context/history of changes (especially with a large and long-lived
> codebase) is extremely important, and should be as accessible as
> possible. In my view, that's one of the most important
> responsibilities of a VCS. All in the spirit of "developers spend
> much more of their time reading (and understanding) code, than
> writing it".

Have you tried this with 1.6.x? 1.6.0 introduced some additional
caching of information read from revision files, which should decrease
the number of i/o operations.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365737
Received on 2009-06-26 17:15:32 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.