[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

svn status benchmark

From: John Williams <williams_at_tni.com>
Date: 2004-12-15 23:03:38 CET

Here are some benchmarks of subversion on a large web project.
Note that I have not studied the svn internals at all, so any
implementation remarks below are pure uninformed speculation.

$ svn --version
svn, version 1.1.1 (r11581)
   compiled Oct 29 2004, 11:25:56

Linux 2.4.22-1.2197.nptl (Fedora Core release 1)

OK, here's my project:

$ find . -name .svn -prune -o -type f -print | wc -l
   40908

40,908 files, excluding .svn directories and contents.

$ find . -name .svn | wc -l
   5334

5,334 directories, each with it's own .svn directory and contents.

$ time svn status
real 5m46.091s
user 0m30.770s
sys 0m12.570s

I'm running over an iscsi link, which is a bit slow, so your results
may not be quite this bad.

$ strace -c svn status
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.21 75.583629 1341 56380 read
  7.92 7.108835 24 294815 21319 lstat64
  5.43 4.873536 27 182219 16016 stat64
  1.74 1.563369 28 56079 54 open
  0.37 0.332306 6 56026 close
  0.26 0.233075 22 10690 getdents64
  0.02 0.019950 4 5385 fstat64
  0.02 0.017812 19 926 brk
  0.02 0.017182 3 5346 fcntl64
  0.00 0.001284 15 86 old_mmap
  0.00 0.000184 14 13 munmap
  0.00 0.000160 13 12 mmap2
  0.00 0.000090 30 3 write
  0.00 0.000047 47 1 1 connect
  0.00 0.000042 42 1 socket
  0.00 0.000022 4 5 rt_sigaction
  0.00 0.000012 4 3 getuid32
  0.00 0.000011 11 1 set_thread_area
  0.00 0.000008 8 1 uname
  0.00 0.000006 2 3 getgid32
  0.00 0.000006 6 1 futex
  0.00 0.000004 4 1 lseek
  0.00 0.000003 3 1 getrlimit
  0.00 0.000003 3 1 set_tid_address
  0.00 0.000002 2 1 rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00 89.751578 668000 37390 total

This 89 seconds corresponds roughly to the 12.5 sys time above, but includes
benchmarking overhead, and probably some wait-for-io time which was not
included by the time command. The total real time for "strace svn status" is
about 7m30s.

Things to note:
  read calls (not stats) are the big bottleneck, according to this
  477034 stat calls (roughly 10x the number of files+dirs in the project)
  10% error rate on stat calls is checks for non-existing .svn/* files

Now for a closer look at the strace output:

$ strace -tt svn status
12:47:47.630083 execve("/usr/local/bin/svn"

... opening shared libs, etc, etc, etc...

This is where the work starts. 0.1 secs for the startup time above.

12:47:47.756639 open(".svn/format", O_RDONLY) = 3
12:47:47.756791 read(3, "4\n", 80) = 2
12:47:47.756933 close(3) = 0
12:47:47.757107 open(".svn/entries", O_RDONLY) = 3
12:47:47.757302 read(3, "<?xml version=\"1.0\" encoding=\"ut"..., 8192) = 1776
12:47:47.757466 read(3, "", 6416) = 0
12:47:47.758141 close(3) = 0

These 7 calls are repeated verbatim for every .svn directory in the project.
I suspect this is part of a locking procedure. Hopefully it is using the
information collected for something, but there is no corresponding
unlocking procedure at the end.

And here is where we start stat-ing files. Elapsed time: 2m 08s

12:49:56.616663 lstat64(".", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
12:49:56.616883 stat64(".svn/dir-props", {st_mode=S_IFREG|0444, st_size=123, ...}) = 0

I'll skip ahead to a more typical case...

This is typical for each directory.
.svn/dir-props is stat-ed 5 times (and it doesn't exist every time).

12:49:56.651402 lstat64("dir", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
12:49:56.651647 open("dir", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
12:49:56.651811 fstat64(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
12:49:56.651984 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
12:49:56.652118 getdents64(3, /* 9 entries */, 4096) = 304
12:49:56.652490 getdents64(3, /* 0 entries */, 4096) = 0
12:49:56.652627 close(3) = 0
12:49:56.652796 lstat64("dir/.svn/dir-props", 0xbff493ac) = -1 ENOENT (No such file or directory)
12:49:56.653007 lstat64("dir/.svn/dir-props", 0xbff493ec) = -1 ENOENT (No such file or directory)
12:49:56.653212 lstat64("dir", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
12:49:56.653455 stat64("dir/.svn/dir-props", 0xbff4937c) = -1 ENOENT (No such file or directory)
12:49:56.653696 stat64("dir/.svn/dir-props", 0xbff492dc) = -1 ENOENT (No such file or directory)
12:49:56.653923 stat64("dir/.svn/dir-prop-base", 0xbff492dc) = -1 ENOENT (No such file or directory)
12:49:56.654179 lstat64("dir/.svn/dir-props", 0xbff492fc) = -1 ENOENT (No such file or directory)
12:49:56.654381 lstat64("dir/.svn/lock", 0xbff4935c) = -1 ENOENT (No such file or directory)

Here is a typical file.
Most stats calls are repeated twice in a row for some reason.
The file itself is stat-ed 4 times.
.svn/props/file.html.svn-work is stat-ed 4 times.

12:49:56.654588 lstat64("dir/file.html", {st_mode=S_IFREG|0664, st_size=635, ...}) = 0
12:49:56.654825 lstat64("dir/file.html", {st_mode=S_IFREG|0664, st_size=635, ...}) = 0
12:49:56.655082 stat64("dir/.svn/props/file.html.svn-work", {st_mode=S_IFREG|0444, st_size=4, ...}) = 0
12:49:56.655359 stat64("dir/.svn/props/file.html.svn-work", {st_mode=S_IFREG|0444, st_size=4, ...}) = 0
12:49:56.655603 stat64("dir/.svn/prop-base/file.html.svn-base", {st_mode=S_IFREG|0444, st_size=4, ...}) = 0
12:49:56.655894 lstat64("dir/.svn/props/file.html.svn-work", {st_mode=S_IFREG|0444, st_size=4, ...}) = 0
12:49:56.656144 lstat64("dir/.svn/props/file.html.svn-work", {st_mode=S_IFREG|0444, st_size=4, ...}) = 0
12:49:56.656386 open("dir/.svn/props/file.html.svn-work", O_RDONLY) = 3
12:49:56.656568 read(3, "END\n", 4096) = 4
12:49:56.666778 close(3) = 0
12:49:56.666944 lstat64("dir/file.html", {st_mode=S_IFREG|0664, st_size=635, ...}) = 0
12:49:56.667194 lstat64("dir/file.html", {st_mode=S_IFREG|0664, st_size=635, ...}) = 0

And we are done. 5m 23s for the file stat-ing.

12:55:19.696356 exit_group(0) = ?

Recommendations:

Reduce the number of duplicated stat calls. Most stat calls are repeated twice
in a row. Many files are stat-ed 4 or 5 times.

Every .svn/format file contains the same 2 bytes, but we open and read every one.
Finding a way to eliminate this open/read (say, by putting the information
in the .svn/entries file) would speed up the first phase by 2x.

I'm not sure what that first phase is doing, but if it cannot be eliminated,
it might help the user's perception if the directory tree was only crawled
once, doing both tasks at the same time.

Every .svn/props/*.svn-work file is being opened and read. This contrasts with
the .svn/prop-base/*.svn-base file, which is only stat-ed.
At the very least, this open could be optimized away for the common case, by
checking if the file size (from the stat call) is 4 bytes (or 5 on windows),
and assuming it only contains the string "END\n". Or if the file simply
ended at the end of the file, one could just check for size==0.

~ John Williams

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Received on Wed Dec 15 23:06:11 2004

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.