
Re: First SVN performance data

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Fri, 14 May 2010 01:32:20 +0200

On Wed, May 12, 2010 at 8:07 PM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> Bottom line:
> * SVN servers tend to be CPU-limited
> (we already observed that problem @ our company
> with SVN 1.4)

As Andrew Bolstridge said, if you have superfast I/O like you do in
your test setup (4 SSDs in RAID-0), I suppose it's normal that I/O
isn't the bottleneck anymore. But if SVN requires such storage
solutions to be fast, and that's the only way you can bring an SVN
server to stress the CPU, that's a strong indication to me that it's
very I/O sensitive.

If you're using an NFS-connected SAN, it's a whole different ballgame.
OK, that may not be the fastest way to run an SVN server, but I think
it's often set up that way in a lot of companies, for various
practical reasons. I'd like SVN to be fast in this setup as well (as
many other server applications are).

> * packed repositories are ~20% faster than non-packed,
> non-sharded

I think a better comparison would be between packed and non-packed
(but still sharded) repositories, preferably on the same server
version, so that the only thing being measured is the difference
between packed and non-packed (that's what I did in my tests).

Also, I don't see those ~20% in your test numbers (more like ~5%), but
maybe you have other numbers that show this difference?

I must say that, when I tested on a server with an SSD, I also saw
something like a 5% improvement. OTOH, on my server with NFS/SAN, I saw
a ~5% performance decrease (maybe that's because of some extra file
opens, which are costly in this setup).

So again, it depends a lot on the storage part of the setup. All in
all, packing is not really a big win for performance (but it may be
better for backups and such).

> * optimal file cache size is roughly /trunk size
> (plus branch diffs, but that is yet to be quantified)
> * "cold" I/O from a low-latency source takes 2 .. 3 times
> as long as from cached data

OK, but unless you can get almost the entire repository in cache,
that's not very useful IMHO. In my tests, I mainly focused on the
"first run", because I want my server to be fast with a cold cache:
that's most likely the performance that my users will get.
It's a busy repository, with different users hitting different parts
of the repository all the time. I just don't think there will be a lot
of cache hits during a normal working day.

Also, if the test with cached data is 2-3 times faster than from the
SSD RAID-0, that's another indication to me that there's a lot of time
spent in I/O. And where there's a lot of time spent, there is
potentially a lot of time to be saved by optimizing.
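
As a back-of-the-envelope sketch of that reasoning: if you assume the
cached run represents the pure non-I/O cost (a simplification), then a
cold run that takes N times as long spends 1 - 1/N of its time waiting
on I/O. The function name below is only illustrative.

```python
def io_fraction(cold_over_cached):
    """Share of cold-run time attributed to I/O.

    Assumption: the cached run time equals the non-I/O (CPU etc.) cost,
    so everything the cold run adds on top of it is I/O wait.
    """
    if cold_over_cached < 1:
        raise ValueError("cold run is expected to be at least as slow as cached")
    return 1.0 - 1.0 / cold_over_cached


if __name__ == "__main__":
    # The "2 .. 3 times" range quoted above.
    for ratio in (2.0, 3.0):
        print(f"cold = {ratio:.0f}x cached -> "
              f"{io_fraction(ratio):.0%} of cold time is I/O")
```

Under that assumption, the quoted 2x to 3x range corresponds to roughly
half to two thirds of the cold-run time being I/O.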

> * a fully patched 1.7 server is twice as fast as 1.6.9
>
> "Export" has been chosen to eliminate problems
> with client-side w/c performance.

I mainly focused on log and blame (and checkout/update to a lesser
degree), so that may be one of the reasons why we're seeing it
differently :-). I suppose the numbers, bottlenecks, ... totally
depend on the use case (as well as the hardware/network setup).

That said, I'm very happy that you are working on optimizing the code,
and I certainly encourage you to keep going. All performance
improvements are extremely welcome, I think.

I'll try to look up my old test numbers again, and post them here. It
might be interesting to compare notes :-).

Some more answers to Bert's post below ...

On Thu, May 13, 2010 at 4:31 PM, Bert Huijben <bert_at_vmoo.com> wrote:
> Michael Pilato and Hyrum Wright interviewed some enterprise users earlier
> this year and wrote some reports which indicated that the network latency
> and working copy performance were the true bottlenecks.

Let's assume WC performance will soon be a solved problem thanks to
the great development efforts going on now :-), so I'm leaving it out
of the picture here. Whether or not network latency is important depends
heavily on the situation, but I can understand it's a big bottleneck
for the larger companies out there (it's not a problem in our case:
all devs are on a gigabit LAN).

> If I look at
> ^/subversion/trunk/notes/feedback, I see checkout, log, merging as primary
> performance issues and this matches the performance issues I see in my day
> to day use of repositories on the other side of the world.

OK, so you agree log is one of the important performance issues. That
one is very much I/O bound on the server (as I described before, it
opens and closes rev files multiple times).
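
To get a feel for what repeated opens cost on a given storage setup,
a rough micro-benchmark along these lines could be used. It is only a
sketch: the files are dummy stand-ins, not real FSFS rev files, the
access pattern is invented for illustration and is not taken from the
SVN code, and all function names are hypothetical. To see the effect
on NFS/SAN, point the temporary directory at such a mount (for example
via the `dir=` argument of `tempfile.TemporaryDirectory`).

```python
import os
import tempfile
import time


def make_rev_files(directory, count, size=4096):
    """Create `count` small dummy files named 0..count-1 (not real rev files)."""
    paths = []
    for rev in range(count):
        path = os.path.join(directory, str(rev))
        with open(path, "wb") as f:
            f.write(b"x" * size)
        paths.append(path)
    return paths


def read_reopening(paths, reads_per_file, chunk=512):
    """Open and close the file again for every single read."""
    opens = 0
    total = 0
    for path in paths:
        for _ in range(reads_per_file):
            with open(path, "rb") as f:
                opens += 1
                total += len(f.read(chunk))
    return opens, total


def read_single_open(paths, reads_per_file, chunk=512):
    """Open each file once and do all reads through the same handle."""
    opens = 0
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            opens += 1
            for _ in range(reads_per_file):
                f.seek(0)
                total += len(f.read(chunk))
    return opens, total


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_rev_files(tmp, 200)
        for fn in (read_reopening, read_single_open):
            (opens, nbytes), secs = timed(fn, paths, 5)
            print(f"{fn.__name__}: {opens} opens, {nbytes} bytes, {secs:.4f}s")
```

Both variants read the same number of bytes; only the number of opens
differs, so any timing gap on a given mount is down to open/close
overhead. On a local disk with a warm OS cache the gap will be small,
which is exactly why the storage backend matters for this comparison.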

Cheers,

-- 
Johan
Received on 2010-05-14 01:32:50 CEST

This is an archived mail posted to the Subversion Dev mailing list.
