On Wed, May 12, 2010 at 8:07 PM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> Bottom line:
> * SVN servers tend to be CPU-limited
> (we already observed that problem @ our company
> with SVN 1.4)
As Andrew Bolstridge said, if you have superfast I/O like you do in
your test setup (4 SSDs in RAID-0), I suppose it's normal that I/O
isn't the bottleneck anymore. But if SVN requires such storage
solutions to be fast, and that's the only way you can bring an SVN
server to stress the CPU, that's a strong indication to me that it's
very I/O sensitive.
If you're using an NFS-connected SAN, it's a whole different ballgame.
Ok, that may not be the fastest way to run an SVN server, but I think
it's often set up that way in a lot of companies, for various
practical reasons. I'd like SVN to be fast in this setup as well (as
are many other server applications).
> * packed repositories are ~20% faster than non-packed,
> non-sharded
I think a better comparison would be between packed and non-packed
(but still sharded). And preferably same server version. Just to focus
on the difference between packed and non-packed (that's what I did in
my tests).
Also, I don't see those ~20% in your test numbers (more like ~5%), but
maybe you have other numbers that show this difference?
I must say that, when I tested on a server with an SSD, I also saw
something like 5% improvement. OTOH, on my server with NFS/SAN, I saw
a ~5% performance decrease (maybe that's because of some extra file
opens, which are costly in this setup).
So again, it depends a lot on the storage part of the setup. All in
all, packing is not really a big win for performance (but it may be
better for backups and such).
> * optimal file cache size is roughly /trunk size
> (plus branch diffs, but that is yet to be quantified)
> * "cold" I/O from a low-latency source takes 2 .. 3 times
> as long as from cached data
Ok, but unless you can get almost the entire repository in cache,
that's not very useful IMHO. In my tests, I mainly focused on the
"first run", because I want my server to be fast with a cold cache:
that's most likely the performance that my users will get.
It's a busy repository, with different users hitting different parts
of the repository all the time. I just don't think there will be a lot
of cache hits during a normal working day.
Also, if the test with cached data is 2-3 times faster than from the
SSD RAID-0, that's another indication to me that there's a lot of time
spent in I/O. And where there's a lot of time spent, there is
potentially a lot of time to be saved by optimizing.
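
To put a rough number on that: here is a back-of-the-envelope sketch of
how much of the cold-run time the 2-3x figure implies is spent waiting
on I/O. It assumes the cached run does the same CPU work as the cold
run and waits on no I/O at all, which is a simplification. The function
name io_share is made up for this illustration.

```python
def io_share(cold_over_cached: float) -> float:
    """Fraction of the cold-run time that is absent from the cached run.

    Assumption: cold time = cached time + I/O wait, so the share of
    I/O wait in the cold run is 1 - cached/cold.
    """
    if cold_over_cached < 1.0:
        raise ValueError("cold run is assumed to be no faster than the cached run")
    return 1.0 - 1.0 / cold_over_cached


if __name__ == "__main__":
    for ratio in (2.0, 3.0):
        share = io_share(ratio)
        print(f"cold/cached = {ratio:.0f}x -> about {share:.0%} of cold time is I/O wait")
```

Under that assumption, a 2x ratio means about half of the cold-run time
is I/O wait, and a 3x ratio means about two thirds.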
> * a fully patched 1.7 server is twice as fast as 1.6.9
>
> "Export" has been chosen to eliminate problems
> with client-side w/c performance.
I mainly focused on log and blame (and checkout/update to a lesser
degree), so that may be one of the reasons why we're seeing it
differently :-). I suppose the numbers, bottlenecks, ... totally
depend on the use case (as well as the hardware/network setup).
That said, I'm very happy that you are working on optimizing the code,
and I certainly encourage you to keep going. All performance
improvements are extremely welcome, I think.
I'll try to look up my old test numbers again, and post them here. It
might be interesting to compare notes :-).
Some more answers to Bert's post below ...
On Thu, May 13, 2010 at 4:31 PM, Bert Huijben <bert_at_vmoo.com> wrote:
> Michael Pilato and Hyrum Wright interviewed some enterprise users earlier
> this year and wrote some reports which indicated that the network latency
> and working copy performance were the true bottlenecks.
Let's assume WC performance will soon be a solved problem thanks to
the great development efforts going on now :-), and leave that
aside here. Whether or not network latency is important depends
heavily on the situation, but I can understand it's a big bottleneck
for the larger companies out there (it's not a problem in our case,
since all our devs are on a gigabit LAN).
> If I look at
> ^/subversion/trunk/notes/feedback, I see checkout, log, merging as primary
> performance issues and this matches the performance issues I see in my day
> to day use of repositories on the other side of the world.
Ok, so you agree log is one of the important performance issues. That
one is very much I/O-bound on the server (as I described before,
opening and closing rev files multiple times).
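
To illustrate the kind of overhead I mean, here is a small sketch that
reads the same chunks from one file twice: once reopening the file for
every read, once keeping a single handle open. This is not Subversion
code; the file is a random stand-in for a rev file, and all function
names are made up. On a local disk with a warm OS cache the gap will
be small; the assumption is that on NFS each open costs noticeably
more, so the gap should widen there.

```python
import os
import tempfile
import time


def read_reopening(path, offsets, size):
    """Open and close the file once per read."""
    chunks = []
    for off in offsets:
        with open(path, "rb") as f:
            f.seek(off)
            chunks.append(f.read(size))
    return chunks


def read_single_open(path, offsets, size):
    """Open the file once and reuse the handle for every read."""
    chunks = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            chunks.append(f.read(size))
    return chunks


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def demo(reads=2000, size=64, directory=None):
    """Time both strategies on a throwaway file; returns (t_reopen, t_single)."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.join(tmp, "fake-rev-file")
        with open(path, "wb") as f:
            f.write(os.urandom(reads * size))
        offsets = [i * size for i in range(reads)]
        a, t_reopen = timed(read_reopening, path, offsets, size)
        b, t_single = timed(read_single_open, path, offsets, size)
        assert a == b  # both strategies must return identical data
        return t_reopen, t_single


if __name__ == "__main__":
    t_reopen, t_single = demo()
    print(f"reopen per read: {t_reopen:.4f}s, single open: {t_single:.4f}s")
```

Passing directory= a path on the NFS mount would run the same
comparison against the storage that actually matters.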
Cheers,
--
Johan
Received on 2010-05-14 01:32:50 CEST