
Re: Fwd: [Tony Butt: Trials with memcached]

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Sat, 09 Jul 2011 15:02:25 +0200

On 08.07.2011 01:56, Daniel Shahaf wrote:
> FYI from users@
> ----- Forwarded message from Tony Butt<Tony.Butt_at_cea.com.au> -----
> Date: Wed, 6 Jul 2011 15:20:27 +1000
>> We are running subversion 1.6.17 on a vmware hosted server. We recently
>> reconfigured the server to give 4 virtual CPUs (up from 1), and a
>> significant amount of memory.
>> In order to spruce up our performance a little, I looked into the use of
>> memcached with subversion again, found the correct config parameter, and
>> set it up. Our server is running Ubuntu 10.04, Apache 2.2. Access
>> mechanism is http (of course). The client used is running Ubuntu 11.04,
>> and svn commandline (1.6.17 also)
>> The results were interesting, to say the least.
And the sad thing is that these results are in line with
what can be expected in 1.6. That's why the whole
caching code has been reworked in 1.7.
>> Checkout of a tree, about 250M in size:
>> Without memcached, 1 1/2 to 2 minutes, varies with server load
>> With memcached, 12 minutes (!)
>> Update of the same tree,
>> Without memcached, 9 seconds
>> With memcached, 14 seconds - repeated several times, similar results.
You can expect all similarly structured repositories
to show similar performance patterns. Only for very
different content and usage patterns, there might be
a performance improvement (see below).
>> I am not sure what anyone else's experience is, but we will not be
>> enabling memcached for subversion any time soon.
I will try to answer that with some indication towards
what you may try and when that might aid performance
in 1.6 and 1.7. But first, let me give you some technical
background because there is obviously no simple
recipe for making things fast in 1.6.

The key factors here are latency and trade-offs. To read
a userfile@rev from the repository, the back-end has to
follow a short chain of objects (roughly: rev->offset in repo
file, userfile -> last change, last change -> offset in repo file,
chain of deltas to combine). All that data will ultimately
come from disk.
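
To make the lookup chain concrete, here is a toy sketch in Python. It is
not the actual FSFS code or on-disk layout: the dictionaries, their names
and the "delta appends text" rule are all hypothetical stand-ins, chosen
only to show how many dependent lookups one read involves.

```python
# Toy stand-ins for repository data; all names and layouts are hypothetical.
REV_OFFSET = {1: 0, 2: 40, 3: 90}      # rev -> offset in repo file
LAST_CHANGE = {("a.txt", 3): 2}        # userfile@rev -> rev of last change
# offset -> (offset of base delta or None, text this toy delta appends)
DELTAS = {0: (None, "hello"), 40: (0, " world")}


def read_file(path, rev):
    """Follow the chain and return (file content, list of lookup steps)."""
    steps = [("rev -> offset", REV_OFFSET[rev])]

    changed_rev = LAST_CHANGE[(path, rev)]
    steps.append(("userfile -> last change", changed_rev))

    offset = REV_OFFSET[changed_rev]
    steps.append(("last change -> offset", offset))

    # Walk the delta chain back to its base, then combine oldest first.
    chain = []
    while offset is not None:
        base, data = DELTAS[offset]
        chain.append(data)
        offset = base
    return "".join(reversed(chain)), steps


if __name__ == "__main__":
    text, steps = read_file("a.txt", 3)
    print(text)
    for name, value in steps:
        print(name, value)
```

Every step depends on the result of the previous one, so the latencies
add up instead of overlapping.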

Most servers boast large amounts of file system cache
that can be accessed in < 0.1ms. SVN itself will cache
the index information used at the beginning of the lookup
chain in its application memory. By default, only the
user file deltas need to be read (from file system cache),
decompressed and combined into the original content.

For typical files < 100k, only 5 or fewer of these delta
blocks need to be read while the index / admin info
at the beginning of the lookup chain contains also about
5 steps which can often be satisfied directly from internal
caches, i.e. are "for free". So, the default in 1.6 is < 1ms
for reading the data plus some CPU load from unzipping it.
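
As a back-of-the-envelope check of that "< 1ms" figure, here is a small
Python sketch. The 0.1ms and the counts of 5 are the upper bounds given
above; treating internal cache hits as costing exactly zero and leaving
out the unzip CPU time are simplifying assumptions of the sketch.

```python
FS_CACHE_MS = 0.1    # upper bound per file system cache access (from the text)
DELTA_READS = 5      # at most 5 delta blocks for typical files < 100k
INDEX_STEPS = 5      # about 5 index / admin lookups
INDEX_STEP_MS = 0.0  # assumption: internal cache hits are "for free"


def default_read_ms():
    """Upper bound on I/O latency for one file read in default 1.6."""
    return DELTA_READS * FS_CACHE_MS + INDEX_STEPS * INDEX_STEP_MS


if __name__ == "__main__":
    print(f"default 1.6 read: <= {default_read_ms():.1f} ms")
```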

With memcached, the picture changes and parts of that
can be considered a design flaw in 1.6. All index objects
will be stored in the memcached, i.e. accessing them is
no longer for free. OTOH, reconstructed user-file content
will now be cached, i.e. no need to reconstruct it from
deltas over and over again. So, we traded 3 or 4 file
cache reads plus unzip CPU load for ~5 memcached reads.

But the latter involves a TCP/IP communication between
processes with latencies 2+ times that of the file system
cache. To make things worse, memcached seems to
shut off for a few seconds when being hammered with
a large number of requests in a short period of time
(I observed that behavior under Ubuntu 9.04). To mitigate
that, i.e. have *some* process to answer your requests,
start 3 or 4 of them. They all will end up with redundant

So, when can memcached be useful in 1.6?

* when the file system cache is ineffective (repositories
   on NFS-like shares)
* disk (NAS) latency is higher than TCP/IP latency
* external memcached servers are available that offer
   much more memory than the usable file system cache
   on the SVN server machine
* huge repositories where the combined amount of
   frequently requested information is larger than what
   the file system cache can buffer.

For 1.7, things are quite different. Memcached will only
be used for user file content - not the many admin objects
needed to access it. Hence, the trade-off should always
be 1 TCP/IP lookup vs. multiple file cache accesses.
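
The trade-offs in 1.6 and 1.7 can be compared with a rough Python sketch.
The lookup counts come from the explanation above. The absolute latencies
are assumptions: 0.1ms per file cache read is the upper bound quoted
earlier, memcached is taken at exactly 2x that (the stated lower bound),
and unzip CPU time is ignored.

```python
FS_CACHE_MS = 0.1               # per file system cache read (upper bound)
MEMCACHED_MS = 2 * FS_CACHE_MS  # assumption: "2+ times" taken as exactly 2x


def cost_16_default(delta_reads=4):
    """1.6 without memcached: 3 or 4 file cache reads, index is free."""
    return delta_reads * FS_CACHE_MS


def cost_16_memcached(memcached_reads=5):
    """1.6 with memcached: ~5 reads, each a TCP/IP round trip."""
    return memcached_reads * MEMCACHED_MS


def cost_17_memcached():
    """1.7 with memcached: a single lookup for the full text."""
    return 1 * MEMCACHED_MS


if __name__ == "__main__":
    for name, fn in [("1.6 default", cost_16_default),
                     ("1.6 memcached", cost_16_memcached),
                     ("1.7 memcached", cost_17_memcached)]:
        print(f"{name}: {fn():.1f} ms")
```

Under these assumptions, 1.6 with memcached comes out slower than the
default, while a single lookup in 1.7 comes out faster. The real numbers
depend on the network and on how loaded memcached is.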

Moreover, the svn server itself can cache those full texts
- effectively eliminating all latencies. Combined with
many improvements to the caching logic, all checkout
operations should be strictly limited by client I/O.

Hope that lengthy explanation helps!

-- Stefan^2.
Received on 2011-07-09 20:41:04 CEST

This is an archived mail posted to the Subversion Users mailing list.