Re: Fwd: [Tony Butt: Trials with memcached]

From: Tony Butt <Tony.Butt_at_cea.com.au>
Date: Mon, 18 Jul 2011 12:19:59 +1000

On Sat, 2011-07-09 at 15:02 +0200, Stefan Fuhrmann wrote:
> On 08.07.2011 01:56, Daniel Shahaf wrote:
> > FYI from users@
> >
> > ----- Forwarded message from Tony Butt<Tony.Butt_at_cea.com.au> -----
> > Date: Wed, 6 Jul 2011 15:20:27 +1000
> >> We are running subversion 1.6.17 on a vmware hosted server. We recently
> >> reconfigured the server to give 4 virtual CPUs (up from 1), and a
> >> significant amount of memory.
> >>
> >> In order to spruce up our performance a little, I looked into the use of
> >> memcached with subversion again, found the correct config parameter, and
> >> set it up. Our server is running Ubuntu 10.04, Apache 2.2. Access
> >> mechanism is http (of course). The client used is running Ubuntu 11.04,
> >> and svn commandline (1.6.17 also)
> >>
> >> The results were interesting, to say the least.
> >>
> And the sad thing is that these results are in line with
> what can be expected in 1.6. That's why the whole
> caching code has been reworked in 1.7.
> >> Checkout of a tree, about 250M in size:
> >>
> >> Without memcached, 1 1/2 to 2 minutes, varies with server load
> >> With memcached, 12 minutes (!)
> >>
> >> Update of the same tree,
> >> Without memcached, 9 seconds
> >> With memcached, 14 seconds - repeated several times, similar results.
> You can expect all similarly structured repositories
> to show similar performance patterns. Only for very
> different content and usage patterns, there might be
> a performance improvement (see below).
> >> I am not sure what anyone else's experience is, but we will not be
> >> enabling memcached for subversion any time soon.
> I will try to answer that with some indication towards
> what you may try and when that might aid performance
> in 1.6 and 1.7. But first, let me give you some technical
> background because there is obviously no simple
> recipe for making things fast in 1.6.
>
> The key factors here are latency and trade-off. To read
> a userfile_at_rev from the repository, the back-end has to
> follow a short chain of objects (roughly: rev->offset in repo
> file, userfile -> last change, last change -> offset in repo file,
> chain of deltas to combine). All that data will ultimately
> come from disk.
>
> Most servers boast large amounts of file system cache
> that can be accessed in < 0.1ms. SVN itself will cache
> the index information used at the beginning of the lookup
> chain in its application memory. By default, only the
> user file deltas need to be read (from file system cache),
> decompressed and combined into the original content.
>
> For typical files < 100k, only 5 or fewer of these delta
> blocks need to be read while the index / admin info
> at the beginning of the lookup chain also contains about
> 5 steps which can often be satisfied directly from internal
> caches, i.e. are "for free". So, the default in 1.6 is < 1ms
> for reading the data plus some CPU load from unzipping it.
>
> With memcached, the picture changes and parts of that
> can be considered a design flaw in 1.6. All index objects
> will be stored in the memcached, i.e. accessing them is
> no longer for free. OTOH, reconstructed user-file content
> will now be cached, i.e. no need to reconstruct it from
> deltas over and over again. So, we traded 3 or 4 file
> cache reads plus unzip CPU load for ~5 memcached reads.
>
> But the latter involves a TCP/IP communication between
> processes with latencies 2+ times that of the file system
> cache. To make things worse, memcached seems to
> shut off for a few seconds when being hammered with
> a large number of requests in a short period of time
> (I observed that behavior under Ubuntu 9.04). To mitigate
> that, i.e. have *some* process to answer your requests,
> start 3 or 4 of them. They all will end up with redundant
> information.
>
Without understanding the details, that is similar to what I thought
must be happening. Since it is fairly simple to switch in/out memcached,
I will try again when we go to 1.7.

Thanks for the thoughtful response -

Tony Butt
> So, when can memcached be useful in 1.6?
>
> * when the file system cache is ineffective (repositories
> on NFS-like shares)
> * disk (NAS) latency is higher than TCP/IP latency
> * large external memcached servers are available
> compared to usable file system cache on the SVN
> server machine
> * huge repositories where the combined amount of
> frequently requested information is larger than what
> the file system cache can buffer.
>
> For 1.7, things are quite different. Memcached will only
> be used for user file content - not the many admin objects
> needed to access it. Hence, the trade-off should always
> be 1 TCP/IP lookup vs. multiple file cache accesses.
>
> Moreover, the svn server itself can cache those full texts
> - effectively eliminating all latencies. Combined with
> many improvements to the caching logic, all c/o
> operations should be strictly limited by client I/O.
>
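
The trade-off described above can be put into rough numbers. This is a minimal sketch, not a measurement: it takes the quoted "< 0.1ms" as an upper bound for one file system cache read, assumes a memcached round trip costs exactly twice that (the message only says "2+ times"), and ignores both the unzip CPU load and the memcached stalls mentioned earlier. The names `FS_CACHE_US`, `MEMCACHED_US` and `lookup_cost_us` are made up for this illustration.

```python
# Rough per-file lookup latency, in microseconds, using figures from the message.
# All constants are illustrative assumptions, not benchmark results.

FS_CACHE_US = 100                # "< 0.1ms" per file system cache read, taken as an upper bound
MEMCACHED_US = 2 * FS_CACHE_US   # "2+ times" the file system cache latency, taken as exactly 2x


def lookup_cost_us(fs_cache_reads, memcached_reads):
    """Sum the latency of the reads needed to serve one file."""
    return fs_cache_reads * FS_CACHE_US + memcached_reads * MEMCACHED_US


scenarios = {
    # 1.6 default: index lookups are in-process ("for free"), 3 or 4 delta reads hit the file cache
    "1.6 default": lookup_cost_us(fs_cache_reads=4, memcached_reads=0),
    # 1.6 with memcached: roughly 5 index objects are fetched over TCP/IP instead
    "1.6 + memcached": lookup_cost_us(fs_cache_reads=0, memcached_reads=5),
    # 1.7 with memcached: a single lookup for the cached file content
    "1.7 + memcached": lookup_cost_us(fs_cache_reads=0, memcached_reads=1),
}

for name, cost in scenarios.items():
    print(f"{name}: ~{cost} us per file")
```

Under these assumptions the 1.6 memcached setup is the slowest of the three per file, which matches the direction of the checkout timings reported at the top of the thread, though not their magnitude.
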
> Hope that lengthy explanation helps!
>
> -- Stefan^2.

--
Tony Butt <tjb_at_cea.com.au>
CEA Technologies
Received on 2011-07-18 13:15:15 CEST

This is an archived mail posted to the Subversion Users mailing list.