
Re: svn commit: r983474 - /subversion/branches/performance/subversion/libsvn_fs_fs/caching.c

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Sun, 15 Aug 2010 14:33:48 +0200

Philip Martin wrote:
> stefan2_at_apache.org writes:
>> Author: stefan2
>> Date: Sun Aug 8 19:41:11 2010
>> New Revision: 983474
>> URL: http://svn.apache.org/viewvc?rev=983474&view=rev
>> Log:
>> Memcached is often slower than a single file access.
> Doesn't that depend on the type of filesystem used and the resources
> allocated to memcached?
One might think so, but on closer analysis there are only a few
situations in which memcached can be faster. I seem to remember a
comment by Greg that he was actually disappointed by the performance
gain (or lack thereof) from his memcached use.

* A "ping localhost" takes about 50 µs; a file cache access in an
  already open file takes about 10 µs. If you have enough memory to
  hold the memcached data, you may as well give that memory to the
  file cache. The latter should be faster, since fopen is now a
  relatively rare operation with the file handle cache in place.

* If neither memcached nor membuffer has been enabled, we fall
  back to SVN's request-local caches, which perform even better.
  Although they are relatively small, their hit rate is generally
  high. So, an effective memcached setup would need to be at least
  10 times faster than a single file read.

* A remote memcached might be faster than file access if the network
  latency is 10 times lower than the average file access time. Even for
  unbuffered disk I/O, this translates into pings of <1 ms (possible in
  a LAN). However, if you can justify setting up a reasonably sized
  memcached server, you are also likely to have a file I/O system that
  provides ample data caches.

* Even more problematic in _my_ experiments was the general
  unreliability of the memcached setup: after about 10 seconds, a
  memcached instance would become unresponsive for several seconds.
  This effect could only be mitigated by setting up 3 servers. My theory
  is that the >10k requests/second were somehow treated as a DoS attack.

* The numbers might favor memcached for very small requests (e.g.
  exporting a single file). Folder-level checkouts, exports and ls will
  generally benefit greatly from internal caching.
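To make the 10x rule from the second and third points concrete, here is a minimal C sketch of the break-even check. The function name and the fixed factor of 10 are illustrative assumptions, not Subversion code; the factor is taken directly from the estimate above rather than from a measured model.

```c
#include <assert.h>
#include <stdbool.h>

/* Speed-up over a file read that memcached must deliver, per the
   estimate in this mail (an assumption, not a measured constant). */
#define REQUIRED_SPEEDUP 10.0

/* Hypothetical helper: return true if a memcached round trip costing
   MEMCACHED_US microseconds is fast enough to be worth using instead
   of a file read costing FILE_US microseconds. */
static bool
memcached_pays_off(double memcached_us, double file_us)
{
  assert(memcached_us > 0.0 && file_us > 0.0);
  return memcached_us * REQUIRED_SPEEDUP <= file_us;
}
```

With the localhost figures above, memcached_pays_off(50.0, 10.0) is false. Assuming roughly 10 ms (10000 µs) for an unbuffered disk read, which is what the "<1 ms pings" remark implies, a 900 µs LAN round trip passes the check while a 2 ms one does not.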

Bottom line: to compete with SVN's internal caching, memcached must be
10 times faster than the file accesses required to fetch the same data.
For fulltexts, this condition can often be satisfied; for other FSFS
data, it cannot.
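The resulting placement policy can be sketched as a small selection function. All names here (choose_backend, cache_backend, data_kind) are hypothetical and do not correspond to identifiers in caching.c; the sketch only encodes the rule stated above: memcached for fulltexts when it is configured, membuffer for other FSFS data, and the request-local caches as the fallback. What the real code does for non-fulltext data when only memcached is configured is an assumption here (the sketch falls back to the request-local cache).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; not Subversion's real identifiers. */
typedef enum cache_backend
{
  BACKEND_MEMCACHED,      /* external memcached server(s) */
  BACKEND_MEMBUFFER,      /* in-process membuffer cache */
  BACKEND_REQUEST_LOCAL   /* small per-request cache, always available */
} cache_backend;

typedef enum data_kind
{
  DATA_FULLTEXT,          /* file contents, potentially large */
  DATA_FSFS_STRUCTURE     /* small FSFS data, readable in one request */
} data_kind;

/* Pick a backend for one kind of data: memcached only for fulltexts,
   membuffer for everything else, request-local caches as fallback. */
static cache_backend
choose_backend(data_kind kind, bool memcached_enabled,
               bool membuffer_enabled)
{
  assert(kind == DATA_FULLTEXT || kind == DATA_FSFS_STRUCTURE);

  if (kind == DATA_FULLTEXT && memcached_enabled)
    return BACKEND_MEMCACHED;
  if (membuffer_enabled)
    return BACKEND_MEMBUFFER;
  return BACKEND_REQUEST_LOCAL;
}
```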
>> Thus, it is not inefficient
>> for most FSFS data structures: they can be read in a single request.
> I don't understand this. Does "not inefficient" mean "efficient"? I
> think you mean that for most FSFS data structures it is more efficient
> to read from a file than to read from memcached.
That's a mere typo. Please feel free to add or remove one level of
negation ;)
>> Use the
>> membuffer cache for them instead.
> I assume this is faster in your test environment. What sort of
> environment is that? How does it compare to the environments that are
> currently using memcached?
My environment is an ordinary Linux workstation (2x Xeon 5550,
24 GB RAM). All network I/O was measured over localhost.

When I restricted the use of memcached to fulltexts only, the original
code managed to be slightly faster than fully cached file I/O. All 3
memcached servers ran on the same machine as the svnserve process.

Since I have no access to large enterprise server setups, I would
like to hear about performance measurements done on such systems.
Because all caches expose the same interface, it would be very easy
to support any kind of cache configuration in FSFS once the critical
parameters have been identified.
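Because every cache sits behind the same interface, trying a different backend should only mean plugging in a different implementation. Below is a minimal C sketch of that pattern, assuming a hypothetical function-pointer table (cache_vtable, cache_get, cache_set); Subversion's actual internal cache API has different, richer signatures. The one-entry backend exists only to show the plumbing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface every backend implements. */
typedef struct cache_vtable
{
  bool (*get)(void *impl, const char *key, char *buf, size_t buf_size);
  void (*set)(void *impl, const char *key, const char *value);
} cache_vtable;

typedef struct cache
{
  const cache_vtable *vtable;
  void *impl;                 /* backend-specific state */
} cache;

/* Callers only use these wrappers, so the backend (memcached,
   membuffer, request-local, ...) can be swapped freely. */
static bool
cache_get(cache *c, const char *key, char *buf, size_t buf_size)
{
  return c->vtable->get(c->impl, key, buf, buf_size);
}

static void
cache_set(cache *c, const char *key, const char *value)
{
  c->vtable->set(c->impl, key, value);
}

/* Toy backend holding a single entry; long strings are truncated. */
typedef struct single_entry
{
  char key[64];
  char value[64];
  bool used;
} single_entry;

static bool
single_get(void *impl, const char *key, char *buf, size_t buf_size)
{
  single_entry *e = impl;
  assert(buf_size > 0);
  if (!e->used || strcmp(e->key, key) != 0)
    return false;
  strncpy(buf, e->value, buf_size - 1);
  buf[buf_size - 1] = '\0';
  return true;
}

static void
single_set(void *impl, const char *key, const char *value)
{
  single_entry *e = impl;
  strncpy(e->key, key, sizeof(e->key) - 1);
  e->key[sizeof(e->key) - 1] = '\0';
  strncpy(e->value, value, sizeof(e->value) - 1);
  e->value[sizeof(e->value) - 1] = '\0';
  e->used = true;
}

static const cache_vtable single_vtable = { single_get, single_set };
```

A membuffer- or memcached-backed variant would supply its own get/set pair and state pointer; the calling code stays unchanged.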

Currently, the membuffer cache uses memory in the server process
itself, so you need enough of it to hold at least a large part of the
hot FSFS management structures. The 64 MB default should suffice in
most cases. If you want it to be effective for fulltext caches as well,
you should give it several GB (just as you would for memcached).

For most requests, a full membuffer cache access takes on the order of
1 µs, half of which is spent calculating the hash sum. With concurrent
svnserve requests, the cache currently serves >200k accesses/s.
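The cost breakdown above (one hash computation plus a cheap lookup) is typical of a fixed-size, hash-indexed in-process cache. The following is a minimal direct-mapped sketch of that idea; it is not membuffer's actual data layout, and the FNV-1a hash, slot count and size limits are arbitrary choices made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 1024   /* power of two, chosen arbitrarily */
#define MAX_KEY 32
#define MAX_VALUE 64

typedef struct slot
{
  char key[MAX_KEY];
  char value[MAX_VALUE];
  bool used;
} slot;

/* All memory is allocated up front, inside the server process. */
static slot slots[SLOT_COUNT];

/* 32-bit FNV-1a: the "hash sum" step paid on every access. */
static uint32_t
fnv1a(const char *key)
{
  uint32_t h = 2166136261u;
  for (const unsigned char *p = (const unsigned char *)key; *p; ++p)
    {
      h ^= *p;
      h *= 16777619u;
    }
  return h;
}

/* Store VALUE under KEY, evicting whatever occupied the slot.
   Returns false (nothing cached) if either string is too long. */
static bool
dm_set(const char *key, const char *value)
{
  assert(key != NULL && value != NULL);
  if (strlen(key) >= MAX_KEY || strlen(value) >= MAX_VALUE)
    return false;
  slot *s = &slots[fnv1a(key) & (SLOT_COUNT - 1)];
  strcpy(s->key, key);
  strcpy(s->value, value);
  s->used = true;
  return true;
}

/* Return the cached value for KEY, or NULL on a miss. */
static const char *
dm_get(const char *key)
{
  assert(key != NULL);
  const slot *s = &slots[fnv1a(key) & (SLOT_COUNT - 1)];
  if (s->used && strcmp(s->key, key) == 0)
    return s->value;
  return NULL;
}
```

Each get or set computes the hash once and touches a single slot, so hashing is a significant share of every access; a collision simply evicts the older entry.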

-- Stefan^2
Received on 2010-08-15 14:34:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.