On Thu, Sep 20, 2012 at 11:37 PM, Stefan Sperling <stsp_at_elego.de> wrote:
> On Thu, Sep 20, 2012 at 08:34:52PM -0000, stefan2_at_apache.org wrote:
> > Author: stefan2
> > Date: Thu Sep 20 20:34:52 2012
> > New Revision: 1388202
> >
> > URL: http://svn.apache.org/viewvc?rev=1388202&view=rev
> > Log:
> > On 10Gb branch: Add --zero-copy-limit parameter to svnserve and pass
> > it down to the reporter.
>
> Should we really expose a switch like that at the UI?
>
It is the server UI. We might as well use some settings file instead.
Both new settings are similar to TCP/IP parameter tuning:
most of the time you won't need it, but it can allow you to
handle specific workloads more efficiently.
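Just to illustrate (the option only exists on the 10Gb branch so far,
and the limit value below is made up):

  svnserve -d -r /var/svn/repos --cache-fulltexts=yes --zero-copy-limit=262144

i.e. contents smaller than 256kB would then be sent straight from the
cache buffers to the network stack.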
> The description you give is:
>
> > @@ -235,6 +236,16 @@ static const apr_getopt_option_t svnserv
> > "Default is no.\n"
> > " "
> > "[used for FSFS repositories only]")},
> > + {"zero-copy-limit", SVNSERVE_OPT_ZERO_COPY_LIMIT, 1,
> > + N_("send files smaller than this directly from the\n"
> > + " "
> > + "caches to the network stack.\n"
> > + " "
> > + "Consult the documentation before activating this.\n"
> > + " "
> > + "Default is 0 (optimization disabled).\n"
> > + " "
> > + "[used for FSFS repositories only]")},
>
> Which to me looks like a scary flag I'd rather leave alone. :)
>
Yes, and rightly so. It has side effects and may hurt
performance when multiple clients are being served
concurrently and not all data is available in the cache.
I plan to change the cache implementation such that
readers will no longer block write attempts; the side
effects will then be much less problematic.
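Just as a generic illustration of that direction (this is *not* the
actual membuffer code and the final implementation may look quite
different): a seqlock-style optimistic read lets a reader copy a
segment without holding any lock, so writers are never blocked by it.

#include <string.h>
#include <apr_atomic.h>

#define SEGMENT_SIZE 8192        /* made-up segment size */

typedef struct cache_segment_t {
  volatile apr_uint32_t generation;  /* odd while a write is in progress */
  char data[SEGMENT_SIZE];
} cache_segment_t;

/* Copy SEG->data into OUT without blocking writers: retry if the
 * generation counter was odd or changed while we were copying.
 * (A real implementation would also need memory barriers.) */
static void
read_segment(cache_segment_t *seg, char *out)
{
  apr_uint32_t before, after;

  do
    {
      before = apr_atomic_read32(&seg->generation);
      memcpy(out, seg->data, SEGMENT_SIZE);
      after = apr_atomic_read32(&seg->generation);
    }
  while ((before & 1) || before != after);
}

static void
write_segment(cache_segment_t *seg, const char *src, apr_size_t len)
{
  apr_atomic_inc32(&seg->generation);  /* now odd: write in progress */
  memcpy(seg->data, src, len < SEGMENT_SIZE ? len : SEGMENT_SIZE);
  apr_atomic_inc32(&seg->generation);  /* now even: new contents visible */
}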
> Can't we use some heuristic to determine whether or not to enable
> this optimisation, so the user won't have to reason about whether
> or not to use this option? Is designing the heuristic just a matter
> of finding a good threshold by experimentation on sample data sets?
>
The technical background is that data buffers in the
cache will be handed directly to the network stack
(if longer than 8k - otherwise they may go into the ra_svn
TX buffer). I.e. we cannot modify the respective cache
segment until the socket call has returned.
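To make that constraint a bit more concrete, here is a minimal
stand-alone sketch (plain C, not the actual svnserve / ra_svn code;
the buffer size and all names are made up):

#include <string.h>
#include <sys/socket.h>

#define TX_BUFSIZE 8192          /* stand-in for the ra_svn TX buffer size */

typedef struct tx_buffer_t {
  char data[TX_BUFSIZE];
  size_t used;
} tx_buffer_t;

/* Buffered path: copy the cached data into the TX buffer; the cache
 * segment may be modified again as soon as memcpy() is done.
 * (Assumes LEN <= TX_BUFSIZE; larger contents take the other path.) */
static int
send_buffered(int sock, tx_buffer_t *tx, const char *cache_data, size_t len)
{
  if (tx->used + len > TX_BUFSIZE)
    {
      /* Flushing may block on a slow client, but it only pins the
       * TX buffer, never the cache. */
      if (send(sock, tx->data, tx->used, 0) < 0)
        return -1;
      tx->used = 0;
    }
  memcpy(tx->data + tx->used, cache_data, len);
  tx->used += len;
  return 0;
}

/* Zero-copy path: hand the cache buffer to the network stack directly.
 * The cache segment must not change until send() has returned -- which
 * is why a slow client can block cache writes. */
static int
send_zero_copy(int sock, const char *cache_data, size_t len)
{
  return send(sock, cache_data, len, 0) < 0 ? -1 : 0;
}

int
main(void)
{
  int fds[2];
  tx_buffer_t tx = { { 0 }, 0 };
  static const char cached[] = "contents taken from the fulltext cache";

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
    return 1;

  send_buffered(fds[0], &tx, cached, sizeof(cached));   /* copies */
  send_zero_copy(fds[0], cached, sizeof(cached));       /* no copy */
  return 0;
}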
So, even with the above change to the membuffer code,
the situation remains as follows:
* the setting is only relevant when fulltext caching has
been enabled
* the reporter will always push data out until the client
cannot keep up and the socket blocks
* short file contents (<<8k) are rare and will be collected
in our TX buffer. If they are frequent in your use case,
you may see some speedup. However, once the TX buffer
is full, pushing it to the socket will still block cache writes.
* enable it only if you can assume that almost all data
comes from cache (after some warm-up)
* don't use it with long-latency connections
* enable it if concurrent server access isn't the norm.
It will be most efficient with 10Gb-connected clients
fetching a few GB (or less) at a time; the request will
likely be completed before the next one comes in.
* enable it if parts of your TCP/IP stack are handled by
I/O hardware. In that case you have a 10/40Gb NIC and
this setting is your only chance to actually use that bandwidth.
Some of the above can be checked by observation (typical
server load etc.). Beyond that, you need to experiment.
A good threshold might be somewhere around the 70th..80th
percentile of your file sizes (i.e. 70% of your files are below
the threshold; see the sketch after this list for a quick way to
estimate it):
* performance gain on 40..50% of your data
* sockets block on content that does *not* use the
zero-copy code
* blocks on the shorter files become less likely and
the cache becomes writable approx. 50% of the time.
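A throw-away helper like this (not part of Subversion, just an
illustration) can estimate those percentiles; it reads one file size
per line from stdin and prints candidate limits:

#include <stdio.h>
#include <stdlib.h>

static int
cmp_sizes(const void *a, const void *b)
{
  long la = *(const long *)a, lb = *(const long *)b;
  return (la > lb) - (la < lb);
}

int
main(void)
{
  long *sizes = NULL, value;
  size_t count = 0, capacity = 0;

  while (scanf("%ld", &value) == 1)
    {
      if (count == capacity)
        {
          long *tmp;
          capacity = capacity ? 2 * capacity : 1024;
          tmp = realloc(sizes, capacity * sizeof(*sizes));
          if (!tmp)
            return 1;
          sizes = tmp;
        }
      sizes[count++] = value;
    }
  if (count == 0)
    return 1;

  qsort(sizes, count, sizeof(*sizes), cmp_sizes);
  printf("70th percentile: %ld bytes\n", sizes[count * 70 / 100]);
  printf("80th percentile: %ld bytes\n", sizes[count * 80 / 100]);
  free(sizes);
  return 0;
}

Feeding it the sizes of all files in (an export of) the repository,
e.g. via GNU find's -printf '%s\n', gives a first guess for
--zero-copy-limit; the real tuning is still experimentation as
described above.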
-- Stefan^2.