
Re: Status of ra_serf

From: Justin Erenkrantz <justin_at_erenkrantz.com>
Date: 2006-02-18 00:29:24 CET

On 2/17/06, Phillip Susi <psusi@cfl.rr.com> wrote:
> I don't understand. Presumably you already have a list of files you
> want to fetch in memory. That memory is allocated anyhow, no matter if
> you are sending the request list or not, so why would it take any more
> memory to start sending the requests to the server? I would think you
> would just start walking the list of files and write GET requests to the
> socket as long as the socket still has buffer space. If that's the case
> then the only extra memory use is the kernel socket send buffer, which
> should be limited to a reasonable value. You're going to use the same
> amount of memory whether you write 100 get requests to one socket, or 25
> to each of 4.

No. Serf allocates a pool to the request when it decides to deliver
it. If we decided not to use a pool strategy, it would make my life
(and anyone coding at serf's level) suck. So, that means each
request requires 8KB of memory at a minimum. For 1000 requests,
that's 8KB*1000 = ~8MB. If we remove a pipeline limit and write 5000
requests, we're talking ~40MB of overhead. Feasible, but there's a
steep cost for memory allocation - reducing our memory footprint makes
us faster than if we tried to be cute and write 5000 requests when we
know that we're not going to need them all. (Also, by delaying our
memory allocation, we get the benefits of pools such that we stabilize
our memory usage very quickly and it plateaus.)
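The arithmetic above is easy to sanity-check. A back-of-envelope sketch (the 8KB-per-request pool floor comes from the discussion; the function name and structure here are purely illustrative, not serf's API):

```python
# Memory reserved up front if every queued request gets its own pool,
# using the 8KB minimum pool size cited above.
POOL_MIN_BYTES = 8 * 1024

def pipeline_overhead(requests: int) -> int:
    """Bytes reserved when each in-flight request allocates its own pool."""
    return requests * POOL_MIN_BYTES

print(pipeline_overhead(1000) / (1024 * 1024))  # → 7.8125 (≈8 MB)
print(pipeline_overhead(5000) / (1024 * 1024))  # → 39.0625 (≈40 MB)
```

Delaying the pool allocation until a request is actually delivered is what keeps this overhead proportional to the pipeline depth rather than to the total request count.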

> >> True, but doesn't quadrupling the connections to the server increase the
> >> load on it quite a bit? That's what I'm concerned about.
> >
> > No, it doesn't - compared to the large REPORT response ra_dav fetches.
> > The server isn't having to do any base-64 encoding - that's the real
> > room for opportunity that ra_serf has as we're just not going to be
> > more optimal on the number of connections. Therefore, breaking the
> > activity into smaller chunks helps ease the computational load on the
> > server (provided the disk I/O can keep up).
> >
>
> Hrm... you got me curious now... what's using base-64 encoding and why?

In order to stick the full-text in the XML response to the REPORT for
ra_dav, mod_dav_svn base-64 encodes every file such that it will be
valid XML parsable by expat. Base-64 encoding is moderately expensive
and increases the overall space; however, using mod_deflate lets us
recover some of the space at the cost of even more CPU.
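The size overhead of base-64 is a fixed 4 output bytes per 3 input bytes, i.e. roughly 33%. A minimal Python illustration of both points (this is not mod_dav_svn's actual code path, just the underlying arithmetic):

```python
import base64
import zlib

# Arbitrary binary payload; raw bytes like these are not valid XML
# character data, which is why mod_dav_svn base-64 encodes full-texts.
raw = bytes(range(256)) * 3  # 768 bytes, a multiple of 3 for a clean ratio

encoded = base64.b64encode(raw)   # 4 output bytes per 3 input bytes
print(len(encoded) / len(raw))    # → 1.3333... : ~33% size overhead

# Compressing afterwards (as mod_deflate does on the wire) recovers
# some of that space, at the cost of extra CPU on the server.
print(len(zlib.compress(encoded)) < len(encoded))  # → True
```

This is the trade-off described above: the encoding buys XML-parsability for expat at the price of CPU and bytes, and deflate trades yet more CPU to win some of the bytes back.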

> Also to be clear, you are saying that the multiple extra GET
> connections increase the load on the server, but that the increase is
> negligible compared to the load from the REPORT request?

Correct.

> I think that in the vast majority of cases, the bottleneck is going to
> be the network, not the local disk. Most hard disks these days can
> handle 10 MB/s easily, but not many people have >= 100 Mbps connections
> to a svn server. Given that, splitting the data stream into multiple
> tcp sockets tends to lower the total throughput due to the competition,
> which will increase overall checkout times.

We'll need to let the data trump any hypotheses. =) -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Feb 18 00:29:44 2006

This is an archived mail posted to the Subversion Dev mailing list.
