Re: Status of ra_serf

From: Phillip Susi <psusi_at_cfl.rr.com>
Date: 2006-02-17 18:20:17 CET

Justin Erenkrantz wrote:
> No. The problem with HTTP connections is that they have a very finite
> lifetime. The default configuration for Apache httpd is to only allow
> 100 requests on a connection then close it. This is tunable by the
> server admins and one of the factors we'll have to analyze is "what's
> the optimal number of requests on a connection?" We currently issue
> two HTTP requests for each file we request (a PROPFIND and GET). So,
> that means, in the default config, we can only get 50 files per TCP
> connection out of an out-of-the-box configuration. (Bumping it up to
> 1000 greatly increases the pipeline depth which increases the memory
> usage on the client as well.)
>
> Therefore, since the connections can go away, what we ideally want to
> do is stagger our requests across connections such that the network is
> always active and our 'pipeline' is full: this is why more than one
> connection is needed for maximum efficiency.
>
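
The per-connection budget in the quoted paragraph works out as follows. This is a small Python sketch that assumes the server closes each connection after a fixed number of requests (httpd's MaxKeepAliveRequests, 100 by default) and that each file costs one PROPFIND plus one GET. The function names and the 1000-file example are hypothetical.

```python
import math

def files_per_connection(max_requests: int, requests_per_file: int = 2) -> int:
    """Files that fit on one keep-alive connection before the server closes it."""
    return max_requests // requests_per_file

def connections_needed(num_files: int, max_requests: int, requests_per_file: int = 2) -> int:
    """TCP connections needed to fetch num_files, assuming each connection
    is used right up to the server's per-connection request limit."""
    per_conn = files_per_connection(max_requests, requests_per_file)
    return math.ceil(num_files / per_conn)

print(files_per_connection(100))        # 50
print(connections_needed(1000, 100))    # 20
print(connections_needed(1000, 1000))   # 2
```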

Ideally you want to open one connection and send the 100 requests to
prime the pipeline. Then JUST as the first connection finishes with the
last file, you want the next 100 requests to hit the server on the other
connection. Having two connections each pulling 100 requests at the
same time causes disk thrashing and network packet collisions, which you
want to avoid.
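
That staggering can be roughly sketched in Python under a deliberately simple, hypothetical model. It assumes a fixed per-request service time, a fixed round-trip time, connections opened ahead of time, and a server that works through one batch at a time. Each batch of up to 100 requests goes to the other connection slot one one-way latency before the server finishes the previous batch, so it arrives just as the server becomes free. `Batch`, `stagger`, and the timing parameters are illustrative names, not anything in ra_serf.

```python
from dataclasses import dataclass

@dataclass
class Batch:
    slot: int      # connection slot (0 or 1); reopened after the server closes it
    first: int     # index of the first request in the batch
    count: int     # number of requests in the batch
    send_ms: int   # when the client writes the batch to its socket
    done_ms: int   # when the client receives the batch's last response

def stagger(num_requests, max_per_conn=100, service_ms=10, rtt_ms=50):
    """Alternate batches between two slots so that each batch reaches the
    server exactly when the server finishes the previous batch."""
    one_way = rtt_ms // 2
    batches = []
    server_free = 0  # time at which the server finishes the previous batch
    for i, first in enumerate(range(0, num_requests, max_per_conn)):
        count = min(max_per_conn, num_requests - first)
        # Send one one-way latency early so the requests land as the server frees up.
        send = max(0, server_free - one_way)
        start = max(send + one_way, server_free)
        server_free = start + count * service_ms
        batches.append(Batch(i % 2, first, count, send, server_free + one_way))
    return batches

for b in stagger(250):
    print(b)
```

In this model only one batch is ever in service at a time, which matches the goal of not having both connections pull from the server at once. The idle gap on each slot (for example, batch 0 finishes at 1050 ms and batch 2 is not sent until 2000 ms) is the window in which the client can rebuild the closed TCP connection.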

It would also be nice if the server would just raise that stupid cap so
you don't have to waste time building a new TCP connection and getting
the window open.

> The problem is that the client is sitting around idle until the server
> finishes the REPORT response. The REPORT takes a *long* time to
> completely generate as there's a lot of data that it is sending back
> (the entire list of files to fetch + some properties for each file).
> (AFAICT, REPORT is largely constrained by the backend repository I/O
> speed.) Also, mod_deflate and mod_dav_svn interact in such a way that
> for REPORT requests that don't include full-text, it will be buffered
> on the server until the REPORT response is completed (this is why
> ra_serf won't request gzip compression on the REPORT - yes, this is a
> nasty server-side bug; I would like to track it down,
> time-permitting).
>

I would ask why it takes so long to generate the report; maybe that
has some room for improvement. Given that the report request takes so
long to generate, and during that time the connection is blocked, yes,
it would be a good idea to open another connection to download files
that are more readily available.

You shouldn't need more than one extra connection, though, and ideally it
would be great if you could ask the server to begin generating the
report in the background and spool it in a temp file, then download
other files you know you need, THEN fetch the report from the temp file.
That way you wouldn't need the extra connection.
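
The ordering this proposal implies can be simulated locally in Python. This only illustrates the sequencing. The thread presents it as a proposal, not as an existing mod_dav_svn feature, and `generate_report`, `spooled_checkout`, and `fetch` are hypothetical stand-ins. A background thread spools the report into a temporary file while the single connection fetches files that are already known. The report is read back only after it is complete.

```python
import tempfile
import threading

def generate_report(paths, spool):
    """Stand-in for the slow server-side REPORT: one line per file."""
    for path in paths:
        spool.write(path + "\n")
    spool.flush()

def spooled_checkout(paths, already_known, fetch):
    """Spool the report in the background, fetch known files meanwhile,
    then read the finished report and fetch whatever is still missing."""
    with tempfile.TemporaryFile(mode="w+") as spool:
        worker = threading.Thread(target=generate_report, args=(paths, spool))
        worker.start()
        fetched = [fetch(p) for p in already_known]  # connection stays busy
        worker.join()                                # report is complete on disk
        spool.seek(0)
        report = [line.rstrip("\n") for line in spool]
    known = set(already_known)
    fetched += [fetch(p) for p in report if p not in known]
    return fetched

print(spooled_checkout(["a", "b", "c"], ["a"], str.upper))  # ['A', 'B', 'C']
```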

> The real trick here is that there is no reason to wait for the REPORT
> response to finish to start acquiring the data we already know we
> need. There's no real I/O happening on the client-side - therefore,
> it is more efficient to open a new connection to the server and start
> pulling down the files as soon as we know we need them.
>

Aye... but again, one extra connection should be sufficient since only
the report connection is blocked; the other connection can be kept
pipelined.
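
The arrangement described here uses one connection to read the REPORT and one pipelined connection to fetch files. It can be sketched in Python as a producer/consumer pair. The reporter thread stands in for the REPORT connection and hands over each path as soon as it is parsed. The fetcher thread stands in for the single pipelined connection. The bounded queue is an added assumption that reflects the client-side memory cost of a deep pipeline mentioned earlier in the thread. All names are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the REPORT stream

def read_report(entries, q):
    """REPORT connection: hand each path to the fetcher as soon as it is
    parsed (`entries` stands in for the streamed REPORT body)."""
    for path in entries:
        q.put(path)  # blocks while the pipeline is full
    q.put(DONE)

def fetch_files(q, fetched):
    """Pipelined fetch connection: request each file as soon as it is known."""
    while (path := q.get()) is not DONE:
        fetched.append(path)  # stand-in for issuing the PROPFIND + GET

def checkout(entries, depth=16):
    q = queue.Queue(maxsize=depth)  # bounds how many known-but-unfetched paths wait
    fetched = []
    reporter = threading.Thread(target=read_report, args=(entries, q))
    fetcher = threading.Thread(target=fetch_files, args=(q, fetched))
    reporter.start()
    fetcher.start()
    reporter.join()
    fetcher.join()
    return fetched

paths = [f"trunk/file{i}.c" for i in range(100)]
print(checkout(paths, depth=4) == paths)  # True
```

Because a single consumer reads a FIFO queue, files are fetched in REPORT order, and the fetcher can start working before the REPORT stream has ended.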

> To put numbers behind it, for a checkout of an httpd 2.2.0 tarball, it
> takes roughly 7 seconds from the time the server starts writing the
> REPORT response until it is completely finished (with the client just
> saying 'yah, whatever' and ignoring it). If we maintain just one
> connection, ra_serf will indeed queue all of them up - but the server
> can't begin to respond until it's done generating and writing the
> REPORT.
>
> Without multiple connections, a checkout with ra_serf would take about
> 35 seconds. With multiple connections, we can currently do that same
> checkout in under 25 seconds. In some cases, the speed advantage is a
> factor of 2 or more by acquiring files as soon as we know we need
> them.
>
> HTH. -- justin
>

Received on Fri Feb 17 18:22:32 2006
