Justin Erenkrantz wrote:
>
> Remember that we may be able to truly parallelize the requests if the
> server has multiple CPUs, etc.
>
> Also, if we try to 'flood' the server with an unlimited pipeline
> depth, we'll take up more memory on the client side than needed, as we
> have to 'manage' more concurrent requests. Some recent commits to
> serf changed the API to only allocate memory when we're about to write
> the request, not when we create it. That cut the memory
> consumption by half.
>
I don't understand. Presumably you already have a list of the files you
want to fetch in memory. That memory is allocated anyhow, whether or not
you are sending the request list, so why would it take any more memory
to start sending the requests to the server? I would think you would
just start walking the list of files and writing GET requests to the
socket for as long as the socket still has buffer space. If that's the
case, then the only extra memory use is the kernel socket send buffer,
which should be limited to a reasonable value. You're going to use the
same amount of memory whether you write 100 GET requests to one socket
or 25 to each of 4.
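
For what it's worth, here's roughly the loop I have in mind. This is
only a sketch with plain POSIX sockets, not serf's actual API; it
assumes 'fd' is a connected, non-blocking socket and 'paths' is the
file list we already hold in memory:

/* Write pipelined GET requests until the kernel send buffer fills. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static int send_gets(int fd, const char **paths, int n)
{
    char req[1024];
    int i;

    for (i = 0; i < n; i++) {
        int len = snprintf(req, sizeof(req),
                           "GET %s HTTP/1.1\r\n"
                           "Host: svn.example.com\r\n\r\n",
                           paths[i]);
        ssize_t wrote = write(fd, req, len);

        if (wrote < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return i;      /* buffer full; resume from paths[i] later */
            return -1;         /* real socket error */
        }
        /* A real client would also track and resume short writes. */
    }
    return n;                  /* everything queued to the kernel */
}

The client's memory cost stays flat no matter how deep the pipeline
goes; only the kernel send buffer grows, and that's capped.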
>> True, but doesn't quadrupling the connections to the server increase
>> the load on it quite a bit? That's what I'm concerned about.
>
> No, it doesn't - compared to the large REPORT response ra_dav
> fetches. The server isn't having to do any base-64 encoding - that's
> the real opportunity ra_serf has, as we're not going to get much more
> optimal on the number of connections. Therefore, breaking the
> activity into smaller chunks helps ease the computational load on the
> server (provided the disk I/O can keep up).
>
Hrm... you've got me curious now... what's using base-64 encoding, and
why? Also, to be clear: you are saying that the multiple extra GET
connections do increase the load on the server, but that the increase
is negligible compared to the load from the REPORT request?
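
My guess on the first question: the base-64 is in ra_dav's update
REPORT -- mod_dav_svn ships the svndiff deltas base64-encoded inside
the XML response body. If that's right, the server pays CPU to encode
and the wire carries roughly a third more bytes. Back-of-the-envelope
(standard base64, no line breaks):

#include <stdio.h>

/* base64 turns every 3 input bytes into 4 output characters */
static long base64_len(long n)
{
    return 4 * ((n + 2) / 3);
}

int main(void)
{
    long raw = 10L * 1024 * 1024;   /* a 10 MB file */

    printf("%ld raw -> %ld encoded\n", raw, base64_len(raw));
    return 0;   /* prints: 10485760 raw -> 13981016 encoded */
}

Plain GETs, by contrast, can send the bodies as-is.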
>> It also isn't as helpful for checkout times as keeping one (or two,
>> if you know the first will be blocked for a while doing processing
>> on the server) connections pipelined.
>
> Again, it's a function of what resources the server has to offer.
> Ideally, we'd like to make the bottleneck the *client* disk I/O, not
> the network or the server disk I/O.
>
I think that in the vast majority of cases the bottleneck is going to
be the network, not the local disk. Most hard disks these days can
handle 10 MB/s easily, but not many people have a >= 100 Mbps (i.e.,
>= 12.5 MB/s) connection to an svn server. Given that, splitting the
data stream across multiple TCP sockets tends to lower the total
throughput, because the connections compete with each other for the
same bandwidth, which will increase overall checkout times.
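
To put rough numbers on the link speeds (8 bits per byte, ignoring
protocol overhead):

#include <stdio.h>

int main(void)
{
    const double mbps[] = { 1.5, 8.0, 100.0 };  /* T1, DSL, fast LAN */
    const double disk = 10.0;                   /* MB/s a disk can absorb */
    int i;

    for (i = 0; i < 3; i++)
        printf("%6.1f Mbps link = %5.2f MB/s (disk handles %.0f MB/s)\n",
               mbps[i], mbps[i] / 8.0, disk);
    return 0;
}

Only the 100 Mbps case delivers more than the disk can take; anything
slower leaves the network as the bottleneck.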
>> Yes, the server will hang up after x requests, but when you are
>> issuing x requests at the same time on 4 connections, they will
>> compete with each other and prevent their TCP windows from opening
>> fully. For very low values of x (1-10), 4 connections might give an
>> improvement, because they are all so short-lived anyhow that they
>> can't reach fully open windows, and you get rid of the reconnect
>> latency. For values of x >= 100, though, I think 2 connections would
>> give better results. Should make an interesting test...
>
> There's a distinct need to profile our behaviors to ensure we're being
> as optimal as we can make it. ;-) -- justin
>
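
Agreed on the profiling. The harness for the test I suggested is
basically just a timing loop over the (connections x pipeline depth)
grid; do_checkout() below is a hypothetical placeholder for "fetch the
same file set with that configuration", not real serf or svn API:

#include <stdio.h>
#include <sys/time.h>

/* Hypothetical stand-in: fetch a fixed set of files using 'conns'
 * connections with up to 'depth' pipelined requests on each. */
static void do_checkout(int conns, int depth)
{
    (void)conns;
    (void)depth;   /* stub; wire up a real client here */
}

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const int conns[]  = { 1, 2, 4 };
    const int depths[] = { 10, 100, 1000 };
    int i, j;

    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            double t0 = now_sec();
            do_checkout(conns[i], depths[j]);
            printf("%d conn(s), depth %4d: %.2f s\n",
                   conns[i], depths[j], now_sec() - t0);
        }
    return 0;
}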