On Sat, Dec 1, 2012 at 2:31 PM, Justin Erenkrantz <justin_at_erenkrantz.com> wrote:
> On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>>
>> I'm wondering whether your concerns apply to both internet-wide
>> deployments and local (all on the same LAN) ones.
>
>
> That line is certainly a fair one to draw in the sand. That said, I think
> the internal use case cries out even *more* for the parallel updates as the
> internal server in that environment is often wildly over-provisioned on the
> CPU side - with a fairly low-traffic environment, you want to take advantage
> of the parallel cores of a CPU to drive the updates.
>
> Generally speaking, what I discovered years ago back in 2006 (yikes) and I
> believe is still true as we near 2013 (shudder), if everything else is
> perfectly optimized (disk, latency, bandwidth, etc.), you're going to
> eventually bottleneck on the checksumming on both client and server - which
> is entirely CPU-bound and is expensive. You can solve that by splitting out
> the work across multiple cores - for a server, you need to utilize multiple
> parallel requests in-flight; and for a client, you then need to parallelize
> the editor drive.
>
> The reason that disk isn't such a bottleneck as you might first expect is
> due to the OS's buffer cache - for reads on the server-side, common data is
> already going to be in RAM so hot spots in the fsfs repos will already be in
> memory, for writes on the client-side, modern client OSes won't necessarily
> block you until everything is sync'd to disk. But, once you exhaust the
> capabilities of RAM, your underlying disk architecture matters a lot, in
> ways that might not be intuitive to those who haven't spent a lot of time
> working closely with such systems. (Hi Brane!) If you are using direct-attached storage
> locally on either server or client, then you will probably be bottlenecked
> right there. However, if your corporate environment has an NFS filer or SAN
> (a la NetApp/EMC) backing the FSFS repository or hosting NFS working copies
> (oh so common), those large disk subsystems are geared towards parallel I/O,
> not single-threaded I/O performance. (Isilon/BlueArc-class storage is geared
> towards the latter, but I've yet to see anyone obsessed enough about SVN I/O
> perf to place either their repository or working copies on a BlueArc-class
> storage system!) So, if you are not using direct-attached storage and are using NFS
> today in a corporate environment on either client or server, then you want
> to parallelize everything so that you can take advantage of the disk/network
> I/O architecture preferred by NetApp/EMC. Throwing more cores against a
> NetApp/EMC storage system in a high-available bandwidth environment allows
> for linear performance returns (i.e., reading/writing one I/O is 1X, two
> threads is 2X, three threads is 3X, etc, etc.).
>
> To that end, I'd eventually love to see ra_serf drive the update editor
> across multiple threads so that the checksum and disk I/O bottleneck can be
> distributed across cores on the client-side as well. Compared to where we
> were in 2006, that's the biggest inefficiency we have yet to solve and take
> advantage of. And, I'm sure this'll break all sorts of promises in the Ev1
> and perhaps Ev2 world and drive C-Mike even crazier. =) But, if you want
> to put a rocket pack on our HTTP performance, that's exactly what we should
> do. I'm reasonably certain that serf itself could be finely tuned to handle
> network I/O in a single thread at or close to wire-speed even on a 10G
> connection with a modern processor/OS - it's what we do with the file
> contents/textdeltas that needs to be shoved to a different set of worker
> threads and remove all of that libsvn_wc processing from blocking network
> traffic processing and get it all distributed and thread-safe. If we do
> that, woah, I'd bet that we are going to make things way faster across
> the board and completely blow everything else out of the water when our
> available bandwidth is high - which is the case in an internal network.
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit. =P
>
> So, that's my long-winded answer of saying that, yah, even in an internal
> LAN environment, you still want to parallelize.
>
> However, I'm definitely not going to veto a patch that would add an httpd
> directive that allows the server to steer the client - unless overridden by
> the client's local config - to using parallel updates or not. -- justin
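
To make the checksumming point quoted above concrete, here is a minimal
sketch of spreading checksum work over several workers. It is only an
illustration, not how ra_serf or libsvn_wc do it: the function names, the
worker count and the use of SHA-1 over in-memory blobs are all assumptions
made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def checksum(data: bytes) -> str:
    """Checksum one blob; SHA-1 is only a stand-in for the example."""
    return hashlib.sha1(data).hexdigest()


def checksum_all(blobs, workers=4):
    """Checksum every blob using a pool of worker threads.

    CPython's hashlib releases the GIL while hashing large buffers,
    so the threads can run on separate cores. pool.map keeps the
    results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, blobs))


# Hypothetical workload: eight 1 MB "file contents".
blobs = [bytes([i]) * 1_000_000 for i in range(8)]
digests = checksum_all(blobs)
```

The same shape applies on the server, where the parallelism comes from
multiple requests in flight rather than from a thread pool.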

There are some scenarios where either the server admin or the user
can decide whether parallel requests make sense or not.

I'm specifically thinking of the use of Kerberos per-request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming a 2-step handshake, of which serf can bypass the first step,
this means an overhead per request of 1-10KB; with a 3-step handshake
each request has to be sent twice, further increasing the overhead.
IMHO in this scenario the server admin should be able to veto the use
of parallel requests.

And the same is true for https connections, where it's also the server
admin who can decide if the necessary caches have been put in place to
enable the benefits of parallel requests.
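
As a rough back-of-the-envelope sketch of the Kerberos overhead above: the
1-10KB per-request figure is the one I mentioned, while the request count
and the simple "sent twice means double" model for the 3-step handshake are
assumptions picked only for illustration.

```python
KB = 1024


def auth_overhead_bytes(requests, header_bytes, sends_per_request=1):
    """Total extra bytes spent on authorization headers.

    sends_per_request=2 models the 3-step handshake case, where each
    request has to be sent twice (a simplifying assumption).
    """
    return requests * header_bytes * sends_per_request


requests = 10_000  # hypothetical number of per-file requests in one checkout

low = auth_overhead_bytes(requests, 1 * KB)    # 2-step, 1KB header
high = auth_overhead_bytes(requests, 10 * KB)  # 2-step, 10KB header
high_3step = auth_overhead_bytes(requests, 10 * KB, sends_per_request=2)

for label, total in [("low", low), ("high", high), ("3-step", high_3step)]:
    print(f"{label}: {total / (KB * KB):.1f} MiB of auth overhead")
```

With these assumed numbers the overhead ranges from roughly 10 MiB to
roughly 195 MiB, which is why the admin may want a say in this.
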
Lieven
Received on 2012-12-01 15:02:47 CET