
Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 25 Sep 2012 16:29:18 +0200

On Sun, Sep 23, 2012 at 2:33 PM, Stefan Fuhrmann
<stefan.fuhrmann_at_wandisco.com> wrote:
> On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>>
>> On Sat, Sep 22, 2012 at 2:27 PM, <stefan2_at_apache.org> wrote:
>> > Author: stefan2
>> > Date: Sat Sep 22 12:27:49 2012
>> > New Revision: 1388786
>> >
>> > URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
>> > Log:
>> > On the 10Gb branch.
>> >
>> > * BRANCH-README: clarify goals and impact of this branch
>> >
>> > Modified:
>> > subversion/branches/10Gb/BRANCH-README
>> >
>> > Modified: subversion/branches/10Gb/BRANCH-README
>> > URL:
>> > http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
>> >
>> > ==============================================================================
>> > --- subversion/branches/10Gb/BRANCH-README (original)
>> > +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
>> > @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
>> > 10Gb/s for typical source code, i.e. becomes capable of
>> > saturating a 10Gb connection.
>> >
>> > +http:// will speed up by almost the same absolute value,
>> > +1 second being saved per GB of data. Due to slow processing
>> > +in other places, this gain will be hard to measure, though.
>>
>> Heh, next question: what are those "slow places" mainly, and do you
>> have any ideas to speed those up as well? Are there (even if only
>> theoretical) possibilities here? Or would that require major
>> revamping? Or is it simply theoretically impossible to overcome
>> certain bottlenecks?
>
>
> It is not entirely clear yet where that overhead comes from.
> However,
>
> * the textual representation is not a problem - there is no
> significant data overhead in HTTP. Base64 encoding has
> been limiting in the past and can certainly be tuned further
> if need be.
> * IIRC, we use the same reporter at the same granularity;
> the server pushes a whole file tree out to the client with no
> need for extra round trips. But I may be mistaken here.
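
As a quick sanity check on the base64 point, raw encoding throughput is
easy to measure; a rough sketch in Python (not Subversion's actual C
code path, so absolute numbers will differ):

    import base64, time

    buf = b"\0" * (64 * 1024 * 1024)   # 64 MiB of sample data
    start = time.perf_counter()
    base64.b64encode(buf)
    elapsed = time.perf_counter() - start
    print("base64 encode: %.2f Gb/s" % (len(buf) * 8 / elapsed / 1e9))

Keep in mind that base64 also inflates the payload by a fixed one third
(4 output bytes per 3 input bytes), independent of CPU cost.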

With 1.8 there will only be ra_serf for http, and that does a separate
http GET for every file during checkout/update. These requests can go
out in parallel, and in most setups, with KeepAlive enabled, TCP
connections will be reused, but there is still a certain overhead per
http request/response. There is no single giant streaming response
carrying an entire tree.
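
For what it's worth, ra_serf's parallelism is tunable in the servers
runtime config file; a minimal sketch, assuming the 1.8
http-max-connections option (4 is the documented default):

    [global]
    # max number of parallel TCP connections ra_serf opens
    # to a single server (Subversion 1.8+)
    http-max-connections = 4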

> Possible sources for extra load:
>
> * Apache modules packing / unpacking / processing
> the outgoing data (HTTP/XML tree?)
> * Apache access control modules - even if there is
> blanket access
> * Fine-grained network communication.
>
> The latter two are a problem because we want to transmit
> 40k files + properties per second.
>
> My gut feeling is that we can address most of the issues
> that we will find, and that doubling the performance is virtually
> always possible. A stateless protocol like HTTP also
> makes it relatively easy to create parallel request streams
> to increase throughput.
>
> Another thing is that svnserve would be just fine for many
> use cases if only it had decent SSPI / LDAP support. But
> that is something we simply need to code. Power users
> inside a LAN could then use svnserve, while more flexible /
> complicated setups are handled by an Apache server on
> the same repository.

Ah yes. If somebody could "fix" the auth support in svnserve (in a way
that really works, as opposed to the current SASL support), that would
be great :-). That would open up a lot more options for deployment.
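
For reference, the current Cyrus SASL integration is switched on per
repository in conf/svnserve.conf roughly like this (the mechanisms
themselves live in the SASL library's own config, outside Subversion):

    [sasl]
    # hand authentication over to Cyrus SASL
    use-sasl = true
    # required encryption strength in bits (0 = none)
    min-encryption = 0
    max-encryption = 256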

> Finally, 1.8 clients are much too slow to do anything useful
> with that amount of bandwidth. Checksumming alone limits
> the throughput to ~3Gb/s (for export, since it only uses MD5)
> or even ~1Gb/s (checkout calculates both MD5 and SHA1).
>
> Future clients will hopefully do much better here.

Indeed. That would again make the client the clear bottleneck :-).
Besides, even if you can checksum at 3Gb/s, you'll need some seriously
fast hardware to write to persistent storage at that speed :-).
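
Those checksum ceilings are easy to reproduce on any machine; a rough
sketch using Python's hashlib (not Subversion's actual C code path, so
absolute numbers will differ):

    import hashlib, time

    buf = b"\0" * (64 * 1024 * 1024)   # 64 MiB of sample data
    for name in ("md5", "sha1"):
        start = time.perf_counter()
        hashlib.new(name).update(buf)
        elapsed = time.perf_counter() - start
        print("%s: %.2f Gb/s" % (name, len(buf) * 8 / elapsed / 1e9))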

-- 
Johan
