
Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Sun, 23 Sep 2012 14:33:25 +0200

On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:

> On Sat, Sep 22, 2012 at 2:27 PM, <stefan2_at_apache.org> wrote:
> > Author: stefan2
> > Date: Sat Sep 22 12:27:49 2012
> > New Revision: 1388786
> >
> > URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
> > Log:
> > On the 10Gb branch.
> >
> > * BRANCH-README: clarify goals and impact of this branch
> >
> > Modified:
> > subversion/branches/10Gb/BRANCH-README
> >
> > Modified: subversion/branches/10Gb/BRANCH-README
> > URL:
> http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
> >
> ==============================================================================
> > --- subversion/branches/10Gb/BRANCH-README (original)
> > +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
> > @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
> > 10Gb/s for typical source code, i.e. becomes capable of
> > saturating a 10Gb connection.
> >
> > +http:// will speep up by almost the same absolute value,
> > +1 second being saved per GB of data. Due to slow processing
> > +in other places, this gain will be hard to measure, though.
>
> Heh, next question: what are those "slow places" mainly, and do you
> have any ideas to speed those up as well? Are there (even only
> theoretical) possibilities here? Or would that require major
> revamping? Or is it simply theoretically not possible to overcome
> certain bottlenecks?
>

It is not entirely clear yet where that overhead comes from.
However:

* the textual representation is not a problem - there is no
  significant data overhead in HTTP. Base64 encoding has
  been a limiting factor in the past and can certainly be
  tuned further if need be.
* IIRC, we use the same reporter at the same granularity;
  the server pushes a whole file tree out to the client with
  no need for extra roundtrips. But I may be mistaken here.
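(The base64 size overhead mentioned above is easy to quantify; a
quick standalone sketch in Python, not Subversion code:)

```python
import base64

# base64 maps every 3 input bytes to 4 output characters,
# so the encoded payload grows by roughly one third.
payload = bytes(range(256)) * 4096  # ~1 MB of binary data
encoded = base64.b64encode(payload)

overhead = len(encoded) / len(payload)
print(f"payload: {len(payload)} bytes, "
      f"encoded: {len(encoded)} bytes, "
      f"overhead: {overhead:.3f}x")
```

So even in the worst case the wire-level inflation is bounded at
about 1.33x - a fixed cost, not a scaling bottleneck.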

Possible sources for extra load:

* Apache modules packing / unpacking / processing
  the outgoing data (HTTP/XML tree?)
* Apache access control modules - even if there is
  blanket access
* Fine-grained network communication.

The latter two are a problem because we want to transmit
40k files + properties per second.
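(To put "40k files per second" in perspective, some back-of-the-
envelope arithmetic of my own, not from the thread:)

```python
# At 40,000 files per second, the per-file time budget on the
# server is tiny:
files_per_second = 40_000
budget_us = 1_000_000 / files_per_second  # microseconds per file
print(f"per-file budget: {budget_us:.0f} microseconds")

# Even a single extra syscall or per-path authz lookup costing a
# few microseconds eats a large fraction of that budget.
```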

My gut feeling is that we can address most of the issues
that we will find, and doubling the performance is almost
always possible. A stateless protocol like HTTP also
makes it relatively easy to create parallel request streams
to increase throughput.
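(A toy illustration of why statelessness helps - each request is
independent, so latencies overlap instead of adding up. This
simulates the requests with sleeps rather than real HTTP:)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for one independent HTTP request; since HTTP is
    # stateless, real requests could run on separate connections
    # in any order.
    time.sleep(0.05)  # simulated 50 ms network round trip
    return url

# Hypothetical per-file requests; a real client would batch smarter.
urls = [f"file{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# 8 x 50 ms of latency overlaps into roughly one round trip
# instead of the 400 ms a serial client would pay.
print(f"fetched {len(results)} items in {elapsed:.2f}s")
```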

Another thing is that svnserve would be just fine for many
use-cases if only it had decent SSPI / LDAP support. But
that is something we simply need to code. Power users
inside a LAN may then use svnserve, while more flexible /
complicated setups are handled by an Apache server on
the same repository.

Finally, 1.8 clients are much too slow to do anything useful
with that amount of bandwidth. Checksumming alone limits
the throughput to ~3Gb/s (for export since it only uses MD5)
or even ~1Gb/s (checkout calculates MD5 and SHA1).
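(Those checksum ceilings are easy to reproduce with a rough
benchmark - ballpark only, since absolute figures depend on the
CPU and hash implementation:)

```python
import hashlib
import time

data = b"\0" * (64 * 1024 * 1024)  # 64 MB of dummy file content
rates = {}

for name in ("md5", "sha1"):
    h = hashlib.new(name)
    start = time.perf_counter()
    h.update(data)
    elapsed = time.perf_counter() - start
    rates[name] = len(data) * 8 / elapsed / 1e9  # gigabits/second
    print(f"{name}: {rates[name]:.1f} Gb/s")

# A checkout computing both hashes sequentially is limited to
# 1 / (1/md5_rate + 1/sha1_rate), i.e. slower than either alone.
```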

Future clients will hopefully do much better here.

-- Stefan^2.

-- 
Join us this October at Subversion Live 2012
<http://www.wandisco.com/svn-live-2012> for two days of best
practice SVN training, networking, live demos, committer meet and
greet, and more! Space is limited, so get signed up today!
Received on 2012-09-23 14:34:02 CEST

This is an archived mail posted to the Subversion Dev mailing list.
