Re: HTTP protocol v2: rethunk.

From: Mark Phippard <markphip_at_gmail.com>
Date: Thu, 6 Nov 2008 10:34:42 -0500

On Thu, Nov 6, 2008 at 10:18 AM, Ben Collins-Sussman
<sussman_at_red-bean.com> wrote:
> On Thu, Nov 6, 2008 at 8:04 AM, Mark Phippard <markphip_at_gmail.com> wrote:
>> On Thu, Nov 6, 2008 at 7:55 AM, Ben Collins-Sussman
>> <sussman_at_red-bean.com> wrote:
>>> On Thu, Nov 6, 2008 at 1:35 AM, David Glasser <glasser_at_davidglasser.net> wrote:
>>>
>>>> There's a lock around appending to the proto-rev file, yes. And it
>>>> does write a bunch of little files for noderevs, directory listings,
>>>> and props, and gloms them on at finalization time, yes. But there are
>>>> no locks around editing the little files themselves; so for example,
>>>> if concurrent processes make two files in a directory at the same
>>>> time, there can be a race condition and only one will end up in the
>>>> listing. This situation isn't possible if you only access a
>>>> transaction from a single process (say, via the commit editor).
>>>
>>> But if all writes go through a single process, that sort of defeats
>>> the goal of saturating the bandwidth with parallel PUTs. Maybe it
>>> would be a worthy goal to make FSFS safe? I know it's something we'll
>>> have to do for libsvn_fs_bigtable.
>>
>> Do we know that "saturating the pipe" will give the best performance?
>> We (CollabNet) are frequently hearing complaints of WAN performance
>> lately and it has been suggested that a single request would perform
>> better in that environment because:
>>
>> a) the pipe is not that big
>> b) the latency on turnarounds is the biggest killer of performance
>
> I'd have to defer this question to the serf experts. There's an
> unspoken assumption that saturating the pipe with parallel (and/or
> pipelined) requests is always a speed gain, but I actually don't know
> of evidence to back this up. Maybe it's out there, though.

It occurred to me later that several PUT requests may not suffer the
latency problem anyway, since a PUT is largely a one-way action. Of
course, if we had to do some other requests before each PUT, it could
negate the benefit. It does not sound like that is the case.
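
To put rough numbers on that intuition, here is a back-of-envelope
model. It is only a sketch: the function names, the "preflights"
parameter, and the sample link figures are all made up for
illustration, and it ignores TCP slow start, server processing time,
and everything else that matters in real life. It assumes a
sequential client waits one full round trip per request, while a
pipelined client streams the PUT bodies back to back and waits only
for the last response, unless a blocking request has to complete
before each PUT.

```python
def transfer_seconds(payload_bytes, bandwidth_bps):
    """Time to push one request body onto the wire."""
    return payload_bytes * 8 / bandwidth_bps


def sequential_puts(n, rtt, payload_bytes, bandwidth_bps, preflights=0):
    """Each PUT, and each blocking request before it, waits a full round trip."""
    per_put = (1 + preflights) * rtt + transfer_seconds(payload_bytes, bandwidth_bps)
    return n * per_put


def pipelined_puts(n, rtt, payload_bytes, bandwidth_bps, preflights=0):
    """PUT bodies stream back to back; only blocking preflights cost extra round trips."""
    wire_time = n * transfer_seconds(payload_bytes, bandwidth_bps)
    return rtt + n * preflights * rtt + wire_time


if __name__ == "__main__":
    # Hypothetical WAN link: 200 ms round trip, 1 Mbit/s, 100 files of 10 kB each.
    args = (100, 0.2, 10_000, 1_000_000)
    print("sequential:             %.1f s" % sequential_puts(*args))
    print("pipelined:              %.1f s" % pipelined_puts(*args))
    print("pipelined + 1 preflight: %.1f s" % pipelined_puts(*args, preflights=1))
```

Under those assumed numbers, a single blocking request before each
PUT is enough to eat the whole gain from pipelining, which is the
concern above.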

I think getting rid of all the PROPFIND stuff will be the biggest win here.

> In the case of doing a checkout/update, my understanding is that
> ra_serf's parallelism is 'slightly' faster than ra_neon's single
> request/response. The response I've always gotten back from
> gstein/jerenkrantz about this is that ra_serf is going to really shine
> when caching proxies come into play. This makes sense to me: even if
> there's no obvious, immediate benefit to most users doing checkouts
> over serf, the design is a bit of investment in the future, and should
> be especially beneficial to corporations with caching infrastructure.

I agree this makes sense. I am not sure if it is true in practice
yet. At a minimum, there is no evidence that anyone is successfully
doing this today. I have tried to test this on a small scale and have
run into some significant barriers. The first is finding a good proxy
that works with Serf. As we know, Squid does not. When I have
brought this up in the past, it was just met with hand-waving about
how large corporations all use proxies from Cisco or some other large
vendor. Great, that makes it real easy to test the theory ...

The bigger issue, the one that seems close to insurmountable, is how
SSL and authentication play into this. If the connection to the
repository is via SSL, then how can a proxy cache anything? gstein
implied there are ways to do this, but his answer was far from
concrete and it remains to be seen whether that answer would be
acceptable to large enterprises anyway. We also have customers that
use PKCS#12 certificates for their authentication to Subversion.
Again, how would this ever work with a caching proxy? Before we wave
these issues away, keep in mind that the users that use these features
are the very types of large enterprises that have the geographically
distributed teams that could benefit the most from these proxies. At
the same time, there is zero chance of getting them to not use SSL and
certificates.

I am not saying any of this means the Serf approach is not the right
approach. I am saying that before we just "toss out" phrases like
caching proxy servers it would be nice if the people that are
knowledgeable about this stuff would test that these things really can
be used. Perhaps there are some things we could do to make these more
of an option?

I am reminded of the introduction of SASL to svnserve, which I largely
consider to be a failure in that it does not accomplish the main goal
that most users want, which is to connect svnserve
authentication with their Windows Active Directory. I recall we waved
our hands at this issue when SASL was added and just implied it was a
matter of setting up SASL to work with it. Turns out, that is not so
easy and perhaps not even possible. Sorry for the sidetrack.

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/

Received on 2008-11-06 16:35:00 CET
