Re: svn commit: r33366 - trunk/notes

From: Ben Collins-Sussman <sussman_at_red-bean.com>
Date: Tue, 30 Sep 2008 17:30:58 -0500

On Tue, Sep 30, 2008 at 3:24 PM, <gstein_at_tigris.org> wrote:

> Unfortunately, DeltaV is an insanely complex and inefficient protocol,
> and doesn't fit Subversion's model well at all. The result is that
> Subversion speaks a "limited portion" of DeltaV, and pays a huge price
> for this complexity: speed.
>
> +### GJS: doesn't fit? heh. IMO, it does... you should have seen it
> + *before* I grok'd the design and started providing feedback. it
> + just doesn't match it *precisely*

Sure, I remember. But still, we only implement "just enough" DeltaV
to look like a DAV server to a dumb DAV client. But there are no
3rd-party DeltaV clients that run against mod_dav_svn, and there are
no 3rd-party DeltaV servers that can talk to an svn client. Nearly
every single "interesting" svn_ra.h interface is done through a custom
REPORT -- checkout, update, log, etc. We've surrounded ourselves with
DeltaV formalities that provide a lot of complexity and zero value.

> +
> A typical network trace involves dozens of unnecessary turnarounds
> where the client keeps asking for the same information over and over
> again, all for the sake of following DeltaV. And then once the client
> @@ -33,6 +38,19 @@ has "discovered" the information it need
> custom REPORT request anyway. Most svn operations are at least twice
> as slow over HTTP than over the custom svnserve protocol.
>
> +### GJS: that is the fault of the client, not the protocol. somewhere
> + around the time that I stopped working on SVN for a while, I
> + identified several things that an RA layer could cache in order to
> + avoid looking them up again. nobody ever implemented that. thus,
> + it is hard to claim the fault lies in the protocol when we KNOW we
> + don't have to re-fetch certain items.

But this just makes our existing clients more and more complex. Why
are we doing PROPFINDs at all, ever? Unless we're actually trying to
implement svn_ra_get_prop(), every single PROPFIND we do before our
custom REPORTs is just a weird formality related to DeltaV discovery.
Even if we completely remove all the redundant PROPFIND requests, we
still get absolutely nothing out of following the DeltaV rules in
these cases.

> +
> +### GJS: turnarounds don't have to be expensive if you can pipeline
> + them. that is one of the (design) reasons for Serf. Again,
> + unrelated to the protocol, but simply the implementation. And
> + proposing a *new* implementation rather than fixing an existing
> + one seems to be a much more monumental task.

I have to do a reality check here: the promise of serf was that
checkouts/updates would be faster than with neon, because we could
pipeline a bunch of GET requests rather than suck down the tree in one
response. In practice, serf has proven to be no faster than neon in
this regard. (Or if faster, only by a tiny percentage.)

I understand that the "real" promise here is that of caching proxy
servers, which will supposedly deliver serf's original speed promises
to us... but I've lost my faith in this idea:

  * nobody's actually done it yet and demonstrated it

  * given that BDB and/or FSFS is (typically) already serving popular
    fs-nodes out of RAM (due to OS caching), I don't think a proxy-server
    serving nodes out of RAM will be any faster.

> +
>
> PROPOSAL
> ========
> @@ -41,15 +59,22 @@ Write a new HTTP protocol for svn; map
> requests.
>
> * svn over HTTP would be much faster (eliminate turnarounds)
> +
> +### GJS: again: pipelining.

Are we going to pipeline every single request in svn_ra.h? Does it
even make sense to do so? I don't view it as a magic bullet.

>
> * svn over HTTP would be almost as easy to extend as svnserve.
>
> +### GJS: have we had an issue with extending our current HTTP use?

Let's ask folks who have implemented the mod_dav_svn get-lock-report,
get-locations report, mergeinfo-report, etc.

But meanwhile, I maintain it's not so much about extending as it is
about maintaining. Anyone should be able to understand what's
happening in the HTTP layer, debug it, and rev an interface by adding
a new parameter. Nobody is scared of doing this in svnserve. To do
this in mod_dav_svn, people need a three-hour lecture on the
architecture of the system, DeltaV terminology, etc.

> +
> * svn over HTTP would be comprehensible to devs and users both
> (require no knowledge of DeltaV concepts).
>
> * We'd still maintain almost all of Apache's advantages
> (propositions A through E above).
>
> +### GJS: and with the right design, should be able to get the G that I
> + added above.
> +
> * We'd give up proposition F, which hasn't given us anything but the
> ability to mount svn repositories as network drives. (For those
> who *really* need this rarely-used feature, mod_dav_svn would
> @@ -58,6 +83,11 @@ requests.
> * We could freely advertise a fixed syntax for browsing older
> revisions, without worrying about violating DeltaV.
>
> +### GJS: we can do this today. DeltaV certainly doesn't proscribe what
> + we can do in our URL namespace. my reluctance to advertise one was
> to give us maximum flexibility on the server, and to defer browsing
> + to external tools like ViewVC or Trac.

Agreed, this feature isn't tied to a protocol rewrite at all. I'll
remove it from the doc.

> +
>
> MILE-HIGH DESIGN
> ================
> @@ -124,6 +154,9 @@ DESIGN
>
> GET /repos?cmd=get-file&r=23&path=/trunk/foo.c
>
> +### GJS: in order to allow for intermediate caches to work, the URLs
> + need to be well-designed up-front. I can help with that.

Is caching really a goal? Why is it so important on your list?

> +
> For requests which require large data exchanges, the big payloads go
> into request and/or response bodies. For example:
>
> @@ -131,6 +164,11 @@ DESIGN
>
> [body contains complete 'update report' describing working copy's
> revisions; response is a complete editor-drive.]
> +
> +### GJS: no. I've never seen a GET request with a body. You want to
> + avoid that, or you could end up being incompatible with many
> + proxies. If you need input data, then make it a POST, or use one
> + of the other HTTP methods defined by HTTP, WebDAV, or DeltaV.

Ah, ok, so any request bodies will probably need to be POSTs or somesuch.
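
A minimal sketch of that rule, purely for illustration: read-only
commands with no payload stay as GETs with everything in the query
string, and anything carrying a body becomes a POST. The function name
build_request() and the "/repos" root are assumptions made up for this
example; only the URL shapes come from the proposal text quoted above.

```python
from urllib.parse import urlencode


def build_request(cmd, params=None, body=None, root="/repos"):
    """Return (method, url, body) for one hypothetical protocol command.

    Hypothetical helper: the name and signature are illustrative only.
    A request with a body is sent as POST so that proxies never see a
    GET carrying a payload; everything else is a plain GET.
    """
    query = {"cmd": cmd}
    query.update(params or {})
    # Keep "/" unescaped so paths stay readable in the URL.
    url = root + "?" + urlencode(query, safe="/")
    method = "POST" if body is not None else "GET"
    return method, url, body


if __name__ == "__main__":
    print(build_request("get-file", {"r": 23, "path": "/trunk/foo.c"}))
    print(build_request("commit", {"keeplocks": "true"}, body=b"(commit data)"))
```

Under this scheme the 'update report' example would also go out as a
POST, since it needs a request body.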

>
> POST /repos?cmd=commit&keeplocks=true
>
> @@ -170,6 +208,12 @@ DESIGN
> embedded s-expressions, exactly like svnserve does. Heck, maybe we
> could even use svnserve's parsing/unparsing code!)
>
> +### GJS: personally, I'd recommend protocol buffers, or Apache Thrift
> + (a third-party reimplementation of PBs done at facebook). just
> + toss out the formats used in FS and the svn protocol, and let a
> + new/external library do the marshalling for us.
> + http://stuartsierra.com/2008/07/10/thrift-vs-protocol-buffers

Hmmm, interesting.

Honestly, it sounds to me like you might be willing to toss away all
the DeltaV junk, just as long as we still make individual GETs of
(rev, path) objects (1) possible, (2) cacheable, (3) pipelineable.
Those seem to be the things that are important to you.
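
To make the cacheability point concrete, here is a rough sketch. It
rests on the assumption that the contents of a file at a fixed
(rev, path) never change once the revision is committed, so a response
can be stored and reused indefinitely. The NodeCache class and the
fetch callback are hypothetical names invented for this example, not
part of any existing svn code.

```python
class NodeCache:
    """Cache of file contents keyed by (rev, path).

    Hypothetical sketch: 'fetch' stands in for whatever actually
    performs the GET against the server.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.misses = 0

    def get(self, rev, path):
        key = (rev, path)
        if key not in self._store:
            # Only the first request for a given (rev, path) reaches the server.
            self.misses += 1
            self._store[key] = self._fetch(rev, path)
        return self._store[key]


if __name__ == "__main__":
    def fake_fetch(rev, path):
        # Stand-in for a network round trip.
        return ("contents of %s@%d" % (path, rev)).encode()

    cache = NodeCache(fake_fetch)
    cache.get(23, "/trunk/foo.c")
    cache.get(23, "/trunk/foo.c")  # served from the cache
    cache.get(24, "/trunk/foo.c")  # different revision, fetched again
    print(cache.misses)
```

Whether such a cache lives in the client or in an intermediate proxy
is a separate question; the sketch only shows why the (rev, path) key
makes it safe.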

If that's the case, I *still* think we need a new protocol, and a new
server and client to implement it. The idea is to start clean, build
the smallest possible protocol with the smallest amount of code,
corresponding directly to svn_ra.h as much as possible... and heck,
if we can get pipelining and cacheability in there too, great. But I
cannot comprehend achieving this goal by modifying the existing
codebase... it would just be adding even more complexity to something
already incomprehensible to most of the svn developers.

Received on 2008-10-01 00:31:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.