Re: svn commit: r33366 - trunk/notes

From: Greg Stein <gstein_at_gmail.com>
Date: Tue, 30 Sep 2008 16:00:40 -0700

In general, I'm not crazy-opposed. You're entirely right: the vision
of WebDAV (or "WebDA") came to fruition. DeltaV did not, so attempting
to adhere strongly to DeltaV really makes little pragmatic sense.

Within the scope of the (new) design, I *do* think it would be
interesting to make it DAV-capable. i.e. is the URL namespace both
DAV-aware *and* svn-aware? Given that DAV does not use POST, then I
maintain you could probably mesh the two pretty easily. The "new"
client would do some interesting GETs and POSTs, and a DAV client (not
svn! a downlevel client) would throw in a couple PROPFINDs, and if we
reach a bit, then some autoversioning around PUT and DELETE.

IOW, what I might suggest is a mesh of your simplified protocol, with
the related DAV support for Windows, Mac, Linux, and other software
DAV-users. An admin could install mod_svn and get speed *and* DAV
capability.

And yes, I want cachability of (rev, path) nodes. If a request can be
satisfied locally, then that is a huge win. You say "serve out of
RAM", and both will do that. But a proxy can do it on your LAN rather
than the server via the WAN. You say "nobody has done it", but that is
an incorrect argument. Nobody has been ABLE to do it because we
glommed shit into a mother REPORT. And serf needs some more work
before people can TRY it. So I respectfully disagree with any argument
about cachability being a non-requirement.
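
A minimal sketch of why (rev, path) nodes are a natural cache key, assuming
the content of a path at a fixed revision never changes once committed. The
class name, the fetch callback, and the fake origin are all hypothetical and
are not part of any svn API; a real proxy would do this at the HTTP layer.

```python
from typing import Callable, Dict, List, Tuple

NodeKey = Tuple[int, str]  # (revision, path)


class RevPathCache:
    """Hypothetical cache for node contents keyed by (rev, path)."""

    def __init__(self, fetch: Callable[[int, str], bytes]) -> None:
        self._fetch = fetch              # called only on a cache miss
        self._store: Dict[NodeKey, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, rev: int, path: str) -> bytes:
        key = (rev, path)
        if key in self._store:
            # Assumption: a (rev, path) node is immutable, so no expiry
            # or revalidation is needed.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        data = self._fetch(rev, path)
        self._store[key] = data
        return data


# Stand-in for the remote server across the WAN (hypothetical).
calls: List[NodeKey] = []


def fake_origin(rev: int, path: str) -> bytes:
    calls.append((rev, path))
    return f"contents of {path}@{rev}".encode()


cache = RevPathCache(fake_origin)
first = cache.get(23, "/trunk/foo.c")
second = cache.get(23, "/trunk/foo.c")  # served locally, no origin call
```

The second lookup never reaches the origin, which is the win a LAN-side
proxy would give every client behind it.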

Cheers,
-g

On Tue, Sep 30, 2008 at 3:30 PM, Ben Collins-Sussman
<sussman_at_red-bean.com> wrote:
> On Tue, Sep 30, 2008 at 3:24 PM, <gstein_at_tigris.org> wrote:
>
>> Unfortunately, DeltaV is an insanely complex and inefficient protocol,
>> and doesn't fit Subversion's model well at all. The result is that
>> Subversion speaks a "limited portion" of DeltaV, and pays a huge price
>> for this complexity: speed.
>>
>> +### GJS: doesn't fit? heh. IMO, it does... you should have seen it
>> + *before* I grok'd the design and started providing feedback. it
>> + just doesn't match it *precisely*
>
> Sure, I remember. But still, we only implement "just enough" DeltaV
> to look like a DAV server to a dumb DAV client. But there are no
> 3rd-party DeltaV clients that run against mod_dav_svn, and there are
> no 3rd-party DeltaV servers that can talk to an svn client. Nearly
> every single "interesting" svn_ra.h interface is done through a custom
> REPORT -- checkout, update, log, etc. We've surrounded ourselves with
> DeltaV formalities that provide a lot of complexity and zero value.
>
>
>> +
>> A typical network trace involves dozens of unnecessary turnarounds
>> where the client keeps asking for the same information over and over
>> again, all for the sake of following DeltaV. And then once the client
>> @@ -33,6 +38,19 @@ has "discovered" the information it need
>> custom REPORT request anyway. Most svn operations are at least twice
>> as slow over HTTP as over the custom svnserve protocol.
>>
>> +### GJS: that is the fault of the client, not the protocol. somewhere
>> + around the time that I stopped working on SVN for a while, I
>> + identified several things that an RA layer could cache in order to
>> + avoid looking them up again. nobody ever implemented that. thus,
>> + it is hard to claim the fault lies in the protocol when we KNOW we
>> + don't have to re-fetch certain items.
>
> But this just makes our existing clients more and more complex. Why
> are we doing PROPFINDs at all, ever? Unless we're actually trying to
> implement svn_ra_get_prop(), every single PROPFIND we do before our
> custom REPORTs is just a weird formality related to DeltaV discovery.
> Even if we completely remove all the redundant PROPFIND requests, we
> still get absolutely nothing out of following the DeltaV rules in
> these cases.
>
>> +
>> +### GJS: turnarounds don't have to be expensive if you can pipeline
>> + them. that is one of the (design) reasons for Serf. Again,
>> + unrelated to the protocol, but simply the implementation. And
>> + proposing a *new* implementation rather than fixing an existing
>> + one seems to be a much more monumental task.
>
> I have to do a reality check here: the promise of serf is that
> checkouts/updates would be faster than neon, because we could pipeline
> a bunch of GET requests rather than suck down the tree in one
> response. In practice, serf has proven to be no faster than neon in
> this regard. (Or if faster, only by a tiny percent.)
>
> I understand that the "real" promise here is that of caching proxy
> servers, which will supposedly deliver serf's original speed promises
> to us... but I've lost my faith in this idea:
>
> * nobody's actually done it yet and demonstrated it
>
> * given that BDB and/or FSFS is (typically) already serving popular
> fs-nodes out of RAM (due to OS caching), I don't think a proxy-server
> serving nodes out of RAM will be any faster.
>
>
>> +
>>
>> PROPOSAL
>> ========
>> @@ -41,15 +59,22 @@ Write a new HTTP protocol for svn; map
>> requests.
>>
>> * svn over HTTP would be much faster (eliminate turnarounds)
>> +
>> +### GJS: again: pipelining.
>
> Are we going to pipeline every single request in svn_ra.h? Does it
> even make sense to do so? I don't view it as a magic bullet.
>
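
A minimal sketch of what pipelining means at the wire level, not of how serf
or neon implements it: several GET requests are written back-to-back on one
connection before any response is read, so the turnarounds overlap. The host
name and the second and third paths are hypothetical; the URL shape is the
one from the proposal.

```python
from typing import List


def pipelined_gets(host: str, targets: List[str]) -> bytes:
    """Serialize several HTTP/1.1 GET requests into a single buffer.

    Sending this buffer in one write pipelines the requests; HTTP/1.1
    requires the server to answer them in the order they were sent.
    """
    requests = [
        f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"
        for target in targets
    ]
    return "".join(requests).encode("ascii")


wire = pipelined_gets(
    "svn.example.com",  # hypothetical host
    [
        "/repos?cmd=get-file&r=23&path=/trunk/foo.c",
        "/repos?cmd=get-file&r=23&path=/trunk/bar.c",  # hypothetical path
        "/repos?cmd=get-file&r=23&path=/trunk/baz.c",  # hypothetical path
    ],
)
```

This only helps requests that do not depend on an earlier response, which is
why it suits a batch of file fetches better than a discovery sequence.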
>
>>
>> * svn over HTTP would be almost as easy to extend as svnserve.
>>
>> +### GJS: have we had an issue with extending our current HTTP use?
>
> Let's ask folks who have implemented the mod_dav_svn get-lock-report,
> get-locations report, mergeinfo-report, etc.
>
> But meanwhile, I maintain it's not so much about extending as it is
> about maintaining. Anyone should be able to understand what's
> happening in the HTTP layer, debug it, rev an interface by adding a
> new parameter. Nobody is scared of doing this in svnserve. To do
> this in mod_dav_svn, people need a 3 hour lecture in the architecture
> of the system, DeltaV terminology, etc.
>
>
>> +
>> * svn over HTTP would be comprehensible to devs and users both
>> (require no knowledge of DeltaV concepts).
>>
>> * We'd still maintain almost all of Apache's advantages
>> (propositions A through E above).
>>
>> +### GJS: and with the right design, should be able to get the G that I
>> + added above.
>> +
>> * We'd give up proposition F, which hasn't given us anything but the
>> ability to mount svn repositories as network drives. (For those
>> who *really* need this rarely-used feature, mod_dav_svn would
>> @@ -58,6 +83,11 @@ requests.
>> * We could freely advertise a fixed syntax for browsing older
>> revisions, without worrying about violating DeltaV.
>>
>> +### GJS: we can do this today. DeltaV certainly doesn't proscribe what
>> + we can do in our URL namespace. my reluctance to advertise one was
>> + to give us maximum flexibility on the server, and to defer browsing
>> + to external tools like ViewVC or Trac.
>
> Agreed, this feature isn't tied to a protocol rewrite at all. I'll
> remove it from the doc.
>
>
>
>> +
>>
>> MILE-HIGH DESIGN
>> ================
>> @@ -124,6 +154,9 @@ DESIGN
>>
>> GET /repos?cmd=get-file&r=23&path=/trunk/foo.c
>>
>> +### GJS: in order to allow for intermediate caches to work, the URLs
>> + need to be well-designed up-front. I can help with that.
>
> Is caching really a goal? Why is it so important on your list?
>
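
A rough sketch of the URL-design question, assuming the goal is one stable
URL per (rev, path) node. The first function reproduces the query-string form
from the proposal. The second uses a path-style layout that is purely
hypothetical (nothing in the proposal defines it), shown because some
intermediaries are conservative about caching URLs that carry a query string.

```python
from urllib.parse import quote, urlencode


def query_style_url(rev: int, path: str) -> str:
    """URL in the form used by the proposal's get-file example."""
    params = {"cmd": "get-file", "r": rev, "path": path}
    # safe="/" keeps the slashes in the path readable instead of %2F.
    return "/repos?" + urlencode(params, safe="/")


def path_style_url(rev: int, path: str) -> str:
    """Hypothetical layout that puts rev and path in the URL path itself."""
    return f"/repos/rev/{rev}{quote(path)}"


q = query_style_url(23, "/trunk/foo.c")
p = path_style_url(23, "/trunk/foo.c")
```

Either form works as a cache key only if every client builds it the same
way, so parameter order and escaping would need to be fixed up front.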
>
>> +
>> For requests which require large data exchanges, the big payloads go
>> into request and/or response bodies. For example:
>>
>> @@ -131,6 +164,11 @@ DESIGN
>>
>> [body contains complete 'update report' describing working copy's
>> revisions; response is a complete editor-drive.]
>> +
>> +### GJS: no. I've never seen a GET request with a body. You want to
>> + avoid that, or you could end up being incompatible with many
>> + proxies. If you need input data, then make it a POST, or use one
>> + of the other HTTP methods defined by HTTP, WebDAV, or DeltaV.
>
> Ah, ok, so any request bodies will probably need to be POSTs or somesuch.
>
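
A small sketch of the rule being agreed on here: a request that carries a
body goes out as a POST, and a bodiless read stays a GET. The helper name,
the host, and the "example-report" command are hypothetical; only the
get-file URL shape comes from the proposal. Nothing is sent over the network.

```python
import urllib.request
from typing import Optional


def make_request(url: str, body: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request, choosing the method from the presence of a body."""
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=body, method=method)


base = "http://svn.example.com"  # hypothetical host

read_req = make_request(base + "/repos?cmd=get-file&r=23&path=/trunk/foo.c")

# Hypothetical command name; the payload stands in for a report body.
report_req = make_request(
    base + "/repos?cmd=example-report",
    body=b"(report describing the working copy)",
)
```

Keeping bodies off GET also keeps the GETs cacheable, since the URL alone
identifies what is being asked for.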
>
>
>>
>> POST /repos?cmd=commit&keeplocks=true
>>
>> @@ -170,6 +208,12 @@ DESIGN
>> embedded s-expressions, exactly like svnserve does. Heck, maybe we
>> could even use svnserve's parsing/unparsing code!)
>>
>> +### GJS: personally, I'd recommend protocol buffers, or Apache Thrift
>> + (a third-party reimplementation of PBs done at facebook). just
>> + toss out the formats used in FS and the svn protocol, and let a
>> + new/external library do the marshalling for us.
>> + http://stuartsierra.com/2008/07/10/thrift-vs-protocol-buffers
>
> Hmmm, interesting.
>
> Honestly, it sounds to me like you might be willing to toss away all
> the DeltaV junk, just as long as we still make individual GETs of
> (rev, path) objects (1) possible, (2) cacheable, (3) pipelineable.
> Those seem to be the things that are important to you.
>
> If that's the case, I *still* think we need a new protocol, and a new
> server and client to implement it. The idea is to start clean, build
> the smallest possible protocol with the smallest amount of code,
> directly corresponding with svn_ra.h as much as possible... and heck,
> if we can get pipelining and cacheability in there too, great. But I
> cannot comprehend achieving this goal by modifying the existing
> codebase.... it would just be adding even more complexity to something
> already incomprehensible to most of the svn developers.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
> For additional commands, e-mail: dev-help_at_subversion.tigris.org
>
>

Received on 2008-10-01 01:01:03 CEST

This is an archived mail posted to the Subversion Dev mailing list.
