[PROPOSAL] Proxy support for svn://

From: David Anderson <david.anderson_at_calixo.net>
Date: 2005-08-24 00:42:39 CEST

Following events that took place at OSCon 2005 and subsequent
discussions with yks on the dev channel, I am proposing a design for
proxy support within the Subversion in-house protocol, as well as plans
for the associated 'svnproxy' daemon.

ABSTRACT
========

Svnproxy is a daemon that aggregates handles to many real Subversion
repository URLs (all using the svn RA method), and acts as a middle man,
forwarding and mangling communications back and forth between clients
operating on a proxy URL and the real svnserve server the proxy URL maps to.

The new 'proxy' capability added to the ra_svn protocol enables proxy
servers to announce themselves to proxy-savvy clients, who can then
respond in the manner appropriate to set up the proxy relaying with
maximum chances of a successful communication with the proxied server.

MOTIVATION
==========

The obvious: having something that can act as a virtual facade to many
real (and potentially vastly distributed) repositories, with minimal
visible changes for the end user.

A typical use case would be to set up the proxy server on a gateway
between a lan and the internet, to forward client operations to many
different real servers on the lan. In other words, a classic proxy setup.

Now, here is a slightly more contrived variation of this use case,
brought to us by the good folks working on svl. Svl wants to let users
who are behind a NATing firewall publish their local repository(ies) on
a public proxy via a tunnel (ssh port forwarding, whatever). In this
case, svnproxy would manage the actual proxy negotiation and protocol
mangling, let svl deal with setting up the tunnel, and basically just
define proxy maps that redirect to localhost on nonstandard ports to go
through the tunnel.

STATE OF THE ART
================

At OSCon US 2005, the svl presentation was to include a live
demonstration in which people would pull stuff from repositories on the
internet. However, the OSCon network setup was firewalling the ports on
which the svnserve processes had been set up to listen. Strangely
enough though, the official svnserve port was authorized.

So, one Monday afternoon, yks joined #svn-dev and asked how the svn
protocol could be tortured into setting up a proxy that would proxy
requests for svn://proxy_server/real-repos-uuid/path/in/repos to
svn://localhost:1234/path/in/repos. I was in the middle of dissecting
the svn protocol at that time, so we discussed the implications of such
a setup for a bit, and he then went off to "try something with Perl".

And indeed, for all that I know, the svl presentation featured a live
demo, with people going via a tiny perl script that masqueraded as
svnserve and did regular expression mangling of the data flowing through
it to switch URLs. As far as I can remember, there is even a slide
about how they hacked the proxy together in a couple of hours :-).

His implementation works in that it has been tested in public by OSCon
hackers. However, given the nature of its operation, it is not
recommended to breathe too heavily on the whole setup, lest it collapse
into a quantum singularity and destroy the universe. The two main
problems it has are:
- Blind regular expression mangling of the raw data stream. No
interpretation of the data, which means that data other than
command-parameter URLs could get accidentally mangled.
- Due to the response-request nature of the svn protocol, the proxy has
to send a greeting where it lists supported protocol versions and
capabilities before the client tells it which repository it wants to
access. In practice, the perl proxy adopts the ostrich tactic: send a
v2-protocol-only greeting with the edit-pipeline capability, then connect
to the proxied server once the client has replied and swallow its
greeting, praying that its settings are compatible with those announced
to the client.
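
One way the first problem can bite: strings on the ra_svn wire are
length-prefixed (<length>:<bytes>), so a blind substitution that changes
the length of a URL leaves a stale prefix behind. The sketch below is in
Python rather than the original Perl, and all helper names are made up
for illustration; it does not reproduce the OSCon script.

```python
import re

def encode_string(data: bytes) -> bytes:
    """Encode bytes as a length-prefixed string item: b'<length>:<data> '."""
    return str(len(data)).encode("ascii") + b":" + data + b" "

def blind_rewrite(stream: bytes, old: bytes, new: bytes) -> bytes:
    """Substitute on the raw stream without parsing it (the risky approach)."""
    return re.sub(re.escape(old), lambda _match: new, stream)

def length_prefix_ok(item: bytes) -> bool:
    """Check that one encoded string item agrees with its declared length."""
    match = re.match(rb"(\d+):", item)
    if match is None:
        return False
    declared = int(match.group(1))
    payload = item[match.end():]
    return len(payload) == declared + 1 and payload.endswith(b" ")

def parsed_rewrite(item: bytes, old: bytes, new: bytes) -> bytes:
    """Decode the item, rewrite the payload, re-encode with a fresh length."""
    match = re.match(rb"(\d+):", item)
    declared = int(match.group(1))
    payload = item[match.end():match.end() + declared]
    return encode_string(payload.replace(old, new, 1))

wire = encode_string(b"svn://proxy/shroom/spores")
mangled = blind_rewrite(wire, b"svn://proxy/shroom", b"svn://shroom.badtrip.nl")
fixed = parsed_rewrite(wire, b"svn://proxy/shroom", b"svn://shroom.badtrip.nl")
```

Here `mangled` still claims the old length while `fixed` carries the
recomputed one, which is the argument for interpreting the data instead
of pattern-matching it.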

The proxy put together for OSCon serves, in my opinion, as a proof of
concept that it wouldn't be that difficult to implement proper proxy
capability right into the svn protocol and client behaviour.

PROPOSAL PART I : THE PROXY CAP
===============================

At first I had launched into a passionate design of ripping the svn
protocol to shreds and emulating a request-response protocol that would
be backward compatible, to enable graceful proxying. Thankfully for my
sanity (and overall reputation as designer), Greg Hudson stopped me in
mid-flight and dropped one of his famous One-Liners of Enlightenment:
"Just implement a 'proxy' capability!"

Based on this initial enlightenment, this is the proposed addition to
the svn RA protocol: Create a proxy capability :-).

In the following, (C) is a subversion client, (S) is a svnserve process
serving repositories, and (P) is a svn RA proxy which stands between (C)
and (S).

A. The client side
------------------

When (C) sees the proxy capability in a server greeting, it knows it is
talking to a proxy and not a regular server.
It should then send back a request with the proxy capability set (to
indicate to (P) that it has caught on) and send the repository URL like
a normal greeting request.

Then, (C) starts over and goes back to expecting the initial server
greeting, which will this time be the greeting from (S), relayed by (P).
(C) resends the same repository URL as in its first response, and can
then proceed with whatever it wanted to do in the first place.

From then on, no mention of proxies or anything else comes into play.
The discussion is a perfectly normal svn RA exchange.
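
The client-side steps above can be sketched as a small loop. This is
only an illustration of the proposed behaviour, not ra_svn client code:
the Greeting and Response classes, the CLIENT_* constants, and the choice
to answer a proxy with nothing but the proxy capability are all
assumptions.

```python
from dataclasses import dataclass

CLIENT_MAX_VERSION = 2                      # assumed client protocol version
CLIENT_CAPS = frozenset({"edit-pipeline"})  # assumed client capabilities

@dataclass(frozen=True)
class Greeting:
    min_version: int
    max_version: int
    capabilities: frozenset

@dataclass(frozen=True)
class Response:
    version: int
    capabilities: frozenset
    url: str

def respond(greeting, url):
    """Return (response, expect_another_greeting) for one server greeting."""
    version = min(CLIENT_MAX_VERSION, greeting.max_version)
    if "proxy" in greeting.capabilities:
        # Talking to (P): acknowledge the proxy cap and wait for (S)'s greeting.
        return Response(version, frozenset({"proxy"}), url), True
    # Talking to a real server (possibly relayed): normal negotiation.
    return Response(version, CLIENT_CAPS & greeting.capabilities, url), False

def handshake(greetings, url):
    """Answer greetings in order until one comes from a non-proxy server."""
    responses = []
    for greeting in greetings:
        response, expect_another = respond(greeting, url)
        responses.append(response)
        if not expect_another:
            break
    return responses

session = handshake(
    [Greeting(2, 2, frozenset({"proxy"})),
     Greeting(2, 2, frozenset({"edit-pipeline"}))],
    "svn://magic.mushroom.server/shroom/spores",
)
```

Note that the same URL is sent in both responses, as the proposal
requires; (P) is the one that translates it for (S).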

B. The server side
------------------

The simplest yet: (S) is oblivious that anything murky is going on. As
far as it is concerned, (P) is a regular svn client requesting stuff and
getting normal replies.

C. The proxy sides
------------------

A complying proxy server (P) sends a standard greeting to the connecting
(C), with any protocol versions it feels like supporting, and nothing
but the proxy capability set.

Upon receiving the reply from (C), (P) dissects the requested URL and
uses the first element of the path as a key in a proxy map table, in
order to find the corresponding root URI for (S). It then connects to
that (S), and sets its translation engine to transpose from one URL
space to the other.
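
A minimal sketch of that lookup and of the two-way URL translation,
assuming the proxy map is a plain dictionary. The function names and the
sample map entry are hypothetical, and trailing slashes are normalised
away here for simplicity.

```python
from urllib.parse import urlsplit

# Hypothetical proxy map: first path element -> root URL on (S).
PROXY_MAP = {"shroom": "svn://shroom.badtrip.nl"}

def to_server_url(client_url, proxy_map):
    """Translate a URL in the proxy's space into the real server's space."""
    path = urlsplit(client_url).path.lstrip("/")
    key, _, rest = path.partition("/")
    if key not in proxy_map:
        raise KeyError(f"no proxy map entry for {key!r}")
    root = proxy_map[key].rstrip("/")
    return f"{root}/{rest}" if rest else root

def to_client_url(server_url, key, proxy_map, proxy_host):
    """Translate a URL from the real server's space back into the proxy's."""
    root = proxy_map[key].rstrip("/")
    if server_url != root and not server_url.startswith(root + "/"):
        raise ValueError(f"{server_url!r} is not under {root!r}")
    return f"svn://{proxy_host}/{key}{server_url[len(root):]}"

server_side = to_server_url("svn://magic.mushroom.server/shroom/spores", PROXY_MAP)
client_side = to_client_url(server_side, "shroom", PROXY_MAP, "magic.mushroom.server")
```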

If (C) replied with the proxy cap set (i.e. "I understand you are a proxy
server, let's do this"), then (P) just pipelines (S)'s greeting back to (C)
and lets them work it out amongst themselves.

If (C) did not acknowledge the proxy capability, (P) can either disconnect
it with "Error: You need a client that can talk to proxies", or connect to
(S) anyway and see if the protocol version and caps (C) selected are
compatible with what (S) offers. If so, it forwards (C)'s initial response
on to (S). (P) essentially becomes a transparent proxy if the initial
negotiations happen to be compatible.
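
The decision (P) makes after reading (C)'s reply could look roughly like
the following. The action names and the strict flag are invented for
this sketch, and rejecting an incompatible legacy client is an
assumption, since the proposal does not say what happens in that case.

```python
def proxy_decide(client_version, client_caps, server_min, server_max,
                 server_caps, strict=False):
    """Pick what (P) does once it has (C)'s reply and (S)'s greeting."""
    if "proxy" in client_caps:
        # Proxy-aware client: hand it (S)'s greeting and let them negotiate.
        return "relay-greeting"
    if strict:
        # Policy choice: refuse clients that cannot talk to proxies.
        return "reject"
    version_ok = server_min <= client_version <= server_max
    caps_ok = set(client_caps) <= set(server_caps)
    if version_ok and caps_ok:
        # What (C) already chose happens to suit (S): act as a transparent proxy.
        return "forward-response"
    return "reject"
```

For instance, a legacy client that picked version 2 with edit-pipeline
would be forwarded transparently to a server offering exactly that, and
rejected if strict mode is on.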

D. Example session
------------------

(C) wants to commit to svn://magic.mushroom.server/shroom/spores .
Little does it know that magic.mushroom.server is in fact a world famous
hallucinogenic version control proxy set up in the Cayman Islands, which
hides the real location of (S) (somewhere in lower Amsterdam) from the
rest of the world's police.

(P) rewrites the requested client URL into proxy_map['shroom'] +
"/spores/", connects to the resulting server, and starts to relay
communications, rewriting URLs as required.

(S) receives (C)'s greeting reply, mangled by (P):
( 2 ( edit-pipeline ) svn://shroom.badtrip.nl/spores/ )

And so on and so forth.

PROPOSAL PART II: THE SVNPROXY DAEMON
=====================================

The svnproxy daemon is an implementation of (P) in the above discussion
of the proxy capability. It operates on the svn_ra layer and slightly
below (connection establishment and such is done manually, as in
svnserve). Aside from the behaviour described above, it processes
commands through callbacks in a way much the same as svnserve. The
difference is that for most commands it calls a "passthrough" handler,
which just blindly relays the request and response back and forth
without changing anything.

For the few commands and responses that have URLs in their parameters,
the commands are parsed, the URL extracted, mangled into the correct (S)
or (C) side URL, and the command is then reconstructed and sent on.
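
As a rough picture of that dispatch, the sketch below models commands as
a name plus a dictionary of already-parsed parameters. The handler table,
the "url" parameter name, and the choice of "switch" as the example
URL-carrying command are illustrative assumptions; the real set of
commands to rewrite would come from the protocol description.

```python
def passthrough(params, rewrite_url):
    """Relay the command untouched."""
    return params

def rewrite_url_param(params, rewrite_url):
    """Return a copy of the parameters with the URL translated."""
    updated = dict(params)
    updated["url"] = rewrite_url(params["url"])
    return updated

# Only commands listed here are parsed and rebuilt; the rest pass through.
HANDLERS = {"switch": rewrite_url_param}

def relay(command, params, rewrite_url):
    """Apply the handler registered for a command, defaulting to passthrough."""
    handler = HANDLERS.get(command, passthrough)
    return command, handler(params, rewrite_url)

def demo_rewrite(url):
    # Stand-in for the proxy's real translation engine.
    return url.replace("svn://proxy/shroom", "svn://shroom.badtrip.nl", 1)

switched = relay("switch", {"target": "", "url": "svn://proxy/shroom/spores"}, demo_rewrite)
untouched = relay("get-latest-rev", {}, demo_rewrite)
```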

A. Configuration
----------------

Svnproxy has very little internal configuration. At a minimum it
operates off a single configuration file with one section; at most, off
one configuration file with two sections plus a password store.

The file that is always present is the proxy configuration file
svnproxy.conf, which is composed of one mandatory section and one
optional section. The first section defines a name:repository-url map
which svnproxy uses to establish connections to the various (S) servers
and to mangle URLs properly. The second section is optional, and
defines the access rights to the proxy configuration, in the same way
that blanket access directives and a password file are defined in
svnserve.conf.

If the auth section is omitted, all access to the proxy configuration,
read or write, is denied. More on what the hell I'm thinking about here
a little further down.

The second file is - you guessed it - the optional passwd datastore, if
defined in svnproxy.conf. It uses the same format as svnserve's passwd file.

With this configuration (passed on the commandline like the root of
repositories to serve is passed to svnserve), svnproxy can initialise
its proxy map and start servicing requests.
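
To make the layout concrete, here is one possible shape for
svnproxy.conf and a loader for it. The section names ([proxy-map],
[auth]) are guesses, not part of the proposal; the option names in the
auth section are borrowed from svnserve.conf.

```python
import configparser

SAMPLE_CONF = """
[proxy-map]
shroom = svn://shroom.nl/repos

[auth]
anon-access = read
auth-access = write
password-db = passwd
"""

def load_config(text):
    """Return (proxy_map, auth) parsed from svnproxy.conf-style text."""
    # Only '=' is a delimiter, so the ':' inside URLs is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    proxy_map = dict(parser["proxy-map"])
    if parser.has_section("auth"):
        auth = dict(parser["auth"])
    else:
        # No auth section: deny all access to the proxy configuration.
        auth = {"anon-access": "none", "auth-access": "none"}
    return proxy_map, auth

proxy_map, auth = load_config(SAMPLE_CONF)
locked_map, locked_auth = load_config("[proxy-map]\nshroom = svn://shroom.nl/repos\n")
```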

B. On-line proxy map editing
----------------------------

One of the requirements of the svl people is that svnproxy should be
usable as a "repository PA system". That is, users (authenticated and
authorized or anonymous, depending on the setting) should be able to
edit the proxy map of a running svnproxy, not just admins with access to
the flat file proxy map.

A nice way to do this would be to do it in such a way that we don't need
to listen for an alternate protocol on a different port and whatnot.

One way to do this is to have svnproxy build a virtual repository when a
client requests the root URL, with no proxy map name. In many ways
similar to the /sys filesystem, svnproxy would construct a repository
that looks to the client to be totally empty and at r0. However, all
the defined proxy maps are present, represented as individual revision
properties. Access to this virtual repos is restricted by the
configured auth access rules.

For example, a map from 'shroom' to 'svn://shroom.nl/repos' would appear
in this configuration virtual repository as the revprop 'proxy:shroom'
-> 'svn://shroom.nl/repos'.

Svnproxy only accepts revprop related commands on this virtual
repository. All other attempts at repository manipulation result in
access denied errors. This allows clients with read access to the
configuration virtual repository to view the currently active mappings,
and clients with write access to define new/edit/delete mappings by
altering revprops.
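
The behaviour of the configuration virtual repository can be sketched
as an in-memory object that answers only revprop commands. The class and
method names are hypothetical, the command names follow the ra_svn
revprop commands, and persistence to the flat file is left out.

```python
class AccessDenied(Exception):
    """Raised for any operation the virtual repository refuses."""

class ConfigRepo:
    PREFIX = "proxy:"

    def __init__(self, proxy_map, can_read=False, can_write=False):
        self.proxy_map = dict(proxy_map)
        self.can_read = can_read
        self.can_write = can_write

    def rev_proplist(self):
        """Expose every mapping as a 'proxy:<name>' revision property."""
        if not self.can_read:
            raise AccessDenied("read access required")
        return {self.PREFIX + name: url for name, url in self.proxy_map.items()}

    def change_rev_prop(self, prop, value):
        """Create, edit, or (with value None) delete a mapping."""
        if not self.can_write:
            raise AccessDenied("write access required")
        if not prop.startswith(self.PREFIX):
            raise AccessDenied("only proxy:* revprops are accepted")
        name = prop[len(self.PREFIX):]
        if value is None:
            self.proxy_map.pop(name, None)
        else:
            self.proxy_map[name] = value

    def handle(self, command, *args):
        """Dispatch revprop commands; everything else is denied."""
        handlers = {
            "rev-proplist": self.rev_proplist,
            "change-rev-prop": self.change_rev_prop,
        }
        if command not in handlers:
            raise AccessDenied(f"{command} is not allowed here")
        return handlers[command](*args)

repo = ConfigRepo({"shroom": "svn://shroom.nl/repos"}, can_read=True, can_write=True)
listing = repo.handle("rev-proplist")
repo.handle("change-rev-prop", "proxy:truffle", "svn://localhost:1234")
```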

Modifications made to the virtual repository are written back to the
flat configuration file, so that the last 'live' configured state is
restored when svnproxy restarts.

A specialised UI for handling proxy remote configuration will be
written, in the form of a python script, svnproxyctl.py, which connects
to the proxy configuration repository and messes with revprops, all
hidden in a nice little interface. Svl can probably be made to
integrate some kind of abstraction to configure these proxies (which
appear to be of some importance to it) in a more intuitive manner.

Of course, if you don't want to let remote people reconfigure your
lovely proxy, just completely deny access, and svnproxy will deny its
configuration virtual repository ever existed and dismiss your requests
as liberal propaganda.

YOUR AD HERE
============

What do people think of all this? I am relatively satisfied with
everything, except for the auth done for the virtual configuration
repository, because I don't feel it offers the level of control that svl
would feel happy to work with. I've been toying with fine grained
credentials such as anon = add-only, or something similar, but I'm not
sure what svl (and others) would actually need in this department.

Oh, and there's of course the issue that "Your virtual repository is
Evil, because it opens access to editing unversioned server
configuration from a client!!" I believe that with proper auth control
on the access to the configuration repository, that is no problem.
Proxy configuration is as open as you make it.

As for the unversionedness... Well, svnproxy could be made to operate
off a full svn repository that would contain its (versioned)
configuration. But I don't see any real use that would justify this
degree of added setup complexity.

What do y'all think?

- Dave.

Received on Wed Aug 24 00:43:27 2005
