I generally think this is a solid analysis. I'm a little concerned about the
implications of implementing the "can access to some degree paths which have
a readable child". It seems like this check would only be feasible and
reasonably efficient if we assume that all read access control is done using
mechanisms similar to the current "we have a list of all paths with ACL
stuff set on them in a file" system, whereas one thing I've always liked
about the svn AVL architecture is that in theory you can rip out any of the
explicit-file based systems and replace them with something that does a less
trivial calculation or lookup on the path name at ACL-query time. This new
model would seem to restrict ACL backends to designs where paths that have
access changes are explicitly listed.
On the other hand, it's generally not my style to give much attention to
arguments of the form "but this would limit the ability of us to extend the
system in this theoretically-nice way that nobody has bothered to do in the
past 5+ years of svn" (see: basically any design decision related to the
first three letters of my first name), so maybe this is moot. But I had kind
of assumed that there were installations out there with custom authz
backends (I was certainly considering writing one when I was working on
Google corp svn)... is there any truth to that guess?
On Mar 10, 2009 11:36 AM, "C. Michael Pilato" <cmpilato_at_collab.net> wrote:
Over the past few years, I've found myself moaning about Subversion's
security model. Issue #3242 has finally driven me to post about it.
The Subversion security model is pretty simple. You have paths, you have
the ability to say whether those paths are readable, writable, both, or
neither by anyone or any-specific-one.
The Subversion change transmission model is also pretty simple. You have
the editor (svn_delta_editor_t) concept, which provides an interface for
depth-first tree traversal, query, and modification.
Finally, we have atomic commits.
By themselves, these things are all Just Fine. When put together, we start
to see problems.
The binding of a single editor drive with the atomic commit concept for
commits means that for a single commit, you must use a single editor drive
to touch not just all the things you want to change in the commit, but all
the parents of those things up to and including their longest common
ancestor path. That becomes a problem when our security model's idea of
"read access" is applied.
Consider this operation:
svn move ^/A/B/E/alpha ^/A/D/G/alpha
The editor drive required to do this as an atomic operation is something
like the following:
# do file stuff
This works fine ... unless you find yourself lacking "read" permission on
"/A". Even if you are permitted to read /A/B/* and /A/D/*, Subversion
croaks because it is trying to open "/A". The funny thing is, if you break
the move into two commits (a copy and a delete), the editor drives are like
# do file stuff
"/A" isn't even consulted, so Subversion is happy. You've done effectively
the same thing -- copied a file from one place to another, and then deleted
the original. You just used two revisions to do it. It's the atomicity
requirement, as enforced by the use of the editor tree drive, that breaks
stuff for you.
Folks, this is wrong.
Now, we can point the finger at any number of places here. Some possibilies
* Shame on the editor for requiring a depth-first tree drive that
includes walking uninteresting parent directories.
* Shame on the RA commit mechanism for binding an atomic commit
operation to but a single editor drive.
But here's where I think the real finger-pointing is best aimed:
* Shame on the Subversion security model for not distinguishing
between "you may know that PATH exists" and "you may see PATH's
history, content and metadata".
CollabNet Enterprise Edition's Subversion integration uses an Apache
authorization module that is similar to our stock mod_authz_svn, except that
it uses regular expressions for its rules instead of path prefixes. As
such, it suffers in all the same ways that mod_authz_svn does when problems
like those in issue #3242 crop up.
But our ViewVC integration takes a different approach. It actually
distinguishes between "you can know that PATH exists" and "you can see
PATH's history, content, metadata, etc." The key here is that "you can know
that PATH exists" is answered affirmatively if you have read access to PATH
*or* if you have read access to any of its children. So when you browse our
ViewVC tool, you can see all the stuff you have read access to *plus* the
skeletal parent directory structure required to navigate to those items.
ViewVC will still hide revision metadata and such from those parent
directories, of course. This works great, keeps folks from having to
hand-type ViewVC URLs (they can always navigate from the root of the
repository), and I would argue leaks no information. I mean, if someone
tells you that you have access to "/A/B/D", is it considered a leak of
information to say that both "/A" and "/A/B" exist?
I believe strongly that Subversion could benefit from a switch to this
slightly more relaxed authorization model.
In terms of changes to mod_authz_svn, I would expect a bare minimum: since
it works with path-math anyway, it's a simple matter of string prefix
comparison to determine if you can read something only by virtue of being
able to read one of its [grand]children. In fact, at CollabNet we've even
experimented with exactly such a tweak, mostly with success. The results
were awesome -- you could checkout the root of a repository and only get the
items you had read access to, *even if you didn't have read access to root*.
The benefits to users are obvious -- no more telling different folks to
checkout different portions of the tree because they all have different read
But there were two showstoppers with our naive implementation:
1. First, Subversion has no way to look at the response to a "can I
read this?" query of the auth subsystem and know if the answer means
"Yes, you can read it and know everything about it", or just "You
can know it exists because (duh) you can read its children."
This means that we couldn't blot out metadata from directories
deemed readable solely because of the accessibility of their children.
2. Our current security concession -- the absent-entries leak --
gets arguably out of hand. Today, Subversion leaks the name of
any unreadable immediate children of a readable directory. But
if "readable" means "you can read at least some deep child of it",
then you find yourself leaking siblings all along the chain of
otherwise unreadable parent directories.
I'm sorry to say that I don't have ready solutions to these issues. But I
wanted to begin a dialogue on this topic. Do others see the problems I see?
Do others see the same root causes that I see? Do others have recommended
solutions that I'm not yet seeing?
 Ideally, that is -- issue #3242 exists primary because Subversion
isn't even doing the ideal thing here.
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2009-03-12 21:09:45 CET