Hi,
This summer, I'll be implementing path-based access control in svnserve,
as my SoC project. The following is my more detailed thoughts on this
(dare I say an implementation plan?). If you fancy a reading with some
formatting, you can read the same info on the wiki page I've set up for
my personal bookkeeping:
https://ssl.natulte.net/wiki/dave/wakka.php?wiki=SummerOfCode
====Objective====
The objective of my task is to implement path-based access control
(authz) in the standalone Subversion server. Currently, path-based
access control only works with the HTTP Repository Access method,
through the use of mod_dav_svn and mod_authz_svn in apache2. This makes
svnserve look bad, because it offers only blanket access control (all or
nothing).
I'll consider this task complete when both mod_dav_svn and svnserve can
grant or deny access to filesystem resources based on the same format of
ACL file, with as little code replication in various modules as
possible. If I finish before my SoC time is up (which I certainly
hope!), I'll move on to the enhancements part of the plan, and/or onto
other issues of svnserve.
====A summary of questions for the busy people====
- Is having a function in libsvn_repos that doesn't actually use a
svn_repos_t acceptable, or should the authz routines move to another lib?
- How do we cache the authz file (svn_config_t) in svnserve? Keep it
loaded and reload on SIGUSR1 or similar?
- What enhancements to the authz file format are badly needed by the
community?
- Is caching authz info in svnserve a stupid idea (see detailed
proposal) ?
- (for those who've had the courage to walk the rest of the proposal)
Are there points to which ou object or on which you have further
notes/comments/observations/flames ?
====Subprojects====
Getting access control into both mod_dav_svn and svnserve will require
somewhat redesigning and shifting of the current authz code.
===Authz utility routines in libsvn_repos===
The simplest way to allow both mod_dav_svn and svnserve to access authz
files in an identical fashion is to move the code that does the parsing
and analysing of authz files to libsvn_repos. There are two
possibilities here:
- Either we decide that each repository has an authz file defined in
its configuration, in which case checking authz is simply a matter of
extracting the authz file path given the svn_repos_t; or
- We go with flexibility and provide a routine that opens an authz
file given its full path, and let mod_dav_svn and svnserve decide for
themselves what path should be opened.
I can see the unifying merits of the first solution, but I'd go with
opening a path rather than systematically going via a svn_repos_t, for
both its flexibility and lesser disruptiveness to the way mod_dav_svn
currently works. This however makes the function not use a svn_repos_t.
Is this acceptable for a function in libsvn_repos?
In libsvn_repos, we basically want a single public routine that checks a
path and requested access level against the ACL definition file. There
will be a few internal functions to walk the authz file and parse the
sections to work out the authorization. No need to provide a function to
locate the repository authz file, as one can expect svnserve to be
capable of parsing its own config file to work this out. mod_dav_svn
already locates its own and caches the info, so no problem there. The
public interface changes therefore sum up to:
####
// In include/svn_repos.h
/**
* Check wether @a user can access @a path in the repository @a
* repos_name with the @a required_credentials. The access control
* lists are supplied in @a cfg, and the updated access mask is
* returned in @a *granted_access.
*
*/
svn_error_t *
svn_repos_authz_check_access(svn_config_t *cfg, const char *repos_name,
const char *path, const char *user;
int required_access, int *granted_access,
apr_pool_t *pool);
####
===Alter mod_dav_svn to use the new routines===
The foremost problem as I see it that the AuthzSVNAuthoritative
configuration directive that can be set for mod_dav_svn/mod_authz_svn
wouldn't be usable if the authz is delegated to libsvn_repos, as it
relies on other apache2 modules. This would mean keeping mod_authz_svn
with the authz hooks and URI checking code, so that mod_dav_svn can
still delegate access control to some extent. So, actually, no problem
at all :-) Just cut in the libsvn_repos routines instead of the
mod_authz_svn internal ones that are currently used, and Zark's your
brother.
===Alter the libsvn_repos commit editor to hook write operations===
The mod_dav_svn RA method doesn't use libsvn_repos' commit editor, so
there are no authz hooks on write operations. Svnserve will need these,
so we need to add them (possibly reviewing function and callback
prototypes and propagating change accordingly).
===Write the authz code in svnserve===
The code to write is split in two kinds:
- The authz callback code, which is basically glue between the commit
editor and the authz utility routines, and
- The code to call authz routines directly for certain operations
which do not go through the commit editor.
On a side note, it seems mod_authz_svn currently uses Apache/APR to
cache the authz config file, using apr_pool_userdata_set
(mod_authz_svn.c:485). This doesn't give much in the way of long term
caching (the lifetime of a few requests), but this is normal given
mod_dav_svn's mode of operation. Svnserve on the other hand is
longer-lived, and could probably load and cache the content of the authz
file on a semi-permanent basis, to avoid the overhead of opening and
parsing the file. //Should// it do so? Introducing configuration caching
for authz files implies a reloading mechanism, via SIGUSR1 or a similar
notification mechanism.
===Expand the semantics of the authz file format===
If required, this refactoring could be the occasion to implement some
new behaviour for authz files. I currently don't have the need for
enhancements, as the authz in mod_authz_svn does exactly what I need.
However, if people out there badly need something, speak up, and we can
think about getting it into the authz code.
One suggestion was to be able to distinguish "Any **authenticated**
user" from "Any user at all, anonymous or authenticated", perhaps using
* and ** to distinguish them. However, this seemed to pose some
technical problems, as outlined by Greg Hudson in (
http://svn.haxx.se/dev/archive-2005-05/0043.shtml ). I don't yet have
sufficient grasp of the mechanism of updates to understand his argument,
if someone could expand on this...
===Optimize the operation of authz code===
Currently and as far as I know, authz has a somewhat bad reputation of
slowing down access to the repository, because of all the path checks
that need to be performed. In the immediate sense, I don't really see
how this could be bettered. There may be a little optimisation possible
when the caller requests a recursive access lookup, but the speed impact
will probably stay.
Another possibility I've been loosely thinking about is adding a flag to
what the authz routine can return in the way of granted access,
something along the lines of a AUTHZ_BLANKET. The idea being that, even
when svnserve doesn't request recursive access, the authz code can
return a flag saying "Oh and by the way, if you fancy you can access
anything below this with those rights as well". The point being that if
svnserve is made aware of that flag, it can build itself a cache lookup
dictionary, with paths as keys and user/access rights as values. Any
path that falls within one of the cached paths and whose requested
rights are less or equal to the cached ones is granted access without
diving into the authz parsing code.
To clarify this, I'll take an example. Say you're running the greek tree
repository in svnserve with authz access control. An anonymous user
request triggers an authz lookup for READ access on /A for the anonymous
user. The authz code mulls over the ACL file, and decides that not only
does the anonymous user have READ access to /A, but that there are no
other ACL rules denying him read access for paths within /A . So the
authz routine returns the granted rights (READ & BLANKET). Svnserve
notices the BLANKET flag is set, and sets a cache entry: "*:/A" -> "r".
It then goes on to reply to the client as it normally would.
Then, another client (or the same one come to that) requests anonymous
READ access to /A/B/lambda . Before invoking the authz function which
checks this against the ACL file, svnserve checks wether the requested
user:path exists in the cache dictionary. In this case, *:/A exists,
which is a parent path for the requested path. Svnserve then looks up
the cached rights, and sees that the authz has already granted blanket
read access to everything contained in /A for the anonymous user. So it
returns, successfully granting READ access without making authz reparse
the whole ACL file.
Now, another request comes in which triggers an authz request for WRITE
access to /A/mu for the anonymous user. Svnserve looks up its cache,
sees that it has a cached access rule for *:/A. However, the value only
indicates that blanket READ access was granted; nothing is known about
write access. So, svnserve calls the authz routines, which parse the ACL
file and decide in the usual way. On the way back up, if the authz
routine has granted blanket access, add (or modify) an entry for that
path, adding blanket write to the cached rights.
Please note this is off the top of my head for the time being. Even now
I can see a few problems with this:
- Each and every request to the authz routines will trigger an
analysis of the complete authz file, in order to work out wether blanket
access can be granted or not. This can probably be avoided by having
svnserve specify when it wants to have blanket access looked up, so that
it can be disabled and alltogether bypassed by mod_dav_svn, which has no
possibility I know of of caching such information durably. Actually, the
caching behaviour can probably be obtained by forcing the RECURSIVE flag
for all authz requests made by svnserve, so that "blanket" access to
subdirectories is automagically checked. That way, no extra code in the
authz routines, only in svnserve.
- Caching adds another layer of information storage, which will need
to be looked after by svnserve and flushed when a reload is requested.
- The speedup I'm hoping for (because of the relative cost of a few
string comparisons and lookups vs. parsing a potentially large file)
might not actually happen. My intuition is that there would probably be
some speedup, but is it worth it?
- As I said earlier, mod_dav_svn and friends don't have, as far as I
know, a way of durably caching data over several requests. Therefore the
benefits of the cache speedup would be lost for the HTTP RA method. Is
it desirable to give such an advantage to one RA method over the other?
====Conclusion====
That's all I can think of now. As you can probably see, the grey area
for me is the commit editor and adding of hooks. I haven't explored this
much yet, and so only have a vague idea of how exactly to go about it.
I'll be looking into that later today, and once I've somewhat cleared up
my doubts I'll start with the coding... Assuming the plan I'm outlining
is satisfactory of course.
Thanks to those still reading this. :-)
- Dave.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jun 29 16:33:55 2005