[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PROPOSAL] Return of the (svnserve) log

From: Max Bowsher <maxb_at_ukf.net>
Date: 2005-10-30 11:16:42 CET

Jonathan Gilbert wrote:
> Hello,
> I've been using SVN for a while now and I realized suddenly, "svnserve is
> quite similar to httpd, so why doesn't it log events like an httpd does?"
> Actually, I've been subscribed to the dev list for a while, so the real
> thought was "Logging facilities for svnserve were discussed in July, so
> why
> doesn't it *already* log events like an httpd does?" I brought up the
> issue
> on the IRC channel and a discussion ensued in which I learned that Sussman
> had been planning to implement it, but only finished mod_dav_svn before
> the
> "Big 1.3 Push". Before he could sink his teeth into the svnserve half of
> it, he woke up one morning and found himself an employee of Google instead
> of CollabNet :-) As a result, Sussman no longer has time to implement the
> logging facility for svnserve. I was thinking I might be up to the task
> :-)
> I'm going to present a list of features that I thought would be good for a
> base implementation. I know there is a strong urge to K.I.S.S., but I will
> present a rationale for each of the features, and in any event, none of
> these features are particularly difficult to implement properly.

Excellent, a design proposal is _definitely_ the right place to start.

> The basic concept of logging is, as I started out saying, the same as that
> for a web server such as Apache. There are a number of classes of logs, of
> which at least two are "access" and "error" (to use the Apache
> terminology). The "access" log shows successful day-to-day transactions,
> such as checkout/export, commit, mkdir, list, what have you. The "error"
> log shows the same kind of actions, but indicates their failure. For
> instance, if I attempt to list a directory which doesn't exist, the log
> entry for that failed request goes into the "error" log, not the "access"
> log. Sussman also mentioned that a third class of error should also be
> implemented: "authorization", so that those events which directly pertain
> to security can be painlessly split off.
> What follows is the list of features that I feel are appropriate, with a
> brief explanation of why I think each one should be the way it is:
> ------------------------------------------------------------------------
> 1. Possible targets: syslogd, Windows Event Log service, and flat files.
> Sussman told me on IRC that the previous discussion of the topic had
> decided that flat files would probably be redundant, as syslogd can simply
> be configured to redirect the SVN log entries to their own file.
[snip reasoning I agree with]
> I believe it is important to support directly writing to flat files.

+1. An additional reason is to support logging from a svnserve daemon run by
a non-root user, who won't be able to tweak syslog configuration.

> ------------------------------------------------------------------------
> 2. Classes of events: auth, access, and error.
> Different system administrators will be interested in different things.
> Some people are interested in knowing precisely who is communicating from
> where and what user they are purporting to be. Other people are interested
> to know what parts of their repository are being accessed the most.
> Probably a fair number of people are interested in being able to detect
> attacks on their server, which could take the form of denial of service.
> For these reasons, I believe it is important to divide logs up primarily
> into these 3 categories:
> - Auth events (Authentication & Authorization), which involve people
> identifying themselves to the server and requesting resources, would
> indicate what user account, if any, the user had provided, what IP address
> they were connecting from, and, most importantly, whether the attempt was
> successful. Failed authorization attempts (attempting to write when the
> repository is read-only) would also indicate which resource the user had
> attempted to access without authority.

Why failed only? "User X was granted access to something" isn't very useful.
It would probably also be appropriate to log the SASL mechanism in use.

> - Access events, which involve people successfully working with the server
> using day-to-day functions like "checkout", "commit", "list", "mkdir",
> would indicate which user & IP address the request had come from, what the
> request type was, and which resource the request involved. If there are
> some options which the server can discern for certain requests (such as
> perhaps a request for recursion), these should also be noted if they are
> available.

A first draft of the list of the events would then be simply the list of
main commands, from the ra_svn protocol document.

> - Error events, which involve people who have successfully authenticated
> with the server asking it to perform an action it cannot or will not do,
> would indicate similar information as access events but also indicate the
> cause of the failure, perhaps through the use of a status code.
> Of course, not everyone will want error events split off from access
> events. Functionality for this is discussed in feature #4 below :-)
> ------------------------------------------------------------------------
> 3. Common file format for all plain text log data.
> While auth, access and error events do not log precisely the same sets of
> information, it should be (as discussed in feature #4 below) at least
> possible for an administrator to combine all log information into a single
> file. When web servers proliferated and came under widespread use, tools
> emerged for analyzing the log files produced by servers and providing
> statistics and other analyses. While svnserve is less likely to attract
> such tools, it doesn't cost us anything to at least use a common format
> when logging any class of event.

Hmm. Apache's access and error logs do not use a common format. I'm of the
opinion that the intrinsically different nature of the events will prevent a
fully common format. Instead, perhaps the first few fields on a line can be
common, with the later fields being class-specific?

> In order to be human-readable, such a format should be plain ASCII text,
> similar in nature to a web server's logs (this is only an issue on
> Windows,
> where the Event Log service allows arbitrary binary data to be logged). In
> order to be machine-parseable, the format should have a fixed number of
> fields delimited by spaces, and fields whose content could potentially
> contain spaces should (always) be enclosed in quotation marks, with some
> provision for escaping (I'm thinking of URLs here, primarily, and as auth
> events would not necessarily include a URL, the field could be encoded
> using two adjacent quotes in the file "").

I don't know exactly what apache does, but it might be worth checking up on
that, and possibly following their example.

> The exact format of the log
> messages will depend on precisely which data is available, which is
> something I will determine when I review svnserve's architecture and the
> existing logging facility added to mod_dav_svn by Sussman. It will,
> however, most certainly include a date & timestamp.
> ------------------------------------------------------------------------
> 4. Config file syntax to allow multiple classes of events to be logged to
> the same flat file.
> As mentioned earlier, some people will inevitably wish to have a
> consolidated log format (myself, for instance), a mechanism to allow
> multiple logs to be directed at the same file is required. While this
> could
> be done by simply requesting the same filename:
> error-log = access_and_error.log
> access-log = access_and_error.log
> .. the appearance of this in the config file raises questions in the
> user's
> head: Will the server be smart enough to canonicalize the paths & check
> that they are the same file, or will it open the same file with two
> separate handles and completely mangle the resulting log data? In
> addition,
> the implementation of this kind of checking is troublesome to say the
> least.
> In order to simplify implementation and remove this dubious appearance
> from
> the config file, I propose the following syntax:
> logfile-1 = access_and_error.log
> logfile-2 = auth.log
> error-log = file 1
> access-log = file 1
> auth-log = file 2
> The precise format of the right-hand-side of each "-log" entry would be to
> allow one of the following:
> "file N", to use the file indicated by the "logfile-N" directive,
> "syslog", to use the UNIX syslog facility (an error on unsupporting
> systems), "WindowsEventLog", to use the Windows API (an error when not on
> Windows).
> This syntax suggests a level of abstraction between the event sinks and
> the
> output mechanisms, which I believe is the best way to implement the
> functionality.

No, *PLEASE*, no. The whole numbered logfile idea might make sense to you,
having approached this from a "how shall I code this" perspective, but it
feels painfully unintuitive from an outside perspective.

Simply canonicalize, then strcmp the filenames to tell if they are the same
file, and then take an advisory write lock on the files to guard against the
server admin doing something stupid with symlinks.

Then, put a commented out example in the default config file to demonstrate
that this is possible.

> ------------------------------------------------------------------------
> 5. Configurable behaviour for failure to log an event.
> Some people are interested in logged information for important security
> reasons; they will see it as an audit trail. Other users of SVN, such as
> myself, will be interested purely for informational purposes.
> When an audit trail is being produced and the target device becomes full
> or
> otherwise unable to accommodate a log entry, everything grinds to a
> terrifying halt, because it would be completely unacceptable to permit
> events to proceed without logging them when the administrator has
> specifically requested an audit trail. However, if the log information is
> not being considered a vital source of information about the behaviour
> patterns of those with access to the repository, it would be inappropriate
> to deny service in the event that actions cannot be logged.
> Therefore, I propose a property to be applied independently to classes of
> logs which makes that class guarantee an audit. Disabled by default, this
> property would make svnserve refuse to handle requests if it failed to log
> them.
> I propose the following name for the property in the config file:
> auth-log-auditing = on
> access-log-auditing = off
> If auditing is disabled and logging fails, I propose that svnserve first
> attempt to directly log the failure (not the event itself) to syslog, and
> if that fails, write it to stderr, which may or may not show up on the
> system's console. Understand that this is a last resort :-)

The config property name is not at all intuitive, but the rest of the
discussion is good.
I'd suggest "fail-on-{access,auth}-log-failure" is more likely to allow
someone seeing it without any prior specific knowledge to understand what it

I'm also not sure that writing to syslog (at least unconditionally) is the
right thing to do. Without a configured facility and severity, who knows
where the log event would go.

> These 5 features seem to me fundamental to any properly functional &
> usable
> logging system for svnserve. If I've missed anything important, just let
> me
> know, of course :-) I'm interested to hear everyone's thoughts on what
> I've
> written here and on logging in svnserve.

One thing that might be worth (at least briefly) considering is: Can we
borrow code from Apache to help us with this? (And then long term, feed that
code back into APR(-util))


To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Oct 30 11:17:48 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.