[PROPOSAL] Return of the (svnserve) log

From: Jonathan Gilbert <o2w9gs702_at_sneakemail.com>
Date: 2005-10-30 05:13:30 CET

Hello,

I've been using SVN for a while now and I realized suddenly, "svnserve is
quite similar to httpd, so why doesn't it log events like an httpd does?"
Actually, I've been subscribed to the dev list for a while, so the real
thought was "Logging facilities for svnserve were discussed in July, so why
doesn't it *already* log events like an httpd does?" I brought up the issue
on the IRC channel and a discussion ensued in which I learned that Sussman
had been planning to implement it, but only finished mod_dav_svn before the
"Big 1.3 Push". Before he could sink his teeth into the svnserve half of
it, he woke up one morning and found himself an employee of Google instead
of CollabNet :-) As a result, Sussman no longer has time to implement the
logging facility for svnserve. I was thinking I might be up to the task :-)

I'm going to present a list of features that I thought would be good for a
base implementation. I know there is a strong urge to K.I.S.S., but I will
present a rationale for each of the features, and in any event, none of
these features are particularly difficult to implement properly.

The basic concept of logging is, as I started out saying, the same as that
for a web server such as Apache. There are a number of classes of logs, of
which at least two are "access" and "error" (to use the Apache
terminology). The "access" log shows successful day-to-day transactions,
such as checkout/export, commit, mkdir, list, what have you. The "error"
log shows the same kind of actions, but indicates their failure. For
instance, if I attempt to list a directory which doesn't exist, the log
entry for that failed request goes into the "error" log, not the "access"
log. Sussman also mentioned that a third class of log should be
implemented, "authorization", so that events which directly pertain to
security can be painlessly split off.

What follows is the list of features that I feel are appropriate, with a
brief explanation of why I think each one should be the way it is:

------------------------------------------------------------------------
1. Possible targets: syslogd, Windows Event Log service, and flat files.

Sussman told me on IRC that the previous discussion of the topic had
decided that flat files would probably be redundant, as syslogd can simply
be configured to redirect the SVN log entries to their own file.

It turns out, however, that the Windows Event Log service does *not*
support this. The closest it can come is allocating a (binary) .evt file
for svnserve. These .evt files have a maximum size, and when the size is
reached, they are not automatically rolled. One of three possible
behaviours can be selected:

1) Old events will be overwritten as needed.
2) Events older than a certain date will be deleted en masse to make room.
3) New events are not logged.

Obviously, the Windows Event Log service is designed for an entirely
different class of logging, where events are expected to be few in number
and relatively infrequent. Microsoft's documentation of the service states:

    Event logging consumes resources such as disk space and processor time.
    The amount of disk space that an event log requires and the overhead for
    an application that logs events depend on how much information you choose
    to log. This is why it is important to log only essential information. It
    is also good to place event logging calls in an error path in the code
    rather than in the main code path, which would reduce performance.

David Anderson mentioned to me that syslogd has a limited number of
"facilities", and that these are what are used by syslogd to split events
off to different files. As such, multiple repositories being handled by
svnserve would all have their log entries sent to the same place. I have
also seen mention in the SVN dev list logs that some people do not wish to
run syslogd for whatever reason. In order to accommodate these people as
well as those using Windows and not force our log messages onto the
available syslog facility codes, I believe it is important to support
directly writing to flat files.

------------------------------------------------------------------------
2. Classes of events: auth, access, and error.

Different system administrators will be interested in different things.
Some people are interested in knowing precisely who is communicating from
where and what user they are purporting to be. Other people are interested
to know what parts of their repository are being accessed the most.
Probably a fair number of people are interested in being able to detect
attacks on their server, which could take the form of denial of service.
For these reasons, I believe it is important to divide logs up primarily
into these 3 categories:

- Auth events (Authentication & Authorization), which involve people
identifying themselves to the server and requesting resources, would
indicate what user account, if any, the user had provided, what IP address
they were connecting from, and, most importantly, whether the attempt was
successful. Failed authorization attempts (for example, attempting to write
when the repository is read-only) would also indicate which resource the
user had attempted to access without authority.

- Access events, which involve people successfully working with the server
using day-to-day functions like "checkout", "commit", "list", "mkdir",
would indicate which user & IP address the request had come from, what the
request type was, and which resource the request involved. If there are
some options which the server can discern for certain requests (such as
perhaps a request for recursion), these should also be noted if they are
available.

- Error events, which involve people who have successfully authenticated
with the server asking it to perform an action it cannot or will not do,
would carry the same kind of information as access events, plus the cause
of the failure, perhaps in the form of a status code.

Of course, not everyone will want error events split off from access
events. Functionality for this is discussed in feature #4 below :-)
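
As a rough illustration of the three-way split, the sketch below picks a log
class for a single event. It is written in Python rather than svnserve's C,
every name in it is hypothetical, and the rules are only one reading of the
descriptions above: anything to do with identifying yourself or being denied
permission goes to "auth", other successes go to "access", and other failures
go to "error".

```python
from enum import Enum


class EventClass(Enum):
    """The three proposed log classes (names are illustrative)."""
    AUTH = "auth"
    ACCESS = "access"
    ERROR = "error"


def classify(is_login, succeeded, permission_denied=False):
    """Pick the log class for one event.

    is_login          -- the event is an authentication attempt
    succeeded         -- the server carried out what was asked
    permission_denied -- the request was refused for lack of authority
    """
    # Authentication attempts and authorization failures are both
    # security-relevant, so they are kept together in the auth class.
    if is_login or permission_denied:
        return EventClass.AUTH
    # Everything else is an ordinary request: log it by outcome.
    return EventClass.ACCESS if succeeded else EventClass.ERROR
```

Under this reading, a write to a read-only repository lands in the auth log,
while a listing of a directory that does not exist lands in the error log.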

------------------------------------------------------------------------
3. Common file format for all plain text log data.

While auth, access and error events do not log precisely the same sets of
information, it should be (as discussed in feature #4 below) at least
possible for an administrator to combine all log information into a single
file. When web servers proliferated and came under widespread use, tools
emerged for analyzing the log files produced by servers and providing
statistics and other analyses. While svnserve is less likely to attract
such tools, it doesn't cost us anything to at least use a common format
when logging any class of event.

In order to be human-readable, such a format should be plain ASCII text,
similar in nature to a web server's logs (this is only an issue on Windows,
where the Event Log service allows arbitrary binary data to be logged). In
order to be machine-parseable, the format should have a fixed number of
fields delimited by spaces. Fields whose content could potentially contain
spaces should always be enclosed in quotation marks, with some provision
for escaping. I'm thinking primarily of URLs here; as auth events would not
necessarily include a URL, an empty field would be written as two adjacent
quotes (""). The exact format of the log
messages will depend on precisely which data is available, which is
something I will determine when I review svnserve's architecture and the
existing logging facility added to mod_dav_svn by Sussman. It will,
however, most certainly include a date & timestamp.
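
To make the idea concrete, here is a minimal sketch of such a format in
Python. The field list, the field order, the timestamp layout and the use of
a backslash for escaping are all assumptions made for illustration; the real
set of fields depends on what svnserve can actually supply. The sketch always
quotes the user and resource fields, writes an absent value as "", and
includes a parser to show that the result can be read back.

```python
from datetime import datetime, timezone


def quote(value):
    """Quote a field that may contain spaces; an absent value becomes ""."""
    if value is None:
        value = ""
    # Escape backslashes first so the escapes added for quotes stay intact.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def format_entry(when, event_class, client_ip, user, action, resource, status):
    """Build one log line with a fixed number of space-delimited fields."""
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")  # 'when' is assumed to be UTC
    return " ".join([stamp, event_class, client_ip, quote(user),
                     action, quote(resource), status])


def parse_entry(line):
    """Split a line back into fields, honouring quotes and backslash escapes."""
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i] == " ":
            i += 1
        elif line[i] == '"':
            i += 1
            buf = []
            while i < n and line[i] != '"':
                if line[i] == "\\" and i + 1 < n:
                    i += 1  # take the escaped character literally
                buf.append(line[i])
                i += 1
            i += 1  # step over the closing quote
            fields.append("".join(buf))
        else:
            j = line.find(" ", i)
            j = n if j == -1 else j
            fields.append(line[i:j])
            i = j
    return fields


# Illustrative values only.
example = format_entry(datetime(2005, 10, 30, 4, 13, 30, tzinfo=timezone.utc),
                       "access", "10.0.0.5", "jgilbert", "list",
                       "/trunk/my docs", "ok")
```

Because every class of event produces the same number of fields, auth, access
and error lines can share one file and still be split apart by a simple tool.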

------------------------------------------------------------------------
4. Config file syntax to allow multiple classes of events to be logged to
the same flat file.

As mentioned earlier, some people (myself, for instance) will inevitably
wish to have a consolidated log, so a mechanism to allow multiple logs to
be directed at the same file is required. While this could be done by
simply requesting the same filename:

error-log = access_and_error.log
access-log = access_and_error.log

.. the appearance of this in the config file raises questions in the user's
head: Will the server be smart enough to canonicalize the paths & check
that they are the same file, or will it open the same file with two
separate handles and completely mangle the resulting log data? In addition,
the implementation of this kind of checking is troublesome to say the least.

In order to simplify implementation and remove this dubious appearance from
the config file, I propose the following syntax:

logfile-1 = access_and_error.log
logfile-2 = auth.log

error-log = file 1
access-log = file 1
auth-log = file 2

The precise format of the right-hand-side of each "-log" entry would be to
allow one of the following:

"file N", to use the file indicated by the "logfile-N" directive,
"syslog", to use the UNIX syslog facility (an error on unsupporting systems),
"WindowsEventLog", to use the Windows API (an error when not on Windows).

This syntax suggests a level of abstraction between the event sinks and the
output mechanisms, which I believe is the best way to implement the
functionality.
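
The separation between event classes and output mechanisms can be sketched as
follows, again in Python and with hypothetical names throughout (FileSink,
SystemSink, parse_config and resolve_sinks are not part of svnserve). The
sketch assumes "name = value" lines with "#" comments, and it treats Windows
as the only platform without syslog. The point it illustrates is that each
"logfile-N" directive yields exactly one sink object, so two classes naming
"file 1" share a single handle without any path comparison.

```python
import sys


class FileSink:
    """One sink per logfile-N directive, shared by every class that names it."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def write(self, line):
        # Open lazily so that resolving the configuration touches no files.
        if self._handle is None:
            self._handle = open(self.path, "a", encoding="ascii",
                                errors="backslashreplace")
        self._handle.write(line + "\n")
        self._handle.flush()


class SystemSink:
    """Placeholder for the syslog and Windows Event Log back ends."""

    def __init__(self, kind):
        self.kind = kind


def parse_config(text):
    """Read 'name = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'name = value': %r" % raw)
        settings[key.strip()] = value.strip()
    return settings


def resolve_sinks(settings, platform=sys.platform):
    """Map each event class ('error', 'access', 'auth') to its sink."""
    on_windows = platform.startswith("win")
    files = {}
    for key, value in settings.items():
        if key.startswith("logfile-"):
            files[key[len("logfile-"):]] = FileSink(value)
    sinks = {}
    for key, value in settings.items():
        if not key.endswith("-log"):
            continue
        event_class = key[:-len("-log")]
        words = value.split()
        if len(words) == 2 and words[0] == "file":
            if words[1] not in files:
                raise ValueError("%s refers to undefined logfile-%s"
                                 % (key, words[1]))
            sinks[event_class] = files[words[1]]
        elif value == "syslog" and not on_windows:
            sinks[event_class] = SystemSink("syslog")
        elif value == "WindowsEventLog" and on_windows:
            sinks[event_class] = SystemSink("WindowsEventLog")
        else:
            raise ValueError("unsupported target for %s: %r" % (key, value))
    return sinks
```

With the example configuration above, the "error" and "access" classes
resolve to the same FileSink object, and "auth" resolves to a separate one.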

------------------------------------------------------------------------
5. Configurable behaviour for failure to log an event.

Some people are interested in logged information for important security
reasons; they will see it as an audit trail. Other users of SVN, such as
myself, will be interested purely for informational purposes.

When an audit trail is being produced and the target device becomes full or
otherwise unable to accommodate a log entry, everything must grind to a
terrifying halt, because it would be completely unacceptable to permit
events to proceed without logging them when the administrator has
specifically requested an audit trail. However, if the log information is
not being considered a vital source of information about the behaviour
patterns of those with access to the repository, it would be inappropriate
to deny service in the event that actions cannot be logged.

Therefore, I propose a property, set independently for each class of log,
which makes that class a guaranteed audit trail. Disabled by default, this
property would make svnserve refuse to handle a request if it failed to
log it.

I propose the following name for the property in the config file:

auth-log-auditing = on
access-log-auditing = off

If auditing is disabled and logging fails, I propose that svnserve first
attempt to directly log the failure (not the event itself) to syslog, and
if that fails, write it to stderr, which may or may not show up on the
system's console. Understand that this is a last resort :-)
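
The two behaviours can be sketched like this, in Python and with hypothetical
names (log_event, RequestRefused and the callables passed in are
illustrative, not svnserve's API). The sketch assumes that a sink signals
failure by raising OSError. With auditing on, the request is refused; with
auditing off, only a notice about the failure is reported, first to any
fallback such as syslog and finally to stderr.

```python
import sys


class RequestRefused(Exception):
    """Raised when an audited log class could not record an event."""


def log_event(sink_write, entry, auditing, fallbacks=()):
    """Try to log 'entry'; return True if it was written.

    sink_write -- callable that writes one line and raises OSError on failure
    auditing   -- True if this log class must guarantee an audit trail
    fallbacks  -- callables to try, in order, for reporting a logging failure
                  (for example, a function that writes to syslog)
    """
    try:
        sink_write(entry)
        return True
    except OSError as exc:
        if auditing:
            # An audit trail was requested, so the request must not proceed.
            raise RequestRefused("cannot log event: %s" % exc) from exc
        # Report the failure itself, not the event that could not be logged.
        report = "svnserve: logging failed: %s" % exc
        for fallback in fallbacks:
            try:
                fallback(report)
                return False
            except OSError:
                continue
        sys.stderr.write(report + "\n")  # last resort
        return False
```

A caller would treat RequestRefused as a reason to reject the client's
request, and a False return as a sign that the request may go ahead unlogged.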

These 5 features seem to me fundamental to any properly functional & usable
logging system for svnserve. If I've missed anything important, just let me
know, of course :-) I'm interested to hear everyone's thoughts on what I've
written here and on logging in svnserve.

Jonathan Gilbert

Received on Sun Oct 30 05:16:03 2005
