Re: format of svn:author

From: Mark Mielke <mark_at_mark.mielke.cc>
Date: Mon, 02 Jan 2012 03:34:36 -0500

On 01/02/2012 02:52 AM, Alan Barrett wrote:
> On Sun, 01 Jan 2012, Mark Mielke wrote:
>>> Another idea is to change the revprop's value in the pre-commit or
>>> post-commit hook: [...]
>> This is what we've been doing for about two years. It has the
>> consequence that tools don't automatically match unique identifier to
>> commit as they no longer match.
> If your third party tools can't extract the unique ID from svn:author
> = "Display Name <uniqueid_at_domain>" then perhaps the problem lies at
> least as much in your third party tools as in subversion.

I wonder if you thought this through before posting. :-)

You are saying that if I make up an essentially arbitrary scheme, such
as "Display Name <uniqueid_at_domain>", and you have a tool which is
unaware of my scheme, and therefore your tool fails to match users in
the region because of my scheme - that your tool has the problem?
Despite the documentation for Subversion never mentioning or even
suggesting a convention that you should be responsible for understanding?


The convention must be defined in the Subversion book, and it must be
part of the release notes so that third party tools adhere to the
convention.

Otherwise, only extremely casual interpretation can be done of the
field. For example, it can be treated as a unique identifier - but more
like a "foreign key" unique identifier in the sense that it is a key in
some domain, but not necessarily a domain I know about or am an
authority for. This is why tools such as FishEye provide a "committer
mapping" that is precisely this. It allows me to code on a
per-repository basis each of the committer values that I want to
associate with my own FishEye account. This is really horrible for
dozens of repositories and thousands of users. Every user having to
input their own mappings? Yuck, yuck, yuck.

If, instead, a convention was defined such that (and just hand waving
here, I'm not really attached to these details):

     svn:author => unique identifier
     svn:author-name => Mark Mielke
     svn:author-email => mark_at_mark.mielke.cc

Then tools could make much more intelligent decisions on what to do or
show. They could use svn:author as the mapping key, but show name and
email in "svn log" or graphical browsers.
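
As a rough sketch of how a tool might consume such a convention: the
svn:author-name and svn:author-email properties are only the hand-waved
names from above (they do not exist in Subversion today), and
display_author / mapping_key are hypothetical helpers, not part of any
Subversion API. The revision properties are assumed to arrive as a plain
dict.

```python
def display_author(revprops):
    """Return a human-readable label for a commit's author.

    svn:author-name and svn:author-email are the hypothetical
    properties sketched above, not real Subversion properties.
    """
    name = revprops.get("svn:author-name")
    email = revprops.get("svn:author-email")
    if name and email:
        return f"{name} <{email}>"
    if name:
        return name
    # Older revisions, or servers unaware of the convention:
    # fall back to the raw identifier.
    return revprops.get("svn:author", "")


def mapping_key(revprops):
    """Cross-referencing between tools always uses svn:author itself."""
    return revprops.get("svn:author", "")
```

A tool that knows nothing about the extra properties keeps reading
svn:author exactly as before, which is where the backwards compatibility
comes from.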

The above model is a simple solution to the problem. More data stored
for every commit. Data which can be used by downstream tools. This has a
benefit in that the data is static, which is sometimes good. In a large
project, there is normally turnover, and accounts that exist or are
active in one year are not necessarily the same as the ones active in
another year. Taking a snapshot of the data at the time of commit
creates a permanent record of sorts. ClearCase is a system which does
it this way. Its event history records, which track such things as
object creation (the closest match to svn:author), have username,
domain (NIS - old school), and fullname.

The other alternative is for a Subversion client to be able to look up
details for svn:author by asking the server using a published protocol.
This model would allow the server to implement these queries
transparently using LDAP lookups or similar depending on the
requirements of the project. This stores less data for every commit, and
allows for dynamic updates. It would allow for "Mark Mielke" to become
"Mielke, Mark" with a server side configuration, but in contrast to the
previous method, it would not allow for a snapshot of history to be taken.
It would be a requirement that the identity management system used on
the server would always have a record for me even after I am gone - or
- alternatively, that the detail would become more vague over time. I
disappear, and my account disappears - so it is left with only a unique
identifier which might not be enough information.
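
To make the trade-off concrete, here is a minimal sketch of the lookup
model. No such protocol exists in Subversion; resolve_author is a
hypothetical name, and the directory argument is just a dict standing in
for whatever the server would really query (LDAP or similar).

```python
def resolve_author(uid, directory):
    """Look up display details for a svn:author value at query time.

    `directory` is a stand-in for the server's identity management
    system; a plain dict keeps the sketch self-contained.
    """
    record = directory.get(uid)
    if record is None:
        # The account is gone: only the unique identifier survives.
        return {"id": uid, "name": None, "email": None}
    return {"id": uid, "name": record["name"], "email": record["email"]}
```

Renaming an entry in the directory changes what every past commit
displays, and deleting it leaves nothing but the identifier, which is
exactly the lack of a historical snapshot described above.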

In our particular case, we value all three of: 1) unique identifiers to
be able to do cross referencing of reports between tools, 2) display of
humanly readable names in output such as "svn log" or annotations in
FishEye, ViewVC, Eclipse, or whatever tool the user is using, and 3)
permanent historical record for auditing purposes.

Our exact compromise for the last three years is:

1) original svn:author value arrives on the server as "1234567" - a
corporate unique identifier
2) pre-commit re-writes svn:author to "Full Name (<original svn:author
value>)"
3) pre-commit adds <company>:gid as "<original svn:author value>"
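
The rewrite in steps 2 and 3 amounts to something like the sketch below.
This is not our actual hook: rewrite_author_props is a hypothetical
helper, the "example" prefix is a placeholder for the company prefix,
and how the full name is found and how the properties are written back
to the transaction are left out entirely.

```python
def rewrite_author_props(original_author, full_name, company="example"):
    """Compute the properties a pre-commit hook would set.

    `company` is a placeholder prefix for the site-specific property;
    the real name is whatever the site chooses.
    """
    return {
        # Step 2: human-readable value with the identifier in parentheses.
        "svn:author": f"{full_name} ({original_author})",
        # Step 3: the untouched unique identifier in its own property.
        f"{company}:gid": original_author,
    }
```

For "1234567" and "Mark Mielke" this yields svn:author = "Mark Mielke
(1234567)" and example:gid = "1234567".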

Then as I mention - various other tools such as FishEye have explicit
mappings from "Mark Mielke (1234567)" => "1234567" for each Subversion
repository. We're primarily a ClearCase and Perforce shop right now -
but even so, I have several Subversion repository mappings of this form.
It works. It just sucks.
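
A tool that happened to know our site-specific format could derive the
mapping itself instead of having it typed in. A minimal sketch, assuming
the "Full Name (identifier)" layout above; split_author is a
hypothetical helper, and of course no third party tool can be expected
to know this layout, which is the whole problem:

```python
import re

# "Full Name (identifier)": everything before the final parenthesised
# group is the name, the group itself is the unique identifier.
_AUTHOR_RE = re.compile(r"^(?P<name>.*\S)\s*\((?P<uid>[^()]+)\)$")


def split_author(value):
    """Split a rewritten svn:author into (full_name, unique_id).

    Values that do not follow the layout are returned as
    (None, value), i.e. treated as a bare identifier.
    """
    m = _AUTHOR_RE.match(value)
    if m is None:
        return (None, value)
    return (m.group("name"), m.group("uid"))
```

With this, "Mark Mielke (1234567)" splits into ("Mark Mielke",
"1234567"), while a bare "1234567" comes back as (None, "1234567").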

For svn:author to have structure - either internally using punctuation
such as Unix gecos, or separated out as separate attributes - and for
tools to all honour this structure - would be far more ideal. As
Subversion is already well established, separate attributes is probably
the best approach as it would enable forwards and backwards
compatibility for uses of svn:author implemented by the Subversion code
base itself. Tools that know how to access and do intelligent things
with the new fields could feel free to do so. Users of tools that do not
do intelligent things with the new fields could point to the
Subversion release notes and Subversion book and say "this new attribute
svn:author-name should be recognized by your tool", the change can make
the tool roadmap, and we can all be happy.

Mark Mielke<mark_at_mielke.cc>
Received on 2012-01-02 09:35:22 CET

This is an archived mail posted to the Subversion Dev mailing list.