[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Auditing Design - v1 (was: Re: [merge-tracking] 'svn blame' auditing)

From: Daniel Rall <dlr_at_collab.net>
Date: 2007-02-24 03:54:04 CET

On Thu, 22 Feb 2007, Hyrum K. Wright wrote:

> Hyrum K. Wright wrote:
> Thanks to all those who have commented on this thread so far. This
> document is a preliminary attempt to incorporate those comments into a
> cohesive design. Comprehensive review and suggestions are encouraged.

Hyrum, this looks great, and asks all the right questions. Once we
finalize things, I'll roll your work into the func spec, and any
nitty-gritty details into the design doc.

> Merge Tracking Commutative Author Reporting
> or
> "How to pass the buck"
> The Problem
> -----------
> With the advent of merge tracking, users want the ability to obtain
> usable information about with revisions have been merged to a given
> location, where a given revision has been merged to, and who the
> original author of content within a file is. Collectively, these
> reporting operations are referred to as "Merge Auditing".

The requirements/use cases can be found here:


> This design seeks to address only the third aspect of auditing:
> reporting commutative authorship of a file by using 'blame', 'log',
> 'status --show-updates' and 'info', hereafter referred to as "reporting
> commands".

Are there any other commands which report user names and revisions
from a previous commit? I can't think of any.

> Proposed Behavior
> -----------------
> To maintain backward compatibility, all the reporting commands will
> continue to to behave as they currently do. A '--merge-sensitive' (we
> can bikeshed about the name later) command-line switch will be added to
> the client to enable the extra merge information requested. This switch
> will also work with '--xml' to include the additional merge information
> in xml-ized output.

I like --merge-sensitive. Alternatives (which I don't necessarilly
like better):


What will --merge-sensitive do for reverts (e.g. commits which remove
merge info)?

> Proposed changes with the '--merge-sensitive' switch are:
> 'svn log':
> The original log message, in the current format, with the addition of a
> list of revisions that have been merged into the target. The
> '--verbose' flag would output the log information for the merged
> revisions as well.

I addition to the list of merged revisions, we should provide a "merge
source" path. A single merge could theoretically have multiple merge
sources, and each source could have multiple revisions (what's called
a "range list" on the merge-tracking branch). I've added a
description of the merge-tracking branch's new data structures to the
design doc:


Showing all original log messages could be quite verbose in some cases
(e.g. after sync'ing a feature branch with trunk). That said, I agree
that we should provide easy access to that information.

> (Question: What is the best way to visually represent all the data
> being returned by 'svn log --merge-sensitive'?)

To represent the merge info, we probably want to use output consistent
with what we did for diffs of the "svn:mergeinfo" property. We
currently just stringify the merge info, showing it similarly to how
we record it in the "svn:mergeinfo" property (but with an additional
"r", for some reason):

$ svn ps svn:mergeinfo "/trunk:3" .
$ ./subversion/svn/svn di -N .

Property changes on: .
Name: svn:mergeinfo
   Merged /trunk:r3

If we want to do something fancier for 'log', we should change 'diff'
as well.

As far as the original log messages go (for 'log --merge-sensitive
--verbose'), we could do something like svnmerge.py's generated log

$ cat svnmerge-commit-message.txt
Merged revisions 23490 via svnmerge from

  r23490 | dlr | 2007-02-23 18:19:06 -0800 (Fri, 23 Feb 2007) | 1 line
  * www/merge-tracking/design.html (data-structures): Add new section.

It indents the original log message, and adds separators between them.
This seems like a reasonable style of output to me.

> 'svn blame':
> Two additional columns for each line, one with the original revision
> number, and one with the original author of that line. Unlike other
> commands, we do not need to worry about multiple source revisions,
> because each line can have at most one author.

Yup, but determine which of a merge's source revisions changed a line
is going to be tricky.

> 'svn info':
> Add two more pieces of information: 'Last Original Author:' and 'Last
> Original Revision:'. Note that these could either be set by a merge to
> the node, or a commit to the node. In the latter case, they would be
> the same as the 'Last Author' and 'Last Revision'.
> (Question: Do we need '--merge-sensitive' to do this? Should the 'Last
> Original {Author,Revision}' lines appear if only different from 'Last
> {Author, Revision}'?)

As the output must remain machine-parsable, we need --merge-sensitive
to toggle the expected format.

> 'svn status --show-updates':
> Add additional columns, reflecting the last original authors and revisions.
> Implementation
> --------------
> Each of the reporting commands, although separate, will need to access
> the merge info in a similar way. The implementation will provide a
> clean interface, so that all of the client APIs can access the reporting
> logic. It is anticipated that most of this logic will live in
> libsvn_repos, with appropriate parameters punched back through the RA
> layer to the client.

For all cases discussed above, we're looking at an API which takes one
or more paths and a revision (range?), and returns the merge info
which was added or removed in that revision. For the 'blame' case, we
may also need to include some type of line number parameter, the
handling of which is going to get ugly, since our FS isn't based on a


In svn_repos.h on the merge-tracking branch, we have this API:

 * Fetch the merge info for @a paths at @rev, and save it to @a
 * mergeinfo. @a mergeinfo is a mapping of @c char * target paths
 * (from @a paths) to textual (@c char *) representations of merge
 * info (as managed by svn_mergeinfo.h), or @c NULL if there is no
 * merge info visible or available.
 * When @a include_parents is @c TRUE, include inherited merge info
 * from parent directories of @a paths.
 * If @a revision is @c SVN_INVALID_REVNUM, it defaults to youngest.
 * If optional @a authz_read_func is non-NULL, then use this function
 * (along with optional @a authz_read_baton) to check the readability
 * of each path which merge info was requested for (from @a paths).
 * Silently omit unreadable paths from the request for merge info.
 * Use @a pool for temporary allocations.
 * @since New in 1.5.
svn_error_t *
svn_repos_fs_get_merge_info(apr_hash_t **mergeoutput,
                            svn_repos_t *repos,
                            const apr_array_header_t *paths,
                            svn_revnum_t revision,
                            svn_boolean_t include_parents,
                            svn_repos_authz_func_t authz_read_func,
                            void *authz_read_baton,
                            apr_pool_t *pool);

I think this doesn't do exactly what you want, because it gives you
the entire current state of the merge info for some paths, rather than
telling you only the merge info which changed in a given revision. We
should already be recording the information you want, but you'll need
to examine the FS code changed on the merge-tracking branch to see how
to get it out of current sqlite schema.

This API might be a good place to look:

/* Get the merge info for the set of PATHS (an array of
   absolute-in-the-fs paths) under ROOT and return it in *MERGEINFO,
   mapping char * paths to char * strings with mergeinfo, allocated in
   POOL. If INCLUDE_PARENTS is TRUE elide for mergeinfo. If a path
   has no mergeinfo, just return no info for that path. Return an
   error if the mergeinfo store does not exist or doesn't use the
   'mergeinfo' schema. */
svn_error_t *
svn_fs_merge_info__get_merge_info(apr_hash_t **mergeinfo,
                                  svn_fs_root_t *root,
                                  const apr_array_header_t *paths,
                                  svn_boolean_t include_parents,
                                  apr_pool_t *pool);

> Outstanding Question(s)
> -----------------------
> As this design is still preliminary, there are outstanding questions.
> * From a UI perspective, how best do we handle multiple source revisions
> in a single merge revision? That's potentially *a lot* of data. Do we
> only need handle the most recent original revision in a merge?

In most cases, we can represent multiple source revisions using a
range notation (e.g. 3-7). However, this doesn't fix the case where a
merge cherry-picks a large number of revision ranges
(e.g. 3-7,11,17-20,42-50,71,80-132). I think that this is something
that we are just going to have to live with.

Another question:

* How are we going to implement 'blame --merge-sensitive'? It seems
like we need to look at each delta in our set of merge sources,
figure out which lines it changes, and use the most recent (youngest)
revision/author for each line of our blame output.

> Miscellany
> ----------
> Although not required as part of the Subversion 1.5 "Merge Tracking"
> release, auditing is a significant "nice to have" feature, which many
> users will find useful. Once the design is finalized, whatever help
> people can contribute toward getting it implemented would be appreciated.

mbk mentioned on IRC that he required auditing features, and might be
available to help out with the work. I've added him to the CC list.

  • application/pgp-signature attachment: stored
Received on Sat Feb 24 03:58:38 2007

This is an archived mail posted to the Subversion Dev mailing list.