While investigating the newly opened issue #3830[1], I found myself very
quickly running into wall after wall with the approach I'd hoped to take.
The more I thought about and attempted to classify the problems I was
seeing, the more something became apparent to me: the svn_repos_get_logsX()
API claims to do one thing, but takes an approach that's simply incompatible
with that claim.
Let's take a look at the first part of that function's docstring:
/**
* Invoke @a receiver with @a receiver_baton on each log message from
* @a start to @a end in @a repos's filesystem. @a start may be greater
* or less than @a end; this just controls whether the log messages are
* processed in descending or ascending revision number order.
*
* If @a start or @a end is #SVN_INVALID_REVNUM, it defaults to youngest.
*
* If @a paths is non-NULL and has one or more elements, then only show
* revisions in which at least one of @a paths was changed (i.e., if
* file, text or props changed; if dir, props or entries changed or any node
* changed below it). Each path is a <tt>const char *</tt> representing
* an absolute path in the repository. If @a paths is NULL or empty,
* show all revisions regardless of what paths were changed in those
* revisions.
[...]
Reading this, you'd be led as a developer to imagine a fairly
straightforward implementation:
for rev in [start .. end]:
if paths:
for path in paths:
if path_changed_in_rev(path, rev):
send_log(rev)
break
else:
send_log(rev)
In other words, loop over the requested range, transmitting logs for any
revision in the range in which any of the specified paths was modified. If
no paths are specified, send all the revisions in the range. I mean, that's
basically what the docstring says it will do, right?
But that's not what happens, at least when paths are provided. Instead, the
implementation tries to simultaneously walk the histories of each
path_at_MAX(start,end), transmitting logs for each interesting revision within
the requested range and in the union of all such interesting revisions for
those lines of history. These paths may weave around into different
locations (through copy history) over time, even to locations that are in
the original set of paths, but the function just keeps churning out revision
logs.
Most of the time -- where copies/moves aren't present -- this results in
information similar to what you'd expect based on the docstring. But there
are key differences:
1. The API talks exclusively about paths, not "lines of history". Thus, if
multiple lines of history happen to occupy the same path at different times
in the course of the requested revision range, only the youngest of those
will get reported on.
2. Because the implementation is really tied more to lines of history
(which are necessarily precise) than to revisions with path filtering, the
implementation is susceptible to failures due to requests for history
locations at the range extremes that don't exist. (This is the crux of
issue #3830.)
I guess I'm just surprised that we've made it this far without really
noticing this disparity before. We *do* have an old issue #928[2] in which
there's a request for a mode for 'svn log' which behaves in the way you'd
*expect*, based on the docstring, svn_repos_get_logsX() to behave today.
I'm not precisely sure what I'm going to do, or suggest be done, about this
situation. At a minimum, I think the docstring for svn_repos_get_logs4()
should be revised to reveal reality. But beyond that, and specifically
toward trying to honor the requests in the two mentioned issues, I dunno. I
just thought I'd share. :-)
-- C-Mike
[1] http://subversion.tigris.org/issues/show_bug.cgi?id=3830
[2] http://subversion.tigris.org/issues/show_bug.cgi?id=928
--
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2011-03-09 20:51:44 CET