ra_replay: Uncovering a latent design failure

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Fri, 11 Jan 2013 12:15:45 -0500

While trying to minimize our 1.8 issue count, I found myself looking at issues
#4287 (http://subversion.tigris.org/issues/show_bug.cgi?id=4287) and #4100
(http://subversion.tigris.org/issues/show_bug.cgi?id=4100). Both of these
involve problems using svnrdump to dump a revision range of a URL which no
longer exists in HEAD of the repository.

Along the way, I uncovered something that I guess has just gone unnoticed
for the past ... well, quite a few years.

The svn_ra_replay() API -- which is the core of the 'svnrdump dump' and
'svnsync sync' functionalities -- is documented like so:

 /**
  * Replay the changes from @a revision through @a editor and @a edit_baton.
  *
  * Changes will be limited to those that occur under @a session's URL, and
  * the server will assume that the client has no knowledge of revisions
  * prior to @a low_water_mark. These two limiting factors define the portion
  [...]

Understand, of course, that given a PATH and a revision range, there are two
different ways to do path-and-revision-based filtering:

1. Dump changes related to each revision in the range, filtering out those
which affect paths that are neither PATH itself nor descendants of PATH.

2. Crawl the history of PATH between the revisions in the range, dumping
related changes.

'svn log' uses method #2. The primary operand for 'svn log' is the line of
history of some versioned object. This is why you can run 'svn log' on a
branch, and see the changes as they follow the branch's copy from the trunk,
etc.

Not so for replay -- it's clearly a method-#1 type of operation, where the
primary operands are revisions, with a path being used merely as a filter.
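
To make the distinction concrete, here is a minimal Python sketch over a
made-up four-revision history. The data, the function names, and the
simplification that a copy is followed only when its target is exactly PATH
are all assumptions for illustration; none of this is Subversion's code.

```python
# Toy history: revision -> list of (action, path, copyfrom_path).
# Entirely invented data, used only to contrast the two methods.
HISTORY = {
    1: [("add", "/trunk", None)],
    2: [("modify", "/trunk/a.c", None)],
    3: [("copy", "/branches/b1", "/trunk")],
    4: [("modify", "/branches/b1/a.c", None), ("modify", "/trunk/a.c", None)],
}


def is_under(path, root):
    """True if `path` is `root` itself or a descendant of it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def method1(path, start, end):
    """Method #1: visit every revision, keep changes at or below `path`."""
    out = {}
    for rev in range(start, end + 1):
        kept = [c for c in HISTORY.get(rev, []) if is_under(c[1], path)]
        if kept:
            out[rev] = kept
    return out


def method2(path, start, end):
    """Method #2: walk backwards from `end`, following copies of `path`."""
    out = {}
    current = path
    for rev in range(end, start - 1, -1):
        kept = [c for c in HISTORY.get(rev, []) if is_under(c[1], current)]
        if kept:
            out[rev] = kept
        # Simplification: only follow a copy whose target is exactly `current`.
        for action, target, source in HISTORY.get(rev, []):
            if action == "copy" and target == current:
                current = source
    return out
```

With this toy data, method1("/branches/b1", 1, 4) reports only revisions 3
and 4, while method2("/branches/b1", 1, 4) follows the copy back to /trunk
and also reports revisions 1 and 2.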

Unfortunately, our HTTP RA layers are using a method-#2 type of addressing
scheme for replay. Both ra_neon and ra_serf issue "replay-report" REPORT
requests against the public session URL. For example, if the API is used
with a session URL of "http://svn.apache.org/repos/asf/subversion/trunk",
then both ra_neon and ra_serf issue REPORT requests against that URL.

That works most of the time, but what if the path has been deleted from
HEAD? Well, today that's when both ra_neon and ra_serf get 404s back from
the server, and have no way to recover.

See, the correct approach when doing a method-#1 operation is to issue the
REPORT request against the "me resource URL" (or, pre-httpv2, the "default
VCC URL") and then to embed the path filter in the request body itself.
That URL always exists, and is a generic way of addressing "the repository".
mod_dav_svn would still apply the same path filtering, only it would find
the FS-PATH on which to filter in the request body, not tacked onto the end
of the request URL itself. ra_neon and ra_serf should be issuing the REPORT
against "http://svn.apache.org/repos/asf/!svn/me" and dropping a
"filter-path=\"subversion/trunk\"" XML attribute into that request body.
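
As a rough sketch of the two request shapes, the Python below builds the
target URL and a body for each. The element names and layout of the XML are
simplified placeholders, not the real wire format; only the "!svn/me" URL
and the filter-path attribute come from the description above, and both
function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def current_style_request(session_url, revision):
    """Today's behavior: REPORT against the public session URL."""
    body = ET.Element("replay-report")  # placeholder element name
    ET.SubElement(body, "revision").text = str(revision)
    return session_url, ET.tostring(body, encoding="unicode")


def proposed_style_request(repos_root_url, fs_path, revision):
    """Proposed behavior: REPORT against the "me resource", with the
    path filter carried inside the request body."""
    target = repos_root_url.rstrip("/") + "/!svn/me"
    body = ET.Element("replay-report", {"filter-path": fs_path})
    ET.SubElement(body, "revision").text = str(revision)
    return target, ET.tostring(body, encoding="unicode")
```

The point of the second form is that the request target no longer depends
on whether "subversion/trunk" still exists in HEAD.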

So. Looks like I'll be taking a little detour here to try to fix this in a
compatibility-preserving way. Here's my plan:

Server-side: Recognize which resource was hit with the request, looking for
the (optional) path filter in the REPORT request body if the "me resource"
or "default VCC" was used. Advertise the new support so clients can
construct the best REPORTs possible.
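
A minimal Python sketch of that server-side decision, under assumptions: the
function name is hypothetical, the body is the simplified placeholder XML
rather than the real report format, and "!svn/vcc/default" is assumed to be
the location of the default VCC.

```python
import xml.etree.ElementTree as ET


def resolve_replay_filter(request_path, repos_root_path, body_xml):
    """Return the FS path a replay REPORT should be filtered on."""
    root_path = repos_root_path.rstrip("/")
    if not request_path.startswith(root_path):
        raise ValueError("request is outside the repository")

    # Assumed locations of the "me resource" and the default VCC.
    special = (root_path + "/!svn/me", root_path + "/!svn/vcc/default")
    if request_path in special:
        # New style: the (optional) filter lives in the request body.
        filter_path = ET.fromstring(body_xml).get("filter-path", "")
        return "/" + filter_path.strip("/")

    # Old style: the filter is whatever follows the repository root.
    return "/" + request_path[len(root_path):].strip("/")
```

An absent filter-path falls back to "/", which matches the "(optional)"
wording above: no filter means the whole repository.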

Client-side: If the server supports the correct constructs, drop the path
filter into the REPORT body and issue the request against the "me resource"
or "default VCC" (if non-httpv2). Otherwise, just keep on doing what we do
today and wish for the best.
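
The client-side fallback could look roughly like the Python below. The
capability string is invented for this sketch (the plan above does not name
one), the function name is hypothetical, and "!svn/vcc/default" is assumed
to be the default VCC location.

```python
# Hypothetical capability name; the real advertisement is not yet defined.
FILTER_IN_BODY_CAP = "replay-filter-path-in-body"


def choose_replay_target(session_url, repos_root_url, server_caps, httpv2):
    """Return (request_url, filter_path) for a replay REPORT.

    filter_path is None when falling back to the old behavior, where the
    path is implied by the request URL itself.
    """
    root = repos_root_url.rstrip("/")
    if FILTER_IN_BODY_CAP not in server_caps:
        # Older server: keep doing what we do today.
        return session_url, None

    fs_path = session_url[len(root):].strip("/")
    special = "/!svn/me" if httpv2 else "/!svn/vcc/default"
    return root + special, fs_path
```

Against a server that does not advertise the capability, the function
returns the session URL unchanged, so the existing failure mode for paths
deleted from HEAD remains in that case.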

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development

Received on 2013-01-11 18:16:22 CET

This is an archived mail posted to the Subversion Dev mailing list.