[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Fri, 30 Jul 2010 16:17:51 +0100

On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> I am investigating the following DAV + FSFS server error message which
> we at WANdisco have seen rarely, but in at least two real installations,
> and am sharing it here in case anyone can shed light on it. It happens
> during a DAV 'MERGE':
> Can't open file '[...]/db/transactions/props': No such file or directory
> FSFS is trying to open an invalid path; it should be
> '[...]/db/transactions/<txn-id>/props'. From the source code, I can see
> how this must happen at a low level: the txn-id path component will be
> omitted if and only if something passes a NULL transaction-id string to
> fs_fs.c:get_txn_proplist().
> But why and from where?

A status update:

I've traced the calls to get_txn_proplist(), a few call levels up, and I
believe one of these must have supplied txn_id = NULL.

Callees of get_txn_proplist(), with txn-id parameter identified by name
and by param-list position:

  svn_fs_fs__change_txn_props(txn->id) # DOESN'T CAUSE THE ERROR
    svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
    svn_fs_fs__begin_obliteration_txn(using ...)
  svn_fs_fs__get_txn(txn_id) # CAN CAUSE THE ERROR
          prep_working() # NO, txn_name != NULL
          is_our_resource() # NO, txn name != NULL
            # no callers?
  commit_body(baton->txn->id) # DOESN'T CAUSE THE ERROR
  svn_fs_fs__txn_proplist(,txn->id) # CAN CAUSE THE ERROR

The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
'db/transactions/props'" error when I forced txn_id=NULL at these places
in a trivial test scenario, and the ones marked "DOESN'T", didn't.

>From the fact that a DAV MERGE was the operation under way, I would
hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
going wrong. I haven't tried to trace that back to its caller (in
mod_dav perhaps?).

I don't know if this line of attack has any mileage left in it. I'm
thinking of leaving it at that for now, unless anyone comes up with
suggestions for how to proceed. I don't feel capable of setting up a
test system that would replicate the sequence of DAV messages that
resulted in the error condition.

(The issue of the current assertion leaving a DOS attack possible is
still in my sights - email thread "Re: svn commit: r980046".)

- Julian

> The error might be triggered by the WANdisco software sending commands
> in the wrong sequence, or something like that, and so to that extent it
> may not be interesting to the general community. In one case, detailed
> below, it happened after a DELETE was sent, more than an hour into a
> very long commit. In another case, it happened after a start-commit
> hook failed. The server might be receiving a DAV MERGE request when it
> isn't expecting one. Can anyone comment on that, given this limited
> information?
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]
> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
> The error appears to match the second-last MERGE in the access log. I
> don't have a copy of the body of the MERGE request in this case.
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier. A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
> Is this a case where the client side shouldn't be sending a MERGE at
> this point? Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.
> I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> the attached patch. I determined the comments in the attached patch
> after manually passing NULL instead of the txn-id. However, that won't
> get me far on its own.
> To try to reproduce the error, do you think I should try sending a
> sequence of DAV commands similar to the sequence shown above?
> Any other ideas on what might be happening or how to diagnose this?
> - Julian
Received on 2010-07-30 17:18:37 CEST

This is an archived mail posted to the Subversion Dev mailing list.