Ping? Just wondering if anybody else has seen this problem, or looked
at Julian's patch.
-Hyrum
On Tue, Jun 22, 2010 at 1:57 PM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> I am investigating the following DAV + FSFS server error message which
> we at WANdisco have seen rarely, but in at least two real installations,
> and am sharing it here in case anyone can shed light on it. It happens
> during a DAV 'MERGE':
>
> Can't open file '[...]/db/transactions/props': No such file or directory
>
> FSFS is trying to open an invalid path; it should be
> '[...]/db/transactions/<txn-id>/props'. From the source code, I can see
> how this must happen at a low level: the txn-id path component will be
> omitted if and only if something passes a NULL transaction-id string to
> fs_fs.c:get_txn_proplist().
>
> But why and from where?
>
> The error might be triggered by the WANdisco software sending commands
> in the wrong sequence, or something like that, and so to that extent it
> may not be interesting to the general community. In one case, detailed
> below, it happened after a DELETE was sent, more than an hour into a
> very long commit. In another case, it happened after a start-commit
> hook failed. The server might be receiving a DAV MERGE request when it
> isn't expecting one. Can anyone comment on that, given this limited
> information?
>
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]
>
> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
>
> The error appears to match the second-last MERGE in the access log. I
> don't have a copy of the body of the MERGE request in this case.
>
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier. A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
>
> Is this a case where the client side shouldn't be sending a MERGE at
> this point? Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.
>
> I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> the attached patch. I determined the comments in the attached patch
> after manually passing NULL instead of the txn-id. However, that won't
> get me far on its own.
>
> To try to reproduce the error, do you think I should try sending a
> sequence of DAV commands similar to the sequence shown above?
>
> Any other ideas on what might be happening or how to diagnose this?
>
> - Julian
>
>
Received on 2010-07-15 16:19:33 CEST