I am investigating the following DAV + FSFS server error message which
we at WANdisco have seen rarely, but in at least two real installations,
and am sharing it here in case anyone can shed light on it. It happens
during a DAV 'MERGE':
Can't open file '[...]/db/transactions/props': No such file or directory
FSFS is trying to open an invalid path; it should be
'[...]/db/transactions/<txn-id>/props'. From the source code, I can see
how this must happen at a low level: the txn-id path component will be
omitted if and only if something passes a NULL transaction-id string to
fs_fs.c:get_txn_proplist().
But why and from where?
The error might be triggered by the WANdisco software sending commands
in the wrong sequence, or something like that, and so to that extent it
may not be interesting to the general community. In one case, detailed
below, it happened after a DELETE was sent, more than an hour into a
very long commit. In another case, it happened after a start-commit
hook failed. The server might be receiving a DAV MERGE request when it
isn't expecting one. Can anyone comment on that, given this limited
information?
Here are some Apache access log entries from one occurrence, with paths
anonymized and the client Id string (always "SVN/1.6.6 (r40053)
neon/0.28.3") omitted for brevity:
[[[
127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
"MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
"MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
"DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
]]]
and the corresponding error log:
[[[
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
There was a problem opening the transaction specified by this activity. [500, #2]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
]]]
The error appears to match the second-last MERGE in the access log. I
don't have a copy of the body of the MERGE request in this case.
The error refers to an activity Id that was the subject of the DELETE
that was logged 18 minutes earlier. A comment from one of our guys
looking at the logs: "We are seeing 22,600 changes for Activity A and no
merge at the end (just a DELETE). Then we see about 931 commands for
Activity B and a MERGE that fails for Activity A."
Is this a case where the client side shouldn't be sending a MERGE at
this point? Even if that is the primary problem, I'm still interested
in Subversion's response to this MERGE request.
I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
the attached patch. I determined the comments in the attached patch
after manually passing NULL instead of the txn-id. However, that won't
get me far on its own.
To try to reproduce the error, do you think I should try sending a
sequence of DAV commands similar to the sequence shown above?
Any other ideas on what might be happening or how to diagnose this?
- Julian
Received on 2010-06-22 14:57:48 CEST