I have opened issue #3696,
<http://subversion.tigris.org/issues/show_bug.cgi?id=3696>, to track
this.
- Julian
On Fri, 2010-07-30 at 16:17 +0100, Julian Foad wrote:
> On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> > I am investigating the following DAV + FSFS server error message which
> > we at WANdisco have seen rarely, but in at least two real installations,
> > and am sharing it here in case anyone can shed light on it. It happens
> > during a DAV 'MERGE':
> >
> > Can't open file '[...]/db/transactions/props': No such file or directory
> >
> > FSFS is trying to open an invalid path; it should be
> > '[...]/db/transactions/<txn-id>/props'. From the source code, I can see
> > how this must happen at a low level: the txn-id path component will be
> > omitted if and only if something passes a NULL transaction-id string to
> > fs_fs.c:get_txn_proplist().
> >
> > But why and from where?
>
> A status update:
>
> I've traced the calls to get_txn_proplist(), a few call levels up, and I
> believe one of these must have supplied txn_id = NULL.
>
> Callees of get_txn_proplist(), with txn-id parameter identified by name
> and by param-list position:
>
> get_txn_proplist(,,txn_id)
> svn_fs_fs__change_txn_props(txn->id) # DOESN'T CAUSE THE ERROR
> txn_vtable.change_props(txn->id)
> svn_fs_fs__change_txn_prop(txn->id)
> commit_body()
> svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
> svn_fs_fs__begin_obliteration_txn(using ...)
> svn_fs_fs__get_txn(txn_id) # CAN CAUSE THE ERROR
> svn_fs_fs__open_txn(,,name)
> svn_fs_t.vtable.open_txn(,,name)
> svn_fs_open_txn(,,name)
> dav_svn__abort_txn(,txn_name)
> prep_working() # NO, txn_name != NULL
> prep_private(comb->priv.root.txn_name)
> DAV_RESOURCE_TYPE_PRIVATE
> is_our_resource() # NO, txn name != NULL
> open_txn(,,txn_name)
> dav_svn__checkout()
> merge()
> dav_svn__hooks_vsn.merge
> fs_merge(,,,target_root->txn)
> svn_fs_root_t.vtable.merge(,,,target_root->txn)
> svn_fs_merge(,,target_root->txn)
> # no callers?
> svn_fs_fs__get_txn_ids(,,,txn_name)
> svn_fs_fs__dag_txn_root(,,txn_id)
> root_node<-open_path<-get_dag(,root->txn)
> svn_fs_fs__dag_txn_base_root(,,txn_id)
> merge_changes(,,txn->id)
> svn_fs_fs__commit_txn(,,txn->id)
> svn_fs_txn_t.vtable.commit(,,txn->id)
> svn_fs_commit_txn(,,txn->id)
> fs_merge()
> svn_fs_fs__dag_clone_root(,,txn_id)
> mutable_root_node(,root->txn)
> make_path_mutable(root->txn)
> commit_body(baton->txn->id) # DOESN'T CAUSE THE ERROR
> svn_fs_fs__commit(,,txn->id)
> svn_fs_fs__commit_txn(,,txn->id)
> svn_fs_fs__txn_proplist(,txn->id) # CAN CAUSE THE ERROR
> svn_fs_txn_t.vtable.get_proplist(,txn->id)
> svn_fs_txn_proplist(,,txn->id)
> commit_body()
> svn_fs_fs__txn_prop(,txn->id)
> svn_fs_txn_t.vtable.get_prop(,txn->id)
> svn_fs_txn_prop(,txn->id)
> svn_fs_fs__txn_root(,txn->id)
> svn_fs_txn_t.vtable.root(,txn->id)
> svn_fs_txn_root(,txn->id)
>
> The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
> 'db/transactions/props'" error when I forced txn_id=NULL at these places
> in a trivial test scenario, and the ones marked "DOESN'T", didn't.
>
> >From the fact that a DAV MERGE was the operation under way, I would
> hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
> going wrong. I haven't tried to trace that back to its caller (in
> mod_dav perhaps?).
>
> I don't know if this line of attack has any mileage left in it. I'm
> thinking of leaving it at that for now, unless anyone comes up with
> suggestions for how to proceed. I don't feel capable of setting up a
> test system that would replicate the sequence of DAV messages that
> resulted in the error condition.
>
> (The issue of the current assertion leaving a DOS attack possible is
> still in my sights - email thread "Re: svn commit: r980046".)
>
> - Julian
>
>
> > The error might be triggered by the WANdisco software sending commands
> > in the wrong sequence, or something like that, and so to that extent it
> > may not be interesting to the general community. In one case, detailed
> > below, it happened after a DELETE was sent, more than an hour into a
> > very long commit. In another case, it happened after a start-commit
> > hook failed. The server might be receiving a DAV MERGE request when it
> > isn't expecting one. Can anyone comment on that, given this limited
> > information?
> >
> > Here are some Apache access log entries from one occurrence, with paths
> > anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> > neon/0.28.3") omitted for brevity:
> > [[[
> > 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> > "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> > "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > ]]]
> >
> > and the corresponding error log:
> > [[[
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > There was a problem opening the transaction specified by this activity. [500, #2]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> > ]]]
> >
> > The error appears to match the second-last MERGE in the access log. I
> > don't have a copy of the body of the MERGE request in this case.
> >
> > The error refers to an activity Id that was the subject of the DELETE
> > that was logged 18 minutes earlier. A comment from one of our guys
> > looking at the logs: "We are seeing 22,600 changes for Activity A and no
> > merge at the end (just a DELETE). Then we see about 931 commands for
> > Activity B and a MERGE that fails for Activity A."
> >
> > Is this a case where the client side shouldn't be sending a MERGE at
> > this point? Even if that is the primary problem, I'm still interested
> > in Subversion's response to this MERGE request.
> >
> > I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> > the attached patch. I determined the comments in the attached patch
> > after manually passing NULL instead of the txn-id. However, that won't
> > get me far on its own.
> >
> > To try to reproduce the error, do you think I should try sending a
> > sequence of DAV commands similar to the sequence shown above?
> >
> > Any other ideas on what might be happening or how to diagnose this?
> >
> > - Julian
> >
>
>
Received on 2010-08-31 13:44:03 CEST