Re: svn commit: r1411982 - in /subversion/branches/1.6.x: ./ STATUS subversion/libsvn_client/commit_util.c subversion/libsvn_fs_fs/fs_fs.c

From: Ben Reser <ben_at_reser.org>
Date: Wed, 21 Nov 2012 17:52:18 -0800

First of all thanks for asking these questions, my effort to answer
them gave me even more confidence in the changes.

On Wed, Nov 21, 2012 at 1:36 AM, Bert Huijben <bert_at_qqmail.nl> wrote:
> What would be the effect of *not* patching the client?

When I first wrote the patch and tried to apply it to 1.6 I was
getting SIGABRT's from some tests that exercised the error handling
code that I later patched in commit_util.c. For example
basic_tests.py 8: basic corruption detection.

In producing the response to this email I removed the client library
change since I'd forgotten exactly which test was failing, expecting
it to fail and refresh my memory. I was surprised to find it did not
fail.

The command I used to remove the client side changes was:
svn merge -c-1404736 ^/subversion/branches/1.6.x-rep_write_cleanup_at_1404736

I went back to the original patch I'd been using before I'd developed
the client changes and was once again able to see the problem.

When I produced the failure under gdb I saw that the unlock call in
rep_write_cleanup was returning an error. This surprised me, since I
didn't really expect there to be a problem unlocking the transaction
ever. What was happening is that the iterpool was not cleaned up so
that the rep_write_cleanup was not called until after the client lib
called the abort_edit function on the editor it was driving, which
calls svn_fs_abort_txn and ultimately deletes the transaction.

So I resolved the problem by fixing the fact that we weren't cleaning
up our iterpool, which is something that was resolved in 1.7 when the
wc code was rewritten. The error stopped being thrown and my SIGABRT
went away.

However, the real reason I was getting SIGABRT was that I was passing
err->apr_err without copying it. Ultimately, the error gets passed up
the chain in APR until ultimately it gets passed outside of the pool
destruction call that free'd the memory it was in.

Daniel's review caught this error and I resolved it before the 1.6
patch was made. However, I still thought the client change was
necessary so I still included it.

So now to answer your question...

The effect of not including this change now is leaving unnecessary
data in memory in the case of an error while committing. If you're
calling svn_client_commit from a long lived pool and you're hitting
those error paths you have a memory leak.

My original purpose for adding the change is essentially moot. Since
the client always abort_edit if it got an error during the edit drive
which will remove the transaction entry from memory (including the
being_written) flag and the protorev file from disk.

> What impact does this have on other users of the fs/repos apis?

None.

Other users delete the entire transaction if something fails along the
way of building it. mod_dav_svn is unique in that the communications
of individual pieces of the commit can be retried (though we don't do
that with our client implementation).

So with the new code either they destroy/cleanup the pool triggering
the new rep_write_cleanup() call which wipes out the failed
representation and removes the lock on the protorevfile. Or they do
not and end up destroying the transaction themselves (probably with an
abort_edit call) and then the rep_write_cleanup will return an error
which goes nowhere since the error occurs in the cleanup/destroy call
of a pool which returns a void.

> In other words: Is this a breaking change that we shouldn't port back to 1.6 clients and try to patch server only?

No I don't think this change breaks anything for anyone.

We could remove the client change now as it really doesn't have any
impact. But it's a valid change for other reasons as mentioned above,
so we might as well leave it.

Even without the client library change you're still touching the
client. The solution I'm using here is a fix to the fsfs layer.
Which when using ra_local the client is the server. Moving the fix up
to mod_dav_svn is much harder.

Ultimately the problem here is that the fs is receiving the
representation of a node via a window handler. The window handler
only has two things you can tell it. 1) Here is some data. 2) There
is no more data. As such there is no way for mod_dav_svn to
communicate to the fs layer that the representation I sent you is
incomplete. Additionally, the whole thing is driven through the
svndiff parsing code, which has an option (error_on_early_close) to
say error if the svndiff looks incomplete. If this option is true
then the window handler is never called with NULL for the window
(we've reached the end of the data). Setting this flag to FALSE would
make that call happen but then fsfs would have no idea that we feed it
incomplete data, thus adding an invalid representation to the protorev
file.

Long term the fix is probably refactoring some public interfaces. But
that's not a very attractive option for backport. Nor am I in a huge
hurry to do it since I think making these changes is going to cascade
out into a lot of other code.
Received on 2012-11-22 02:52:55 CET

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]