Re: ra-test.exe deadlock condition
On 2/5/2017 20:57, Stefan wrote:
> On 2/4/2017 12:41, Stefan Fuhrmann wrote:
>> On 31.01.2017 10:09, Stefan wrote:
>>> I've been looking at the cause of a deadlock when running ra-test.exe
>>> with -fs-type=fsx (trunk version).
>>> The most important findings are summed up in MAXSVN-94 (linked below) atm.
>>> The issue was discussed with brane and danielsh on IRC (thanks for your
>>> time, once again).
>>> As far as my current understanding of the problem goes: the deadlock is
>>> caused by the fact that the apr_terminate() function registered in
>>> svn_cmdline_init() via the atexit-call is called after the termination
>>> of the threads which were created as part of the calls to
>>> apr_thread_pool_push() in svn_fs_x__batch_fsync_run().
>>> This means that apr's thread counter (thd_cnt) is getting out of sync
>>> (since the apr-function thread_pool_func() is not executed) and then
>>> gets stuck in thread_pool_cleanup() (waiting for the already terminated
>>> threads to be terminated).
>>> To me it looks like svnserve's main-function already contains a
>>> safeguard against a corresponding issue, and calls
>>> apr_thread_pool_destroy(threads) (or was this a completely different
>>> scenario?). This however does not cover the threads created from
>>> Talking to danielsh and brane it became apparent to me that the issue
>>> might not be too obvious (in the end it might still be an issue with how I
>>> build SVN, which would cause the atexit-registered apr_terminate()
>>> function to be called too late). It's also not fully clear to me at
>>> which exact point (with regard to registered atexit() calls) threads of
>>> the process are terminated if the process itself terminates. If indeed
>>> atexit()-registered functions get called after the threads are forcibly
>>> terminated (which is what it looks like to me atm), it might contradict
>>> the C(89/99) standard - see 18.104.22.168/22.214.171.124. On the other side this
>>> thread on stackoverflow suggests it's simply undefined (by the
>>> standard) what comes first.
>>> As danielsh suggested, I'm planning to come up with a plain minimal
>>> repro app only based on APR demonstrating the problem, so to make it
>>> more obvious (and double check for myself) what the issue is about.
>>> http://www.luke1410.de:8090/browse/MAXSVN-94
>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
>> Hi Stefan,
>> I had a look at the code and found a possibly related problem.
>> If you are using DLLs, this might have affected you.
>> It would be nice if you could try r1781657 and see whether it
>> makes any difference in your case.
>> -- Stefan^2.
> Hi Stefan^2,
> I tested trunk r1781790 which also includes your follow-up commit
> (r1781726). With that one the ra-test.exe test which previously
> deadlocked passes now. However, test 60 (basic_test.py) deadlocks now
> (svnmucc.exe seems to be the process which is being tested here).
> I'm planning to publish details of the underlying issue, which I think has
> now been traced down to the actual root cause, in a blog post, most likely
> tomorrow. That should explain the actual issue in full detail then.
Details are published now here: http://www.luke1410.de/blog/?p=95
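
For anyone following the thread without the APR sources at hand, the counter
mismatch described in the original report can be modelled in a few lines of
plain C. This is only a sketch under stated assumptions: model_pool_t and the
model_* functions are hypothetical names invented for illustration, not APR's
API or implementation, and the wait loop is bounded so the example reports the
stall instead of actually hanging.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread pool; only the counter matters here. */
typedef struct {
    size_t thd_cnt;   /* number of workers the pool still believes are alive */
} model_pool_t;

/* A worker is started: the pool counts it. */
static void model_worker_start(model_pool_t *pool)
{
    pool->thd_cnt++;
}

/* Normal exit path: the worker itself decrements the counter on the way out. */
static void model_worker_exit_normally(model_pool_t *pool)
{
    pool->thd_cnt--;
}

/* Forced termination: the thread is gone, but its exit path never ran,
 * so the counter is left untouched. */
static void model_worker_killed(model_pool_t *pool)
{
    (void)pool;
}

/* Cleanup waits for thd_cnt to reach 0. The real wait would be unbounded;
 * this model gives up after max_polls checks and returns false instead. */
static bool model_pool_cleanup(const model_pool_t *pool, unsigned max_polls)
{
    unsigned polls = 0;
    while (pool->thd_cnt != 0) {
        if (++polls > max_polls)
            return false;   /* would have waited forever */
    }
    return true;
}
```

With a worker that exits normally, the counter returns to 0 and the cleanup
completes. With a worker that was killed, thd_cnt stays at 1 and the cleanup
never sees 0, which corresponds to the hang in thread_pool_cleanup() described
in the original report.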
Received on 2017-02-08 23:23:34 CET
This is an archived mail posted to the Subversion Dev