Hi,
I've been looking at the cause of a deadlock when running ra-test.exe
with -fs-type=fsx (trunk version).
The most important findings are summed up here atm [1].
The issue was discussed with brane and danielsh on IRC (thanks for your
time, once again).
As far as I currently understand the problem, the deadlock occurs
because apr_terminate(), which svn_cmdline_init() registers via
atexit(), is called after the threads created by the
apr_thread_pool_push() calls in svn_fs_x__batch_fsync_run() have
already been terminated.
This means that APR's thread counter (thd_cnt) gets out of sync
(since the APR function thread_pool_func() is not executed), and
thread_pool_cleanup() then gets stuck waiting for the already
terminated threads to terminate.
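To make the suspected mechanism a bit more concrete, here is a rough
sketch in plain C with pthreads. It is not APR code: toy_pool,
toy_worker, toy_pool_cleanup and toy_demo are made-up names, the
cancel call merely stands in for "the thread was killed from outside",
and the poll limit exists only so that the demo returns instead of
hanging the way the real cleanup appears to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy pool: models only a live-worker counter and a stop flag. */
typedef struct {
    atomic_int  thd_cnt;     /* workers that have not yet reported their exit */
    atomic_bool terminated;  /* set by cleanup to ask workers to leave */
} toy_pool;

static void sleep_1ms(void)
{
    struct timespec ts = { 0, 1000000L };
    nanosleep(&ts, NULL);    /* also a cancellation point */
}

static void *toy_worker(void *arg)
{
    toy_pool *p = arg;
    while (!atomic_load(&p->terminated))
        sleep_1ms();
    atomic_fetch_sub(&p->thd_cnt, 1);   /* reached only on a normal exit */
    return NULL;
}

/* Returns true once every worker has reported its exit. An unbounded wait
   would never return in the "killed" case; max_polls keeps the demo finite. */
static bool toy_pool_cleanup(toy_pool *p, int max_polls)
{
    atomic_store(&p->terminated, true);
    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(&p->thd_cnt) == 0)
            return true;
        sleep_1ms();
    }
    return atomic_load(&p->thd_cnt) == 0;
}

/* kill_first == true simulates a worker that is terminated from outside
   before cleanup runs, so it never decrements the counter. */
static bool toy_demo(bool kill_first)
{
    toy_pool p;
    pthread_t tid;
    bool ok;
    int rc;

    atomic_init(&p.thd_cnt, 1);          /* one worker is about to start */
    atomic_init(&p.terminated, false);
    rc = pthread_create(&tid, NULL, toy_worker, &p);
    assert(rc == 0);

    if (kill_first) {
        pthread_cancel(tid);             /* stand-in for forced termination */
        pthread_join(tid, NULL);
        return toy_pool_cleanup(&p, 200);
    }
    ok = toy_pool_cleanup(&p, 200);      /* worker sees the flag and leaves */
    pthread_join(tid, NULL);
    return ok;
}
```

Calling toy_demo(false) lets the worker leave on its own, so the
counter reaches zero; toy_demo(true) leaves the counter at 1 and the
cleanup gives up after its poll limit, which is where an unbounded
wait would hang.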
To me it looks like svnserve's main function already contains a
safeguard against a similar issue: it calls
apr_thread_pool_destroy(threads) (or was that a completely different
scenario?). This, however, does not cover the threads created from
svn_fs_x__batch_fsync_run().
Talking to danielsh and brane, it became apparent to me that the issue
might not be too obvious (in the end it might still be a problem with how
I build SVN, which causes the atexit-registered apr_terminate() function
to be called too late). It's also not fully clear to me at which exact
point (relative to the registered atexit() handlers) the threads of a
process are terminated when the process itself terminates. If
atexit()-registered functions are indeed called after the threads have
been forcibly terminated (which is what it looks like to me atm), that
might contradict the C (89/99) standard - see [2], 7.20.4.2/7.20.4.3. On
the other hand, this thread on Stack Overflow [3] suggests that the
standard simply leaves undefined what comes first.
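One way to check the ordering on a given platform would be a small
probe along the following lines. This is only a sketch for POSIX
systems (it uses fork, pipe and pthreads, so it says nothing about the
Windows build by itself), and ticker, probe_at_exit and
worker_alive_during_atexit are made-up names. A child process starts
a worker that keeps bumping a counter and then calls exit(); the
atexit handler samples the counter twice and reports whether it still
moved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_long ticks;      /* advanced by the worker while it is alive */
static int report_fd = -1;     /* pipe end the exit handler writes to */

static void *ticker(void *arg)
{
    (void)arg;
    for (;;)
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/* atexit handler: sample the counter twice and report whether it moved. */
static void probe_at_exit(void)
{
    struct timespec ts = { 0, 50 * 1000000L };   /* 50 ms */
    long before;
    char verdict;
    ssize_t n;

    assert(report_fd >= 0);
    before = atomic_load(&ticks);
    nanosleep(&ts, NULL);
    verdict = (atomic_load(&ticks) != before) ? 'Y' : 'N';
    n = write(report_fd, &verdict, 1);
    (void)n;
}

/* Returns 'Y' if the worker was still running inside the handler,
   'N' if it was not, and '?' if the probe itself failed. */
static char worker_alive_during_atexit(void)
{
    int fds[2];
    char verdict = '?';
    pid_t pid;

    if (pipe(fds) != 0)
        return verdict;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return verdict;
    }
    if (pid == 0) {                      /* child: the process under test */
        pthread_t tid;
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(probe_at_exit) != 0 ||
            pthread_create(&tid, NULL, ticker, NULL) != 0)
            _exit(1);
        exit(0);                         /* normal termination runs the handler */
    }

    close(fds[1]);                       /* parent: wait for the verdict */
    if (read(fds[0], &verdict, 1) != 1)
        verdict = '?';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return verdict;
}
```

The result of worker_alive_during_atexit() is platform-specific, which
is rather the point: a 'Y' means the worker was still alive while the
handler ran, an 'N' means it was already gone by then.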
As danielsh suggested, I'm planning to come up with a plain minimal
repro app based only on APR that demonstrates the problem, to make it
more obvious (and to double-check for myself) what the issue is.
Regards,
Stefan
[1] http://www.luke1410.de:8090/browse/MAXSVN-94
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[3] https://stackoverflow.com/questions/39655868/what-does-the-posix-standard-say-about-thread-stacks-in-atexit-handlers-what
Received on 2017-01-31 10:10:00 CET