I was a little suspicious of the BDB MUTEX asm, so I switched my
BDB over to fcntl-based mutexes.
I had hoped that this would resolve the problem, but it did not.
However, it did (perhaps) shed some new light on things.
In the past, those who have experienced lockups have seen them
mainly in txn_checkpoint() (or with at least one process in txn_checkpoint).
IIRC, they have ALWAYS been seen with multiple processes (either hung
httpds or hung svnserves).
With the fcntl change I have a mostly-reproducible hang in a *single*
svnserve (other than the socket-listening-forker).
Here's the backtrace:
#0 0x420e187e in select () from /lib/i686/libc.so.6
#1 0x400ce09c in __DTOR_END__ () from /home/bking/tools/db4/lib/libdb-4.0.so
#2 0x400b047e in __os_yield () from /home/bking/tools/db4/lib/libdb-4.0.so
#3 0x40057097 in __db_fcntl_mutex_lock ()
    from /home/bking/tools/db4/lib/libdb-4.0.so
#4 0x400a6c10 in __log_put_int () from /home/bking/tools/db4/lib/libdb-4.0.so
#5 0x400a6779 in __log_put () from /home/bking/tools/db4/lib/libdb-4.0.so
#6 0x400be788 in __txn_regop_log ()
    from /home/bking/tools/db4/lib/libdb-4.0.so
#7 0x400bd5d6 in __txn_commit () from /home/bking/tools/db4/lib/libdb-4.0.so
#8 0x4003b0c7 in commit_trail (trail=0x8080450, fs=0x8073ad8)
    at subversion/libsvn_fs/trail.c:100
#9 0x4003b1d2 in svn_fs__retry_txn (fs=0x8073ad8,
    txn_body=0x40040834 <txn_body_begin_txn>, baton=0xbffff700, pool=0x807f9a0)
    at subversion/libsvn_fs/trail.c:136
#10 0x40040938 in svn_fs_begin_txn (txn_p=0x807fe0c, fs=0x8073ad8, rev=4,
    pool=0x807f9a0) at subversion/libsvn_fs/txn.c:134
#11 0x4001cc57 in svn_repos_fs_begin_txn_for_update (txn_p=0x807fe0c,
    repos=0x8073888, rev=4, author=0x807fe50 "anonymous", pool=0x807f9a0)
    at subversion/libsvn_repos/fs-wrap.c:127
#12 0x4001ff6b in svn_repos_set_path (report_baton=0x807fe08,
    path=0x80803d8 "", revision=4, pool=0x8080098)
    at subversion/libsvn_repos/reporter.c:173
#13 0x0804a99c in set_path (conn=0x8060b88, pool=0x8080098, params=0x80803b0,
    baton=0xbffff840) at subversion/svnserve/serve.c:96
#14 0x40104c2a in svn_ra_svn_handle_commands (conn=0x8060b88, pool=0x807f9a0,
    commands=0x804dbb4, baton=0xbffff840, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#15 0x0804ad07 in handle_report (conn=0x8060b88, pool=0x807f9a0,
    repos_url=0x807f928 "svn://localhost/repositories/merge_tests-4"ton=0x807fe08)
    at subversion/svnserve/serve.c:169
#16 0x0804c99c in status (conn=0x8060b88, pool=0x807f9a0, params=0x807fca0,
    baton=0xbffff900) at subversion/svnserve/serve.c:658
#17 0x40104c2a in svn_ra_svn_handle_commands (conn=0x8060b88, pool=0x8060848,
    commands=0x804dde4, baton=0xbffff900, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#18 0x0804d8fa in serve (conn=0x8060b88,
    root=0x8060698 "/home/bking/projects/svn/subversion/tests/clients/cmdline",
    tunnel=0, read_only=0, pool=0x8060848)
    at subversion/svnserve/serve.c:986
#19 0x0804a86d in main (argc=4, argv=0xbffffb64)
    at subversion/svnserve/main.c:201
Note that it is not in txn_checkpoint, but in txn_commit.
Here's another data point:
$ /usr/sbin/lsof +D /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
lt-svnser 32439 bking mem REG 3,4 8192 277545 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.001
lt-svnser 32439 bking mem REG 3,4 16384 277550 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.005
lt-svnser 32439 bking mem REG 3,4 270336 277546 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.002
lt-svnser 32439 bking mem REG 3,4 327680 277547 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.003
lt-svnser 32439 bking mem REG 3,4 712704 277549 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.004
lt-svnser 32439 bking 5u REG 3,4 8192 277545 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.001
lt-svnser 32439 bking 6u REG 3,4 8192 277697 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/nodes
lt-svnser 32439 bking 7u REG 3,4 8192 277698 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/revisions
lt-svnser 32439 bking 8u REG 3,4 8192 277830 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/transactions
lt-svnser 32439 bking 9u REG 3,4 8192 277831 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/copies
lt-svnser 32439 bking 10u REG 3,4 8192 277832 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/changes
lt-svnser 32439 bking 11u REG 3,4 8192 277833 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/representations
lt-svnser 32439 bking 12u REG 3,4 8192 277834 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/strings
lt-svnser 32439 bking 13u REG 3,4 8192 277835 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/uuids
lt-svnser 32439 bking 14r REG 3,4 460 277536 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/locks/db.lock
Note that ONLY pid 32439 is accessing the repository.
This is surprising (to me, at least); I think it means that there is no
deadlock, and that there must be some sort of mutex leakage/clobbering.
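To make the "leakage" idea concrete, here is a toy model of what a lone
process would see if a mutex flag were left set by a holder that is no
longer around. This is NOT BDB's code. FlagMutex, lock_with_backoff and
simulate_leak are made-up names, pid 111 is hypothetical, and the
sleep-and-retry loop is only my guess at what the __os_yield/select
frames in the backtrace are doing. The real lock would retry forever;
the sketch gives up after a few tries so that it terminates.

```python
import time


class FlagMutex:
    """Toy stand-in for a mutex whose state is a flag in a shared region."""

    def __init__(self):
        self.locked = False  # assumed: flag that outlives any one process
        self.owner = None    # pid of whoever set the flag

    def try_lock(self, pid):
        """Take the mutex if the flag is clear; never blocks."""
        if self.locked:
            return False
        self.locked = True
        self.owner = pid
        return True

    def unlock(self):
        self.locked = False
        self.owner = None


def lock_with_backoff(mutex, pid, max_tries=5, delay=0.001):
    """Retry try_lock, sleeping longer each time; give up after max_tries."""
    for _ in range(max_tries):
        if mutex.try_lock(pid):
            return True
        time.sleep(delay)  # the "yield" step between attempts
        delay *= 2
    return False


def simulate_leak():
    """A holder sets the flag and disappears; one process is left to retry."""
    mutex = FlagMutex()
    mutex.try_lock(pid=111)  # hypothetical earlier holder, never unlocks
    acquired = lock_with_backoff(mutex, pid=32439)
    return acquired, mutex.owner


if __name__ == "__main__":
    print(simulate_leak())
```

In this model nobody else is contending for the lock, yet the remaining
process never gets it, which is the same picture as the lsof output above.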
--ben
Received on Thu Feb 20 18:35:12 2003