The multi-threaded RPC svn server we've been working on is getting core dumps
using the BDB backend. I think our server is different than any of the other
svn servers (mod_dav_svn, svnserve) in that we cache opened svn_fs_t,
svn_fs_txn_t and other objects between RPC calls to save IO. The server can
serve a second RPC request with a different thread that created the svn_fs_t and
svn_fs_txn_t's, which is leading to the problem
Using svn 1.5.1 with apr 1.2.2 we can get the following stack trace:
Program terminated with signal 11, Segmentation fault.
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
752 while (pool->child)
(gdb) bt
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
#1 0x00002aaaacd2fe7b in apr_pool_destroy (pool=0x4bec82)
at memory/unix/apr_pools.c:753
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
at subversion/libsvn_fs_base/bdb/bdb-err.c:93
#3 0x00002aaaabf5fcca in open_databases (fs=0xc352830, create=0, format=1,
path=<value optimized out>, pool=<value optimized out>)
at subversion/libsvn_fs_base/fs.c:552
#4 0x00002aaaabf602aa in base_open (fs=0xc352830,
path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
pool=0xccb9f88, common_pool=0xc068808)
at subversion/libsvn_fs_base/fs.c:731
#5 0x00002aaaabb29eb2 in svn_fs_open (fs_p=0xccb9ff8,
path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
fs_config=0x0, pool=0xccb9f88) at subversion/libsvn_fs/fs-loader.c:396
#6 0x00002aaaab916187 in get_repos (repos_p=0x50417ee8,
path=<value optimized out>, exclusive=0, nonblocking=0, open_fs=1,
pool=0xccb9f88) at subversion/libsvn_repos/repos.c:1322
#7 0x000000000056a17c in SvnUtil::get_svn_repos_and_fs (
repos_name=@0xc5106d0, context=@0xc5106a8, pool=@0xc5106c8)
at SvnUtil.cpp:131
In frame #2, we see these garbage values:
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
at subversion/libsvn_fs_base/bdb/bdb-err.c:93
93 svn_error_clear(bfd->bdb->error_info->pending_errors);
(gdb) p bfd->bdb->error_info
$1 = (bdb_error_info_t *) 0xc28fca0
(gdb) p *bfd->bdb->error_info
$2 = {pending_errors = 0x5ace50, user_callback = 0x100000001,
refcount = 204658481}
The following patch appears to fix the issues.
There also appears to be some reference counting bugs in the bdb_error_info_t
refcount field that I believe should be removed that will fix a memory leak, but
that's separate from this issue.
[[[
In a multi-threaded environment using BDB, if one thread opens a repository,
saves the svn_fs_t, completes its work, and a second thread starts working with
the svn_fs_t, then the underlying bdb_env_baton_t's error_info points to the
first thread's bdb_error_info_t's. Additionally, if the second thread then
closes the BDB with svn_fs_bdb__close(), it'll free the error_info field that
the first thread is expecting to be valid.
Remove the error_info field from the bdb_env_baton_t and introduce a new
function for accessing it to ensure that the thread calling using the
bdb_error_info_t always works with its own copy.
* subversion/libsvn_fs_base/bdb/env.h
(bdb_env_baton_t):
Remove the error_info field.
(svn_fs_bdb__get_error_info):
New function to get a bdb_error_info_t from a bdb_env_baton_t.
* subversion/libsvn_fs_base/bdb/env.c
(svn_fs_bdb__get_error_info):
New function that wraps get_error_info() but takes a bdb_env_baton_t
instead of a bdb_env_t.
(convert_bdb_error):
No longer set the error_info field in the bdb_env_baton_t.
(svn_fs_bdb__close):
Use get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
* subversion/libsvn_fs_base/bdb/bdb-err.h
(SVN_BDB_ERR):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
* subversion/libsvn_fs_base/bdb/bdb-err.c
(svn_fs_bdb__dberr),
(svn_fs_bdb__dberrf),
(svn_fs_bdb__wrap_db):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
(create_env):
No longer set the error_info field in the bdb_error_info_t.
* subversion/libsvn_fs_base/fs.c
(base_bdb_set_errcall):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
Patch by: Michael Zhang <mzhang_at_imageworks.com>
Reviewed by: blair
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=978637
The multi-threaded RPC svn server we've been working on is getting core dumps
using the BDB backend. I think our server is different than any of the other
svn servers (mod_dav_svn, svnserve) in that we cache opened svn_fs_t,
svn_fs_txn_t and other objects between RPC calls to save IO. The server can
serve a second RPC request with a different thread that created the svn_fs_t and
svn_fs_txn_t's, which is leading to the problem
Using svn 1.5.1 with apr 1.2.2 we can get the following stack trace:
Program terminated with signal 11, Segmentation fault.
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
752 while (pool->child)
(gdb) bt
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
#1 0x00002aaaacd2fe7b in apr_pool_destroy (pool=0x4bec82)
at memory/unix/apr_pools.c:753
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
at subversion/libsvn_fs_base/bdb/bdb-err.c:93
#3 0x00002aaaabf5fcca in open_databases (fs=0xc352830, create=0, format=1,
path=<value optimized out>, pool=<value optimized out>)
at subversion/libsvn_fs_base/fs.c:552
#4 0x00002aaaabf602aa in base_open (fs=0xc352830,
path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
pool=0xccb9f88, common_pool=0xc068808)
at subversion/libsvn_fs_base/fs.c:731
#5 0x00002aaaabb29eb2 in svn_fs_open (fs_p=0xccb9ff8,
path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
fs_config=0x0, pool=0xccb9f88) at subversion/libsvn_fs/fs-loader.c:396
#6 0x00002aaaab916187 in get_repos (repos_p=0x50417ee8,
path=<value optimized out>, exclusive=0, nonblocking=0, open_fs=1,
pool=0xccb9f88) at subversion/libsvn_repos/repos.c:1322
#7 0x000000000056a17c in SvnUtil::get_svn_repos_and_fs (
repos_name=@0xc5106d0, context=@0xc5106a8, pool=@0xc5106c8)
at SvnUtil.cpp:131
In frame #2, we see these garbage values:
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
at subversion/libsvn_fs_base/bdb/bdb-err.c:93
93 svn_error_clear(bfd->bdb->error_info->pending_errors);
(gdb) p bfd->bdb->error_info
$1 = (bdb_error_info_t *) 0xc28fca0
(gdb) p *bfd->bdb->error_info
$2 = {pending_errors = 0x5ace50, user_callback = 0x100000001,
refcount = 204658481}
The following patch appears to fix the issues.
There also appears to be some reference counting bugs in the bdb_error_info_t
refcount field that I believe should be removed that will fix a memory leak, but
that's separate from this issue.
[[[
In a multi-threaded environment using BDB, if one thread opens a repository,
saves the svn_fs_t, completes its work, and a second thread starts working with
the svn_fs_t, then the underlying bdb_env_baton_t's error_info points to the
first thread's bdb_error_info_t's. Additionally, if the second thread then
closes the BDB with svn_fs_bdb__close(), it'll free the error_info field that
the first thread is expecting to be valid.
Remove the error_info field from the bdb_env_baton_t and introduce a new
function for accessing it to ensure that the thread calling using the
bdb_error_info_t always works with its own copy.
* subversion/libsvn_fs_base/bdb/env.h
(bdb_env_baton_t):
Remove the error_info field.
(svn_fs_bdb__get_error_info):
New function to get a bdb_error_info_t from a bdb_env_baton_t.
* subversion/libsvn_fs_base/bdb/env.c
(svn_fs_bdb__get_error_info):
New function that wraps get_error_info() but takes a bdb_env_baton_t
instead of a bdb_env_t.
(convert_bdb_error):
No longer set the error_info field in the bdb_env_baton_t.
(svn_fs_bdb__close):
Use get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
* subversion/libsvn_fs_base/bdb/bdb-err.h
(SVN_BDB_ERR):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
* subversion/libsvn_fs_base/bdb/bdb-err.c
(svn_fs_bdb__dberr),
(svn_fs_bdb__dberrf),
(svn_fs_bdb__wrap_db):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
(create_env):
No longer set the error_info field in the bdb_error_info_t.
* subversion/libsvn_fs_base/fs.c
(base_bdb_set_errcall):
Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
getting it directly from the structure.
Patch by: Michael Zhang <mzhang_at_imageworks.com>
Reviewed by: blair
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=978637
Received on 2008-12-03 08:54:17 CET