[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] BDB error_info threading issue causing core dumps

From: Blair Zajac <blair_at_orcaware.com>
Date: Tue, 02 Dec 2008 18:05:03 -0800

The multi-threaded RPC svn server we've been working on is getting core dumps
using the BDB backend. I think our server is different than any of the other
svn servers (mod_dav_svn, svnserve) in that we cache opened svn_fs_t,
svn_fs_txn_t and other objects between RPC calls to save IO. The server can
serve a second RPC request with a different thread that created the svn_fs_t and
svn_fs_txn_t's, which is leading to the problem

Using svn 1.5.1 with apr 1.2.2 we can get the following stack trace:

Program terminated with signal 11, Segmentation fault.
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
752 while (pool->child)
(gdb) bt
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
#1 0x00002aaaacd2fe7b in apr_pool_destroy (pool=0x4bec82)
     at memory/unix/apr_pools.c:753
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
     operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
     at subversion/libsvn_fs_base/bdb/bdb-err.c:93
#3 0x00002aaaabf5fcca in open_databases (fs=0xc352830, create=0, format=1,
     path=<value optimized out>, pool=<value optimized out>)
     at subversion/libsvn_fs_base/fs.c:552
#4 0x00002aaaabf602aa in base_open (fs=0xc352830,
     path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
     pool=0xccb9f88, common_pool=0xc068808)
     at subversion/libsvn_fs_base/fs.c:731
#5 0x00002aaaabb29eb2 in svn_fs_open (fs_p=0xccb9ff8,
     path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
     fs_config=0x0, pool=0xccb9f88) at subversion/libsvn_fs/fs-loader.c:396
#6 0x00002aaaab916187 in get_repos (repos_p=0x50417ee8,
     path=<value optimized out>, exclusive=0, nonblocking=0, open_fs=1,
     pool=0xccb9f88) at subversion/libsvn_repos/repos.c:1322
#7 0x000000000056a17c in SvnUtil::get_svn_repos_and_fs (
     repos_name=@0xc5106d0, context=@0xc5106a8, pool=@0xc5106c8)
     at SvnUtil.cpp:131

In frame #2, we see these garbage values:

#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
     operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
     at subversion/libsvn_fs_base/bdb/bdb-err.c:93
93 svn_error_clear(bfd->bdb->error_info->pending_errors);
(gdb) p bfd->bdb->error_info
$1 = (bdb_error_info_t *) 0xc28fca0
(gdb) p *bfd->bdb->error_info
$2 = {pending_errors = 0x5ace50, user_callback = 0x100000001,
   refcount = 204658481}

The following patch appears to fix the issues.

There also appears to be some reference counting bugs in the bdb_error_info_t
refcount field that I believe should be removed that will fix a memory leak, but
that's separate from this issue.

[[[
In a multi-threaded environment using BDB, if one thread opens a repository,
saves the svn_fs_t, completes its work, and a second thread starts working with
the svn_fs_t, then the underlying bdb_env_baton_t's error_info points to the
first thread's bdb_error_info_t's. Additionally, if the second thread then
closes the BDB with svn_fs_bdb__close(), it'll free the error_info field that
the first thread is expecting to be valid.

Remove the error_info field from the bdb_env_baton_t and introduce a new
function for accessing it to ensure that the thread calling using the
bdb_error_info_t always works with its own copy.

* subversion/libsvn_fs_base/bdb/env.h
   (bdb_env_baton_t):
     Remove the error_info field.
   (svn_fs_bdb__get_error_info):
     New function to get a bdb_error_info_t from a bdb_env_baton_t.

* subversion/libsvn_fs_base/bdb/env.c
   (svn_fs_bdb__get_error_info):
     New function that wraps get_error_info() but takes a bdb_env_baton_t
     instead of a bdb_env_t.
   (convert_bdb_error):
     No longer set the error_info field in the bdb_env_baton_t.
   (svn_fs_bdb__close):
     Use get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

* subversion/libsvn_fs_base/bdb/bdb-err.h
   (SVN_BDB_ERR):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

* subversion/libsvn_fs_base/bdb/bdb-err.c
   (svn_fs_bdb__dberr),
   (svn_fs_bdb__dberrf),
   (svn_fs_bdb__wrap_db):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.
   (create_env):
     No longer set the error_info field in the bdb_error_info_t.

* subversion/libsvn_fs_base/fs.c
   (base_bdb_set_errcall):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

Patch by: Michael Zhang <mzhang_at_imageworks.com>
Reviewed by: blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=978637

The multi-threaded RPC svn server we've been working on is getting core dumps
using the BDB backend. I think our server is different than any of the other
svn servers (mod_dav_svn, svnserve) in that we cache opened svn_fs_t,
svn_fs_txn_t and other objects between RPC calls to save IO. The server can
serve a second RPC request with a different thread that created the svn_fs_t and
svn_fs_txn_t's, which is leading to the problem

Using svn 1.5.1 with apr 1.2.2 we can get the following stack trace:

Program terminated with signal 11, Segmentation fault.
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
752 while (pool->child)
(gdb) bt
#0 apr_pool_destroy (pool=0xf87d8348f87d8948) at memory/unix/apr_pools.c:752
#1 0x00002aaaacd2fe7b in apr_pool_destroy (pool=0x4bec82)
     at memory/unix/apr_pools.c:753
#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
     operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
     at subversion/libsvn_fs_base/bdb/bdb-err.c:93
#3 0x00002aaaabf5fcca in open_databases (fs=0xc352830, create=0, format=1,
     path=<value optimized out>, pool=<value optimized out>)
     at subversion/libsvn_fs_base/fs.c:552
#4 0x00002aaaabf602aa in base_open (fs=0xc352830,
     path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
     pool=0xccb9f88, common_pool=0xc068808)
     at subversion/libsvn_fs_base/fs.c:731
#5 0x00002aaaabb29eb2 in svn_fs_open (fs_p=0xccb9ff8,
     path=0xccba078 "/opt/vnp3-production-db/repositories/pipe.lax/db",
     fs_config=0x0, pool=0xccb9f88) at subversion/libsvn_fs/fs-loader.c:396
#6 0x00002aaaab916187 in get_repos (repos_p=0x50417ee8,
     path=<value optimized out>, exclusive=0, nonblocking=0, open_fs=1,
     pool=0xccb9f88) at subversion/libsvn_repos/repos.c:1322
#7 0x000000000056a17c in SvnUtil::get_svn_repos_and_fs (
     repos_name=@0xc5106d0, context=@0xc5106a8, pool=@0xc5106c8)
     at SvnUtil.cpp:131

In frame #2, we see these garbage values:

#2 0x00002aaaabf55def in svn_fs_bdb__wrap_db (fs=<value optimized out>,
     operation=0x2aaaabf71876 "opening 'nodes' table", db_err=0)
     at subversion/libsvn_fs_base/bdb/bdb-err.c:93
93 svn_error_clear(bfd->bdb->error_info->pending_errors);
(gdb) p bfd->bdb->error_info
$1 = (bdb_error_info_t *) 0xc28fca0
(gdb) p *bfd->bdb->error_info
$2 = {pending_errors = 0x5ace50, user_callback = 0x100000001,
   refcount = 204658481}

The following patch appears to fix the issues.

There also appears to be some reference counting bugs in the bdb_error_info_t
refcount field that I believe should be removed that will fix a memory leak, but
that's separate from this issue.

[[[
In a multi-threaded environment using BDB, if one thread opens a repository,
saves the svn_fs_t, completes its work, and a second thread starts working with
the svn_fs_t, then the underlying bdb_env_baton_t's error_info points to the
first thread's bdb_error_info_t's. Additionally, if the second thread then
closes the BDB with svn_fs_bdb__close(), it'll free the error_info field that
the first thread is expecting to be valid.

Remove the error_info field from the bdb_env_baton_t and introduce a new
function for accessing it to ensure that the thread calling using the
bdb_error_info_t always works with its own copy.

* subversion/libsvn_fs_base/bdb/env.h
   (bdb_env_baton_t):
     Remove the error_info field.
   (svn_fs_bdb__get_error_info):
     New function to get a bdb_error_info_t from a bdb_env_baton_t.

* subversion/libsvn_fs_base/bdb/env.c
   (svn_fs_bdb__get_error_info):
     New function that wraps get_error_info() but takes a bdb_env_baton_t
     instead of a bdb_env_t.
   (convert_bdb_error):
     No longer set the error_info field in the bdb_env_baton_t.
   (svn_fs_bdb__close):
     Use get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

* subversion/libsvn_fs_base/bdb/bdb-err.h
   (SVN_BDB_ERR):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

* subversion/libsvn_fs_base/bdb/bdb-err.c
   (svn_fs_bdb__dberr),
   (svn_fs_bdb__dberrf),
   (svn_fs_bdb__wrap_db):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.
   (create_env):
     No longer set the error_info field in the bdb_error_info_t.

* subversion/libsvn_fs_base/fs.c
   (base_bdb_set_errcall):
     Use svn_fs_bdb__get_error_info() to get the bdb_error_info_t instead of
     getting it directly from the structure.

Patch by: Michael Zhang <mzhang_at_imageworks.com>
Reviewed by: blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=978637
Received on 2008-12-03 08:54:17 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.