Hi, my name is Keith Bostic and I'm with Sleepycat Software.
We (Sleepycat Software) are getting beaten up periodically
because Subversion users have problems with Berkeley DB, and I'd
like to see if we can fix that once and for all. To that end,
I've been talking with Mike Pilato over the past few days about
how Subversion uses Berkeley DB, and where problems might be.
There were three issues we found. I'm going to describe them
in this email, and I'm happy to answer any questions anyone has.
Then, Mike and I were hoping to find someone willing to sign up
for making whatever code changes are needed in Subversion.
1. The Subversion code is not setting the Berkeley DB cache size.
Given Berkeley DB's small default cache size (256KB) and the
expected good locality of reference of Subversion queries,
I think Subversion can improve performance by setting a
larger cache size.
You can set the cache size in the DB_CONFIG file, or with
the DbEnv::set_cachesize method (DB_ENV->set_cachesize in
the C API that Subversion uses).
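As a sketch of the DB_CONFIG route, the C helper below writes a
DB_CONFIG file into an environment's home directory. The function
name write_db_config and its choice to overwrite any existing file
are assumptions for illustration, not Subversion code; only the
"set_cachesize <gbytes> <bytes> <ncache>" line follows the DB_CONFIG
syntax.

```c
#include <assert.h>
#include <stdio.h>

#define ONE_GB (1024ULL * 1024 * 1024)

/*
 * Hypothetical helper: write a DB_CONFIG file into the environment
 * home directory `home` that sets the cache to `cache_bytes` bytes
 * in a single cache region.
 * DB_CONFIG syntax: set_cachesize <gbytes> <bytes> <ncache>
 *
 * It overwrites any existing DB_CONFIG, so a real version would need
 * to preserve settings an administrator already put there.
 * Returns 0 on success, -1 on failure.
 */
int write_db_config(const char *home, unsigned long long cache_bytes)
{
    char path[4096];
    FILE *fp;
    int n, rc;

    assert(home != NULL && cache_bytes > 0);

    n = snprintf(path, sizeof path, "%s/DB_CONFIG", home);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Split the size into the gigabytes and additional-bytes fields. */
    rc = fprintf(fp, "set_cachesize %llu %llu 1\n",
                 cache_bytes / ONE_GB, cache_bytes % ONE_GB);
    if (fclose(fp) != 0 || rc < 0)
        return -1;
    return 0;
}
```

For example, write_db_config(home, 4ULL * 1024 * 1024) would ask for
a 4MB cache; that figure is only an example, not a recommendation.
The file should be in place before the environment is first created.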
For more information, see the "Selecting a cache size"
section of the Berkeley DB Reference Guide, included in your
download package and also available at:
Action items:
Investigate the efficiency of the current Subversion cache
(using the Berkeley DB db_stat utility), and see whether
increasing the cache size would help.
Change Subversion to specify a cache size whenever it creates
a Berkeley DB database environment.
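Among its statistics, db_stat -m reports how many requested pages
were found in the cache and how many were not (st_cache_hit and
st_cache_miss in the DB_MPOOL_STAT structure). One simple number to
compare before and after a cache-size change is the hit percentage;
the helper below is a hypothetical sketch of that arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: percentage of page requests satisfied from
 * the cache, given the hit and miss counts reported by db_stat -m.
 * Returns 0.0 when no pages were requested.
 */
double cache_hit_percent(uint64_t hits, uint64_t misses)
{
    /* Guard against overflow when summing the two counters. */
    assert(hits <= UINT64_MAX - misses);

    uint64_t total = hits + misses;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)hits / (double)total;
}
```

A low percentage with the default cache that rises noticeably with a
larger one would be evidence for the second action item.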
2. Subversion users are occasionally seeing "out of memory"
errors. Subversion recently added an error callback routine,
so future occurrences of this problem should make the
detailed Berkeley DB error message available for later
debugging.
Given the default 256KB cache size and, for example, 16KB
database pages, 8 threads of control in the database at the
same time, each holding 2 pages, will fill the entire cache
(8 x 2 x 16KB = 256KB), resulting in this failure. So
increasing the cache size may very well fix this problem.
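The arithmetic above can be checked with a few lines of C. This is a
sketch with hypothetical function names, and it only counts the pages
the threads hold; it is a lower bound on what they need, not a model
of how the Berkeley DB cache manages its space.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes held in the cache when `threads` threads of control each
 * hold `pages_each` pages of `page_size` bytes at the same time. */
uint64_t pinned_bytes(unsigned threads, unsigned pages_each,
                      uint32_t page_size)
{
    return (uint64_t)threads * pages_each * page_size;
}

/* Nonzero if those pages alone would fill a cache of `cache_bytes`. */
int cache_full(uint64_t cache_bytes, unsigned threads,
               unsigned pages_each, uint32_t page_size)
{
    assert(cache_bytes > 0 && page_size > 0);
    return pinned_bytes(threads, pages_each, page_size) >= cache_bytes;
}
```

With the numbers from the example, pinned_bytes(8, 2, 16 * 1024) is
262144 bytes, exactly the 256KB default, while a 4MB cache leaves
room for the same load sixteen times over.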
Action items: none at this time.
3. Subversion isn't recovering the database after application or
system failure -- it's only running recovery if Berkeley DB
explicitly returns DB_RUNRECOVERY.
This is likely the source of the periodic corruption Subversion
users have seen.
The problem is that Subversion is itself a library, with several
top-layer interfaces, among them Apache and standalone
administrative programs. To solve this problem we're going to
need a way for the Subversion library to know whether a
thread of control entering Subversion code is the first
thread of control to access the Berkeley DB database
environment, so it can run recovery as it opens the database
environment.
This is the problem that George Schlossnagle had to solve when
integrating Berkeley DB with the Apache mod_db4 module, and
it's a standard problem for Sleepycat Software customers
using Berkeley DB in multi-process environments. The fact
that Subversion is a library, and that a Subversion installation
cannot modify system startup procedures, complicates things.
There already appears to be some code in Subversion that tries
to detect when Subversion is creating a database environment,
so this may be simpler than we think.
This item may need more discussion.
As a springboard for that discussion, I propose we find a
serialization point for all threads of control using a
Subversion repository so we can determine if a thread of
control is the first thread of control entering the database
environment after a possible application or system failure.
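One possible serialization point is a pair of advisory lock files.
The sketch below is an assumption, not existing Subversion or
Berkeley DB code: the lock-file paths, the recover_fn hook and the
function name are hypothetical, and it relies on BSD flock()
semantics (locks belong to an open file and are dropped by the kernel
when the holding process exits or crashes). Every thread of control
keeps a shared lock on a "users" file while it is in the environment;
an entrant that can take that lock exclusively knows nobody else is
inside, so running recovery is safe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical recovery hook: a real one would open the Berkeley DB
 * environment with the DB_RECOVER flag. Returns 0 on success. */
typedef int (*recover_fn)(void *arg);

/*
 * gate_path:  lock file every entrant locks exclusively while deciding.
 * users_path: lock file every thread of control in the environment
 *             holds a shared lock on for as long as it stays there.
 *
 * Sets *is_first to 1 (and runs `recover`) if nobody else holds the
 * users lock. Returns the descriptor holding the shared users lock,
 * which the caller keeps open until it leaves the environment, or -1
 * on error.
 */
int enter_environment(const char *gate_path, const char *users_path,
                      recover_fn recover, void *arg, int *is_first)
{
    int gate, users = -1;

    assert(gate_path != NULL && users_path != NULL && is_first != NULL);

    gate = open(gate_path, O_RDWR | O_CREAT, 0644);
    if (gate < 0)
        return -1;
    if (flock(gate, LOCK_EX) != 0)      /* the serialization point */
        goto fail;

    users = open(users_path, O_RDWR | O_CREAT, 0644);
    if (users < 0)
        goto fail;

    if (flock(users, LOCK_EX | LOCK_NB) == 0) {
        /* Nobody else is in the environment: recovery is safe. */
        *is_first = 1;
        if (recover != NULL && recover(arg) != 0)
            goto fail;
        /* flock() downgrades are not atomic, but holding the gate
         * keeps other entrants from slipping in between. */
        if (flock(users, LOCK_SH) != 0)
            goto fail;
    } else if (errno == EWOULDBLOCK) {
        /* Others hold shared locks: join them without recovery. */
        *is_first = 0;
        if (flock(users, LOCK_SH) != 0)
            goto fail;
    } else {
        goto fail;
    }

    close(gate);                        /* releases the gate lock */
    return users;

fail:
    if (users >= 0)
        close(users);
    close(gate);
    return -1;
}
```

One limitation to discuss: this only detects the case where no
thread of control survives the failure. If one process dies while
others keep running, no entrant is "first", so recovery is not
triggered. flock() may also not behave as expected on some network
file systems.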
Keith Bostic email@example.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com
Received on Wed Dec 8 18:39:46 2004