Subversion's use of Berkeley DB [#11511]

From: Keith Bostic <bostic_at_abyssinian.sleepycat.com>
Date: 2004-12-08 18:38:38 CET

Hi, my name is Keith Bostic and I'm with Sleepycat Software.

We (Sleepycat Software) are getting beaten up periodically
because Subversion users have problems with Berkeley DB, and I'd
like to see if we can fix that once and for all. To that end,
I've been talking with Mike Pilato over the past few days about
how Subversion uses Berkeley DB, and where problems might be.

There were three issues we found. I'm going to describe them
in this email, and I'm happy to answer any questions anyone has.
Then, Mike and I were hoping to find someone willing to sign up
for making whatever code changes are needed in Subversion.

1. The Subversion code is not setting the Berkeley DB cache size.
   Given Berkeley DB's small default cache size (256KB), and the
   expected good locality of reference for Subversion queries,
   I think Subversion will be able to increase performance by
   setting the cache size.

   You can set the cache size in the DB_CONFIG file, or by using
   the DB_ENV->set_cachesize method (DbEnv::set_cachesize in C++):

http://www.sleepycat.com/docs/api_c/env_set_cachesize.html

   For more information, see the "Selecting a cache size"
   section of the Berkeley DB Reference Guide, included in your
   download package and also available at:

http://www.sleepycat.com/docs/ref/am_conf/cachesize.html

   Action Items:
   Investigate the efficiency of the current Subversion cache
   (using the Berkeley DB db_stat utility; "db_stat -m" reports
   cache statistics), and see whether increasing the cache size
   would help.

   Change Subversion to specify a cache size whenever it creates
   a Berkeley DB database environment.
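As a sketch of what "specify a cache size" could look like, the C below converts a desired cache size into the (gbytes, bytes) pair that both DB_ENV->set_cachesize and the DB_CONFIG "set_cachesize" directive take, and formats the DB_CONFIG line. The helper name format_cachesize_line and the 4MB figure used in the usage note are illustrative assumptions, not a tuned recommendation; the real size should come from the db_stat investigation described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GIGABYTE (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical helper: split total_bytes into the (gbytes, bytes) pair used
 * by Berkeley DB's cache configuration and write the equivalent DB_CONFIG
 * line into buf. ncache is the number of cache regions. Returns 0 on
 * success, -1 if buf is too small for the whole line.
 *
 * In code, the same values would be passed before DB_ENV->open as:
 *     dbenv->set_cachesize(dbenv, gbytes, bytes, ncache);
 */
static int format_cachesize_line(uint64_t total_bytes, int ncache,
                                 char *buf, size_t buflen)
{
    unsigned gbytes = (unsigned)(total_bytes / GIGABYTE);
    unsigned bytes = (unsigned)(total_bytes % GIGABYTE);
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "set_cachesize %u %u %d\n",
                 gbytes, bytes, ncache);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For example, a 4MB cache in a single region yields the line "set_cachesize 0 4194304 1", which would go in the DB_CONFIG file in the database environment's home directory.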

2. Subversion users are occasionally seeing "out of memory"
   errors. Subversion recently added an error callback routine,
   so future occurrences of this problem should make the detailed
   Berkeley DB error message available for later debugging.

   Given the default 256KB cache size and, for example, 16KB
   database pages, 8 threads of control in the database at the
   same time, each grabbing 2 pages, will run the cache out of
   room (8 × 2 × 16KB = 256KB, the entire cache), resulting in
   this failure. So increasing the cache size may very well fix
   this problem.

   Action Items:
   None at this time.
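The out-of-memory arithmetic in item 2 can be checked with a few lines of C. This is a deliberately rough model, an assumption rather than how Berkeley DB accounts for its cache: it counts only the bytes of pages held at once and ignores the region's own bookkeeping. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model: bytes of cache pinned when `threads` threads each hold
 * `pages_per_thread` pages of `page_size` bytes at the same time.
 * It ignores Berkeley DB's own bookkeeping inside the cache region, so
 * the real limit is reached at or before the point this model predicts.
 */
static uint64_t pinned_bytes(uint32_t threads, uint32_t pages_per_thread,
                             uint32_t page_size)
{
    return (uint64_t)threads * pages_per_thread * page_size;
}

/* Nonzero when the pinned pages alone would fill a cache of cache_bytes. */
static int cache_exhausted(uint64_t cache_bytes, uint32_t threads,
                           uint32_t pages_per_thread, uint32_t page_size)
{
    assert(page_size > 0);
    return pinned_bytes(threads, pages_per_thread, page_size) >= cache_bytes;
}
```

With the numbers from item 2, eight threads holding two 16KB pages each account for exactly 256KB, the whole default cache; the same workload in a 4MB cache would use one sixteenth of it.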

3. Subversion isn't recovering the database after application or
   system failure; it only runs recovery when Berkeley DB
   explicitly returns DB_RUNRECOVERY.

   This is likely the source of the periodic corruption Subversion
   users have seen.

   The problem is that Subversion is itself a library, with
   different top-layer interfaces, Apache and standalone
   administrative programs among them. To solve this problem we
   need a way for the Subversion library to know whether a thread
   of control entering Subversion code is the first thread of
   control to access the Berkeley DB database environment, so it
   can run recovery as it opens the environment.

   This is the problem that George Schlossnagle had to solve for
   integrating Berkeley DB with the Apache mod_db4 module, and
   it's a standard problem for Sleepycat Software customers
   using Berkeley DB in multi-process environments. The fact
   that Subversion is a library, and that a Subversion
   installation cannot modify system startup procedures,
   complicates things somewhat, though.

   There already appears to be some code in Subversion that tries
   to detect when Subversion is creating a database environment,
   so the fix may be simpler than we think.

   Action Items:
   This item may need more discussion.

   As a springboard for that discussion, I propose we find a
   serialization point for all threads of control using a
   Subversion repository so we can determine if a thread of
   control is the first thread of control entering the database
   environment after a possible application or system failure.
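One possible serialization point for item 3, sketched in C under the assumption of a POSIX system: every process opening the repository takes an fcntl() lock on a shared lock file. A process that can take the lock exclusively knows no other process has the environment open, so it can open the environment with DB_RECOVER and then downgrade to a shared lock; everyone else waits for a shared lock, which also makes them wait out any recovery in progress. The lock-file path and the env_lock_open/env_lock_downgrade names are hypothetical. Note that fcntl() locks belong to a process, not a thread, and are dropped when the process closes any descriptor for the file, so threads within one process (inside Apache, for instance) would still need their own once-only guard.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Fill in a whole-file lock request of the given type. */
static void whole_file_lock(struct flock *fl, short type)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = type;
    fl->l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: entire file */
}

/*
 * Hypothetical serialization point. Returns 1 if no other process holds a
 * lock on the file (the caller should open the environment with DB_RECOVER
 * while still holding the exclusive lock, then call env_lock_downgrade),
 * 0 if other processes already have the environment open (a shared lock
 * is now held), or -1 on error. *lock_fd must stay open for as long as the
 * process uses the environment: closing it releases the lock.
 */
static int env_lock_open(const char *path, int *lock_fd)
{
    struct flock fl;
    int fd;

    assert(path != NULL && lock_fd != NULL);
    if ((fd = open(path, O_RDWR | O_CREAT, 0644)) < 0)
        return -1;

    whole_file_lock(&fl, F_WRLCK);
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        *lock_fd = fd;
        return 1;                      /* first in: run recovery */
    }
    if (errno != EACCES && errno != EAGAIN) {
        close(fd);
        return -1;
    }

    /*
     * Another process holds a lock. Wait for a shared lock; this blocks
     * while a first opener is still recovering under its exclusive lock.
     * Sketch only: it ignores the narrow window in which every other
     * holder exits between the two fcntl() calls.
     */
    whole_file_lock(&fl, F_RDLCK);
    while (fcntl(fd, F_SETLKW, &fl) != 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    *lock_fd = fd;
    return 0;
}

/* After recovery, let other processes in by converting to a shared lock. */
static int env_lock_downgrade(int lock_fd)
{
    struct flock fl;

    whole_file_lock(&fl, F_RDLCK);
    return fcntl(lock_fd, F_SETLK, &fl);
}
```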

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com

Received on Wed Dec 8 18:39:46 2004