Re: [RFC] Conceptual locking procedure for database access [#11511]

From: Keith Bostic <bostic_at_abyssinian.sleepycat.com>
Date: 2004-12-23 15:16:07 CET

> I wish you hadn't... or at least, bring me into the loop, please?

Sorry for the delay, but I was pulled away from this task for
the past 10 days; I'm back working on it now.

Here's my current design document. Comments welcome!

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Based on the Subversion SR (#11511), I've added routines to do
portable, multi-process database environment locking. Here's
how it will work.

The feature is triggered by specifying the DB_REGISTER flag to
the DbEnv.open method. If DB_REGISTER is specified, Berkeley
DB opens the file DB_REGISTER in the database environment home
directory. That file is formatted as follows:

        Berkeley DB environment registry.       # identifying string
                            12345               # process ID slot 1
        X                                       # empty slot
                            12346               # process ID slot 2
        X                                       # empty slot
                            12347               # process ID slot 3
                            12348               # process ID slot 4
        X                   12349               # empty slot
        X                                       # empty slot

The first line is an identifying string; subsequent lines are
fixed-length process ID slots. Empty slots are marked with a
leading non-digit character.
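
To make the layout concrete, here is a minimal Python sketch of how
such a file could be written and parsed. The 25-byte slot width, the
"X" marker, and the helper names are assumptions for illustration
only; the design above just says the slots are fixed-length and that
an empty slot starts with a non-digit.

```python
HEADER = b"Berkeley DB environment registry.\n"
SLOT_LEN = 25        # assumed width: 24 characters plus a newline
EMPTY_MARK = b"X"    # assumed marker; any leading non-digit would do


def format_slot(pid):
    """Return a fixed-length slot with the process ID right-justified."""
    return b"%24d\n" % pid


def empty_slot():
    """Return a fixed-length slot marked empty."""
    return EMPTY_MARK + b" " * (SLOT_LEN - 2) + b"\n"


def slot_offset(index):
    """Return the file offset of the first byte of slot number index."""
    return len(HEADER) + index * SLOT_LEN


def parse_slots(data):
    """Yield (index, offset, pid) per complete slot; pid is None if empty."""
    body = data[len(HEADER):]
    for i in range(len(body) // SLOT_LEN):
        raw = body[i * SLOT_LEN:(i + 1) * SLOT_LEN]
        if raw.startswith(EMPTY_MARK):
            pid = None       # empty, even if a stale process ID follows
        else:
            pid = int(raw.decode("ascii"))
        yield i, slot_offset(i), pid
```

Because every slot has the same length, the offset of a slot's first
byte (the byte that gets locked) can be computed from its index alone.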

To modify the file, you get an exclusive lock on the first byte
of the file.

Each process has an exclusive lock on the first byte of its
process ID slot.

This design relies on the fact that if a process dies or the
system crashes, the operating system drops the locks that process
held.
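
As an illustration of the primitive involved, here is a small Python
sketch of one-byte locks built on fcntl.lockf, which wraps fcntl(2)
record locking on POSIX systems. The helper names lock_byte and
unlock_byte are hypothetical, not Berkeley DB routines.

```python
import errno
import fcntl
import os


def lock_byte(fd, offset, wait=True):
    """Exclusively lock one byte at offset; return False if it is held.

    With wait=True the call blocks until the lock is granted (WAIT);
    with wait=False it returns immediately (NOWAIT).
    """
    cmd = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.lockf(fd, cmd, 1, offset, os.SEEK_SET)
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False     # another process holds the lock
        raise
    return True


def unlock_byte(fd, offset):
    """Release the lock on one byte at offset."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
```

These locks belong to the process, not to the file descriptor, so the
kernel releases them when the process exits for any reason. A NOWAIT
attempt on a dead process's byte therefore succeeds.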

We decide if recovery needs to be run by walking the list of
process IDs. If we can lock the first byte of any allocated
slot, then a process must have died with an open DB_ENV handle,
and we have to clean up, running recovery on the environment.

Here's the pseudo-code to acquire a DB_ENV handle:

        Open/create the DB_REGISTER file
        If the DB_REGISTER file is 0-length
                Write identifying string in the first line

        Acquire exclusive lock on the file (WAIT)
        For every allocated process ID slot {
                Acquire lock on the process slot (NOWAIT)
                If acquire was successful {
                        Release process slot lock
                        Recovery is needed
                        Break out of loop
                }
        }

        If recovery is needed
                Mark all slots in the DB_REGISTER file empty

        Find an empty process slot {
                Acquire lock on the process slot (NOWAIT)
                If acquire was successful {
                        Overwrite the slot with our process ID
                        Break out of loop
                }
        }

        If recovery is needed
                Recover the database environment

        Release exclusive lock
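
The acquire steps above can be sketched in Python roughly as follows.
This is an illustration under stated assumptions, not the Berkeley DB
implementation: the 25-byte slot width, the "X" marker, the function
names, the recover callback, and the choice to append a new slot when
no empty one exists are all hypothetical.

```python
import errno
import fcntl
import os

HEADER = b"Berkeley DB environment registry.\n"
SLOT_LEN = 25                                  # assumed slot width
EMPTY = b"X" + b" " * (SLOT_LEN - 2) + b"\n"   # assumed empty-slot image


def try_lock(fd, offset, wait=False):
    """Lock one byte at offset; return False if another process holds it."""
    cmd = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.lockf(fd, cmd, 1, offset, os.SEEK_SET)
        return True
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise


def unlock(fd, offset):
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def register(path, recover):
    """Register this process; return (fd, slot_offset, recovery_was_run)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    if os.fstat(fd).st_size == 0:
        os.pwrite(fd, HEADER, 0)               # identifying string
    try_lock(fd, 0, wait=True)                 # file lock: first byte (WAIT)
    size = os.fstat(fd).st_size
    offsets = range(len(HEADER), size, SLOT_LEN)

    need_recovery = False
    for off in offsets:
        if os.pread(fd, 1, off) == b"X":
            continue                           # empty slot, nothing to check
        if try_lock(fd, off):                  # lockable: its owner has died
            unlock(fd, off)
            need_recovery = True
            break

    if need_recovery:
        for off in offsets:
            os.pwrite(fd, EMPTY, off)          # mark all slots empty

    mine = None
    for off in offsets:
        if os.pread(fd, 1, off) == b"X" and try_lock(fd, off):
            mine = off
            break
    if mine is None:                           # assumption: append a slot
        mine = max(size, len(HEADER))
        try_lock(fd, mine)
    os.pwrite(fd, b"%24d\n" % os.getpid(), mine)

    if need_recovery:
        recover()
    unlock(fd, 0)                              # release the file lock
    return fd, mine, need_recovery
```

Two caveats on the sketch: fcntl-style locks are held per process, so
it assumes at most one registration per process, and closing any
descriptor for the register file in that process releases the slot
lock, so the returned fd has to stay open for the life of the handle.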

Here's the pseudo-code to discard a DB_ENV handle:

        Mark our process ID slot empty
        Release process slot lock
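
A minimal Python sketch of the discard step, assuming the slot is
marked empty by overwriting its first byte with "X" (which would leave
a stale process ID behind, like the "X 12349" line in the example
file). The function name and the marker are assumptions.

```python
import fcntl
import os


def unregister(fd, slot_offset):
    """Discard a handle: mark the slot empty, then drop the slot lock."""
    # Order matters: the slot is already marked empty by the time
    # another process is able to lock it.
    os.pwrite(fd, b"X", slot_offset)
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, slot_offset, os.SEEK_SET)
```

Here fd is the descriptor that holds the slot lock and slot_offset is
the offset of the first byte of this process's slot.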

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Q1: Why isn't an exclusive lock necessary to discard a DB_ENV
handle?

We mark our process ID slot empty before we discard the process
slot lock, and threads of control reviewing the register file
ignore any slots which they can't lock.

Q2: Can there be processes still running in the existing
environment when we recover it?

Yes. However, performing recovery immediately panics (and
removes) the existing environment, so the window of
vulnerability is small. (We check the panic flag in the DB API
methods, when spinning on a mutex, and whenever we are about to
write to disk.) Corruption is possible only if a process
completes its pre-write check of the panic flag, the region is
then recovered, and the write then continues. That's very, very
unlikely to happen. Note that this vulnerability already exists
in Berkeley DB, and we've never heard of a problem.

Q3: Can there be processes still running in the old environment
after we're up and running with the new one?

Yes. However, those processes can't corrupt anything (as they
won't be able to write anything into the log or database files
after the panic of the environment), and those processes will
hopefully notice the panic flag eventually.

Q4: Is this design portable?

I'm using fcntl(2) locking and file offsets; that's the only way
I can think of to get lots of locks, where the locks will go
away on process death. (We could do the same thing with flock
locking instead, but flock would require a separate file for
each process in the database environment, which isn't
going to be pleasant.) Windows supports fcntl-style locking
using a different API (except for Win/9X and Win/ME). This
feature won't be as portable as Berkeley DB in general, and will
need to auto-configure itself off where locking isn't available.
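
One way to picture the "configure itself off" behavior is a probe
that tries to take a one-byte lock and reports whether it worked. The
Python sketch below is only an analogy (a build-time configure check
is a different mechanism), and the function name is hypothetical.

```python
import os
import tempfile

try:
    import fcntl             # absent on Windows builds of Python
except ImportError:
    fcntl = None


def register_locking_available(directory):
    """Return True if a one-byte fcntl-style lock works in directory."""
    if fcntl is None:
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        return True
    except OSError:
        return False         # locking is not supported here
    finally:
        os.close(fd)         # closing the descriptor drops the lock
        os.unlink(path)
```

A caller would skip the registration feature when the probe returns
False for the environment home directory.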

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Dec 23 15:17:20 2004
