Re: [RFC] Conceptual locking procedure for database access [#11511]

From: Keith Bostic <bostic_at_abyssinian.sleepycat.com>
Date: 2004-12-23 18:07:20 CET

> From: Branko Čibej <brane@xbc.nu>
>
>> Here's the pseudo-code to acquire a DB_ENV handle:
>>
>> Open/create the DB_REGISTER file
>> If the DB_REGISTER file is 0-length
>>     Write identifying string in the first line
>>
>>
>
> There's a race here. It would be better to acquire the exclusive lock
> first, then check and/or write the identifying string.
>
> Yes, fcntl can lock byte ranges beyond EOF, in every implementation I've
> heard of.

I don't think there's a race here: all threads of control write
the same byte string, so I don't think we care which one wins.
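For reference, here is a minimal Python sketch of this first step using Branko's ordering: take the exclusive lock, then check for an empty file and write the identifying string. Per the point above, writing first and locking second should also be harmless, since every writer stores identical bytes at offset 0. The sketch also takes a NOWAIT lock on a slot range that lies entirely past EOF, which succeeds without extending the file. The identifying string, SLOT_SIZE, and the function names are assumptions for illustration, not Berkeley DB's actual DB_REGISTER layout.

```python
import fcntl
import os
import tempfile
from typing import Tuple

# Assumptions for illustration only: this header string and slot size are
# made up here, not taken from Berkeley DB's DB_REGISTER file format.
MAGIC = b"hypothetical-registry-v1\n"
SLOT_SIZE = 64


def open_registry(path: str) -> int:
    """Open/create the registry, lock it, and write the header if it is empty."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # Blocking ("WAIT") exclusive lock; length 0 means "from start through
    # EOF and beyond", so it also covers bytes that do not exist yet.
    fcntl.lockf(fd, fcntl.LOCK_EX, 0, 0, os.SEEK_SET)
    if os.fstat(fd).st_size == 0:
        os.pwrite(fd, MAGIC, 0)
    return fd


def try_lock_slot(fd: int, slot: int) -> bool:
    """NOWAIT exclusive lock on one slot's byte range, which may lie past EOF."""
    start = len(MAGIC) + slot * SLOT_SIZE
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, SLOT_SIZE, start, os.SEEK_SET)
        return True
    except OSError:  # EACCES or EAGAIN: another process holds the range
        return False


def demo() -> Tuple[bool, int]:
    with tempfile.TemporaryDirectory() as d:
        fd = open_registry(os.path.join(d, "register"))
        try:
            # Our own whole-file lock does not conflict: fcntl locks belong
            # to the process, so only other processes could block this.
            got = try_lock_slot(fd, 10)   # range starts far beyond EOF
            size = os.fstat(fd).st_size   # locking did not grow the file
        finally:
            os.close(fd)                  # closing drops all our fcntl locks
    return got, size
```

demo() reports that the slot lock was granted and that the file still holds only the header.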

>> Acquire exclusive lock on the file (WAIT)
>> For every allocated process ID slot {
>>     Acquire lock on the process slot (NOWAIT)
>>     If acquire was successful {
>>         Release process slot lock
>>         Recovery is needed
>>         Break out of loop
>>     }
>> }
>>
>> If recovery is needed
>>     Mark all slots in the DB_REGISTER file empty
>>
>> Find an empty process slot {
>>     Acquire lock on the process slot (NOWAIT)
>>     If acquire was successful {
>>         Overwrite the slot with our process ID
>>         Break out of loop
>>     }
>> }
>>
>
> Instead of marking all the slots empty, wouldn't it be better to mark
> only those that are marked used but aren't locked? This means that you
> always have to walk the whole list in the first loop, but that's the
> expected case anyway; it doesn't make sense to short-circuit the
> error case, and you can merge the second loop into the first. Live
> processes will release their slots anyway when they panic out of the
> environment.

No, that might cause more recovery runs than are necessary.

Let's say we find an allocated slot without a lock. Call that
generation #1; we then run recovery, starting generation #2.
Then, while in generation #2, a still-running process from
generation #1 drops core, releasing its lock.

If recovery had cleared only the slots that were unlocked at the
time, the next time we review the registry file we would find an
allocated slot without a lock and run recovery again. That's not
going to cause a problem, but it isn't necessary: we've already
dealt with any failures of generation #1 processes.
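The argument can be checked with a toy in-memory model of the slot table. This is a sketch, not Berkeley DB code: Slot, Registry, join, crash, scenario and clear_all are hypothetical names, a slot's lock_holder stands in for an fcntl byte-range lock (which the OS drops when its process dies), and joins run one at a time to model the exclusive whole-file lock. clear_all=True follows the pseudocode (mark every slot empty after recovery); clear_all=False marks only the slots found allocated but unlocked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    recorded_pid: int = 0                # what the file says; 0 means "empty"
    lock_holder: Optional[int] = None    # live process holding the slot's lock


class Registry:
    """Toy model of the registry's slot table (hypothetical, not Berkeley DB)."""

    def __init__(self, nslots: int, clear_all: bool) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(nslots)]
        self.clear_all = clear_all
        self.recoveries = 0

    @staticmethod
    def _try_lock(slot: Slot, pid: int) -> bool:
        # Models a NOWAIT lock: it fails only if a live process holds it.
        if slot.lock_holder is None:
            slot.lock_holder = pid
            return True
        return False

    def join(self, pid: int) -> int:
        """Register pid; joins run one at a time (the whole-file lock)."""
        dead: List[Slot] = []
        for slot in self.slots:
            if slot.recorded_pid == 0:
                continue                     # only allocated slots are probed
            if self._try_lock(slot, pid):
                slot.lock_holder = None      # we only probed; release it
                dead.append(slot)
                if self.clear_all:
                    break                    # one dead slot means recovery
        if dead:
            self.recoveries += 1             # stands in for running recovery
            for slot in (self.slots if self.clear_all else dead):
                slot.recorded_pid = 0
        for i, slot in enumerate(self.slots):
            if slot.recorded_pid == 0 and self._try_lock(slot, pid):
                slot.recorded_pid = pid      # claim the slot, keep the lock
                return i
        raise RuntimeError("no free slot")

    def crash(self, pid: int) -> None:
        """The OS drops a dead process's locks; its slot contents remain."""
        for slot in self.slots:
            if slot.lock_holder == pid:
                slot.lock_holder = None


def scenario(clear_all: bool) -> int:
    reg = Registry(4, clear_all)
    reg.join(1)      # process A, generation #1
    reg.join(2)      # process B, generation #1
    reg.crash(1)     # A dies: its slot is unlocked but still allocated
    reg.join(3)      # C finds A's slot and runs recovery: generation #2
    reg.crash(2)     # B, left over from generation #1, dies
    reg.join(4)      # D: is another recovery run?
    return reg.recoveries
```

scenario(True) runs recovery once; scenario(False) runs it twice, the second time triggered only by the leftover generation #1 process dying.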

>> Q2: Can there be processes still running in the existing
>> environment when we recover it?
>>
>> Yes. However, performing recovery immediately panics (and
>> removes) the existing environment, so the window of
>> vulnerability is small. (We check the panic flag in the DB API
>> methods, when spinning on a mutex, and whenever about to write
>> to disk.) The only window for corruption is if the panic check
>> before a write were to complete, the region were subsequently
>> recovered, and the write then continued. That's very, very unlikely
>> to happen. Note this vulnerability already exists in Berkeley
>> DB, and we've never heard of a problem.
>
> Wouldn't that be because the documentation says that you must run
> recovery only when no other processes are using the environment? That
> is, you don't hear problem reports because people aren't exercising
> this case very often? I'm concerned that with the new DB_REGISTER flag,
> the opportunity for this race to happen would be much greater.

The opportunity for the race is indeed greater.

I don't think the window is worth worrying about, though. A thread
would have to pass the panic check and immediately be put to sleep,
stay asleep while the database environment is removed and
re-created, and then wake up and finish its write. That's pretty
unlikely.

Regardless, I can't think of any way to close the window, can
anybody else?

I would have to kill any running processes using the database
environment to close the window. I'd do that if I could, but
that assumes a model where all processes in the environment are
either setuid or related so that they could signal each other,
and I don't think that's possible.
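To make the window concrete, here is a deterministic toy in Python. Environment, guarded_write, PanicError and the pause hook are hypothetical stand-ins, not Berkeley DB APIs: pause simulates the writer being descheduled between the panic check and the write, and the shared files list stands in for the log and database files that survive re-creating the environment.

```python
from typing import Callable, List, Optional


class PanicError(Exception):
    """Raised when writing into an environment that has been panicked."""


class Environment:
    """Toy environment handle; 'files' stands in for shared on-disk files."""

    def __init__(self, files: List[str]) -> None:
        self.panicked = False
        self.files = files


def guarded_write(env: Environment, record: str,
                  pause: Optional[Callable[[], None]] = None) -> None:
    if env.panicked:                  # the "about to write to disk" check
        raise PanicError("environment panicked; run recovery")
    if pause is not None:             # simulate being descheduled right here
        pause()
    env.files.append(record)          # the write itself


def demo() -> List[str]:
    files: List[str] = []
    old = Environment(files)
    guarded_write(old, "a")

    def recover_meanwhile() -> None:
        old.panicked = True           # recovery panics the old environment
        new = Environment(files)      # and re-creates one over the same files
        guarded_write(new, "b")

    # This writer passed the check before the panic, so its write still lands.
    guarded_write(old, "stale", pause=recover_meanwhile)
    try:
        guarded_write(old, "late")    # any write that starts later is refused
    except PanicError:
        pass
    return files
```

demo() returns ['a', 'b', 'stale']: the stale write lands after recovery, while a write that begins after the panic is refused.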

>> Q3: Can there be processes still running in the old environment
>> after we're up and running with the new one?
>>
>> Yes. However, those processes can't corrupt anything (as they
>> won't be able to write anything into the log or database files
>> after the panic of the environment), and those processes will
>> hopefully notice the panic flag eventually.
>
> But what happens to any open transactions these processes hold? Do they
> get rolled back automatically? If not, how can the process do a rollback
> if it can't write to the log file?

Transactions held by these processes are rolled back as part of
recovery, by the process doing recovery. For the purposes of
this database environment, these processes are of no interest
once the environment is recovered.
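A simplified sketch of what "rolled back as part of recovery" means: walk the log backwards and restore the before-image of every update whose transaction has no commit record. The log record shapes and the function name are assumptions, and this is a bare undo pass that assumes the database already reflects every logged update. It is not Berkeley DB's actual recovery, which is more involved (it also redoes committed work).

```python
from typing import Dict, List

# Hypothetical log record shapes, not Berkeley DB's log format:
#   ("update", txn_id, key, old_value_or_None, new_value)
#   ("commit", txn_id)


def undo_uncommitted(db: Dict[str, str], log: List[tuple]) -> Dict[str, str]:
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):         # newest first, so older images win
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            if old is None:
                db.pop(key, None)     # the key did not exist before
            else:
                db[key] = old         # restore the before-image
    return db
```

For example, if transaction 1 committed an update of x, while transaction 2 (held by a dead process) changed y from "8" to "9" and never committed, undo_uncommitted turns {"x": "2", "y": "9"} into {"x": "2", "y": "8"}.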

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com

Received on Thu Dec 23 18:08:34 2004
