Keith Bostic wrote:
>>From: Branko Čibej <firstname.lastname@example.org>
>>>Here's the pseudo-code to acquire a DB_ENV handle:
>>> Open/create the DB_REGISTER file
>>> If the DB_REGISTER file is 0-length
>>> Write identifying string in the first line
>>There's a race here. It would be better to acquire the exclusive lock
>>first, then check and/or write the identifying string.
>>Yes, fcntl can lock byte ranges beyond EOF, in every implementation I've
>I don't think there's a race here -- all threads of control are
>writing the same byte strings, I don't think we care which one
That's assuming that writing the identification string is atomic. While
that will mostly be true, I'd rather not rely on that. I don't think
locking the file first would cause any problems, and it would make the
code slightly safer (and more portable). Safety is good. :-)
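
A minimal sketch of the lock-first ordering, assuming a POSIX system where
fcntl byte-range locks may cover bytes beyond EOF. The identifying string,
the choice of byte 0 as the lock range, and the function name are all
placeholders of mine, not anything from the design:

```python
import fcntl
import os

# Hypothetical identifying string; the thread does not say what it contains.
REGISTRY_MAGIC = b"DB_REGISTER registry v1\n"


def open_registry(path):
    """Open or create the registry file, writing the header under an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Take an exclusive fcntl lock on byte 0 *before* looking at the file.
        # This works even when the file is still 0-length.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        try:
            if os.fstat(fd).st_size == 0:
                # Only one thread of control can be here at a time, so the
                # write no longer needs to be atomic.
                os.write(fd, REGISTRY_MAGIC)
            elif os.pread(fd, len(REGISTRY_MAGIC), 0) != REGISTRY_MAGIC:
                raise ValueError("not a registry file")
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
    except BaseException:
        os.close(fd)
        raise
    return fd
```

With this ordering, the 0-length check and the write happen as one unit with
respect to every other process that follows the same protocol.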
>>Instead of marking all the slots empty, wouldn't it be better to mark
>>only those that are marked used but aren't locked? This means that you
>>always have to walk the whole list in the first loop, but that's the
>>expected case anyway and it doesn't make sense to short-circuit the
>>error case, and you can merge the second loop into the first. Live
>>processes will release their slots anyway when they panic out of the
>No, that might cause more recovery runs than are necessary.
>Let's say we find an allocated slot w/o a lock. Call that
>generation #1, and we then run recovery, starting generation #2.
>Then, while in generation #2, a still running process from
>generation #1 drops core, dropping its lock.
>The next time we review the registry file we will find an
>allocated slot w/o a lock and will run recovery. That's not
>going to cause a problem, but it's not necessary, we've already
>dealt with any failures from generation #1 processes failing.
You're optimising the error case here, which I think is a bad idea. In
general I'd expect gen#1 processes to be aware of this automatic
recovery scheme and to exit gracefully. Of course it's always possible
that someone writes an app that always (or at least often) crashes while
holding an open DB_ENV, and this would indeed cause more recoveries to
be run. But that application is buggy anyway, and I think the
no-locked-empty-slots guarantee is more important than catering to that
sort of bug.
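
To make the single-pass proposal concrete, here is a rough sketch. It models
each registry slot as a pair of booleans; in the real thing "locked" would be
the result of probing the slot's fcntl lock, and the Slot type and function
name are just illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class Slot:
    used: bool    # slot is marked as allocated in the registry file
    locked: bool  # some live process currently holds the slot's lock


def scan_registry(slots):
    """Walk every slot once; clear stale ones and report whether recovery is needed."""
    need_recovery = False
    for slot in slots:
        if slot.used and not slot.locked:
            # Allocated but unlocked: its owner died without cleaning up.
            slot.used = False
            need_recovery = True
        # Locked slots are left alone, so a slot is never both locked
        # and marked empty.
    return need_recovery


slots = [Slot(True, True), Slot(True, False), Slot(False, False)]
need_recovery = scan_registry(slots)
```

The loop never short-circuits, which is the point: the whole list is walked
in the normal case anyway, and only the stale slots are touched.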
>>>Q2: Can there be processes still running in the existing
>>>environment when we recover it?
>>>Yes. However, performing recovery immediately panics (and
>>>removes) the existing environment, so the window of
>>>vulnerability is small. (We check the panic flag in the DB API
>>>methods, when spinning on a mutex, and whenever about to write
>>>to disk). The only window of corruption is if the write check
>>>of the panic were to complete, the region subsequently be
>>>recovered, and then write continue. That's very, very unlikely
>>>to happen. Note this vulnerability already exists in Berkeley
>>>DB, and we've never heard of a problem.
>>Wouldn't that be because the documentation says that you must run
>>recovery only when no other processes are using the environment? That
>>is, you don't hear problem reports because people aren't exercising
>>this case very often? I'm concerned that with the new DB_REGISTER flag,
>>the opportunity for this race to happen would be much greater.
>The opportunity for the race is indeed greater.
>I don't think the window is worth worrying about, though. You
>would have to pass the panic check and then go immediately to
>sleep, then sleep while the database environment is removed and
>re-created, and then wake up. That's pretty unlikely.
Stranger things have happened...
>Regardless, I can't think of any way to close the window, can
Only if there was a way to guarantee that the process doesn't get
preempted in that tiny window, but I can't imagine how you'd do that
short of relying on some very ugly OS-specific magic.
I'm still worried because it seems to me that if a process slips through
this race, it can unrecoverably corrupt the database. But I admit I
don't see a foolproof, portable solution.
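
For what it's worth, the window can be shown with a toy model. This is not
the Berkeley DB API; the Environment class, the dict standing in for the
database file, and the preempt hook are all made up to force the unlucky
interleaving deterministically:

```python
class Environment:
    """Toy stand-in for a shared region; not the Berkeley DB API."""

    def __init__(self):
        self.panicked = False


def run_recovery(old_env, disk):
    """Panic the old environment, recover the data, return a fresh environment."""
    old_env.panicked = True
    disk["state"] = "recovered"
    return Environment()


def write_page(env, disk, page, data, preempt=None):
    """Check the panic flag, then write. The gap between the two is the window."""
    if env.panicked:
        return False
    if preempt is not None:
        # Simulates being descheduled right after passing the panic check.
        preempt()
    disk[page] = data  # the write goes ahead without re-checking the flag
    return True


disk = {}
old_env = Environment()
# Recovery runs while the writer sleeps between its check and its write.
wrote = write_page(old_env, disk, "page1", "stale data",
                   preempt=lambda: run_recovery(old_env, disk))
```

Here the stale write lands on already-recovered data even though the
environment was panicked before the write happened. Re-checking the flag
just before the write only narrows the gap; it doesn't remove it.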
>I would have to kill any running processes using the database
>environment to close the window. I'd do that if I could,
>but that assumes a model where all processes in the environment
>are either setuid or related so that they could signal each
>other, and I don't think that's possible.
And it's not desirable, either. One of those processes might be an
Apache worker, for example, and you certainly don't want to kill those
off in mid-request.
Received on Thu Dec 23 18:31:46 2004