On Thu 2004-12-23 at 18:31:48 +0100, Branko Čibej wrote:
> Keith Bostic wrote:
[...]
> >>>Q2: Can there be processes still running in the existing
> >>>environment when we recover it?
> >>>
> >>>Yes. However, performing recovery immediately panics (and
> >>>removes) the existing environment, so the window of
> >>>vulnerability is small. (We check the panic flag in the DB API
> >>>methods, when spinning on a mutex, and whenever about to write
> >>>to disk). The only window of corruption is if the write check
> >>>of the panic were to complete, the region subsequently be
> >>>recovered, and then write continue. That's very, very unlikely
> >>>to happen. Note this vulnerability already exists in Berkeley
> >>>DB, and we've never heard of a problem.
> >>>
> >>Wouldn't that be because the documentation says that you must run
> >>recovery only when no other processes are using the environment? That
> >>is, you don't hear problem reports because people aren't exercising
> >>this case very often? I'm concerned that with the new DB_REGISTER flag,
> >>the opportunity for this race to happen would be much greater.
> >
> >The opportunity for the race is indeed greater.
> >
> >I don't think the window is worth worrying about, though. You
> >would have to pass the panic check and then go immediately to
> >sleep, then sleep while the database environment is removed and
> >re-created, and then wake up. That's pretty unlikely.
> >
> >
> Stranger things have happened...
>
> >Regardless, I can't think of any way to close the window, can
> >anybody else?
> >
> Only if there was a way to guarantee that the process doesn't get
> preempted in that tiny window, but I can't imagine how you'd do that
> short of relying on some very ugly OS-specific magic.
>
> I'm still worried because it seems to me that if a process slips through
> this race, it can unrecoverably corrupt the database. But I admit I
> don't see a foolproof, portable solution.

Under the assumption that the need for recovery is an exception, you
could explicitly yield (sleep) so that the other processes can learn of
the problem before you continue. Even if it isn't sufficient to eliminate
the risk completely, it reduces it a lot, doesn't it? No other
processes could open the environment, because you are still holding
the exclusive lock on the whole file.

You could even check which of the other processes have released their
slot lock in the meantime, and only continue when all are done. (Is
there a possibility of a deadlock?)
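
To make the idea a bit more concrete, here is a rough sketch in Python.
It is not Berkeley DB code: `wait_for_slots_released` and the
`slot_is_locked` probe are hypothetical names, and how the probe actually
tests a slot lock is left open. The timeout is my guess at how to avoid
waiting forever on a process that never lets go of its slot.

```python
import time


def wait_for_slots_released(slots, slot_is_locked, timeout=5.0,
                            interval=0.05, sleep=time.sleep,
                            clock=time.monotonic):
    """Poll until no slot is locked any more, or until the timeout expires.

    slots          -- identifiers of the slots to watch (hypothetical)
    slot_is_locked -- hypothetical probe: returns True while another
                      process still holds the lock on the given slot

    Returns the set of slots that are still locked; an empty set means
    every other process has let go and recovery may continue.
    """
    deadline = clock() + timeout
    pending = set(slots)
    while True:
        # A slot seen as released is not checked again. This assumes that
        # nobody can register anew while we hold the exclusive lock.
        pending = {s for s in pending if slot_is_locked(s)}
        if not pending or clock() >= deadline:
            return pending
        # Yield so the other processes get a chance to notice the panic.
        sleep(interval)
```

If the returned set is not empty, the caller has to decide whether to go
ahead anyway (accepting the remaining risk) or to give up; the sketch
does not try to answer that.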

That suggestion requires that the "panic" (whatever that is :) can be
issued separately from destroying the old environment, if I understand
it correctly.

Sorry for so much hand-waving, but I hope I could at least inspire a
new idea, even if my suggestion doesn't work.

Bye,
Benjamin.
Received on Fri Dec 24 08:16:19 2004