Jim Blandy <jimb@red-bean.com> writes:
> I think the logic in svn_open_repos needs to be:
>
> - get a shared lock on locks/db.lock
> - try calling svn_fs_open_berkeley
> - if it fails with DB_RUN_RECOVERY, then:
>   - release shared lock on db.lock
>   - get an exclusive lock on db.lock
>   - call svn_fs_berkeley_recover
>   - release exclusive lock
>   - retry from the top
>
> In this arrangement, we only try to recover when we have an exclusive
> lock, and we never return an opened filesystem object unless we have a
> shared lock.

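For reference, the sequence quoted above could be sketched in C roughly
as follows. This is only a sketch under assumptions: fs_open_fn and
fs_recover_fn are hypothetical stand-ins for svn_fs_open_berkeley and
svn_fs_berkeley_recover (their real signatures are not shown here),
FS_RUN_RECOVERY stands in for DB_RUN_RECOVERY, flock() is assumed as
the locking primitive, and the max_retries bound is my own addition.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical status codes; FS_RUN_RECOVERY stands in for
   DB_RUN_RECOVERY. */
enum { FS_OK = 0, FS_RUN_RECOVERY = 1, FS_ERROR = 2 };

/* Hypothetical callbacks standing in for svn_fs_open_berkeley and
   svn_fs_berkeley_recover. */
typedef int (*fs_open_fn)(const char *path);
typedef int (*fs_recover_fn)(const char *path);

/* Returns the lock file descriptor, still holding a shared lock, once
   the filesystem has been opened.  Returns -1 on any other failure. */
int open_repos_sketch(const char *lock_path, const char *fs_path,
                      fs_open_fn open_fs, fs_recover_fn recover_fs,
                      int max_retries)
{
  int fd = open(lock_path, O_RDWR | O_CREAT, 0666);
  int attempt;

  if (fd < 0)
    return -1;

  for (attempt = 0; attempt <= max_retries; attempt++)
    {
      int status;

      /* Get a shared lock, then try to open the filesystem. */
      if (flock(fd, LOCK_SH) != 0)
        break;
      status = open_fs(fs_path);
      if (status == FS_OK)
        return fd;              /* opened, shared lock still held */
      if (status != FS_RUN_RECOVERY)
        break;

      /* Recovery requested: trade the shared lock for an exclusive
         one, recover, release, and retry from the top. */
      flock(fd, LOCK_UN);
      if (flock(fd, LOCK_EX) != 0)
        break;
      status = recover_fs(fs_path);
      flock(fd, LOCK_UN);
      if (status != FS_OK)
        break;
    }

  close(fd);                    /* closing also drops any lock held */
  return -1;
}

/* Demo stubs: the open fails with FS_RUN_RECOVERY until recovery has
   run once, and succeeds afterwards. */
static int demo_recovered = 0;

static int demo_open(const char *path)
{
  (void) path;
  return demo_recovered ? FS_OK : FS_RUN_RECOVERY;
}

static int demo_recover(const char *path)
{
  (void) path;
  demo_recovered = 1;
  return FS_OK;
}
```

The caller would keep the returned descriptor open for as long as the
filesystem is in use, since closing it drops the shared lock.
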
The problem with that arrangement is that it doesn't run recovery
unless some Berkeley DB call actually returns DB_RUN_RECOVERY. But not every failure mode
requiring recovery manifests itself this clearly. For example, if a
process dies holding locks, then other processes will just end up
hanging, waiting for those locks to be released --- you don't get a
nice error return. I think this is part of what Keith meant when he
said:

    Generally you don't want to depend on DB_RUN_RECOVERY being
    returned. Detecting corruption is performance prohibitive,
    Berkeley DB can't do that. Instead, just always run recovery
    if there has been system or application failure.

But part of the problem is figuring out whether we have a failure or
not --- we don't have any mechanisms in place at the moment for
recognizing a hung process. Perhaps the main Apache process has some
facilities for killing processes that spend too long on a request.
Greg?
Received on Wed Jul 24 01:03:06 2002