Philip Martin wrote:
>Branko Čibej <firstname.lastname@example.org> writes:
>>C. Michael Pilato wrote:
>>>Keith Bostic <email@example.com> writes:
>>>> As a springboard for that discussion, I propose we find a
>>>> serialization point for all threads of control using a
>>>> Subversion repository so we can determine if a thread of
>>>> control is the first thread of control entering the database
>>>> environment after a possible application or system failure.
>>>My suggestion is that libsvn_fs_base grows the same serialization that
>>>mod_db4 uses, which is based around the use of a shared memory segment
>>>with a reference count in it.
>>There is a pretty fundamental problem with using a reference count
>>like that. If a process that is accessing BDB crashes after having
>>incremented the refcount, the refcount gets out of sync and is
>>useless. Also, other processes that are already running might wedge on
>>a lock owned by the crashed process. I see no way to resolve this.
>We have the libsvn_repos lock, which is the current mechanism that
>ensures that recovery gets exclusive access. Normal read/write
>repository access, such as svn_repos_open(), takes a non-exclusive
>lock, while "special" access, such as svn_repos_recover(), takes an
>exclusive lock.
>Suppose svn_repos_open() were first to make a non-blocking attempt to
>take an exclusive lock; if that fails, it just carries on as at present
>and tries to take a non-exclusive lock. If however svn_repos_open()
>manages to take an exclusive lock then it can do whatever it wants
>(perhaps run recovery?) and then it can drop the exclusive lock, take
>a non-exclusive one and continue as at present.
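
The locking order proposed above could be sketched roughly as follows. This is only an illustration in Python using flock() on a scratch lock file; the function name open_with_recovery_check, the lock path and the run_recovery callback are all hypothetical and are not part of libsvn_repos.

```python
import fcntl
import os


def open_with_recovery_check(lock_path, run_recovery):
    """Hypothetical stand-in for the proposed svn_repos_open() lock order.

    Returns (fd, recovered). The caller keeps fd open for as long as it
    uses the repository; closing fd releases the shared lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    recovered = False
    try:
        # Non-blocking attempt at the exclusive lock. This only succeeds
        # if no other open file description holds any lock on the file.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Somebody else is using the repository: carry on as at present.
        pass
    else:
        # We are alone, so it is safe to do exclusive-only work here.
        run_recovery()
        recovered = True
    # Take the normal non-exclusive lock (blocking). If we held the
    # exclusive lock, this converts it; the conversion is not guaranteed
    # to be atomic, so another opener may slip in between the two states.
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd, recovered
```

With this order, a second opener that arrives while the first still holds its shared lock skips recovery and goes straight to the shared lock.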
>This would mean that recovery would get run whenever anything accessed
>a repository that was not otherwise being accessed. Is that the
>desired behaviour? Should be simple to implement, but I don't know
>what sort of performance effect it would have. Do we really want to
>run recovery that often?
Probably not; some sort of counter would be nice, with a forced check
every N times (like forced fscks on some filesystems). Of course, we
should also force a check any time some process gets a DB_RUN_RECOVERY
error.
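
A rough sketch of the counter idea, in Python for brevity. It assumes the count lives in a small file inside the repository and is only touched while the caller holds the exclusive lock; the function name should_run_recovery, the counter file, and the every_n and saw_run_recovery parameters are all made up for illustration.

```python
import os


def should_run_recovery(counter_path, every_n, saw_run_recovery=False):
    """Decide whether this exclusive open should force a recovery run.

    Assumes the caller already holds the exclusive repository lock, so
    nobody else reads or writes the counter file concurrently.
    """
    try:
        with open(counter_path) as f:
            count = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        # Missing or unreadable counter: start counting from zero.
        count = 0

    count += 1
    # Force a check every N exclusive opens, or immediately when some
    # process has reported that recovery is needed.
    due = saw_run_recovery or count >= every_n
    if due:
        count = 0

    # Write via a temporary file and rename so that a crash cannot
    # leave a half-written counter behind.
    tmp_path = counter_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(count))
    os.replace(tmp_path, counter_path)
    return due
```

How a process that saw the error would pass that fact to the next opener is left open here; the saw_run_recovery flag just marks where it would plug in.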
Keith, in the past I noticed that when a process that has opened a
BDB environment crashes, leaving live locks behind, other processes that
are using the same environment may hang indefinitely. This is what
usually causes SVN repositories to get in a "wedged" state. This leads
me to believe that relying on BDB to detect this situation and return
DB_RUN_RECOVERY to the other processes isn't reliable.
Am I missing something here, or is using a separate server process with
exclusive access to the repository truly the only way to completely
avoid such hangs?
Received on Fri Dec 10 02:59:22 2004