"Sander Striker" <striker@apache.org> writes:
> Anyhow, I'm CC'ing in Keith Bostic (Hi Keith, hope you don't mind),
> hoping he can shed some light on this*.
>
> Keith: for the record, one of my repositories locked up one day.
> First I tried to run 'db_recover -v -h ${REPOS}/db'. This didn't
> unlock my repos. Then I tried 'db_recover -ve -h ${REPOS}/db' and
> it did unlock my repos. Any ideas?
Indeed, it might be very beneficial (for us) if Keith Bostic read the
whole thread, starting from Jim Blandy's first message. It's pretty
short, and Jim's questions are clear and pointed.
On the off chance that he's willing to do this, I've included below
the three messages that Keith hasn't seen yet, with some minor edits
for brevity. Here they are:
-----------------
Jim's first post:
-----------------
From: Jim Blandy <jimb@red-bean.com>
Subject: Wedged repositories
To: dev@subversion.tigris.org
Date: Mon, 22 Jul 2002 17:27:59 -0500
It's a known problem that Subversion repositories can get wedged, and
that running db_recover -e on them fixes things. The db_recover
program, part of the Berkeley DB distribution, is basically a wrapper
around a single call to DBENV->open, which is given the DB_RECOVER
flag. Since recovery is fast when the repository was shut down
properly, there's no reason Subversion couldn't do this itself. The
FS API provides for this.
In fact, the code in libsvn_repos looks like it's trying to do this,
but it doesn't. I'm having a hard time discerning the intent here.
There is some locking stuff in libsvn_repos/repos.c, but
svn_open_repos uses it in a weird way: it calls svn_fs_open_berkeley,
and *then* it acquires its locks on db.lock. Since there are no
shared resources acquired *after* we get the lock, it's hard to see
what that lock could reliably exclude.
I think the logic in svn_open_repos needs to be:
- get a shared lock on locks/db.lock
- try calling svn_fs_open_berkeley
- if it fails with DB_RUN_RECOVERY, then:
  - release shared lock on db.lock
  - get an exclusive lock on db.lock
  - call svn_fs_berkeley_recover
  - release exclusive lock
  - retry from the top
In this arrangement, we only try to recover when we have an exclusive
lock, and we never return an opened filesystem object unless we have a
shared lock.
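For readers following the thread, the proposed control flow can be sketched
roughly as below. This is only an illustration in Python, not Subversion
code: open_fs, recover_fs and RunRecovery are hypothetical stand-ins for
svn_fs_open_berkeley, svn_fs_berkeley_recover and the DB_RUN_RECOVERY error,
and the use of flock() on the lock file is an assumption (the thread does not
say which locking primitive libsvn_repos uses). The retry limit is also an
addition, there only to keep the sketch from looping forever.

```python
import fcntl
import os


class RunRecovery(Exception):
    """Hypothetical stand-in for Berkeley DB's DB_RUN_RECOVERY error."""


def open_repos(lock_path, open_fs, recover_fs, max_attempts=3):
    """Return (fs, lock_fd). The shared lock stays held on lock_fd
    for as long as the caller keeps the filesystem open."""
    for _ in range(max_attempts):
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Get a shared lock, then try to open the filesystem.
            fcntl.flock(fd, fcntl.LOCK_SH)
            try:
                return open_fs(), fd
            except RunRecovery:
                pass
            # Open failed: trade the shared lock for an exclusive one.
            # This blocks until every other shared holder has let go.
            fcntl.flock(fd, fcntl.LOCK_UN)
            fcntl.flock(fd, fcntl.LOCK_EX)
            recover_fs()
            fcntl.flock(fd, fcntl.LOCK_UN)
        except BaseException:
            os.close(fd)
            raise
        os.close(fd)
        # Retry from the top with a fresh shared lock.
    raise RuntimeError("repository still needs recovery after retries")
```

The caller is expected to close lock_fd (which drops the shared lock) only
after it has closed the filesystem. Note that another process may run
recovery in the gap between releasing the shared lock and getting the
exclusive one; in this sketch that only costs a redundant recovery pass.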
I understand there have been problems with svn_fs_berkeley_recover
itself returning DB_RUN_RECOVERY. That's pretty confusing; it's kind
of useless if it does that. But that's not a problem which can be
swept under the rug, say, by declaring Issue 403 resolved; it's a
central part of the problem.
-----------------------
Sander responds to Jim:
-----------------------
From: "Sander Striker" <striker@apache.org>
Subject: RE: Wedged repositories
To: "Jim Blandy" <jimb@red-bean.com>, <dev@subversion.tigris.org>
Date: Tue, 23 Jul 2002 00:41:22 +0200
Jim Blandy wrote:
> In this arrangement, we only try to recover when we have an exclusive
> lock, and we never return an opened filesystem object unless we have a
> shared lock.
I am sooo +1 on this approach.
> I understand there have been problems with svn_fs_berkeley_recover
> itself returning DB_RUN_RECOVERY. That's pretty confusing; it's kind
> of useless if it does that. But that's not a problem which can be
> swept under the rug, say, by declaring Issue 403 resolved; it's a
> central part of the problem.
It seems that 'db_recover -e' works more often than 'db_recover'. The
-e flag is to retain the environment, which could be influenced by
DB_CONFIG. So, we need to remove the DB_PRIVATE flag from the recover
routine in repos.c
Sander
----------------------------------
Jim responds to Sander's response:
----------------------------------
From: Jim Blandy <jimb@red-bean.com>
Subject: Re: Wedged repositories
To: "Sander Striker" <striker@apache.org>
CC: dev@subversion.tigris.org
Date: 22 Jul 2002 17:48:51 -0500
"Sander Striker" <striker@apache.org> writes:
> It seems that 'db_recover -e' works more often than 'db_recover'. The
> -e flag is to retain the environment, which could be influenced by
> DB_CONFIG. So, we need to remove the DB_PRIVATE flag from the recover
> routine in repos.c
This doesn't jibe with the Berkeley DB docs, though:
http://www.sleepycat.com/docs/api_c/env_open.html
If we've made sure we're the only process accessing the DB
environment, it shouldn't matter. We should be able to let the
recovery process do whatever it pleases. If it doesn't work, then we
don't really understand what's happening.
I'd like to have a solid explanation for what's going on before we start
flipping flags on and off based on what "works more often." These are
the repository's low-level mechanisms we're dealing with here; we
really should try to understand what we're doing.
...And that's all.
Keith, if you've read this far, THANK YOU for your time!
-Karl
Received on Tue Jul 23 05:50:12 2002