On 9/1/06, C. Michael Pilato <cmpilato@collab.net> wrote:
> Jack Repenning wrote:
> > BDB's "db_recover" (and likewise their API) provides two levels of
> > recovery, a "normal" and a "catastrophic." Currently, "svnadmin
> > recover" uses "normal" recovery; if your BDB database actually needs
> > "catastrophic" recovery, you have to use the raw BDB commands. This is
> > unobvious to the new administrator and pretty tiresome even to the
> > cognoscenti.
> >
> > After dancing and reading and experimenting quite a bit, I believe that
> > there is really no reason for "svnadmin recover" ever to only "normally"
> > recover, and further I have painful experience of cases where
> > "catastrophic" recovery really is required. I propose (see patch) that
> > "svnadmin recover" just do it "catastrophically" always.
> >
> > This patch does, of course, mess with database integrity (or mess about
> > in that neighborhood), so I'm quite a bit more than ordinarily
> > interested in any additional eyes on the solution. As you'll see from
> > the patch, you're not really going to learn much from the patch itself
> > (a classic "one line change" sort of booby trap!;-), but the BDB
> > documentation linked below might be food for helpful thought.
> >
> > Further notes:
> >
> > The BDB documentation on the difference is perhaps less than ideal. The
> > db_recover man page, e.g.,
> > http://www.sleepycat.com/docs/utility/db_recover.html
> > documents that a catastrophic recovery may require access to more of the
> > database log files than for a normal recovery. However, if you attempt
> > a catastrophic recovery without these extra files, you get a message to
> > that effect (including naming the missing files), so this seems fairly
> > innocuous. The man page doesn't say much else of interest here.
> >
> > Somewhat more information is available in the Reference Guide (as you
> > might well expect),
> > http://www.sleepycat.com/docs/ref/transapp/archival.html
> > but not a whole lot: the primary added information here is that the
> > "catastrophe" they had in mind when naming this option was "physical
> > hardware has been destroyed" or similar--which, OK, is definitely a
> > "catastrophe," but there seems to be quite a lot of turf between a minor
> > inconsistency like an aborted BDB client (the "normal" case) and this
> > "catastrophe"!
> >
> > The situation where I ran into all this was: creating a tool for moving
> > an SVN/BDB database around. I needed to make a really bullet-proof
> > script, for use by administrators in the heat of battle. The procedures
> > absolutely must and do say to make sure no one is touching the
> > repository when the procedure is applied, and if you do ensure that, you
> > don't need recovery. But what if you forget that, or slip up, or
> > someone was running a monumentally long checkout just as you closed off
> > all further connections? In a live, paying-customer support sort of
> > situation, it's not really good enough to slap the opsie's wrist, say
> > "bad opsie, bad opsie," leave the customer down for hours longer, and
> > roll back to whatever version was last sent to backup media (which is
> > what you end up having to do, at least sometimes, if you choose the
> > wrong recovery option) ... very tacky. If, on the other hand, you just
> > whack the moved DB with a "catastrophic" recovery then one of the
> > following applies:
> >
> > 1. there was nothing wrong, and nothing is done--we're cool
> > 2. there was a slight glitch, it could have been recovered without "-c",
> > but what the heck, -c works, too--still cool
> > 3. there was a big glitch, normal recovery would have failed (oh, and by
> > the way, made the problem worse), but "-c" succeeds--so we're ahead
> > 4. things are so very screwed up that nothing will work--well, OK, we're
> > not any WORSE off....
> >
> > I was able to recreate all four of those circumstances by careful abuses
> > of the tool and procedure. I was not able to create any case where "-c"
> > failed when "normal" would have worked (other than the case, mentioned
> > above, where some log files are absent, which is sort of
> > self-documenting and healable).
> >
> > Thoughts?
>
> Have you considered doing the following?
>
>   db_err = bdb_recover(path, FALSE, pool);
>   if (db_err)
>     db_err = bdb_recover(path, TRUE, pool);
>   return db_err;
>
> That is, try the non-catastrophic recovery first and, failing that, try
> the catastrophic one?
That assumes that the initial non-catastrophic recovery won't make
things worse. I believe Jack implied in his mail that it was possible
that it would, although I personally have no idea what kind of
situation would cause such a thing.
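
To make the concern concrete, here is a rough sketch in C of the two
policies side by side. Everything in it is hypothetical: recover_fn is
only a stand-in for the bdb_recover(path, catastrophic, pool) call
quoted above, and toy_recover is a made-up model that simply *assumes*
a failed normal recovery leaves the environment worse off (Jack's
case 3). It says nothing about how BDB really behaves; it only shows
why the order of attempts would matter if that assumption holds.

```c
/* Hypothetical stand-in for bdb_recover(path, catastrophic, pool):
   returns 0 on success, nonzero on failure. */
typedef int (*recover_fn)(const char *path, int catastrophic, void *baton);

/* Policy A (the suggestion quoted above): normal first, then catastrophic. */
static int recover_with_fallback(recover_fn recover, const char *path,
                                 void *baton)
{
  int err = recover(path, 0, baton);
  if (err)
    err = recover(path, 1, baton);
  return err;
}

/* Policy B (Jack's proposal): always run catastrophic recovery. */
static int recover_always_catastrophic(recover_fn recover, const char *path,
                                       void *baton)
{
  return recover(path, 1, baton);
}

/* Toy environment, for illustration only.  damage: 0 = clean,
   1 = slight glitch, 2 = big glitch, 3 or more = beyond repair. */
struct toy_env
{
  int damage;
};

/* Toy recovery routine.  ASSUMPTION (not verified against BDB): a normal
   recovery that fails makes the damage one step worse. */
static int toy_recover(const char *path, int catastrophic, void *baton)
{
  struct toy_env *env = baton;
  (void) path;  /* unused in the toy model */

  if (catastrophic)
    return env->damage <= 2 ? 0 : 1;

  if (env->damage <= 1)
    return 0;

  env->damage++;  /* the assumed "made the problem worse" step */
  return 1;
}
```

Under that assumption, starting from damage == 2, policy A fails
outright (the normal pass pushes the damage to 3 before the
catastrophic pass runs), while policy B succeeds. If a failed normal
recovery is in fact harmless, the two policies end up in the same
place and the fallback only costs an extra pass.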
-garrett
Received on Fri Sep 1 17:40:47 2006