Re: [PATCH] [svnadmin] Always recover "catastrophically"

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2006-09-01 15:40:35 CEST

Jack Repenning wrote:
> BDB's "db_recover" (and likewise their API) provides two levels of
> recovery, a "normal" and a "catastrophic." Currently, "svnadmin
> recover" uses "normal" recovery; if your BDB database actually needs
> "catastrophic" recovery, you have to use the raw BDB commands. This is
> unobvious to the new administrator and pretty tiresome even to the
> cognoscenti.
>
> After dancing and reading and experimenting quite a bit, I believe that
> there is really no reason for "svnadmin recover" ever to only "normally"
> recover, and further I have painful experience of cases where
> "catastrophic" recovery really is required. I propose (see patch) that
> "svnadmin recover" just do it "catastrophically" always.
>
> This patch does, of course, mess with database integrity (or mess about
> in that neighborhood), so I'm quite a bit more than ordinarily
> interested in any additional eyes on the solution. As you'll see from
> the patch, you're not really going to learn much from the patch itself
> (a classic "one line change" sort of booby trap!;-), but the BDB
> documentation linked below might be food for helpful thought.
>
> Further notes:
>
> The BDB documentation on the difference is perhaps less than ideal. The
> db_recover man page, e.g.,
> http://www.sleepycat.com/docs/utility/db_recover.html
> documents that a catastrophic recovery may require access to more of the
> database log files than for a normal recovery. However, if you attempt
> a catastrophic recovery without these extra files, you get a message to
> that effect (including naming the missing files), so this seems fairly
> innocuous. The man page doesn't say much else of interest here.
>
> Somewhat more information is available in the Reference Guide (as you
> might well expect),
> http://www.sleepycat.com/docs/ref/transapp/archival.html
> but not a whole lot: the primary added information here is that the
> "catastrophe" they had in mind when naming this option was "physical
> hardware has been destroyed" or similar--which, OK, is definitely a
> "catastrophe," but there seems to be quite a lot of turf between a minor
> inconsistency like an aborted BDB client (the "normal" case) and this
> "catastrophe"!
>
> The situation where I ran into all this was: creating a tool for moving
> an SVN/BDB database around. I needed to make a really bullet-proof
> script, for use by administrators in the heat of battle. The procedures
> absolutely must and do say to make sure no one is touching the
> repository when the procedure is applied, and if you do ensure that, you
> don't need recovery. But what if you forget that, or slip up, or
> someone was running a monumentally long checkout just as you closed off
> all further connections? In a live, paying-customer support sort of
> situation, it's not really good enough to slap the opsie's wrist, say
> "bad opsie, bad opsie," leave the customer down for hours longer, and
> roll back to whatever version was last sent to backup media (which is
> what you end up having to do, at least sometimes, if you choose the
> wrong recovery option) ... very tacky. If, on the other hand, you just
> whack the moved DB with a "catastrophic" recovery then one of the
> following applies:
>
> 1. there was nothing wrong, and nothing is done--we're cool
> 2. there was a slight glitch, it could have been recovered without "-c",
> but what the heck, -c works, too--still cool
> 3. there was a big glitch, normal recovery would have failed (oh, and by
> the way, made the problem worse), but "-c" succeeds--so we're ahead
> 4. things are so very screwed up that nothing will work--well, OK, we're
> not any WORSE off....
>
> I was able to recreate all four of those circumstances by careful abuses
> of the tool and procedure. I was not able to create any case where "-c"
> failed when "normal" would have worked (other than the case where some
> log files are absent, mentioned above, which as mentioned above is sort
> of self-documenting and healable).
>
> Thoughts?

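Have you considered doing the following?

  /* Try a normal (non-catastrophic) recovery first. */
  db_err = bdb_recover(path, FALSE, pool);
  /* Only if that fails, retry with catastrophic recovery. */
  if (db_err)
    db_err = bdb_recover(path, TRUE, pool);
  return db_err;

That is, try the non-catastrophic recovery first and, only if that fails,
fall back to the catastrophic one?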
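
For anyone who wants to poke at the fallback idea without a real BDB
environment, here is a rough, self-contained sketch. Everything in it is
made up for illustration: damage_t, fake_env_t, fake_recover() and
recover_with_fallback() are hypothetical stand-ins, not Subversion or
Berkeley DB APIs. The stub models Jack's four cases only in terms of
success or failure; it does not model his observation that a failed
normal recovery can make the damage worse, which is the main thing to
weigh against this approach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical damage levels, one per case in Jack's list. */
typedef enum {
  DAMAGE_NONE,     /* case 1: nothing wrong */
  DAMAGE_MINOR,    /* case 2: normal recovery is enough */
  DAMAGE_MAJOR,    /* case 3: only catastrophic recovery succeeds */
  DAMAGE_HOPELESS  /* case 4: nothing works */
} damage_t;

/* Hypothetical stand-in for a repository environment; it also counts
   how many times each kind of recovery was attempted. */
typedef struct {
  damage_t damage;
  int normal_attempts;
  int fatal_attempts;
} fake_env_t;

/* Stub playing the role of bdb_recover(): returns 0 on success and
   nonzero on failure.  FATAL selects catastrophic recovery. */
static int fake_recover(fake_env_t *env, int fatal)
{
  assert(env != NULL);

  if (fatal)
    env->fatal_attempts++;
  else
    env->normal_attempts++;

  if (env->damage == DAMAGE_HOPELESS)
    return -1;                      /* neither mode can help */
  if (env->damage == DAMAGE_MAJOR && !fatal)
    return -1;                      /* normal recovery is not enough */
  return 0;
}

/* The pattern under discussion: normal first, catastrophic on failure. */
static int recover_with_fallback(fake_env_t *env)
{
  int err = fake_recover(env, 0);
  if (err)
    err = fake_recover(env, 1);
  return err;
}
```

With this stub, a DAMAGE_MINOR environment is recovered without ever
attempting the catastrophic pass, a DAMAGE_MAJOR one succeeds on the
second attempt, and a DAMAGE_HOPELESS one returns the error from the
catastrophic attempt.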

-- 
C. Michael Pilato <cmpilato@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Received on Fri Sep 1 15:41:10 2006
