[PATCH] [svnadmin] Always recover "catastrophically"

From: Jack Repenning <jrepenning_at_collab.net>
Date: 2006-09-01 06:59:21 CEST

BDB's "db_recover" utility (and likewise its API) provides two levels
of recovery, a "normal" and a "catastrophic." Currently, "svnadmin
recover" uses "normal" recovery; if your BDB database actually needs
"catastrophic" recovery, you have to use the raw BDB commands. This
is not obvious to the new administrator and pretty tiresome even to
the experienced one.
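For reference, here is a minimal Python sketch of the two invocations as things stand today. It assumes a BDB-backed repository whose Berkeley DB environment is the repository's db/ subdirectory, and a BDB utility installed under the plain name db_recover (some packagers ship a versioned name instead). The helper name recovery_commands is hypothetical; it only builds the command lines and does not run them.

```python
import shlex
from pathlib import Path


def recovery_commands(repos_path: str) -> dict[str, list[str]]:
    """Build argv lists for the two BDB recovery levels (hypothetical helper).

    Assumes the Berkeley DB environment lives in REPOS_PATH/db and that the
    BDB utility is installed as 'db_recover' (the name varies by packager).
    """
    env_home = str(Path(repos_path) / "db")
    return {
        # "Normal" recovery: what svnadmin recover does today.
        "normal": ["svnadmin", "recover", repos_path],
        # "Catastrophic" recovery: currently only reachable via the raw BDB tool.
        "catastrophic": ["db_recover", "-c", "-v", "-h", env_home],
    }


if __name__ == "__main__":
    for level, argv in recovery_commands("/var/svn/repos").items():
        print(f"{level:>12}: {shlex.join(argv)}")
```

Either command must be run while nothing else is accessing the repository.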

After dancing and reading and experimenting quite a bit, I believe
that there is really no reason for "svnadmin recover" ever to recover
only "normally"; furthermore, I have painful experience of cases
where "catastrophic" recovery really is required. I propose (see
patch) that "svnadmin recover" simply always recover "catastrophically."

This patch does, of course, mess with database integrity (or at least
mess about in that neighborhood), so I'm a good deal more interested
than usual in additional eyes on the solution. You're not really
going to learn much from the patch itself (a classic "one-line change"
sort of booby trap! ;-), but the BDB documentation discussed below
might be food for helpful thought.

Further notes:

The BDB documentation on the difference is perhaps less than ideal.
The db_recover man page, for example, documents that a catastrophic
recovery may require access to more of the database log files than a
normal recovery does. However, if you attempt a catastrophic recovery
without those extra files, you get a message to that effect (naming
the missing files), so this seems fairly innocuous. The man page
doesn't say much else of interest here.
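A hypothetical pre-flight check along these lines could catch the missing-log case before recovery is even attempted. This is only a sketch: it assumes BDB's usual log naming ("log." followed by a 10-digit sequence number) and can only spot holes between the lowest- and highest-numbered log files present, not files missing before the first one. The name missing_log_files is illustrative and is not part of Subversion or BDB.

```python
import re
from pathlib import Path

# Assumption: BDB transaction logs are named "log." plus a 10-digit sequence number.
LOG_NAME = re.compile(r"log\.(\d{10})")


def missing_log_files(env_home: str) -> list[str]:
    """Hypothetical check: list gaps in the BDB log-file sequence.

    Only holes between the lowest and highest log numbers present are
    detected; logs missing before the first one cannot be seen here.
    """
    present: set[int] = set()
    for entry in Path(env_home).iterdir():
        match = LOG_NAME.fullmatch(entry.name)
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    # Rebuild the expected file names for every absent sequence number.
    return [
        f"log.{n:010d}"
        for n in range(min(present), max(present) + 1)
        if n not in present
    ]
```

For example, missing_log_files("/var/svn/repos/db") on a directory holding log.0000000001, log.0000000002, and log.0000000004 would report log.0000000003.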

Somewhat more information is available in the Reference Guide (as you
might well expect), but not a whole lot: the main added information
there is that the "catastrophe" they had in mind when naming this
option was "physical hardware has been destroyed" or similar. That
is definitely a "catastrophe," but there is quite a lot of turf
between a minor inconsistency like an aborted BDB client (the
"normal" case) and that kind of catastrophe!

The situation where I ran into all this was creating a tool for
moving an SVN/BDB database around. I needed to make a really
bulletproof script, for use by administrators in the heat of battle.
The procedures absolutely must (and do) say to make sure no one is
touching the repository while the procedure is applied, and if you
ensure that, you don't need recovery. But what if you forget, or slip
up, or someone was running a monumentally long checkout just as you
closed off all further connections? In a live, paying-customer
support situation, it's not really good enough to slap the opsie's
wrist, say "bad opsie, bad opsie," leave the customer down for hours
longer, and roll back to whatever version was last sent to backup
media (which is what you end up having to do, at least sometimes, if
you choose the wrong recovery option). Very tacky.
If, on the other hand, you just whack the moved DB with a
"catastrophic" recovery, then one of the following applies:

1. there was nothing wrong, and nothing is done: we're cool
2. there was a slight glitch; it could have been recovered without
   "-c", but what the heck, "-c" works too: still cool
3. there was a big glitch; normal recovery would have failed (and, by
   the way, made the problem worse), but "-c" succeeds: we're ahead
4. things are so very screwed up that nothing will work: well, OK,
   we're not any WORSE off....
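The "always use -c after a move" policy could look roughly like the minimal Python sketch below. It assumes the BDB environment is the repository's db/ subdirectory, that the utility is installed as db_recover, and that its exit status is a usable signal. Under that reading, cases 1 through 3 all exit 0, while case 4 (or missing log files) exits non-zero with BDB's own diagnostics. The function name recover_after_move and the run hook are hypothetical; the hook exists only so the logic can be exercised without BDB installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical hook type: lets tests substitute a fake for the real subprocess call.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _default_run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output so BDB's diagnostics can be shown to the operator.
    return subprocess.run(list(argv), capture_output=True, text=True)


def recover_after_move(repos_path: str, run: Runner = _default_run) -> tuple[bool, str]:
    """Always run catastrophic recovery on a freshly moved BDB repository.

    Assumes nothing else is accessing the repository while this runs.
    """
    argv = ["db_recover", "-c", "-h", str(Path(repos_path) / "db")]
    result = run(argv)
    if result.returncode == 0:
        # Cases 1-3: nothing to do, a slight glitch, or a big glitch; all fixed.
        return True, "catastrophic recovery completed"
    # Case 4 (or absent log files): hand BDB's own message back unchanged.
    return False, (result.stderr or result.stdout or "").strip()
```

The design choice mirrors the argument above: since "-c" never did worse than normal recovery in these experiments, the script does not try to diagnose which case applies before choosing a recovery level.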

I was able to recreate all four of those circumstances by careful
abuse of the tool and procedure. I was not able to create any case
where "-c" failed when "normal" would have worked (other than the
case, mentioned above, where some log files are absent, which is
sort of self-documenting and healable).


Jack Repenning
Chief Engineer, CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
mobile: +1 408.835.8090
raindance: 844.7461
skype: jrepenning

