Re: I lost 7 bdb repositories yesterday!

From: Soren 'Frank' Munch <sm_sbv_at_u5.com>
Date: 2005-07-01 19:07:41 CEST

Hi Dave,

wow, that is a terrible situation. :-(

> It seems as though you can't rely on it at all if something goes
> wrong

If something goes wrong _enough_, you are right, but that goes for any
software known.

FWIW, we have a dedicated repository-only server which over a year went down 4
times because of a poor disk until the source of the problem was detected.
Never a problem with any repository (of about 20), not even needing recover.
At least it shows that it is not a law of nature that svn/bdb crashes when
the server does.

> It's a good bet there will be a call to move
> to a more stable platform.

Finding a platform that is well documented to be more stable across crashing
servers may be a hard job.

Your server crash is the place to start. Is your repository server well
planned for its job? What crashed it? Is it busy running other applications? If
so, would a dedicated server maybe be a smaller investment than moving into
the new unknowns of another version control system?

> - Some of the damaged repositories were not in use at the time
> (idle for days or months). How is that possible without drive or
> filesystem damage? There were no svnserve processes running on those
> files at the time.

If the files in a repo are really unchanged and no corruption has taken place,
what reason could there be for a repo that worked earlier to stop working?

To check, try moving a failing repo to another server with an svn/bdb
installation. In my experience there is no hardcoded absolute path info in a
repo, so you can simply tar the directory and move it. Then play with svn
checkout etc.
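
For what it's worth, here is a rough Python sketch of the tar-and-move step.
The function names pack_repo and unpack_repo are made up for illustration and
are not part of svn or bdb. It assumes nothing is accessing the repo while it
is being packed.

```python
import tarfile
from pathlib import Path


def pack_repo(repo_dir, archive_path):
    """Write the whole repository directory into a gzip-compressed tarball."""
    repo_dir = Path(repo_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory name, so no absolute path is stored
        tar.add(str(repo_dir), arcname=repo_dir.name)
    return Path(archive_path)


def unpack_repo(archive_path, dest_parent):
    """Unpack the tarball under dest_parent and return the new repository path."""
    dest_parent = Path(dest_parent)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        # The first member is the repository's top-level directory
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(str(dest_parent))
    return dest_parent / top_level
```

On the other machine you would then point svnadmin recover or an svn checkout
with a file:// URL at the unpacked copy and see whether the same errors show up.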

If it all works you will have the answer: The crash caused changes in parts
of the svn/bdb installation (or something it depends on) itself.

If they still don't work, reconsider your statement that the repo files were
not corrupted. Did you check the file timestamps?
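
A minimal sketch of such a timestamp check in Python, assuming you know
roughly when the repo was last used. The names files_modified_since and
format_report are hypothetical helpers, and the cutoff is whatever time you
pick (seconds since the epoch).

```python
import os
import time
from pathlib import Path


def files_modified_since(repo_dir, cutoff_epoch):
    """Return (path, mtime) pairs for files changed at or after cutoff_epoch."""
    changed = []
    for root, _dirs, files in os.walk(str(repo_dir)):
        for name in files:
            path = Path(root) / name
            mtime = path.stat().st_mtime
            if mtime >= cutoff_epoch:
                changed.append((path, mtime))
    # Oldest first, so the order of the changes is easy to read
    return sorted(changed, key=lambda item: item[1])


def format_report(changed):
    """Render each entry as 'YYYY-MM-DD HH:MM:SS  path' in local time."""
    return [
        "{}  {}".format(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(m)), p)
        for p, m in changed
    ]
```

If a repo that was supposedly idle for months lists files touched around the
time of the crash, something did write to it after all.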

Best wishes,

Soren 'Frank',
u5com

On Friday 01 July 2005 22:17, Dave Camp wrote:
> This was terrible (but I guess that goes without saying). The server
> "unexpectedly" restarted and when it came back up, 10 of our 16
> repositories were broken (a variety of errors when a svn co was
> performed). 2-3 were fixable with svnadmin recover. The remainder
> were not fixable with either svnadmin recover or db_recover -c. In
> each case, they reported problems with the "magic number" of one of
> the log files, and list-unused-dblogs always reported that the
> affected files were needed.
>
> Since we've got 10 engineers waiting to get back to work, the
> immediate solution was to restore the broken repositories from the
> nightly backups. Not pleasant, but at least no work was lost this
> time. One of the repositories is 24 Gigs of source and data for a
> massive project.
>
> So, after getting my co-workers to switch to subversion, I'm now
> being questioned as to how a catastrophe like this is possible with a
> version control system (we've had other problems in the past too).
> I'm questioning it as well. How can subversion/bdb be this brittle?
> It seems as though you can't rely on it at all if something goes
> wrong! We realize that when a machine goes down, any files that were
> being written at the time are obviously suspect, but we lost
> repositories that have not been used in weeks or months. We have
> checked drive and data corruption on the rest of the server and there
> is none. Only the log.xxx files report issues.
>
> Our setup:
> Xserve, Mac OS X 10.3, 250 Gig mirrored raid, HFS+ (Journaled)
> filesystem
> svnserve 1.1.2, db-4.2.52
> All access is via ssh. No local access or Apache.
> Clients are a mixture of Linux, Mac, and Windows.
>
> Questions I was hoping someone could answer:
>
> - Some of the damaged repositories were not in use at the time
> (idle for days or months). How is that possible without drive or
> filesystem damage? There were no svnserve processes running on those
> files at the time.
>
> - Is this expected behavior for bdb repositories? This is not the
> first time we've had unrecoverable damage to them (previous times
> seemed to be from normal use).
>
> - Does fsfs solve the fragility issues that bdb seems to have? If
> moving the repositories to fsfs will make these problems a thing of
> the past, I'm there.
>
> Management (and my fellow engineers) are going to be wanting some
> answers from me in the next few days. If I can't explain how this can
> be avoided in the future, it's a good bet there will be a call to move
> to a more stable platform. I really don't want that, but I'm at a
> point where I don't trust subversion either right now (one of the
> projects I had to restore was mine, and I'm supposed to be Beta
> tomorrow).
>
> Can anyone help me figure out what happened and how to prevent this
> from happening again?
>
> Thanks,
> Dave
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org

Received on Fri Jul 1 19:30:04 2005
