This was terrible (but I guess that goes without saying). The server
"unexpectedly" restarted and when it came back up, 10 of our 16
repositories were broken (a variety of errors when an svn co was
performed). 2-3 were fixable with svnadmin recover. The remainder
were not fixable with either svnadmin recover or db_recover -c. In
each case, they reported problems with the "magic number" of one of
the log files, and list-unused-dblogs always reported that the
affected files were needed.
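A check along these lines is one way to find out which repositories are affected after a restart. It is only a sketch, not what was actually run here: the function name check_repos and the parent directory are placeholders, and it assumes every subdirectory of the parent is a repository. svnadmin verify reads every revision, so it will be slow on a large repository.

```shell
#!/bin/sh
# Sketch: run "svnadmin verify" on every repository under one parent
# directory and print which ones pass. The svnadmin binary can be
# overridden through the SVNADMIN variable (an assumption for illustration).
SVNADMIN="${SVNADMIN:-svnadmin}"

check_repos() {
    parent="$1"
    failed=0
    for repo in "$parent"/*; do
        # Skip anything that is not a directory.
        [ -d "$repo" ] || continue
        if "$SVNADMIN" verify "$repo" >/dev/null 2>&1; then
            echo "OK      $repo"
        else
            echo "BROKEN  $repo"
            failed=$((failed + 1))
        fi
    done
    # Exit status is 0 only if every repository verified cleanly.
    [ "$failed" -eq 0 ]
}
```

Usage would be something like check_repos /path/to/repositories, with the path replaced by wherever the repositories actually live.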
Since we've got 10 engineers waiting to get back to work, the
immediate solution was to restore the broken repositories from the
nightly backups. Not pleasant, but at least no work was lost this
time. One of the repositories is 24 Gigs of source and data for a
So, after getting my co-workers to switch to subversion, I'm now
being questioned as to how a catastrophe like this is possible with a
version control system (we've had other problems in the past too).
I'm questioning it as well. How can subversion/bdb be this brittle?
It seems as though you can't rely on it at all if something goes
wrong! We realize that when a machine goes down, any files that were
being written at the time are obviously suspect, but we lost
repositories that have not been used in weeks or months. We have
checked for drive and data corruption on the rest of the server and there
is none. Only the log.xxx files report issues.
Xserve, Mac OS X 10.3, 250 Gig mirrored RAID, HFS+ (Journaled)
svnserve 1.1.2, db-4.2.52
All access is via ssh. No local access or Apache.
Clients are a mixture of Linux, Mac, and Windows.
Questions I was hoping someone could answer:
- Some of the damaged repositories were not in use at the time
(idle for days or months). How is that possible without drive or
filesystem damage? There were no svnserve processes running on those
files at the time.
- Is this expected behavior for bdb repositories? This is not the
first time we've had unrecoverable damage to them (previous times
seemed to be from normal use).
- Does fsfs solve the fragility issues that bdb seems to have? If
moving the repositories to fsfs will make these problems a thing of
the past, I'm there.
Management (and my fellow engineers) are going to be wanting some
answers from me in the next few days. If I can't explain how this can
be avoided in the future, it's a good bet there will be a call to move
to a more stable platform. I really don't want that, but I'm at a
point where I don't trust subversion either right now (one of the
projects I had to restore was mine, and I'm supposed to be Beta
Can anyone help me figure out what happened and how to prevent this
from happening again?
Received on Fri Jul 1 17:31:09 2005