Rescuing a repository

From: Marc Haisenko <haisenko_at_webport.de>
Date: 2004-05-14 13:23:01 CEST

Hi folks,
we have a big repository (strings file is 3.1GB) on a RAID 5. Problem is, I
can't dump the repository, I always get a checksum mismatch in revision 336.
And it seems the latest revision has the same problem as well.

The system is a SuSE 8.2 with BDB 4.0.14, running SubVersion 0.37 (sic).
Yesterday and today I tried to rescue the repository with SubVersion 1.0.2
linked against BDB 4.1.25, and with SubVersion 1.0.2 linked against BDB
4.2.52, but everything failed.

First, I made a copy of the repository so I can't mess up the original. I
then ran SVN 0.37/BDB 4.0.14 'svnadmin recover /export/rescue', which after
several minutes said "Input/output error" (note that despite the path this is
not an NFS directory). I then copied the copy to check whether there is a
problem with the RAID or something, but the copy went smoothly. An fsck also
went without problems.
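
A copy that merely completes without errors is not necessarily identical, so
the two trees can also be compared by checksum. What follows is a minimal
sketch in Python, not something that was run on this repository; the function
names md5_of_file and compare_trees are made up for illustration, and MD5 is
used only because it is the checksum type Subversion itself reports.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original, copy):
    """Return sorted relative paths that are missing on one side or differ."""
    original, copy = Path(original), Path(copy)

    def relative_files(root):
        return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}

    left, right = relative_files(original), relative_files(copy)
    # Files present in only one of the two trees.
    differing = left ^ right
    # Files present in both trees whose contents do not match.
    for rel in left & right:
        if md5_of_file(original / rel) != md5_of_file(copy / rel):
            differing.add(rel)
    return sorted(rel.as_posix() for rel in differing)
```

An empty list from compare_trees('/export/rescue', <path of the second copy>)
would mean both copies are byte-identical, which points away from read errors
on the RAID.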

I then ran 'svnadmin dump /export/rescue', and everything went fine until it
reached revision 336 where a checksum mismatch was reported. Repeatedly
running the dump always yields the exact same checksum mismatch (always the
same checksums).
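
Since a full dump stops at the first failure, it does not show how many
revisions are affected. The following is a minimal sketch of probing each
revision on its own with 'svnadmin dump -r N --incremental'; the helper name
find_bad_revisions and its 'run' parameter are hypothetical, and a revision
that fails to dump is not necessarily the one whose data is damaged.

```python
import subprocess


def find_bad_revisions(repo_path, first, last, run=subprocess.run):
    """Dump each revision separately and return those whose dump fails.

    'run' defaults to subprocess.run; it is a parameter only so the
    function can be tried without a real repository (an assumption of
    this sketch, not part of any Subversion tool).
    """
    bad = []
    for rev in range(first, last + 1):
        cmd = ["svnadmin", "dump", repo_path, "-r", str(rev), "--incremental"]
        # The dump stream itself is discarded; only the exit status matters.
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
        if result.returncode != 0:
            bad.append(rev)
    return bad
```

Called as find_bad_revisions('/export/rescue', 1, 506), this would list every
revision for which svnadmin exits with an error.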

I suspected defective RAM, installed Memtest86 and ran it, but after letting
it run through the night no defective RAM was found.

Then today I downloaded BDB 4.1.25 and SVN 1.0.2, and compiled/linked SVN
1.0.2 against the BDB 4.1.25. Results are the very same as with SVN 0.37/BDB

I then also downloaded BDB 4.2.52, linked another SVN 1.0.2 against it and ran
its 'svnadmin recover'... this time the recover came back almost immediately,
reporting the correct current revision number (r506). But the dump still
doesn't work.

What's also interesting is that when trying to dump the revision 336 I get
different checksums for SVN 0.37 and SVN 1.0.2:

SVN 0.37/BDB 4.0.14:
svn: Checksum mismatch on rep '4pi':
   expected: fe339b5a4133f58051f1f15380f46413
     actual: 4d21ea0c68cdde21698bc99e86eab179

SVN 1.0.2/BDB 4.1.25:
svn: Checksum mismatch on rep '4pi':
   expected: fe339b5a4133f58051f1f15380f46413
     actual: 7fc67fdbf244751f68f229270c97c3de

SVN 1.0.2/BDB 4.2.52:
svn: Checksum mismatch on rep '4pi':
   expected: fe339b5a4133f58051f1f15380f46413
     actual: 7fc67fdbf244751f68f229270c97c3de

I also tried experimenting with 'db_recover' but the 4.0.14 and 4.1.25
versions both yield the 'Input/output error'. And the 4.2.52 'db_recover'
returns immediately, no output whatsoever.

It's very important to get that repository up and running again as my boss is
fed up with SubVersion and will force me to ditch it if I can't get it
running again... the repository contains our main products.

So can anyone give me a hint what else I might try? Or, if nothing else
works, how I could possibly try to fix that transaction by hand (I don't fear
binary editors ;-)

Thanks in advance,

Marc Haisenko
Webport IT-Services GmbH
mailto: haisenko@webport.de
