Re: Apparent repository corruption resulting from dump/load

From: Garrett Rooney <rooneg_at_electricjellyfish.net>
Date: 2006-08-29 19:55:17 CEST

On 8/29/06, Dan Mercer <dmercer@8kb.net> wrote:
> Garrett Rooney wrote:
> > On 8/25/06, Dan Mercer <dmercer@8kb.net> wrote:
> >> Howdy Subversives,
> >>
> >> I have recently encountered what appears to be repository corruption
> >> resulting from a dump and load of my subversion repository.
> snip
> >>
> >>
> >> I've used svn dump/load without issue for 6-8 months now and haven't
> >> encountered anything like this. Can anyone offer any advice or
> >> assistance in getting to the root cause?
> >
> > Is the problem reproducible (i.e. if you recreate the repository and
> > reload do you get the same problem)? Is it specific to that machine,
> > or can you make it happen on any system?
> I've started a reload on another system to test this. It is a slow
> operation and will take a while as the repository is at ~90,000
> revisions. To recover in the short term, I rsync'd the repository to the
> mirror, which worked fine.
> > If not the entire thing, just the problematic revisions (both the
> > good and bad versions of them) would probably help.
> I built a list of the mismatched revs between the 'good' and 'corrupt'
> repositories. You can download a tarball with these from here:
>
> http://dmercer.info/svndiag.tgz
>
> I appreciate the assistance in getting to the bottom of this.

Interesting. The errors appear to be very specific, almost entirely
composed of offsets that are slightly off, and checksum mismatches
that are the result of the offsets being different. There's also a
case where some of the delta reps appear to be out of order, but I'm
not entirely sure that's a problem (well, it's certainly weird, but it
could be caused by other problems earlier on).

I'm curious if this all resulted because of the error in the first
file. For example, if you diff the earliest two revfiles that differ,
you get something like this:

--- corrupt/66517 2006-08-28 19:52:34.000000000 -0400
+++ good/66517 2006-08-28 19:52:48.000000000 -0400
@@ -591,7 +591,7 @@
 K 5
 hosts
 V 20
-dir 1.0.r66500/14124
+dir 1.0.r66500/14126
 K 5
 roles
 V 20
@@ -606,7 +606,7 @@
 type: dir
 pred: 0.0.r66516/7196
 count: 66517
-text: 66517 7184 110 110 892ec46e6a2ae7590f4ac95e68b3a8af
+text: 66517 7184 110 110 104b6ff5a69cdd523a58c37d9cb2e594
 cpath: /
 copyroot: 0 /

I think that first bit is an incorrect offset into the revfile for
r66500, which we don't have here, since apparently it matches
perfectly on both sides. The second bit is a checksum that covers the
first bit; it's only different because that number is off. At least, I
think that's what's happening here. If the problem is reproducible,
I'd love to know what happens if you do a load up to r66517, stop,
correct the contents of r66517 manually, then continue to load the
subsequent revisions.
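
To make the reading of that diff concrete, here is a minimal Python sketch that takes apart the two kinds of lines involved. It assumes the usual fsfs layout, where a node-rev ID looks like node.copy.rREV/OFFSET and a "text:" line carries revision, offset, size, expanded size and MD5. The function names are made up for illustration and are not part of Subversion, and the last part only shows that a changed offset changes the digest of the entry bytes; it does not try to reproduce the real checksums.

```python
import hashlib
import re


def parse_noderev_id(entry):
    """Split a directory entry value such as 'dir 1.0.r66500/14124'.

    Assumed layout: '<kind> <node_id>.<copy_id>.r<revision>/<offset>'.
    """
    kind, ident = entry.split(" ", 1)
    match = re.fullmatch(r"([^.]+)\.([^.]+)\.r(\d+)/(\d+)", ident)
    if match is None:
        raise ValueError("unrecognised node-rev id: %r" % ident)
    node_id, copy_id, revision, offset = match.groups()
    return {
        "kind": kind,
        "node_id": node_id,
        "copy_id": copy_id,
        "revision": int(revision),
        "offset": int(offset),
    }


def parse_text_rep(line):
    """Split a 'text:' line into its five fields.

    Assumed layout: 'text: <revision> <offset> <size> <expanded_size> <md5>'.
    """
    fields = line.split()
    if len(fields) != 6 or fields[0] != "text:":
        raise ValueError("unrecognised text line: %r" % line)
    return {
        "revision": int(fields[1]),
        "offset": int(fields[2]),
        "size": int(fields[3]),
        "expanded_size": int(fields[4]),
        "md5": fields[5],
    }


# The two versions of the 'hosts' entry from the diff above.
corrupt_entry = parse_noderev_id("dir 1.0.r66500/14124")
good_entry = parse_noderev_id("dir 1.0.r66500/14126")
changed = [key for key in good_entry if good_entry[key] != corrupt_entry[key]]
offset_delta = good_entry["offset"] - corrupt_entry["offset"]

# The two versions of the root directory's text rep line.
corrupt_rep = parse_text_rep("text: 66517 7184 110 110 892ec46e6a2ae7590f4ac95e68b3a8af")
good_rep = parse_text_rep("text: 66517 7184 110 110 104b6ff5a69cdd523a58c37d9cb2e594")
rep_changed = [key for key in good_rep if good_rep[key] != corrupt_rep[key]]

# Illustration only: hashing the entry bytes with each offset gives
# different digests, which is why the md5 field moves with the offset.
corrupt_md5 = hashlib.md5(b"K 5\nhosts\nV 20\ndir 1.0.r66500/14124\n").hexdigest()
good_md5 = hashlib.md5(b"K 5\nhosts\nV 20\ndir 1.0.r66500/14126\n").hexdigest()

print(changed, offset_delta, rep_changed, corrupt_md5 != good_md5)
```

Only the offset differs between the two entries, and only the md5 field differs between the two text lines, which fits the idea that the checksum mismatch is a consequence of the offset being off.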

This doesn't, of course, give us any clue WHY that rev is different,
but it's an interesting starting point, and would prove my theory that
everything else is just a side effect of that initial error.
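
For anyone who wants to check which revfile is the first to differ, a rough sketch along these lines might help. It assumes two directories of revision files named by revision number, like the good/ and corrupt/ directories in the diff above; the function name differing_revs is made up for this example.

```python
import filecmp
from pathlib import Path


def differing_revs(good_dir, corrupt_dir):
    """Return the revision numbers whose files differ, in ascending order.

    Only files that have a purely numeric name and exist in both
    directories are compared; contents are compared byte for byte.
    """
    good = Path(good_dir)
    corrupt = Path(corrupt_dir)
    revs = sorted(
        int(path.name)
        for path in good.iterdir()
        if path.name.isdigit() and (corrupt / path.name).is_file()
    )
    return [
        rev
        for rev in revs
        if not filecmp.cmp(good / str(rev), corrupt / str(rev), shallow=False)
    ]
```

Calling differing_revs("good", "corrupt") and looking at the first element gives the earliest revision to diff by hand. If correcting that one revision during a reload makes the rest of the list disappear, that would support the side-effect theory.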

If anyone else with more experience in debugging fsfs problems wants
to jump in, I'd appreciate it. This is really the first time I've
tried to do this sort of thing...

-garrett

Received on Tue Aug 29 20:30:45 2006

This is an archived mail posted to the Subversion Users mailing list.
