
Re: FSFS breakages

From: John Szakmeister <john_at_szakmeister.net>
Date: 2005-11-23 00:33:34 CET

On Tuesday 22 November 2005 11:19, Malcolm Rowe wrote:
> On Tue, Nov 22, 2005 at 10:18:36AM -0500, John Szakmeister wrote:
> > man-made mistake. In the wild, the FSFS corruptions that I have seen
> > have had broken svndiffs, or the text rep was pointing to the wrong
> > location, or a mix of both.
>
> Out of interest, do we have any idea why these problems are occurring?
> (Particular filesystems, distributions, versions of Subversion?).
> Is it something that a broken client can cause?

It's not clear whether a broken client can induce the problem or not. I've
helped 3 people on the users list recover their repositories, 1 other fixed
his himself, and one more mentioned a similar issue but no longer had the
broken repository around to examine. In one case I got access to the
entire repository. In another case, I was only given access to the broken
revision. In the 3rd case, we used my tool to locate the problem, and the
person patched it by hand. In those 4 cases, I was able to ask about their
setup, and I also obtained logs about the broken revision using my tool (it
basically pulls out the various metadata and decodes the svndiff windows,
but its output contains no actual file data, which made people more
comfortable when dealing with proprietary data). Here's what I've seen so
far:
 * At least 4 instances involved Subversion 1.2.1 on the server-side
 * Most were using some form of Red Hat or Fedora Core
 * All were using mod_dav_svn in the backend
 * In every case it was a delta being stored (whether it be self-compressed
   or against another revision).
 * In every case the text rep offset was pointing after the expected start
   of the representation.
 * In every case there was an extra block of data present in the svndiff. In
   one case, it appeared that the extra data was actually a repeat of a block
   elsewhere in the stream.
 * In every case the actual svndiff contents were fine (there were no bad
   instructions). The windows themselves seemed to be complete.
 * In every case, there was only one file that exhibited this corruption; all
   other files were fine.
 * In every case, all other offsets within the file pointed exactly where they
   should (meaning that somehow the data was there when we wrote the revision
   out).
 * In every case, it was a file that was affected. Moreover, it was a binary
   file that was affected.
 * In one case, I was actually able to recover the contents of the file
   completely (the very start of the svndiff stream was there).
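
For anyone who wants to check the "offset points after the expected start"
symptom on a revision file of their own, here is a minimal sketch. It is not
the tool mentioned above. It assumes the FSFS revision file layout in which a
representation begins with a "PLAIN" or "DELTA" header line, and the function
names are made up for illustration. Binary file contents can contain those
bytes by chance, so treat a hit as a hint, not proof.

```python
import re
from typing import Optional

# Header line that opens a representation (assumed layout):
# "PLAIN\n", "DELTA\n" (self-delta), or "DELTA <rev> <offset> <length>\n".
REP_HEADER = re.compile(rb"(?:PLAIN|DELTA(?: \d+ \d+ \d+)?)\n")


def offset_hits_rep_header(rev_data: bytes, offset: int) -> bool:
    """Return True if a representation header starts exactly at `offset`."""
    return REP_HEADER.match(rev_data, offset) is not None


def nearest_header_before(rev_data: bytes, offset: int) -> Optional[int]:
    """Return the start of the last header at or before `offset`, or None."""
    best = None
    for m in REP_HEADER.finditer(rev_data):
        if m.start() > offset:
            break
        best = m.start()
    return best
```

If offset_hits_rep_header() is False for the offset recorded in the node-rev's
text line, nearest_header_before() suggests where the representation may
really start, which is the kind of by-hand patch described above.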

> One of FSFS's benefits is its write-only-ness. That should make it harder
> for corruption to occur, but it doesn't help if we're writing a
> corrupt revision in the first place!

I think you mean "read-only-ness", but I definitely agree. I think there is
something subtly wrong, but I haven't been able to turn up the culprit.
Having the extra data block while all the offsets turn out all right implies
(to me, anyway) that we did something wrong.

I will say that in 2 cases, they discovered failing hardware (the hard drive
in both cases). However, I'd expect the corruption to be more devastating
if that were truly the root cause of the problem. Also, almost everyone had
only one such corruption, except for the last guy I helped, who had 3 such
problems.

I have notes if you want to see them. :-)

-John

Received on Wed Nov 23 00:34:26 2005

This is an archived mail posted to the Subversion Dev mailing list.