
Re: FSFS breakages

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2005-11-28 00:11:04 CET

On Tue, Nov 22, 2005 at 06:33:34PM -0500, John Szakmeister wrote:
> [lots of words about a bug in FSFS]

Ok, there's quite a lot of information there.

I've a couple of thoughts, some of which are probably obvious:

* The fact that a lot of the problems were on Subversion 1.2.1, Red Hat,
and mod_dav_svn may simply reflect how widely those are used in general,
rather than indicating any correlation with this problem.
 - Just to eliminate the possibility, do you have any idea whether the
 Red Hat/Fedora Subversion packages contain any local patches?

* I'm not sure I fully understand the problem yet, but I can't see how
failing hardware alone could have caused us to fail in the way I think
you're describing.

* There don't appear to have been any major changes to FSFS since 1.2.0,
so this bug, if it is in Subversion itself, probably still exists.

* You mentioned that you managed to get hold of the repository in some
cases. Did you try re-running the transaction to see if the problem
was reproducible or not? (in the cases where you were able to restore
the file's contents, naturally).

* You mentioned that it was only delta representations for binary files
that were affected. Were they particularly large? Were the original
files compressed? (If they were compressed, the self-compressed deltas
would tend to be close to the size of the file, I suspect - and depending
upon the change, regular deltas might be as well)
 - I wonder if the fact that it was binary files was just due to the
 fact that, proportionally, they'd be more likely to take up most of
 the space in a rev file.

To the specific problem:
> * In every case there was an extra block of data present in the svndiff. In
> one case, it appeared that the extra data was actually a repeat of block
> elsewhere in the stream.
> * In every case the actual svndiff contents were fine (there were no bad
> instructions). The windows themselves seemed to be complete.
> * In every case, all other offsets within the file pointed exactly where they
> should (meaning that somehow the data was there when we wrote the revision
> out).
> * In one case, I was actually able to recover the contents of the file
> completely (the very start of the svndiff stream was there).
>

I'm not _quite_ sure I get exactly what the problem was. When you say
that there was an extra block of data in the svndiff, do you mean in the
svndiff itself, or in the representation in the rev file? In other words,
was it the DELTA-ENDREP that was corrupt (containing a valid svndiff
and something else), or was it the svndiff itself that was corrupt
(containing garbage after the used 'new data', or similar)?

Where was the extra block of data? Before, after, or inside the correct
data? Did it overwrite any valid data, or was it just 'extra'? (I
guess if you couldn't recover the representation in all the cases,
it was an overwrite?)

In one case, the extra data was a repeat (of what kind of data?).
What was it in the other cases?

In addition to the extra data, what other problems did you see? I think
you mentioned that the node-rev had a 'text' <offset> that pointed in
the wrong place? What did it point to, if anything?

I don't understand 'all other offsets pointed where they should' -
what other offsets are you referring to? (that would indicate that the
data was written correctly originally).

I've spent a while tonight taking a look at the FSFS write process --
and it looks pretty straightforward. Particularly, I can't see how
'DELTA' can be immediately followed by anything other than the start of
an svndiff stream, nor how the offsets in 'text:' can be anything other
than to what we've already written.
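
As a rough illustration of the invariant described above, here is a
minimal Python sketch that checks one representation in a rev file. It
is not taken from Subversion: check_rep() is a hypothetical helper, and
it assumes that a representation starts with a "PLAIN" or "DELTA" header
line, that a DELTA body begins with the svndiff magic "SVN" plus a
version byte, that the body is followed by "ENDREP", and that the
<offset> and <length> come from the node-rev's 'text:' line with
<length> counting only the bytes between the header line and ENDREP.

```python
import re

# Assumed header forms: "PLAIN", "DELTA", or "DELTA <rev> <offset> <length>".
REP_HEADER = re.compile(rb"(PLAIN|DELTA(?: \d+ \d+ \d+)?)\n")
ENDREP = b"ENDREP\n"


def check_rep(rev_data, offset, length):
    """Return a list of problems found for the representation at `offset`.

    Hypothetical helper: `rev_data` is the raw bytes of a rev file, and
    `offset`/`length` are taken from a node-rev 'text:' line.
    """
    m = REP_HEADER.match(rev_data, offset)
    if m is None:
        return ["no PLAIN/DELTA header at offset %d" % offset]

    problems = []
    body_start = m.end()
    body_end = body_start + length

    # A DELTA header should be immediately followed by an svndiff stream.
    if m.group(1).startswith(b"DELTA"):
        magic = rev_data[body_start:body_start + 4]
        if len(magic) < 4 or magic[:3] != b"SVN":
            problems.append("DELTA is not followed by an svndiff stream")

    # The trailer should sit exactly `length` bytes after the header line.
    if rev_data[body_end:body_end + len(ENDREP)] != ENDREP:
        problems.append("ENDREP not found %d bytes after the header" % length)

    return problems
```

An extra block of data inside the representation would show up here as
a missing ENDREP at the expected position, whereas a 'text:' offset
pointing at the wrong place would show up as a missing header.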

The only thing I can really think of is that we're either corrupting one
of the structures in memory - a pool lifetime issue, maybe? - or that
we're corrupting the file when we rewrite it, at transaction commit time.
Neither looks particularly likely.

> I have notes if you want to see them. :-)
>

Ok, yes. It might be a waste of time (we might not be able to work out
what broke), but then again, it might not.

Regards,
Malcolm

Received on Mon Nov 28 00:11:44 2005

This is an archived mail posted to the Subversion Dev mailing list.