
Re: FSFS Issue...

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2006-05-30 03:21:48 CEST


I've copied this to the list. I hope you don't mind; it might be good
to get some other input, and it may also spur further discussion.

On Mon, May 29, 2006 at 10:22:11PM +0530, Ranjit Kenaudekar wrote:
> John mentioned your name saying that you are/were working on this
> corruption issue. Could you please let me know the status on this? Is it
> resolved in 1.3.1? If not resolved, what is the timeframe you are
> looking to fix it?

[ For those not following this problem, John Szakmeister and I discovered
(http://svn.haxx.se/dev/archive-2005-12/0159.shtml et seq, plus some
private email) a way in which FSFS revisions can get corrupted when first
written, if a representation write fails and is retried. Each write uses
a different APR file handle, individually buffered, and so when the first
file handle is closed, any remaining buffered data overwrites the start
of the second rep-write. The root cause of the initial write failure
is unknown. ]
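
A rough illustration of that mechanism, using Python's buffered file
objects as a stand-in for APR buffered files. This is a sketch of the
general effect only, not Subversion or APR code; the file name, the
payloads and the function name are all made up for the example.

```python
import os
import tempfile


def demonstrate_stale_buffer_overwrite():
    """Return the file contents after a retried write via two buffered handles."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "rev")  # hypothetical file name

        # Content that earlier, successful writes already put on disk.
        with open(path, "wb") as f:
            f.write(b"HEADER\n")

        # First attempt: the data stays in this handle's private buffer.
        first = open(path, "r+b", buffering=4096)
        first.seek(0, os.SEEK_END)
        first.write(b"first-attempt")
        # The attempt is abandoned here, but the handle is not closed yet.

        # Retry on a fresh handle; on disk the file still ends after HEADER.
        second = open(path, "r+b", buffering=4096)
        second.seek(0, os.SEEK_END)
        second.write(b"SECOND-ATTEMPT-COMPLETE\n")
        second.close()  # flushes the complete second write

        # Closing the first handle flushes its stale buffer at the old
        # offset, on top of the start of the second write.
        first.close()

        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demonstrate_stale_buffer_overwrite())
```

The second write is complete on disk until the first handle is closed;
only then is its beginning replaced by the leftover buffered bytes.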

The situation hasn't moved since the original messages back in
December, I'm afraid. It's on the list of things I'd really like to
fix, but it's hard to fix, and we could never track down the real
cause or a way to reproduce it, so it hasn't gone anywhere.

> We actually have seen the repository corruption number of times now (5
> times till now) and want to get over with this problem ASAP.

One thing you could check is the version of APR you're using. So far,
we only have reports of this problem occurring with APR < 0.9.7, and I did
hypothesize at the time that this was possibly due to (or exacerbated by)
the fact that APR 0.9.6 and below don't report write errors on buffered
files at all. Of course, APR 0.9.7 was still fairly new then, so it
might also just be a coincidence. Then again, John mentions that he's
not seen the problem as much recently, so perhaps not?
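
To make the point about buffered write errors concrete, here is a small
Python analogy (not APR, and not a model of any particular APR
version). With a buffered file, a small write only copies into memory,
so a failure of the underlying write can only show up when the buffer
is flushed or the file is closed. A library that drops the error at
that point never reports it at all. The FailingRaw class and the
simulated error are assumptions made up for the example.

```python
import io


class FailingRaw(io.RawIOBase):
    """Hypothetical raw stream whose every write fails, like a full disk."""

    def writable(self):
        return True

    def write(self, b):
        raise OSError(28, "No space left on device (simulated)")


def buffered_write_then_flush():
    """Return (write_ok, flush_ok) for a small write through a buffer."""
    buf = io.BufferedWriter(FailingRaw(), buffer_size=4096)

    write_ok = True
    try:
        buf.write(b"small payload")  # fits in the buffer, no raw write yet
    except OSError:
        write_ok = False

    flush_ok = True
    try:
        buf.flush()  # the raw write happens here, and so does the error
    except OSError:
        flush_ok = False

    # close() retries the flush and fails again; swallow it for the demo.
    try:
        buf.close()
    except OSError:
        pass

    return write_ok, flush_ok


if __name__ == "__main__":
    print(buffered_write_then_flush())
```

The write itself appears to succeed; only the flush reveals the
failure, which is why a caller needs that error reported in order to
know that a retry left data behind in a buffer.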

The other things we'd noted are that this seemed to only be a problem for
mod_dav_svn, and that it seemed like it might be related to problems
writing to the disk. Again, our sample size is probably too small to
draw conclusions, but if you can run some tests on the disk, that might
not be a bad idea.

As far as fixing it goes, the right fix is probably to only use a single
file handle per revprop file, open for the lifetime of the transaction
(or transaction root, actually).

This probably depends upon someone writing some mechanism to associate
open FSFS transactions with arbitrary user-data (in this case, the file
handle to use for the revprop file).

I originally posted a conceptual patch that added the information as
library-private data associated with a transaction object (I think),
but this was rejected because it didn't prevent the problem (or may have
caused problems, I forget) if the same transaction was opened via two
different filesystem (svn_fs_t) objects. (There were some other reasons
as well, see http://svn.haxx.se/dev/archive-2005-12/0571.shtml.)

One possible solution that I was thinking about a week or so ago might
be to create a map of { FS GUID, TXN ID => { stuff } }, but I'm concerned
about whether we could race between creating the transaction (essentially
an on-disk operation) and creating an entry in the map (purely a memory
operation). Also, the map would have to maintain a reference-count of
open transaction roots so that the handles were closed when the last
reference was released.
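
A minimal sketch of what such a map might look like, written in Python
for brevity rather than in C against the real FSFS code. All the names
here (SharedTxnHandles, acquire, release, the opener callback) are
hypothetical. It only covers the in-memory side: one shared handle per
(FS GUID, TXN ID) key, with a reference count so that the handle is
closed when the last user releases it. It does nothing about the
possible race between creating the transaction on disk and creating
the map entry.

```python
import threading


class SharedTxnHandles:
    """Hypothetical map of (fs_uuid, txn_id) -> shared handle + refcount."""

    def __init__(self, opener):
        # opener(fs_uuid, txn_id) must return an object with a close() method.
        self._opener = opener
        self._lock = threading.Lock()
        self._entries = {}  # (fs_uuid, txn_id) -> [handle, refcount]

    def acquire(self, fs_uuid, txn_id):
        """Return the shared handle, opening it on first use."""
        key = (fs_uuid, txn_id)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [self._opener(fs_uuid, txn_id), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, fs_uuid, txn_id):
        """Drop one reference; close the handle when none remain."""
        key = (fs_uuid, txn_id)
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[key]
                entry[0].close()
```

Because the key includes the FS GUID rather than a particular svn_fs_t
object, two filesystem objects opened on the same repository within
one process would end up sharing one handle for the same transaction.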

Brane committed some code fairly recently (for BDB 4.4 support) that
looked like it did something similar, and I've been meaning to take a
look at that. But it definitely wasn't trivial.


Received on Tue May 30 03:22:33 2006

This is an archived mail posted to the Subversion Dev mailing list.