Re: [PATCH] Fixing the FSFS corruption bug, again

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2006-10-02 23:54:58 CEST

On Mon, Oct 02, 2006 at 05:24:38PM -0400, Garrett Rooney wrote:
> This seems to work just fine on OS X and Linux, FWIW, but there's at
> least one thing that worries me. The locking scheme used for managing
> the per-txn stuff really could use some commenting, since I've been
> staring at it for at least half an hour now trying to make sure I've
> got it right.
>

Heh. As I mentioned on IRC, that was part of the code I'd thought I'd
_over_ commented :-)

The relevant comments are in libsvn_fs_fs/fs.h, where the structure is
defined:

  /* A list of shared transaction objects for each transaction that is
     currently active, or NULL if none are. All access to this list,
     including the contents of the objects stored in it, is synchronised
     under TXN_LIST_LOCK. */
  fs_fs_shared_txn_data_t *txns;

I thought that was pretty clear, but I'm probably too close to the
problem; can anyone else suggest a better wording to that or any of the
other comments?

> Just to be clear, there's a single per-filesystem mutex that's used
> within the with_txnlist_lock function to serialize access to both the
> list of transactions and some of the data within the transactions. We
> grab that lock, grab the flock, then note "hey, i've got it locked",
> then drop the mutex. Later on when someone else tries to modify the
> txn they'll be able to lock the per-filesystem mutex, but at that
> point they'll see that someone else is modifying the txn, and error
> out before they try to grab the flock? Other processes will simply
> block when trying to flock.
>

s/some of the data/all of the data in that structure/, but otherwise,
that's entirely correct. Also note that 'access' in the comment above
means reading as well as writing.

> I guess the part that confused me was how we've got a single per-fs
> mutex, but per-txn lock files. In retrospect it makes sense, but it
> was a bit weird at first. Other than that, this all seems fine, and I
> think the confusion could be taken care of by some comments that sum
> up the overall locking strategy, especially the fact that the
> txn_list_lock ends up protecting both the list and the being_written
> field.
>

Well, the way I was thinking was:

- there's ->txn_list_lock, which is a mutex that protects all access to
  the txns list (and also the free-txn member).
- there's txn->being_written, which is the intra-process lock on the
  transaction. It doesn't have to be a heavyweight mutex, since we
  serialise all access to the list.
- finally, there's the transaction lockfiles themselves, which we
  lock if it's safe to attempt to do so.

> Oh, and you've also got some mega-long lines in there that need to be
> wrapped, but I'm sure you would have taken care of that before
> committing.
>

Yes, I was going to ask: is it safe to split translated messages?

_("That is, "
"like this?")

I was worried that doing so might break the translation tools.

My only concern with the approach I've taken is the possibility of
accreting a collection of transactions that have neither been committed
nor aborted; these would lie around in memory in the same way they'd lie
around on disk. On the other hand, that shouldn't be that common a
problem, should it?

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Oct 2 23:55:11 2006

This message: [ Message body ]
Next message: Lieven Govaerts: "Re: [PATCH] fix for issue 2468"
Previous message: C. Michael Pilato: "Re: svn commit: r21736 - in trunk/subversion: svnadmin tests/cmdline"
In reply to: Garrett Rooney: "Re: [PATCH] Fixing the FSFS corruption bug, again"
Next in thread: Malcolm Rowe: "Re: [PATCH] Fixing the FSFS corruption bug, again"
Reply: Malcolm Rowe: "Re: [PATCH] Fixing the FSFS corruption bug, again"
Reply: Garrett Rooney: "Re: [PATCH] Fixing the FSFS corruption bug, again"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]