Aborting a Subversion txn (transaction) involves the following steps:
1. Read the txn, making sure it hasn't been committed.
2. Delete the mutable nodes under that txn.
3. Delete any 'copies' table records created in the txn.
4. Delete any 'changes' table records created in the txn.
5. Remove the txn itself from the 'transactions' table.
Today, this is all done in a single Berkeley transaction (or "trail",
if you are familiar with the libsvn_fs code).
The problem with doing all this junk in one trail is that any locks
taken out for the purposes of removing records are held until the
trail completes. Given a big enough commit which happens to
fail, the cleanup of the failed commit could itself exhaust the
Berkeley DB lock allocation and wedge the repository.
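
To make the lock problem concrete, here is a toy Python model of it. It is only a sketch under loud assumptions: one lock per deleted record and a fixed MAX_LOCKS limit are simplifications (real Berkeley DB locking is more involved), and peak_locks() and LockTableFull are hypothetical names, not anything in libsvn_fs.

```python
# Toy model only. Assumptions: each deleted record costs one lock, and
# locks are released only when the trail that took them completes.
MAX_LOCKS = 1000  # illustrative limit, not a real Berkeley DB default


class LockTableFull(Exception):
    """Raised when a single trail would need more locks than are available."""


def peak_locks(record_count, batch_size):
    """Delete record_count records, batch_size per trail.

    Returns the largest number of locks held at any one time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    peak = 0
    remaining = record_count
    while remaining > 0:
        held = min(batch_size, remaining)  # locks pile up inside one trail
        if held > MAX_LOCKS:
            raise LockTableFull(f"trail needs {held} locks, limit is {MAX_LOCKS}")
        peak = max(peak, held)
        remaining -= held  # trail completes, its locks are released
    return peak
```

In this model, peak_locks(5000, 5000) (everything in one trail) raises LockTableFull, while peak_locks(5000, 100) peaks at 100 locks.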
Now, it would be simple to break up the steps of the cleanup into
various trails, but now we introduce timing problems. If some part of
the cleanup fails, what do we want to be left with? A txn that's
missing some or all of its 'copies' records? A txn that's missing
some or all of its 'changes' records? Or what about no txn at all,
but dangling 'copies', 'nodes', and 'changes' table records? The
cosmetic danger is that we might lose the chance to fully clean up the
txn. But the far more dangerous aspect of breaking this up into
various trails is that, in theory, someone might come along and decide
to start working on this txn again, possibly even committing it.
I need not speak of the bad things that could happen if someone was
able to successfully commit a txn that was missing some parts of its
record-keeping data whose absence wouldn't be noted in the commit_txn
To handle this cleanup process correctly, it seems we need a new state
for transactions, to denote that they are "dead". The first action of
svn_fs_abort_txn() is to set the transaction state to "dead", and then
call a new public function svn_fs_cleanup_txn() (which takes a
transaction name, not an svn_fs_txn_t object). You are *never*
allowed to open a dead transaction -- you may only call
svn_fs_cleanup_txn() on it.
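
The "dead" state rule can be sketched as a small state check. This is a hypothetical in-memory model in Python, not the libsvn_fs API: only the "dead" state comes from the proposal above, while the "active" and "committed" state names, the transactions dict, and every function name here are assumptions made for illustration.

```python
# Hypothetical model: txn name -> state string.
transactions = {}


class TxnError(Exception):
    """Raised when an operation is not allowed in the txn's current state."""


def begin_txn(name):
    transactions[name] = "active"


def mark_dead(name):
    # First action of an abort: once dead, the txn can never be resumed.
    if transactions[name] == "committed":
        raise TxnError(f"txn {name!r} is already committed")
    transactions[name] = "dead"


def open_txn(name):
    # Dead transactions may never be opened, only cleaned up.
    if transactions[name] == "dead":
        raise TxnError(f"txn {name!r} is dead; only cleanup is allowed")
    return name


def cleanup_txn(name):
    # Takes a txn name, not a txn object, and only accepts dead txns.
    if transactions.get(name) != "dead":
        raise TxnError(f"txn {name!r} is not dead")
    del transactions[name]


def abort_txn(name):
    mark_dead(name)
    cleanup_txn(name)
```

The point of the model is that marking the txn dead happens in its own step before any records are removed, so a crash during cleanup cannot leave a half-deleted txn that someone can still open or commit.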
The new svn_fs_cleanup_txn() would always resume cleanup from wherever
it left off, and always leave enough pointers around so that if
something goes wrong, the cleanup attempt can be retried/continued
later. In other words, it leaves no dangling nodes, doesn't lose
references to copies before the copies themselves are gone, etc.
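
A resumable cleanup along these lines might look like the following Python sketch. It assumes, purely for illustration, that each table is a dict keyed by txn name and that each table is cleaned in its own trail. The fail_in parameter exists only to simulate a trail failing, and all names are hypothetical rather than libsvn_fs code.

```python
TABLES = ("nodes", "copies", "changes")  # cleaned in this order, one trail each


class TrailFailed(Exception):
    """Simulates a Berkeley DB trail that fails partway through cleanup."""


def cleanup_txn(fs, name, fail_in=None):
    """Remove all records of a dead txn; safe to call again after a failure."""
    if fs["transactions"].get(name) != "dead":
        raise ValueError(f"txn {name!r} is not dead")
    for table in TABLES:
        if fail_in == table:
            raise TrailFailed(f"trail for {table!r} failed")
        # Removing an already-removed entry is a no-op, so a retry
        # simply picks up wherever the last attempt stopped.
        fs[table].pop(name, None)
    # The txn record is the last pointer to everything above, so it is
    # removed only after the dependent records are gone.
    del fs["transactions"][name]


fs = {
    "transactions": {"t1": "dead"},
    "nodes": {"t1": ["node-a", "node-b"]},
    "copies": {"t1": ["copy-1"]},
    "changes": {"t1": ["change-1"]},
}
try:
    cleanup_txn(fs, "t1", fail_in="copies")  # first attempt dies midway
except TrailFailed:
    pass
cleanup_txn(fs, "t1")  # the retry finishes the job
```

After the failed first attempt the 'nodes' entry is gone but the dead txn record is still there to point at the rest, which is what lets the second call finish.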
But then again, this could all just be overkill. Maybe it's enough to make
svn_fs_abort_txn() just delete the transaction first, and then go
about cleaning up the other stuff. If that fails, we get back an
error and there is just unreachable cruft in the filesystem that sits
around until the next dump/load cycle.
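
For comparison, the simpler alternative can be modelled the same way. Again this is a hypothetical Python sketch with dict-based tables keyed by txn name. abort_txn_simple(), unreachable_cruft(), and the fail_in parameter are illustrative names, not existing functions.

```python
TABLES = ("nodes", "copies", "changes")


class CleanupFailed(Exception):
    """Simulates a failure while removing a txn's leftover records."""


def abort_txn_simple(fs, name, fail_in=None):
    # Delete the txn record first: nobody can reopen or commit it now.
    del fs["transactions"][name]
    for table in TABLES:
        if fail_in == table:
            raise CleanupFailed(f"could not clean {table!r}")
        fs[table].pop(name, None)


def unreachable_cruft(fs):
    """List (table, txn name) pairs whose txn record no longer exists."""
    return sorted(
        (table, name)
        for table in TABLES
        for name in fs[table]
        if name not in fs["transactions"]
    )
```

If the cleanup fails in 'copies', the txn is gone but unreachable_cruft() still reports its 'copies' and 'changes' entries, which corresponds to the cruft that would sit around until the next dump/load cycle.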
Thoughts (quickly, please)?
Received on Tue Nov 11 03:59:03 2003