RE: [RFC] Conceptual locking procedure for database access (was: Re: Subversion's use of Berkeley DB [#11511])

From: Oliver Klozoff <stevieoh_at_fastmail.fm>
Date: 2004-12-14 18:14:54 CET

My Lord, Outlook sucks. It's set for 76-column formatting, but it doesn't
enforce that in the window -- it just hardwraps lines after you hit 'send'.

Take #2:

----------------------------------------------------------------------------
I've been following the thread titled 'Subversion's use of Berkeley DB
[#11511]', and the problem seems to be: We have no easy way to determine if
we need to perform recovery, and the only solution we have so far is to run
recovery every time. Of course, this sucks, because it means two people
can't perform a checkout at the same time (while one person is checking out,
the other person's process is waiting for an exclusive lock to perform
recovery). This is really bad for a public repository where a lot of people
can potentially be doing checkouts (e.g. svn.collab.net).

I have mapped out a potential locking procedure that would make it possible
to detect if any process exited uncleanly, and determine if recovery needs
to be run on the database. It should be implementable using flock() (a BSD
call rather than POSIX, but its whole-file shared/exclusive locks have Win32
equivalents, so it should work on Win32). It should also be implementable
using POSIX fcntl() locking, which is NFS-friendly but Win32-hostile.

The method, for any who are curious, is based partially on how ext2/ext3
determines if the filesystem was not cleanly unmounted previously: when
mounted, a 'needs_recovery' flag is set in the superblock. When cleanly
unmounted, the flag is cleared. If the flag is set, the filesystem wasn't
cleanly unmounted :)

Would you please offer your opinions on:

====================================================

A Reliable Way To Determine If Recovery Is Necessary
(without needing a daemon, or other such things)

new files in the repository directory:
'dirty' and 'dirty-lock'

By the way, when I say 'get' or 'wait for' a lock, I mean *wait* for the
lock (indefinitely). When I say 'try [to get]' a lock, I mean using LOCK_NB
so that we don't block if we can't get the lock.
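To make the 'get' versus 'try' distinction concrete, here is a minimal,
Unix-only Python sketch using fcntl.flock() (Python's wrapper around
flock(2)). The helper names get_lock(), try_lock() and release_lock() are
hypothetical, chosen for illustration only.

```python
import fcntl


# Hypothetical helper names; illustrative only, not an existing API.
def get_lock(fd: int, mode: int) -> None:
    """'Get' / 'wait for' a lock: block until the OS grants it."""
    fcntl.flock(fd, mode)


def try_lock(fd: int, mode: int) -> bool:
    """'Try' a lock: add LOCK_NB so we never block.

    Returns True if the lock was granted, False if another open file
    description already holds a conflicting lock.
    """
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:  # EWOULDBLOCK: someone else holds a conflicting lock
        return False


def release_lock(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
```

flock() locks belong to the open file description, so two separate open()
calls on 'dirty' conflict with each other even inside one process; that makes
the behavior easy to try out without starting a second process.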

Startup procedure:
-> wait for lock_ex on 'dirty-lock'
-> try lock_ex on 'dirty'
If that succeeds, read the file. If you read a '1', perform recovery;
otherwise, write a '1'. (Either way, 'dirty' now contains '1'.)
-> Get (or convert lock_ex to) lock_sh on 'dirty'.
-> Release lock_ex on 'dirty-lock'.
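The startup procedure can be sketched in Python roughly as follows. This is a
Unix-only sketch under stated assumptions: the function name startup(), the
recover callback (standing in for the real Berkeley DB recovery step) and the
fsync() after writing the flag are illustrative choices, not part of the
proposal.

```python
import fcntl
import os
from typing import Callable


def startup(repo_dir: str, recover: Callable[[], None]) -> int:
    """Hypothetical sketch of the proposed startup procedure.

    Returns the descriptor for 'dirty'; it must stay open, holding
    LOCK_SH, for as long as this process uses the database.
    """
    guard = os.open(os.path.join(repo_dir, "dirty-lock"),
                    os.O_RDWR | os.O_CREAT, 0o644)
    dirty = os.open(os.path.join(repo_dir, "dirty"),
                    os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(guard, fcntl.LOCK_EX)                 # wait for LOCK_EX on 'dirty-lock'
    try:
        try:
            fcntl.flock(dirty, fcntl.LOCK_EX | fcntl.LOCK_NB)  # try LOCK_EX on 'dirty'
            first = True
        except BlockingIOError:
            first = False                             # another process holds LOCK_SH
        if first:
            if os.pread(dirty, 1, 0) == b"1":
                recover()                             # last session ended uncleanly
            else:
                os.ftruncate(dirty, 0)
                os.pwrite(dirty, b"1", 0)             # mark the database possibly dirty
                os.fsync(dirty)                       # assumption: flush the flag to disk
        fcntl.flock(dirty, fcntl.LOCK_SH)             # convert to (or get) LOCK_SH
    finally:
        fcntl.flock(guard, fcntl.LOCK_UN)             # release 'dirty-lock'
        os.close(guard)
    return dirty
```

A process that finds 'dirty' already share-locked simply joins with LOCK_SH
and never touches the flag, which is what lets many checkouts run at once.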

Shutdown procedure:
-> wait for lock_ex on 'dirty-lock'
-> try to convert (with LOCK_NB) the lock_sh on 'dirty' to lock_ex
If succeeded, write '0' to 'dirty'.
-> Release lock on 'dirty'
-> Release lock on 'dirty-lock'.
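A matching sketch of the shutdown procedure, again Unix-only Python with a
hypothetical function name and an assumed fsync(). Note that flock(2) does
not guarantee an atomic LOCK_SH to LOCK_EX conversion: a failed LOCK_NB
conversion may drop the shared lock, which is harmless here because the
process releases it immediately afterwards anyway.

```python
import fcntl
import os


def shutdown(repo_dir: str, dirty: int) -> bool:
    """Hypothetical sketch of the proposed shutdown procedure.

    `dirty` is the descriptor kept from startup (holding LOCK_SH).
    Returns True if this process was the last user and marked the
    database clean.
    """
    guard = os.open(os.path.join(repo_dir, "dirty-lock"),
                    os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(guard, fcntl.LOCK_EX)                 # wait for LOCK_EX on 'dirty-lock'
    try:
        try:
            # Try to convert our LOCK_SH to LOCK_EX without blocking.
            fcntl.flock(dirty, fcntl.LOCK_EX | fcntl.LOCK_NB)
            last = True
        except BlockingIOError:
            last = False                              # someone else still holds LOCK_SH
        if last:
            os.ftruncate(dirty, 0)
            os.pwrite(dirty, b"0", 0)                 # database was closed cleanly
            os.fsync(dirty)                           # assumption: flush the flag to disk
        fcntl.flock(dirty, fcntl.LOCK_UN)             # release lock on 'dirty'
        os.close(dirty)
    finally:
        fcntl.flock(guard, fcntl.LOCK_UN)             # release 'dirty-lock'
        os.close(guard)
    return last
```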

How it works:

'dirty-lock' is required to prevent a race in which another process exits
between our attempt to get LOCK_EX and our request for LOCK_SH.
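The race can be replayed deterministically in one process by letting two
descriptors stand in for two processes (flock() locks conflict per open file
description). This is an illustrative sketch, not part of the proposal; the
function name is hypothetical. Process B is running, process A starts, fails
to get LOCK_EX, and then B runs its whole shutdown before A asks for LOCK_SH.

```python
import fcntl
import os
import tempfile


def race_without_dirty_lock() -> bytes:
    """Hypothetical demo: replay the interleaving that 'dirty-lock' prevents.

    Descriptor `b` plays a running process B; descriptor `a` plays a
    process A that is starting up. Returns what A sees in 'dirty'.
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dirty")
        b = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(b, fcntl.LOCK_SH)                 # B is running...
        os.pwrite(b, b"1", 0)                         # ...and 'dirty' says '1'

        a = os.open(path, os.O_RDWR)
        try:
            fcntl.flock(a, fcntl.LOCK_EX | fcntl.LOCK_NB)  # A: startup step 1
            a_is_first = True
        except BlockingIOError:
            a_is_first = False                        # A sees B and skips recovery
        assert not a_is_first

        # Without 'dirty-lock', B can run its entire shutdown right here.
        fcntl.flock(b, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds: A holds no lock yet
        os.ftruncate(b, 0)
        os.pwrite(b, b"0", 0)                         # B marks the database clean
        fcntl.flock(b, fcntl.LOCK_UN)
        os.close(b)

        fcntl.flock(a, fcntl.LOCK_SH)                 # A: startup step 4
        seen = os.pread(a, 1, 0)                      # A now runs with a '0' flag
        os.close(a)
        return seen
```

If A then crashes, the next process finds '0' and skips recovery. Holding
LOCK_EX on 'dirty-lock' across both the startup and shutdown steps keeps B's
shutdown out of that gap.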

Startup:

1. Try to get a LOCK_EX on 'dirty': flock(fd, LOCK_EX|LOCK_NB)
If this succeeds, we are (at present) the only user who wants to use the
database. If someone else was using the database, their LOCK_SH would
prevent us from getting LOCK_EX.

2. Check for a '1'.
The procedure is designed so that, unless the database is properly closed
and the locks released, the 'dirty' file will contain the value '1'. So if we
find a '1', the previous session ended uncleanly and we run recovery.

3. Write a '1'.
We write this out to indicate to future processes that the database may
be in an inconsistent state. Remember, we only do this if we have a LOCK_EX
on dirty -- if we are the only process who can access the database right
now.

NOTE: This must be done even if we're not going to write to the database
(e.g. this is a checkout or an update). This is because, as the
first process, we are the only ones with write access to 'dirty' -- we have
to mark the database dirty in case a writing process starts while we're
running. A tad inefficient, yes; but to prevent corruption you have to be
safe, not sorry.

4. Convert our lock to LOCK_SH (or get one, if step 1 failed).
This actually acts like an OS-managed semaphore. Holding LOCK_SH prevents
other processes from getting LOCK_EX in step 1, so they know another process
exists (and since that process would have run recovery, they don't need to).

Shutdown:

1. Try to convert our LOCK_SH on 'dirty' to LOCK_EX.
If this succeeds, then nobody else has a lock on 'dirty' -- which means we
are the last process using the file. We write a '0' out to
indicate that the database has been properly closed.
(If it fails, then somebody else is still accessing the database.)

2. Finally, release our lock on the 'dirty' file.

If, before shutdown completes cleanly, our process gets hosed and crashes
out, the following things will happen:
-> The locks are managed by the OS and will be released when it exits, so
   'dirty' and 'dirty-lock' can only be wedged if the process gets stuck in
   an infinite loop (which will create other issues).
-> The 'dirty' file will remain a '1'. Thus, the next process to access the
   database will see that, and know to run recovery.
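The claim that the OS drops the locks but leaves the flag behind can be
checked with a small Unix-only Python sketch (it relies on os.fork()). The
child marks the database dirty and then "crashes" via os._exit(), which skips
all cleanup; the function name is hypothetical and the scenario is only an
illustration.

```python
import fcntl
import os
import tempfile


def crash_leaves_flag_set() -> tuple:
    """Hypothetical demo: a child takes LOCK_SH, writes '1', and dies
    without running shutdown. Returns (parent got LOCK_EX, flag byte)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dirty")
        pid = os.fork()
        if pid == 0:
            try:
                fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
                fcntl.flock(fd, fcntl.LOCK_SH)
                os.pwrite(fd, b"1", 0)                # database marked dirty
            finally:
                os._exit(1)                           # "crash": no LOCK_UN, no '0'
        os.waitpid(pid, 0)

        fd = os.open(path, os.O_RDWR)
        try:
            # Succeeds only because the kernel released the child's lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            got_ex = True
        except BlockingIOError:
            got_ex = False
        flag = os.pread(fd, 1, 0)                     # still '1': run recovery
        os.close(fd)
        return got_ex, flag
```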

The only problem I see with this solution is that one process could crash out
and screw up the database while another process is still running; when the
other process exits, it will (incorrectly) mark the database as clean. If
needed, this could be mitigated by adding the following steps:

-> After acquiring 'dirty-lock' at startup, read the number from it, and
   write (that number + 1).
-> After acquiring 'dirty-lock' at exit, read the number from it, and write
   (that number - 1).

Then, at startup:
-> If we get a LOCK_EX on 'dirty' (we are the first process), and the number
   in the 'dirty-lock' file (after our increment) is NOT equal to 1, run
   recovery.
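A Unix-only Python sketch of the counter variant follows. The function
names, the ASCII-decimal format of the count, clamping at zero, and resetting
the count to 1 after recovery are all assumptions made for this illustration;
without some reset, a crashed process's stale increment would force recovery
on every later startup.

```python
import fcntl
import os
from typing import Callable


def _open(repo_dir: str, name: str) -> int:
    return os.open(os.path.join(repo_dir, name), os.O_RDWR | os.O_CREAT, 0o644)


def _read_count(fd: int) -> int:
    data = os.pread(fd, 32, 0).strip()
    return int(data) if data else 0                   # empty (new) file counts as 0


def _write_count(fd: int, n: int) -> None:
    os.ftruncate(fd, 0)
    os.pwrite(fd, str(n).encode(), 0)
    os.fsync(fd)


def _write_flag(fd: int, flag: bytes) -> None:
    os.ftruncate(fd, 0)
    os.pwrite(fd, flag, 0)
    os.fsync(fd)


def counted_startup(repo_dir: str, recover: Callable[[], None]) -> int:
    guard, dirty = _open(repo_dir, "dirty-lock"), _open(repo_dir, "dirty")
    fcntl.flock(guard, fcntl.LOCK_EX)
    try:
        count = _read_count(guard) + 1                # one more user
        _write_count(guard, count)
        try:
            fcntl.flock(dirty, fcntl.LOCK_EX | fcntl.LOCK_NB)
            first = True
        except BlockingIOError:
            first = False
        if first:
            if os.pread(dirty, 1, 0) == b"1" or count != 1:
                recover()
                _write_count(guard, 1)                # assumption: clear stale counts
            _write_flag(dirty, b"1")
        fcntl.flock(dirty, fcntl.LOCK_SH)
    finally:
        fcntl.flock(guard, fcntl.LOCK_UN)
        os.close(guard)
    return dirty


def counted_shutdown(repo_dir: str, dirty: int) -> bool:
    guard = _open(repo_dir, "dirty-lock")
    fcntl.flock(guard, fcntl.LOCK_EX)
    try:
        _write_count(guard, max(_read_count(guard) - 1, 0))  # one fewer user
        try:
            fcntl.flock(dirty, fcntl.LOCK_EX | fcntl.LOCK_NB)
            last = True
        except BlockingIOError:
            last = False
        if last:
            _write_flag(dirty, b"0")
        fcntl.flock(dirty, fcntl.LOCK_UN)
        os.close(dirty)
    finally:
        fcntl.flock(guard, fcntl.LOCK_UN)
        os.close(guard)
    return last
```

In the problem case (A crashes while B runs, then B exits cleanly and writes
'0'), the count stays one too high, so the next first process sees a count
other than 1 and runs recovery despite the '0' flag.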

Received on Tue Dec 14 18:16:32 2004

In reply to: Oliver Klozoff: "[RFC] Conceptual locking procedure for database access (was: Re: Subversion's use of Berkeley DB [#11511])"
