[RFC] Conceptual locking procedure for database access (was: Re: Subversion's use of Berkeley DB [#11511])

From: Oliver Klozoff <stevieoh_at_fastmail.fm>
Date: 2004-12-14 17:57:35 CET

I've been following the thread titled 'Subversion's use of Berkeley DB
[#11511]', and the problem seems to be: We have no easy way to determine if
we need to perform recovery, and the only solution we have so far is to run
recovery every time. Of course, this sucks, because it means two people
can't perform a checkout at the same time (while one person is checking out,
the other person's process is waiting for an exclusive lock to perform
recovery). This is really bad for a public repository where a lot of people
can potentially be doing checkouts (e.g. svn.collab.net).

I have mapped out a potential locking procedure that would make it possible
to detect if any process exited uncleanly, and determine if recovery needs
to be run on the database. It should be implementable using flock()
(strictly a BSD call rather than POSIX, but whole-file locking of this kind
is also available on Win32, so it should work there). It should also be
implementable using fcntl locking, which is NFS-friendly, but Win32-hostile.

The method, for any who are curious, is based partially on how ext2/ext3
determines if the filesystem was not cleanly unmounted previously: when
mounted, a 'needs_recovery' flag is set in the superblock. When cleanly
unmounted, the flag is cleared. If the flag is still set at the next mount,
the filesystem wasn't cleanly unmounted :)

Would you please offer your opinions on:

====================================================

A Reliable Way To Determine If Recovery Is Necessary
(without needing a daemon, or other such things)

new files in the repository directory:
'dirty' and 'dirty-lock'

By the way, when I say 'get' or 'wait for' a lock, I mean *wait* for the
lock (indefinitely). When I say 'try [to get]' a lock, I mean using LOCK_NB
so that we don't block if we can't get the lock.
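
To make the two terms concrete, here is a minimal sketch in Python (chosen
only for brevity; the real thing would be C inside Subversion). The helper
names wait_for_lock and try_lock are made up for illustration; fcntl.flock
and the LOCK_* constants are the standard Unix-only Python bindings.

```python
import errno
import fcntl
import os


def wait_for_lock(fd, exclusive=True):
    """'Get' / 'wait for' a lock: block indefinitely until it is granted."""
    fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)


def try_lock(fd, exclusive=True):
    """'Try' a lock: return True if granted at once, False if we would block."""
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, op | fcntl.LOCK_NB)
        return True
    except OSError as e:
        # With LOCK_NB, a conflicting lock shows up as EWOULDBLOCK/EAGAIN.
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return False
        raise
```

Note that flock() locks belong to the open file, so two separate open()
calls on the same path conflict with each other even inside one process,
which makes this easy to experiment with.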

Startup procedure:
-> wait for lock_ex on 'dirty-lock'
-> try lock_ex on 'dirty'
   If we got the lock, read the file. If it contains a '1', perform recovery.
   Otherwise, write a '1'.
-> Get (or convert lock_ex to) lock_sh on 'dirty'.
-> Release lock_ex on 'dirty-lock'.
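
The startup steps above could be sketched roughly as follows. This is only
an illustration in Python, not proposed Subversion code: the function name
startup, the run_recovery callback and the single-byte file format are all
assumptions of mine.

```python
import fcntl
import os


def startup(repo_dir, run_recovery):
    """Run the startup procedure; return the fd holding LOCK_SH on 'dirty'."""
    lock_fd = os.open(os.path.join(repo_dir, "dirty-lock"),
                      os.O_RDWR | os.O_CREAT, 0o644)
    dirty_fd = os.open(os.path.join(repo_dir, "dirty"),
                       os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(lock_fd, fcntl.LOCK_EX)        # wait for lock_ex on 'dirty-lock'
    try:
        try:
            fcntl.flock(dirty_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # try lock_ex
            first = True
        except BlockingIOError:
            first = False                      # somebody else holds LOCK_SH
        if first:
            if os.pread(dirty_fd, 1, 0) == b"1":
                run_recovery()                 # previous user exited uncleanly
            os.pwrite(dirty_fd, b"1", 0)       # mark dirty (already '1' after recovery)
            os.fsync(dirty_fd)
        fcntl.flock(dirty_fd, fcntl.LOCK_SH)   # convert to, or get, lock_sh
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)    # release lock_ex on 'dirty-lock'
        os.close(lock_fd)
    return dirty_fd
```

The caller keeps the returned descriptor open for as long as it uses the
database; closing it (or dying) is what drops the shared lock.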

Shutdown procedure:
-> wait for lock_ex on 'dirty-lock'
-> try to convert (with LOCK_NB) the lock_sh on 'dirty' to lock_ex
   If that succeeded, write '0' to 'dirty'.
-> Release lock on 'dirty'
-> Release lock on 'dirty-lock'.
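
And the matching shutdown steps, again only as a hedged Python sketch. The
function name shutdown is hypothetical, and dirty_fd is assumed to be a
descriptor on 'dirty' on which this process already holds LOCK_SH.

```python
import fcntl
import os


def shutdown(repo_dir, dirty_fd):
    """Run the shutdown procedure; return True if we marked the database clean."""
    lock_fd = os.open(os.path.join(repo_dir, "dirty-lock"),
                      os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(lock_fd, fcntl.LOCK_EX)        # wait for lock_ex on 'dirty-lock'
    try:
        try:
            # Try to convert our lock_sh to lock_ex without blocking.
            fcntl.flock(dirty_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            last = True
        except BlockingIOError:
            last = False                       # someone else still holds LOCK_SH
        if last:
            os.pwrite(dirty_fd, b"0", 0)       # last one out marks it clean
            os.fsync(dirty_fd)
        fcntl.flock(dirty_fd, fcntl.LOCK_UN)   # release lock on 'dirty'
        os.close(dirty_fd)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)    # release lock on 'dirty-lock'
        os.close(lock_fd)
    return last
```

On some systems a failed conversion drops the shared lock we had. That is
harmless here because we release the lock on 'dirty' right afterwards anyway.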

How it works:

dirty-lock is required to prevent a race: without it, another process could
exit between our failed attempt to get LOCK_EX and our subsequent wait for
LOCK_SH.

Startup:

1. Get a lock_ex on 'dirty': flock(fd, LOCK_EX|LOCK_NB)
If this succeeds, we are (at present) the only user who wants to use the
database. If someone else was using the database, their LOCK_SH would
prevent us from getting LOCK_EX.

2. Check for a '1'.
The procedure is designed so that, unless the database is properly closed
and the locks released, the 'dirty' file will contain the value '1'.

3. Write a '1'.
We write this out to indicate to future processes that the database may
be in an inconsistent state. Remember, we only do this if we have a LOCK_EX
on dirty -- if we are the only process who can access the database right
now.
NOTE: This must be done even if we're not going to write to the database
(i.e. this is a checkout, or update, or something). This is because, as the
first process, we are the only ones with write access to 'dirty' -- we have
to mark the database dirty in case a writing process starts while we're
running. A tad inefficient, yes; but to prevent corruption you have to be
safe, not sorry.

4. Convert our lock to LOCK_SH (or get one, if step 1 failed).
This actually acts like an OS-managed semaphore. Holding LOCK_SH
prevents other processes from getting LOCK_EX in step 1, so they know
another process exists (and since that process would already have run
recovery if it was needed, they don't need to).

Shutdown:

1. Try to convert our LOCK_SH on 'dirty' to LOCK_EX.
If this succeeds, then nobody else has a lock on 'dirty' -- which means
we are the last process using the file to exit. We write a '0'
out to indicate that the database has been properly closed.
(If it fails, then somebody else is still accessing the database.)

2. Finally, release our lock on the 'dirty' file.

If, before shutdown completes cleanly, our process gets hosed and crashes
out, the following things will happen:
-> The locks are managed by the OS and will be released when the process
   exits, so 'dirty' and 'dirty-lock' can only be wedged if the process gets
   stuck in an infinite loop (which will create other issues).
-> The 'dirty' file will still contain a '1'. Thus, the next process to
   access the database will see that, and know to run recovery.
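
The crash case can be tried out with a small experiment. This is a sketch
under my own assumptions: simulate_crash and needs_recovery are hypothetical
names, and the 'crash' is just a forked child calling os._exit() without
ever running the shutdown procedure.

```python
import fcntl
import os


def simulate_crash(dirty_path):
    """Fork a child that marks the database dirty and dies without shutdown."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = os.open(dirty_path, os.O_RDWR | os.O_CREAT, 0o644)
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.pwrite(fd, b"1", 0)
            fcntl.flock(fd, fcntl.LOCK_SH)
        finally:
            os._exit(1)          # die abruptly; the OS drops the lock for us
    os.waitpid(pid, 0)


def needs_recovery(dirty_path):
    """True if we are the only user and the previous user left a '1' behind."""
    fd = os.open(dirty_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False         # another live process is using the database
        return os.pread(fd, 1, 0) == b"1"
    finally:
        os.close(fd)             # closing the fd also releases the lock
```

After simulate_crash() returns, nothing is left locked, yet needs_recovery()
reports True because the flag was never cleared.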

The only problem I see with this solution is that one process can crash out
and screw up the database while another process is still running; when the
other process exits, it will (incorrectly) mark the database as clean.
If needed, this could be mitigated by adding the following steps:

-> after acquiring 'dirty-lock' at startup, read the number from it, and
   write (that number + 1).
-> after acquiring 'dirty-lock' at exit, read the number from it, and write
   (that number - 1).

Then, at startup:
-> If we get a LOCK_EX on 'dirty' (we are the first process), and the number
   in the 'dirty-lock' file is NOT equal to 1, run recovery.
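
A rough sketch of the counter bookkeeping, in Python. The names read_count,
write_count, enter and leave are hypothetical, the caller is assumed to
already hold LOCK_EX on 'dirty-lock', and resetting the counter to 1 after
recovery is my own assumption (the steps above don't say what to write back
in that case).

```python
import os


def read_count(lock_fd):
    """Read the user count stored in 'dirty-lock' (an empty file counts as 0)."""
    data = os.pread(lock_fd, 32, 0).strip()
    return int(data) if data else 0


def write_count(lock_fd, n):
    """Overwrite the user count stored in 'dirty-lock'."""
    os.ftruncate(lock_fd, 0)
    os.pwrite(lock_fd, str(n).encode(), 0)


def enter(lock_fd, is_first):
    """Startup bookkeeping; return True if recovery needs to be run.

    is_first says whether we obtained LOCK_EX on 'dirty'.
    """
    n = read_count(lock_fd) + 1
    stale = is_first and n != 1      # a crashed process never decremented
    # Assumption: once recovery has run, we are the only user, so reset to 1.
    write_count(lock_fd, 1 if stale else n)
    return stale


def leave(lock_fd):
    """Shutdown bookkeeping: one user fewer."""
    write_count(lock_fd, read_count(lock_fd) - 1)
```

For example: A and B start (count 2), B crashes, A exits cleanly (count 1)
and wrongly writes '0' to 'dirty'. The next process is first in, bumps the
count to 2, sees it isn't 1, and runs recovery anyway.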

Received on Tue Dec 14 17:59:52 2004
