Karl Fogel <email@example.com> writes:
> Brian Behlendorf <firstname.lastname@example.org> writes:
> > > Yes. You'll need to run it on a local disk.
> Do we definitely need Berkeley DB's transaction guarantees? I mean,
> those low-level transactions make things more convenient for the
> filesystem implementation, but they're not essential (if things work
> the way I think?).
> What I don't know is how *much* more convenient... :-) Jim?
I used to think it would be easy to squeak by without the usual
database-y kinds of facilities. Now, though, that seems really
Transactions are to multiple-reader, multiple-writer databases what
mutexes and condition variables are to multi-threaded programming. You
can always get by with weaker promises from the underlying medium (for
databases, the filesystem; for multi-threaded programming, the
instruction set and your memory/cache system), but it's a pain. And
in fact, the first thing you'd probably do is implement yourself some
mutexes and condition variables. :)
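To make the analogy concrete, here is a minimal Python sketch, assuming a hypothetical in-memory key/value store (`TinyStore` is made up for illustration and has nothing to do with Berkeley DB's API). A single lock plays the role of the mutex, so other threads never see a half-finished update, and a snapshot taken at the start of each update gives the all-or-nothing behavior a transaction would.

```python
import threading


class TinyStore:
    """Hypothetical in-memory key/value store (not Berkeley DB's API).

    One lock gives isolation; a snapshot gives rollback on failure.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transact(self, updates):
        """Apply every (key, value) pair, or none of them."""
        with self._lock:  # isolation: other threads wait until we finish
            snapshot = dict(self._data)
            try:
                for key, value in updates:
                    if value is None:
                        raise ValueError(f"None is not storable for {key!r}")
                    self._data[key] = value
            except Exception:
                self._data = snapshot  # atomicity: discard partial writes
                raise

    def read(self, *keys):
        with self._lock:
            return tuple(self._data.get(k) for k in keys)


if __name__ == "__main__":
    store = TinyStore()
    store.transact([("a", 0), ("b", 0)])

    def writer(n):
        # Each transaction sets "a" and "b" to the same value.
        for i in range(1000):
            store.transact([("a", n * 1000 + i), ("b", n * 1000 + i)])

    threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for _ in range(1000):
        a, b = store.read("a", "b")
        assert a == b, "a reader saw a half-applied update"
    for t in threads:
        t.join()
    print("invariant held:", store.read("a", "b"))
```

Drop the lock and the reader can catch "a" and "b" mid-update; drop the snapshot and a failed update leaves half its writes behind. That is the "implement yourself some mutexes" work the database would otherwise do for you.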
If you want recoverability, you pretty much end up implementing
something like what Berkeley DB has. If someone can suggest an
alternative to Berkeley DB that provides transactions and
recoverability, but also works over NFS, then that would be great. At
this point, it wouldn't be much trouble to switch media.
I think Greg Stein mentioned using MySQL instead of Berkeley DB. I
don't know a lot about MySQL, but my understanding was that it didn't
really care about recoverability much.
Recoverability and quick locking (i.e., no 30-second waits) were
things I was very much looking forward to getting right. I don't care
how those things happen, but cutting those corners would be extremely
disappointing. Fortunately, I don't think anyone's suggesting that.
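As a rough illustration of what recoverability buys, here is a sketch that uses Python's built-in sqlite3 module as a stand-in engine (a swapped-in technique, not Berkeley DB). The table names `nodes` and `revisions` and both helper functions are hypothetical. Two writes to two tables share one transaction, so a failure between them leaves neither table changed.

```python
import sqlite3


def open_env(path=":memory:"):
    # Hypothetical stand-in for an "environment": one connection, several tables.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS revisions (rev INTEGER PRIMARY KEY, root INTEGER)")
    conn.commit()
    return conn


def commit_revision(conn, rev, node_id, body, fail=False):
    # `with conn:` wraps the block in one transaction:
    # commit if it finishes, roll back if it raises.
    with conn:
        conn.execute("INSERT INTO nodes (id, body) VALUES (?, ?)", (node_id, body))
        if fail:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("INSERT INTO revisions (rev, root) VALUES (?, ?)", (rev, node_id))


if __name__ == "__main__":
    conn = open_env()
    commit_revision(conn, 1, 100, "root dir")
    try:
        commit_revision(conn, 2, 200, "new file", fail=True)
    except RuntimeError:
        pass
    # Neither table holds anything from the failed revision.
    print(conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])
    print(conn.execute("SELECT COUNT(*) FROM revisions").fetchone()[0])
```

Without the shared transaction, the failed call would leave an orphaned row in `nodes` that no revision points to, and cleaning that up after a crash is the part you would end up reimplementing by hand.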
Here's the manual's bit on Berkeley DB vs. NFS. There are only
certain files that need to be on a local disk. The bulk of the data
can live on NFS-mounted filesystems, although the system will run more
slowly. Note especially the bit at the end about Linux.
- "home directory" refers to the directory containing the various Unix files
that make up the DB.
- "regions" are the shared memory regions used as cache and, on
certain architectures, for locking.
- an "environment" is a group of database tables that can be protected
with a single transaction. The entire Subversion filesystem lives in
a bunch of tables within a single environment.
Berkeley DB Reference Guide:
When regions are backed by the filesystem, it is a
common error to attempt to create Berkeley DB environments backed by
remote file systems such as the Network File System (NFS) or the
Andrew File System (AFS). Remote filesystems rarely support mapping
files into process memory, and even more rarely support correct
semantics for mutexes after the attempt succeeds. For this reason, we
strongly recommend that the database environment directory reside in a
local filesystem.
For remote file systems that do allow system files to be mapped into process
memory, home directories accessed via remote file systems cannot be used
simultaneously from multiple clients. None of the commercial remote file
systems available today implement coherent, distributed shared memory for
remote-mounted files. As a result, different machines will see different
versions of these shared regions and the system behavior is undefined.
Databases, log files and temporary files may be placed on remote
filesystems, *as long as the remote filesystem fully supports standard
POSIX filesystem semantics*, although the application may incur a
performance penalty for doing so. Obviously, NFS-mounted databases
cannot be accessed from more than one Berkeley DB environment at a time
(and therefore from more than one system), since no Berkeley DB
database may be accessed from more than one Berkeley DB environment at
a time.
Some Linux releases are known to not support complete semantics for the
POSIX fsync call on NFS-mounted filesystems. No Berkeley DB files
should be placed on NFS-mounted filesystems on these systems.
Copyright Sleepycat Software
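The Linux caveat above is about fsync, the call an application relies on to get data onto stable storage. Here is a minimal Python sketch of the usual write, flush, fsync, rename sequence; `durable_write` is a hypothetical helper, not part of Berkeley DB. It can only check that fsync returned without error. On a client whose NFS fsync semantics are incomplete, the call may succeed without the data actually being safe, which is why the manual says to keep Berkeley DB files off such mounts entirely.

```python
import os
import tempfile


def durable_write(path, data: bytes):
    """Hypothetical helper: write `data` to `path`, then ask the OS to force
    it to stable storage. fsync returning only means the OS *claims*
    durability; it cannot detect a filesystem that doesn't honor it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device/server
        os.replace(tmp, path)     # atomic rename on local POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
    if os.name == "posix":
        # Sync the containing directory so the rename itself is durable.
        dfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
```

Writing to a temporary file and renaming means a crash leaves either the old contents or the new ones, never a torn mix, provided every fsync along the way really did what it promised.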
Received on Sat Oct 21 14:36:08 2006