On Sun, Jun 09, 2002 at 05:22:10PM -0400, Greg Hudson wrote:
> I didn't mean to imply that rename() was asynchronous when I said it was
> failure-atomic. But apparently the claim was still a bit too strong.
> > Also, some of the more async Unix filesystems don't even guarantee
> > that relaxed form of failure atomicity
> You mean that a failure could leave the target directory entry
> nonexistent? I'm curious which filesystems have this property. (I
> assume you're not counting things like FFS with the async flag set; such
> combinations are documented with big red flags as being unsafe.)
That's exactly the situation (and likewise, on Linux, ext2 in async
mode). Sync IO is *really* expensive. Fantastically expensive. It
slows down many common fs operations by a factor of between ten and
Fortunately, these days most Unixen have either journaled filesystems,
or something like soft updates, so we have the performance of async
writeback with the safety guarantees of sync IO.
However, this is *still* a red herring. The rename safety is largely
irrelevant. There's no benefit from renaming a file synchronously if
the file contents themselves are not safe on disk! You simply must
have some application control on IO syncing or you are not safe. And
with journaled or soft-update filesystems, the renames are safe, but
there are no guarantees when they will happen, so the syncing has to
apply to directories as well as to files.
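
To make the point about syncing both the file and its directory concrete, here is a rough Python sketch of the sequence. The name durable_replace is hypothetical, and the code assumes a platform (Linux, for instance) where a directory can be opened read-only and passed to fsync(); that is not guaranteed everywhere.

```python
import os
import tempfile

def durable_replace(path, data):
    """Replace path with data; after a crash, path should hold old or new bytes."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # file contents reach disk before the rename
        os.rename(tmp, path)
    except BaseException:
        # Clean up the temporary file if the write or the rename failed.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: the directory can be opened and fsync()ed (true on Linux).
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)  # make the new directory entry itself durable
    finally:
        os.close(dfd)
```

Without the first fsync the rename could reach disk before the data does; without the second, the data could be safe while the new name is not.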
> > but you can always simulate
> > that by using a combination of link(2) and fsync(2).
> If I'm not mistaken, fsync()ing a directory is a very Linux-specific
> concept. So if there are any non-Linux filesystems which don't provide
> the relaxed failure atomicity guarantee, I'm not sure how one would
> simulate it.
BSD soft updates is async but keeps the filesystem consistent, and
you'll still need some form of fsync to make a given update durable.
> (Also, link() fails when the target exists. How would you use it to
> emulate relaxed-failure-atomic rename?)
It depends on exactly what end results you are wanting. If you are
going to be using link() and unlink() to get the end results, you'll
often choose a series of ops which does not precisely match any
sequence of renames.
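
One such series of ops, sketched in Python, is "publish only if the name is free", which no single rename() gives you. This is only one possible reading of the link()/unlink() approach; install_new is a hypothetical name, the temporary file is assumed to have been written and fsync()ed already, and the directory fsync again assumes a Linux-like platform.

```python
import os

def install_new(tmp, target):
    """Publish tmp under the name target without ever replacing an existing file."""
    # link() fails with EEXIST (FileExistsError) if target is already there.
    os.link(tmp, target)
    # Assumption: directories can be opened read-only and fsync()ed.
    dfd = os.open(os.path.dirname(os.path.abspath(target)), os.O_RDONLY)
    try:
        os.fsync(dfd)  # the new name is on disk before the old one goes away
    finally:
        os.close(dfd)
    os.unlink(tmp)
```

At every point either the temporary name, the target name, or both refer to the file, so a crash never leaves it without a name.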
> > > (Some of these guarantees degrade or disappear for remote filesystems,
> > > particularly some vintages of NFS. I don't know all the details
> > > there.)
> > I can help on the details if you want.
> Is rename() any less concurrency-atomic or failure-atomic than with
> local filesystems?
Hah. NFS is a nightmare.
The fundamental problem on NFS is that if the server power-cycles
after completing an op but before sending the ack, then the client
will resend the request and the server may try to do it twice.
This can result in (for example) deletes succeeding but returning
ENOENT, or renames being applied twice. The types of failure modes
you get when the typical production mix of buggy clients and servers
gets together are even more painful!
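
A client that wants to survive the replayed-delete case can treat ENOENT from unlink() as "already gone" rather than as a hard error. The Python sketch below shows that idea only; unlink_tolerant is a hypothetical name, and the approach cannot tell a replayed request apart from a file that someone else removed.

```python
import errno
import os

def unlink_tolerant(path):
    """Remove path. Return True if this call saw the removal succeed,
    False if the file was reported as already missing."""
    try:
        os.unlink(path)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            # On NFS this may be our own delete, applied once and then replayed.
            return False
        raise
```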
> > Most NFS systems will offer
> > decent fcntl() locking these days, for example, and there are already
> > libraries in existence (used for things like mailbox locking) to
> > emulate NFS locking via atomic creation of temporary lock files on
> > those systems where fcntl() does not work.
> I'm still not sure how to create a lock in older NFS implementations, if
> it's even possible. fcntl() locking will work in some implementations
> but not others. open() with O_EXCL won't work in NFSv2. (Lots of lore
> on the net to corroborate this.) A Google search suggests:
NFS does not specify locking, but there are complementary protocols
which exist alongside NFS for locking and which are pretty universally
implemented these days.
> * Create a temporary file with a unique name.
> * link() it to the lock file name.
> * Check to make sure your temporary file has a link count of 2.
> I'm not sure why this would work any better than open() with O_EXCL,
Because O_EXCL only helps you if you try to apply it to the shared
lock file. The point of the above recipe is to give you the added
protection of a unique temporary name (chosen randomly) so that you
don't have collisions on the initial create at all.
> unless link() is atomic (in which case why bother with the third
The third step tells you if the link succeeded, even if the link
request reported failure due to the NFS repeated-application scenario
above. Link is still atomic.
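
The three-step recipe could be sketched in Python roughly as follows. The names acquire_lock and the ".lock-" prefix are hypothetical, and real mailbox-locking libraries add retries, timeouts and stale-lock handling that are left out here.

```python
import os
import uuid

def acquire_lock(lockpath):
    """Try to take the lock at lockpath; return True if we now hold it."""
    dirname = os.path.dirname(os.path.abspath(lockpath))
    # Step 1: a temporary file whose random name should never collide,
    # so the create does not depend on O_EXCL working.
    unique = os.path.join(dirname, ".lock-%d-%s" % (os.getpid(), uuid.uuid4().hex))
    os.close(os.open(unique, os.O_WRONLY | os.O_CREAT, 0o644))
    try:
        # Step 2: link it to the shared lock name. The reported result is
        # not trusted, since a replayed request can report a false failure.
        try:
            os.link(unique, lockpath)
        except OSError:
            pass
        # Step 3: a link count of 2 means our link really was created.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)
```

The lock is released by unlinking lockpath. If another process already holds the lock, the link() fails, the count stays at 1, and the function returns False.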
Received on Mon Jun 10 11:42:30 2002