On Sun, Mar 04, 2007 at 10:06:00PM -0800, Chris Frost wrote:
> In examining subversion's wc code I have noticed that it atomically creates
> files, including the data, using a create foo_tmp, write foo_tmp, and finally
> rename foo_tmp to foo. This is a pretty cool technique.
> However, is it true that this only works (that is, guarantees the data will
> precede rename to disk) for file systems with journal orderings like ext3's
> ordered journal? (Where data is written prior to related journaled metadata.)
> Popular systems that do not ensure this property include *BSD UFS
> (soft updates do not impose such an ordering rule afaik) and WinNT NTFS
> (metadata may precede data).
If you're asking whether, in the normal course of events, a userspace
program would be in a position to observe the rename but not be able to
read the data, the answer's no. If you're asking whether a crash at the
right time could leave you with a file that doesn't contain all the data,
then the answer's yes, as far as I'm aware.
In the FSFS filesystem implementation, we do something similar, but before
the rename we fsync() the file to ensure the data is flushed to disk,
and, on Linux, we follow up the rename by fsync-ing the target directory,
which flushes the metadata to disk. (As far as I'm aware, other operating
systems either don't provide any means to request that the metadata be
flushed, or don't need one.)
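To make that sequence concrete, here is a minimal sketch in Python of the
write, fsync, rename, fsync-the-directory pattern described above. It is
not the FSFS code (which is C); the function name durable_replace and the
".tmp-" prefix are made up for illustration, and it assumes a POSIX system
such as Linux where a directory can be opened read-only and fsync()ed.

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace `path` with `data` (bytes) so that, after a crash, the file
    holds either its old contents or all of the new contents.

    Hypothetical helper for illustration; assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the target directory so that the
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push data to disk
        # Only rename once the data has been flushed.
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Flush the directory entry so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_replace("foo", b"new contents") would then stand in for the
plain create/write/rename sequence. Each call costs two fsyncs, which is
why this sort of thing is usually reserved for a few critical points.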
FSFS needs this behaviour to guarantee durability, but it only uses it
at a few critical points. The working copy library doesn't provide the
same guarantees, and would probably be a lot slower if it tried to.
Received on Mon Mar 5 07:23:41 2007