
Experiments with FlushFileBuffers on Windows

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Tue, 16 Jun 2015 21:57:41 +0200

Hey there,

One of the links recently provided by Daniel Klima pointed
to a way to enable write caching even on USB devices.
So I can now use my Windows installation for experiments
without the risk of bricking 2 grand worth of disks by pulling
the plug dozens of times.

-- Stefan^2.

FlushFileBuffers operates on whole files, not just on the parts
written through the respective handle. Not calling it after a
rename can result in data loss. Calling it after the rename
eliminates the problem, at least in most cases.

I used the attached program to conduct 3 different experiments,
each trying a specific modification / fsync sequence. All of them
wrote to a USB stick for which OS write caching had been enabled
in Windows 7.

All tests run an unlimited number of iterations, until there is an
I/O error (e.g. caused by disconnecting the drive). For each
iteration, separate files with different contents are written
("run number xyz", repeated many times). That way, we can determine
which file contents are complete and correct and whether all files
are present. Each successful iteration is logged to the console,
and we expect the data for all logged iterations to be complete.
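A minimal sketch of how such self-identifying payloads could be generated. The attached program is not part of this text, so the naming scheme and the helper names `file_name` and `make_contents` are hypothetical; only the "run number xyz, repeated" idea comes from the mail.

```python
def file_name(iteration: int, suffix: str = "") -> str:
    """Hypothetical naming scheme: one distinct file per iteration."""
    return f"file_{iteration:06d}{suffix}"


def make_contents(iteration: int, size: int) -> bytes:
    """Repeat "run number <n>" until the payload is exactly `size` bytes.

    Every iteration embeds its own number, so stale data from another
    iteration, a run of NUL bytes or a truncated tail stands out.
    """
    line = f"run number {iteration}\n".encode("ascii")
    repeats = size // len(line) + 1
    return (line * repeats)[:size]


if __name__ == "__main__":
    payload = make_contents(42, 16 * 1024)
    print(file_name(42), len(payload), payload[:13])
```

Because the payload is a pure function of the iteration number and size, the checker after the crash can regenerate the expected bytes instead of storing them anywhere.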

The stick was yanked out at a random point in time and reconnected
after about a minute. Then chkdsk /f was run on it, and the program
output was compared with the USB stick's contents.
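The comparison step could be sketched as follows, assuming (hypothetically) that each logged iteration N left a file named `file_<N, zero-padded>` whose contents are "run number N" repeated to a known size. The outcome labels mirror the states reported below (complete, missing, wrong size, all NUL bytes, invalid data); the function names are illustrative, not from the attached program.

```python
import os
import tempfile


def expected_contents(iteration: int, size: int) -> bytes:
    # Assumed payload format: "run number <n>" repeated, cut to `size` bytes.
    line = f"run number {iteration}\n".encode("ascii")
    return (line * (size // len(line) + 1))[:size]


def classify(path: str, iteration: int, size: int) -> str:
    """Classify one file found on the stick after the chkdsk /f pass."""
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as f:
        data = f.read()
    if data == expected_contents(iteration, size):
        return "complete"
    if len(data) != size:
        return "wrong size"
    if data == bytes(size):
        return "all NUL bytes"  # correct size, but zero-filled
    return "invalid data"


def compare(logged_iterations, directory: str, size: int) -> dict:
    """Check every iteration the console reported as finished.

    The file naming (file_000042 etc.) is a hypothetical scheme.
    """
    return {
        i: classify(os.path.join(directory, f"file_{i:06d}"), i, size)
        for i in logged_iterations
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "file_000001"), "wb") as f:
            f.write(expected_contents(1, 100))
        with open(os.path.join(d, "file_000002"), "wb") as f:
            f.write(bytes(100))
        print(compare([1, 2, 3], d, 100))
```

Checking the size before the NUL test matters: a zero-filled file of the right length (metadata flushed, data not) is a different failure from a truncated one.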

Experiment 1: fsync a file written through a different handle.
Write the same contents 100x, alternating between two files. Both
files end up the same size (>1MB) and should be similarly
"important" to the OS. Close both files. Re-open the one written
last and fsync it. This re-open scenario is similar to what we do
with the protorev file.
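A rough Python sketch of the Experiment 1 sequence, as a stand-in for the attached program. It relies on two documented facts: Python's `os.fsync()` calls the CRT's `_commit()` on Windows, and `FlushFileBuffers` requires a handle with write access (hence `O_RDWR`). The function name, file names and chunk size are assumptions for illustration.

```python
import os
import tempfile


def experiment1(directory: str, iteration: int, chunk: bytes, rounds: int = 100):
    """Write `chunk` `rounds` times, alternating between two files, close
    both, then re-open the file written last and fsync it via a new handle."""
    first = os.path.join(directory, f"a_{iteration:06d}")
    second = os.path.join(directory, f"b_{iteration:06d}")
    with open(first, "wb") as fa, open(second, "wb") as fb:
        for _ in range(rounds):
            fa.write(chunk)
            fb.write(chunk)  # `second` is always the file written last
    # Both handles are closed here; nothing has been explicitly flushed.

    # Re-open through a *different* handle. Write access is requested because
    # FlushFileBuffers needs it on Windows.
    fd = os.open(second, os.O_RDWR)
    try:
        os.fsync(fd)  # on Windows this goes through the CRT's _commit()
    finally:
        os.close(fd)
    return first, second


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # 100 rounds x 12 KiB = ~1.2 MB per file, matching the ">1MB" setup.
        a, b = experiment1(d, 1, b"x" * 12 * 1024)
        print(os.path.getsize(a), os.path.getsize(b))
```

Only `second` gets flushed; `first` serves as the control whose unflushed contents are checked after pulling the plug.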

* 10 runs were made, between 17 and 84 iterations each.
* 10x, the fsync'ed file was present and its contents were complete.
* 10x, the non-synced files were present and showed the
  correct file size. The contents of the last few of them were
  all NUL bytes.

Re-opening a file and fsync'ing it flushes *all* content changes
for that file, at least on Windows. So the way we handle the
protorev file is correct.

Experiment 2: fsync before but not after rename
This mimics the core of our "move-in-place" logic: Write a
smallish file (here: 10..20 kB, so it does not get folded into the
MFT) under a temporary name, fsync and close it. Then rename it to
its final name in the same folder.
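The Experiment 2 sequence might look like this in Python (a sketch, not the attached program; the function name and the `.temp` naming are assumptions, the latter matching the file names mentioned below). `os.replace` is used rather than `os.rename` because on Windows `os.rename` fails when the target already exists.

```python
import os
import tempfile


def move_in_place(directory: str, final_name: str, data: bytes) -> str:
    """Experiment 2: write under a temporary name, fsync, close, rename.

    There is deliberately NO flush after the rename.
    """
    temp_path = os.path.join(directory, final_name + ".temp")
    final_path = os.path.join(directory, final_name)
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush the file to the device
    # The with-block has closed the file; now rename within the same folder.
    os.replace(temp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        payload = b"run number 7\n" * 1200  # 15,600 bytes, within 10..20 kB
        path = move_in_place(d, "file_000007", payload)
        print(path, os.path.getsize(path))
```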

* 5 runs were made, between 182 and 435 iterations each.
* 1x the final file existed with the correct contents
* 3x the .temp file existed for the last completed iteration.
* 1x even the final file for the previous iteration contained
  NULs. After that run, chkdsk reported and fixed a large
  number of issues.

Not fsync'ing after rename can lead to data loss, even with
NTFS. IOW, we don't have transactional guarantees for
"commit" on Windows servers at the moment.

The last case, with the more severe corruption, may be due
to the storage device not handling its buffers correctly.
The only thing we can do about it is tell people to use
battery-backed storage.

Experiment 3: fsync before *and* after rename
Same as above, but re-open the file after the rename and fsync it.
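Extending the move-in-place sketch with the second flush gives roughly the following. As before, names are hypothetical and `os.fsync` on a write-enabled descriptor stands in for calling FlushFileBuffers directly.

```python
import os
import tempfile


def fsync_path(path: str) -> None:
    """Re-open `path` with write access and flush it to the device."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def move_in_place_synced(directory: str, final_name: str, data: bytes) -> str:
    """Experiment 3: fsync before *and* after the rename."""
    temp_path = os.path.join(directory, final_name + ".temp")
    final_path = os.path.join(directory, final_name)
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # first flush: data under the temporary name
    os.replace(temp_path, final_path)
    fsync_path(final_path)    # second flush: the file under its final name
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = move_in_place_synced(d, "file_000008", b"run number 8\n" * 1200)
        print(p, os.path.getsize(p))
```

On POSIX systems the usual idiom for making a rename durable is to fsync the containing directory; this sketch only follows the re-open-and-flush approach tested here.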

* 10 runs were made, between 127 and 1984 iterations each.
* 7x the final file existed with the correct contents
* 1x the next temp file already existed with size 0
  (this is also a correct state; the last complete iteration's
   final file existed with the correct contents)
* 1x the next temp file already existed with the correct contents
  (correct, same as before)
* 1x the last final file was missing, there was no temp file
  and the previous final file contained invalid data. After
  that run, chkdsk fixed various issues.
  It was also the run with the most iterations.

In 90% of the runs (9 of 10), fsync'ing after rename resulted in
correct disk contents. This is much better than the results of
Experiment 2, where only 1 of 5 runs ended in a correct state.
The remaining failure may be due to limitations of the storage
device; a similar failure was observed in Exp. 2 as well.
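The success rates can be recomputed from the run lists above. This small tally only uses the counts reported in the mail; the outcome labels are paraphrases, and "correct" follows the mail's own classification (a size-0 or complete next temp file counts as correct).

```python
from fractions import Fraction

# Per-run outcomes as listed above (counts copied from the mail).
EXPERIMENT_2 = {"final file correct": 1, "only .temp file": 3, "corrupted": 1}
EXPERIMENT_3 = {
    "final file correct": 7,
    "next temp, size 0": 1,
    "next temp, complete": 1,
    "corrupted": 1,
}
# Outcomes the mail counts as a correct on-disk state.
CORRECT = {"final file correct", "next temp, size 0", "next temp, complete"}


def share_correct(outcomes: dict) -> Fraction:
    """Fraction of runs that ended in a correct on-disk state."""
    good = sum(n for outcome, n in outcomes.items() if outcome in CORRECT)
    return Fraction(good, sum(outcomes.values()))


if __name__ == "__main__":
    for name, outcomes in (("Exp. 2", EXPERIMENT_2), ("Exp. 3", EXPERIMENT_3)):
        print(name, f"{float(share_correct(outcomes)):.0%}")
```

This prints 20% for Exp. 2 and 90% for Exp. 3. With only 5 and 10 runs, these are rough indications rather than precise failure rates.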

Received on 2015-06-16 21:57:46 CEST

This is an archived mail posted to the Subversion Dev mailing list.