Re: Efficient and effective fsync during commit

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Fri, 29 May 2015 17:55:20 +0200

On Fri, May 29, 2015 at 4:14 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> On 28 May 2015 at 20:47, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> wrote:
>> Hi all,
>>
>> Most of us would agree that the way we fsync FS changes
>> in FSFS and FSX is slow (~10 commits / sec on a SSD,
>> YMMV) and not even all changes are fully fsync'ed
>> (repo creation, upgrade).
>>
> The first question is: is it really a problem?

Recently, we had customers wondering why their servers
wouldn't serve more than 20 commits/s (even on enterprise
SSDs and with various OS file system tuning options).
With QA bots constantly creating snapshots and tags,
there isn't too much head room anymore.

> I mean that commits usually
> are not that frequent. There are maintenance tasks like 'svnadmin load'
> that perform commits very often, but it could be fixed with
> '--fsfs-no-sync' option to 'svnadmin load' like we had for BDB.

That would be a workable approach, at the cost of adding
a bunch of if statements. It would not help for svnsync, though.

>> From a high-level perspective, a commit is a simple
>> 3-step process:
>>
>> 1. Write rev contents & props to their final location.
>> They are not accessible until 'current' gets bumped.
>> Write the new 'current' contents to a temp file.
>> 2. Fsync everything we wrote in step 1 to disk.
>> Still not visible to any other FS API user.
>> 3. Atomically switch to the new 'current' contents and
>> fsync that change.
>>
>> Today, we fsync "locally" as part of whatever copy or
>> move operation we need. That is inefficient because
>> on POSIX platforms, we tend to fsync the same folder
>> more than once, and on Windows we would need to sync
>> the files twice (content before and metadata after the
>> rename). Basically, missing the context info, we need
>> to play it safe and do much more work than we would
>> actually have to.
>>
>> In the future, we should implement step 1 as simple
>> non-fsync'ing file operations. Then explicitly sync every
>> file, and on POSIX the folders, once. Step 2 does not
>> have any atomicity requirements. Finally, do the 'current'
>> rename. This also only requires a single fsync b/c
>> the temp file will be in the same folder.
>>
>> On top of that, all operations in step 2 can be run
>> concurrently. I did that for FSX on Linux using aio_fsync
>> and it got 3x as fast. Windows can do something similar.
>> I wrapped that functionality into a "batch_fsync" object
>> with a few methods on it. You simply push paths into it,
>> it drops duplicates, and finally you ask it to fsync all.
>>
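As a rough illustration of the "push paths, drop duplicates, fsync all" idea quoted above, here is a minimal POSIX sketch. It is not the FSX code: the names batch_fsync_t, batch_add, batch_run, write_file and commit_sketch, the file names, and the fixed-size array are all hypothetical. It also syncs sequentially with plain fsync() rather than concurrently with aio_fsync, and it assumes a Linux-like system where a folder can be opened read-only and fsync'ed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BATCH_MAX 64

/* Hypothetical batch object: unique paths waiting to be fsync'ed. */
typedef struct batch_fsync_t {
    char *paths[BATCH_MAX];
    int count;
} batch_fsync_t;

/* Queue PATH unless it is already queued.
   Returns 0 on success, -1 if the batch is full or memory runs out. */
static int batch_add(batch_fsync_t *b, const char *path)
{
    assert(b != NULL && path != NULL);
    for (int i = 0; i < b->count; i++)
        if (strcmp(b->paths[i], path) == 0)
            return 0;                      /* duplicate, nothing to do */
    if (b->count == BATCH_MAX)
        return -1;
    b->paths[b->count] = strdup(path);
    if (b->paths[b->count] == NULL)
        return -1;
    b->count++;
    return 0;
}

/* fsync every queued file or folder once, then empty the batch.
   Returns 0 if all syncs succeeded, -1 otherwise. */
static int batch_run(batch_fsync_t *b)
{
    int result = 0;
    for (int i = 0; i < b->count; i++) {
        int fd = open(b->paths[i], O_RDONLY);
        if (fd < 0 || fsync(fd) != 0)
            result = -1;
        if (fd >= 0)
            close(fd);
        free(b->paths[i]);
    }
    b->count = 0;
    return result;
}

/* Step 1 helper: plain buffered write, deliberately without fsync. */
static int write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int failed = (fputs(text, f) == EOF);
    return (fclose(f) != 0 || failed) ? -1 : 0;
}

/* The three commit steps against an existing folder DIR.
   "rev-1", "current.tmp" and "current" are illustrative names. */
static int commit_sketch(const char *dir)
{
    char rev[512], tmp[512], cur[512];
    batch_fsync_t batch = { {0}, 0 };

    snprintf(rev, sizeof rev, "%s/rev-1", dir);
    snprintf(tmp, sizeof tmp, "%s/current.tmp", dir);
    snprintf(cur, sizeof cur, "%s/current", dir);

    /* Step 1: write everything, no fsync yet. */
    if (write_file(rev, "rev data\n") || write_file(tmp, "1\n"))
        return -1;

    /* Step 2: sync each file and the folder exactly once. */
    batch_add(&batch, rev);
    batch_add(&batch, tmp);
    batch_add(&batch, dir);
    batch_add(&batch, dir);                /* dropped as a duplicate */
    if (batch_run(&batch) != 0)
        return -1;

    /* Step 3: atomic switch, then a single folder fsync. */
    if (rename(tmp, cur) != 0)
        return -1;
    batch_add(&batch, dir);
    return batch_run(&batch);
}
```

Calling commit_sketch() on an existing, writable folder leaves a 'current' file behind and issues four fsync calls in total (two files plus the folder in step 2, the folder again in step 3). A real implementation would replace the loop in batch_run with concurrent syncs; that part is left out here on purpose.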
> I didn't find any documentation that calling FlushFileBuffers() on one
> handle flushes changes (data and metadata) made using other handle.
> I'm -1 to rely on this without official documentation proof. At least
> for FSFS.

If you assume / suspect that FlushFileBuffers() only
operates on the open handle, i.e. only flushes those
changes made through that handle, then you assume
that our commit process is seriously broken:

For every PUT, we open the protorev file, append the
respective txdelta and close the file again. Since the
final flush uses yet another handle, this implies that
most of the revision data in each rev file does not get
fsync'ed and may be lost upon power failure.

You might be right. So, if you care about repository
integrity, you should use your MSDN subscription and
ask MS for clarification on FlushFileBuffers() behaviour.
Things we would like to know:

* Does FlushFileBuffers() also flush changes made to
  the same file through different handles? For simplification
  we may assume those other handles got closed and
  were owned by the same process.

* Is calling FlushFileBuffers() on the target of a rename /
  move sufficient to flush all metadata? Does it also
  flush outstanding file content changes?

* Is there a way to efficiently flush multiple files, e.g.
  through something like overlapped I/O?

* Does passing the FILE_FLAG_WRITE_THROUGH and
  FILE_FLAG_NO_BUFFERING flags to CreateFile()
  guarantee that all content has been stored on disk
  when CloseHandle() returns? (Assuming the HW does
  not lie about its write buffers).

Disclaimer: My understanding of the fsync behaviour
on Windows is based on conjecture, gathered from the
few pieces of information that I could find online. I'm
happy to change my mind once new evidence shows
up. Right now, our implementation seems to be wasteful
and possibly incomplete - which is worse. I would love
to fix both for 1.10.

-- Stefan^2.
Received on 2015-05-29 17:57:09 CEST

This is an archived mail posted to the Subversion Dev mailing list.