[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Efficient and effective fsync during commit

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Fri, 29 May 2015 19:23:05 +0300

On 29 May 2015 at 18:55, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> wrote:
> On Fri, May 29, 2015 at 4:14 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>> On 28 May 2015 at 20:47, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> wrote:
>>> Hi all,
>>>
>>> Most of us would agree that way we fsync FS changes
>>> in FSFS and FSX is slow (~10 commits / sec on a SSD,
>>> YMMV) and not even all changes are fully fsync'ed
>>> (repo creation, upgrade).
>>>
>> The first question is it really a problem?
>
> Recently, we had customers wondering why their servers
> wouldn't serve more than 20 commits/s (even on enterprise
> SSDs and with various OS file system tuning options).
> With QA bots constantly creating snapshots and tags,
> there isn't too much head room anymore.
>
Ack. Was it Windows or Linux?

>> I mean that usually commits
>> are not that often. They are maintenance tasks like 'svnadmin load'
>> that perform commits very often, but it could be fixed with
>> '--fsfs-no-sync' option to 'svnadmin load' like we had for BDB.
>
> That would be a workable approach. Adding a bunch
> of if statements.
Something like 5-15..

> It would not help for svnsync, though.
Agree. We could expose this option in fsfs.conf, like we had in BDB
but it would dangerous for production use even with properly power
loss protected hardware.

>>> From a high-level perspective, a commit is are simple
>>> 3-step process:
>>>
>>> 1. Write rev contents & props to their final location.
>>> They are not accessible until 'current' gets bumped.
>>> Write a the new 'current' temporary contents to a temp file.
>>> 2. Fsync everything we wrote in step 1 to disk.
>>> Still not visible to any other FS API user.
>>> 3. Atomically switch to the new 'current' contents and
>>> fsync that change.
>>>
>>> Today, we fsync "locally" as part of whatever copy or
>>> move operation we need. That is inefficient because
>>> on POSIX platforms, we tend to fsync the same folder
>>> more than once, on Windows we would need to sync
>>> the files twice (content before and metadata after the
>>> rename). Basically, missing the context info, we need
>>> to play it safe and do much more work than we would
>>> actually have to.
>>>
>>> In the future, we should implement step 1 as simple
>>> non-fsync'ing file operations. Then explicitly sync every
>>> file, and on POSIX the folders, once. Step 2 does not
>>> have any atomicity requirements. Finally, do the 'current'
>>> rename. This also only requires a single fsync b/c
>>> the temp file will be in the same folder.
>>>
>>> On top of that, all operations in step 2 can be run
>>> concurrently. I did that for FSX on Linux using aio_fsync
>>> and it got 3x as fast. Windows can do something similar.
>>> I wrapped that functionality into a "batch_fsync" object
>>> with a few methods on it. You simply push paths into it,
>>> it drops duplicates, and finally you ask it to fsync all.
>>>
>> I didn't find any documentation that calling FlushFileBuffers() on one
>> handle flushes changes (data and metadata) made using other handle.
>> I'm -1 to rely on this without official documentation proof. At least
>> for FSFS.
>
> If you assume / suspect that FlushFileBuffers() only
> operates on the open handle, i.e. only flushes those
> changes made through that thandle, then you assume
> that our commit process is seriously broken:
>
> For every PUT, we open the protorev file, append the
> respective txdelta and close the file again. Since the
> final flush uses yet another handle, this implies that
> most of the revision data in each rev file does not get
> fsync'ed and may be lost upon power failure.
>
> You might be right. So, if you care about repository
> integrity, you should use your MSDN subscription and
> ask MS for clarification on FlushFileBuffers() behaviour.
You also may request MSDN subscription and ask MS for clarification or
keep Windows code as it was before.

> Things we would like to know:
>
> * Does FlushFileBuffers() also flush changes made to
> the same file through different handles? For simplification
> we may assume those other handles got closed and
> were owned by the same process.
>

> * Is calling FlushFileBuffers() on the target of a rename /
> move sufficient to flush all metadata? Does it also
> flush outstanding file content changes?
Calling FlushFileBuffers() on target is not sufficient due the problem
I described before [1]: metadata changes are journaled, while data
changes seems are not. So you may get to race condition when move
operation is recorded in journal, while new file content is not
written to disk. On system restart journal will be applied, resulting
empty/old content file in place. That's why source file should be
flushed to disk before move operation.

>
> * Is there a way to efficiently flush multiple files, e.g.
> through something like overlapped I/O?
>
> * Does passing the FILE_FLAG_WRITE_THROUGH and
> FILE_FLAG_NO_BUFFERING flags to CreateFile()
> guarantee that all contents has been stored on disk
> when CloseHandle() returns? (Assuming the HW does
> not lie about its write buffers).
>
FILE_FLAG_NO_BUFFERING is not related to disk caching: it's disable
file buffer it require caller to perform only cluster aligned
operations [2]

With FILE_FLAG_WRITE_THROUGH flag disk cache is not used at all. I.e.
changes goes directly to hardware with special bit to skip internal HW
cache. Nothing is flushed to disk when CloseHandle() returns in this
case.

> Disclaimer: My understanding of the fsync behaviour
> on Windows is based on conjecture, gathered from the
> few pieces of information that I could find online. I'm
> happy to change my mind once new evidence shows
> up. Right now, our implementation seems to be wasteful
> and possibly incomplete - which is worse. I would love
> to fix both for 1.10.
>

[1] http://svn.haxx.se/dev/archive-2013-05/0245.shtml
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
Received on 2015-05-29 18:23:40 CEST

This is an archived mail posted to the Subversion Dev mailing list.