FSFS replication and the rep cache
From: Julian Foad <julianfoad_at_apache.org>
Date: Tue, 21 Aug 2018 14:12:30 +0100
Hello Philip, Stefan, and other devs.
Doug Robinson of WANdisco asked for my assistance with their particular use of FSFS.
WANdisco intercepts calls into the FSFS API in order to replicate a commit from an originating repository to other repositories. The procedure involves (omitting many details):
1. Subversion builds a commit txn in the usual way on the originating repo.
2. Subversion calls svn_fs_commit_txn(), which is intercepted:
2a. The on-disk txn data is copied to the other repository.
It is considered necessary to achieve bit-for-bit identical contents of the revisions in each repository. The rep-cache contents, on the other hand, may differ between repositories, because a rep-cache update can fail without failing the commit, unless we do something further to keep the rep-caches synchronized too (which is a possibility but not otherwise necessary).
So let me describe the problem.
We can consider the FSFS rep-cache processing in three parts:
1. While building a txn, FSFS looks up each new file text rep in the rep-cache and, if found, avoids adding a duplicate copy of it to the txn.
2. While committing a txn, FSFS looks up each new properties rep in the rep-cache and, if found, avoids adding a duplicate copy of it to the revision.
3. After the commit, FSFS updates the rep-cache with the new reps.
Part 1 is fine because the txn is built on the originating node, with duplicate reps omitted or included as determined by that repository's rep-cache, and will be valid on all repositories regardless of the contents of their local rep-caches.
Part 2 is the problem. The commit is performed separately on each repository, and the result is influenced by each repository's local rep-cache, but the rep-caches are not guaranteed to have identical content.
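To make the failure mode concrete, here is a minimal toy model, not FSFS code. It assumes the cache is a simple map from a SHA-1 checksum to the location of an existing rep, and that a revision is just a list of entries. The names rep_key and commit_txn (as defined here) and the "ref:"/"data:" entry format are invented for illustration only.

```python
import hashlib
from typing import Dict, List


def rep_key(content: bytes) -> str:
    """Checksum used as the cache key (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(content).hexdigest()


def commit_txn(prop_reps: List[bytes], rep_cache: Dict[str, str], rev: int) -> List[str]:
    """Toy commit step: return the entries written into revision `rev`."""
    written = []
    for index, content in enumerate(prop_reps):
        key = rep_key(content)
        if key in rep_cache:
            # Lookup hit: store a reference to the existing rep.
            written.append("ref:" + rep_cache[key])
        else:
            # Lookup miss: store the full rep, then record it in the cache.
            written.append("data:" + content.decode())
            rep_cache[key] = "r%d/%d" % (rev, index)
    return written


props = b"svn:eol-style=native"
cache_a = {rep_key(props): "r5/0"}  # repository A recorded the rep from r5
cache_b = {}                        # repository B's cache update for r5 failed

# The same txn is committed on both repositories.
rev_a = commit_txn([props], cache_a, 6)
rev_b = commit_txn([props], cache_b, 6)
print(rev_a)  # ['ref:r5/0']
print(rev_b)  # ['data:svn:eol-style=native']
```

The same txn yields a reference on one repository and a full copy on the other, so the two revisions are no longer bit-for-bit identical even though they are logically equivalent.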
We should also review the reasons why bit-for-bit identical revisions are needed. Before FSFS f7 it was necessary that the byte offsets in all preceding revisions were identical across repositories so that the new revision could be replicated without rewriting it. With f7 logical addressing that should no longer be necessary, but I have not reviewed this in detail. Other reasons include ease of checking whether replicated repositories are logically identical and ease of repair if one repository suffers corruption. So in principle it may be possible to relax that requirement, but in practice at present it very probably needs to be kept.
Potential solutions include:
* We could ignore the rep cache during commit-txn (in existing API: set fs_fs_data.rep_sharing_allowed = FALSE), then make a separate call to update the rep cache afterwards.
* We could change FSFS to allow selectively disabling part 2 (look up props) during the 'commit' step while keeping part 3 (update) enabled (split that flag into two), and disable part 2 only.
With these first two options, the rep cache would deduplicate only file contents, and that is fine. Deduplication of properties is relatively minor.
* We could change FSFS such that commit-txn no longer depends on the rep-cache content, by moving the props deduplication to the txn-building phase.
* We could ensure the rep cache is synchronized across repositories before each commit-txn.
I have not yet estimated the effort for the various options, but at first glance splitting the flag into two and turning off property deduplication looks simple while the other three options look significantly harder.
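As a rough illustration of the "split the flag into two" option, the sketch below uses the same kind of toy model (a checksum-to-location map and a list of revision entries), not FSFS code. RepSharingFlags and its two field names are hypothetical; FSFS does not have these flags today.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepSharingFlags:
    """Hypothetical replacement for a single rep-sharing switch."""
    lookup_on_commit: bool = True     # part 2: look up props reps during commit
    update_after_commit: bool = True  # part 3: record new reps in the cache


def rep_key(content: bytes) -> str:
    """Checksum used as the cache key (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(content).hexdigest()


def commit_txn(prop_reps: List[bytes], rep_cache: Dict[str, str],
               rev: int, flags: RepSharingFlags) -> List[str]:
    """Toy commit step: return the entries written into revision `rev`."""
    written = []
    for index, content in enumerate(prop_reps):
        key = rep_key(content)
        if flags.lookup_on_commit and key in rep_cache:
            written.append("ref:" + rep_cache[key])
            continue
        # With lookup disabled, the full rep is always written.
        written.append("data:" + content.decode())
        if flags.update_after_commit:
            # Keep an existing entry if there is one; otherwise record this rep.
            rep_cache.setdefault(key, "r%d/%d" % (rev, index))
    return written


# Replicated setup: disable part 2 only, keep part 3 enabled.
replicated = RepSharingFlags(lookup_on_commit=False, update_after_commit=True)

props = b"svn:eol-style=native"
cache_a = {rep_key(props): "r5/0"}  # up-to-date cache
cache_b = {}                        # cache that missed an earlier update

rev_a = commit_txn([props], cache_a, 6, replicated)
rev_b = commit_txn([props], cache_b, 6, replicated)
print(rev_a == rev_b)  # True
```

In this model the two caches still end up with different contents, but the committed revision no longer depends on them, and the cache remains populated for the file-text lookup done while building later txns.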
For the options that involve FSFS code changes, WANdisco could fork it but would prefer to use the master version of FSFS.
Could I please hear your thoughts? How appropriate would it be to make such changes in FSFS, and might they benefit other users of the FSFS API? Are there other considerations?
--
- Julian
Received on 2018-08-21 15:12:40 CEST
This is an archived mail posted to the Subversion Dev mailing list.