[RFC] Revision property caching and named atomics

From: Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
Date: Wed, 20 Aug 2014 16:21:52 +0400

We (Sergey Raevskiy <sergey.raevskiy_at_visualsvn.com> and I) had some time to
examine the revision property caching feature, and there are some conclusions
that we would like to share. In brief, there are certain conditions when this
feature can actually *wreak havoc* in terms of the repository consistency and
possible corruptions. This applies to both Subversion trunk_at_1618912 and to
the released Subversion 1.8.10.

The first (and probably, the major) part of the problem is that you cannot
really have mixed configurations of this caching option. Whenever the revprop
caching is enabled in at least one process / thread accessing the filesystem,
it implicitly requires all other processes / threads that might eventually
access the filesystem to also have it enabled. Otherwise, you are pretty much
open to different sorts of erroneous behavior, and there is a reason for this:

- Currently, we only bump the revision property generations in the shared
   memory if the revprop caching is enabled, see switch_to_new_revprop() and
   svn_fs_fs__set_revision_proplist() functions. These generations are used
   as a part of the cache key; if they are out-of-date, the corresponding
   cache entries are invalid.

- In a situation when another process does not have this caching enabled, it
would not bump these generations when changing revision properties (and the
changes would just go to the disk).

- The first process currently does not have a way to see these changes in the
terms of revision property generations, i.e. it would be completely unaware
of the changes, and will still think that the cached entries are up-to-date.

(I have attached a patch with a failing test as an illustation)

As a consequence, things could go pretty bad, and there are certain practical
implications. As long as there is no way to prohibit mixed configurations in
terms of the revprop caching, we basically allow different combinations of the
following parts, ...

1. 'svn' accessing file:/// repositories, revprop caching is disabled

2. 'svnadmin', revprop caching is enabled or disabled in compile-time based
on the build plaform and the availability of certain intrinsics, see
svn_named_atomic__is_efficient()

3. Apache HTTP Server, revprop caching is controlled by the SVNCacheRevProps
directive, disabled by default [1]

4. 'svnserve', revprop caching is controlled by the --cache-revprops switch

...and there are some deadly combinations, which probably are not limited to
the examples below:

- Repository served by both Apache HTTP Server and 'svnserve' with different
  revprop caching options. This makes data corruption feasible, as we could
  sometimes rewrite newer revision properties with cached (old) values, and we
  also lose our test-and-set guarantee when changing them. This could also
  trigger a permanent state when users *cannot change* certain revision
  property until a server restart, i.e. they would be continously receiving
  a SVN_ERR_FS_PROP_BASEVALUE_MISMATCH error due to a false negative
  within the test-and-set logic. Finally, some of the revision property
  changes performed via one protocol would remain completely invisible to
  the users of the other protocol.

- Repository served by Apache HTTP Server with SVNCacheRevProps=on and
modified with 'svnadmin' built with "inefficient" atomics (i.e. disabled
revprop caching), pretty much with the same consequences as above.

The second part of the problem is using the apr_mmap()'ed files for the
repositories located on a network share. According to the documentation, it
is not possible to achieve write coherency with these files (hence, with the
named atomics) on Windows [2]:
[[[
CreateFileMapping()
...

  The exception is related to remote files. Although CreateFileMapping works
  with remote files, it does not keep them coherent. For example, if two
  computers both map a file as writable, and both change the same page, each
  computer only sees its own writes to the page. When the data gets updated on
  the disk, it is not merged (!).
]]]

Furthermore, it looks like there also is a fundamental problem in the way named
atomics are implemented. If you have two applications compiled with different
atomic "efficiencies" (see NA_SYNCHRONIZE_IS_FAST and related defines), which
is possible, for instance, if you are using Subversion executables built under
different Windows versions, you are basically left without synchronization, for
example:

# Application 1, inefficient atomics:

- Locks around the corresponding lock file with SVN_ERR(lock()).

- Starts performing a non-atomic operation, expects it to be serialized
across *all* other possible processes / threads.

...

# Context switches to Application 2 with efficient atomics:

- It executes an atomic operation, say, InterlockedExchangeAdd64().
   This application, however, does not know anything about file locks.
   So, overall, the sequence is *not* thread-safe due to the non-atomicity
   of the operation performed by Application 1.

We do not want to make any conclusions on this topic. However, this seems
to be pretty much broken, and these problems are probably affecting existing
Subversion 1.8 users.

Any thoughts on this?

[1] http://subversion.apache.org/docs/release-notes/1.8.html#revprop-caching
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537

Regards,
Evgeny Kotkov

text/plain attachment: revprop-caching-failing-test-v1.patch.txt

Received on 2014-08-20 14:22:46 CEST

This message: [ Message body ]
Next message: Branko Čibej: "Re: [RFC] Revision property caching"
Previous message: Ben Reser: "Re: Python bindings: core.Stream missing close method?"
Next in thread: Branko Čibej: "Re: [RFC] Revision property caching"
Reply: Branko Čibej: "Re: [RFC] Revision property caching"
Reply: Evgeny Kotkov: "Re: [RFC] Revision property caching and named atomics"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]