Re: FSFS7: 'svnadmin hotcopy' requires write access to the source

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Fri, 19 Jun 2015 16:16:01 +0200

On Fri, Jun 19, 2015 at 2:26 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> writes:
>
> > And the regression should be fixed.
>
> [...]
>
> > There is a trivial fix: Try to acquire the lock and fall back to the
> > non-locking behavior if you get an EACCESS. The nicer way is to check
> > beforehand and make the check slightly more generic. That's what I did in
> > r1686232.
>
> I don't see how r1686232 could possibly work on Windows. The APR_FREADONLY
> check tells us if a file has FILE_ATTRIBUTE_READONLY.

Well, it "works" based on SVN's concept of OS-side readonlyness.
svnadmin_tests.py 51 passes on Windows.

>
> This is orthogonal
> to the DACL of the file â€” i.e., you cannot write or lock a file without
> having
> an appropriate access control entry in the DACL, even if the file does not
> have a read-only attribute. With this fix, the following scenario (attempt
> to hotcopy a source repository while only having R access in DACL) is still
> going to fail:
>
> svnadmin create source
> svnadmin pack source
> icacls source /inheritance:r /grant MyAccountName:(OI)(CI)(R)
> svnadmin hotcopy source target
> svnadmin: E720005: Can't open file 'source\db\pack-lock': Access is
> denied.
>

So, the trigger is restrictive ACLs rather than plain r/o plus
the fact that SVN has no concept of ACLs.

> I cannot say that I feel confident about the other proposed option
> (handling
> EACCES after trying to acquire a lock), as opposed to a plain revert.
> Doing
> so is going to put us into a 12 second retry loop in the common case where
> a Windows account can only read the hotcopy source â€” and that's also going
> to be a quite visible regression.
>

Yes, that would be bad and I don't want that.

> > My point here is that *occasional* failures in a functionality that
> people
> > might use for creating backups is a problem. For example, people running
> > pack in their post-commit hooks may not notice the failure during
> testing,
> > so they don't implement the redo fallback in their backup procedure. Or,
> > they notice the failure and get that unsettling feeling ...
>
> Is this argument for or against serializing hotcopy and pack? Because if
> we
> serialize them, naÃ¯ve post-commit hooks that call 'svnadmin pack'
> synchronously
> could wreak much more havoc. Hotcopy blocks pack, so, if an administrator
> is
> currently performing a backup with 'svnadmin hotcopy', packing will only
> begin
> after the hotcopy is done. Given that hotcopy could take a significant
> amount
> of time, depending on how fast the storage works, users will probably
> witness
> hanging 'svn commit' operations, as the execution of the post-commit hook
> is
> going to be blocked by the hotcopy.
>

IIRC, post-commit is always asynchronous and would not
prevent further commits unless you write specific hook code
to e.g. block in the pre-commit.

However, you do have a point here. The 'svnadmin pack' instances
run from the post-commit hook will pile up (one per new revision)
while hotcopy runs. Under normal circumstances that should not
be a problem (tens of relatively small, blocked processes). But that
is unnecessary: The hotcopy process only needs to block packs
while copying non-packed revs. In the scenario that we are talking
about, this would be a single shard. We should improve the process
accordingly.

The last, but not the least, hotcopy now blocks other hotcopies, so one can
> no longer perform the backups in a parallel manner, e.g., to two different
> backup destinations.
>

That is true and linked to the actual problem with the current code.
I dug through the locking code and found that we require write
access to the lock file to be able to take out an exclusive lock.

Hotcopy would be the first (?) instance in FSFS where we actually
only need a read ("shared") lock. It still prevents pack but allows
for concurrent hotcopies and fixes the access rights issue.

The only problematic bit is that we would simply not use the lock if
there is no lock file. But that is o.k. for our use-case. Furthermore,
'svnadmin upgrade' should auto-create missing lock files regardless.

> > Reverting the whole of r1589284 would be excessive. Simply not using the
> > "with_pack_lock" call would make for a much smaller, less intrusive
> change.
>
> Hmm, what could be less intrusive that just reverting a change? I see it
> as
> that if we stop locking the hotcopy source, we don't require other relevant
> bits of this change, i.e., the supplementary logic that made this locking
> possible. Why should we keep it?
>

I want the locking to ultimately work. I'm o.k. with temporarily
disabling it in 1.9.0 but would love to fix the race error condition
for 1.9.1 (fixing a known issue, not adding a new feature).

> Doing so would also allow us to get rid of the inconsistency in how repos
> and
> fs layers currently perform the hotcopy that I mentioned in [1].

I don't see how that is inconsistent. There is no harm in exposing
an empty repository - which is what you see until the hotcopy is
complete. If there is a problem with that, the same problem is
present in incremental mode. Thus, I claim that the current code
results in a more consistent behaviour.

But above all, I'd rather have consistent state within the FS layer.
The 1.8.x hotcopy code has half-created repositories, manually
shares data structs between source and target repository etc.
This is the main reason why I changed the hotcopy setup for 1.9.

Higher API layers may or may not make the repository available
during hotcopy - at their own discretion. Thus, we still have full
control over what servers expose to their clients.

> Currently,
> if an API user calls svn_fs_hotcopy3() in the non-incremental mode, the
> target
> filesystem is immediately openable, even while the hotcopy operation is
> still
> in progress). This means that anyone calling svn_fs_open2() on this target
> could be working with a filesystem stub without the lock tree or
> db/txn-current
> file, and I don't think there is anything good in that, quoting [1]:
>
> [[[
> The whole idea of the non-incremental hotcopy is that it *does not* make
> the
> destination visible until the hotcopy finishes. This is how the things
> worked
> prior to this changeset and this is how the svn_repos layer currently
> works.
> By writing the format file before actually doing something we do not only
> expose our internal stub to the world (which really is a stub without even
> r0),
> but will also leave this internal stub openable in case the hotcopy fails.
> This stub does not even have to be a normal filesystem and we only need
> it to bootstrap the hotcopy process, but we now behave as if it was!
>
> Finally, the 'repos' and 'fs' layers now behave inconsistently in these
> terms,
> i.e. svn_repos_hotcopy3() still writes the format file only when
> everything is
> done.
> ]]]
>
> What do you think?
>

Back to the original issue, I think this is what we should do:

* Revert r1686232 (but keep the test case added in r1686239)
because it is mainly ineffective.
* 2-line change to the hotcopy code to disable the pack lock
* Release that for 1.9.0

* Make 'svnadmin upgrade' create missing lock files
  (not a strict pre-condition but something we should always
  have done)
* Add a "with_optional_pack_lock" function that does not
  require write access to the repository. Where available, it
  will acquire a shared lock on the pack-lock file. Inter-thread
  synchronization needs to be switched to reader-writer locks.
* Use the new function to prevent packing while hotcopying
  non-packed shards.
* Release this with 1.9.1

Are you o.k. with that plan?

-- Stefan^2.
Received on 2015-06-19 16:17:16 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]