Re: FSFS7: 'svnadmin hotcopy' requires write access to the source

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 18 Jun 2015 21:49:09 +0200

On Thu, Jun 18, 2015 at 3:24 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> writes:
>
> > Sorry, my bad, I was wrong there. I only remembered that we had and still
> > have the pack/hotcopy race condition in 1.8 and then I saw the recursive
> > directory copy calls in the /trunk hotcopy code and had my false
> positive.
> >
> > You are absolutely right. We already copy revisions explicitly and will
> > simply stop upon missing files w/o bumping 'current' (as I had suggested
> > yesterday for a future improvement). And there is plenty of commentary
> > on that issue in the code as well. A user might later retry the hotcopy
> > and they will likely succeed.
>
> Thanks for confirming my thoughts.
>
> > So, at the moment, the pack lock prevents those (in some setups rare)
> > ENOENT errors. I think that apparent reliability (users don't see
> failures
> > once on a blue moon) is worth keeping. I'll try to come up with a patch
> > for the r/o issue tomorrow.
>
> So, we have a regression in how 'svnadmin hotcopy' behaves in 1.9.0-rc2 in
> terms that it's no longer a read-only operation â€” as it used to be. I'd
> say
> that this is much more important than providing a better usability in an
> edge
> case when a repository is being concurrently packed and hotcopied (as we
> now
> agree that the integrity of the hotcopy target is not a suspect).
>

And the regression should be fixed.

> Furthermore, with such a fix we would be providing better usability only
> when
> a lot of preconditions are met, i.e., the filesystem should have format 7,
> it
> should be concurrently packed during hotcopy, the hotcopy should run into a
> certain race condition (attempt copying a file that just got moved to a
> pack)
> and â€” finally, this is *still going to fail* in a situation where the
> account
> performing the backup has read-only access to the repository, and the
> repository is being packed using an account with write access.
>

My point here is that *occasional* failures in a functionality
that people might use for creating backups is a problem.
For example, people running pack in their post-commit hooks
may not notice the failure during testing, so they don't implement
the redo fallback in their backup procedure. Or, they notice the
failure and get that unsettling feeling ...

In the past, the race may not have a major problem but
with FSFS format 7, people will start packing their repos
regularly.

> Maybe this is a nice improvement after all, but not if we find ourselves
> in a
> situation where we need to come up with a non-trivial fix just to get rid
> of
> a regression in the stable branch.

There is a trivial fix: Try to acquire the lock and fall back
to the non-locking behavior if you get an EACCESS.
The nicer way is to check beforehand and make the
check slightly more generic. That's what I did in r1686232.

> As I said in my first e-mail, I think that
> the proper solution would be reverting r1589284, pushing this change â€” with
> a test, perhaps â€” to /branches/1.9.x.

Reverting the whole of r1589284 would be excessive.
Simply not using the "with_pack_lock" call would make
for a much smaller, less intrusive change.

> Then we could think on doing something
> with the aforementioned usability problem, but only after we're done with
> the
> real problem.
>
> I am ready to do this, if no one objects.
>

Well, I do object to the suggested plain revert.
Instead, I propose backporting r1686232,-9.
If there are severe issues with that, I'd opt for
the minimal change in hotcopy_locking_src_body
mentioned above.

-- Stefan^2.
Received on 2015-06-18 21:49:31 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]