[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: FSFS7: 'svnadmin hotcopy' requires write access to the source

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 22 Jun 2015 11:42:02 +0200

On Fri, Jun 19, 2015 at 8:37 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> writes:
>
> > Well, it "works" based on SVN's concept of OS-side readonlyness.
> > svnadmin_tests.py 51 passes on Windows.
>
> Did you try the test after doing 'svn merge -c-r1686232'? I tried this on
> both
> Windows and Unix, and the test passed, so it's testing something else than
> the
> discussed issue with db/pack-lock permissions:
>

I didn't retest after switching my initial implementation
to chmod_tree, so I missed that I called the latter with
the wrong p

> > IIRC, post-commit is always asynchronous and would not prevent further
> > commits unless you write specific hook code to e.g. block in the
> > pre-commit.
> >
> > However, you do have a point here. The 'svnadmin pack' instances run
> > from the post-commit hook will pile up (one per new revision) while
> > hotcopy runs. Under normal circumstances that should not be a problem
> > (tens of relatively small, blocked processes).
>
> [...]
>
> They are still synchronous from the client's point of view. Are we going
> to
> tell people that their clients hanging with "Committing transaction..." are
> fine, because other people can still commit during this time? :)
> Furthermore,
> I think that tens of hanging 'svnadmin' processes are a problem. How many
> commits can happen during a backup of a large repository under a highload
> situation? 500? 750? More? It certainly depends on the environment, but
> I wouldn't be pleased to see half a thousand svnadmin processes just
> hanging
> around.
>

Admins can always fork potentially long-running operations
from the hook process. If they really want to get fancy, they
could even write trivial deamon script, have only a single
instance of it and let the hooks append to a TODO-list of
repos to process ...

So, if you have multiple active repositories and still want
to limit the number of svnadmin processes, you can do it.
That's not only to keep memory usage down but also to
limit I/O stress in situations when the server is under high
load already (many commits to various repos).

> [...]
>
> > Back to the original issue, I think this is what we should do:
> >
> > * Revert r1686232 (but keep the test case added in r1686239) because it
> > is mainly ineffective.
>
> Perhaps a better reason for reverting r1686232 would be that it doesn't
> really
> solve the problem, at least on Windows? The test itself isn't a regression
> test, as it passes without the fix on at least two platforms.
>

It fixed one aspect, the behavior when the write flag as missing.

> > * 2-line change to the hotcopy code to disable the pack lock
> > * Release that for 1.9.0
>
> I believe that we disagree on this part, as I think that it's not really
> that
> important how big the change is, but rather what it does. Reverting a
> change
> is atomic, reverting a part of the change is not.

Selectively reverting a change != HEAD is not atomic by any
reasonable definition. I was about to write a lengthy tractate on
the misperception of "atomicity" in SVN and SCM in generl but
then I found that is too dry and boring. It's a conversion best had
with a beer or two to help through the drier parts.

> Furthermore, the hotcopy
> bootstrapping code is known to be stable in 1.8.x, and disabling just the
> locking code means that every other related change was done for no reason.
> We don't really know if there are any other issues with what we did; it
> would
> be fine to keep it for a purpose, but given that we are about to stop
> locking
> over db/pack-lock, I don't see a reason to keep the supplementary changes.
>

I tried to simply apply the reverse delta and got 4 or 5 conflicts.
So, there is no trivial way go back to the old behavior. Also,
you made various changes the hotcopy code in 1.9 which
IMO undermines the "known stable" claim for 1.8.

> > * Make 'svnadmin upgrade' create missing lock files (not a strict
> > pre-condition but something we should always have done)
> > * Add a "with_optional_pack_lock" function that does not require write
> > access to the repository. Where available, it will acquire a shared
> lock
> > on the pack-lock file. Inter-thread synchronization needs to be
> switched
> > to reader-writer locks.
> > * Use the new function to prevent packing while hotcopying non-packed
> shards.
> > * Release this with 1.9.1
> >
> > Are you o.k. with that plan?
>
> Frankly, this sounds like too much for a patch release — especially, since
> the
> hotcopy-pack-ENOENT issue is not a regression. I propose an alternative
> plan:
>
> 1. Revert the changeset that added locking over db/pack-lock in the hotcopy
> source (r1589284) and all supplementary changes that are no longer
> necessary.
>

r1686554 removes the intermedite source repo locking layer.
You may try reverting the remainder r1589284, resolve the
conflicts and review the result. If you then still think that the
resulting patch is an improvement, by all means commit it.

2. Nominate and backport them to 1.9.x (this step resolves the regression).
>

I'll do that in a minute.

I also found an issue with the way we hotcopy the rep-cache.db
from r/o repositories and will include that patch as well. After that,
I'll check whether 1.8 has the same issue.

> 3. Fix the problem with a race between hotcopy and pack causing ENOENTs,
> while keeping 'svnadmin hotcopy' a read-only and lockless operation for
> the source repository, as it always was. I believe that we can add a
> retry
> logic into hotcopy_revisions().

Yes, please.

> Doing so means that in case of a race,
> hotcopy сould have to be restarted from the very first shard — in order
> not to have {unpacked-unpacked-packed-packed} shard sequences in the
> destination, but I believe that we can work it out.
>

You only need to restart from the first shard that was non-packed
when you started the hotcopy. For the incremental pack setup,
this would be just 1 shard (and probably still be cached by the OS
etc.).

> 4. The latter fix is going to eventually become a part of Subversion 1.10.
>

Sadly, yes. But I guess 1.9 users can work around this issue by
wrapping their hotcopy invocation with some simple retry script.

> This plan, to my mind, has the following benefits:
>
> - We minimize the potential destabilization of /branches/1.9.x by restoring
> necessary parts of the code to their 1.8.x state. Furthermore, we do not
> keep the code that was once required to support taking the db/pack-lock
> file.
>
> - We only backport a fix for a regression that happened between 1.8.x and
> 1.9.x
> into /branches/1.9.x. We don't try to push potentially destabilizing
> changes
> (shared lock on the db/pack-lock file?) into a stable branch.
>
> - The fix for the ENOENT problem works for sharded repositories
> irrespectively
> of their format and doesn't turn an initially read-only operation into an
> operation that sometimes requires locking over the source. It doesn't
> affect multiple parallel hotcopies and doesn't somehow block repository
> packing.
>

What tipped the balance for me is that the r/o issue is most likely
encountered as a result of ACLs which apparently we don't have
the means to deal with through APR. So, yes to the above but the
inability to fix the common case is what rendered my change invaiable.

How does this sound?
>

Good. I prepared the patches on trunk and will now add them to
STATUS for review.

-- Stefan^2.
Received on 2015-06-22 11:43:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.