[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Wed, 14 Apr 2010 11:46:54 -0400

Paul Burba wrote:
> After thinking about this some more I see three options:
>
> A) Keep the pre-927243 behavior as the default and thus support the
> incremental dump use case by default. Add a new option to svnadmin
> load, --skip-missing-merge-sources (or maybe
> --filter-missing-merge-sources) to activate the filtering of mergeinfo
> sources outside of the dump stream (i.e. make the r927243 changes only
> take affect with this option). The obvious drawback is that admins
> must know to use this option. They would still be able to partially
> dump a repos then load it into an empty (or unrelated) repos and have
> bogus mergeinfo.
>
> B) If the load stream's mergeinfo contains a merge source-rev pair
> that predates the start of the load stream, but exists in the target
> repository, then allow it to be loaded, otherwise filter it. The
> drawbacks are two: First, performance; we'd need to check every
> path/rev pair of incoming mergeinfo which certainly isn't going to
> speed up a load*. Second, it may be mere coincidence that the
> path/rev exists in the target repository at the start of the load.
>
> C) Revert r927243 and move the fix into svnadmin load: Give an error
> when doing a partial dump that contains mergeinfo with revisions that
> predate the starting rev of the dump and require the use of a new
> --missing-merge-source=[skip|allow] option to successfully complete
> the dump (i.e. something analogous to svndumpfilter's
> --skip-missing-merge-sources).
>
> Of course there is always:
>
> D) <Insert your brilliant idea here>

[Thinking aloud here.]

We have analogous behavior today with the handling of copied paths. There
are two aspects of this handling which might serve as models in this new
scenario:

1. 'svnadmin dump' warns when it is in incremental mode and must generate a
copy action from a source that predates the beginning of the dump window,
but it's only a warning and the dump continues.

2. 'svnadmin load' does nothing smart, trusting that the dump it's being
fed is a sensible one. If it gets a request to copy some path/rev that
doesn't exist, stuff errors out and the repository is left in whatever state
it was in prior to the revision that failed to load.

Assuming similar behavior for mergeinfo handling, we have, at a minimum:

1. 'svnadmin dump' warns when it is in incremental mode and must generate
mergeinfo from a merge source that predates the beginning of the dump
window, but it's only a warning and the dump continues.

Now, there's some question here about how to handle part (2). In the copied
path case, 'svnadmin load' can be stupid because the failure case is a noisy
one -- stuff errors out from deep within the FS layer. That's not quite the
same failure case for mergeinfo, where bad mergeinfo would be silently
recorded without error. So we *could* try to validate the mergeinfo somehow
against the history recorded in the target repository. But there follows
another complication: it's really not so hard to get bad (if perhaps
innocuous) mergeinfo into a repository by normal means, and you certainly
wouldn't expect that to cost you the ability to dump and load your repository!

So I'm really left with this:

2. 'svnadmin load' does nothing smart, trusting that the dump it's being
fed is a sensible one.

As a net change against our shipping code, that's really just, what, the
addition of warnings in 'svnadmin dump'?

The question then becomes, "How do users deal with legitimate partial dumps
that will be loaded atop something other than loads from previous
incremental windows?" I think they do this the same way they handle the
copy case: either hand-hacking the dumpstream or using 'svndumpfilter'. So
maybe 'svndumpfilter' grows the logic and options required to pare away
mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
stream range")?

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2010-04-14 17:47:28 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.