[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Thu, 15 Apr 2010 14:03:57 +0100

On Wed, 2010-04-14, C. Michael Pilato wrote:
> Paul Burba wrote:
> > After thinking about this some more I see three options:
> >
> > A) Keep the pre-927243 behavior as the default and thus support the
> > incremental dump use case by default. Add a new option to svnadmin
> > load, --skip-missing-merge-sources (or maybe
> > --filter-missing-merge-sources) to activate the filtering of mergeinfo
> > sources outside of the dump stream (i.e. make the r927243 changes only
> > take affect with this option). The obvious drawback is that admins
> > must know to use this option. They would still be able to partially
> > dump a repos then load it into an empty (or unrelated) repos and have
> > bogus mergeinfo.
> >
> > B) If the load stream's mergeinfo contains a merge source-rev pair
> > that predates the start of the load stream, but exists in the target
> > repository, then allow it to be loaded, otherwise filter it. The
> > drawbacks are two: First, performance; we'd need to check every
> > path/rev pair of incoming mergeinfo which certainly isn't going to
> > speed up a load*. Second, it may be mere coincidence that the
> > path/rev exists in the target repository at the start of the load.
> >
> > C) Revert r927243 and move the fix into svnadmin load: Give an error
> > when doing a partial dump that contains mergeinfo with revisions that
> > predate the starting rev of the dump and require the use of a new
> > --missing-merge-source=[skip|allow] option to successfully complete
> > the dump (i.e. something analogous to svndumpfilter's
> > --skip-missing-merge-sources).
> >
> > Of course there is always:
> >
> > D) <Insert your brilliant idea here>
>
> [Thinking aloud here.]
>
> We have analogous behavior today with the handling of copied paths. There
> are two aspects of this handling which might serve as models in this new
> scenario:
>
> 1. 'svnadmin dump' warns when it is in incremental mode and must generate a
> copy action from a source that predates the beginning of the dump window,
> but it's only a warning and the dump continues.

> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being
> fed is a sensible one. If it gets a request to copy some path/rev that
> doesn't exist, stuff errors out and the repository is left in whatever state
> it was in prior to the revision that failed to load.
>
> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
>
> 1. 'svnadmin dump' warns when it is in incremental mode and must generate
> mergeinfo from a merge source that predates the beginning of the dump
> window, but it's only a warning and the dump continues.

+1 on that, both for consistency with copy-source info and because it's
a good policy anyway.

A detail: I haven't tried it, but would want it to print this warning
(and the copy-source one) at the end of the dump, and not just in the
middle of the long list of "Dumping rX" messages.

> Now, there's some question here about how to handle part (2). In the copied
> path case, 'svnadmin load' can be stupid because the failure case is a noisy
> one -- stuff errors out from deep within the FS layer. That's not quite the
> same failure case for mergeinfo, where bad mergeinfo would be silently
> recorded without error. So we *could* try to validate the mergeinfo somehow
> against the history recorded in the target repository. But there follows
> another complication: it's really not so hard to get bad (if perhaps
> innocuous) mergeinfo into a repository by normal means, and you certainly
> wouldn't expect that to cost you the ability to dump and load your repository!
>
> So I'm really left with this:
>
> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being
> fed is a sensible one.

Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
warning if it refers to a non-existent source. The admin then gets to
figure out whether it was bad in the first place or because of his/her
partial-dump/partial-load scenario.

However, it depends how efficient the checking is. If that would make
the 'load' really slow, I can see that not being wanted.

- Julian

> As a net change against our shipping code, that's really just, what, the
> addition of warnings in 'svnadmin dump'?
>
> The question then becomes, "How do users deal with legitimate partial dumps
> that will be loaded atop something other than loads from previous
> incremental windows?" I think they do this the same way they handle the
> copy case: either hand-hacking the dumpstream or using 'svndumpfilter'. So
> maybe 'svndumpfilter' grows the logic and options required to pare away
> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
> stream range")?
>
Received on 2010-04-15 15:04:32 CEST

This is an archived mail posted to the Subversion Dev mailing list.