
[RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)

From: Paul Burba <ptburba_at_gmail.com>
Date: Tue, 13 Apr 2010 11:39:05 -0400

On Wed, Mar 31, 2010 at 10:05 AM, Paul Burba <ptburba_at_gmail.com> wrote:
> Mike and I were discussing the changes I made in
> http://svn.apache.org/viewvc?view=revision&revision=927243 to fix
> issue #3020 and which were backported to 1.6.x. There is a regression
> in that fix and I am changing my vote to -1 and pulling it from 1.6.x
> (and today's roll of 1.6.10).
>
> The fix in r927243 addressed the problem of mergeinfo in a partial
> dump of a repository, specifically:
>
> We dump -r(X>1):Y from repos A then load that dump into repos B. If
> there is mergeinfo in the loaded revisions it may refer to revisions <
> X. r927243 strips out these ranges. This is fine if the partial dump
> of repos A is done in one step, e.g.,
>
> svnadmin dump reposA -r200:300 > A.200.300.partial.dump
> svnadmin load reposB < A.200.300.partial.dump
>
> because those revisions don't refer to valid history re the
> mergeinfo's merge source.
>
> Unfortunately this fix breaks a (likely much more) common use case:
> Dumping a complete repository in multiple steps and then loading each
> chunk to the new repository, e.g.:
>
> svnadmin dump reposA -r0:100 > A.0.100.dump
> svnadmin dump reposA -r101:200 --incremental > A.101.200.dump
> svnadmin dump reposA -r201:300 --incremental > A.201.300.dump
>
> svnadmin load reposB < A.0.100.dump
> svnadmin load reposB < A.101.200.dump
> svnadmin load reposB < A.201.300.dump
>
> In this case, valid mergeinfo may be filtered from the 2nd and/or 3rd load.
>
> I'll work on a fix that can handle both use cases, but for now I am
> changing my vote to -1 and reverting this backport.
>
> Paul
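
The regression described in the quoted message can be illustrated with a
minimal sketch. It is not the code from r927243; it only models the
"strip ranges older than the first dumped revision" idea. The names
strip_ranges_before, Range and Mergeinfo are hypothetical, mergeinfo is
modeled as a plain dict of inclusive revision ranges, and revision
renumbering during load is ignored.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]            # inclusive (start, end), hypothetical model
Mergeinfo = Dict[str, List[Range]]  # merge source path -> merged ranges


def strip_ranges_before(mergeinfo: Mergeinfo, first_rev: int) -> Mergeinfo:
    """Drop (or clip) every range that refers to revisions < first_rev."""
    result: Mergeinfo = {}
    for path, ranges in mergeinfo.items():
        kept = [(max(start, first_rev), end)
                for start, end in ranges
                if end >= first_rev]
        if kept:
            result[path] = kept
    return result


# Mergeinfo recorded somewhere in a later revision of repos A.
mi = {"/trunk": [(50, 60), (95, 120)]}

# One-step partial dump (-r200:300): everything is older than r200 and is
# stripped, which is fine because the target never receives r50-r120.
one_step = strip_ranges_before(mi, 200)

# Second chunk of a full dump (-r101:200 --incremental): r50-r60 and
# r95-r100 are stripped too, although the first chunk already loaded them.
second_chunk = strip_ranges_before(mi, 101)
```

In the chunked case the filter cannot tell that the earlier revisions are
already present in the target, so valid mergeinfo is lost.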

After thinking about this some more I see three options:

A) Keep the pre-927243 behavior as the default and thus support the
incremental dump use case by default. Add a new option to svnadmin
load, --skip-missing-merge-sources (or maybe
--filter-missing-merge-sources) to activate the filtering of mergeinfo
sources outside of the dump stream (i.e. make the r927243 changes only
take effect with this option). The obvious drawback is that admins
must know to use this option. They would still be able to partially
dump a repos then load it into an empty (or unrelated) repos and have
bogus mergeinfo.

B) If the load stream's mergeinfo contains a merge source-rev pair
that predates the start of the load stream, but exists in the target
repository, then allow it to be loaded, otherwise filter it. The
drawbacks are two: First, performance; we'd need to check every
path/rev pair of incoming mergeinfo which certainly isn't going to
speed up a load*. Second, it may be mere coincidence that the
path/rev exists in the target repository at the start of the load.

C) Revert r927243 and move the fix into svnadmin dump: Give an error
when doing a partial dump that contains mergeinfo with revisions that
predate the starting rev of the dump and require the use of a new
--missing-merge-source=[skip|allow] option to successfully complete
the dump (i.e. something analogous to svndumpfilter's
--skip-missing-merge-sources).

Of course there is always:

D) <Insert your brilliant idea here>

Any preferences as to what approach to take? Or ideas for a superior fix?

I lean towards A), since B) subjects every load to the overhead of
checking for valid merge sources, even if nothing is ultimately
filtered. If the incremental dump use case really is more common, we
are punishing those admins with nothing to show for it. C) is ugly
because you might not get an error until the dump is well underway
(maybe hours along). Then you have to start all over again with the
new option.
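
For reference, the per-pair check that option B implies could be sketched
roughly as follows. This is an illustration under stated assumptions, not
a proposed patch: filter_option_b and the exists_in_target callback are
hypothetical names, mergeinfo is modeled as a dict of inclusive revision
ranges, and a real implementation would query the target repository
filesystem rather than call a Python predicate.

```python
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]            # inclusive (start, end), hypothetical model
Mergeinfo = Dict[str, List[Range]]  # merge source path -> merged ranges


def filter_option_b(mergeinfo: Mergeinfo,
                    first_rev: int,
                    exists_in_target: Callable[[str, int], bool]) -> Mergeinfo:
    """Keep a pre-stream path/rev pair only if it exists in the target."""
    result: Mergeinfo = {}
    for path, ranges in mergeinfo.items():
        kept: List[Range] = []
        for start, end in ranges:
            for rev in range(start, end + 1):
                # Revisions inside the load stream are always kept; older
                # ones cost one lookup each against the target repository.
                if rev < first_rev and not exists_in_target(path, rev):
                    continue
                if kept and kept[-1][1] == rev - 1:
                    kept[-1] = (kept[-1][0], rev)   # extend current range
                else:
                    kept.append((rev, rev))         # start a new range
        if kept:
            result[path] = kept
    return result


# Pretend the target already holds /trunk for r1-r100 (assumed data).
def target_has(path: str, rev: int) -> bool:
    return path == "/trunk" and 1 <= rev <= 100


incoming = {"/trunk": [(50, 60)], "/branches/gone": [(10, 20)]}
filtered = filter_option_b(incoming, 101, target_has)
```

The inner loop makes the cost visible: every revision that predates the
load stream needs its own lookup, whether or not anything ends up being
filtered.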

Paul

* Testing a crude patch right now with some incremental dumps of the
old S.T.O. repos to get a feel for how serious a penalty this
might be.
Received on 2010-04-13 17:39:36 CEST

This is an archived mail posted to the Subversion Dev mailing list.
