[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Tue, 20 Apr 2010 23:08:26 -0400

Paul Burba wrote:
> Hi Mike and Julian,
>
> Sorry for the delayed response; been working on/thinking about other
> mergey stuff lately...
>
> A few inline comments and then a proposed course of action at the end.
>
> On Thu, Apr 15, 2010 at 9:28 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>> The question then becomes, "How do users deal with legitimate partial dumps
>> that will be loaded atop something other than loads from previous
>> incremental windows?" I think they do this the same way they handle the
>> copy case: either hand-hacking the dumpstream or using 'svndumpfilter'. So
>> maybe 'svndumpfilter' grows the logic and options required to pare away
>> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
>> stream range")?
>
> I vote for svndumpfilter growing the logic to filter ranges predating
> the start of the dump stream in the same way it strips mergeinfo with
> merge sources filtered from the dump stream. Basically this means
> moving the r927243 functionality to svndumpfilter. No really sure we
> need a new option, seems we could overload
> --skip-missing-merge-sources to include this behavior. Thoughts?

+1

>> Julian Foad wrote:
>>> On Wed, 2010-04-14, C. Michael Pilato wrote:
>>>> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
>>>>
>>>> 1. 'svnadmin dump' warns when it is in incremental mode and must generate
>>>> mergeinfo from a merge source that predates the beginning of the dump
>>>> window, but it's only a warning and the dump continues.
>
> Why only --incremental mode? We should warn for both --incremental
> dumps *and* non-incremental dumps because in the latter case we might
> be specifying a dump range -rX:Y where rX>0* and there might be
> mergeinfo that refers to revs < X. And while we might want to be
> clever about skipping the warning when no range is specified (i.e.
> defaulting to -r0:HEAD), even then we should warn because issue #3020
> might already have resulted in a r1 with mergeinfo in the starting
> repository.

You're right, of course. I was thinking "partial dump" and saying
"incremental". My bad.

>>>> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being
>>>> fed is a sensible one.
>>> Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
>>> warning if it refers to a non-existent source. The admin then gets to
>>> figure out whether it was bad in the first place or because of his/her
>>> partial-dump/partial-load scenario.
>>>
>>> However, it depends how efficient the checking is. If that would make
>>> the 'load' really slow, I can see that not being wanted.

[...]

> No unsurprisingly, this took significantly (62%) longer, 00:05:39.609.

(My vote was for 'svnadmin load' does nothing smart, and I stand by it.)

> Taking what you've had to say into consideration, I propose the following:
>
> 1) Revert http://svn.apache.org/viewvc?view=revision&revision=927243.
> This stops svnadmin load from automatically filtering mergeinfo
> referring to revisions outside of the load stream.
>
> 2) Reapply or reimplement the part of r327243 that accounts for the
> case where the first loaded revision in a load stream has mergeinfo,
> see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc10
> item #1.

(r927243, you mean.)

> 3) Keep the inline copy-source warnings as they exist today in
> svnadmin dump, but add a new end-of-process summary warning e.g.:
>
> * Dumped revision 504.
> * Dumped revision 505.
> * Dumped revision 506.
> * Dumped revision 507.
> WARNING: Referencing data in revision 250, which is older than the oldest
> WARNING: dumped revision (504). Loading this dump into an empty repository
> WARNING: will fail.
> * Dumped revision 58.
> * Dumped revision 59.
> * Dumped revision 510.
> * Dumped revision 511.
> .
> .
> .
> WARNING: The range of revisions dumped contained references to copy
> sources outside that range.
>
> 4) Implement inline and summary functionality analogous to 3) but for
> mergeinfo ranges outside of the dump stream, e.g.:
>
> * Dumped revision 504.
> * Dumped revision 505.
> * Dumped revision 506.
> * Dumped revision 507.
> WARNING: Mergeinfo referencing revision(s) prior to revision 250,
> which is older
> WARNING: than the oldest dumped revision (504). Loading this dump may result
> WARNING: in invalid mergeinfo.
> * Dumped revision 58.
> * Dumped revision 59.
> * Dumped revision 510.
> * Dumped revision 511.
> .
> .
> .
> WARNING: The range of revisions dumped contained references to
> mergeinfo outside that range.
>
> Also, as mentioned earlier, I think this warning should kick in
> regardless of whether --incremental is used.
>
> 5) Overload svndumpfilter's --skip-missing-merge-sources option to
> also remove mergeinfo that predates the starting revision of the dump
> stream.

+1 all over this stuff.

> 6) Open: How does svnadmin load deal with mergeinfo that doesn't exist
> in the load steam or the target repos:
>
> A) It doesn't.
>
> B) Only checks that the revs are valid (fast, but
> can allow mergeinfo that describes invalid
> mergesource:rev pairs)
>
> C) Find an efficient way to test for valid
> mergesource:rev pairs

For now, A (or maybe B). Then we have our Paul back. :-)

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
Received on 2010-04-21 05:09:04 CEST

This is an archived mail posted to the Subversion Dev mailing list.