Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)

From: Paul Burba <ptburba_at_gmail.com>
Date: Tue, 3 Aug 2010 14:35:21 -0400

Sweeping through my TODO list turned up this thread.

I'll commit this last patch tomorrow if there are no objections.

I pause a day only because I recall Mike saying he was planning to
take a look at this.

Paul

On Tue, May 11, 2010 at 4:00 PM, Paul Burba <ptburba_at_gmail.com> wrote:
> On Tue, Apr 20, 2010 at 11:08 PM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>> Paul Burba wrote:
>>> Hi Mike and Julian,
>>>
>>> Sorry for the delayed response; been working on/thinking about other
>>> mergey stuff lately...
>>>
>>> A few inline comments and then a proposed course of action at the end.
>>>
>>> On Thu, Apr 15, 2010 at 9:28 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>>>> The question then becomes, "How do users deal with legitimate partial dumps
>>>> that will be loaded atop something other than loads from previous
>>>> incremental windows?" I think they do this the same way they handle the
>>>> copy case: either hand-hacking the dumpstream or using 'svndumpfilter'. So
>>>> maybe 'svndumpfilter' grows the logic and options required to pare away
>>>> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
>>>> stream range")?
>>>
>>> I vote for svndumpfilter growing the logic to filter ranges predating
>>> the start of the dump stream in the same way it strips mergeinfo with
>>> merge sources filtered from the dump stream. Basically this means
>>> moving the r927243 functionality to svndumpfilter. No really sure we
>>> need a new option, seems we could overload
>>> --skip-missing-merge-sources to include this behavior. Thoughts?
>>
>> +1
>>
>>>> Julian Foad wrote:
>>>>> On Wed, 2010-04-14, C. Michael Pilato wrote:
>>>>>> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
>>>>>>
>>>>>> 1. 'svnadmin dump' warns when it is in incremental mode and must generate
>>>>>> mergeinfo from a merge source that predates the beginning of the dump
>>>>>> window, but it's only a warning and the dump continues.
>>>
>>> Why only --incremental mode? We should warn for both --incremental
>>> dumps *and* non-incremental dumps because in the latter case we might
>>> be specifying a dump range -rX:Y where rX>0* and there might be
>>> mergeinfo that refers to revs < X. And while we might want to be
>>> clever about skipping the warning when no range is specified (i.e.
>>> defaulting to -r0:HEAD), even then we should warn because issue #3020
>>> might already have resulted in a r1 with mergeinfo in the starting
>>> repository.
>>
>> You're right, of course. I was thinking "partial dump" and saying
>> "incremental". My bad.
>>
>>>>>> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being
>>>>>> fed is a sensible one.
>>>>> Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
>>>>> warning if it refers to a non-existent source. The admin then gets to
>>>>> figure out whether it was bad in the first place or because of his/her
>>>>> partial-dump/partial-load scenario.
>>>>>
>>>>> However, it depends how efficient the checking is. If that would make
>>>>> the 'load' really slow, I can see that not being wanted.
>>
>> [...]
>>
>>> No unsurprisingly, this took significantly (62%) longer, 00:05:39.609.
>>
>> (My vote was for 'svnadmin load' does nothing smart, and I stand by it.)
>>
>>> Taking what you've had to say into consideration, I propose the following:
>>>
>>> 1) Revert http://svn.apache.org/viewvc?view=revision&revision=927243.
>>> This stops svnadmin load from automatically filtering mergeinfo
>>> referring to revisions outside of the load stream.
>>>
>>> 2) Reapply or reimplement the part of r327243 that accounts for the
>>> case where the first loaded revision in a load stream has mergeinfo,
>>> see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc10
>>> item #1.
>>
>> (r927243, you mean.)
>>
>>> 3) Keep the inline copy-source warnings as they exist today in
>>> svnadmin dump, but add a new end-of-process summary warning e.g.:
>>>
>>> * Dumped revision 504.
>>> * Dumped revision 505.
>>> * Dumped revision 506.
>>> * Dumped revision 507.
>>> WARNING: Referencing data in revision 250, which is older than the oldest
>>> WARNING: dumped revision (504). Loading this dump into an empty repository
>>> WARNING: will fail.
>>> * Dumped revision 58.
>>> * Dumped revision 59.
>>> * Dumped revision 510.
>>> * Dumped revision 511.
>>> .
>>> .
>>> .
>>> WARNING: The range of revisions dumped contained references to copy
>>> sources outside that range.
>>>
>>> 4) Implement inline and summary functionality analogous to 3) but for
>>> mergeinfo ranges outside of the dump stream, e.g.:
>>>
>>> * Dumped revision 504.
>>> * Dumped revision 505.
>>> * Dumped revision 506.
>>> * Dumped revision 507.
>>> WARNING: Mergeinfo referencing revision(s) prior to revision 250,
>>> which is older
>>> WARNING: than the oldest dumped revision (504). Loading this dump may result
>>> WARNING: in invalid mergeinfo.
>>> * Dumped revision 58.
>>> * Dumped revision 59.
>>> * Dumped revision 510.
>>> * Dumped revision 511.
>>> .
>>> .
>>> .
>>> WARNING: The range of revisions dumped contained references to
>>> mergeinfo outside that range.
>>>
>>> Also, as mentioned earlier, I think this warning should kick in
>>> regardless of whether --incremental is used.
>>>
>>> 5) Overload svndumpfilter's --skip-missing-merge-sources option to
>>> also remove mergeinfo that predates the starting revision of the dump
>>> stream.
>>
>> +1 all over this stuff.
>>
>>> 6) Open: How does svnadmin load deal with mergeinfo that doesn't exist
>>> in the load steam or the target repos:
>>>
>>> A) It doesn't.
>>>
>>> B) Only checks that the revs are valid (fast, but
>>> can allow mergeinfo that describes invalid
>>> mergesource:rev pairs)
>>>
>>> C) Find an efficient way to test for valid
>>> mergesource:rev pairs
>>
>> For now, A (or maybe B). Then we have our Paul back. :-)
>>
>
> Been down the rabbit hole on-and-off the last couple weeks with this.
> Did everything we discussed so far in this thread, plus a slew of new
> bugs, see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc15
> and onward.
>
> At any rate, after my recent work we are left with one common(?) use
> case still broken:
>
> Loading a sequence of incremental dumps when the fist load in the
> sequence is into a *non-empty* repostitory.
>
> This is tested in svnadmin_tests.py 20 'don't filter mergeinfo revs
> from incremental dump', specifically see '# PART 4: Load a a series of
> incremental dumps to an non-empty repository."
>
> Basically the problem is this:
>
> 1) Given repository ReposA with X revisions in it.
>
> Note: Previously this use case was also broken if ReposB
> was empty to start, but this now works (assuming none of
> the incremental dumps in step 3 were svndumpfiltered in
> such a way as to remove/renumber revisions).
>
> 2) Given repository ReposB with Y revisions in it.
>
> 3) Incrementally (but fully) dump ReposA
>
> svnadmin dump ReposA -r0:100 > A.0.100.dump
> svnadmin dump ReposA -r101:200 --incremental > A.101.200.dump
> svnadmin dump ReposA -r201:300 --incremental > A.201.300.dump
> .
> .
> .
> svnadmin dump ReposA -rW:X --incremental > A.W.X.dump
>
> 4) Load the ReposA incremental dumps to ReposB
>
> svnadmin load ReposB --parent-dir /some/subtree < A.0.100.dump
> svnadmin load ReposB --parent-dir /some/subtree < A.101.200.dump
> svnadmin load ReposB --parent-dir /some/subtree < A.201.300.dump
> .
> .
> .
> svnadmin load ReposB --parent-dir /some/subtree < A.W.X.dump
>
> The problem arises when one of the incremental loads contains
> references to mergeinfo source revisions that predate the start of the
> dump. For example, say A.201.300.dump contains mergeinfo referencing
> r82. When this is loaded into ReposB the reference should be changed
> to r(82+Y) to reflect the fact that ReposB has Y revisions in it
> before we ever started loading parts of ReposA. Currently no change
> is made and the source rev stays at r82, which is obviously incorrect.
>
> The attached patch fixes this problem. It takes all mergeinfo which
> predates the first revision in the dump stream and adjusts it by the
> difference between the head rev in ReposB and the current dump stream
> revision. Yes, this is a fairly fragile solution (as is our copyfrom
> sources mapping, which this is based on). If unrelated commits are
> made to ReposB between loads of the incremental dumps, then the
> mergeinfo source revs will be wrong; but it is *always* wrong right
> now, at least this would fix this use case. Opinions appreciated.
>
> And yes, there are still other potential problems: Think incremental
> dumps that are dumpfiltered such that some revs are dropped or
> renumbered. But I'm not sure we really want to solve *every* problem
> in this space, I'm not even sure we can (without bringing the
> performance of svnadmin load to its knees). I think this patch brings
> us to "good enough" with the caveat that ill-advised use of svnadmin
> dump/load and/or svndumpfilter can result in incorrect mergeinfo.
>
> Paul
>

text/plain attachment: FinalIssue3020.log.txt

text/plain attachment: FinalIssue3020.fix.diff.txt

Received on 2010-08-03 20:36:00 CEST

This message: [ Message body ]
Next message: Paul Burba: "Re: Mergeinfo not inherited due to file externals"
Previous message: Greg Stein: "Re: svn commit: r981989 - /subversion/trunk/subversion/libsvn_wc/status.c"
Next in thread: Julian Foad: "Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)"
Reply: Julian Foad: "Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)"
Reply: C. Michael Pilato: "Re: [RFC] Incremental dumps and mergeinfo (Was: Vetoing latest issue #3020 fix in 1.6.10)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]