RE: different problems with merging and mergeinfo

From: Brandt, Servatius (External) <servatius.brandt.external_at_ts.fujitsu.com>
Date: Fri, 12 Nov 2010 14:34:14 +0100

Igor Radic wrote on November 11, 2010 9:48 PM
>
> [...]
> 1. Problem 1 - merging range influences merging results
> It is logical that merging range does NOT influence merging results.
> Meaning, I should get the same results when merging 100 revisions at
> once or 10 times 10 revisions.
> But we have noticed it is not so - in all cases the problem was somehow
> related with a deletion of file/directory.

Please look into issue 3324, which has been resolved recently. If this is
your problem, it will be fixed in one of the upcoming releases. See:

http://subversion.tigris.org/issues/show_bug.cgi?id=3324

> Usually the difference is only seen in mergeinfo.
> But once we even had very strange case where two branches were created
> and the same range was merged from TRUNK (but in different steps).
> We ended up with 2 exact branches (content and mergeinfo) - at least
> according to Tortoise repo-browser comparison.
> But one branch could be re-integrated into TRUNK, and the other one
> could NOT.
> Are there any known limitations when choosing merging ranges?
> Our current rule is: when choosing merging range, the revision which
> deletes at least one file/folder must be first in the range.
> Does this make sense?

I assume that the two branches were created from the same trunk revision
(otherwise, the mergeinfo would very likely be different). If the same
merges have been made on both branches, but in finer steps on one of
them, the reintegrate merges to the trunk should work the same way. If
there was a difference (e.g. you got a tree conflict originating from
the deletion when trying to reintegrate one branch but not when
reintegrating the other one), this might have been caused by the bug
described in issue 3324.

You did not specify what reintegrate problem you encountered. If the
problem was not in the final merge step but already in the early phase,
when --reintegrate checks the reintegrate source, this could have been
caused by branching off from different trunk revisions. There is less
chance for these checks to fail in the branch copied from the newer
revision. But this does not really fit your description of identical
content and mergeinfo (unless you manipulated the latter).

> 2. Problem 2 - order of actions influences mergeinfo
> I have a very simple example.
> I have TRUNK and two branches (X and Y) created from TRUNK.
> Now the work is finished in both branches and I want to re-integrate
> them back to TRUNK.

So you have merged everything from the trunk up to the HEAD revision
into both branches.

> But in the meantime, one file/folder in branch Y has mergeinfo (although
> it has no mergeinfo in TRUNK).

So Y has some subtree mergeinfo originating from some merge to
a subdirectory or file within Y.

> If I first merge branch X and then branch Y, the same file/folder in
> TRUNK will get mergeinfo - the one as in branch Y (except for TRUNK
> entries, of course), plus the final information about the merge of
> branch Y.

Yes, the subtree mergeinfo of your file/subdirectory is taken over from
Y into the corresponding file/subdirectory in the trunk, supplemented by
the mergeinfo for the reintegrate merge of Y to the trunk you are
performing. Note that any later merge to the root directory of the
trunk will also change the mergeinfo in that file/subdirectory.

> But if I first merge branch Y and then branch X, the mergeinfo will also
> contain information about merge of branch X.

For the same reason: since any merge done after the merge of Y will update
the subtree mergeinfo, you will also see information there about the
reintegrate merge of X if you reintegrate X later than Y.
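
The order dependence can be reproduced with a toy Python model. This is
only a sketch, not Subversion code: mergeinfo is a plain dict from source
path to a set of revisions, and the names record_merge, reintegrate,
Y_SUBTREE, the path dir/file.c, the branch W and all revision numbers are
hypothetical. It models the simple behaviour described above, where every
merge to the trunk root also updates existing subtree mergeinfo.

```python
def record_merge(explicit, target_root, source, revs):
    """Record `revs` of `source` as merged on target_root and on every
    path below it that already carries explicit mergeinfo."""
    for path, info in explicit.items():
        if path == target_root or path.startswith(target_root + "/"):
            info.setdefault(source, set()).update(revs)


def reintegrate(explicit, target_root, source, revs, subtree=None):
    """Take over the branch's subtree mergeinfo, then record the merge."""
    for rel_path, info in (subtree or {}).items():
        explicit[target_root + "/" + rel_path] = {
            s: set(r) for s, r in info.items()
        }
    record_merge(explicit, target_root, source, revs)


# Subtree mergeinfo that exists only in branch Y (hypothetical values).
Y_SUBTREE = {"dir/file.c": {"/branches/W": {5}}}


def run(order):
    """Reintegrate X and Y into a fresh trunk in the given order."""
    explicit = {"/trunk": {}}
    for branch in order:
        if branch == "Y":
            reintegrate(explicit, "/trunk", "/branches/Y", {25}, Y_SUBTREE)
        else:
            reintegrate(explicit, "/trunk", "/branches/X", {20})
    return explicit


x_then_y = run(["X", "Y"])
y_then_x = run(["Y", "X"])
print(sorted(x_then_y["/trunk/dir/file.c"]))  # W and Y only
print(sorted(y_then_x["/trunk/dir/file.c"]))  # W, X and Y
```

In both orders the mergeinfo on the trunk root is the same; only the
subtree mergeinfo differs, because it did not yet exist on the trunk when
X was merged first.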

> I understand why this happens, but I don't understand why such behaviour
> is wanted.
> I mean, what exactly do I get with this information when it is obvious
> that in branch X this particular mergeinfo was not added/changed?

To understand why this happens: whenever you do some operation that
needs mergeinfo ("svn log -g", "svn blame -g", "svn merge") on
a subdirectory or file within your trunk, it looks for explicit
mergeinfo in that subdirectory/file. If there is none, it looks at
the parent directory, and so on upward. This is why "svn log -g" works on
a subdirectory/file that does not have explicit mergeinfo when there is
some on the root directory of your trunk.

But if you now use "svn log -g" on that subdirectory/file within your
trunk that has got some explicit mergeinfo originating from a former
subtree merge to the corresponding subdirectory/file in Y, the search
for explicit mergeinfo on parent directories stops because it has already
found this one. Therefore, this mergeinfo needs to have the complete
information about all merges, also the reintegrate of X done later than
that of Y.

In the above explanation I simplified a bit: Provided that the later
merge of X did not affect that subdirectory/file, the mergeinfo for the
reintegrate merge of X does not have to be propagated to the subtree
mergeinfo, and I think newer clients do optimize a bit here.
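
The lookup rule (nearest explicit mergeinfo wins) can be sketched in a
few lines of Python. This is a simplified illustration, not Subversion's
implementation: the function effective_mergeinfo, the paths and the
revision numbers are hypothetical, and real inherited mergeinfo also
adjusts the source paths for the child, which is left out here.

```python
import posixpath


def effective_mergeinfo(path, explicit):
    """Return (owner, mergeinfo) for the nearest path, starting at `path`
    and walking up through its parents, that has explicit mergeinfo."""
    current = path
    while True:
        if current in explicit:
            return current, explicit[current]
        if current in ("/", ""):
            return None, {}
        current = posixpath.dirname(current)


explicit = {
    # The trunk root knows about both reintegrate merges.
    "/trunk": {"/branches/X": {20}, "/branches/Y": {25}},
    # Explicit subtree mergeinfo that was NOT updated for the merge of X.
    "/trunk/src/a.c": {"/branches/Y": {25}},
}

# b.c has no explicit mergeinfo, so the search reaches /trunk.
print(effective_mergeinfo("/trunk/src/b.c", explicit))
# a.c has explicit mergeinfo, so the search stops there and never sees X.
print(effective_mergeinfo("/trunk/src/a.c", explicit))
```

The second call shows why explicit subtree mergeinfo has to carry the
complete information: whatever is missing there is invisible to any
operation run on that path.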

> 3. Problem 3 - how should "only mark as merged" option work?
> Again, a small example: there is branch X, then branch Y created from
> it, and branch Z created from branch Y.
> At first, all 3 branches have exactly the same contents and no mergeinfo.
> Now, I perform either test 1 or test 2, getting 2 different results.
> - Test 1:
> revision 10: change file A in branch X
> revision 11: merge revision 10 (branch X) to branch Y -> this changes
> file A and adds mergeinfo for X:10
> revision 12: merge revision 11 (branch Y) to branch Z -> this changes
> file A and adds mergeinfo for X:10 and Y:11

This merge from Y to Z does three things:
a) it merges the change on A from Y to Z
b) it merges the mergeinfo change (X:10) from Y to Z
c) it adds mergeinfo for the merge being done here: Y:11
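
The three steps can be written down as a toy Python model. It is a sketch
under simplifying assumptions, not Subversion code: a branch is a dict
with "files" and "mergeinfo", a revision is described by the changes it
made to both, and the names merge, r10_on_X and r11_on_Y are made up for
illustration.

```python
def merge(target, source_name, rev, change):
    """Tracked merge of one revision, split into the steps a), b), c)."""
    # a) apply the content change of the merged revision
    target["files"].update(change["files"])
    # b) apply the mergeinfo change carried by the merged revision
    for src, revs in change["mergeinfo"].items():
        target["mergeinfo"].setdefault(src, set()).update(revs)
    # c) record the merge that is being done right now
    target["mergeinfo"].setdefault(source_name, set()).add(rev)


Y = {"files": {"A": "old"}, "mergeinfo": {}}
Z = {"files": {"A": "old"}, "mergeinfo": {}}

# Revision 10 on X only changed file A.
r10_on_X = {"files": {"A": "new"}, "mergeinfo": {}}
merge(Y, "X", 10, r10_on_X)   # committed as revision 11 on Y

# Revision 11 on Y therefore changed file A and Y's mergeinfo.
r11_on_Y = {"files": {"A": "new"}, "mergeinfo": {"X": {10}}}
merge(Z, "Y", 11, r11_on_Y)   # committed as revision 12 on Z

print(Y["mergeinfo"])  # only X:10
print(Z["mergeinfo"])  # X:10 (step b) and Y:11 (step c)
```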

> revision 13: re-integrate branch Z to branch Y -> no content is changed,
> mergeinfo is added for Z:12
>
> - Test 2:
> revision 10: change file A in branch X
> revision 11: merge revision 10 (branch X) to branch Y -> this changes
> file A and adds mergeinfo for X:10
> revision 12: accidentally perform the same change of file A in branch Z
> revision 13: now that it is clear that this is the same change as in
> Y:11, only mark as merged Y:11 -> no content is changed, mergeinfo is
> added for Y:11 (but not for X:10)

Revision 11 in test 2 consists of two changes: the change to file A
in Y obtained by the merge from X, and the mergeinfo X:10 that was added to Y.
In the previous step (revision 12 in test 2), you only replayed the
first of these changes to Z, but not the second one. Revision 12 in
test 2 did only operation a) from revision 12 in test 1.

Before you can make the result of revision 12 in test 2 look like
a merge from Y to Z as in test 1, you must first replay the other missing
change b).

So first do a record-only merge of X:10 to Z to add the same mergeinfo
a real merge would have got from merging the X:10 mergeinfo on Y to
Z (this replays action b from test 1), and then do a record-only merge
of Y:11 to Z to make it look like a real merge from Y to Z (this replays
action c from test 1).

The --record-only option does just what it says: it does not merge
any changes (so it does not merge mergeinfo changes from the source
branch), it only records as merged what you specified.
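
In a toy Python model (a sketch, not Subversion code; the dict layout and
the names record_only and fresh_z are hypothetical), a record-only merge
performs nothing but step c). The example compares what was done in
test 2 with the two record-only merges suggested above.

```python
def record_only(target, source_name, rev):
    """Model of a record-only merge: it touches mergeinfo only, never
    content, and pulls no mergeinfo changes over from the source."""
    target["mergeinfo"].setdefault(source_name, set()).add(rev)


def fresh_z():
    # Z after revision 12 of test 2: file A edited by hand, no mergeinfo.
    return {"files": {"A": "new"}, "mergeinfo": {}}


# What was done in test 2: only Y:11 is recorded.
z_as_in_test_2 = fresh_z()
record_only(z_as_in_test_2, "Y", 11)

# The suggested repair: replay b) first, then c).
z_repaired = fresh_z()
record_only(z_repaired, "X", 10)   # replays action b) of test 1
record_only(z_repaired, "Y", 11)   # replays action c) of test 1

print(z_as_in_test_2["mergeinfo"])  # X:10 is missing
print(z_repaired["mergeinfo"])      # same as after the real merge in test 1
```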

> revision 14: re-integrate branch Z to branch Y -> no content is changed,
> mergeinfo is added for Z:12-13, but also removed for X:10

The reintegrate merge does NOT replay all changes on Z one after the
other to Y. It is much simpler: it computes the difference between
Y and Z and applies this difference to Y. In order not to remove former
changes on Y, it is very important that ALL changes on Y have been
merged to Z before reintegrating Z to Y. Otherwise, the difference
would not only add changes from Z to Y but also remove former changes
from Y which have not been merged to Z, when applied to Y. (I am
simplifying to make this easier to understand.)

The reintegrate merge performs some safety checks to ensure that you
have not left out any revisions from Y on your merges to Z. It uses the
merge tracking information to do so. Now in test 2, you have actually
skipped such a merge (Y:11) (and instead made a separate file change
with the same modification in revision 12). This would have prevented
the reintegrate merge from doing anything (to be precise: provided that
another merge from Y to Z was done after revision 13).

But then in revision 13, you promised by a record-only merge that you
actually did a merge of Y:11, although you forgot about action b.
The reintegrate merge believes you and does its job. The result is that
the thing (mergeinfo) you forgot to add on Z is thrown out on Y.
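
The effect can be imitated with a toy Python model of the "apply the
difference between Y and Z to Y" idea. This is a strongly simplified
sketch, not Subversion's algorithm: the safety checks are left out, the
function reintegrate and the dict layout are hypothetical, and dropping
entries that point back at the target is a modelling assumption chosen so
that the result matches what was reported for revisions 13 and 14.

```python
def reintegrate(target, target_name, source, source_name, source_revs):
    """Apply the difference target -> source to target, then record it."""
    # Mergeinfo that exists on the target only is part of the difference
    # and is therefore removed.
    for src in list(target["mergeinfo"]):
        if src not in source["mergeinfo"]:
            del target["mergeinfo"][src]
    # Mergeinfo on the source is carried over, except entries that point
    # back at the target itself (modelling assumption).
    for src, revs in source["mergeinfo"].items():
        if src != target_name:
            target["mergeinfo"][src] = set(revs)
    target["files"] = dict(source["files"])
    target["mergeinfo"][source_name] = set(source_revs)


def fresh_y():
    # Y after revision 11 in both tests.
    return {"files": {"A": "new"}, "mergeinfo": {"X": {10}}}


# Test 1: Z got X:10 and Y:11 from a real merge.
y1 = fresh_y()
z1 = {"files": {"A": "new"}, "mergeinfo": {"X": {10}, "Y": {11}}}
reintegrate(y1, "Y", z1, "Z", {12})

# Test 2: Z only has the record-only entry Y:11.
y2 = fresh_y()
z2 = {"files": {"A": "new"}, "mergeinfo": {"Y": {11}}}
reintegrate(y2, "Y", z2, "Z", {12, 13})

print(y1["mergeinfo"])  # X:10 kept, Z:12 added
print(y2["mergeinfo"])  # X:10 gone, only Z:12-13 left
```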

> In both tests the content is exactly the same.
> But in second test mergeinfo for X:10 is missing - meaning branch X was
> not merged to branch Y.
> Is this behaviour wanted?
> Why is it like that?
> What if we want also to keep mergeinfo changes when marking a revision
> as merged?
> Should we then also mark those other changes as merged?
> In other words - is this a bug in Subversion or it is exactly how it
> should work and instead we should use it a bit differently to achieve
> what we expect?

I think the explanation above answers these questions. This is
not a bug in Subversion. If you do an indirect merge (merging a merge
result: (X->Y)->Z), this is most easily understood by treating the
mergeinfo change of the merge revision X->Y as itself an object of the
merge to Z; the mergeinfo for the merge to Z is then simply added at
the end of the operation.

> 4. Problem 4 - corrupt and obsolete mergeinfo
> Well, this is the most important question of all.
> In our project we have reached approximately revision 28000.
> In the beginning of our project we didn't know exactly how we should use
> merges and we made some bad actions.

Like removing the trunk and then moving/renaming a branch to become the
new trunk, instead of using a 2-URL merge for reintegrating. I am
really embarrassed about that. :-) But this was in Subversion 1.2 times,
so this does not affect merge tracking, which was introduced in
Subversion 1.5.

> We have also removed some mergeinfos or manipulated it manually.
> Also lately we made some suspicious merges and corrections of mergeinfo.

As Paul Burba said in a presentation about "Merging and Merge Tracking":

"DO NOT hand edit or remove svn:mergeinfo properties unless you are sure
you know what you are doing (and recheck yourself)."

So you should not manipulate the mergeinfo. You might think some
mergeinfo is incorrect although it is not. (For example, when you do
all merges on the root directory, but in one branch a subdirectory has
been deleted and then copied again (not merged!) from a different
branch, it gets very complicated.)

OK, to "know what you are doing", read the Subversion book (nightly 1.6
version) thoroughly, and read all white papers about mergeinfo. This is
a lot of work, but otherwise you will come to wrong conclusions. Even
then, as Paul said, recheck yourself!

> Our mergeinfo contains information since the beginning in revision 1.

Actually later, since merge tracking was only introduced in
Subversion 1.5.

> Let's say that we need mergeinfo for last 6 months of work and
> everything that is older than that can be considered obsolete.

Be careful with such assumptions. The mergeinfo can be helpful
particularly for very old changes everyone in your team forgot about.
For instance, if you have doubts about some area in the code, you can
use "svn blame -g" to see where each line came from (even when it was
reintegrated from a branch that has been deleted long ago), and you can
get information from the log messages why the line was changed and what
other changes were made along with it. (On the TortoiseSVN blame dialog, choose
"Include merge info".)

Of course, even without mergeinfo, log messages from the deleted
branches are still available in the repository. But it would be much
harder to find them if you don't keep the references to where the merge
results came from.

You are not forced to make use of the mergeinfo, but you should not
prevent your colleagues from using it. You may make their life
unnecessarily difficult, and you may cause unneeded costs for your
company.

Generally, it is a bad idea to remove information from a source control
system, which is made to keep information from history.

> Is it possible that we somehow remove some obsolete information?

As there is no obsolete information, this is not possible. But if you
like, just ignore the information.

> Is such action dangerous?

That depends on several factors: what additional effort these actions
will cause in future, whether the cause of these actions is recognised
by anybody in your team, whether this information is passed to your
boss, whether he likes you, and how he reacts. Such actions will not
crash the repository. Such actions may make the work of your team less
efficient. Source control is intended to make that work more efficient.

> If we are allowed to do that, what are the limitations or rules when
> doing that?

Do not ask anybody for permission except your boss, so you can blame him
later. But be honest and tell him that you want to do things in your
special way not documented in the Subversion book and that you want to
remove Subversion-internal information maintained or used by the
Subversion commands documented in the Subversion book (maintained by
"svn merge", used by "svn log -g", "svn blame -g", "svn mergeinfo").

If you ignore all warnings, there are no limitations or rules.

> I suppose we should do such action in TRUNK first, am I right?

Not at all. The trunk is where your colleagues will continue to work
all the time, and this is the place where they most benefit from the
mergeinfo on log and blame commands (when using the -g option).

> I also suppose we should only remove mergeinfo about old dead branches,
> not the ones that are still active and being merged to.

Removing mergeinfo in branches being merged to would be very silly.
You would have to track what has been merged yourself to avoid conflicts
on re-merging revisions that have already been merged. And
reintegrating such a branch would be dangerous.

You should also not remove mergeinfo about dead branches, because this
information may be very helpful (svn log -g, svn blame -g), and there
would be no benefit from removing it.

A general remark: we have two kinds of information about the history
in a Subversion repository. The first is the ancestry: where a file
comes from, usually from a previous revision on the same path, but
sometimes from a previous revision on a different path, e.g. when a branch
has been taken. The ancestry is not so helpful for a merge revision,
because the ancestry of the merge source tells more about that change than
the ancestry of the merge target. The ancestry provides the copied-from
information; the second kind, the mergeinfo, provides the merged-from
information, which connects to the ancestry of the merge source.

Both the copied-from and merged-from information are helpful. The
merged-from information can be removed (or manipulated), although
it is not wise to do so, but the copied-from information can't -
hmm... I think there is a way... but I won't give it away. :-)

Finally, let's talk about bogus mergeinfo left behind from 1.5 and early
1.6 clients. Can this be removed?

First answer: yes, it can, although there is no need to do so. If you
do, leave this to someone who has read and understood Paul's white papers
about mergeinfo and "knows what (s)he is doing (and rechecks
him/herself)", particularly as there is such a person in your team.

Second answer: Why bother about 1% of bogus mergeinfo when it does no
harm and when you can benefit from the 99% of correct mergeinfo?
I prefer to benefit from the mergeinfo in 99% of the cases when doing
"svn log -g", even if I need some extra time to interpret the
results in the 1% where the mergeinfo is wrong, because there is a big
time saving overall. This is much better than suffering from removed
mergeinfo in 100% of the cases when I use "svn log -g". And Subversion
does recognize (some) bogus mergeinfo, so I won't even get wrong
information for the whole 1% of it. For more about this, see below.

Third answer: There is no problem in removing (or correcting) bogus
mergeinfo (only the bogus one!!), if it is done by someone "who knows
what (s)he is doing". Then you could benefit from 100% correct
mergeinfo.

Another remark about bogus mergeinfo. I have had the experience that in
some cases an "svn log -g" did not supply the log entries from the
merged revisions. This happens when Subversion recognizes bogus
mergeinfo and ignores it. This behavior has been recently improved: in
future, Subversion will only ignore the bogus part of the mergeinfo but
honor the rest of it. See issue 3270.

http://subversion.tigris.org/issues/show_bug.cgi?id=3270

> In advance, thanks a lot for your answers.
> I hope you can help me and my colleagues.

I hope I did.

- Servatius
Received on 2010-11-12 14:34:55 CET
