
Re: inconsistency between mergeinfo records

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Fri, 10 Jul 2015 01:23:46 +0200

On Thu, Jun 25, 2015 at 3:29 PM, Stefan Hett <stefan_at_egosoft.com> wrote:

> Hi,
> as promised, answering the remaining questions now:

Hi Stefan,

First of all, thank you for the detailed feedback! It is very helpful.

I spent the last two weeks refactoring and reworking the tool. The main
changes are:
* explicit --verbose mode, much quieter without it
* progress output
* only one common 'normalize' sub-command; actions selected by options
* 'analyze' and the new 'remove-branches' sub-command use the same code
  as 'normalize' and should therefore be consistent
* faster processing with large number of branches and / or high latency

I must also admit that the old 'normalize' command had a flaw that would
result in the removal of sub-tree mergeinfo that was NOT redundant. The
'analyze' output was correct, though, and the problem only manifested when
sub-tree mergeinfo could be completely removed. To check whether you have
been affected, do the following:

* check out (c/o) the revision before any mergeinfo (m/i) changes were
  committed.
* run the latest tool 'normalize --remove-redundant --remove-obsoletes'
* run 'svn pg svn:mergeinfo -R /path/to/working/copy --xml | grep "path="'
  to get a list of nodes that still have mergeinfo on them
* run the same 'svn pg ...' command on the committed changes produced by
  the old tool
* compare the output, looking for m/i that only the old tool removed
* if need be, manually fix them
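
The comparison step above could be scripted roughly as follows. This is
only a minimal sketch: it assumes the two 'svn pg ... --xml' outputs have
been captured as strings and that each node with mergeinfo appears as a
<target path="..."> element. The function names are made up for this
illustration and are not part of the tool.

```python
import xml.etree.ElementTree as ET


def nodes_with_mergeinfo(propget_xml):
    """Return the set of paths listed as <target path="..."> elements.

    Assumption: the input is the XML printed by
    'svn pg svn:mergeinfo -R <wc> --xml'.
    """
    root = ET.fromstring(propget_xml)
    return {t.get("path") for t in root.iter("target") if t.get("path")}


def removed_only_by_old_tool(new_tool_xml, old_tool_xml):
    """List nodes that keep mergeinfo with the new tool but lost it with
    the old one. These are the candidates for a manual fix."""
    kept_by_new = nodes_with_mergeinfo(new_tool_xml)
    kept_by_old = nodes_with_mergeinfo(old_tool_xml)
    return sorted(kept_by_new - kept_by_old)
```

Any path this returns is a node whose mergeinfo only the old tool removed
and should be reviewed by hand.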

>> If you have any time requirements/considerations on your side which
>> would require/benefit from earlier feedback, pls let me know.
> Right now, we are all working towards the 1.9 RC. Feedback
> in May or June would be nice.
> The key question that I like to see answered is "Does the
> tool do something useful?" For instance, it might become
> ineffective in complex setups, we might need to add detection
> of "mismatched" branches etc. We might also end up with
> mergeinfo that is technically smaller but neither faster to
> process nor easier to understand.
> Overall I think this is a really great tool and is really valuable to
> administrators who have been running larger instances over a longer period
> of time.
> Initially the output of the analysis-log is kinda bloated. In my initial
> run the output produces a 2MB log-file. After reducing the amount of
> mergeinfo records (using normalization and dropping mergeinfos from obsolete
> branches) the output is quite good/reasonable. Some kind of documentation
> explaining what the different output statements mean and what the admin/user
> could do about it would be helpful though I think.

The commands have comprehensive documentation now.
The "what to do about it" part is yet to be addressed.

> Also it'd be good to add a more automated "one-step" command to simplify
> the usage even further. So a user/admin could simply start the tool (for
> instance svn-mergeinfo-normalizer clean-up-mergeinfo [path]
> -drop-obsolete-branches) which would more or less equal running the tool
> several times in the following sequence:
> svn-mergeinfo-normalizer.exe clear-obsoletes [path]
> svn-mergeinfo-normalizer.exe normalize [path]
> svn-mergeinfo-normalizer.exe combine-ranges [path]
> svn-mergeinfo-normalizer.exe analyse [path] -stats
> (where I'd envision the -stats param for the analyse command would print
> out a summary of how many remaining mergeinfos could not be normalized (if
> any) and pointing the user to run the full analysis step to get a more
> detailed output).

Try the new command structure and options. Is that roughly what you had in
mind?

> For the long term I hope that the functionality provided by this tool
> would become obsolete and the issues for which you have to use this tool
> are dealt with directly in the SVN core so these would not surface at all
> anymore (aka: no need to normalize mergeinfos manually).

Newer releases of SVN try to elide sub-tree mergeinfo as they go.
However, they can't be as thorough as this tool (for performance
reasons) and will not "fix" old mergeinfo. The one thing that it will
probably never do is remove mergeinfo for deleted branches because
that is a potentially destructive operation and only o.k. if you never
want to merge from those deleted branches again (99.9% of users).

A completely rewritten branching and merging logic may solve
the problem on a fundamental level.

> So, there are the things that I'd love to get some feedback on:
> * Does the tool work at all (no crashes, nothing obviously stupid)?
> I experienced no crashes and the output was quite clear to me (after
> facing the initial quite bloated analysis output ).
> * Is the result of each reduction stage correct (as far as one can tell)?
> Already pointed out a few cases in my other replies. Will start a new
> thread to keep this with the further remaining cases I think I found.
> * Is the tool feedback intelligible? How could that be improved?
> As suggested above some means to get a more statistical output especially
> for the initial run might be helpful. The header information atm is already
> a good start, but maybe adding/cleaning-up the output a bit further to
> produce maybe some statistic log would be more useful for the first run.
> For instance atm the analysis-output reports the actual non-existing
> branches for each path the tool checks-out. In my case that's around 100
> branches for each of the 400 paths... -> over 40.000 lines of branch info.
> More useful would be a list at the top with branches being obsolete (it's
> implicit that all subdirectories in the branch are obsolete if the parent
> path is non-existent).
> With the added reporting of obsolete branches this is even worse now.

With the latest changes, 'analyze' will only show "offending" branches and
details by default. In --verbose mode, all branches are listed, but only
once per node (plus a summary of remaining branches).

Also, there is now a summary listing of all deleted branches that were

> The other thing might be to add some stat-output to normalize /
> combine-ranges / clear-obsoletes to report how many mergeinfo entries could
> be normalized, or how many obsolete paths were removed.
> Since the commands can take a few minutes to run, some kind of "progress
> output" might also be useful, so the user knows the process did not
> deadlock or ran into an endless loop.

There is progress info now while the log gets downloaded and for the
'normalize' command processing when not in --verbose mode.

> * How effective is each stage / mergeinfo reduction command?
> * How often does it completely elide sub-tree mergeinfo?
> * What typical scenarios prevented sub-tree mergeinfo elision?
> I guess this was already answered by sending you the log files.

Yup. In particular, combining ranges was more effective than expected.

> Up to here, you don't need to commit anything. If you are
> convinced that the tool works correctly, you may commit
> the results into some toy copy of your repository. Then the
> following would be interesting:
> * Are merges based on the reduced mergeinfo faster?
> * Do merges based on the reduced mergeinfo use less memory?
> * Any anomalies?
> I didn't spot any anomalies so far. With regards on performance and
> memory consumptions I can't provide any numbers. One common use-case which
> is now significantly faster though is to merge changes from one to the
> other branch, since it now only contains a few nodes with mergeinfos while
> before it had to commit up to 400 nodes changes... So this to us is a
> really significant improvement.

I think the tool will be shipped with 1.10. The only problematic part is
that many vendors don't ship the tools but only core binaries. Maybe it gets
merged into another tool.

-- Stefan^2.
Received on 2015-07-10 01:23:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.