Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> https://issues.apache.org/jira/browse/SVN-4667
[...]
>>
>> The branches involved have subtree mergeinfo on over 3500 files, each referring
>> to about 350 branches on average, and just over 1 revision range on average per
>> mergeinfo line. Average path length is under 100 bytes.
>
> What is the result of 'svn pg "svn:mergeinfo" -R | wc -c'?
120 MB.
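For what it's worth, the averages quoted above roughly account for that figure. Here is a back-of-envelope check in Python. It assumes about 100 bytes per mergeinfo line (a path of just under 100 bytes plus a short revision range), and the constant and function names are only illustrative:

```python
# Rough size estimate for the recursive svn:mergeinfo output, using the
# averages quoted above. BYTES_PER_LINE is an assumption, not a measurement:
# a path of just under 100 bytes plus ":", a revision range and a newline.
FILES_WITH_MERGEINFO = 3500
BRANCHES_PER_FILE = 350
BYTES_PER_LINE = 100


def estimated_mergeinfo_bytes(files, branches_per_file, bytes_per_line):
    """Approximate total size of all mergeinfo lines across all files."""
    return files * branches_per_file * bytes_per_line


lines = FILES_WITH_MERGEINFO * BRANCHES_PER_FILE
total = estimated_mergeinfo_bytes(FILES_WITH_MERGEINFO, BRANCHES_PER_FILE, BYTES_PER_LINE)
print(f"{lines} lines, ~{total / 1e6:.1f} MB")  # prints: 1225000 lines, ~122.5 MB
```

That is roughly 1.2 million mergeinfo lines, and the parsed in-memory form is likely to be larger than the raw text.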
> > [...]
> > tools "svn-mergeinfo-normalizer" and "svn-clean-mergeinfo.pl" both also fail to
> > execute in the available RAM.
>
> You may run svn-mergeinfo-normalizer on arbitrary sub-trees.
Yes, and I may explore this further. I should note that we're already
dealing with a subtree (the attempted merges and the mergeinfo reported
above all refer to a subtree of the entire branch), because a whole-branch
merge became impossible some time ago.
> A lot of memory will be used to hold that part of the repository
> history that is relevant to the branches mentioned in the m/i.
> This may easily grow to several GB if there have been tens of
> millions of changes.
The number of revisions in the repository is about 1 million.
> If the tool manages to read the mergeinfo, it will print m/i
> stats before fetching the log. Does it get to this stage?
I'll see if I can find out.
[...]
>> I would like to try a different approach. We read, parse and store all the
>> mergeinfo, whereas I believe our merge algorithm is only interested in the
>> mergeinfo that refers to one of exactly two branches ('source' and 'target') in
>> a typical merge. The algorithm never searches the 'graph' of merge ancestry
>> beyond those two branches. We should be able to read, parse and store only the
>> mergeinfo we need.
>
> That seems to be the path to take. I would have assumed that we only
> need the m/i for the source branch, as the target m/i is implicitly
> all of the target history.
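To make the filtering idea concrete, here is a rough Python sketch. It is not Subversion's C parser, and the function names are hypothetical. It assumes the usual svn:mergeinfo line format "/source/path:RANGES", where RANGES is a comma-separated list of "N" or "N-M" entries, each optionally suffixed with "*" to mark it non-inheritable. Every line is still read, but only lines whose source path lies inside one of the branches of interest are parsed and stored:

```python
def parse_ranges(text):
    """Parse e.g. '1-5,7,9-12*' into (start, end, inheritable) tuples."""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        inheritable = not item.endswith("*")  # '*' marks a non-inheritable range
        item = item.rstrip("*")
        start, _, end = item.partition("-")   # "7" -> ("7", "", "")
        ranges.append((int(start), int(end or start), inheritable))
    return ranges


def parse_relevant_mergeinfo(prop_value, branches):
    """Keep only mergeinfo lines whose source path is one of `branches`
    or lies below one of them; all other lines are skipped unparsed."""
    relevant = {}
    for line in prop_value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Revision ranges never contain ':', so split on the last one.
        path, _, ranges = line.rpartition(":")
        if any(path == b or path.startswith(b.rstrip("/") + "/") for b in branches):
            relevant[path] = parse_ranges(ranges)
    return relevant


value = "/trunk/foo.c:100-200\n/branches/B/foo.c:50-60,70*\n/branches/C/foo.c:5\n"
print(parse_relevant_mergeinfo(value, {"/trunk"}))
# prints: {'/trunk/foo.c': [(100, 200, True)]}
```

With branches set to just the source (as suggested above) or to {source, target}, each file's stored mergeinfo would shrink from roughly 350 entries to a few.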
>
> > Another possible approach could be to store subtree mergeinfo in a "delta" form
> > relative to a parent path's mergeinfo.
>
> I can see two problems here. First, you can only use the new scheme
> after all "relevant", i.e. merging, clients have been upgraded.
No, I meant just converting it to delta form when reading it into memory.
I wasn't proposing a format change to the stored svn:mergeinfo property.
> More importantly, the in-memory data model would need to be something
> delta-like. That sounds like a lot of code-churn.
Sure, not trivial!
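Roughly the kind of in-memory form I had in mind, as a Python sketch rather than anything in Subversion's own mergeinfo API. The assumptions: non-inheritable ranges are ignored; each rangelist is a normalized frozenset of (start, end) pairs, so equal rangelists compare equal; and a child at relative path P under a parent inherits each of the parent's source paths with "/P" appended. The function names are hypothetical. A child stores only the entries that differ from what it would inherit, plus the inherited source paths it lacks:

```python
from typing import Dict, FrozenSet, Tuple

# Assumed representation: source path -> normalized set of (start, end) ranges.
RangeList = FrozenSet[Tuple[int, int]]
Mergeinfo = Dict[str, RangeList]


def inherited(parent: Mergeinfo, rel_path: str) -> Mergeinfo:
    """Mergeinfo a child at rel_path would inherit from its parent:
    each source path is extended by the child's relative path."""
    return {src.rstrip("/") + "/" + rel_path: r for src, r in parent.items()}


def to_delta(parent: Mergeinfo, rel_path: str,
             child: Mergeinfo) -> Tuple[Mergeinfo, FrozenSet[str]]:
    """Return (changed, removed): child entries that differ from what it
    would inherit, and inherited source paths that the child lacks."""
    base = inherited(parent, rel_path)
    changed = {src: r for src, r in child.items() if base.get(src) != r}
    removed = frozenset(src for src in base if src not in child)
    return changed, removed


def from_delta(parent: Mergeinfo, rel_path: str,
               changed: Mergeinfo, removed: FrozenSet[str]) -> Mergeinfo:
    """Rebuild the child's full mergeinfo from the parent and the delta."""
    full = {src: r for src, r in inherited(parent, rel_path).items()
            if src not in removed}
    full.update(changed)
    return full


# Example: only the /branches/B entry differs, so only it is stored.
parent = {"/trunk": frozenset({(1, 100)}), "/branches/B": frozenset({(5, 9)})}
child = {"/trunk/src/foo.c": frozenset({(1, 100)}),
         "/branches/B/src/foo.c": frozenset({(5, 9), (20, 20)})}
changed, removed = to_delta(parent, "src/foo.c", child)
assert from_delta(parent, "src/foo.c", changed, removed) == child
```

The saving depends entirely on how many of a file's entries match what it would inherit. The harder part raised above, making the merge code work directly on such a representation, is not addressed by this sketch.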
Thanks for the interest.
- Julian
Received on 2017-01-04 16:02:19 CET