Re: #4667, Merge uses large amount of memory

From: Julian Foad <julianfoad_at_apache.org>
Date: Wed, 4 Jan 2017 15:02:09 +0000

Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> https://issues.apache.org/jira/browse/SVN-4667
[...]
>>
>> The branches involved have subtree mergeinfo on over 3500 files, each referring
>> to about 350 branches on average, and just over 1 revision range on average per
>> mergeinfo line. Average path length is under 100 bytes.
>
> What is the result of 'svn pg "svn:mergeinfo" -R | wc -c'?

120 MB.
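
That is roughly what the figures quoted above predict. A back-of-envelope
check, assuming about 100 bytes per mergeinfo line (the path length is
"under 100 bytes", and the short revision-range suffix is ignored here):

```python
# Back-of-envelope estimate of the total svn:mergeinfo size.
files_with_mergeinfo = 3500   # "over 3500 files"
branches_per_file = 350       # "about 350 branches on average"
bytes_per_line = 100          # assumption: one path of ~100 bytes per line

estimated_lines = files_with_mergeinfo * branches_per_file
estimated_bytes = estimated_lines * bytes_per_line

print(f"{estimated_lines:,} mergeinfo lines, "
      f"about {estimated_bytes / 1e6:.1f} MB")
```

So the raw property text alone is on the order of a million lines, before
any parsed in-memory representation is built from it.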

> > [...]
> > tools "svn-mergeinfo-normalizer" and "svn-clean-mergeinfo.pl" both also fail to
> > execute in the available RAM.
>
> You may run svn-mergeinfo-normalizer on arbitrary sub-trees.

Yes, and I may explore this further. Note that we're already
dealing with a subtree (the attempted merges and the mergeinfo reported
above all refer to a subtree of the entire branch), as a whole-branch
merge became impossible some time ago.

> A lot of memory will be used to hold that part of the repository
> history that is relevant to the branches mentioned in the m/i.
> This may easily grow to several GB if there have been tens of
> millions of changes.

The number of revisions in the repository is about 1 million.

> If the tool manages to read the mergeinfo, it will print m/i
> stats before fetching the log. Does it get to this stage?

I'll see if I can find out.

[...]
>> I would like to try a different approach. We read, parse and store all the
>> mergeinfo, whereas I believe our merge algorithm is only interested in the
>> mergeinfo that refers to one of exactly two branches ('source' and 'target') in
>> a typical merge. The algorithm never searches the 'graph' of merge ancestry
>> beyond those two branches. We should be able to read, parse and store only the
>> mergeinfo we need.
>
> That seems to be the path to take. I would have assumed that we only
> need the m/i for the source branch as the target m/i is implied as
> being all of the target history.
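
To illustrate the idea, here is a minimal sketch, not Subversion's actual
code, of parsing only the mergeinfo lines that refer to the branches of
interest. The function names and the in-memory representation are
hypothetical; the only thing taken from Subversion is the property syntax
("/path:N-M,N*", one source path per line). Whether 'branches' holds just
the source or both source and target is left to the caller.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int, bool]  # (first rev, last rev, inheritable)


def parse_ranges(text: str) -> List[Range]:
    """Parse a rangelist such as '5-10,12,20-25*'."""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        inheritable = not item.endswith("*")  # '*' marks non-inheritable
        item = item.rstrip("*")
        start, _, end = item.partition("-")
        ranges.append((int(start), int(end or start), inheritable))
    return ranges


def is_on_branch(path: str, branch: str) -> bool:
    """True if 'path' is the branch root or lies below it."""
    return path == branch or path.startswith(branch.rstrip("/") + "/")


def parse_relevant_mergeinfo(
    prop_value: str, branches: Iterable[str]
) -> Dict[str, List[Range]]:
    """Parse only the lines whose source path is on one of 'branches'.

    Lines for all other branches are skipped before their rangelists
    are parsed, so they are never stored.
    """
    wanted = tuple(branches)
    result = {}
    for line in prop_value.splitlines():
        if not line.strip():
            continue
        # Split at the last colon: the path itself may contain colons.
        path, _, range_text = line.rpartition(":")
        if any(is_on_branch(path, b) for b in wanted):
            result[path] = parse_ranges(range_text)
    return result


example = "/branches/a:5-10,12\n/branches/b/sub:3*\n/trunk:1-4\n"
relevant = parse_relevant_mergeinfo(example, ["/trunk", "/branches/b"])
```

With about 350 branches per mergeinfo property and only one or two of
them relevant, a filter like this would discard nearly all lines.
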
>
> > Another possible approach could be to store subtree mergeinfo in a "delta" form
> > relative to a parent path's mergeinfo.
>
> I can see two problems here. First, you can only use the new scheme
> after all "relevant", i.e. merging, clients have been upgraded.

No, I meant just convert it to delta form when reading it into memory. I
wasn't proposing a format change of the stored svn:mergeinfo property.
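
Roughly like this sketch, which is only an illustration and not a design.
It assumes, for simplicity, that mergeinfo is held as sets of revision
numbers rather than ranges, and that the parent's mergeinfo has already
been rebased onto the child's path (i.e. expressed as the child would
inherit it). All names here are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple

# Source path -> merged revisions (a simplification of real rangelists).
Mergeinfo = Dict[str, FrozenSet[int]]


class MergeinfoDelta(NamedTuple):
    added: Mergeinfo    # revisions the child has but the parent lacks
    removed: Mergeinfo  # revisions the parent has but the child lacks


def to_delta(parent: Mergeinfo, child: Mergeinfo) -> MergeinfoDelta:
    """Express the child's mergeinfo relative to the parent's."""
    added, removed = {}, {}
    for path in parent.keys() | child.keys():
        p = parent.get(path, frozenset())
        c = child.get(path, frozenset())
        if c - p:
            added[path] = c - p
        if p - c:
            removed[path] = p - c
    return MergeinfoDelta(added, removed)


def from_delta(parent: Mergeinfo, delta: MergeinfoDelta) -> Mergeinfo:
    """Reconstruct the child's full mergeinfo from parent plus delta."""
    empty = frozenset()
    result = {}
    for path in parent.keys() | delta.added.keys():
        revs = (parent.get(path, empty) - delta.removed.get(path, empty)) \
            | delta.added.get(path, empty)
        if revs:
            result[path] = revs
    return result


parent = {"/branches/a/f": frozenset(range(5, 11))}
child = {"/branches/a/f": frozenset(range(5, 11)) | {12},
         "/branches/b/f": frozenset({3})}
delta = to_delta(parent, child)
```

When a subtree's mergeinfo differs from its parent's in only a few
entries, the delta holds just those entries instead of a full copy.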

> More importantly, the in-memory data model would need to be something
> delta-like. That sounds like a lot of code-churn.

Sure, not trivial!

Thanks for the interest.

- Julian
Received on 2017-01-04 16:02:19 CET

This is an archived mail posted to the Subversion Dev mailing list.