Re: Rename tracking for merges

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 1 May 2012 17:04:51 +0100 (BST)

Hi Stefan.

Overall comment: yes, I think something like this could form the basis of the data storage for tracking renames.

We should think a bit more about how we would expect this to be used initially -- whether we make 'svn copy' and/or 'svn move' populate this data, or provide some other UI.

Stefan Fuhrmann wrote:
> Data spec:
>
> * Introduce a new revprop svn:mergehints
> Rationales:
> - Being a move is a property of the change,
> not the node / node content.
> - The user must be able to identify renames /
> moves after the fact. She may also revise
> that decision afterwards.
> - Since add and del may be committed in separate
> revisions, an online analysis may be expensive
> and become more sophisticated in the future.
> Having the revprop, we can use a separate tool
> that analyses the repo and stores the results
> in said revprop(s).

Agreed.

> * Info in svn:mergehints is non-authoritative.
> A client may ignore them in parts or entirely
> but it must issue a warning when it does.
> Rationales:
> - The intention is to aid the merge algorithm
> in its decision making - not to force it to
> mess up your source tree.
> - In some cases, the hints may be conflicting
> or not matching the actual change history
> (e.g. typo in path name)
> - In some cases, the algorithm might not support
> a specific hint or some of its parameters
> (backward compatibility)

Agreed.

> * Use the following format
> [...]
> Rationales:
> - We can add arbitrary keywords in later versions
> - Some descriptions may not fit well into a single
> line. So, allow for sub-info to be put on extra lines
> - Make the top-level hints and keywords clearly
> identifiable such that unsupported ones can be
> skipped generically.

The rationale makes sense. It's useful to see this format written down as a basis for discussion, but too early to comment on the details. Two questions:

1. Are the paths mentioned in these hints full path-within-repository, or are they path-within-branch? If the latter, then we need to define a way of identifying branch roots, don't we?

2. This is one-way info (like our copy history in the repository). Is this sufficient or should we have two-way info that has forward pointers as well as backward pointers? The algorithms that we're thinking of for merge, at the moment, work perfectly well with one-way info. But with our copy history we found in the end that it would have been much better to have two-way info. I don't know how we could reasonably do this with rename hints, without great risk of the forward and backward hints being incompatible.
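
To illustrate the distinction in question 2 (and only as an illustration, not a proposal), here is a small Python sketch in which only the backward hints are stored and a forward index is derived from them on demand, so the two directions cannot disagree. The tuple layout and the name build_forward_index are assumptions of mine, not part of the proposed format.

```python
from collections import defaultdict

# Hypothetical backward hints, one per "continue" line:
# (rev carrying the hint, from_path, to_path).
backward_hints = [
    (10, "trunk/a.c", "trunk/b.c"),
    (15, "trunk/b.c", "trunk/c.c"),
]


def build_forward_index(hints):
    """Map each from_path to the (rev, to_path) pairs that continue it.

    The index is derived data: it is rebuilt from the backward hints
    and never stored alongside them.
    """
    forward = defaultdict(list)
    for rev, from_path, to_path in hints:
        forward[from_path].append((rev, to_path))
    return dict(forward)


forward_index = build_forward_index(backward_hints)
```

Whether deriving such an index would be cheap enough on a large repository is exactly the kind of thing that would need measuring.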

> * "continue" from-path[@peg] [ from-rev ] to-path
> Specifies that the merge-relevant change history
> of from-path,from-rev is continued by to-path,
> current rev carrying this revprop. For simplicity,
> from-rev can be the deletion rev but the deletion
> will not become part of the combined history.
> Rationales:
> - Could also be called "link" but that might be
> mistaken for OS file system links.
> - Allow for @peg rev just for consistency with
> other UI
> - From-rev is useful to bridge gaps between the
> rev in which from-path was deleted and the one
> carrying this merge hint.

OK, this is fine. Minor nit: I'd simplify the from-rev specification to:

* "continue" from-path@[from-rev] to-path
from-rev is the deletion rev (the deletion will not
become part of the combined history); if omitted, it
means the deletion is in the current rev carrying
this revprop.
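
To make that simplified syntax concrete, here is a rough Python sketch of a parser for a single "continue" line. It is only an illustration: the names ContinueHint and parse_continue are made up, and it assumes that paths contain no whitespace, which the format has not settled.

```python
from typing import NamedTuple, Optional


class ContinueHint(NamedTuple):
    from_path: str
    # None means: the deletion is in the rev carrying the revprop.
    from_rev: Optional[int]
    to_path: str


def parse_continue(line: str) -> ContinueHint:
    """Parse one 'continue from-path@[from-rev] to-path' line.

    Assumption: paths contain no whitespace; quoting is not handled.
    """
    fields = line.split()
    if len(fields) != 3 or fields[0] != "continue":
        raise ValueError(f"not a 'continue' hint: {line!r}")
    source, to_path = fields[1], fields[2]
    # Split on the last '@' so that an '@' inside the path survives.
    from_path, sep, rev_text = source.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@' after from-path: {line!r}")
    from_rev = int(rev_text) if rev_text else None
    return ContinueHint(from_path, from_rev, to_path)
```

With this sketch, "continue trunk/a.c@41 trunk/b.c" gives from_rev 41, and "continue trunk/a.c@ trunk/b.c" gives from_rev None, meaning the current rev.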

> * "ignore" path [[from-rev:]to-rev]
> [...]

I think this "ignore" feature needs higher level design.

> Semantics in a merge from path A to path B:
>
> * All mergehints from the YCA are being evaluated in
> revision order, separately for A and B.

OK.

> * If A and B are unrelated (no YCA exists), "continue"
> hints will be ignored.

If A and B are unrelated (no YCA exists), we don't support automatic merging (that is, merge tracking), so the case is rather uninteresting and probably out of scope. But I don't see why we should deliberately ignore the rename info on the source branch for that kind of merge. It would still be useful for seeing where a node has been renamed during its lifetime in the source branch. We wouldn't look at rename info on the target branch because we wouldn't be looking at a series of changes along the target branch. But if, as I say, this case is out of scope, we don't need to discuss it.

> * "continue" and "ignore" do not affect the merge of
> tree changes. However, the merge might use them to
> resolve tree conflicts. But that part will not be
> specified here.

[As I wrote on IRC...]

Not sure what you mean by not affecting the merge of tree changes.

I think "continue" should be used to track the renames of every path, files and directories, to find which path in the source branch corresponds to which path in the target branch. If the path in the source branch then gets deleted or renamed, the merge should do the same to the corresponding path (that is, the one matched by following the "continue" info) in the target branch. In that sense, it should apply to tree changes. Is there some specific kind of change or scenario where we'd want to deliberately ignore "continue" info?

If a path (say X/foo) is added in branch A, then that new path (X/foo) won't have any "continue" info, and no connection to a path in branch B. The merge should add a path with name "foo" inside the path in B that corresponds to X in A: so, tracking the "continue" info that leads to "X" in A, that same info leads to a path in B that might be "X2", so we create "X2/foo".
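
The X/foo example can be sketched in Python roughly as follows. The two lookup tables are hypothetical: I am assuming they have already been built by following the rename info along each branch, that paths are branch-relative, and that every node that existed at the YCA has an entry (unrenamed nodes map to themselves). The branch root itself is not handled.

```python
def corresponding_target_path(source_path, source_to_yca, yca_to_target):
    """Find where source_path (in branch A) belongs in branch B.

    source_to_yca: current path in A -> path at the YCA (hypothetical).
    yca_to_target: path at the YCA -> current path in B (hypothetical).
    Returns None if no ancestor has a counterpart in B.
    """
    parts = source_path.split("/")
    # Try the full path first, then successively shorter ancestors.
    for cut in range(len(parts), 0, -1):
        ancestor = "/".join(parts[:cut])
        yca_path = source_to_yca.get(ancestor)
        if yca_path is not None and yca_path in yca_to_target:
            # Re-attach the untracked tail below the matched ancestor.
            return "/".join([yca_to_target[yca_path]] + parts[cut:])
    return None


# X was not renamed in A, but was renamed to X2 in B.
source_to_yca = {"X": "X"}
yca_to_target = {"X": "X2"}
print(corresponding_target_path("X/foo", source_to_yca, yca_to_target))  # X2/foo
```

The newly added "foo" has no entry of its own, so the lookup falls back to its parent "X" and re-attaches "foo" under the corresponding directory in B.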

My overall thinking is shaped by the idea that we should treat a file and a directory the same way. That is, a file has properties and text-content, and when we merge a change into a file we merge a text-change and a properties-change into it. A directory has properties and a set of children, and so when we merge a change into a directory we merge a children-change (that is, we do any adds and deletes and moves, and we recurse into all the subdirectories) and a property-change.

So I think it's helpful to define an algorithm in a recursive way (recursing into directories, that is).

In terms of execution, that might be implemented as: first go through the whole tree noting what changes are needed, then go through the whole tree again making the tree-changes, then go through the tree again making the text-changes (on files) and property-changes (on any node kind). Or it could be implemented as: go through the whole tree just once, making all the changes to one node before processing the next node.

The point is, regardless of how it's implemented, if the behavioural description treats "a change to a node" as a single concept that applies equally to a file or a directory, I find that easier to understand and accept.
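
As a rough sketch of what I mean by treating "a change to a node" uniformly, here is a toy Python model in the single-pass style. All the types and names are invented for illustration; it overwrites file text rather than really merging it, and it ignores conflicts and renames entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    props: Dict[str, str] = field(default_factory=dict)
    text: Optional[str] = None                    # files only
    children: Optional[Dict[str, "Node"]] = None  # directories only


@dataclass
class NodeChange:
    # Property name -> new value; None deletes the property.
    props: Dict[str, Optional[str]] = field(default_factory=dict)
    text: Optional[str] = None                    # new text for a file
    added: Dict[str, Node] = field(default_factory=dict)
    deleted: set = field(default_factory=set)
    modified: Dict[str, "NodeChange"] = field(default_factory=dict)


def apply_change(node: Node, change: NodeChange) -> None:
    """Apply one node-change; identical entry point for files and dirs."""
    # Property-change: the same for any node kind.
    for name, value in change.props.items():
        if value is None:
            node.props.pop(name, None)
        else:
            node.props[name] = value
    if node.children is None:
        # Content-change of a file (simplified: replace, not merge).
        if change.text is not None:
            node.text = change.text
        return
    # Content-change of a directory: deletes, adds, then recurse.
    for name in change.deleted:
        node.children.pop(name, None)
    node.children.update(change.added)
    for name, child_change in change.modified.items():
        apply_change(node.children[name], child_change)
```

The multi-pass variant would walk the same structure several times, handling one kind of change per pass; the behavioural description stays the same.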

> * Starting from the YCA, the "continue" information
> is being collected and applied. For each node, this
> yields a list of changes (node, rev) on the source
> side A and a target path (node) on the target side B.
> If beneficial to the merge algorithm, we can also
> derive the change list (node, rev) of the target side.

OK.
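
For one side of the merge, that collection step might look something like this in Python. This is a sketch under my own assumptions: hints are given as (rev, from_path, to_path) tuples for revisions after the YCA, and a directory rename is not propagated to its children here.

```python
def follow_continuations(yca_paths, hints):
    """Return a map from each path at the YCA to its current path.

    hints: iterable of (rev, from_path, to_path) for one branch,
    covering only revisions after the YCA (hypothetical layout).
    """
    current = {path: path for path in yca_paths}
    # Evaluate the hints in revision order, as the proposal says.
    for _rev, from_path, to_path in sorted(hints):
        for origin, path in list(current.items()):
            if path == from_path:
                current[origin] = to_path
    return current


hints = [(15, "b.c", "c.c"), (10, "a.c", "b.c")]
print(follow_continuations(["a.c", "d.c"], hints))
# {'a.c': 'c.c', 'd.c': 'd.c'}
```

Running the same function over the hints of A and of B gives the two tables needed to match up paths between the branches.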

> * "ignore" is then applied to reduce the entries in
> the source lists, i.e. "ignore" takes precedence over
> "continue".

Don't know; see above.

> * Finally, all text changes get applied to their
> respective target paths.

And property changes and tree changes, I assume; or do you think those need to happen at some other point?

> Notes:
>
> * The hints are being evaluated at merge time. Later
> changes done by the user will not "mend" previous
> merges but may help to produce better results in the
> future.

Agreed.

> * The design allows for N:1, 1:N and N:M merges
> (e.g. multiple continuations to the same target)
> but we will support 1:1 only in our initial implementation.

OK, I get what you mean, and I like it. We should explain that more fully, but it's advanced usage, so that's not too important.

> Technically, we can use that feature to model changes
> in file granularity.

What does that mean?

- Julian
Received on 2012-05-01 18:05:28 CEST
