
Re: Merge tracking proposal

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2006-05-04 12:55:10 CEST


I've read through and come up with a list of questions as they occurred to me.
Some are rather open-ended; most are basically asking for more information or
explanation. I don't expect you to answer them all in full (e.g. providing a
full set of use cases and examples in response to my "Use cases?" question).
If you think about them and try to answer most of them within the next draft of
the proposal, I'll read that and see if there is anything still unclear.

Daniel Berlin wrote:
> As part of this, I have come up with a design I plan on implementing for
> tracking what revisions have been merged where, in a manner that is
> suitable for use by various other operations (history sensitive merging,
> etc).

Scope? That is, what are the limits of the kinds of merging this is intended
to support? Automatic merges client-side? Server-side? Manual merges (i.e.
where no "svn merge" command was used)? More?

> In doing so, I reviewed the use cases that were kindly written up, and
> believe that most if not all of them can be accomplished with this
> design.

Can you give references to these use cases?

Examples of how this design works in practice in these cases?

(Trunk-to-release-branch, feature-branch-to-trunk, repeated merging, vendor
branch, undoing a change, ...)

It would be informative to apply this algorithm to the merges that have already
been done on Subversion's repository, to see what the result is. For instance,
that might give a reasonable indication of whether the lists of revisions are
going to grow too long to be considered human-usable, as someone wondered.

> Goals:
> The overarching goal here is to track the revision numbers being merged
> by a merge operation, and to keep this information in the right places
> as various operations (copy, delete, add, etc.) are performed.
> The goals of this design are:
> 1. To be able to track this down to individual files in a working copy,
> and to determine which revisions have been merged into each file.
> 2. To not need to contact the server more than we already do now to
> determine which revisions have been merged into a file or directory (i.e.
> some contact is acceptable, asking the server about each file is not).
> 3. To be able to edit merge information in a human-editable form.
> 4. For the information to be stored in a space-efficient manner, and to
> be able to determine the revisions merged into a given file/directory in
> a time-efficient manner.
> 5. To still get a conservatively correct answer (not worse than what we
> have now) when no merge info is specified.
> 6. To be able to collect, transmit, and keep this information up to date
> as much as possible on the client side.
> 7. To be able to index this information in the future in order to answer
> queries.
> Specific Non-goals for *this design* include:
> 1. Doing actual history sensitive merging
> 2. Curing cancer (aka being all things to all people)

> The one argument I continually have with myself is whether to store info
> in revprops, or just on dirs and files. If you want to try to
> convincingly argue one way or the other, go for it. Certainly, I think
> it makes the semantics of the operations below clearer and makes it
> easier to decide how to proceed; the question is whether it is
> efficient enough, time-wise, when we go to retrieve merge info, and
> whether it complicates what merge has to do too much. It also removes
> all of the listed pre-reqs :).

In this design, what purpose does the rev-prop serve? Aren't you using it for
just the same purposes that you would use a property on the repository root
directory, and yet, being a rev-prop, it has completely different behaviour? I
don't see why you would want to do that.

(I've now seen your later comment that you're on the brink of throwing away the
rev-prop part of this proposal. +1 on that.)

> One could also try to argue that we should start with exactly the same
> cases svnmerge does (i.e. only allow merge info at the wc roots, only
> store it on that directory, etc), with a nicer integrated interface, and
> try to expand it from there. I am open to such an argument as well. :)

> Information storage
> The first question that many people ask is "where should we store the
> merge information" (what we store will be covered next).

Well, they may ask, but it doesn't make much sense to discuss this until we
know what information is to be stored.

> A merge info property, named SVN_MERGE_PROPERTY (not the real name, I
> have made it a constant so we can have a large bikeshed about what to
> really call it), is stored in the revision properties, directory
> properties, and file properties.
> Each will store the *full, complete* list of currently merged-in changes,

Complete list of what? The merge-prop on an item (say directory /d1/d2) shall
list all the changes that have ever been merged into this item, including
indirectly (via merging a change that partly consisted of a previous merge),
and including any merges to its parent (/d1) or grandparents that modified it?

Is there significant duplication of information among these lists? (I can't
tell yet.) If so, that is likely to make manual editing unsafe.

> as far as it knows. This ensures that the merge algorithm and other
> consumers do not have to walk back revisions in order to get the
> transitive closure of the revision list.

Could you expand on this? I don't follow especially "walk back revisions" and
"transitive closure".

> The way we choose which of file, dir, or revprop merge info to use in case
> of conflicts is a simple system of inheritance[1], where the "most specific"
> place wins. This means that if the property is set on a file, that
> completely overrides the directory and revision level properties.
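To make the "most specific place wins" rule concrete, here is a minimal Python sketch. Beyond what the quoted text says, it assumes that "directory" means the nearest ancestor directory that carries the property, that `node_props` (a hypothetical name) maps absolute repository paths to property values, and that the winning value replaces the less specific ones wholesale rather than being combined with them.

```python
import posixpath
from typing import Dict, Optional


def effective_merge_info(path: str,
                         node_props: Dict[str, str],
                         revprop: Optional[str]) -> Optional[str]:
    """Return the merge info that applies to `path`.

    Hypothetical model: node_props maps absolute repository paths (files
    or directories) to their merge-info property; revprop is the
    revision-level value. The nearest path carrying the property wins
    outright; values from less specific places are ignored, not merged.
    """
    current = path
    while True:
        if current in node_props:
            return node_props[current]   # most specific place wins
        if current in ("/", ""):
            break                        # reached the top of the tree
        current = posixpath.dirname(current)
    return revprop                       # fall back to the revision level
```

Under this reading, a file with its own property never sees its parent's value at all, which is what "completely overrides" suggests.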

> As for what is stored:

> revisionline -> PATHNAME COLON revisionlist
> top -> revisionline (NEWLINE revisionline)*
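The quoted grammar leaves `revisionlist` undefined. As a sketch only, assuming a revisionlist is a comma-separated list of single revisions and inclusive `N-M` ranges (an assumption, not something the proposal specifies), the two productions could be parsed like this in Python. The names `parse_revisionlist` and `parse_merge_info` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: PATHNAME -> sorted inclusive (start, end) ranges.
MergeInfo = Dict[str, List[Tuple[int, int]]]


def parse_revisionlist(text: str) -> List[Tuple[int, int]]:
    """Parse an assumed revisionlist syntax such as '3,5-7,10'."""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
        else:
            start = end = int(item)
        if start > end:
            raise ValueError(f"reversed range: {item!r}")
        ranges.append((start, end))
    return sorted(ranges)


def parse_merge_info(text: str) -> MergeInfo:
    """Parse: top -> revisionline (NEWLINE revisionline)*."""
    info: MergeInfo = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # revisionline -> PATHNAME COLON revisionlist; split on the last
        # colon so that a PATHNAME containing ':' still parses.
        path, sep, revs = line.rpartition(":")
        if not sep or not path:
            raise ValueError(f"malformed revisionline: {line!r}")
        info.setdefault(path, []).extend(parse_revisionlist(revs))
        info[path].sort()
    return info
```

For example, `parse_merge_info("/branch1:10\n/branch1/f:3,5-7")` yields `{"/branch1": [(10, 10)], "/branch1/f": [(3, 3), (5, 7)]}`. Note that this covers syntax only; it says nothing about the peg revision of each PATHNAME, which is one of the open questions below.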

Semantics? This merge history ("top"), existing on a file, dir or repo,
specifies all the changes that have ever been merged into this object (file,
dir or repo) within this repository. It specifies the sources of the merges,
(and thus two or more pathnames may be required to represent one source object
at different revisions due to renaming). Is that right?

What is the peg revision for PATHNAME? Something like "rev" for each "rev" in
the list, such that a single "revisionline" can list changes taken from more
than one source object?

The merge history for a file is a subset of the history lines for its dir, and
the history of the dir similarly of its immediate parent, so on upwards? Or
not - are intermediate dirs allowed to have no history? How is that
relationship maintained?

How do you handle the indirect merge situation (merging a change that contains
a previous merge)? Do the revision numbers of both the earlier, little merge,
and the later, bigger merge that includes the little one, appear in the
list(s)? For instance,

   r10 modifies /branch1/f and /branch1/g

   r12 merges r10 from /branch1 into /branch2
     /branch2 says "/branch1:10"

   r14 merges r10 from /branch1/f into /trunk/f
     /trunk/f says "/branch1/f:10"

   r16 merges r12 from /branch2 into /trunk (carefully avoiding repeating the
     r14 part of r10, as it's already known to be here)

What do /trunk, /trunk/f, /trunk/g say?

> svn add: No change to merge info
> svn delete: No direct change to merge info (indirectly, because the
> props go away, so does the merge info for the file)

I half-understood that some parents/grandparents might store copies of the
merge info that is on this object. If so, and if you don't explicitly remove
the other copies of this info from the parent dir(s), won't obsolete history
build up, that is not incorrect but is annoying?

(I can see that it may be difficult or impossible to determine what info can be
removed from the parents.)

> svn rename: No change to merge info
> svn copy: Copies the merge info from the source path to the destination
> path, if any.
> This includes copying info from revprops, if necessary, by determining
> if the merge info exists in a revprop for the last changed commit for
> the source path, and copying it to the new revprop if it does (someone
> probably needs to check if this is the right semantic :P)
> All copies are full-copies of the merge information.
> svn merge: Adds or subtracts to the merge info, according to the
> following:
> Where to put the info:
> 1. If the merge target is a single file, the merge info goes to the
> property SVN_MERGE_INFO set on that file.
> 2. If the merge target is a non-wc-root directory, the merge info goes
> to the property SVN_MERGE_INFO set on the directory
> 3. If the merge target is a wc-root directory, the merge info goes to
> the property SVN_MERGE_INFO set on the revprop.

Why the difference between wc-root and non-wc-root? How do you determine
whether a directory specified in a client operation is a wc-root or non-wc-root?

I saw a later message saying that by "wc-root" you meant "longest common
ancestor path" of a commit operation, but I still don't understand. A commit
is not necessarily going to be done until well after the merge command and
potentially other merges and other operations have been done in the WC.

Is this all to do with the fact that you need write access to the properties of
some parent directory which may not be present or may not be locked for writing?
> What info is put:
> 1. If you are merging in reverse, revisions are subtracted from the
> revision lines, but we never write out anti-revisions. Thus, if you
> subtract all the merged revisions, you just get an empty list, and if
> you do a reverse merge from there, you still get an empty list
> 2. If you are merging forward, the revision(s) you are merging are added
> to the revision line

These (1 and 2) seem reasonable.
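Rules 1 and 2 amount to set difference and set union on the revisions recorded for each source path. The Python sketch below assumes merge info is modelled as a mapping from source path to a set of revision numbers, and that the written form uses comma-separated `N-M` ranges; both choices, and the names `record_merge` and `format_revisionlist`, are hypothetical. It shows why no anti-revisions can appear: subtracting from an empty set just leaves it empty.

```python
from typing import Dict, Iterable, Set

# Hypothetical model: source path -> set of revision numbers merged from it.
MergeInfo = Dict[str, Set[int]]


def record_merge(info: MergeInfo, source: str, revs: Iterable[int],
                 reverse: bool = False) -> MergeInfo:
    """Return updated merge info after merging `revs` from `source`."""
    result = {path: set(r) for path, r in info.items()}  # don't mutate input
    if reverse:
        # Rule 1: subtract; set difference can never produce an
        # "anti-revision", so reversing what isn't there is a no-op.
        result[source] = result.get(source, set()) - set(revs)
    else:
        # Rule 2: add the merged revision(s) to the revision line.
        result.setdefault(source, set()).update(revs)
    return result


def format_revisionlist(revs: Set[int]) -> str:
    """Render revisions compactly, e.g. {3, 5, 6, 7} -> '3,5-7'."""
    runs = []
    for r in sorted(revs):
        if runs and runs[-1][1] == r - 1:
            runs[-1][1] = r              # extend the current run
        else:
            runs.append([r, r])          # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
```

For example, merging r10 and r11 from /branch1 and then reverse-merging both leaves `/branch1` mapped to an empty set; a further reverse merge of r10 still leaves it empty, matching rule 1.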

When a merge has been performed in the WC but not yet committed, and a merge
has been committed to the repository in the meantime, how is "svn update" going
to merge the latest repository version of the merge-history property into the
WC version of it - (a) when the update goes smoothly, and (b) when the update
has conflicts?

As you have had various bits of feedback already, I think it would be useful if
you could post the latest revision of the proposal soon, regardless of how much
it addresses my questions.

- Julian

Received on Thu May 4 12:55:41 2006

This is an archived mail posted to the Subversion Dev mailing list.