Re: short question about merge [PROPOSAL] vs. tree-deltas

From: Tom Lord <lord_at_emf.net>
Date: 2003-04-17 21:08:34 CEST

Let's say that I say: `svn merge ORIG MOD TGT'.

The nested contents of MOD may be arbitrarily rearranged from ORIG.

The merge algorithm has to look at the two trees and figure out which
things in ORIG correspond to which things in MOD. It has to know what
has been renamed, and what to compare to what.

In the general case, with absolutely no usage restrictions, this is
basically impossible. Noderev ancestry, a hypothetical "perfect
copy/rename history for each noderev" -- both give results that are
either ambiguous, or that rely on disambiguation rules that are hard
to control and mess up merging generally.[1]

The only way to make this a tractable problem is with restrictions
(either enforced or advisory) on how people use copy and rename.
The best restrictions I've been able to think of are:

        1) "Copy" is normally used only to create tags and branches,
           and only on the top-level directories of project trees.
           It's almost never used to copy anything _into_ a project
           tree, unless that copy is copying in files and dirs that
           don't already exist in the target tree. "Copy" is never
           used _within_ a project tree. [I'll slightly relax that
           last constraint later on.]

        2) "Rename" is used only to rename tags and branches, or to
           rename files and directories _within_ a project tree, but
           it is almost never used between project trees, except when
           the renamed-into tree doesn't already have _any_ of the
           files being added by the rename. In other words, "rename"
           isn't permitted to create "two copies of the same logical
           file" within a project tree.

That's _fairly_ simple advice to give users. Better still: it'd
be really easy to implement some higher-level commands, let's call
them "scm-copy" and "scm-rename" which enforce those restrictions.

One problem, though, is that sometimes I certainly want "nested
projects", and even source trees that contain multiple (but variant)
instances of a given "nested project". Nested projects can easily
violate the invariants that the copy/rename restrictions attempt to
preserve.

But that might be ok if I can convince the merge commands to stop at
sub-project boundaries. That is, if the merge of a project tree
simply ignores any "nested projects", then there's no problem. I can
merge those subtrees in a separate step -- or even write a little
script that merges them recursively, provided it can be told
succinctly what to merge with what.
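
A rough sketch of "stop at sub-project boundaries", in Python. Everything
here is made up for illustration: the tree is a plain nested dict, and
`nested_roots` is a hypothetical set of paths that someone has declared
to be nested projects. It is not how svn stores or walks anything.

```python
from typing import Dict, Iterator, Set, Union

# A directory maps names to either file contents (str) or a subdirectory.
Tree = Dict[str, Union[str, "Tree"]]


def walk_project(tree: Tree, nested_roots: Set[str], prefix: str = "") -> Iterator[str]:
    """Yield the file paths of one project tree, never descending into
    a path listed in nested_roots (a hypothetical boundary marker)."""
    for name, entry in sorted(tree.items()):
        path = f"{prefix}/{name}" if prefix else name
        if path in nested_roots:
            continue  # sub-project boundary: merge that subtree separately
        if isinstance(entry, dict):
            yield from walk_project(entry, nested_roots, path)
        else:
            yield path


tree = {
    "README": "top-level project",
    "src": {
        "main.c": "int main(void) { return 0; }",
        "vendor": {"libfoo": {"foo.c": "/* nested project */"}},
    },
}
visible = list(walk_project(tree, {"src/vendor/libfoo"}))
```

A merge driven by `walk_project` only ever sees `README` and
`src/main.c` here; `src/vendor/libfoo` would be handled by a separate
merge over that subtree.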

Under those restrictions, when I "add" a file to a project tree, I
could think of that file as acquiring a "logical identity" in that
tree. "Rename" preserves that identity. If I wind up adding a copy
of the file to a different branch of the same project: it has the same
"logical identity" in that other branch. The invariant here is that
each project tree contains at most one file with a given logical
identity.

So, why muck around with rename histories at all? I simply don't need
them and they're a bear to implement efficiently and accurately.
Instead, whenever I "add" a file to a project tree, I can give it a
cookie -- a property that contains a unique ID representing the
"logical identity" of the file. (I guess this would be a property on
the noderev, automatically inherited by all descendant noderevs
unless explicitly overridden.) If a merge operation copies that file
to another branch, it'll still have the same cookie. When a merge
operator needs to compare two trees, it doesn't have to muck with
rename histories at all -- it can just compare those id cookies.
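
To make the cookie comparison concrete, here is a minimal sketch in
Python. It assumes each tree has already been flattened to a mapping
from path to logical id; the function name and that representation are
hypothetical, not part of any svn API.

```python
from typing import Dict, List, Tuple


def compare_by_id(
    orig: Dict[str, str], mod: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Match files in two trees by logical id (path -> id mappings).

    Returns (renames, added, deleted), where renames is a list of
    (old_path, new_path) pairs. No rename history is consulted.
    """
    orig_by_id = {lid: path for path, lid in orig.items()}
    mod_by_id = {lid: path for path, lid in mod.items()}
    # The invariant: at most one file per logical id in a project tree.
    if len(orig_by_id) != len(orig) or len(mod_by_id) != len(mod):
        raise ValueError("duplicate logical id within one project tree")

    common = orig_by_id.keys() & mod_by_id.keys()
    renames = sorted(
        (orig_by_id[i], mod_by_id[i]) for i in common if orig_by_id[i] != mod_by_id[i]
    )
    added = sorted(mod_by_id[i] for i in mod_by_id.keys() - orig_by_id.keys())
    deleted = sorted(orig_by_id[i] for i in orig_by_id.keys() - mod_by_id.keys())
    return renames, added, deleted


orig = {"a.c": "id-1", "b.c": "id-2", "doc/x.txt": "id-3"}
mod = {"src/a.c": "id-1", "b.c": "id-2", "new.c": "id-4"}
renames, added, deleted = compare_by_id(orig, mod)
```

With these inputs, `a.c` is reported as renamed to `src/a.c` purely
because both paths carry `id-1`; `new.c` is an add and `doc/x.txt` a
delete. The two trees need no ancestry relationship for this to work.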

Earlier, I said "Copy is never used _within_ a project tree." We can
relax that constraint at the cost of making intra-tree copies more
expensive. Copy _within_ a tree has to change the logical ids of the
copied objects. I guess that could be done either eagerly or with a
lazy mechanism comparable to the way new copy_ids are assigned. Even
if done lazily, though, a client must never observe two path@x noderevs
within the same project tree with the same id. (The other
restrictions about copy and rename can be relaxed similarly, at
similar expense.)
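
A sketch of the eager variant, in Python, under loud assumptions: a
project tree is just a mapping from path to logical id, ids are random
UUIDs, and `add_file` / `copy_within_tree` are hypothetical helpers
standing in for whatever the real commands would be. The lazy scheme is
not shown.

```python
import uuid
from typing import Dict


def add_file(tree: Dict[str, str], path: str) -> None:
    """'Add' gives the file a fresh logical id cookie."""
    tree[path] = uuid.uuid4().hex


def copy_within_tree(tree: Dict[str, str], src: str, dst: str) -> None:
    """Eagerly assign new ids to everything copied from src to dst."""
    if any(p == dst or p.startswith(dst + "/") for p in tree):
        raise ValueError(f"{dst} already exists in this project tree")
    copies = {
        dst + path[len(src):]: uuid.uuid4().hex  # new id, never the source's
        for path in tree
        if path == src or path.startswith(src + "/")
    }
    if not copies:
        raise KeyError(src)
    tree.update(copies)


def ids_are_unique(tree: Dict[str, str]) -> bool:
    """The invariant: at most one path per logical id in the tree."""
    return len(set(tree.values())) == len(tree)


tree: Dict[str, str] = {}
add_file(tree, "lib/a.c")
add_file(tree, "lib/b.c")
copy_within_tree(tree, "lib", "lib2")
```

After the copy, `lib2/a.c` and `lib/a.c` have different ids, so
`ids_are_unique(tree)` still holds. The extra cost is visible: every
copied object gets touched, unlike a cheap copy that shares everything.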

The big thing is that every noderev has this "id" property; normally
that property is the same for all revisions of a given node_id.copy_id
and all other node_id.copy_ids descended from those; "project trees"
have at most one noderev with a given id; copying within a project
tree assigns new ids to the copied objects. If you need two subtrees
of a project tree which have duplicate ids -- then those must be
nested subprojects and merge will never see them while working on the
parent tree.

While v.a.p. only ever compares two trees which have a direct ancestry
relationship, the id cookies make it possible to compare any two
trees, regardless of ancestry. There are plenty of _other_ merge
techniques that can make good use of that, in case you're not so sure
yet that v.a.p. is the One True Merge Technique.

I had a moment of darkness where I thought: you know, it'll never fly.
Svn hackers will just balk at the idea of introducing project tree
boundaries and restrictions based on them --- even if you talk about
implementing them without disturbing the lower level project-less
generality of the fs. But I also had the hope that, maybe if they
bang their heads long enough against the brick wall of trying to solve
the tree-delta mechanism absent these restrictions, the project-tree
trick will start to look like a pretty clean and simple alternative
that doesn't take away much if any flexibility.

-t

[1] I can explain that in detail but wanted to keep this post short.

Received on Thu Apr 17 20:58:14 2003
