Re: Moves in FSFS

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 17 Sep 2013 11:55:07 +0100 (BST)

>> Branko Čibej <brane_at_wandisco.com> writes:
>>> That said, I still do not understand why a different ID would be needed
>>> before the copy-on-write happens. Is it because the client doesn't have
>>> the full history available? [...]

Hi Brane.

Ref. <http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics>.

A different id (other than node-id + copy-id) is needed because of the way I prefer to
describe the semantics of moves (on the server), not because of anything to do with
the client side, nor with existing or potential editor designs.

Some important move-tracking query APIs are ones that map between paths in one revision and "corresponding" paths in another revision. For these purposes I believe the abstract model we need to present is one in which copying a directory creates new lines of history ("node-lines") for all nodes in the subtree, even though their content may not have diverged yet. In other words, for a query that asks "where" (at what path) in revision REV2 we would find the node corresponding to the one at PATH1@REV1, it should behave *as if* copies were always full copies and never lazy. (Conversely, in merging, when finding a common ancestor of the changes to be merged, it may be easier to work with the late-branching / lazy-copy model, as the common ancestor it finds can be nearer.)
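To make the "as if copies were always full copies" reading concrete, here is a toy Python model. It is only a sketch under stated assumptions: the `Repo` class, its `commit` and `corresponding_path` methods, and the integer node-line ids are all hypothetical and are not FSFS or Subversion APIs. A revision is modelled as a dict from path to node-line id, and copying a subtree starts a new node-line for every node in it.

```python
import itertools


class Repo:
    """Toy model (hypothetical, not FSFS code): each revision maps a path to
    a node-line id. Copying a subtree starts a NEW node-line for every node
    in it, as if the copy were a full copy rather than a lazy one."""

    def __init__(self):
        self.revs = [{}]                  # r0 is the empty tree
        self._ids = itertools.count(1)    # source of fresh node-line ids

    def commit(self, changes):
        """Apply ("add", path) / ("copy", src, dst) changes; return the new revnum.
        Content edits are not modelled: they keep the existing node-line."""
        tree = dict(self.revs[-1])
        for change in changes:
            if change[0] == "add":
                tree[change[1]] = next(self._ids)
            elif change[0] == "copy":
                _, src, dst = change
                # Iterate over a snapshot so entries added below are not revisited.
                for path in list(tree):
                    if path == src or path.startswith(src + "/"):
                        tree[dst + path[len(src):]] = next(self._ids)
            else:
                raise ValueError(f"unknown change {change!r}")
        self.revs.append(tree)
        return len(self.revs) - 1

    def corresponding_path(self, path1, rev1, rev2):
        """Path in rev2 on the same node-line as path1@rev1, or None."""
        line = self.revs[rev1][path1]
        return next((p for p, lid in self.revs[rev2].items() if lid == line), None)


repo = Repo()
r1 = repo.commit([("add", "trunk"), ("add", "trunk/A"), ("add", "trunk/B")])
r2 = repo.commit([("copy", "trunk", "branch")])
print(repo.corresponding_path("trunk/A", r1, r2))   # trunk/A
print(repo.corresponding_path("branch/A", r2, r2))  # branch/A
```

In this model branch/A@r2 is on a node-line of its own, distinct from trunk/A's, even though a lazy-copy implementation would still share the underlying node-revision at that point.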

The "node-line" concept is merely a tool to aid in the definition of the semantics. I am not suggesting here that the node-line-id should be transmitted to the client or used in the editor APIs. (Those are separate discussions, in which we may or may not want to use such a concept.)

Do you object to my using this invented concept as a tool within the semantic specification, or do you object to this abstract concept being made concrete and stored and exposed? Do you disagree with the semantics I defined, or find them hard to interpret, or is it that you would prefer to describe the same thing in a different way? I'm not clear.

The same semantics could of course be defined in other ways. However, the definition as I've written it clearly doesn't work if we just write (node-id, copy-id) in place of (node-line-id). Here is an example of how it makes a difference.

Start with

  r10:
    trunk
      /A
      /B

branch the trunk:

  r20:
    trunk
      /A
      /B
    branch
      /A (pointer to /trunk/A)
      /B (pointer to /trunk/B)

modify branch/A:

  r30:
    trunk
      /A
      /B
    branch
      /A
      /B (pointer to /trunk/B)

Now let's say we're diffing branch@20 and branch@30. I want to be able to
report a mapping between each path in branch@20 and the path in r30
corresponding to "the same node", where "the same node" is to be defined
in some way that makes sense for tracking moves. In this simple
example there are not even any moves, so I want the move-tracking
code to be able to deduce the following 1:1 path-mapping between
branch@20 and branch@30:

  PATH@20       PATH@30
  branch    <-> branch
  branch/A  <-> branch/A
  branch/B  <-> branch/B

It certainly must not report a simple (node-id, copy-id) correspondence, because that would look something like:

  PATH@20       PATH@30
  branch    <-> branch
  branch/A  <-> trunk/A   # or (nil), as it's out of tree-scope
  (nil)     <-> branch/A
  branch/B  <-> branch/B

which breaks the mapping between branch/A@20 and branch/A@30.
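The difference between the two tables can be reproduced with a small Python sketch. The revision contents are hand-written assumptions, not real FSFS data: in the lazy-copy model each path is labelled with a made-up (node-id, copy-id) pair, where the unmodified children of branch still resolve to trunk's node-revisions and the copy-on-write in r30 gives branch/A a new copy-id; in the node-line model each path carries an invented node-line id. The helper `path_mapping` is hypothetical and restricts both sides to the tree being diffed, which is why branch/A@20 maps to None rather than to trunk/A.

```python
def in_scope(path, scope):
    """True if `path` is `scope` itself or lies underneath it."""
    return path == scope or path.startswith(scope + "/")


def path_mapping(rev1, rev2, scope):
    """Map each in-scope path of rev1 to the in-scope path of rev2 that has the
    same id, or to None. Assumes ids are unique within the scope of rev2."""
    by_id = {ident: path for path, ident in rev2.items() if in_scope(path, scope)}
    return {path: by_id.get(ident)
            for path, ident in sorted(rev1.items()) if in_scope(path, scope)}


# Lazy copies, identified by (node-id, copy-id). Values are illustrative only.
r20_lazy = {
    "trunk": ("t", "0"), "trunk/A": ("a", "0"), "trunk/B": ("b", "0"),
    "branch": ("t", "1"),    # the copied root gets a new copy-id
    "branch/A": ("a", "0"),  # still a pointer to trunk/A's node-revision
    "branch/B": ("b", "0"),  # still a pointer to trunk/B's node-revision
}
r30_lazy = {**r20_lazy, "branch/A": ("a", "1")}  # copy-on-write of branch/A

# Node-line ids: the copy in r20 already started new lines for branch/*,
# and modifying branch/A in r30 does not start another one.
r20_lines = {"trunk": 1, "trunk/A": 2, "trunk/B": 3,
             "branch": 4, "branch/A": 5, "branch/B": 6}
r30_lines = dict(r20_lines)

print(path_mapping(r20_lines, r30_lines, "branch"))
# {'branch': 'branch', 'branch/A': 'branch/A', 'branch/B': 'branch/B'}
print(path_mapping(r20_lazy, r30_lazy, "branch"))
# {'branch': 'branch', 'branch/A': None, 'branch/B': 'branch/B'}
```

Running the lazy mapping in the other direction, `path_mapping(r30_lazy, r20_lazy, "branch")["branch/A"]` is also None, which corresponds to the "(nil) <-> branch/A" row.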

Hi Philip.

Branko Čibej wrote:
> Philip Martin wrote:
>> Another way to provide the moves between arbitrary revisions is to have
>> an id to path map per revision which allows the FS to find the path
>> associated with a given id.  However with lazy-copy this map is harder
>> to implement.

Harder in the sense that a naive map from each node-line-id to each reachable path in the revision would require adding N entries to the map when copying a subtree of N nodes, thus making copy no longer O(1). To maintain O(1) copies we'd need something cleverer.

In my present definition of move semantics, the ids used in this map would be what I call "node-line" ids, not the raw (node-id, copy-id) pairs. How copy-ids work is thus irrelevant to me. (Reading between the lines, I think your questions about how copy-id assignment works were really asking how copy-id could possibly be used to answer move-tracking queries, whereas Brane answered them as direct questions about how copy-id assignment currently works.)
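A minimal Python sketch of the naive per-revision index discussed above, keyed by node-line ids. The function name `copy_subtree_naive` and the integer ids are hypothetical; the sketch only counts how many index entries a single subtree copy has to add.

```python
import itertools


def copy_subtree_naive(id_to_path, src, dst, new_id):
    """Naive per-revision index (hypothetical): node-line id -> path.
    Copying `src` to `dst` starts a new node-line for each node in the subtree,
    so the index gains one entry per copied node: O(N) work for N nodes.
    (The dict() copy below only keeps the example simple; a real store would
    share the unchanged part between revisions.)"""
    index = dict(id_to_path)
    added = 0
    for path in id_to_path.values():
        if path == src or path.startswith(src + "/"):
            index[new_id()] = dst + path[len(src):]
            added += 1
    return index, added


ids = itertools.count(1)
r10 = {next(ids): p for p in ["trunk"] + [f"trunk/f{i}" for i in range(1000)]}
r20, added = copy_subtree_naive(r10, "trunk", "branch", lambda: next(ids))
print(added)  # 1001 new index entries for a single copy
```

Any scheme that keeps copies O(1) would have to avoid this per-node enumeration.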

- Julian
Received on 2013-09-17 13:00:29 CEST

In reply to: Branko Čibej: "Re: Moves in FSFS"