Maintaining NodeID sanity

From: Bill Tutt <billtut_at_microsoft.com>
Date: 2002-05-08 02:22:42 CEST

Branko came up with a decent idea to restore NodeID sanity to the
universe in the face of the weird side-effects of O(1) copies.

The thinking goes something like this:

We want to keep the NodeID the same for branches, but different for
copies.

We want to only change the NodeID on the destination of the copy.

A copy is really almost a branch. Let's think about just branches for
the moment.

So, we need a way to distinguish a NodeChange besides just the normal
(NodeID, ChangeSetID) data.

Let's add a BranchID to the PK of a NodeChange so that you have (NodeID,
BranchID, ChangeSetID) as the PK of NodeChange.

So when you "branch <src> <dest>" the result of the branch is a new
NodeChange that looks like this:
(<src NodeID>, <some opaque BranchID type>, <branch operation's
ChangeSetID>)

Now all we need is to know exactly what a BranchID is.

Branko's idea was to define a BranchID as:
NodeChangePKOf(parentOf(<dest>)).

The problem with that is that the PK contains BranchID which is now
recursively defined. That's good for nested branches, but bad for a
schema.

One alternative is to add a unique column to NodeChange: NodeChangeID.
NodeChangeID is a monotonically increasing integer (possibly a
fixed-precision decimal considering that NodeChangeIDs will be used much
more quickly than NodeIDs) that is unique for each NodeChange.

NodeChangeID is not an additional part of the PK. It is an AK
(alternative key).

BranchID can now be defined to be: NodeChangeID of <dest>.

When the user creates a branch of path "/foo" to "/bar" the following
can now happen:
Create a new NodeChange of ("/foo".NodeID, BranchID ==
this.NodeChangeID, ChangeSetID) and associate it with "/bar". This
doesn't set any of the copy related properties, and only changes the
BranchID.

When you are committing new changes to things under /bar (or revision
bumps of /bar), any new nodes will have the BranchID set to "/bar''s
BranchID.

The problem now becomes what I do when I have one of these complicated
nested branch cases.

e.g.:

/foo/main/A
/foo/alt/A

where /foo/alt is a branch of /foo/main.

I then: "branch /foo /bar"
I then make a modification to "/bar/alt/A".

What BranchID does A's NodeChange row now have?
If we lazily flatten sub-branches then it could be "/bar"'s BranchID.
If we lazily create sub-branches then it could be "/bar/alt/"'s
BranchID.

Flattening isn't terribly friendly to our users, so let's think about
what we have to do if we lazily create our sub-branches.

When examining /bar/alt we notice that the existing BranchID != /bar's
BranchID. This may mean any number of things:
* It might be a copy, since we haven't told you how to differentiate
between a copy and a branch. If it existed inside of a copy, the
BranchID is unimportant since if it needs to change to be /bar's
BranchID.

* It might be a branch, and if so, we need to create a new NodeChange
for alt, containing a new BranchID. The origin of this branch is
recorded as being /foo/alt.
Once we have this new BranchID, we can then use it while creating the
new NodeChange for /bar/alt/A.

But, how do we detect that /bar/alt is a branch instead of a copy? (i.e.
it might be a copy of a branch)

Alt.BranchID points to /foo/alt's data. (The location of the original
branch.) When I look that data up, I'll notice that the copy information
is absent. This tells me I have a branch. If I had a copy, the copy
properties would be filled in.

The only additional annoyance is that copy/branch destination nodes
don't record the path that they were created at. Given the DAGgy nature
of our data store, it would certainly help if they included what their
creation path was.

i.e.: /bar was created at path /bar.

To sum up:
  Add a BranchID to the PK of NodeChange.
  Add a NodeChangeID as an AK of NodeChange. (i.e. monotonically
increasing integer that uniquely, but not usefully defines a NodeChange
row)
  Use the NodeChangeID as the BranchID when creating copies/branches.
  Add a CreatedPath to NodeChange that is only used for copy/branch
operations.

Alternatively, you could get away without having NodeChangeID since it
requires a separate BDB table to repeat the PK of NodeChange to point to
the correct NodeChange row.

If we do the above we get the following benefits:
* NodeIDs stay the same for branches, but change for copies.
* NodeIDs are attached to the path that they're created in. No more
non-deterministic migrating of NodeID usages depending on which side of
a copy got modified first.
* We get the easy possibility of handling O(1) branches, sub-branches,
and sub-sub-branches, etc....
* Computing the next NodeID to use for a NodeChange alteration is much
quicker.
(It's built up as you traverse the path space looking for NodeChange's
to create new revisions of.)
* svn log for branched items includes all of the history of the node
before the branch. Since the inline history revision data is preserved.
* The addition of created path to copy/branch points reduces poor svn
log's workload when told to figure out what it needs to do, but can't
easily output where a nested branch/copy came from.

Sounds like utter goodness to me. :)

Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed May 8 02:23:48 2002

This message: [ Message body ]
Next message: Philip Martin: "Re: svn commit: rev 1895 - trunk/subversion/clients/cmdline trunk/subversion/tests/clients/cmdline/svntest"
Previous message: Glenn A. Thompson: "Re: revision numbering (was: Re: Multiple projects in one repository!)"
Next in thread: Karl Fogel: "Re: Maintaining NodeID sanity"
Reply: Karl Fogel: "Re: Maintaining NodeID sanity"
Reply: Glenn A. Thompson: "Re: Maintaining NodeID sanity"
Maybe reply: cmpilato_at_collab.net: "Re: Maintaining NodeID sanity"
Maybe reply: cmpilato_at_collab.net: "Re: Maintaining NodeID sanity"
Maybe reply: cmpilato_at_collab.net: "Re: Maintaining NodeID sanity"
Maybe reply: cmpilato_at_collab.net: "Re: Maintaining NodeID sanity"
Maybe reply: cmpilato_at_collab.net: "Re: Maintaining NodeID sanity"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]