Re: Maintaining NodeID sanity

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-05-08 02:35:59 CEST

I'm printing this out and reading it at home right now, Bill (it's,
uh, a bit dense, and so am I :-) ). I'm curious if you've been
watching my recent annotations to

http://subversion.tigris.org/issues/show_bug.cgi?id=654

They describe some concerns I have about the new NodeID scheme. Not
showstoppers yet, afaict, but I'd like to know what you (and Brane,
and everyone else) think of them...

Thanks,
-K

"Bill Tutt" <billtut@microsoft.com> writes:
> Branko came up with a decent idea to restore NodeID sanity to the
> universe in the face of the weird side-effects of O(1) copies.
>
> The thinking goes something like this:
>
> We want to keep the NodeID the same for branches, but different for
> copies.
>
> We want to only change the NodeID on the destination of the copy.
>
> A copy is really almost a branch. Let's think about just branches for
> the moment.
>
> So, we need a way to distinguish a NodeChange besides just the normal
> (NodeID, ChangeSetID) data.
>
> Let's add a BranchID to the PK of a NodeChange so that you have (NodeID,
> BranchID, ChangeSetID) as the PK of NodeChange.
>
> So when you "branch <src> <dest>" the result of the branch is a new
> NodeChange that looks like this:
> (<src NodeID>, <some opaque BranchID type>, <branch operation's
> ChangeSetID>)
>
> Now all we need is to know exactly what a BranchID is.
>
> Branko's idea was to define a BranchID as:
> NodeChangePKOf(parentOf(<dest>)).
>
> The problem with that is that the PK contains BranchID which is now
> recursively defined. That's good for nested branches, but bad for a
> schema.
>
> One alternative is to add a unique column to NodeChange: NodeChangeID.
> NodeChangeID is a monotonically increasing integer (possibly a
> fixed-precision decimal considering that NodeChangeIDs will be used much
> more quickly than NodeIDs) that is unique for each NodeChange.
>
> NodeChangeID is not an additional part of the PK. It is an AK
> (alternative key).
>
> BranchID can now be defined to be: NodeChangeID of <dest>.
>
> When the user creates a branch of path "/foo" to "/bar" the following
> can now happen:
> Create a new NodeChange of ("/foo".NodeID, BranchID ==
> this.NodeChangeID, ChangeSetID) and associate it with "/bar". This
> doesn't set any of the copy related properties, and only changes the
> BranchID.
>
> When you are committing new changes to things under /bar (or revision
> bumps of /bar), any new nodes will have the BranchID set to "/bar''s
> BranchID.
>
> The problem now becomes what I do when I have one of these complicated
> nested branch cases.
>
> e.g.:
>
> /foo/main/A
> /foo/alt/A
>
> where /foo/alt is a branch of /foo/main.
>
> I then: "branch /foo /bar"
> I then make a modification to "/bar/alt/A".
>
> What BranchID does A's NodeChange row now have?
> If we lazily flatten sub-branches then it could be "/bar"'s BranchID.
> If we lazily create sub-branches then it could be "/bar/alt/"'s
> BranchID.
>
> Flattening isn't terribly friendly to our users, so let's think about
> what we have to do if we lazily create our sub-branches.
>
> When examining /bar/alt we notice that the existing BranchID != /bar's
> BranchID. This may mean any number of things:
> * It might be a copy, since we haven't told you how to differentiate
> between a copy and a branch. If it existed inside of a copy, the
> BranchID is unimportant since if it needs to change to be /bar's
> BranchID.
>
> * It might be a branch, and if so, we need to create a new NodeChange
> for alt, containing a new BranchID. The origin of this branch is
> recorded as being /foo/alt.
> Once we have this new BranchID, we can then use it while creating the
> new NodeChange for /bar/alt/A.
>
> But, how do we detect that /bar/alt is a branch instead of a copy? (i.e.
> it might be a copy of a branch)
>
> Alt.BranchID points to /foo/alt's data. (The location of the original
> branch.) When I look that data up, I'll notice that the copy information
> is absent. This tells me I have a branch. If I had a copy, the copy
> properties would be filled in.
>
> The only additional annoyance is that copy/branch destination nodes
> don't record the path that they were created at. Given the DAGgy nature
> of our data store, it would certainly help if they included what their
> creation path was.
>
> i.e.: /bar was created at path /bar.
>
> To sum up:
> Add a BranchID to the PK of NodeChange.
> Add a NodeChangeID as an AK of NodeChange. (i.e. monotonically
> increasing integer that uniquely, but not usefully defines a NodeChange
> row)
> Use the NodeChangeID as the BranchID when creating copies/branches.
> Add a CreatedPath to NodeChange that is only used for copy/branch
> operations.
>
> Alternatively, you could get away without having NodeChangeID since it
> requires a separate BDB table to repeat the PK of NodeChange to point to
> the correct NodeChange row.
>
> If we do the above we get the following benefits:
> * NodeIDs stay the same for branches, but change for copies.
> * NodeIDs are attached to the path that they're created in. No more
> non-deterministic migrating of NodeID usages depending on which side of
> a copy got modified first.
> * We get the easy possibility of handling O(1) branches, sub-branches,
> and sub-sub-branches, etc....
> * Computing the next NodeID to use for a NodeChange alteration is much
> quicker.
> (It's built up as you traverse the path space looking for NodeChange's
> to create new revisions of.)
> * svn log for branched items includes all of the history of the node
> before the branch. Since the inline history revision data is preserved.
> * The addition of created path to copy/branch points reduces poor svn
> log's workload when told to figure out what it needs to do, but can't
> easily output where a nested branch/copy came from.
>
> Sounds like utter goodness to me. :)
>
> Bill
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed May 8 02:36:01 2002

This message: [ Message body ]
Next message: Glenn A. Thompson: "Re: Maintaining NodeID sanity"
Previous message: Philip Martin: "Re: svn commit: rev 1895 - trunk/subversion/clients/cmdline trunk/subversion/tests/clients/cmdline/svntest"
In reply to: Bill Tutt: "Maintaining NodeID sanity"
Next in thread: Bill Tutt: "RE: Maintaining NodeID sanity"
Reply: Bill Tutt: "RE: Maintaining NodeID sanity"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]