RE: Maintaining NodeID sanity
From: Bill Tutt <rassilon_at_lyra.org>
Date: 2002-05-08 02:59:38 CEST
I'm sorry for it being a bit dense; I wanted to get the idea out and
I'm afraid I'm not on the issues list atm, but I'll be adding myself to
Efficiently accessing ancestor lists:
Bill's reply:
Just because we're moving the data out of the PK doesn't mean we can't
Put another way: Blob's are bad in index keys (i.e. Primary key), but
For BDB you could always create a separate NodeChangeAnecstorBlob table
Karl was wondering about whether it's important or not for ChangeSet's
Bill's reply:
Bill
---- Do you want a dangerous fugitive staying in your flat? No. Well, don't upset him and he'll be a nice fugitive staying in your flat. > -----Original Message----- > From: Karl Fogel [mailto:kfogel@newton.ch.collab.net] > Sent: Tuesday, May 07, 2002 5:36 PM > To: Bill Tutt > Cc: Subversion Dev list > Subject: Re: Maintaining NodeID sanity > > I'm printing this out and reading it at home right now, Bill (it's, > uh, a bit dense, and so am I :-) ). I'm curious if you've been > watching my recent annotations to > > http://subversion.tigris.org/issues/show_bug.cgi?id=654 > > They describe some concerns I have about the new NodeID scheme. Not > showstoppers yet, afaict, but I'd like to know what you (and Brane, > and everyone else) think of them... > > Thanks, > -K > > "Bill Tutt" <billtut@microsoft.com> writes: > > Branko came up with a decent idea to restore NodeID sanity to the > > universe in the face of the weird side-effects of O(1) copies. > > > > The thinking goes something like this: > > > > We want to keep the NodeID the same for branches, but different for > > copies. > > > > We want to only change the NodeID on the destination of the copy. > > > > A copy is really almost a branch. Let's think about just branches for > > the moment. > > > > So, we need a way to distinguish a NodeChange besides just the normal > > (NodeID, ChangeSetID) data. > > > > Let's add a BranchID to the PK of a NodeChange so that you have (NodeID, > > BranchID, ChangeSetID) as the PK of NodeChange. > > > > So when you "branch <src> <dest>" the result of the branch is a new > > NodeChange that looks like this: > > (<src NodeID>, <some opaque BranchID type>, <branch operation's > > ChangeSetID>) > > > > Now all we need is to know exactly what a BranchID is. > > > > Branko's idea was to define a BranchID as: > > NodeChangePKOf(parentOf(<dest>)). > > > > The problem with that is that the PK contains BranchID which is now > > recursively defined. That's good for nested branches, but bad for a > > schema. > > > > One alternative is to add a unique column to NodeChange: NodeChangeID. > > NodeChangeID is a monotonically increasing integer (possibly a > > fixed-precision decimal considering that NodeChangeIDs will be used much > > more quickly than NodeIDs) that is unique for each NodeChange. > > > > NodeChangeID is not an additional part of the PK. It is an AK > > (alternative key). > > > > BranchID can now be defined to be: NodeChangeID of <dest>. > > > > When the user creates a branch of path "/foo" to "/bar" the following > > can now happen: > > Create a new NodeChange of ("/foo".NodeID, BranchID == > > this.NodeChangeID, ChangeSetID) and associate it with "/bar". This > > doesn't set any of the copy related properties, and only changes the > > BranchID. > > > > When you are committing new changes to things under /bar (or revision > > bumps of /bar), any new nodes will have the BranchID set to "/bar''s > > BranchID. > > > > The problem now becomes what I do when I have one of these complicated > > nested branch cases. > > > > e.g.: > > > > /foo/main/A > > /foo/alt/A > > > > where /foo/alt is a branch of /foo/main. > > > > I then: "branch /foo /bar" > > I then make a modification to "/bar/alt/A". > > > > What BranchID does A's NodeChange row now have? > > If we lazily flatten sub-branches then it could be "/bar"'s BranchID. > > If we lazily create sub-branches then it could be "/bar/alt/"'s > > BranchID. > > > > Flattening isn't terribly friendly to our users, so let's think about > > what we have to do if we lazily create our sub-branches. > > > > When examining /bar/alt we notice that the existing BranchID != /bar's > > BranchID. This may mean any number of things: > > * It might be a copy, since we haven't told you how to differentiate > > between a copy and a branch. If it existed inside of a copy, the > > BranchID is unimportant since if it needs to change to be /bar's > > BranchID. > > > > * It might be a branch, and if so, we need to create a new NodeChange > > for alt, containing a new BranchID. The origin of this branch is > > recorded as being /foo/alt. > > Once we have this new BranchID, we can then use it while creating the > > new NodeChange for /bar/alt/A. > > > > But, how do we detect that /bar/alt is a branch instead of a copy? (i.e. > > it might be a copy of a branch) > > > > Alt.BranchID points to /foo/alt's data. (The location of the original > > branch.) When I look that data up, I'll notice that the copy information > > is absent. This tells me I have a branch. If I had a copy, the copy > > properties would be filled in. > > > > The only additional annoyance is that copy/branch destination nodes > > don't record the path that they were created at. Given the DAGgy nature > > of our data store, it would certainly help if they included what their > > creation path was. > > > > i.e.: /bar was created at path /bar. > > > > To sum up: > > Add a BranchID to the PK of NodeChange. > > Add a NodeChangeID as an AK of NodeChange. (i.e. monotonically > > increasing integer that uniquely, but not usefully defines a NodeChange > > row) > > Use the NodeChangeID as the BranchID when creating copies/branches. > > Add a CreatedPath to NodeChange that is only used for copy/branch > > operations. > > > > Alternatively, you could get away without having NodeChangeID since it > > requires a separate BDB table to repeat the PK of NodeChange to point to > > the correct NodeChange row. > > > > If we do the above we get the following benefits: > > * NodeIDs stay the same for branches, but change for copies. > > * NodeIDs are attached to the path that they're created in. No more > > non-deterministic migrating of NodeID usages depending on which side of > > a copy got modified first. > > * We get the easy possibility of handling O(1) branches, sub-branches, > > and sub-sub-branches, etc.... > > * Computing the next NodeID to use for a NodeChange alteration is much > > quicker. > > (It's built up as you traverse the path space looking for NodeChange's > > to create new revisions of.) > > * svn log for branched items includes all of the history of the node > > before the branch. Since the inline history revision data is preserved. > > * The addition of created path to copy/branch points reduces poor svn > > log's workload when told to figure out what it needs to do, but can't > > easily output where a nested branch/copy came from. > > > > Sounds like utter goodness to me. :) > > > > Bill > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org > > For additional commands, e-mail: dev-help@subversion.tigris.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org For additional commands, e-mail: dev-help@subversion.tigris.orgReceived on Wed May 8 03:00:34 2002 |
This is an archived mail posted to the Subversion Dev mailing list.
This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.