[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Keeping NodeRevisionIDs from being unbounded

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-03-08 19:05:03 CET

Wow, Bill, great thinking! I'm sold, except for some minor comments,
see below:

"Bill Tutt" <rassilon@lyra.org> writes:
> NodeChange is identified by (NodeID, TransactionID)
> (NodeID, TransactionID) has a set of predecessors (NodeID,
> TransactionID)
> Ok, this handles our notion of ancestry, and a notion of being able to
> handle pending private trees. Now we need to correctly order
> NodeChanges.
>
> The transaction identified by TransactionID was commited as
> RepositoryVersionNumber.
>
> Node has a sequence of changes ordered by (NodeID,
> RepositoryVersionNumber)
> This implies that finding a Node’s predecessor in the sequence of
> NodeChanges does become slightly more expensive. But it does accomplish
> a couple of pretty cool things:
>
> * We only have to insert one row associating the TransactionID with the
> RepositoryVersionNumber during svn_fs_commit_txn. As opposed to setting
> the SOURCE-REV to RepositoryVersionNumber during stabalization.

Actually, I doubt that this stabilization walk is a great expense.
Currently, we already have to walk all those nodes to mark them as
immutable; but even more to the point, the expense of doing a commit
lies not in the final stabilizations, but in building the data into
the txn in the first place. Getting rid of the O(N) stabilization
walk won't help much in practice -- N is just the number of changed
nodes, and we're do other O(N) things whose constant factor is much
higher anyway.

Given that, I actually think we should keep the final stabilization
walk and use it to set each NodeChange to

  (NodeID, RepositoryVersionNumber)

that is, just replace all the TransactionIDs with the new revision
number. Sure, this is no more "theoretically" correct than using the
TransactionID, but it's more comprehensible to anyone who has to debug
the filesystem. It's natural for the "Change" component of
"NodeChange" to be, well, the revision in which the change happened.
[I understand that there is no need in any case to permanently record
a mapping from one to the other -- just examine the Changed portion of
the revision's root NodeChanged :-) ].

So in as-yet-uncommitted NodeChanges, maybe the Change portion should
be the TransactionID, and part of stabilization would be to change
those to the new revision number?

But: in discussion with Greg Stein, Mike, and Ben just now, we
discovered that we won't need the stabilization walk anymore, because
we won't need explicit mutability flags. Instead, we can just compare
our txn id with the one on the node in question -- if they're the
same, we should consider this mutable, otherwise, the node is to be
treated as immutable (either it's committed, or it's mutable only for
some *other* txn). Heh!

Hmm, given that, I'm not sure that walking to convert TxnID to RevNum
is worthwhile, but would like your thoughts on this...

> * The svn log problem becomes much simpler

Can you describe this in more detail?

> * NodeChanges no longer have an unbounded PK which makes transitioning
> to SQL later much easier.

Amen -- huge win. :-)

> * I don't know whether you noticed in the above discussion that these
> changes would allow you to rename Transaction to Changeset, and that now
> finding the list of files modified in a given RepositoryVersionNumber
> bump is much easier.

:-) Another huge win.

> So, how badly does this affect the current predecessor and ancestor
> finding algorithms?
> * It really doesn’t necessarily alter the cost of common ancestor
> finding.
> NodeChange’s set of predecessors (in terms of just the direct line based
> on NodeID) could be determined by a string blob of successive
> TransactionIDs. (e.g. 1.2.3.4.5.6 would mean that this Node changed in
> Transactions 1-6) This takes care of the common immediate predecessor
> set has a size of 1 case. (i.e. not from a merge) Merges still could be
> recorded as they are now as additional chunks of data on a NodeChange.
> (Note: svn currently needs to fix more spots in the svn fs code to use
> svn_fs_check_related for common ancestor checking)

One cost that does change is finding the immediate predecessor of a
given node revision. Currently, it is usually constant time and
requires no DB access -- you can do it *entirely* by looking at the
NodeID, and when you can't, you can tell that by inspection instantly.

In the new system (whether we use the TransactionID or RevisionID),
we'll have to record ancestry explicitly. Frankly, I think this might
be an advantage -- it opens the door to storing multiple ancestry
(i.e., merge information) in a natural way.

> So to sum up, we gain Changeset tracking, reduce the cost of committing
> a SVN Transaction, preserve our current cost of ancestor tracking, and
> make the SVN data model more amenable to being represented by a SQL
> Store.

Me like. :-)

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Mar 8 18:54:37 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.