While doing the data model for SVN I ran into fun issue #573:
(NodeRevisionIDs have unbounded length)
The thinking goes something like this:
A Node is the concept of a chunk of data independent to its point in
time.
A Node has an ordered sequence of changes. Lets call this a NodeChange.
NodeChanges need to exist in the public directory DAG, and as members of
the private pending transaction directory DAG.
NodeChanges are identified by: (NodeID, SomeOrderingID)
SomeOrderingID just can't be the ordering of the NodeChange in the
history of the Node. Otherwise, we’d have to update each NodeChange row
during svn_commit_txn. This is bad. Let's see if we can get away with
only having to insert one row when we commit an SVN transaction.
Lets see what happens if we use the current SVN TransactionID as our
SomeOrderingID.
NodeChange is identified by (NodeID, TransactionID)
(NodeID, TransactionID) has a set of predecessors (NodeID,
TransactionID)
Ok, this handles our notion of ancestry, and a notion of being able to
handle pending private trees. Now we need to correctly order
NodeChanges.
The transaction identified by TransactionID was commited as
RepositoryVersionNumber.
Node has a sequence of changes ordered by (NodeID,
RepositoryVersionNumber)
This implies that finding a Node’s predecessor in the sequence of
NodeChanges does become slightly more expensive. But it does accomplish
a couple of pretty cool things:
* We only have to insert one row associating the TransactionID with the
RepositoryVersionNumber during svn_fs_commit_txn. As opposed to setting
the SOURCE-REV to RepositoryVersionNumber during stabalization.
* The svn log problem becomes much simpler
* NodeChanges no longer have an unbounded PK which makes transitioning
to SQL later much easier.
* I don't know whether you noticed in the above discussion that these
changes would allow you to rename Transaction to Changeset, and that now
finding the list of files modified in a given RepositoryVersionNumber
bump is much easier.
So, how badly does this affect the current predecessor and ancestor
finding algorithms?
* It really doesn’t necessarily alter the cost of common ancestor
finding.
NodeChange’s set of predecessors (in terms of just the direct line based
on NodeID) could be determined by a string blob of successive
TransactionIDs. (e.g. 1.2.3.4.5.6 would mean that this Node changed in
Transactions 1-6) This takes care of the common immediate predecessor
set has a size of 1 case. (i.e. not from a merge) Merges still could be
recorded as they are now as additional chunks of data on a NodeChange.
(Note: svn currently needs to fix more spots in the svn fs code to use
svn_fs_check_related for common ancestor checking)
So to sum up, we gain Changeset tracking, reduce the cost of committing
a SVN Transaction, preserve our current cost of ancestor tracking, and
make the SVN data model more amenable to being represented by a SQL
Store.
Thoughts?
Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Mar 7 17:47:30 2002