[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Maintaining NodeID sanity

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-05-08 03:25:43 CEST

"Bill Tutt" <rassilon@lyra.org> writes:
> Efficiently accessing ancestor lists:
> Karl's concerned with the change from NodeRevisionID to (NodeID,
> ChangeSetID) because it removes the ancestry information from the row.
>
> Bill's reply:
> This is true. I think you still want to store the entire predecessor
> list inline using some kind of (hopefully server side merge history
> recording friendly) string format. Here's an example format that allows
> server side merge history recording.
> E.g: in a Python like way:
> # Ancestor history list:
> [
> # Initial revision NodeChangePK info:
> [(NodeID1, ChangeSet1)],
> # Incremental change of this Node:
> [(NodeID1, ChangeSet2)],
> # A merge from a different file in the repository (again, just the
> # NodeChangePK facts here):
> [(NodeID2, ChangeSet2)],
> # A star merge from two separate files in the repository:
> [(NodeID2, ChangeSet3), (NodeID3, ChangeSet4)]
> ]
>
> Just because we're moving the data out of the PK doesn't mean we can't
> keep the data around in the row. We just don't want to use it in the PK.
> :)

Sure... But the common case so far has been successive single-ancestor
changes to each node (where the single "ancestor" is the predecessor,
of course), and we've been able to record that in a highly compressed
way by using place-value notation: When you see that your ID is
`X.some_integer', you know to count nodes X.1 through
X.(some_integer-1) as ancestors. This trick doesn't work when the
node id is `X.unique_txn_id'.

So my problem isn't that the information is gone from the primary key;
we can still store it in the row, that's fine. The question is,
*what* are we storing in the row? If the representation of the
ancestry line grows linearly with the ancestry line itself, I submit
that we have a problem. Subversion is supposed to scale :-), not just
in data size, but number of changes to a given node.

> Put another way: Blob's are bad in index keys (i.e. Primary key), but
> are perfectly ok as columns in your table.
>
> For BDB you could always create a separate NodeChangeAnecstorBlob table
> if it's deemed that it makes more perf sense that way.

This is the assertion I need to be comforted about... :-)

Blobs are okay unless you're searching in them often for information.
If we have to constantly load and/or scan a whole blob to find out
whether node FOO is an ancestor of BAR or not, that seems like a bit
of downgrade from the current system.

Am I worrying about nothing? Look in the filesystem at where we do
ancestry and predecessor checks... I just don't see how it can
possibly be efficient enough to use this ancestry mechanism :-(.

> Karl was wondering about whether it's important or not for ChangeSet's
> to be incrementally ordered on a NodeChange's versioning timeline. i.e.
> point in time 1 could have a ChangeSetID that's larger than the
> NodeChange of the Node at point in time 2.
>
> Bill's reply:
> Yes this is possible to have, but it doesn't cause us any problems,
> since an ordering still exists. You just have to join to the Repository
> Revision Number -> ChangeSetID data. It imposes a little extra overhead,
> but in general shouldn't be that nasty. Especially since repository
> point in times -> ChangeSetID mappings are completely non-alterable,
> ever... (i.e. they're easily cacheable)

Agreed (I think Mike Pilato and I came to the same conclusion today).

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed May 8 03:25:54 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.