"Bill Tutt" <rassilon@lyra.org> writes:
> Efficiently accessing ancestor lists:
> Karl's concerned with the change from NodeRevisionID to (NodeID,
> ChangeSetID) because it removes the ancestry information from the row.
> 
> Bill's reply:
> This is true. I think you still want to store the entire predecessor
> list inline using some kind of (hopefully server side merge history
> recording friendly) string format. Here's an example format that allows
> server side merge history recording. 
> E.g: in a Python like way:
> # Ancestor history list: 
>   [
> # Initial revision NodeChangePK info:
>     [(NodeID1, ChangeSet1)],
> # Incremental change of this Node:
>     [(NodeID1, ChangeSet2)],
> # A merge from a different file in the repository (again, just the 
> #   NodeChangePK facts here):
>     [(NodeID2, ChangeSet2)],
> # A star merge from two separate files in the repository:
>     [(NodeID2, ChangeSet3), (NodeID3, ChangeSet4)]
>   ]
> 
> Just because we're moving the data out of the PK doesn't mean we can't
> keep the data around in the row. We just don't want to use it in the PK.
> :)
Sure... But the common case so far has been successive single-ancestor
changes to each node (where the single "ancestor" is the predecessor,
of course), and we've been able to record that in a highly compressed
way by using place-value notation: When you see that your ID is
`X.some_integer', you know to count nodes X.1 through
X.(some_integer-1) as ancestors.  This trick doesn't work when the
node id is `X.unique_txn_id'.
So my problem isn't that the information is gone from the primary key;
we can still store it in the row, that's fine.  The question is,
*what* are we storing in the row?  If the representation of the
ancestry line grows linearly with the ancestry line itself, I submit
that we have a problem.  Subversion is supposed to scale :-), not just
in data size, but number of changes to a given node.
> Put another way: Blob's are bad in index keys (i.e. Primary key), but
> are perfectly ok as columns in your table.
> 
> For BDB you could always create a separate NodeChangeAnecstorBlob table
> if it's deemed that it makes more perf sense that way.
This is the assertion I need to be comforted about... :-)
Blobs are okay unless you're searching in them often for information.
If we have to constantly load and/or scan a whole blob to find out
whether node FOO is an ancestor of BAR or not, that seems like a bit
of downgrade from the current system.
Am I worrying about nothing?  Look in the filesystem at where we do
ancestry and predecessor checks... I just don't see how it can
possibly be efficient enough to use this ancestry mechanism :-(.
> Karl was wondering about whether it's important or not for ChangeSet's
> to be incrementally ordered on a NodeChange's versioning timeline. i.e.
> point in time 1 could have a ChangeSetID that's larger than the
> NodeChange of the Node at point in time 2.
> 
> Bill's reply:
> Yes this is possible to have, but it doesn't cause us any problems,
> since an ordering still exists. You just have to join to the Repository
> Revision Number -> ChangeSetID data. It imposes a little extra overhead,
> but in general shouldn't be that nasty. Especially since repository
> point in times -> ChangeSetID mappings are completely non-alterable,
> ever... (i.e. they're easily cacheable)
Agreed (I think Mike Pilato and I came to the same conclusion today).
-K
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed May  8 03:25:54 2002