> From: Greg Hudson [mailto:ghudson@MIT.EDU]
> On 8 Mar 2002, Karl Fogel wrote:
> > I think it won't be necessary to do that. If we have to ask for
> > predecessors at all, the number of askings is going to be proportional
> > to the size of the change, and having to do a number of DB accesses
> > proportional to the size of the change is no big deal -- I mean, this
> > is a *commit*, after all, we just sent the data over the network.
> I'm a little nervous about the implications for skip-deltas, assuming we
> ever do them (and I hope we do). To do the re-deltifications on a commit,
> we need to walk back some number of predecessors from the current
> node-revision ID--only two on average, but sometimes a whole bunch. In
> particular, at each power of two we re-deltify the oldest revision against
> the head, requiring a complete walk of the predecessors back to the origin
> of the node.
> For this kind of operation, it would be most ideal to have random access
> to the node-revision ID list for a node, e.g. by forcing the
> node-revisions to be 100.1, 100.2, etc. Forcing that does require a
> "stabilization walk" at the end of a commit which we'd like to get rid of.
> I guess that could argue for Branko's opportunistic re-deltification
> approach, but I don't really like how that approach only yields good
> performance the second time you access an old revision.
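To make Greg's cost argument concrete, here is a rough Python sketch (not Subversion code). It assumes the plain skip-list rule: a new node-revision with n predecessors re-deltifies the predecessors 1, 2, 4, ... steps back for as long as n is divisible by the step. It also assumes a hypothetical simplified ID form "node.N", like the 100.1, 100.2 example, so the predecessor k steps back can be named without reading anything, whereas following a linked predecessor chain costs one row read per step. The names `redeltify_distances`, `walk_chain`, and `jump_by_id` are made up for illustration.

```python
def redeltify_distances(pred_count: int) -> list[int]:
    """Steps back from a new node-revision whose predecessors get re-deltified.

    Assumes the plain skip-list rule: step d (1, 2, 4, ...) is used while
    pred_count is divisible by d and d does not reach past the origin.
    """
    distances = []
    d = 1
    while d <= pred_count and pred_count % d == 0:
        distances.append(d)
        d *= 2
    return distances


def walk_chain(pred_of: dict[str, str], node_rev_id: str, steps: int) -> tuple[str, int]:
    """Follow predecessor links one row at a time; returns (id, rows read)."""
    reads = 0
    current = node_rev_id
    for _ in range(steps):
        current = pred_of[current]  # one DB access per step
        reads += 1
    return current, reads


def jump_by_id(node_rev_id: str, steps: int) -> tuple[str, int]:
    """With hypothetical 'node.N' IDs, compute the ancestor directly: no reads."""
    node, n = node_rev_id.split(".")
    return f"{node}.{int(n) - steps}", 0


# Chain 100.1 <- 100.2 <- ... <- 100.9, so 100.9 has 8 predecessors.
pred_of = {f"100.{i}": f"100.{i - 1}" for i in range(2, 10)}
head = "100.9"
deepest = max(redeltify_distances(8))      # 8 is a power of two: reach the origin
print(walk_chain(pred_of, head, deepest))  # ('100.1', 8)
print(jump_by_id(head, deepest))           # ('100.1', 0)

# Average number of re-deltifications per commit over 1024 commits.
avg = sum(len(redeltify_distances(n)) for n in range(1, 1025)) / 1024
print(avg)  # 1.9990234375, i.e. about two
```

Under this rule the average stays near two, matching "only two on average", but at every power of two the deepest step reaches the origin, which is exactly the case where addressing predecessors directly by ID avoids the walk.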
In a follow-on email to my original note, I mentioned that the only
extra cost involved in finding the set of predecessors of a NodeChange
(aka node-revision) is loading the NodeChange row, parsing just the
in-row predecessor string out of its skel, and then doing whatever you
need to do with that information. An unbounded column is much saner
than an unbounded PK. That argument is specific to SQL stores, but BDB
would probably run into trouble with such unbounded PKs eventually too.
The proposal in no way drastically increases the number of database
accesses it takes to find the ancestry set (i.e. it does not make that
number dependent on the number of ancestors). It goes from 0 accesses
given a NodeRevisionID to 1: loading the NodeChange row.
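A rough Python sketch of the lookup the proposal implies, under loudly stated assumptions: the NodeChange table is modeled as a dict from a bounded integer key to a serialized row, and the row layout (`preds=...;text=...`) is invented here as a stand-in for the real skel. `CountingStore`, `parse_predecessors`, and `ancestry` are hypothetical names. The point it shows is that the whole ancestry set costs one row load no matter how long the chain is, because the unbounded part lives in a column rather than in the key.

```python
class CountingStore:
    """Hypothetical NodeChange table: bounded keys -> serialized rows.

    Counts row loads so the access cost of an ancestry lookup is visible.
    """

    def __init__(self, rows: dict[int, str]):
        self._rows = rows
        self.loads = 0

    def load(self, key: int) -> str:
        self.loads += 1
        return self._rows[key]


def parse_predecessors(row: str) -> list[int]:
    """Pull only the predecessor field out of a row like 'preds=12,7,3;text=...'.

    The row layout is invented for this sketch; a real store would parse its skel.
    """
    for field in row.split(";"):
        name, _, value = field.partition("=")
        if name == "preds":
            return [int(k) for k in value.split(",") if k]
    return []


def ancestry(store: CountingStore, key: int) -> list[int]:
    """Whole predecessor set with a single row load, however long the chain."""
    return parse_predecessors(store.load(key))


store = CountingStore({
    3: "preds=;text=...",
    7: "preds=3;text=...",
    12: "preds=7,3;text=...",
})
print(ancestry(store, 12), store.loads)  # [7, 3] 1
```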
Things would of course be entirely different if I were suggesting storing
ancestry information only in normalized form, but I'm not. I
wouldn't mind having a normalized store as well, though.
In a separate email thread, Greg Stein mentioned that he wanted to move
the deltification pass out of the actual svn transaction commit path
(i.e. outside of the BDB transaction that we currently use for
committing our SVN transaction). I think Greg suggested doing this
asynchronously after the svn transaction commit succeeds. Doing the
deltification in a separate BDB transaction (or indeed several) will
drastically reduce the lock usage that comes from possibly touching so
many more rows. Especially since BDB doesn't seem to be too intelligent about
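A minimal sketch of the batching idea in Python, under stated assumptions: `FakeDB`, `txn()`, `batches`, and `deltify_after_commit` are hypothetical names, not Subversion or Berkeley DB APIs, and a "lock" is modeled as nothing more than a list entry per touched row. It shows that once deltification runs after the commit succeeds and is split across several small transactions, the number of locks any single transaction holds is bounded by the batch size rather than by the total number of rows touched.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class FakeDB:
    """Hypothetical stand-in for a transactional store.

    Records which rows each transaction touches, i.e. would hold locks on.
    """

    def __init__(self) -> None:
        self.max_locks_in_txn = 0
        self.deltified: list[str] = []

    @contextmanager
    def txn(self) -> Iterator[list[str]]:
        locked: list[str] = []  # rows locked by this transaction
        yield locked
        # On normal exit ("commit"), note how many locks this txn needed.
        self.max_locks_in_txn = max(self.max_locks_in_txn, len(locked))


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def deltify_after_commit(db: FakeDB, node_revs: list[str], batch_size: int) -> None:
    """Meant to run after the svn commit has succeeded; one txn per batch."""
    for batch in batches(node_revs, batch_size):
        with db.txn() as locked:
            for node_rev in batch:
                locked.append(node_rev)        # touching the row locks it
                db.deltified.append(node_rev)  # placeholder for the real delta work


db = FakeDB()
revs = [f"100.{i}" for i in range(1, 251)]
deltify_after_commit(db, revs, batch_size=50)
print(len(db.deltified), db.max_locks_in_txn)  # 250 50
```

The batch size is the knob: smaller batches hold fewer locks at once at the cost of more transaction overhead, and because the pass runs after the commit, a failed batch would leave some revisions undeltified rather than failing the commit itself.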
Does this make sense?
Received on Sat Mar 9 22:54:42 2002