> The practical question then is, how would we change the repository
> format to store skiplist deltas?
I'm sorry, I was thinking about the wrong set of operations. We don't
need skiplist deltas, and we don't need to change the repository format,
and this should be an easy job. We go with Branko's "re-deltifying
against the head" idea, but instead of doing it at constant intervals,
we do it at exponential intervals. (I don't know which Branko had in
mind.) This gives us all the performance advantages I cited at less of
a space penalty. I'll call this idea "skip-deltas."
Right now, when we commit node-revision 100.N, we commit it as plaintext
and deltify 100.(N-1) to point to it. We'll do the same thing except
we'll potentially deltify some more node-revisions: 100.(N-2),
100.(N-4), and so on. The number of additional node-revisions we modify
is the number of 0s at the end of the binary representation of N
(assuming we number N starting from 0; if we number starting from 1, use
N-1). Following this rule, a chain of nine node-revisions should look
like:
0 --------------------------------------------> 8
4 --------------------> 8
2 --------> 4 6 --------> 8
1 --> 2 3---> 4 5 --> 6 7 --> 8
It takes three deltas to get from node-rev 1 to node-rev 8, which is the
logarithmic behavior we were going for. We still suffer a space
penalty, because those long arrows are generally more expensive than
short ones should be, but it's much simpler than dealing with full
skiplists. (And the space penalty is only a factor of 2, I think, since
the average arrow length is 2.)
Some notes on node branches
---------------------------
I'm not sure we're doing a good job of dealing with branches right now.
Based on the code Daniel was modifying, I think when we branch we are
blindly re-deltifying the predecessor node to point into the branch.
Which works, but doesn't necessarily yield good performance. What RCS
does looks like:
1.1 --> 1.2 --> 1.3 --> 1-4 --> 1.5
1.3 <-- 1.3.1.1 <-- 1.3.1.2 --< 1.3.1.3
Switching to backwards-branch behavior is pretty easy with or without
skip-deltas. In the current world, if we noted that we were on a
branch, then instead of deltifying the predecessor node to point at us,
we would deltify ourself to point at the predecessor node. With
skip-deltas, instead of deltifying a bunch of predecessors to point at
us, we deltify ourself to point at an ancestor node some distance back,
the distance being 2 to the power of the number of 0s at the end of the
binary representation of the last component of our ID.
Unfortunately, there is a performance pitfall with backwards-branch
behavior in Subversion, because the "trunk" of a node is determined
simply by whoever modifies it first. If I copy a directory, modify a
few files in the copy, and never touch them again but instead continue
to modify them heavily in the original, then those files' plaintexts
will get further and further away from the "real" head revision. Not
sure what to do about that. In the average case we'd probably still do
better than the current behavior.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:06 2006