[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Whither delta combiner

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-02-11 19:55:06 CET

Greg Hudson <ghudson@MIT.EDU> writes:
> I'm sorry, I was thinking about the wrong set of operations. We don't
> need skiplist deltas, and we don't need to change the repository format,
> and this should be an easy job. We go with Branko's "re-deltifying
> against the head" idea, but instead of doing it at constant intervals,
> we do it at exponential intervals. (I don't know which Branko had in
> mind.) This gives us all the performance advantages I cited at less of
> a space penalty. I'll call this idea "skip-deltas."

Gack, yes, my goodness, +1 and all that. In fact I think this is what
I was half-consciously imagining when you were first describing
skip-lists (hence I misunderstood the explanation). Thank you for
seeing it clearly.

> Right now, when we commit node-revision 100.N, we commit it as plaintext
> and deltify 100.(N-1) to point to it. We'll do the same thing except
> we'll potentially deltify some more node-revisions: 100.(N-2),
> 100.(N-4), and so on. The number of additional node-revisions we modify
> is the number of 0s at the end of the binary representation of N
> (assuming we number N starting from 0; if we number starting from 1, use
> N-1). Following this rule, a chain of nine node-revisions should look
> like:
>
> 0 --------------------------------------------> 8
> 4 --------------------> 8
> 2 --------> 4 6 --------> 8
> 1 --> 2 3---> 4 5 --> 6 7 --> 8
>
> It takes three deltas to get from node-rev 1 to node-rev 8, which is the
> logarithmic behavior we were going for. We still suffer a space
> penalty, because those long arrows are generally more expensive than
> short ones should be, but it's much simpler than dealing with full
> skiplists. (And the space penalty is only a factor of 2, I think, since
> the average arrow length is 2.)

Yesssssssssss.

> Some notes on node branches
> ---------------------------
>
> I'm not sure we're doing a good job of dealing with branches right now.
> Based on the code Daniel was modifying, I think when we branch we are
> blindly re-deltifying the predecessor node to point into the branch.
> Which works, but doesn't necessarily yield good performance. What RCS
> does looks like:
>
> 1.1 --> 1.2 --> 1.3 --> 1-4 --> 1.5
> 1.3 <-- 1.3.1.1 <-- 1.3.1.2 --< 1.3.1.3
>
> Switching to backwards-branch behavior is pretty easy with or without
> skip-deltas. In the current world, if we noted that we were on a
> branch, then instead of deltifying the predecessor node to point at us,
> we would deltify ourself to point at the predecessor node. With
> skip-deltas, instead of deltifying a bunch of predecessors to point at
> us, we deltify ourself to point at an ancestor node some distance back,
> the distance being 2 to the power of the number of 0s at the end of the
> binary representation of the last component of our ID.

Hmmm, are you sure this isn't confusing "branch nodes" with "branches
as in lines-of-development"?

The RCS algorithm is about lines of development. We handle those by
doing an fs copy, and in an fs copy, a new node id is created,
unrelated to the node id of the copy source. There is copy-history
metadata stored in the node's header that allows you to know the
source, but there's no way from looking at node IDs to tell that a
copy has happened.

Therefore, the first node in a copied line is stored fulltext, and
thereafter the node behaves like any other: subsequent revisions are
initialized fulltext, and previous ones get deltified against it. I
guess this leaves us with more fulltexts than strictly necessary, but
it doesn't seem like a big penalty, as it happens at most once per
branch, and then only for branched files, not branched directories.

[Or am I missing something here?]

The only time both a branch id and the id it branched off can survive
in the repository is when the node is a directory, I believe; and
that's no longer relevant for deltification. By the way, see

  http://subversion.tigris.org/issues/show_bug.cgi?id=573

for more on branch IDs, though peripheral to deltification concerns.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:06 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.