On 8 Feb 2002, Karl Fogel wrote:
> Hmmm. Okay, so skiplists are a normal space-time tradeoff?
Yes, you can actually use deterministic skip lists to get guaranteed
bounds (skip lists normally refer to the non-deterministic version).
> We store
> more delta data, but we reach older revisions faster.
>
> This sounds similar to a couple of other ideas that have been floating
> around:
>
> 1. "Checkpoint fulltexts" -- every N node revisions, store a
> fulltext instead of a delta, so that the maximum number of
> undeltifications to reach any given fulltext is never greater
> than N. Various change-sensitive heuristics can be used to set
> N; it doesn't have to be the same for every node, or even within
> every node. (I think I first pushed for this, primarily because
> I like saying "checkpoint fulltext" over and over and waiting
> for someone to ask what I mean by it.)
>
> 2. Redeltify old node revisions against head, as they get further
> from head. (I think Branko first proposed this, probably
> because he got sick of me saying "checkpoint fulltext" over and
> over.)
>
> Both proposals trade delta space for faster retrieval of old
> revisions; haven't done a formal proof, but I'm pretty sure both
> methods end up being exactly the same tradeoff, and I think it's about
> the same as the skiplist tradeoff, too. (I haven't thought as much
> about the skiplists, so this is just an off-the-cuff reaction.)
>
> I don't see any particular reason to prefer one method over the
> others, as far as performance goes, but checkpoint fulltexts (did I
> mention how much I *love* saying that?) are probably the most natural
> to implement. Easier to simply not make a delta, than to make more
> deltas :-), and undeltification is the same algorithm as we currently
> use -- namely, keep going until you hit a fulltext.
Actually, skiplist deltas would probably do better for you, because you
get guarantees, and it's much easier to implement. Why? The heuristics are
tricky otherwise, and your entire performance depends on them. So it's
either test a billion different scenarios to see if we have good
heuristics, or use something we already know the properties of, and are
guaranteed certain characteristics.
One thing we should also be doing is fulltext compression. Rather than
give up, or store full fulltexts, we could at least delta them against an
empty string.
Since vdelta is based off lempel-ziv, it'll give you some compression
(better than compress, worse than gzip, according to the paper).
Right now, we just leave it completely fulltext.
--Dan
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:05 2006