[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Whither delta combiner

From: Daniel Berlin <dan_at_dberlin.org>
Date: 2002-02-09 23:25:59 CET

On Sat, 9 Feb 2002, Branko [UTF-8] ÄŒibej wrote:

> Daniel Berlin wrote:
>
> >>>>Not at all. Vdelta itself compresses. That has nothing to do with the
> >>>>encoding of the block-copy instructions.
> >>>>
> >>>Yes,yes, i remembered it runs over the target too.
> >>>
> >>>But turning everything into a delta (fulltexts = delta against empty
> >>>string) would mean we'd never need to worry
> >>>about fulltexts. If fulltexts simply became deltas against the empty
> >>>string, then everything becomes a delta combination problem.
> >>>
> >>>We'd effectively just make sure delta chains don't become too long for a
> >>>given revision, by combining deltas until the chain of deltas necessary
> >>>for a given revision was < N or something.
> >>>
> >>>Not sure if that's better or worse.
> >>>
> >>Oh, runnung a compress-only vdelta at checkpoints would definitely be
> >>better than storing fulltexts.
> >>
> >Here's the thing though, you'd never need to run a compress-only vdelta.
> >
> >If the original is stored as a compress-only vdelta, then checkpoints are
> >just combined delta of all previous deltas.
> >The combined delta would have all the previous deltas, including the
> >delta against the empty string, and thus, be able to regen from the empty string.
> >
> Sorry, you lost me here. I'm stupid tonight. Could you explain, in
> simple terms even I can understand, how this would wotrk?

Sure, let me explain how you can do it with fulltext first, then it's a
simple step to remove the fulltext.

The model below might not exactly match reality in subversion, it's just
for modeling purposes.

The first revision is a fulltext, as usual.
At a given checkpoint, rather than store the fulltext, we store a
combined delta consisting of all the deltas before us, back to the full
text.

Given:

ft -> d1 -> d2 -> d3 -> d4 -> d5 -> d6 -> d7 -> d8

We want to make d4 into a checkpoint.

Storing a fulltext at a checkpoint would give you:

ft -> d1 -> d2 -> d3 -> ft2 (ft<-d1<-d2<-d3<-d4) -> d5 -> d6 -> d7 -> d8

To get the fulltext of the file as it is after delta d8, you'd
grab d8+d7+d6+d5, combine them, and apply it to ft2.

To get ft2, we need to apply 4 deltas to the fulltext, then store that
fulltext.

Storing a combined delta at a checkpoint:

ft -> d1 -> d2 -> d3 -> newd4 (d1+d2+d3+d4) -> d5 -> d6 -> d7 -> d8

To get the fulltext of the file as it is after delta d8, you'd
grab newd4+d5+d6+d7+d8, combine them, and apply it to the original fulltext.

To get newd4, we need not apply any deltas, we just combine 4 deltas, and
store that.

This takes less space (both needed size during processing and
diskstorage), and is faster, assuming delta combination is faster.

However, this still needs the original fulltext to be able to generate a
later fulltext. It just lets you skip applying d3,d2,d1 along the way.

Of course, the fulltext isn't necessary, and removing it makes life a bit
easier.

Storing a combined delta at a checkpoint, with a delta against empty as
the starting point

New original:
d0 -> d1 -> d2 -> d3 -> d4 -> d5 -> d6 -> d7 -> d8

To get the fulltext of the file as it is after d8, without a checkpoint,
we just take all the deltas, combine them, and apply it to the empty
string.

Let's checkpoint d4 again by combining deltas.

d0 -> d1 -> d2 -> d3 -> newd4 (d0+d1+d2+d3+d4) -> d5 -> d6 -> d7 -> d8

To get the fulltext after d8, now we just need to grab newd4+d5+d6+d7+d8,
combine them, and apply against the empty string. No need to ever
deal with a real fulltext, we can just stop after newd4

To get newd4, we need not apply any deltas again, we just combine 5
deltas, and store that.

This takes up even less space than storing a fulltext original, and if
delta combining is faster than application of deltas, will be faster for
checkpointing than storing intermediate fulltexts.

--Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:06 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.