
Re: deltification semi-rewrite starting now

From: Branko Čibej <brane_at_xbc.nu>
Date: 2001-09-24 23:45:48 CEST

kfogel@collab.net wrote:

>Mike Pilato and I have just reviewed Branko's deltification proposal,
>found at
>
> notes/delta-indexing-and-composition.txt
>
>and like what we see :-). We have a couple of questions that probably
>Branko can answer quickly, but basically we're going to start
>implementing it now, completion anticipated in 2 weeks max (thank
>goodness all the strings/reps separation is already done, so that
>whole wheel doesn't need to be reinvented).
>
>The plan is that we'll also implement a new `svnadmin' subcommand for
>deltifying and undeltifying revisions, or particular paths within
>revisions. That way, administrators have a way to make certain trees
>very efficient to retrieve -- for example, one might want to do this
>to a tagged release -- and also gives us an obvious way to deltify the
>storage of the current svn repository without perturbing the revision
>numbers. :-)
>
>Branko, a couple of questions regarding your lovely design:
>
>>So, here's my proposal
>>
>>1) Change the delta representation to index and store delta windows
>>separately
>>
>> DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
>> WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
>> OFFSET ::= number ;
>> REP-OFFSET ::= number;
>>
>>
>>The REP-KEY and REP-OFFSET in WINDOW are optional because, if the
>>difference between two file revisions is large enough, the diff could
>>in fact be larger than a compression-only vdelta of the text region. In
>>that case it makes more sense to compress the window than to store a diff.
>>
>
>We're not sure what REP-OFFSET is for.
>
>We're pretty sure we understand OFFSET. It's the offset into the
>reconstructed fulltext. The OFFSETs increase with each WINDOW in a
>DELTA, and you can tell a given window's reconstruction range either
>by adding OFFSET + SIZE, or by subtracting one OFFSET from the next.
>
>Hopefully that's a correct summary. :-)
>
Yes, that is exactly right.
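
To make the OFFSET/SIZE relationship concrete, here is a minimal sketch,
assuming the window index is held in memory as a sorted list of
(OFFSET, SIZE) pairs. The names `window_range` and `find_window` are
hypothetical and are not taken from the Subversion sources; the sketch
only illustrates how a reader might locate the window covering a given
position in the reconstructed fulltext.

```python
from bisect import bisect_right

# Hypothetical in-memory index: one (offset, size) pair per window,
# sorted by offset into the reconstructed fulltext.

def window_range(windows, i):
    """Return the half-open range [start, end) that window i reconstructs."""
    offset, size = windows[i]
    return offset, offset + size

def find_window(windows, pos):
    """Return the index of the window whose range contains pos, or None."""
    offsets = [off for off, _ in windows]
    # Last window whose OFFSET is <= pos.
    i = bisect_right(offsets, pos) - 1
    if i < 0:
        return None
    start, end = window_range(windows, i)
    return i if start <= pos < end else None

# Example index: three contiguous windows covering 250 bytes.
windows = [(0, 100), (100, 100), (200, 50)]
```

For contiguous windows, OFFSET + SIZE of one window equals the OFFSET of
the next, which is why either calculation gives the same range.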

>But what is REP-OFFSET? We understand the REP-KEY that precedes it.
>That's simply the representation against whose fulltext this delta
>applies, right?
>
Let me think ... Yes.

> But why would we want an offset into that rep? We
>had thought the relevant offset(s) are part of the svndiff encoding.
>Is it a way of magically jumping over a certain number of windows and
>landing on the right one, in next-most-immediate source
>representation, or is it something else?
>
Although the offset is implicit in the svndiff, in real life you want to
find the source (fulltext) *before* decoding the window. Also, as I
noted, you might want to just use a (self-referencing) vdelta compress
instead of a diff, if the result of the compression is smaller than the
diff.
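
As a rough illustration of that choice, here is a sketch assuming both
encodings of a window have already been computed as byte strings. The
function `make_window` and its parameters are hypothetical; the tuple
layout simply follows the field order of the WINDOW production in the
proposal, with REP-KEY and REP-OFFSET present only when a diff is stored.

```python
def make_window(diff, self_compressed, size, checksum, rep_key, rep_offset):
    """Build a WINDOW tuple, keeping whichever encoding is smaller.

    diff            -- bytes of the diff against the source representation
    self_compressed -- bytes of the self-referencing compressed window
    """
    if len(self_compressed) < len(diff):
        # Compression-only window: no source rep, so no REP-KEY/REP-OFFSET.
        return (self_compressed, size, checksum)
    # Diff window: record where its source text lives.
    return (diff, size, checksum, rep_key, rep_offset)
```

A reader of such a window could tell the two cases apart by the number
of fields alone, which is one way to interpret the optional part of the
grammar.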

Hmm. It's been a long time since I wrote that, and as usual I left some
of the reasoning out. I'll have to think about this again. I sort of
remember it had to do with true random access to the text.

>We're still thinking about this, but maybe you can put us out of our
>misery quickly. :-)
>
Thanks, you just got me worrying about it. :-)

>Also, did you mean
>
> WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;
>
>i.e., with parens, rather than without? Yes, it would work without
>being a sublist, but for maintainability a sublist might be
>preferable...
>
I meant without parens, but obviously it doesn't hurt to make a sublist
out of it. Use whatever you find more aesthetically pleasing. :-)

>Anyway, we can start coding right away, while awaiting clarification.
>Found no holes in the proposal; agree that there is a slight storage
>penalty, but the memory usage and speed gains are so overwhelming that
>it would be petty to complain about the *very* gently-sloped, albeit
>linear, increase in storage per deltified file.
>
Wonderful. Now I /really/ have to dust off and finish the delta combiner.

>The replacing of distant diffs with ones nearer the fulltext is a
>great idea; we'll probably wait on that until after the basic rewrite
>is done, however, as it is an optimization, though a very effective
>one.
>
Yes, it's an optimization only. What's more, it can be done entirely
off-line.

-- 
Brane Čibej   <brane_at_xbc.nu>            http://www.xbc.nu/brane/
Received on Sat Oct 21 14:36:42 2006

This is an archived mail posted to the Subversion Dev mailing list.
