Mike Pilato and I have just reviewed Branko's deltification proposal,
found at
notes/delta-indexing-and-composition.txt
and like what we see :-). We have a couple of questions that probably
Branko can answer quickly, but basically we're going to start
implementing it now, completion anticipated in 2 weeks max (thank
goodness all the strings/reps separation is already done, so that
whole wheel doesn't need to be reinvented).
The plan is that we'll also implement a new `svnadmin' subcommand for
deltifying and undeltifying revisions, or particular paths within
revisions. That way, administrators can make certain trees very
efficient to retrieve (for example, one might want to do this to a
tagged release), and we also get an obvious way to deltify the
storage of the current svn repository without perturbing the revision
numbers. :-)
Branko, a couple of questions regarding your lovely design:
> So, here's my proposal
>
> 1) Change the delta representation to index and store delta windows
> separately
>
> DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
> WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
> OFFSET ::= number ;
> REP-OFFSET ::= number ;
>
>
> The REP-KEY and REP-OFFSET in WINDOW are optional because, if the
> differences between two file revisions are large enough, the diff could
> in fact be larger than a compression-only vdelta of the text region. In
> that case it makes more sense to compress the window than to store a diff.
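To check our reading of the grammar, here is a rough sketch of the WINDOW and DELTA structures in Python, plus the per-window choice described above: keep the diff against the source rep only when it is smaller than a compression-only encoding of the same region. The class and function names are made up for illustration. The encodings are treated as opaque byte strings (actually producing svndiff/vdelta output is out of scope), using MD5 for CHECKSUM is our assumption, and REP-OFFSET is simply carried along, since we don't yet know what it's for.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Window:
    diff: bytes                       # DIFF: encoded window data (opaque here)
    size: int                         # SIZE: length of the fulltext region it rebuilds
    checksum: str                     # CHECKSUM: assumed MD5 hex digest of that region
    rep_key: Optional[str] = None     # REP-KEY: source rep; None means compression-only
    rep_offset: Optional[int] = None  # REP-OFFSET: carried along, meaning still unclear


@dataclass
class Delta:
    flags: List[str] = field(default_factory=list)                   # the FLAG ... list
    windows: List[Tuple[int, Window]] = field(default_factory=list)  # (OFFSET WINDOW) ...


def choose_window(diff_vs_source: bytes, self_compressed: bytes, region: bytes,
                  rep_key: str, rep_offset: int) -> Window:
    """Build one WINDOW, keeping whichever encoding of `region` is smaller.

    Both encodings are computed elsewhere (hypothetically); on a tie we
    prefer the compression-only form, since it does not depend on a source rep.
    """
    checksum = hashlib.md5(region).hexdigest()
    if len(diff_vs_source) < len(self_compressed):
        return Window(diff_vs_source, len(region), checksum, rep_key, rep_offset)
    return Window(self_compressed, len(region), checksum)
```

Appending `(offset, choose_window(...))` pairs to `Delta.windows` in increasing OFFSET order would give the indexed layout, with each window independently either a diff or a self-compressed region.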
We're not sure what REP-OFFSET is for.
We're pretty sure we understand OFFSET. It's the offset into the
reconstructed fulltext. The OFFSETs increase with each WINDOW in a
DELTA, and you can find a given window's reconstruction range either
as OFFSET up to OFFSET + SIZE, or by subtracting its OFFSET from the
next window's OFFSET.
Hopefully that's a correct summary. :-)
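If that reading is right, the index lets us avoid walking the whole delta: to reconstruct a byte range of the fulltext, we only need the windows whose [OFFSET, OFFSET + SIZE) ranges overlap it, and a binary search on OFFSET finds the first one. A minimal Python sketch, with hypothetical function names and the index reduced to (OFFSET, SIZE) pairs:

```python
from bisect import bisect_right
from typing import List, Tuple

# Each entry is (OFFSET, SIZE) for one window, in increasing OFFSET order.
WindowIndex = List[Tuple[int, int]]


def check_contiguous(index: WindowIndex) -> bool:
    """Each window's range should end exactly where the next one starts."""
    return all(off + size == next_off
               for (off, size), (next_off, _) in zip(index, index[1:]))


def windows_for_range(index: WindowIndex, start: int, end: int) -> List[int]:
    """Return positions in `index` of the windows overlapping [start, end)."""
    offsets = [off for off, _ in index]
    # Last window whose OFFSET is <= start (or the first window, if none is).
    first = max(bisect_right(offsets, start) - 1, 0)
    result = []
    for i in range(first, len(index)):
        off, size = index[i]
        if off >= end:          # this and all later windows start past the range
            break
        if off + size > start:  # window's range reaches into [start, end)
            result.append(i)
    return result
```

For example, with windows at offsets 0, 100 and 200 of sizes 100, 100 and 50, reading fulltext bytes 150 through 209 touches only the second and third windows.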
But what is REP-OFFSET? We understand the REP-KEY that precedes it.
That's simply the representation against whose fulltext this delta
applies, right? But why would we want an offset into that rep? We
had thought the relevant offset(s) are part of the svndiff encoding.
Is it a way of magically jumping over a certain number of windows and
landing on the right one in the next-most-immediate source
representation, or is it something else?
We're still thinking about this, but maybe you can put us out of our
misery quickly. :-)
Also, did you mean
WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;
i.e., with parens, rather than without? Yes, it would work without
being a sublist, but for maintainability a sublist might be
preferable...
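For what it's worth, here is how the parenthesized form might look to code reading the data, with parsed skels stood in for by nested Python lists (the helper names are hypothetical). Each (OFFSET WINDOW) entry always has exactly two elements, and the optional REP-KEY/REP-OFFSET pair only affects the length of the inner sublist:

```python
from typing import Optional, Tuple

WindowFields = Tuple[bytes, int, bytes, Optional[bytes], Optional[int]]


def parse_window(window: list) -> WindowFields:
    """Unpack a WINDOW sublist: (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET])."""
    if len(window) == 3:
        diff, size, checksum = window
        return diff, size, checksum, None, None
    if len(window) == 5:
        diff, size, checksum, rep_key, rep_offset = window
        return diff, size, checksum, rep_key, rep_offset
    raise ValueError(f"WINDOW must have 3 or 5 elements, got {len(window)}")


def parse_entry(entry: list) -> Tuple[int, WindowFields]:
    """Unpack one (OFFSET WINDOW) entry in the parenthesized form."""
    offset, window = entry
    return offset, parse_window(window)
```

If WINDOW ever grows another field, only `parse_window` changes; the shape of each entry stays the same, which is the maintainability argument for the sublist.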
Anyway, we can start coding right away while awaiting clarification.
We found no holes in the proposal, and we agree that there is a slight
storage penalty, but the memory usage and speed gains are so
overwhelming that it would be petty to complain about the *very*
gently sloped, albeit linear, increase in storage per deltified file.
Replacing distant diffs with ones nearer the fulltext is a great idea;
however, we'll probably hold off on it until the basic rewrite is
done, since it is an optimization (albeit a very effective one).
-Karl and Mike
Received on Sat Oct 21 14:36:42 2006