Greg Hudson wrote:
> I need to understand some things about the text delta format:
>
> 1. Why do we need it? (I'm not suggesting that we don't need
> it, but I want to understand why it's there.) Suppose we
> just had functions which did:
>
> <source stream, target stream> -> vcdiff stream
> <source stream, vcdiff stream> -> target stream
>
> What wouldn't this cover?

Well, for one thing, we may want to support different external
delta encodings someday. Diff + zip might be one, for CVS
compatibility.

> 2. Do windows within a text delta use restricted portions of
> the source and destination streams for copy instructions,

Well, at least in my implementation, they do :-)
In fact, they have to, otherwise you can't really call it a window.
(Which reminds me: svn_txdelta_window_t should store the sizes of the
source and target portions of the window.)

> or can the operations refer to any location within the
> source and destination streams? It looks like the latter,
> but I don't see how we can apply text deltas streamily in
> that case.
>
> For comparison on point #2: in the vcdiff format, instructions in a
> window can only refer to a particular part of the source and
> destination stream, but vcdiff doesn't intrinsically limit windows to
> moving forward within the source file. That is, window 1 could refer
> to a source data segment starting at an offset of 100K, and window 2
> could refer to a source data segment starting at 0K. So even
> disregarding the text delta intermediate, we would need to further
> restrict the vcdiff format to be able to apply diffs to streamy
> sources.

Good point. Maybe we can assume that we'll always have the whole
source available locally (for random access), which seems reasonable,
except that it violates the "Costs are proportional to change size"
assumption. Thoughts, anyone?

OTOH, the fact that vcdiff supports non-contiguous source windows
doesn't mean we can actually generate such deltas. You'd need a
pretty smart windowing algorithm, which we don't have now and
probably won't have soon. Let's burn that bridge when we get to it.
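
To make the streaming question concrete, here is a minimal sketch of
applying windows to a forward-only source. Everything in it is an
assumption made for illustration: the Window fields, the op tuples and
the function names are hypothetical and are not the real
svn_txdelta_window_t layout or any existing API. It only shows that
windows with restricted source views can be applied with memory
proportional to the view size, provided the views never move backwards.

```python
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator, List


@dataclass
class Window:
    """Hypothetical window layout, not the real svn_txdelta_window_t."""
    sview_offset: int  # where this window's source view starts in the stream
    sview_len: int     # size of the source view
    tview_len: int     # number of target bytes this window produces
    ops: List[tuple]   # ("source", off, n), ("target", off, n) or ("new", data)


def apply_window(sview: bytes, w: Window) -> bytes:
    """Apply one window; copies may only reach into the two views."""
    out = bytearray()
    for op in w.ops:
        if op[0] == "source":
            _, off, n = op
            if off < 0 or off + n > len(sview):
                raise ValueError("copy reaches outside the source view")
            out += sview[off:off + n]
        elif op[0] == "target":
            _, off, n = op
            if off < 0 or (n > 0 and off >= len(out)):
                raise ValueError("copy reaches outside the target view")
            for i in range(n):  # byte by byte, so overlapping copies work
                out.append(out[off + i])
        elif op[0] == "new":
            out += op[1]
        else:
            raise ValueError("unknown op: %r" % (op[0],))
    if len(out) != w.tview_len:
        raise ValueError("window produced the wrong amount of target data")
    return bytes(out)


def apply_streamy(source: BinaryIO, windows: Iterable[Window]) -> Iterator[bytes]:
    """Yield target data window by window, reading the source forward only."""
    buf = b""      # buffered source bytes
    buf_start = 0  # stream offset of buf[0]
    for w in windows:
        if w.sview_offset < buf_start:
            # This is the case vcdiff allows and a streamy source cannot serve.
            raise ValueError("source view moves backwards; needs random access")
        end = w.sview_offset + w.sview_len
        need = end - (buf_start + len(buf))
        if need > 0:
            buf += source.read(need)  # read forward until the view is covered
        buf = buf[w.sview_offset - buf_start:]  # drop bytes before the view
        buf_start = w.sview_offset
        if len(buf) < w.sview_len:
            raise ValueError("source stream is shorter than the source view")
        yield apply_window(buf[:w.sview_len], w)
```

With a source of "hello world", a window viewing bytes 0..5 followed by
one viewing bytes 6..11 applies fine; swapping the two windows raises
the "moves backwards" error, which is exactly the restriction we would
have to impose on vcdiff for streamy sources.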
> > 4) Write text delta patch (source stream + text delta stream ->
> > target stream)
>
> I'll grab this task and leave you with the others, if that's okay.

Fine.

> Branko wrote:
> > - Scan long inserts for byte runs that may make the generated
> > vcdiff smaller;
>
> I don't think this will help; inserts shouldn't contain any byte runs
> of significant length. The vdelta block copy generation algorithm
> will typically generate something like the following for a byte run:
>
> INSERT aaaa
> COPY <length of run minus 4>, <current position minus 4>
>
> Converting this to a RUN instruction would save a little space, but
> it's not quite as easy to recognize.

"INSERT aaaa" can be converted to "RUN 4 a", which is 40% smaller
if you use instruction table 0 (o.k., 2 bytes smaller :-).
Well, maybe that's overdoing it a bit ...
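
For the record, a small sketch of the byte-run case and the size
arithmetic. The op names follow the ones used above; the tuple layout
and the byte counts are assumptions for illustration (one opcode byte
with the length implied for a short INSERT, one opcode byte plus one
length byte plus one data byte for a RUN), chosen because they
reproduce the 5-versus-3 byte figure, not taken from the vcdiff draft.

```python
def expand(ops):
    """Expand a list of hypothetical delta ops into the bytes they describe."""
    out = bytearray()
    for op in ops:
        if op[0] == "INSERT":
            out += op[1]
        elif op[0] == "COPY":
            _, length, pos = op
            for i in range(length):  # may overlap the bytes being written
                out.append(out[pos + i])
        elif op[0] == "RUN":
            _, length, byte = op
            out += byte * length
        else:
            raise ValueError("unknown op: %r" % (op[0],))
    return bytes(out)


# A run of 20 'a' bytes, the way the block copy algorithm would emit it:
# COPY <length of run minus 4>, <current position minus 4>
vdelta_style = [("INSERT", b"aaaa"), ("COPY", 16, 0)]
# The same run as a single RUN instruction.
run_style = [("RUN", 20, b"a")]


def insert_bytes(n):
    """Assumed encoded size of INSERT with n data bytes (short lengths only)."""
    return 1 + n  # opcode with implied length, then the data


def run_bytes():
    """Assumed encoded size of RUN (lengths that fit in one byte)."""
    return 1 + 1 + 1  # opcode, length, the repeated byte


saving = insert_bytes(4) - run_bytes()        # bytes saved by "RUN 4 a"
saving_ratio = saving / insert_bytes(4)       # fraction of the INSERT size
```

Both op lists expand to the same 20 bytes; under the assumed sizes the
saving for the 4-byte INSERT is 2 bytes out of 5.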
Brane
Received on Sat Oct 21 14:36:08 2006