Re: Tree delta patch format - a bit of process

From: Charles Acknin <charlesacknin_at_gmail.com>
Date: 2007-04-03 15:47:14 CEST

On 4/1/07, Erik Huelsmann <ehuels@gmail.com> wrote:
> [ This mail may seem directed mostly at Charles and Nicolas, but I
> think it may apply to all of us ]
>
>
> As you may or may not know, we've had discussions about the patch
> format and what it should look like before. As before, we've started
> out this time (again) with a proposal for a format. This proposal then
> triggers a lot of reactions, many of which point out flaws in the
> design.
>
> I have wanted the patch discussion to materialize for a long time now,
> so I'm hoping this time will be *it*.
>
> I see a problem with how we've gone about achieving this goal: we've
> forgotten to define (or assumed implicit) the use-cases for this patch
> format.
>
> So, I'd like us to come up with a number of use-cases, then decide
> which ones we're actually going to solve. From that, we can distill
> the requirements for the format. From there on we can implement it and
> use that as a solid basis to conquer the world. :-)

Here are a few _patch format_ use-cases that I have in mind, in no
particular order (below, 'patch' means a patch in the new format we're
talking about):

a) a user wants to read the patch
b) a user wants his patch to represent a complete set of changes
(binary files, directories, props)
c) when a user applies the patch, the target differs in some way from
what the patch expects; this could be caused by:
- tree changes
- file changes (context or checksum don't match)
d) a user wants to modify the patch
e) a user applies the patch on a mixed-revision WC

Now, assuming those few use-cases are right, some problems arise.

(a) and (d) are incompatible with (b), at least for binary files. How
do we represent a change to a binary file? Do we build some sort of
[context -> data -> context] binary hunk (call it binary-patch(1)), or
do we send the whole new binary file? I guess the latter won't make
anyone happy for rather large files, but it would be far easier and
would fit the SoC timeframe. BTW, does anybody have a clue how large
binary files typically are among Subversion users?
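
To make the first option a bit more concrete, here is a minimal sketch
in Python of what a single [context -> data -> context] binary hunk
could look like. Everything in it is an assumption made for
illustration: the names make_binary_hunk and apply_binary_hunk, the
dict layout, and the 4-byte default context are hypothetical and are
not part of any existing or proposed Subversion format. It only handles
one contiguous changed region, found by trimming the common prefix and
suffix.

```python
def make_binary_hunk(old: bytes, new: bytes, context: int = 4) -> dict:
    """Describe old -> new as one [context -> data -> context] hunk.

    Hypothetical layout, for illustration only.
    """
    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    tail = len(old) - suffix
    return {
        "offset": prefix,                                # where the change starts in old
        "before": old[max(0, prefix - context):prefix],  # leading context
        "removed_len": len(old) - prefix - suffix,       # bytes replaced in old
        "data": new[prefix:len(new) - suffix],           # replacement bytes
        "after": old[tail:tail + context],               # trailing context
    }


def apply_binary_hunk(old: bytes, hunk: dict) -> bytes:
    """Apply a hunk, refusing to do so if the surrounding context differs."""
    start = hunk["offset"]
    end = start + hunk["removed_len"]
    before, after = hunk["before"], hunk["after"]
    if old[start - len(before):start] != before or old[end:end + len(after)] != after:
        raise ValueError("binary hunk context does not match target")
    return old[:start] + hunk["data"] + old[end:]
```

The hunk carries only the changed bytes plus a little context, so it
stays small when a large file changes in one place, but it is no more
readable or hand-editable than the whole file would be.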

If we store tree and property changes in an encoded hunk, (a) and
possibly (d) are lost again. One solution would be to store those
changes in plain text, but I'm not sure this is the right way to go.

(c) and (e) are all about fuzzy patching. It is OK if we (initially)
decide to call patch(1) from Subversion to take advantage of its
inexact, context-based matching of file hunks. For tree changes,
however, we'd have to define how much fuzz we want to allow. I doubt
this could be done in a 2-month period. This has been discussed
elsethread (Malcolm's redirection, March 29).
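
For readers less familiar with inexact matching on file contents, the
following Python sketch shows one ingredient of it: searching outward
from the line where a hunk is expected until its context lines are
found. The function name find_hunk_position and the max_offset
parameter are hypothetical, chosen for illustration. This is not how
patch(1) is implemented, and it leaves out patch(1)'s fuzz factor,
which additionally ignores some context lines at the edges of a hunk.

```python
from typing import List, Optional


def find_hunk_position(target: List[str], context: List[str],
                       expected: int, max_offset: int) -> Optional[int]:
    """Return the 0-based line index where `context` matches `target`.

    Candidates are tried in order of distance from `expected`, so the
    nearest match wins. Returns None if nothing matches within
    `max_offset` lines.
    """
    def matches(pos: int) -> bool:
        in_range = 0 <= pos <= len(target) - len(context)
        return in_range and target[pos:pos + len(context)] == context

    for distance in range(max_offset + 1):
        for pos in (expected - distance, expected + distance):
            if matches(pos):
                return pos
    return None
```

Deciding how far max_offset may go is the file-level version of the
"how much fuzz" question. For tree changes there is no such obvious
notion of distance, which is part of what makes that case harder.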

Any thoughts?

Cheers,

Charles

Received on Tue Apr 3 15:47:32 2007
