Re: Improved patch format - SoC

From: Branko Čibej <brane_at_xbc.nu>
Date: 2007-04-01 09:16:35 CEST

Nicolás Lichtmaier wrote:
> == Introduction ==
>
> The diff/patch tools have been great tools. The format they use
> (called "unified diff")

Correction: "unified diff" is in fact a later addition to the set of
diff formats (the original diff didn't have it), and is by no means
universally accepted as "the standard". IIRC the GCC project, for
example, insists that patches be submitted in context-diff, not
unified-diff format. Reading the diff(1) man page will give you a list
of the different formats that diff can output, and consequently patch
can accept.
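
To make the difference between the formats concrete, here is a minimal
sketch that renders the same one-line change as both a unified and a
context diff. It uses Python's standard difflib module rather than the
diff tool itself, and the file names a.txt and b.txt are made up for
the illustration.

```python
import difflib

# Two versions of a tiny file; only the middle line differs.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# The same change, rendered in two of the formats patch understands.
unified = list(difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt"))
context = list(difflib.context_diff(old, new, fromfile="a.txt", tofile="b.txt"))

print("".join(unified))
print("".join(context))
```

The unified form marks the change with "-" and "+" lines inside a single
hunk, while the context form shows the old and new regions separately
and marks changed lines with "!". Any extension to "the" patch format
has to decide what it means for each of these.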

An improved diff format that aims to be(come) compatible with patch
should be applicable to at least the original plain diff, context-diff
and unified-diff; possibly even ed-script diff (see diff -e), though I
suspect that one loses too much context to be useful for patches.

[...]

> Just as Subversion took the best of CVS concepts and created a better
> version control system, we should take this format and enhance it so
> that it can serve Subversion needs. Those new needs are three:
>
> * The ability to describe tree modifications (renames, deletions, etc.),

This is more complicated than it looks, if you insist on compatibility
with current diff/patch. Diff is file-based, whereas tree modifications
are not.

For example, patches are separable: you can take a patch which contains
diffs of several files, split it apart, and apply each file's hunks
separately. When you add tree-modification metadata to such patches,
this is (in general) no longer true; and to be safe, a patch program
that accepts such an enhanced diff format should be able to warn you
that you're missing some tree modifications when you apply a patch.
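
Separability can be sketched in a few lines. The function below is a
rough illustration, not how any real tool does it: split_unified_patch
is a hypothetical name, it assumes a plain unified diff with no
"Index:" lines or other metadata between files, and it ignores the
corner case of a removed line that itself begins with "-- ".

```python
def split_unified_patch(patch_text):
    """Split a multi-file unified diff into {path: per-file diff text}."""
    per_file = {}
    current = None
    lines = patch_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        # Treat a "--- " line immediately followed by a "+++ " line
        # as the start of a new file's diff.
        is_header = (
            line.startswith("--- ")
            and i + 1 < len(lines)
            and lines[i + 1].startswith("+++ ")
        )
        if is_header:
            # Use the "+++" path (minus any tab-separated timestamp) as key.
            current = lines[i + 1][4:].split("\t")[0].strip()
            per_file[current] = []
        if current is not None:
            per_file[current].append(line)
    return {path: "".join(chunk) for path, chunk in per_file.items()}


PATCH = (
    "--- a.txt\n+++ a.txt\n@@ -1 +1 @@\n-x\n+y\n"
    "--- b.txt\n+++ b.txt\n@@ -1 +1 @@\n-p\n+q\n"
)
parts = split_unified_patch(PATCH)
print(sorted(parts))
```

Each value in the result is a complete patch for one file and can be
applied on its own. A rename or directory move has no such per-file
home: the piece that records it would have to travel with every file
it affects.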

I certainly don't know how to do this by simply extending, e.g., unidiff.

[...]

> * The patch should convey all the meaning implied in a 'svn merge'
> operation that is considered reasonable. A patch should be like
> tearing the merge action into two steps, i.e.: 'diff + patch = merge'.

diff + patch != merge

I don't know how that misconception came about, but merge typically
looks at three sources, not two (and note that there's a prototype 4-way
merge in Subversion's source tree).
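
A minimal sketch of why the third source matters, assuming the files
have already been cut into corresponding regions (which is the hard
part, and is not shown). merge_region is a hypothetical helper, not
Subversion's merge code.

```python
def merge_region(base, ours, theirs):
    """Three-way decision for one region; returns (result, is_conflict)."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only their side changed: take theirs
    if theirs == base:
        return ours, False    # only our side changed: keep ours
    return None, True         # both changed differently: conflict


# With only "ours" and "theirs" in hand, these cases look identical:
# the two texts differ. The common ancestor is what tells them apart.
print(merge_region("x = 1", "x = 1", "x = 2"))
print(merge_region("x = 1", "x = 3", "x = 1"))
print(merge_region("x = 1", "x = 3", "x = 2"))
```

A two-way diff records how to get from one text to the other, but not
which of the differences the receiving side already made itself, so it
cannot reproduce the first two outcomes without help.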

If you want a format that can convey all the meaning of a merge
operation, then I think you're better off breaking compatibility with
the current format.

[...]

> * Support for binary diffs. Binary diffs would just be diffs created
> with the binary diff algorithm already present in subversion, and then
> encoded in base64 to get a text representation. The algorithm name
> should be stated to allow for change in the future.

This one is tricky. The fundamental reason for the success of the
diff/patch pair is that the most common diff formats contain enough
context to allow inexact patches. Patch is smart enough to find where to
apply a patch even if a file has been (slightly) modified. To do so, it
makes a number of assumptions about how text files are organized; these
assumptions typically work for source code and consistently formatted
text, but will usually fall down when, for example, your text format is
one paragraph per line (which is fairly common in word processors and
the like).
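
The core of inexact patching can be sketched as a search for the hunk's
old lines near the position the patch claims. This is a deliberately
simplified illustration, not patch's actual algorithm (it omits, for
instance, retrying with some of the context lines ignored);
find_hunk_position and max_offset are names made up for this sketch.

```python
def find_hunk_position(file_lines, old_lines, expected_index, max_offset=50):
    """Find where old_lines occur in file_lines, nearest to expected_index.

    Returns the 0-based index of the match, or None if nothing matches
    within max_offset lines of the expected position.
    """
    n = len(old_lines)
    for offset in range(max_offset + 1):
        # Try the expected spot first, then widen the search in both
        # directions, one line at a time.
        for candidate in (expected_index - offset, expected_index + offset):
            if candidate < 0 or candidate + n > len(file_lines):
                continue
            if file_lines[candidate:candidate + n] == old_lines:
                return candidate
    return None


# The target file gained a line at the top since the patch was made,
# so the hunk no longer applies at the index recorded in the patch.
target = ["new header", "a", "b", "c", "d"]
print(find_hunk_position(target, ["b", "c"], expected_index=1))
```

This only works because the old lines are distinctive enough to locate.
With one paragraph per line, any edit inside a paragraph changes the
whole line, so the context lines of later hunks stop matching and the
search comes back empty.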

With a generic binary file, you can define context in a similar way
(though definitely not in combination with a block-copy delta
algorithm!), but you can't invent good heuristics that would make
inexact patching work, except in very, very limited cases -- that is,
when your patch program knows everything about the format of the binary
file it's handling.

If you don't have inexact patch, then the diff/patch thing becomes
pretty much useless for code exchange (/and/ for merging).

[...]

To sum up: To do what you propose, you'd have to first invent theory
that I at least haven't seen yet. I don't think this project is suitable
for a two-month summer hack ...

-- Brane

Received on Sun Apr 1 09:17:14 2007