svn diff --format=svnpatch, svn patch (rich patch formats)

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2007-03-29 16:50:33 CEST

[redirecting onto dev@ - this relates to the rich-patch-format SoC task]

On Thu, Mar 29, 2007 at 02:56:50PM +0200, Charles Acknin wrote:
> >- You're probably being a bit over-optimistic on the deliverables: 'svn
> > patch' will take longer than two weeks to implement, I'm sure.
>
> Actually I assigned 3 weeks for this task. And because we're talking
> about 8 9 and 10 weeks, this assumes I'd have already worked at least
> 7 weeks on Subversion, plus the "Preparation stage" that's also a
> significant part. Is is still over-optimistic? If you think so I'd
> have to shift my schedule a bit according to your advices.
>

Ok, that's a little more reasonable, but it really depends how complicated
the implementation has to be. You're writing a new subcommand that
needs to (possibly) parse an 'svnpatch' formatted file, apply unidiffs
(allowing some amount of fuzzing to be defined) and apply tree deltas
(again, with an as-yet-undefined amount of fuzzing). The error handling
and test cases alone will probably occupy most of your time :-).

> >- You mention that you're going to store the revision number of the wc -
> > you need to consider mixed-revision working copies too.
>
> We could keep a map of each entry with its revision or its checksum
> (latter sounds better). Assuming the user ran 'svn update' before
> 'svn diff --svnpatch' is not acceptable ( , is it? ).
>

Correct, it would be unacceptable to require that the user is at any
particular revision for their wc, if the files are 'compatible'
(unconflicting) if we attempted to apply the patch to that state.

You should look at what 'merge' does. Conceptually, merging performs
a diff operation to produce tree-delta, then applies that delta to your
working copy. Certain types of 'conflict' are accepted (for example:
trying to delete a file that's already schedule-deleted in your working
copy, IIRC).

You're _almost_ breaking that pipeline in two and providing a
serialisation format for the tree delta so that the operations become:
1. diff -> delta -> serialised format (svn diff --format=svnpatch),
2. serialised format -> delta -> merge (svn patch).

Except you're not _quite_ doing that, because you're planning to use
unidiffs for textual differences rather than the existing binary format,
so you can't reuse the existing merge logic as-is (and I wouldn't
recommend you tried to - the logic involved in running 'svn merge' is
similar to that for 'svn patch', but it's not the same, and trying to
have one function implement both would probably make things too complex).

At this stage, requiring that the working copy be essentially unmodified
would probably be okay. Later, we'd probably want to allow fuzzy
matching too, but that requires someone to reimplement patch's fuzzy
unidiff application.

I wouldn't bother storing a checksum or base revision number, except
possibly as commentary (we do already provide the base revision for
text diffs). Rather than determine whether the file is unmodified,
it probably would be better just to try applying the diff and fail if
you run into any problems.

(I sometimes use the base revision on diffs to allow me to get a wc to
a state where a diff will apply cleanly, but I expect that's an unusual
use case).

For binary files, the situation is different. We currently declare
those as unmergable, though there's a long-standing request to provide
type-specific diff/merge functions so that e.g. you could merge in changes
to a zip file (think OpenOffice document). For this work, I would perhaps
consider what information we might need to implement type-sensitive
merging (the base rev, probably), but not try to implement it at all.
For binary files, I think you would want the ancestor checksum so that
you could verify that the source had the same contents.

> >- Presumably, you're not planning to implement any kind of fuzzy
> > patching at this stage (i.e. it's either a clean patch or it doesn't
> > apply). That's fine initially, but you do need to decide what will
> > happen if you're half-way through patch application and find your
> > tree or binary delta doesn't apply - do you abort, roll back, or
> > continue on? Do you provide some switch to modify that behaviour?
> > (Also, what if, e.g., I have a wc at slightly different revisions
> > from that used to produce the patch, but I know that the files haven't
> > been changed in the interim? -- I'd expect that to be a fairly common
> > scenario).
>
> I see. Well we could either behave like patch(1), that is, have an
> "at most" behavior trying to apply each hunk until the end of the file
> whatever happens, or have this exclusive behavior -- clean or nothing
> -- that doesn't start until each hunk has been checked against
> destination before starting patching. We could do both and have a
> switch for that to set the behavior we expect from this subcommand.
>

I think I'd favour trying to apply each hunk/delta, and either failing
or (with --force, perhaps) continuing on regardless. Though the latter
has the potential to make an awful big mess of your local wc if you
can't apply a tree delta that causes conflicts later on in the
operation.

I'm not sure how you'd represent rejects for tree changes, either.
Perhaps a copy/rename where the target already exists could pick a new
name? (e.g. when applying 'rename A to B' and 'B' already exists,
create 'B.1' and use it wherever 'B' is mentioned in the incoming
patch?). But as mentioned above, this is probably something you'd want
to descope from the initial implementation.

> Concerning patch application, do we a) call patch(1) from Subversion
> or do we b) try to import patch(1)'s 1000 lines of code into
> Subversion? (a) might not be as portable as (b), plus add a
> dependency to Subversion. I don't think (a) is the way you have in
> mind. I'd just like to have your confirmation in this point.
>

I had a subset of (b) in mind, but it might be that (a) is more
appropriate, at least initially. It certainly avoids any need to
reimplement patch's unidiff parsing and fuzzy application logic, at the
cost of requiring an external program. Perhaps we could implement a
--patch-cmd switch similar to --diff-cmd, and provide an internal
implementation later? (I must admit I haven't thought about that much).

> >- Should 'svn patch' be able to apply regular (non-svnpatch) unified
> > diffs as well? People will probably expect that, but it'll probably
> > mean dealing with the variety of patch-alike formats that people
> > produce. Probably worth stating it as a desirable feature, then
> > placing it out of scope for the moment?
>
> As I mentioned in my proposal, "'svn patch' is able to read both parts
> of the patch file to avoid replication". This implies 'svn patch' is
> able to read contextual-file updates, which is pretty much what
> unidiff is now. Or did you intend to include contextual-file updates
> into the encoded chunk also?
>

No, it's more a question of, if we do write a parser for the unidiff
format, whether we are planning to deal with the various odd variant
formats that I believe patch can deal with, or whether we're planning
on saying "'svn patch' operates solely on the output of 'svn diff',
and while it happens to look largely like the output of GNU diff,
we won't support any old diff format, just ours".

Likewise, we should define up-front whether we're going to require that
the user apply patches only to a working copy, or whether they could
apply a patch to e.g. an exported tree. The latter would make it
possible to produce diffs that could be applied to e.g. downloaded
tarballs, but it might make it harder to implement (or not - we may have
to see whether we need the wc information to apply a patch).

> >- You should definitely look at what SVK is doing for inspiration, but
> > don't feel that you need be compatible - IIRC, SVK uses a base64'd
> > perl pickle of an internal datastructure - hardly portable!
>
> Right, I had a quick look at what SVK does. It seems there are these
> two parts: human-readable unidiff-like + computer-readable being
> z-compressed-base64-encoded. Close to what I want to do except I'll
> use Subversion's binary delta for the computer-readable part.
>

Sure, though you'll need to come up with a way to serialise the tree-delta
changes (copy file A to B, delete C, etc), since we don't currently have
a serialisation format for that.

Regards,
Malcolm

application/pgp-signature attachment: stored

Received on Thu Mar 29 16:51:04 2007

This message: [ Message body ]
Next message: Malcolm Rowe: "Re: Subversion/Neon win32 compilation"
Previous message: Michael Haggerty: "Re: Closing issue 2238: propset/del on URLs?"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]