Re: "svn patch" writes bad "original" file when launching interactive merge

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 02 Oct 2009 11:38:45 +0100

On Fri, 2009-10-02 at 10:55 +0100, Stefan Sperling wrote:
> On Thu, Oct 01, 2009 at 09:22:10PM +0100, Julian Foad wrote:
> > I have a patch in which some hunks apply perfectly and others should
> > apply with a context match at a different line number (offset by N
> > lines).
> >
> > It applies fine with GNU "patch". When I apply it with "svn patch", the
> > hunks that need an offset don't apply and instead produce conflicts. I
> > believe that's a known deficiency at the moment, and that's not the bug
> > I'm raising.
>
> What is known is that the hunk application logic is way too naive right
> now to deal nicely with scenarios such as overlapping hunks or hunks that
> match at a later offset than subsequent hunks in the patch file.
>
> You get conflicts if 'svn patch' was unable to find 100% matching context
> for a hunk anywhere. We're currently not only matching a hunk's context,
> i.e. lines starting with ' ', but also the content that a hunk is deleting,
> i.e. lines starting with '-'.
>
> UNIX patch is also fuzzy, that is, it will ignore first one, then two lines
> of context if it cannot find an exact match. 'svn patch' is not fuzzy yet.

>From what you say below, the matching is the problem. No fuzziness is
needed in my patch file, just a line-number offset. I don't think the
hunks overlap or anything. I'll have to give you a repro script.

Also I think this was with trunk_at_39501; don't know if there were any
fixes since then already.

> > When I chose "(l)aunch a merge tool" in the interactive resolver, the
> > "foo.svnpatch.original" file it produces is not the original file but
> > instead contains some "from" hunks from the patch inserted in it (at
> > inappropriate places, as it happens).
>
> I can explain why this is happening.
>
> The hunks in the patch file give us some information about what the
> "true original file" (the unmodified file the patch is based on) looked
> like. By looking at context lines and deleted lines of the hunk (i.e.
> lines starting with either ' ' or '-'), and the hunk offset, we know what,
> say, lines 5 to 10 looked like in the "true original file".
> So with each hunk, we learn a small bit about the "true original file".
>
> But since we don't have a full copy of the "true original file", we don't
> know anything about lines not mentioned in the hunks, other than that they
> must have existed since the hunk is sitting at some subsequent line offset.
>
> So what we do is, we read the patch target file as it exists in WORKING,
> and use lines from it for the missing bits, assuming that this will
> approximately match the "true original file".
>
> We process the "true modified file" (i.e. based on lines starting with
> either ' ' or '+') in a similar way to get an approximate idea of what
> the "true original file" must have looked like after it was patched.
>
> The two reconstructed files, and the target file in the working copy,
> is what we run the 3-way merge with. That's the idea.
> But it's not set in stone. We can still tweak and improve it,
> or throw it out and come up with a different approach.

Ah, OK, thanks for the explanation. That makes some degree of sense.

What it's doing, then, is constructing ".original" and ".modified" files
that are something like the real original and modified files must have
been, except that only the line numbers whose content is known from the
patch are guaranteed to have the correct content, and the rest of the
lines have content taken from the corresponding lines of the WC file.

I see the idea. The good part is that the diff between ".original" and
".modified" is guaranteed to be the same as the patch.

The not-so-good part is that the rest of the file content may be
helpful, or may be confusing to the user, depending on whether it really
matches up well. I don't know if the algorithm is yet doing all it can
to make the line-number-ranges match up when it takes "corresponding"
lines from the WC file, and we know it isn't using fuzzy-matching to
help it. But I don't know if this mode of operation will ever produce
really useful results. We might need to switch to something like: insert
blank lines for all the guessed lines. Or insert a fixed block of marker
text for each gap. In that way, it will be up to the 3-way-merge tool to
spot the modified parts and match them up (cleverly, fuzzy) with the WC
file, and we will avoid confusing the tool with blocks of guessed
content.

> > What I expect is that the "foo.svnpatch.original" file should be an
> > exact copy of what the file-to-be-patched contained before the patch
> > application began.
>
> The reconstructed files are labeled "svnpatch.original" and
> "svnpatch.modified". The target file, which you expect to be labeled
> "svnpatch.original", is labeled "svnpatch.working".
>
> We obviously have to pass the working file as merge target.
> If we pass the working file as merge-left, too, our only option is to
> pass the reconstructed "modified" file as merge-right.
> The "original" file would not be involved in the merge step at all,
> it would only be used when matching context of hunks.
>
> I'd have to think more about this and possibly play around with this
> approach a bit to see which approach is better.

Right. Thanks again.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402840
Received on 2009-10-02 12:39:07 CEST

This message: [ Message body ]
Next message: Hyrum K. Wright: "Re: svn trunk and wc messed up?"
Previous message: Stefan Sperling: "Re: svn commit: r39737 - in trunk/subversion: include/private libsvn_client libsvn_wc tests/cmdline"
In reply to: Stefan Sperling: "Re: "svn patch" writes bad "original" file when launching interactive merge"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]