[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Wrong assumption about unidiff syntax in libsvn_diff?

From: Branko Čibej <brane_at_wandisco.com>
Date: Tue, 20 Nov 2012 17:25:21 +0100

On 20.11.2012 17:15, Johan Corveleyn wrote:
> On Tue, Nov 20, 2012 at 4:09 PM, Stefan Sperling <stsp_at_elego.de> wrote:
>> On Tue, Nov 20, 2012 at 03:40:20PM +0100, Branko Čibej wrote:
>>> On 20.11.2012 15:05, Stefan Sperling wrote:
>>>> I just checked the UNIX patch code shipped with OpenBSD and it seems
>>>> you're right that it only looks for the backslash and ignores the comment.
>>>>
>>>> However, it seems in practice patches usually contain this string in
>>>> non-localised form. At least nobody has yet complained about svn patch
>>>> misparsing such patches.
>>> I'm distinctly remember a report on this very list, with a unidiff with
>>> a localized message attached. However, I can't find it in the archives.
>>> In any case it would be nice if our diff parser only looked at the \,
>>> not the whole message. I'm less worried about our not localizing that
>>> particular string.
>> Currently, lines such as:
>> \\ this is a comment
>> anywhere in the patch are silently ignored.
>>
>> If we follow your suggestion we must treat any such lines as hunk
>> terminators and assume the hunk lacks a trailing newline.
>> I suppose such a change is correct but it's a behaviour change so
>> I wouldn't want to backport it to 1.7.x.
>>
>> Below is a patch. Diff parser tests and patch tests are still passing.
>>
>> [[[
>> * subversion/libsvn_diff/parse-diff.c
>> (parse_next_hunk): Treat any line that starts with a backslash as a
>> hunk terminator, indicating that the hunk does not end with EOL.
>> Comments following the backslash might be localised or missing
>> in which case parsing the patch would fail.
>>
>> Suggested by: brane
>> ]]]
>>
>> Index: subversion/libsvn_diff/parse-diff.c
>> ===================================================================
>> --- subversion/libsvn_diff/parse-diff.c (revision 1411078)
>> +++ subversion/libsvn_diff/parse-diff.c (working copy)
>> @@ -555,15 +555,11 @@ parse_next_hunk(svn_diff_hunk_t **hunk,
>> pos = 0;
>> SVN_ERR(svn_io_file_seek(apr_file, APR_CUR, &pos, iterpool));
>>
>> - /* Lines starting with a backslash are comments, such as
>> + /* Lines starting with a backslash indicate a missing EOL:
>> * "\ No newline at end of file". */
>> if (line->data[0] == '\\')
>> {
>> - if (in_hunk &&
>> - ((!*is_property &&
>> - strcmp(line->data, "\\ No newline at end of file") == 0) ||
>> - (*is_property &&
>> - strcmp(line->data, "\\ No newline at end of property") == 0)))
>> + if (in_hunk)
>> {
>> char eolbuf[2];
>> apr_size_t len;
> I thought Branko said "leading backslash followed by a space":
>
> [[[
> Only the leading backslash and space are important for
> signaling the no-trailing-eoln state.
> ]]]
>
> (your argument about "\\ this is a comment" still holds though)

I'm wrong about the "and space" bit. In unidiff (and context-diff) the
first column is for instruction markers; +/> for imsertions, -/< for
removals, \ for no-newline-at-end, space for surrounding context. So as
far as looking only at the first column is concerned, that's correct.

I've frankly never heard about \ being a comment marker. However, we do
know the size of the hunk -- it's signalled in the hunk header -- and
the \ that signifies no-end-of-line would always appear /after/ the last
line of the hunk. In that respect, it would appear that our diff parser
has a bug, since it doesn't appear to handle these things correctly.

N.B., the original coment in that code is wrong, too; "\ No newline at
end of file" is *not* a comment, it's a processing instruction.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
Received on 2012-11-20 17:26:01 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.