Re: extending the blame callback

From: Branko ÄŒibej <brane_at_apache.org>
Date: Mon, 14 Jan 2019 14:25:32 +0100

On 14.01.2019 13:36, Julian Foad wrote:
> Stefan, thanks for taking account of the feedback and updating the doc string in r1851197.
>
> I took a look and thought to rewrite the part about encoding and line splitting like this:
>
> * Character Encoding and Line Splitting:
> *
> * It is up to the client to determine the character encoding. The @a line
> * content is delivered without any encoding conversion. The line splitting
> * is designed to work with ASCII-compatible encodings including UTF-8. Any
> * of the byte sequences LF ("\n"), CR ("\n"), CR LF ("\r\n") ends a line
> * and is not included in @a line. The @a line content can include all other
> * byte values including zero (ASCII NUL).
>
> I dropped the reference to svn_subst_stream_translated() because it wasn't much use without saying what parameters it is given, and instead I was able to say exactly what happens overall.
>
> Problem 1: Using this blame function on a 16-bit character encoding is still really ugly: the receiver cannot know which byte sequences were stripped out.
>
> We should address this issue properly by passing a "line splitter" function in to svn_client_blame6().

I started on that then decided that svn_client_blame6 is far too narrow
scope for that. In order to do this right, we have to introduce a line
splitter callback to svn_subst_stream_translated.

My idea was to do something like this:

  * If the line-splitter is NULL, use the current default and check MIME
    type for text/*
  * If it's not NULL, ignore MIME type and provide the callback with
    file props so it can use that to determine encoding if it wants to.
  * Remove the ignore_mime_type flag and use the presence of a line
    splitter to convey the same intent; consequently,
  * Expose the "default" line splitter as a public function.

> Problem 2: Then I noticed that where this splitting algorithm is used, it is in the second pass over the file data, where we associate each "struct blame" item with a line of text.
>
> It looks like a different algorithm may be used in the first pass, when calculating the differences and constructing the blame list. It diffs all repository revisions of the file using svn_diff_file_diff_2() which splits lines according to the "ignore_eol_style" option from svn_client_blame6(diff_options.ignore_eol_style). The effect of "ignore_eol_style" is not entirely clear to me; it is used in svn_diff__normalize_buffer(). In addition, if svn_client_blame6() ends up processing a local WC revision of the file, that is processed through svn_client__get_normalized_stream(normalize_eols=(svn:eol-style == native)) before being diffed.
>
> This is horrible. Surely we should use a consistent "line splitter" everywhere.

Yes, we should introduce the line splitter concept everywhere in the
diff/patch/blame group and really anywhere where we're expected to read
"lines" from files (svn_stream_readline comes to mind). This is why I
stopped working on a splitter for blame ... the problem is much larger
than that and "fixing" blame would really not help by itself.

> I would expect this means it is possible for blame output to go wrong, with revision numbers assigned to the wrong lines.

This has always been possible due to the imprecise nature of diff, which
blame relies on. I wouldn't worry too much about it.

> I have been unable to construct such a wrong output, so far, after trying for half an hour or so. Possibly there is no code path that produces such a wrong output. But surely this is the sort of issue that leads to problems down the road.
>
> Can we re-write this properly?

As I said, this is not limited to blame. The current change in the blame
API at least makes some sense and is a localised change, that's why I
support it. It doesn't break anything that wasn't broken before.

Changing the way we handle text-like files for diff, blame and patch, on
the other hand, is quite a bit more involved and I'm afraid it'll touch
a lot of code. I wouldn't dream of rejecting svn_client_blame6 just
because it doesn't solve the larger problem.

-- Brane
Received on 2019-01-14 14:25:42 CET

This message: [ Message body ]
Next message: Branko ÄŒibej: "Re: svn commit: r1851244 - /subversion/trunk/subversion/include/svn_client.h"
Previous message: Julian Foad: "Blame callback: remove start/end parameters"
In reply to: Julian Foad: "Re: extending the blame callback"
Next in thread: Julian Foad: "Re: extending the blame callback"
Reply: Julian Foad: "Re: extending the blame callback"
Reply: Stefan Kueng: "Re: extending the blame callback"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]