On 14.01.2019 14:25, Branko Čibej wrote:
> On 14.01.2019 13:36, Julian Foad wrote:
>> Stefan, thanks for taking account of the feedback and updating the doc string in r1851197.
>>
>> I took a look and thought to rewrite the part about encoding and line splitting like this:
>>
>> * Character Encoding and Line Splitting:
>> *
>> * It is up to the client to determine the character encoding. The @a line
>> * content is delivered without any encoding conversion. The line splitting
>> * is designed to work with ASCII-compatible encodings including UTF-8. Any
>> * of the byte sequences LF ("\n"), CR ("\r"), CR LF ("\r\n") ends a line
>> * and is not included in @a line. The @a line content can include all other
>> * byte values including zero (ASCII NUL).
>>
>> I dropped the reference to svn_subst_stream_translated() because it wasn't much use without saying what parameters it is given, and instead I was able to say exactly what happens overall.
>>
>> Problem 1: Using this blame function on a 16-bit character encoding is still really ugly: the receiver cannot know which byte sequences were stripped out.
>>
>> We should address this issue properly by passing a "line splitter" function in to svn_client_blame6().
>
>
> I started on that then decided that svn_client_blame6 is far too narrow
> scope for that. In order to do this right, we have to introduce a line
> splitter callback to svn_subst_stream_translated.
>
> My idea was to do something like this:
>
> * If the line-splitter is NULL, use the current default and check MIME
> type for text/*
> * If it's not NULL, ignore MIME type and provide the callback with
> file props so it can use that to determine encoding if it wants to.
> * Remove the ignore_mime_type flag and use the presence of a line
> splitter to convey the same intent; consequently,
> * Expose the "default" line splitter as a public function.
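
To make the proposal above a bit more concrete, here is a rough sketch of what a line-splitter callback and a "default" splitter might look like. The names line_splitter_t and default_line_splitter and the exact signature are made up for illustration; nothing like this exists in the Subversion API today. The default follows the rule from the proposed doc string: LF, CR and CR LF each end a line, and NUL is ordinary content.

```c
#include <stddef.h>

/* Hypothetical callback type, not part of the Subversion API.
   Given LEN bytes at DATA, set *LINE_LEN to the number of content bytes
   in the first line and *EOL_LEN to the length of the terminator that
   follows it (0 if the buffer ends without one).  BATON is opaque
   caller state, e.g. an encoding chosen from the file props. */
typedef void (*line_splitter_t)(const char *data, size_t len,
                                size_t *line_len, size_t *eol_len,
                                void *baton);

/* Sketch of a "default" splitter for ASCII-compatible encodings:
   LF, CR and CR LF each end a line; NUL bytes are ordinary content. */
static void
default_line_splitter(const char *data, size_t len,
                      size_t *line_len, size_t *eol_len,
                      void *baton)
{
  size_t i;

  (void)baton; /* unused by the default splitter */

  for (i = 0; i < len; i++)
    {
      if (data[i] == '\n')
        {
          *line_len = i;
          *eol_len = 1;
          return;
        }
      if (data[i] == '\r')
        {
          *line_len = i;
          /* Treat CR LF as a single terminator. */
          *eol_len = (i + 1 < len && data[i + 1] == '\n') ? 2 : 1;
          return;
        }
    }

  /* No terminator found: the rest of the buffer is the last line. */
  *line_len = len;
  *eol_len = 0;
}
```

Because the splitter reports the terminator length separately, a receiver could tell which bytes were stripped, and a client working with a 16-bit encoding could supply its own splitter that looks for two-byte terminators instead.
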
One problem I have encountered so far is that encoding detection needs
enough data to work properly. If the first few lines of a file contain
only a few characters, that is not enough.
I'm not sure how this could be done, but since the blame function already
has the file loaded into memory, could that content be passed to the
client somehow? Maybe the first time the callback is called, or on every
call? What I mean is: instead of passing an svn_string_t for only the
line, also pass one for the whole (unmodified/untranslated) file.
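
Roughly, the receiver could look something like the sketch below. This is only an illustration: byte_string_t stands in for svn_string_t so the snippet compiles on its own, and blame_line_receiver_t, detect_baton_t and example_receiver are invented names, not existing Subversion API. The "detection" is a toy that only checks for a UTF-16 byte order mark; a real client would plug in its own detector.

```c
#include <stddef.h>

/* Stand-in for svn_string_t so the sketch is self-contained. */
typedef struct byte_string_t
{
  const char *data;
  size_t len;
} byte_string_t;

/* Hypothetical receiver: LINE is the current line, WHOLE_FILE the
   complete untranslated content.  Neither the name nor the signature
   exists in Subversion. */
typedef int (*blame_line_receiver_t)(void *baton,
                                     const byte_string_t *line,
                                     const byte_string_t *whole_file);

/* Example baton: run encoding detection once, on the whole file. */
typedef struct detect_baton_t
{
  int detected;    /* non-zero once detection has run */
  int looks_utf16; /* result of the toy detection below */
} detect_baton_t;

static int
example_receiver(void *baton,
                 const byte_string_t *line,
                 const byte_string_t *whole_file)
{
  detect_baton_t *db = baton;

  (void)line; /* a real receiver would decode and display the line */

  if (!db->detected)
    {
      /* Toy heuristic: only looks for a UTF-16 byte order mark. */
      const unsigned char *p = (const unsigned char *)whole_file->data;

      db->looks_utf16 = whole_file->len >= 2
                        && ((p[0] == 0xFF && p[1] == 0xFE)
                            || (p[0] == 0xFE && p[1] == 0xFF));
      db->detected = 1;
    }

  return 0; /* 0 means "no error" in this sketch */
}
```

Passing the whole file on every call keeps the receiver stateless if it wants to be; passing it only on the first call would save a parameter but force the client to keep a copy or cache the detection result as above.
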
Stefan
Received on 2019-01-14 18:21:27 CET