On 07.01.2019 19:58, Daniel Shahaf wrote:
> Stefan Kueng wrote on Mon, 07 Jan 2019 19:30 +0100:
>> On 06.01.2019 21:09, Daniel Shahaf wrote:
>>> Stefan Kueng wrote on Sun, Jan 06, 2019 at 20:40:28 +0100:
>>>> @@ -758,6 +759,33 @@
>>>> * will be true if the reason there is no blame information is that the line
>>>> * was modified locally. In all other cases @a local_change will be false.
>>>> *
>>>> + * @note if the text encoding of the file is not ASCII or utf8, the end-of-line
>>>> + * detection may lead to lines having a one byte offset. It is up to the client
>>>
>>> "One byte offset" is not true in general; it is true for UTF-16 but
>>> there are other encodings in the world. Besides, I would sooner point
>>> out that if the file isn't in UTF-8 (including ASCII), the end of line
>>> detection may be *wrong* since it looks for the byte 0x0A, which may not
>>> even be part of a (possibly multibyte) newline character.
>>>
>>> It's fine to give specific details about UTF-16, but we should give the
>>> more generally-applicable information first.
>>
>> The wording is "*may*", but I've reworded it slightly. I hope it's better.
>
> It _is_ better, thank you, but I agree with Julian's last post where he wrote that
> the docstring should just say that the line is split on LF bytes. The current
> patch's docstring implies the LF byte is necessarily part of a line terminator,
> which is true for UTF-8/16/32 but not necessarily true in arbitrary encodings.

Next patch attached. I think this is better now.
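To illustrate why the docstring should only promise that lines are split on LF bytes, here is a minimal sketch. `split_on_lf` is a hypothetical helper, not Subversion's actual blame code. It assumes a byte-oriented loop that cuts the buffer at every 0x0A byte. For UTF-8 the cuts land on real line ends. For UTF-16LE every line after the first starts with the stray 0x00 byte of the preceding LF (the "one byte offset"). In UTF-16BE, U+0A05 is encoded as the bytes 0A 05, so a 0x0A byte there is not a newline at all.

```python
def split_on_lf(data: bytes) -> list[bytes]:
    """Hypothetical helper: split a byte buffer at every 0x0A byte,
    keeping the LF byte with the line it ends (no encoding awareness)."""
    lines = []
    start = 0
    for i, byte in enumerate(data):
        if byte == 0x0A:
            lines.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        lines.append(data[start:])  # trailing bytes with no final LF
    return lines

utf8 = "a\nb\n".encode("utf-8")
utf16le = "a\nb\n".encode("utf-16-le")
utf16be_no_newline = "\u0a05".encode("utf-16-be")  # bytes 0A 05, no newline

print(split_on_lf(utf8))                # [b'a\n', b'b\n']
print(split_on_lf(utf16le))             # [b'a\x00\n', b'\x00b\x00\n', b'\x00']
print(split_on_lf(utf16be_no_newline))  # [b'\n', b'\x05']
```

A client that knows the file's encoding can re-join or re-split these byte ranges itself, which is why saying "split on LF bytes" is the more accurate contract.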
Stefan
Received on 2019-01-07 20:57:29 CET