Stefan Kueng wrote on Mon, 07 Jan 2019 19:30 +0100:
> On 06.01.2019 21:09, Daniel Shahaf wrote:
> > Stefan Kueng wrote on Sun, Jan 06, 2019 at 20:40:28 +0100:
> >> @@ -758,6 +759,33 @@
> >> * will be true if the reason there is no blame information is that the line
> >> * was modified locally. In all other cases @a local_change will be false.
> >> *
> >> + * @note if the text encoding of the file is not ASCII or utf8, the end-of-line
> >> + * detection may lead to lines having a one byte offset. It is up to the client
> >
> > "One byte offset" is not true in general; it is true for UTF-16 but
> > there are other encodings in the world. Besides, I would sooner point
> > out that if the file isn't in UTF-8 (including ASCII), the end of line
> > detection may be *wrong* since it looks for the byte 0x0A, which may not
> > even be part of a (possibly multibyte) newline character.
> >
> > It's fine to give specific details about UTF-16, but we should give the
> > more generally-applicable information first.
>
> The wording is "*may*", but I've reworded it slightly. I hope it's better.

It _is_ better, thank you, but I agree with Julian's last post where he wrote that
the docstring should just say that the line is split on LF bytes. The current
patch's docstring implies the LF byte is necessarily part of a line terminator,
which is true for UTF-8/16/32 but not necessarily true in arbitrary encodings.
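
To make the "one byte offset" case from upthread concrete, here is a minimal
sketch in C. It is not Subversion code: the helper second_line_offset() and the
sample buffer are hypothetical, made up purely for illustration. The only
assumption about the blame code is the one discussed above, namely that lines
are split on the byte 0x0A.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Subversion API: return the offset
 * of the first byte after the first 0x0A byte in buf[0..len), i.e. where
 * a byte-oriented splitter would start the second line.  Returns len if
 * the buffer contains no 0x0A byte. */
static size_t
second_line_offset(const unsigned char *buf, size_t len)
{
  const unsigned char *p = memchr(buf, 0x0A, len);

  return p ? (size_t)(p - buf) + 1 : len;
}

/* "a\nb" encoded as UTF-16LE.  The newline is the two bytes 0A 00, so a
 * split right after the 0x0A byte leaves the 0x00 half of the newline at
 * the start of the next piece. */
static const unsigned char utf16le_ab[] = {
  0x61, 0x00,   /* 'a' */
  0x0A, 0x00,   /* LF  */
  0x62, 0x00    /* 'b' */
};
```

For this buffer, second_line_offset(utf16le_ab, sizeof(utf16le_ab)) is 3, and
the byte at offset 3 is the 0x00 that belongs to the UTF-16LE newline, so the
second "line" starts one byte early.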

Snipped the rest; thanks for making those changes.

Cheers,

Daniel
Received on 2019-01-07 20:08:52 CET