[Daniel Shahaf]
> The current patch's docstring implies the LF byte is necessarily part
> of a line terminator, which is true for UTF-8/16/32 but not
> necessarily true in arbitrary encodings.
Nitpick: it is true in UTF-8, but not in UTF-16 or UTF-32. There are about 70
characters in the BMP whose UTF-16LE (and UTF-32LE) encoding begins with the byte 0x0A:
$ zgrep '^..0A;' /usr/share/misc/unicode.gz | head
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
010A;LATIN CAPITAL LETTER C WITH DOT ABOVE;Lu;0;L;0043 0307;;;;N;LATIN CAPITAL LETTER C DOT;;;010B;
020A;LATIN CAPITAL LETTER I WITH INVERTED BREVE;Lu;0;L;0049 0311;;;;N;;;;020B;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
040A;CYRILLIC CAPITAL LETTER NJE;Lu;0;L;;;;;N;;;;045A;
050A;CYRILLIC CAPITAL LETTER KOMI NJE;Lu;0;L;;;;;N;;;;050B;
060A;ARABIC-INDIC PER TEN THOUSAND SIGN;Po;0;ET;;;;;N;;;;;
070A;SYRIAC CONTRACTION;Po;0;AL;;;;;N;;;;;
080A;SAMARITAN LETTER KAAF;Lo;0;R;;;;;N;;;;;
090A;DEVANAGARI LETTER UU;Lo;0;L;;;;;N;;;;;
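
The same point can be checked without the data file. What follows is only a rough sketch, not anything from the patch: the helper names (bmp_scalars, contains_lf) are made up for illustration, and it walks every non-surrogate BMP code point whether or not it is assigned, so it finds more hits than the grep above.

```python
LF = 0x0A

def bmp_scalars():
    """Yield every BMP code point except the surrogates U+D800..U+DFFF."""
    for cp in range(0x10000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp

def contains_lf(cp, encoding):
    """True if the encoded form of code point cp contains the byte 0x0A."""
    return LF in chr(cp).encode(encoding)

# In UTF-8 the byte 0x0A only ever encodes U+000A itself.
utf8_hits = [cp for cp in bmp_scalars() if contains_lf(cp, "utf-8")]

# In UTF-16LE the low byte comes first, so every U+xx0A starts with 0x0A.
utf16_first = [cp for cp in bmp_scalars()
               if chr(cp).encode("utf-16-le")[0] == LF]

print([hex(cp) for cp in utf8_hits])
print([hex(cp) for cp in utf16_first[:4]])
print(chr(0x010A).encode("utf-16-le"))
```

So a reader that splits a UTF-16LE byte stream on 0x0A without decoding it first would cut U+010A (and its siblings) in half.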
Received on 2019-01-08 02:37:50 CET