
Re: extending the blame callback

From: Peter Samuelson <peters_at_p12n.org>
Date: Mon, 7 Jan 2019 19:37:37 -0600

[Daniel Shahaf]
> The current patch's docstring implies the LF byte is necessarily part
> of a line terminator, which is true for UTF-8/16/32 but not
> necessarily true in arbitrary encodings.

Nitpick: It is true in UTF-8, but not in UTF-16 or UTF-32. There are about 70
characters in the BMP whose UTF-16LE (and UTF-32LE) encoding begins with the byte 0A:

    $ zgrep '^..0A;' /usr/share/misc/unicode.gz | head
    000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
    010A;LATIN CAPITAL LETTER C WITH DOT ABOVE;Lu;0;L;0043 0307;;;;N;LATIN CAPITAL LETTER C DOT;;;010B;
    020A;LATIN CAPITAL LETTER I WITH INVERTED BREVE;Lu;0;L;0049 0311;;;;N;;;;020B;
    030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
    040A;CYRILLIC CAPITAL LETTER NJE;Lu;0;L;;;;;N;;;;045A;
    050A;CYRILLIC CAPITAL LETTER KOMI NJE;Lu;0;L;;;;;N;;;;050B;
    060A;ARABIC-INDIC PER TEN THOUSAND SIGN;Po;0;ET;;;;;N;;;;;
    070A;SYRIAC CONTRACTION;Po;0;AL;;;;;N;;;;;
    080A;SAMARITAN LETTER KAAF;Lo;0;R;;;;;N;;;;;
    090A;DEVANAGARI LETTER UU;Lo;0;L;;;;;N;;;;;
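A minimal Python sketch, assuming Python 3.8 or later (for bytes.hex with a separator), illustrates the byte-level point. The helper names utf8_code_points_containing_lf, first_byte_is_lf, naive_lf_split and utf16le_lf_split are hypothetical, chosen only for illustration; they are not part of Subversion's API. The sketch checks that U+000A is the only code point whose UTF-8 encoding contains a 0x0A byte. It also shows that code points of the form U+xx0A start with 0x0A in UTF-16LE and UTF-32LE, and that splitting a UTF-16LE buffer on every 0x0A byte cuts U+010A in half.

```python
"""Where can the byte 0x0A turn up once text is encoded?"""


def utf8_code_points_containing_lf(limit=0x110000):
    """Return every scalar value below `limit` whose UTF-8 bytes include 0x0A.

    Surrogates (U+D800..U+DFFF) are skipped because they cannot be encoded.
    """
    return [cp for cp in range(limit)
            if not 0xD800 <= cp <= 0xDFFF
            and 0x0A in chr(cp).encode("utf-8")]


def first_byte_is_lf(cp, encoding):
    """True if the first byte of chr(cp) encoded in `encoding` is 0x0A."""
    return chr(cp).encode(encoding)[0] == 0x0A


def naive_lf_split(data):
    """Split on every 0x0A byte, the way an encoding-unaware splitter would."""
    return data.split(b"\n")


def utf16le_lf_split(data):
    """Split a UTF-16LE buffer on U+000A only, by decoding before splitting."""
    return [part.encode("utf-16-le")
            for part in data.decode("utf-16-le").split("\n")]


if __name__ == "__main__":
    # UTF-8: only U+000A yields a 0x0A byte (multi-byte sequences use bytes >= 0x80).
    print(utf8_code_points_containing_lf())  # [10]

    # UTF-16LE / UTF-32LE: U+xx0A is stored low byte first, so it starts with 0A.
    for cp in (0x010A, 0x040A, 0x090A):
        print(f"U+{cp:04X}",
              chr(cp).encode("utf-16-le").hex(" "),
              chr(cp).encode("utf-32-le").hex(" "))

    # U+010A followed by LF is 0a 01 0a 00 in UTF-16LE; a byte split breaks U+010A.
    sample = "\u010a\n".encode("utf-16-le")
    print(naive_lf_split(sample))    # [b'', b'\x01', b'\x00']
    print(utf16le_lf_split(sample))  # [b'\n\x01', b'']
```

Decoding first is only one option. A splitter that stays at the byte level would instead have to step through the buffer in whole code units (2 bytes for UTF-16, 4 for UTF-32) before comparing against LF.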
Received on 2019-01-08 02:37:50 CET

This is an archived mail posted to the Subversion Dev mailing list.
