
Re: Ascii/binary detection.

From: Branko Čibej <brane_at_xbc.nu>
Date: 2001-08-01 22:37:52 CEST

peter.westlake@arm.com wrote:

>>There are (used to be?) systems where lines are delimited from both
>>ends. On VMS, a line started with a LF and ended with a CR, IIRC. How
>>about a more generic approach: the value of this property is a pair of
>>strings, one for the BOL and one for the EOL marker. 'native' would
>>still have the same meaning, while 'dos', 'unix' and 'mac' would be
>>aliases for ':\r\n', ':\n' and ':\n\r' (or whatever), respectively. A
>>VMS guy would make 'native' an alias for '\n:\r'.
>>
>
>It's probably best not to use "\n" and "\r" because "\n" is ambiguous.
>To a Mac programmer, for instance, it means a CR, and to a Windows
>programmer it means CRLF - maybe not in C, but certainly in Perl.
>Stick to numeric values.
>
These are Subversion properties, not string constants in your favourite
programming language. We can define "\n" to always mean "\x0A", and it's
a good mnemonic.
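
A minimal sketch of how such a property value could be parsed, assuming
the "BOL:EOL" syntax proposed above. The names ESCAPES, ALIASES and
parse_line_markers are hypothetical and not part of Subversion; the
'mac' alias assumes a bare CR, and 'native' is left out because it
depends on the platform.

```python
# Escapes are defined as fixed byte values, independent of any
# programming language or platform (hypothetical table).
ESCAPES = {"n": b"\x0a", "r": b"\x0d"}

# Hypothetical aliases, written as "BOL:EOL" pairs.
ALIASES = {
    "dos": r":\r\n",
    "unix": r":\n",
    "mac": r":\r",      # assumption: classic Mac OS used a bare CR
    "vms": r"\n:\r",    # the VMS example from this thread
}


def decode_marker(marker):
    """Turn a marker such as r'\r\n' into the bytes it stands for."""
    out = bytearray()
    i = 0
    while i < len(marker):
        if marker[i] == "\\":
            code = marker[i + 1:i + 2]
            if code not in ESCAPES:
                raise ValueError("unknown escape in %r" % marker)
            out += ESCAPES[code]
            i += 2
        else:
            out += marker[i].encode("ascii")
            i += 1
    return bytes(out)


def parse_line_markers(value):
    """Return the (BOL, EOL) byte strings for a property value."""
    spec = ALIASES.get(value, value)
    bol, sep, eol = spec.partition(":")
    if not sep:
        raise ValueError("expected 'BOL:EOL', got %r" % value)
    return decode_marker(bol), decode_marker(eol)
```

With this, parse_line_markers("dos") and parse_line_markers(r":\r\n")
give the same pair, so the aliases are only a convenience.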

>Another thought: don't assume a file is binary just because it doesn't
>have any CR or LF characters! It might use the Unicode line separator
>LS (2028) or paragraph separator PS (2029), or even EBCDIC NEL,
>which is in Unicode as 0085. This is all discussed at:
>
Until we can handle Unicode, EBCDIC, et al. natively, we'll have to
treat them as binary.

><http://www.unicode.org/unicode/reports/tr13/>
>
>May I suggest LS as the repository's native newline character?
>
That would only make sense for Unicode, and we don't handle Unicode
natively (yet), see above.

Right now, we'll only handle ASCII derivatives (that includes UTF-8).
Recognizing EBCDIC would be nice, but I don't think any kind of
heuristic will help us here: the user will have to say charset=EBCDIC
(whereupon we ask: which dialect? :-). Or we could make that the default
character set for text files where EBCDIC is the native single-byte
encoding.
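
A rough sketch of what "ASCII derivatives only" could mean in practice,
assuming a simple rule: data counts as text if it decodes as UTF-8
(which covers plain ASCII) and contains no NUL bytes; everything else,
including UTF-16 and EBCDIC, is treated as binary. The function name
looks_like_text is hypothetical, and this is not how Subversion
actually decides.

```python
def looks_like_text(data):
    """Guess whether a byte string is ASCII-derived text.

    Hypothetical rule: valid UTF-8 without NUL bytes is text,
    anything else is handled as binary.
    """
    if b"\x00" in data:
        # UTF-16/UTF-32 encodings of ordinary text contain NUL bytes.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Typical EBCDIC text is not valid UTF-8, so it lands here.
        return False
    return True
```

Note that a file without any CR or LF still counts as text under this
rule; only the encoding is checked, not the line markers.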

Whatever; all of this is post-M3, IMHO.

    Brane

-- 
Brane Čibej
    home:   <brane_at_xbc.nu>             http://www.xbc.nu/brane/
    work:   <branko.cibej_at_hermes.si>   http://www.hermes-softlab.com/
     ACM :   <brane_at_acm.org>            http://www.acm.org/
Received on Sat Oct 21 14:36:34 2006