
Re: Ascii/binary detection.

From: Branko Čibej <brane_at_xbc.nu>
Date: 2001-08-01 22:37:52 CEST

peter.westlake@arm.com wrote:

>>There are (used to be?) systems where lines are delimited from both
>>ends. On VMS, a line started with a LF and ended with a CR, IIRC. How
>>about a more generic approach: the value of this property is a pair of
>>strings, one for the BOL and one for the EOL marker. 'native' would
>>still have the same meaning, while 'dos', 'unix' and 'mac' would be
>>aliases for ':\r\n', ':\n' and ':\n\r' (or whatever), respectively. A
>>VMS guy would make 'native' an alias for '\n:\r'.
>It's probably best not to use "\n" and "\r" because "\n" is ambiguous.
>To a Mac programmer, for instance, it means a CR, and to a Windows
>programmer it means CRLF - maybe not in C, but certainly in Perl.
>Stick to numeric values.
These are Subversion properties, not string constants in your favourite
programming language. We can define "\n" to always mean "\x0A", and it's
a good mnemonic.
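Here is a minimal sketch of how the proposed pair-valued property might be parsed. It assumes the value is written as `BOL:EOL`, with `\n` fixed to 0x0A and `\r` fixed to 0x0D as suggested above, independent of any language's platform conventions. Several details are hypothetical and exist only for illustration:

- the names `parse_eol_property` and `ALIASES`;
- the extra `\\` escape;
- the use of a lone CR for `mac` (the classic Mac OS convention), rather than the thread's tentative `:\n\r`.

`native` is left out because it would resolve differently on each platform.

```python
# Hypothetical sketch of the "BOL:EOL" line-ending property discussed above.
# Assumptions (not defined by Subversion): the value is two escaped strings
# separated by the first ':', and the only escapes are \n, \r and \\.

# Aliases from the thread; 'mac' uses a lone CR (classic Mac OS convention).
ALIASES = {
    "dos": ":\\r\\n",
    "unix": ":\\n",
    "mac": ":\\r",
}

# Each escape letter maps to one fixed byte value, so "\n" is always 0x0A.
_ESCAPES = {"n": b"\x0a", "r": b"\x0d", "\\": b"\\"}


def _unescape(text: str) -> bytes:
    """Turn an escaped marker such as '\\r\\n' into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text) or text[i + 1] not in _ESCAPES:
                raise ValueError(f"bad escape in {text!r}")
            out += _ESCAPES[text[i + 1]]
            i += 2
        else:
            out += ch.encode("ascii")  # plain characters must be ASCII
            i += 1
    return bytes(out)


def parse_eol_property(value: str) -> tuple[bytes, bytes]:
    """Return (bol, eol) byte strings for an alias or a 'BOL:EOL' value."""
    value = ALIASES.get(value, value)
    bol, sep, eol = value.partition(":")
    if not sep:
        raise ValueError(f"expected 'BOL:EOL', got {value!r}")
    return _unescape(bol), _unescape(eol)
```

For example, `parse_eol_property("dos")` yields `(b"", b"\r\n")`, and the VMS-style value `\n:\r` yields `(b"\n", b"\r")`. Because the escapes map to fixed byte values, a Mac or Windows user reading the property sees the same meaning a Unix user does.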

>Another thought: don't assume a file is binary just because it doesn't
>have any CR or LF characters! It might use the Unicode line separator
>LS (2028) or paragraph separator PS (2029), or even EBCDIC NEL,
>which is in Unicode as 0085. This is all discussed at:
Until we can handle Unicode, EBCDIC, et al. natively, we'll have to
treat them as binary.

>May I suggest LS as the repository's native newline character?
That would only make sense for Unicode, and we don't handle Unicode
natively (yet); see above.

Right now, we'll only handle ASCII derivatives (that includes UTF-8).
Recognizing EBCDIC would be nice, but I don't think any kind of
heuristic will help us here: the user will have to say charset=EBCDIC
(whereupon we ask: which dialect? :-). Or we could make that the default
character set for text files on platforms where EBCDIC is the native
single-byte character set.
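Below is a minimal sketch of a detection policy consistent with the above. Only ASCII derivatives (including UTF-8) count as text, and anything else, such as UTF-16 or EBCDIC, falls through to binary. A file is deliberately not called binary merely because it lacks CR or LF.

The function name `looks_like_text` and the NUL-byte rule are assumptions for illustration. They are not Subversion's actual code.

```python
# Hypothetical text/binary heuristic: text means "ASCII derivative", i.e.
# content that decodes as UTF-8 (pure ASCII is valid UTF-8 as well).
# This checks the whole buffer; a truncated sample could split a
# multi-byte UTF-8 sequence and be misclassified.


def looks_like_text(data: bytes) -> bool:
    # NUL bytes practically never occur in ASCII/UTF-8 text, but are common
    # in binaries and in UTF-16 (ASCII characters get a 0x00 byte there).
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not an ASCII derivative (for example EBCDIC): treat as binary.
        return False
    # No CR/LF requirement: a one-line file without a newline is still text,
    # and UTF-8 text using U+2028 LINE SEPARATOR also passes.
    return True
```

Under this rule, UTF-8 text that uses LS as its line separator is still classified as text, while EBCDIC content is treated as binary until the user supplies a charset.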

Whatever; all of this is post-M3, IMHO.


Brane Čibej
    home:   <brane_at_xbc.nu>             http://www.xbc.nu/brane/
    work:   <branko.cibej_at_hermes.si>   http://www.hermes-softlab.com/
     ACM :   <brane_at_acm.org>            http://www.acm.org/
Received on Sat Oct 21 14:36:34 2006

This is an archived mail posted to the Subversion Dev mailing list.