On Tue, Dec 11, 2001 at 08:52:53PM -0000, Barry Scott wrote:
> Yes you do: mandatory feature.
>
> I work between Unix and Win32. Tools on Unix choke on CR/LF
> * /bin/sh
> * g++
> try:
> #define fred \
> 12
> and see a syntax error.
Fixed in gcc 3.0, but it remains a valid point.
> What is the current state of play in the Mac world? They use CR and
> choked on LF or CRLF text files in the past.
OSX uses the Unix convention.
I thought about it a bit more:
1. A file may be text in the sense of being a stream of characters in
a standard encoding, with 0x0A and/or 0x0D bytes as newline
indicators, without having a MIME category of "text". For
instance, RFC 3023 specifies "application/xml" if the XML source is
not intended to be human-readable. Therefore, being "text" or
"binary" needs to be an orthogonal property to the MIME type.
2. Newline conversion is a special case of general character encoding
conversion. Consider a document which is written in a combination
of English and Russian. Some of the collaborators on this document
have Unicode-capable editors, and the initial revision was checked
in encoded in UTF-8. However, there are some writers who cannot or
will not abandon KOI8-R. SVN must convert the file on checkout or
they can't even read it.
It is probably appropriate to convert back before generating any
diffs or checking in, because repository operations become
difficult if the checked-in file's encoding isn't consistent across
revisions. (But I can see the data integrity issue arguing against
that.)
We don't need this generality for 1.0 but it would be good if the
scheme we eventually settle on could be extended to support the
general case later.
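To make the data integrity worry in point 2 concrete, here is a
minimal Python sketch of the round trip. The function names are made
up for illustration and say nothing about how Subversion would do it.
The conversion is only safe when converting out and back reproduces
the repository bytes exactly:

```python
def to_working_copy(repo_bytes, repo_charset, wc_charset):
    """Re-encode repository bytes into the charset the user asked for."""
    return repo_bytes.decode(repo_charset).encode(wc_charset)


def round_trips(repo_bytes, repo_charset, wc_charset):
    """True if converting out and back reproduces the original bytes."""
    try:
        wc_bytes = to_working_copy(repo_bytes, repo_charset, wc_charset)
    except UnicodeError:
        # Either the repository bytes are not valid in repo_charset, or
        # some character has no representation in wc_charset.
        return False
    return to_working_copy(wc_bytes, wc_charset, repo_charset) == repo_bytes


# English plus Russian fits in KOI8-R, so this file converts safely.
mixed = "Hello, Привет\n".encode("utf-8")
ok_mixed = round_trips(mixed, "utf-8", "koi8-r")

# A character outside KOI8-R's repertoire cannot be checked out that way.
cjk = "日本語\n".encode("utf-8")
ok_cjk = round_trips(cjk, "utf-8", "koi8-r")
```

A file that fails this check would have to be refused or left
unconverted; which of those is right is exactly the open question.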
So here's my suggestion. Associate with each file two properties.
The "svn:charset" property works like the charset parameter on a
Content-Type header (see RFC 2046). The "svn:line-ending" property
says what line ending convention is used (LF, CRLF, CR)[1]. Both of
these properties indicate what the file's natural format within the
repository is. It's an error to have one but not the other. The
absence of both properties indicates a binary file.
Normally, Subversion just stores these properties; it doesn't do
anything special with them. However, users may specify a (charset,
line-ending) pair when they check out a working copy. If they do,
then every file whose charset and line-ending properties differ from
that pair is converted to it on checkout, and converted back to its
official pair before checkin or diff generation.
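As a rough sketch of the scheme, assuming Python and a plain dict
standing in for a file's properties: the helper names (classify,
convert, checkout, checkin) are hypothetical, not Subversion APIs, and
the code assumes each file consistently uses its declared line ending.

```python
LINE_ENDINGS = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}


def classify(props):
    """Return "text" or "binary"; having only one property is an error."""
    has_charset = "svn:charset" in props
    has_eol = "svn:line-ending" in props
    if has_charset != has_eol:
        raise ValueError("svn:charset and svn:line-ending must be set together")
    return "text" if has_charset else "binary"


def convert(data, src, dst):
    """Convert bytes from one (charset, line-ending) pair to another."""
    (src_charset, src_eol), (dst_charset, dst_eol) = src, dst
    text = data.decode(src_charset)
    if src_eol != dst_eol:
        # Assumes the file uses only its declared convention.
        text = text.replace(LINE_ENDINGS[src_eol], LINE_ENDINGS[dst_eol])
    return text.encode(dst_charset)


def checkout(data, props, wanted=None):
    """Repository bytes -> working copy bytes."""
    if classify(props) == "binary" or wanted is None:
        return data  # binary file, or the user asked for no conversion
    official = (props["svn:charset"], props["svn:line-ending"])
    return data if official == wanted else convert(data, official, wanted)


def checkin(data, props, wanted=None):
    """Working copy bytes -> repository bytes (also used before diffing)."""
    if classify(props) == "binary" or wanted is None:
        return data
    official = (props["svn:charset"], props["svn:line-ending"])
    return data if official == wanted else convert(data, wanted, official)


props = {"svn:charset": "utf-8", "svn:line-ending": "LF"}
repo = "привет\nмир\n".encode("utf-8")
working = checkout(repo, props, wanted=("koi8-r", "CRLF"))
restored = checkin(working, props, wanted=("koi8-r", "CRLF"))
```

Here working holds KOI8-R bytes with CRLF line endings, and restored
should equal repo again. Files with neither property pass through
untouched.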
Poke holes, anyone?
zw
[1] LFCR is a theoretical possibility, handled by gcc3 because we
heard some reports of its showing up in real life, but Subversion
probably needn't bother worrying about it.
Received on Sat Oct 21 14:36:52 2006