Re: [PATCH] $LastChangedDate$ encoding

From: Peter Samuelson <peter_at_p12n.org>
Date: 2006-05-07 16:07:14 CEST

[Vincent Lefevre]
> > The encoding should be consistent with filenames, which are also
> > specific to a WC.
>
> There's absolutely no reason why they should be the same.

I gave you a reason earlier. There are many situations where you want
to embed a filename inside a file. (I said this in the context of XML
files, but that's by no means the only example.)

> * Using UTF-8 (current behavior):
> + Pros: fixed encoding; no loss; compatible with file formats
> based on UTF-8, which are common (UTF-8 is more or less the
> default encoding nowadays).
> + Cons: may be incompatible with some documents.

It may also be incompatible with user expectations.

I daresay it is very common to use Subversion in an environment where
either you're only a single user, or all users have the same locale
settings. Subversion localises everything very well: users never have
to know or care that it works in UTF-8 under the hood. The only case I
know of where it does not do this is keyword expansion.
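
To illustrate the mismatch, here is a minimal Python sketch (not
Subversion's code). The expanded keyword string is made up for
illustration and is not claimed to match Subversion's exact output
format; the ISO-8859-1 locale is likewise just an assumed example. It
shows that UTF-8 bytes read by a tool assuming ISO-8859-1 come out
garbled, and that ISO-8859-1 bytes are not valid UTF-8.

```python
# Hypothetical expanded keyword containing a non-ASCII month name ("août").
# The exact text Subversion would emit is an assumption here.
expanded = "$LastChangedDate: 2006-08-01 12:00:00 +0200 (mar., 01 août 2006) $"

# The same text in the two encodings under discussion.
utf8_bytes = expanded.encode("utf-8")
latin1_bytes = expanded.encode("iso-8859-1")

# A tool that assumes the locale encoding (ISO-8859-1 in this sketch)
# misreads the two UTF-8 bytes of "û" as two separate characters.
as_seen_in_latin1 = utf8_bytes.decode("iso-8859-1")
print(as_seen_in_latin1)

# The reverse mismatch: ISO-8859-1 bytes fed to a strict UTF-8 decoder.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)
```

Either way, a file whose content is in one encoding and whose expanded
keywords are in the other ends up with mixed bytes that no single
decoder reads cleanly.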

Also, you seem to assume that the common case is files with a
well-defined encoding, like XML documents. I doubt that. I guess it
is more common to use Subversion to store text documents and program
source code, not XML. And program source code rarely has a
well-defined encoding; typically users write their comments in the same
encoding they are using for the rest of their computing.

In short, I bet it is _more_ common for a file under version control to
match the encoding of a user's LC_CTYPE than it is for the file to be
UTF-8 when the user's LC_CTYPE is not.

(Side note: if it were really true that "UTF-8 is more or less the
default encoding nowadays", then this whole question would be a
non-issue, as users would all be using UTF-8 for LC_CTYPE.)

> * Using the encoding specified by the locales:
> + Pros: compatible with tools that don't understand encodings
> different from the one specified by the locales.

Which is to say, most tools with most file formats. At least on my
Unix box, very few tools I use automatically recode file content when
outputting to my terminal. I can only think of vorbiscomment and
iconv. (And vorbiscomment doesn't count - I don't think you can put
keywords into Ogg Vorbis files, since they expand to variable lengths.)

Received on Sun May 7 16:07:36 2006

This is an archived mail posted to the Subversion Dev mailing list.