Marcus Sundman wrote:
> Most content management (CM) systems already convert line breaks in
> text files between the unix, mac and windows types. Text encoding is
> even more important since it has far reaching consequences (if you
> have written "foo" you do not want it to say "bar" when it's in
> production code) and errors can be very hard to detect. Despite this
> fact many CM systems seem to ignore the issue completely [...]
Ben wrote:
> [A description of Subversion's current features in this area,
> confirming that Subversion punts on character encodings of file
> contents]
There are a few things going on here:
* Traditionally, version control systems have been used to manage
source code, which is usually written in ASCII. Localized
user messages typically come from PO files or the
language-specific equivalent, which are stored in UTF-8 or in
mixed charsets, not in the local charset.
* Although many development teams mix Windows and Mac or Unix
machines, I bet far fewer mix people using different character
encodings.
* A version control system is supposed to be about versioning, not
so much about file interchange. (Perhaps a "CM system" is also
about file interchange, but Subversion isn't a CM system.) Adding
just the newline translation functions was bothersome to many
Subversion developers.
At first glance, it would be consistent with Subversion's current
feature offerings, and not a tremendous amount of code, to add a
feature where you can set svn:encoding to "native" or to an LC_CHARSET
value, and Subversion would transcode the file's contents from UTF-8
to the stated encoding after newline and keyword translation. (This
would make it extra-important to fix the way "svn diff" works, so that
it translates the text-base to wc format and diffs against the wc
file, instead of detranslating the wc file to text-base format and
diffing against the text base.)
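As a rough illustration of the translation order described above, here is
a minimal Python sketch. It is not Subversion code: the function names are
hypothetical, the text-base is assumed to be UTF-8 with LF newlines, and
keyword expansion is left out.

```python
def to_working_copy(text_base: bytes, encoding: str, eol: str = "\n") -> bytes:
    """Hypothetical checkout-side translation: text-base form -> wc form."""
    # Assumption: the text-base is stored as UTF-8 with LF line endings.
    text = text_base.decode("utf-8")
    # Step 1: newline translation (keyword expansion would also happen here).
    text = text.replace("\n", eol)
    # Step 2: transcode from UTF-8 to the encoding named in the property.
    return text.encode(encoding)


def to_text_base(wc_bytes: bytes, encoding: str, eol: str = "\n") -> bytes:
    """Hypothetical commit-side detranslation: wc form -> text-base form."""
    text = wc_bytes.decode(encoding)
    text = text.replace(eol, "\n")
    return text.encode("utf-8")
```

For example, to_working_copy("café\n".encode("utf-8"), "latin-1", "\r\n")
yields the Latin-1 bytes for "café" followed by CRLF, and to_text_base
reverses that.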
I'd worry that such a feature would wind up being more trouble to
users than it was worth, though. The moment I use a character which
can't be represented in your encoding, you can no longer check out
that file properly.
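The failure mode is easy to reproduce outside of Subversion. A small
Python sketch, assuming the repository copy is UTF-8 (the helper name
can_check_out is made up for illustration):

```python
def can_check_out(text_base: bytes, encoding: str) -> bool:
    """Return True if every character of the UTF-8 text-base can be
    represented in the user's encoding (hypothetical helper)."""
    try:
        text_base.decode("utf-8").encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no representation in `encoding`.
        return False


ok = can_check_out("naïve".encode("utf-8"), "latin-1")       # ï exists in Latin-1
broken = can_check_out("price: 5€".encode("utf-8"), "latin-1")  # € does not
```

Here ok is True and broken is False: a single euro sign committed by one
user is enough to make the file untranslatable for a Latin-1 user.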
Marcus wrote:
> Therefore we are faced with three options:
> A) Get all systems to standardize on one encoding and one type of
> line breaks.
> I think the first two options are out of the question [...]
While standardizing on one type of line breaks is likely to remain
painful for a long time, standardizing on one encoding (specifically,
UTF-8) seems like a great idea. Why do you say it's out of the
question?
Received on Mon Jun 28 20:00:08 2004