Re: use of UTF-8

From: Colin Putney <colin_at_whistler.com>
Date: 2002-06-03 21:57:59 CEST

On Monday, June 3, 2002, at 12:18 PM, Karl Fogel wrote:

> Branko Čibej <brane@xbc.nu> writes:
>> Um. I'd rather say it opens up a huge can of very hungry carnivorous
>> worms. While it might be true that you can trust the locale settings
>> on most machines today (something I'm not at all sure about), you
>> can't trust programs. On Windows, for instance, I can set notepad as
>> my $EDITOR, then go and save the log message as UTF-8 or two different
>> kinds of UTF-16 (big- and little-endian). My locale info says I'm
>> using codepage 1250. Converting that text would produce
>> ... interesting? ... results.
>
> I'm still worried about this scenario too, but the reason I'm willing
> to risk it is that we can change Subversion if we discover we were
> wrong. So let's see how often problems happen in practice. After
> all, if conversion to UTF-8 *does* corrupt log messages in real life,
> then we can simply say "Well, that was a mistake", and
> backwards-compatibly change the client libraries' behavior.
>
> It would be simple enough to switch to email/mime-like behavior. Just
> stop converting to UTF-8, and start storing the literal bits of the
> log message, along with a best guess at the encoding for which they
> were written -- i.e., a new revision prop, `svn:log-message-encoding'
> or whatever. Revisions that don't have that property are assumed to
> be in UTF-8.

I'm wondering if this boils down to a question of what the 1.0 behaviour
will be. I'm pretty convinced that the email-like approach is the way to
go, but it does require some changes to the existing codebase.
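
To make the email-like idea concrete, here is a rough sketch in Python
rather than in Subversion's own C. It is only an illustration under
stated assumptions: revision properties are modelled as a plain dict,
the helper names store_log_message and read_log_message are made up for
this example, and the property name is the placeholder suggested in the
quoted text above, not an existing Subversion property.

```python
from typing import Dict, Optional

# Placeholder name from the quoted suggestion; not a real Subversion property.
ENCODING_PROP = "svn:log-message-encoding"


def store_log_message(raw: bytes, guessed_encoding: Optional[str]) -> Dict[str, object]:
    """Keep the literal bytes untouched and record the best-guess encoding."""
    props: Dict[str, object] = {"svn:log": raw}
    if guessed_encoding is not None:
        props[ENCODING_PROP] = guessed_encoding
    return props


def read_log_message(props: Dict[str, object]) -> str:
    """Decode for display; revisions without the property are assumed UTF-8."""
    encoding = str(props.get(ENCODING_PROP, "utf-8"))
    raw = props["svn:log"]
    assert isinstance(raw, bytes)
    # errors="replace" keeps display working even if the guess was wrong;
    # the stored bytes are never modified, so a better guess can be applied later.
    return raw.decode(encoding, errors="replace")


# The scenario from the quoted text: the editor saved UTF-8, but the
# locale claims codepage 1250.
utf8_bytes = "Čibej".encode("utf-8")
good_guess = read_log_message(store_log_message(utf8_bytes, "utf-8"))
bad_guess = read_log_message(store_log_message(utf8_bytes, "cp1250"))

# A revision with no encoding property falls back to UTF-8.
legacy = read_log_message({"svn:log": utf8_bytes})

# Bytes that really are codepage 1250 round-trip when labelled as such.
cp1250_ok = read_log_message(store_log_message("Čibej".encode("cp1250"), "cp1250"))
```

The point of the sketch is that a wrong guess only garbles the display
(bad_guess above), while the original bytes stay recoverable. Converting
to UTF-8 at commit time with the wrong source encoding would bake the
damage into the repository.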

Is this something that should be part of the I18N work that will be done
after 1.0? How much of the desire for UTF-8 is really a desire to get
1.0 out the door?

Colin Putney
Whistler.com

Received on Mon Jun 3 21:58:30 2002
