
Re: use of UTF-8

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-06-03 21:18:33 CEST

Branko Čibej <brane@xbc.nu> writes:
> Um. I'd rather say it opens up a huge can of very hungry carnivorous
> worms. While it might be true that you can trust the locale settings
> on most machines today (something I'm not at all sure about), you
> can't trust programs. On Windows, for instance, I can set notepad as
> my $EDITOR, then go and save the log message as UTF-8 or two different
> kinds of UTF-16 (big- and little-endian). My locale info says I'm
> using codepage 1250. Converting that text would produce
> ... interesting? ... results.

I'm still worried about this scenario too, but the reason I'm willing
to risk it is that we can change Subversion if we discover we were
wrong. So let's see how often problems happen in practice. After
all, if conversion to UTF-8 *does* corrupt log messages in real life,
then we can simply say "Well, that was a mistake", and
backwards-compatibly change the client libraries' behavior.

It would be simple enough to switch to email/mime-like behavior. Just
stop converting to UTF-8, and start storing the literal bits of the
log message, along with a best guess at the encoding for which they
were written -- i.e., a new revision prop, `svn:log-message-encoding'
or whatever. Revisions that don't have that property are assumed to
be in UTF-8.
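
As a rough sketch of that fallback scheme (Python purely for
illustration; this is not Subversion code, and `store_log', `read_log',
and the dict standing in for revision props are all hypothetical):

```python
# Hypothetical sketch only: a plain dict stands in for a revision's props.
LOG_PROP = "svn:log"
ENCODING_PROP = "svn:log-message-encoding"  # the name floated above


def store_log(revprops, raw_bytes, guessed_encoding):
    """Store the literal bytes of the log message plus a best-guess encoding."""
    revprops[LOG_PROP] = raw_bytes
    if guessed_encoding is not None:
        revprops[ENCODING_PROP] = guessed_encoding.encode("ascii")


def read_log(revprops):
    """Decode a log message; revisions without the prop are assumed UTF-8."""
    raw = revprops[LOG_PROP]
    encoding = revprops.get(ENCODING_PROP, b"UTF-8").decode("ascii")
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Assumption, not part of the proposal: if the guess was wrong or
        # names an unknown codec, degrade to UTF-8 with replacement chars.
        return raw.decode("utf-8", errors="replace")
```

Old revisions, which only ever held converted UTF-8, keep reading the
same way; new ones carry their own encoding guess and the stored bytes
are never rewritten.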

Received on Mon Jun 3 21:22:41 2002

This is an archived mail posted to the Subversion Dev mailing list.