
Re: use of UTF-8 (was: [RFC/PATCH] commit messages not 8-bit compatible)

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-05-31 00:06:03 CEST

Greg Stein <gstein@lyra.org> writes:
> > The interface calls log messages `char *' as of one day ago :-), and
>
> And if this conversation was two days ago, I would have said stringbuf.
>
> The point is: where we have char* in our interfaces, they are almost always
> representing some characters. I'm saying that we decided on saying they were
> UTF-8 and avoiding carrying around charset metadata with those.

Right, right. But the `log_msg' parameter to functions was not
`char *' until very recently, and for reasons having nothing to do
with some prior decision about them being UTF-8.

I'm sorry to keep repeating myself. It seems (maybe I'm
misunderstanding?) that you brought up the type of those params as
indicating that some decision had already been made about their
charset. But they were counted-length strings (and thus could support
binary data!) until rev 2024, and were just caught up in the general
sweep of the conversion. Their new type indicates nothing about what
charset we should use for log messages. We have to make that decision
independently of their current type, and then make sure the type
*supports* whatever decision we make.

> To be concrete: either those char* params are UTF-8, or you add a second
> parameter to state their charset. (or you just go charset neutral which
> isn't really a good option)

Those aren't the only options here (and you're dismissing charset
neutral as an obviously bad third option, mentioned only to be
rejected, when in fact it's what this whole thread is really about).

I see three options on the table:

   - Keep them as char *, declare them UTF-8, and convert user input
     as best we can.

   - Keep them as char *, declare no particular charset, but don't
     allow zero bytes.

   - Convert them back to counted-length strings and treat them as
     binary data again (I guess this is the most militantly charset
     neutral option).
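
To make the difference between the three concrete, here is a rough C
sketch of the check each option would imply at the interface boundary.
None of these names come from the Subversion tree: `counted_str',
`is_valid_utf8' and `has_no_zero_byte' are hypothetical, invented
purely for illustration, and the UTF-8 check follows the usual
well-formedness rules (no overlong forms, no surrogates, nothing above
U+10FFFF) rather than any decision made in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Option 3 (hypothetical type): a counted-length string.  It can hold
   any bytes at all, including zero bytes, so no charset is implied. */
typedef struct {
    const char *data;
    size_t len;
} counted_str;

/* Option 2: a plain char * has no length field, so the only hard
   restriction is that the message contains no zero byte.  This checks
   whether a counted buffer could be passed as a char * without being
   silently truncated. */
static int has_no_zero_byte(const counted_str *s)
{
    return memchr(s->data, '\0', s->len) == NULL;
}

/* Option 1: char * declared to be UTF-8.  Returns 1 if the
   NUL-terminated string is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const char *msg)
{
    /* Smallest code point allowed for each sequence length, indexed by
       the number of continuation bytes; used to reject overlong forms. */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)msg;

    while (*p) {
        int extra, i;
        unsigned long cp;

        if (*p < 0x80) {                  /* plain ASCII */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) { /* 2-byte sequence */
            extra = 1; cp = *p & 0x1F;
        } else if ((*p & 0xF0) == 0xE0) { /* 3-byte sequence */
            extra = 2; cp = *p & 0x0F;
        } else if ((*p & 0xF8) == 0xF0) { /* 4-byte sequence */
            extra = 3; cp = *p & 0x07;
        } else {
            return 0;                     /* stray continuation or bad lead */
        }
        p++;

        for (i = 0; i < extra; i++, p++) {
            /* A terminating NUL fails this test too, so a truncated
               sequence is rejected without reading past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

Under the first option something like is_valid_utf8() (or a conversion
step) has to run on user input; under the second only the zero-byte
restriction applies; under the third neither check is needed, because
the length travels with the data.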

Received on Sat Jun 1 14:15:48 2002

This is an archived mail posted to the Subversion Dev mailing list.
