> if we're going to have this be changeable with a client side config
> then i say we might as well not do any re-encoding at all. having
> utf-8 as the charset of the log messages is only helpful (in my
> opinion) if you can ALWAYS count on it being in that charset.
Agreed, up to a point. If converting log messages to UTF-8 is optional,
then you still need out-of-band information to display the log message
correctly. This might be supplied by the user via a --charset= switch, a
default in the config file, or whatever. So the user must specify a
character set, either by experimenting until she finds the right one or
by knowing the convention.
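
To make the out-of-band problem concrete, here is a small Python sketch (the sample text and the two charsets are only my illustration, not anything Subversion does). The same stored bytes produce different text depending on which charset the reader assumes:

```python
# The same bytes decode to different text under different charsets,
# so the reader needs out-of-band information to pick the right one.
raw = "caf\u00e9".encode("latin-1")   # bytes as a Latin-1 client would write them

as_latin1 = raw.decode("latin-1")     # the charset the author actually used
as_koi8r = raw.decode("koi8_r")       # a wrong guess by the reader

print(as_latin1)
print(as_koi8r)
print(as_latin1 == as_koi8r)          # False: the wrong guess garbles the text
```

Neither decode raises an error, which is why the user ends up experimenting or relying on convention.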
However, providing the capability to convert encodings does make it
easier for users to establish a policy of universal UTF-8. This is
important because it's a good way to support multilingual
development. In the past this wasn't much of a requirement, simply
because it just didn't happen very much. But the Internet is becoming
pervasive enough that there are projects in which development is going on in
several different languages at once. Ruby is a good example, with
development going on in Japanese and English, with bilingual developers
coordinating between the two groups. Even if development happens in one
language, there are likely to be multilingual documentation and i18n
efforts in large projects. I think multilingual development will become
more and more common as time goes on.
The project goals on the Subversion home page are fairly narrow and
technical, but I think there's a broader philosophical goal implied by
the project itself: to promote collaboration and cooperation between far
flung individuals. Not requiring those individuals to cooperate using a
uniform language furthers that goal.
So, given robust support for multilingual development as a goal (what do
others think about this?) I can see two strategies:
1) Do as Marcus and gstein propose and decree that log messages will be
stored as UTF-8 in the repository and do the necessary conversion on
input and output as a crutch for those without Unicode-capable tools.
2) Decree that log messages must be text, and store the metadata
specifying the character set. Have the clients pass the character set
to the core libraries and have the libraries return the character set
along with the log messages at retrieval time.
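
Here is a rough Python sketch of the two strategies, just to compare them side by side. All the function names are hypothetical; this is not the Subversion API, and a real implementation would live in the core libraries rather than look like this.

```python
from typing import Tuple

# Strategy 1: the repository only ever holds UTF-8.
def store_utf8(raw: bytes, client_charset: str) -> bytes:
    """Convert the client's bytes to UTF-8 on input."""
    return raw.decode(client_charset).encode("utf-8")

def fetch_utf8(stored: bytes, client_charset: str) -> bytes:
    """Convert back on output for clients without Unicode-capable tools.
    Characters the client charset cannot represent become '?'."""
    return stored.decode("utf-8").encode(client_charset, errors="replace")

# Strategy 2: store the bytes untouched, plus charset metadata.
def store_tagged(raw: bytes, client_charset: str) -> Tuple[bytes, str]:
    """Check that the message is valid text in the declared charset,
    then keep the original bytes together with the charset name."""
    raw.decode(client_charset)  # raises UnicodeDecodeError if not valid text
    return raw, client_charset

def fetch_tagged(record: Tuple[bytes, str]) -> Tuple[bytes, str]:
    """Return the message and its charset; the client decides how to display it."""
    return record

# A log message written by a client that uses EUC-JP.
message = "\u30ed\u30b0"
raw = message.encode("euc_jp")

stored = store_utf8(raw, "euc_jp")
print(stored == message.encode("utf-8"))    # repository content is UTF-8
print(fetch_utf8(stored, "euc_jp") == raw)  # round trip for the same client
print(fetch_utf8(stored, "ascii"))          # lossy for a client that cannot show it

record = store_tagged(raw, "euc_jp")
data, charset = fetch_tagged(record)
print(data.decode(charset) == message)      # client decodes using the metadata
```

With strategy 1 every reader can count on one charset; with strategy 2 every reader has to be able to handle whatever charset the writer happened to use.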
I think we're still thrashing through the issue, so a vote is premature.
Received on Sat Jun 1 14:11:22 2002