Re: Encoding in our APIs

From: Branko Čibej <brane_at_xbc.nu>
Date: 2005-05-02 12:02:23 CEST

Peter N. Lundblad wrote:

>what happens when I/O is redirected? Does it still output in the console
Ah, I'm glad you asked that. Do you know, I haven't the slightest idea
what happens. But I can certainly test it with a quick program, so...
[2'17'' later]
A simple test shows that we'll use the same encoding regardless of
whether the output is redirected or not.
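
A check of this kind could be sketched as follows in Python. This is not the program used for the test above: describe_stream is a hypothetical helper, and Python picks its stream encodings by its own rules, so what it reports need not match what the C runtime does on Windows.

```python
import io
import sys


def describe_stream(stream):
    """Return (is_a_terminal, encoding_name) for a text stream.

    Hypothetical helper for illustration; not part of Subversion.
    """
    return stream.isatty(), stream.encoding


if __name__ == "__main__":
    # Report on stderr so the result stays visible when stdout is redirected.
    tty, enc = describe_stream(sys.stdout)
    print(f"stdout: isatty={tty} encoding={enc}", file=sys.stderr)
```

Running the script once directly and once with stdout redirected to a file, then comparing the two reports, shows whether the encoding depends on redirection in that environment.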

>>The really big problem for "svn diff" is that, unlike most other
>>commands, it produces output before the command-line client has a chance
>>to convert it (but you already know that :). In most cases, the internal
>>(usually UTF-8) strings are converted to whatever the console encoding
>>is inside the svn_cmdline_printf functions. We can't do this with a diff
>>stream (and I suspect blame has similar problems).
>I'm changing svn_diff_file_output_unified to take a header_encoding
>argument. So the problem is more like, if you say
>svn diff > patchfile.diff
>you could expect the headers to be in the native encoding (or the file's
>encoding if that's known), but if you do
>svn diff | more
>you probably want headers to be in the console encoding. But then the
>encoding is inconsistent with the file's encoding (unless it is the console
>codepage, which seems uncommon if it is the old DOS 8bit encodings at
>least). A similar problem should exist for all our commands on Windows.

>Maybe this isn't a very big deal after all. Most people on Windows use
>a GUI, which will want consistent encodings. Maybe we should just use the
>locale encoding and later use the file's encoding if that's known.
Actually, for GUIs, it would be a lot better if the headers _were_ in
UTF-8, just not part of the output stream but passed along in some other
way. That would be best for the command-line client, too, it would just
use svn_cmdline_printf to output the headers, and the conversion would
be automatic.
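
The out-of-band idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: DiffChunk and write_chunk are hypothetical names, not Subversion APIs, and errors="replace" only substitutes "?" for unrepresentable characters, which is cruder than real transliteration.

```python
import io
from dataclasses import dataclass
from typing import List


@dataclass
class DiffChunk:
    """Hypothetical container: header text kept apart from raw file bytes."""
    headers: List[str]  # Unicode text, e.g. decoded from internal UTF-8
    body: bytes         # hunk data, left in the file's own encoding


def write_chunk(chunk, out, console_encoding):
    """Write headers converted to console_encoding, then the body unchanged."""
    for line in chunk.headers:
        # Only the headers are converted; unrepresentable characters become "?".
        out.write(line.encode(console_encoding, errors="replace") + b"\n")
    # The body is passed through byte for byte, whatever its encoding is.
    out.write(chunk.body)


if __name__ == "__main__":
    chunk = DiffChunk(headers=["--- Čibej.txt"], body=b"+caf\xe9\n")
    out = io.BytesIO()
    write_chunk(chunk, out, "cp437")
    print(out.getvalue())
```

Because the headers reach the client as Unicode text, a command-line client can pick the console encoding and a GUI can keep them as they are, while the file data is never re-encoded by either.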

>I don't know, since I don't know how this affects people on Windows in
>reality. What I *do* know, however, is that just outputting UTF8 is wrong
>and I'd like to fix that.

>Any input from Windows people is appreciated.
As I said, I don't think I have a good answer myself, except "stay away
from Windows command line if you value your sanity"...

One of the reasons I'd like to see headers sent to the client
out-of-band is that I want to convert the Windows console I/O to use
wide-character functions (on Windows variants that support them),
because then we'd get Unicode<->console conversion for free (*and* it
would include transliteration on output, no bad thing). This would
already work for all commands except "diff" and "blame", but can't work
for those because the annotations are mixed with the file data.

I do think that the output from our internal diff can (and should) be
different than the output from external diffs.

-- Brane

Received on Mon May 2 12:03:13 2005

This is an archived mail posted to the Subversion Dev mailing list.