
Re: Encoding in our APIs

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2005-05-02 11:40:08 CEST

On Mon, 2 May 2005, Branko Čibej wrote:

> Peter N. Lundblad wrote:
>
> >On Sun, 1 May 2005, Branko Čibej wrote:
> >
> >
> >
> >But, while I have your attention ;), could you clarify the difference
> >between the output encoding and the locale encoding on Windows? For our
> >normal output, such as messages, we use the output encoding (i.e. console
> >code page). Is that what you want to do for diffs as well?
> >
> >
> This is a tough question, and I'm not sure I know the right answer.
>
[snip]

Thanks for the explanation; it confirms my guesses regarding this matter.

> Second, even though the encoding used for console I/O can be changed at
> runtime, Subversion does not do that. In an earlier version, we used to
> change the console encoding to be the same as the ANSI encoding, but
> that turned out to be a bad idea because the SVN command-line doesn't
> "own" the console, and the encoding used by a particular console window
> isn't specific to the running process, i.e., if you change it in an svn
> command, it doesn't revert to the previous value after the command has
> completed. That was especially embarrassing when Subversion was used
> from a cygwin shell, which has its own ideas about what the console
> encoding is supposed to be...
>
What happens when I/O is redirected? Is the output still in the console
encoding?

> The really big problem for "svn diff" is that, unlike most other
> commands, it produces output before the command-line client has a chance
> to convert it (but you already know that :). In most cases, the internal
> (usually UTF-8) strings are converted to whatever the console encoding
> is inside the svn_cmdline_printf functions. We can't do this with a diff
> stream (and I suspect blame has similar problems).
>
I'm changing svn_diff_file_output_unified to take a header_encoding
argument. So the problem is more like this: if you say
    svn diff > patchfile.diff
you could expect the headers to be in the native encoding (or the file's
encoding if that's known), but if you do
    svn diff | more
you probably want the headers to be in the console encoding. But then the
header encoding is inconsistent with the file's encoding (unless the file
happens to use the console code page, which seems uncommon, at least for
the old 8-bit DOS code pages). A similar problem should exist for all our
commands on Windows.

Maybe this isn't a very big deal after all. Most people on Windows use
a GUI, which will want consistent encodings. Maybe we should just use the
locale encoding and later use the file's encoding if that's known.

I don't know, since I don't know how this affects people on Windows in
reality. What I *do* know, however, is that just outputting UTF-8 is wrong
and I'd like to fix that.
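
To illustrate why raw UTF-8 headers are a problem, here is a small sketch
(Python, not Subversion's C code) that transcodes a UTF-8 header line into
a chosen target encoding. The function name encode_header and the choice to
replace unrepresentable characters with "?" are assumptions made for this
example only.

```python
def encode_header(header_utf8, target_encoding):
    """Transcode a UTF-8 diff header into target_encoding.

    Hypothetical helper: characters the target encoding cannot represent
    are replaced with "?" instead of raising an error.
    """
    text = header_utf8.decode("utf-8")
    return text.encode(target_encoding, errors="replace")
```

For a header such as "Index: räksmörgås.txt", the result differs between
cp1252 (a typical Windows ANSI code page) and cp850 (a typical console code
page), and neither matches the original UTF-8 bytes, which is why the
choice of header encoding matters.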

Any input from Windows people is appreciated.

Thanks,
//Peter

Received on Mon May 2 11:33:15 2005

This is an archived mail posted to the Subversion Dev mailing list.
