Oops, I meant to cc: dev at subversion. Thanks, Daniel. I'm more used to
dev at httpd, where there is some magic involving Reply-To.
Here is what I see if I run dbx against svnversion with a stop set in
svn_cmdline_fputs:
[2] stopped in svn_cmdline_fputs at line 288 in file "cmdline.c" ($t1)
288 svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
(dbx64) where
svn_cmdline_fputs(string = "svn: Valid UTF-8 data.(hex: 2e 61 4b).followed by invalid UTF-8 sequence.(hex: a2 a5 95 61).", stream = 0xD6C84F8, pool = 0xE4E4040), line 288 in "cmdline.c"
svn_cmdline_fprintf(stream = 0xD6C84F8, pool = 0xE4E4040, fmt = "%s%s.", ...), line 284 in "cmdline.c"
print_error.$b20, line 332 in "error.c"
print_error(err = 0xE4E4078, stream = 0xD6C84F8, prefix = "svn: "), line 332 in "error.c"
svn_handle_error2.$b25.$b29, line 405 in "error.c"
svn_handle_error2.$b25, line 405 in "error.c"
svn_handle_error2(err = 0xE4E4078, stream = 0xD6C84F8, fatal = 0, prefix = "svn: "), line 405 in "error.c"
main.$b17.$b18, line 216 in "main.c"
main.$b17, line 216 in "main.c"
main(argc = 1, argv = 0xDF76878), line 216 in "main.c"
Any strings that are printable with z/OS dbx "where" or "p" are in the
native EBCDIC encoding. If I stop in svn_handle_error2 and print the err
structure, I see:
(dbx64) p *err
(apr_err = 121, message = "Valid UTF-8 data.(hex: 2e 61 4b).followed by invalid UTF-8 sequence.(hex: a2 a5 95 61)", child = 0x0, pool = 0xE4E4040, file = "./subversion/libsvn_subr/utf.c", line = 632)
...so we have native strings here which never become UTF-8. I could patch
print_error() to do the UTF-8 conversion prior to calling
svn_cmdline_fputs(), but the back-to-back conversions seem silly. Maybe it
would be better to define svn_cmdline_fputs_native_cstring() or some such
and call that from print_error() and any other caller that passes native
strings.
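A minimal sketch of that idea, in plain C so it stands alone without the
APR or Subversion headers. The names fputs_native_cstring() and
print_native_error() are hypothetical, and the int return value is only a
stand-in for svn_error_t *; a real version would presumably keep the
pool argument and the usual error wrapping. The point is just that a
string already in the native encoding goes to the stream untouched, with
no UTF-8 round trip.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not an existing Subversion function: write STRING
 * to STREAM exactly as given, assuming it is already in the native
 * encoding.  Returns 0 on success or an errno-style code on failure
 * (a stand-in for returning svn_error_t *). */
int
fputs_native_cstring(const char *string, FILE *stream)
{
  errno = 0;
  if (fputs(string, stream) == EOF)
    return errno ? errno : EIO;  /* fputs() may fail without setting errno */
  return 0;
}

/* Hypothetical stand-in for the error-printing path: emit PREFIX,
 * MESSAGE and a newline with no encoding conversion at all. */
int
print_native_error(FILE *stream, const char *prefix, const char *message)
{
  int rc;

  if ((rc = fputs_native_cstring(prefix, stream)) != 0)
    return rc;
  if ((rc = fputs_native_cstring(message, stream)) != 0)
    return rc;
  return fputs_native_cstring("\n", stream);
}
```

With something like this, print_error() could call the native variant,
and svn_cmdline_fputs() would keep its documented UTF-8 contract for
everyone else.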
Greg
On Wed, May 12, 2010 at 10:42 AM, Greg Ames <ames.greg_at_gmail.com> wrote:
>
>
> On Wed, May 12, 2010 at 12:59 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>
>> Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
>> > The error messages are in the native code page to start with, so running
>> > them through a UTF-8 -> native conversion doesn't do anything helpful.
>> >
>> ...
>> > Index: subversion/libsvn_subr/cmdline.c
>> > ===================================================================
>> > --- subversion/libsvn_subr/cmdline.c (revision 943316)
>> > +++ subversion/libsvn_subr/cmdline.c (working copy)
>> > @@ -318,24 +318,15 @@
>> > svn_error_t *
>> > svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>> > {
>> > - svn_error_t *err;
>> > - const char *out;
>> > + /* "string" is native. do not try to convert from UTF-8 */
>>
>> The doc string of this function (see subversion/include/svn_cmdline.h)
>> specifically promises that it'll do conversion from UTF-8.
>
>
> ok, but
>
> a) that's not appropriate for error messages
> b) it's not enforced.
>
>
>> We cannot make it unconditionally do the opposite.
>
>
> I have done exactly that, with good results.
>
>
>> (Perhaps with suitable #ifdef's we could do it; or perhaps your problem
>> can be fixed elsewhere (e.g., the error-printing code).)
>>
>>
> The SVN_ERR() macro and supporting functions produce native strings, not
> UTF-8, and they are widely used.
>
>
>> Is your issue only with the encoding of error messages?
>
>
> This patch addresses only the encoding of error messages. There are a few
> other places where there is confusion about the encoding of input or
> literals.
>
>
>> Or with the encoding of all svn output?
>>
>
> I think it's a great idea to have svn metadata and text files in the
> repository in UTF-8 to promote universal access. But error messages are
> local and shouldn't be munged much or Bad Things can happen. Yes, someone
> could inject code after SVN_ERR() to convert all the literal strings and
> characters in error messages throughout subversion to UTF-8. But what's the
> point of doing that, then converting it back to native to write to stderr?
> And what are the odds of picking up 100% of the literal strings and
> characters and doing exactly one UTF-8 conversion on all of them prior to
> calling svn_cmdline_fputs()? Simplicity is good, especially in error
> situations, and it saves a few cycles on non-UTF-8 systems.
>
> Greg
>
Received on 2010-05-12 20:59:50 CEST