
Re: [PATCH] readable error messages on non-ASCII systems

From: Mark Phippard <markphip_at_gmail.com>
Date: Wed, 12 May 2010 15:09:48 -0400

You should look at this now-deleted branch for Subversion on EBCDIC systems:

http://svn.apache.org/viewvc/subversion/branches/ebcdic/?pathrev=876004

It was written for AS/400, not z/OS, but examining the diff between this
branch and what it was synced with will give you an idea of where to
look.

The branch was deleted in r876005, so be sure to use a peg revision
of 876004 when looking at the repository.

On Wed, May 12, 2010 at 2:59 PM, Greg Ames <ames.greg_at_gmail.com> wrote:
> Oops, I meant to cc: dev at subversion.  Thanks, Daniel.  I'm more used to
> dev at httpd, where there is some magic involving Reply-To:
>
> Here is what I see if I run dbx against svnversion with a stop set in
> svn_cmdline_fputs:
>
> [2] stopped in svn_cmdline_fputs at line 288 in file "cmdline.c"  ($t1)
>  288   svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t
> *pool)
> (dbx64) where
> svn_cmdline_fputs(string = "svn: Valid UTF-8 data.(hex: 2e 61 4b).followed
> by invalid UTF-8 sequence.(hex: a2 a5 95 61).", stream = 0xD6C84F8, pool =
> 0xE4E4040), line 288 in "cmdline.c"
> svn_cmdline_fprintf(stream = 0xD6C84F8, pool = 0xE4E4040, fmt = "%s%s.",
> ...), line 284 in "cmdline.c"
> print_error.$b20, line 332 in "error.c"
> print_error(err = 0xE4E4078, stream = 0xD6C84F8, prefix = "svn: "), line 332
> in "error.c"
> svn_handle_error2.$b25.$b29, line 405 in "error.c"
> svn_handle_error2.$b25, line 405 in "error.c"
> svn_handle_error2(err = 0xE4E4078, stream = 0xD6C84F8, fatal = 0, prefix =
> "svn: "), line 405 in "error.c"
> main.$b17.$b18, line 216 in "main.c"
> main.$b17, line 216 in "main.c"
> main(argc = 1, argv = 0xDF76878), line 216 in "main.c"
>
> Any strings that are printable with z/OS dbx "where" or "p" are in the
> native EBCDIC encoding.  If I stop in svn_handle_error2 and print the err
> structure, I see:
>
> (dbx64) p *err
> (apr_err = 121, message = "Valid UTF-8 data.(hex: 2e 61 4b).followed by
> invalid UTF-8 sequence.(hex: a2 a5 95 61)", child = 0x0, pool = 0xE4E4040,
> file = "./subversion/libsvn_subr/utf.c", line = 632)
>
> ...so we have native strings here which never become UTF-8.  I could patch
> print_error() to do the UTF-8 conversion prior to calling
> svn_cmdline_fputs(), but the back-to-back conversions seem silly.  Maybe it
> would be better to define svn_cmdline_fputs_native_cstring() or some such
> and call that from print_error() and any other caller that passes native
> strings.
>
> Greg
>
> On Wed, May 12, 2010 at 10:42 AM, Greg Ames <ames.greg_at_gmail.com> wrote:
>
>>
>>
>> On Wed, May 12, 2010 at 12:59 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>>
>>> Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
>>> > The error messages are in the native code page to start with, so running
>>> > them through a UTF-8 -> native conversion doesn't do anything helpful.
>>> >
>>> ...
>>> > Index: subversion/libsvn_subr/cmdline.c
>>> > ===================================================================
>>> > --- subversion/libsvn_subr/cmdline.c    (revision 943316)
>>> > +++ subversion/libsvn_subr/cmdline.c    (working copy)
>>> > @@ -318,24 +318,15 @@
>>> >  svn_error_t *
>>> >  svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>>> >  {
>>> > -  svn_error_t *err;
>>> > -  const char *out;
>>> > +  /* "string" is native.  do not try to convert from UTF-8 */
>>>
>>> The doc string of this function (see subversion/include/svn_cmdline.h)
>>> specifically promises that it'll do conversion from UTF-8.
>>
>>
>> ok, but
>>
>> a) that's not appropriate for error messages
>> b) it's not enforced.
>>
>>
>>> We cannot make it unconditionally do the opposite.
>>
>>
>> I have done exactly that, with good results.
>>
>>
>>> (Perhaps with suitable #ifdef's we could do it; or perhaps your problem
>>> can be fixed elsewhere (e.g., the error-printing code).)
>>>
>>>
>> The SVN_ERR() macro and supporting functions produce native strings, not
>> UTF-8, and they are widely used.
>>
>>
>>> Is your issue only with the encoding of error messages?
>>
>>
>> This patch addresses only the encoding of error messages. There are a few
>> other places where there is confusion about the encoding of input or
>> literals.
>>
>>
>>> Or with the encoding of all svn output?
>>>
>>
>> I think it's a great idea to have svn metadata and text files in the
>> repository in UTF-8 to promote universal access.  But error messages are
>> local and shouldn't be munged much or Bad Things can happen.  Yes, someone
>> could inject code after SVN_ERR() to convert all the literal strings and
>> characters in error messages throughout subversion to UTF-8.  But what's the
>> point of doing that then converting it back to native to write to stderr?
>> And what are the odds of picking up 100% of the literal strings and
>> characters and doing exactly one UTF-8 conversion on all of them prior to
>> calling svn_cmdline_fputs()?  Simplicity is good, especially in error
>> situations, and it saves a few cycles on non-UTF-8 systems.
>>
>> Greg
>>
>
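
For illustration only, the svn_cmdline_fputs_native_cstring() idea Greg
floats above might look roughly like the sketch below. It is not
Subversion code: the function name fputs_native_cstring and its int
return value are assumptions made to keep the example self-contained,
whereas a real version would presumably return svn_error_t * and take an
apr_pool_t * like svn_cmdline_fputs() does. The only point it shows is
that a string already in the native encoding is written as-is, with no
UTF-8 -> native conversion in between.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not part of Subversion): write STRING, which is
   assumed to already be in the native encoding, to STREAM without any
   UTF-8 conversion.  Returns 0 on success or an errno-style code on
   failure. */
static int
fputs_native_cstring(const char *string, FILE *stream)
{
  errno = 0;
  if (fputs(string, stream) == EOF)
    {
      /* fputs() does not always set errno; fall back to EIO. */
      return errno ? errno : EIO;
    }
  return 0;
}
```

A caller such as print_error() could then pass its native message
straight to this helper, while callers holding UTF-8 strings would keep
using svn_cmdline_fputs() and its documented conversion.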

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
Received on 2010-05-12 21:10:20 CEST

This is an archived mail posted to the Subversion Dev mailing list.