
RE: svn commit: r35746 - in trunk/subversion: include libsvn_subr

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 09 Feb 2009 20:11:06 +0000

Bert Huijben wrote:
> > -----Original Message-----
> > From: Greg Stein [mailto:gstein_at_gmail.com]
> >
> > > Author: rhuijben
> > > Date: Sun Feb 8 16:15:58 2009
> > > New Revision: 35746
> > >
> > > Log:
> > > Following up on r35743, fix whitespace and move encoding comment to
> > > header file.
> > >
> > > * subversion/include/svn_io.h
> > > (svn_io_run_diff, svn_io_run_diff3_2):
> > > Document that diff_cmd must be cstring encoded.
> >
> > Woah. That is dangerous for our public APIs. All of our public APIs
> > are supposed to be UTF-8. That's the primary reason for all the
> > svn_io_* wrappers around APR: make them all UTF-8.
> I just documented the actual implementation. (Didn't change a letter in that
> function).
> I was just very surprised that it converted the passed arguments to UTF-8
> before passing them to some other routines, then converted them back and
> finally passed them back to APR.
> (And I agree that this entire api should be rev'ed to make it use apr)
> > Second, "cstring" doesn't tell me anything. Our utf-8-encoded strings
> > are cstrings, too. You mean something more like "native, local-style"
> > or something like that. I've seen that in headers before.

The wording used by svn_path_cstring_to_utf8() is "the internal encoding
used by APR".

But... The callers ultimately get the string from svn_config_get().
Neither these callers nor the entire "config" subsystem make any mention
of "UTF" at all, one way or the other.

As I understand it, Subversion's coding policy is to assume that all
strings are UTF-8 unless documented otherwise, although I don't see that
written explicitly in "hacking.html". If so, it looks like there is a big
hole in the documentation, from the entire "config" subsystem, all the
way through a chain of function calls and batons down to these two
functions that ultimately consume the "diff_cmd" or "diff3_cmd" string.
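
To illustrate why the undocumented encoding matters, here is a minimal
sketch. It is not Subversion code: Latin-1 stands in for the native
encoding, and the helper names and the sample path are hypothetical. It
shows that a native-encoded command string with a non-ASCII character is
not valid UTF-8, so a function that assumes UTF-8 cannot take it as-is,
while a native -> UTF-8 -> native round trip preserves it.

```python
# Assumption: Latin-1 stands in for the platform's native encoding
# (what the docs call "the internal encoding used by APR").
NATIVE_ENCODING = "latin-1"


def native_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes from the native encoding to UTF-8 (hypothetical helper)."""
    return raw.decode(NATIVE_ENCODING).encode("utf-8")


def utf8_to_native(raw: bytes) -> bytes:
    """Re-encode bytes from UTF-8 back to the native encoding (hypothetical helper)."""
    return raw.decode("utf-8").encode(NATIVE_ENCODING)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A made-up diff command path with a non-ASCII character, as native bytes,
# the way it might arrive from a config file.
diff_cmd_native = "C:\\Outils\\diff\u00e9rence.exe".encode(NATIVE_ENCODING)

# Treating the native bytes as UTF-8 fails outright.
native_is_utf8 = is_valid_utf8(diff_cmd_native)

# Converting explicitly in both directions gets the original bytes back.
round_tripped = utf8_to_native(native_to_utf8(diff_cmd_native))
```

With these definitions, native_is_utf8 is False and round_tripped equals
diff_cmd_native, which is why each function in the chain needs to say
which encoding it expects.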

See also this message on the topic of which strings in Subversion config
files are UTF8 and which are native, in response to a question on
whether we should standardize:

- Julian

> > But I still think it's bad precedent to expose this in the API.
> I think it was officially exposed with 1.0 :(

Received on 2009-02-09 21:11:39 CET

This is an archived mail posted to the Subversion Dev mailing list.
