On Saturday 13 December 2003 16:44, John Szakmeister wrote:
> On Saturday 13 December 2003 12:27, Philip Martin wrote:
> > John Szakmeister <john@szakmeister.net> writes:
> > > Index: subversion/libsvn_client/blame.c
> > > ===================================================================
> > > --- subversion/libsvn_client/blame.c (revision 7978)
> > > +++ subversion/libsvn_client/blame.c (working copy)
> > > @@ -378,7 +378,9 @@
> > >
> > > SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> > >
> > > - lmb.path = url + strlen (reposURL);
> > > + /* Convert path from URI to UTF-8 before placing it in the baton */
> > > + lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
> >
> > Hmmm...
> >
> > My first instinct was "the comment is redundant" since it duplicates
> > the documented behaviour of the function. Then I realised that the
> > documentation doesn't mention UTF-8, so I started thinking about the
> > URL. It's already UTF-8, isn't it? Whether the decoded URL is UTF-8
> > depends on what characters are URI encoded in the URL, so where do we
> > guarantee that the URL contains URI-encoded UTF-8?
>
> Philip, I have to say you have an amazing ability to go through and
> validate *everything*. :-) It didn't even cross my mind to try the
> things you did.
>
> > Next I tried a few things
> >
> > $ svnadmin create repo
> > $ svn import Makefile http://localhost:8888/obj/repo/%c3%a9 -m ""
> > $ LANG=en_GB svn ls http://localhost:8888/obj/repo
> > é
> > $ svn blame http://localhost:8888/obj/repo/%c3%a9
> > ../svn/subversion/libsvn_client/blame.c:308: (apr_err=20014)
> > svn: Missing changed-path information for revision 1 of '%a9'
> > $ LANG=en_GB svn ls file://`pwd`/repo/%c3%a9
> > ../svn/subversion/libsvn_client/ls.c:144: (apr_err=160013)
> > svn: URL non-existent in that revision.
>
> The problem above is due to the fact that log_message_receiver() is
> comparing URI encoded paths against non-URI encoded ones. The keys for the
> changed_paths hash were inserted in non-URI-encoded form. The patch will fix
> this problem, but not the following ones. :-)
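To make the mismatch concrete: the changed_paths keys hold the decoded path bytes, while the path sliced out of the URL still carries %XX escapes, so a byte-wise comparison of the two cannot match until the escapes are decoded. Below is a minimal, self-contained C sketch of percent-decoding. `uri_decode_path()` and `hex_value()` are hypothetical helpers written for illustration; they are not Subversion's `svn_path_uri_decode()`, and they use plain malloc() instead of APR pools.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if C is not hex. */
static int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper (not svn_path_uri_decode): replaces each %XX escape
   with the raw byte it encodes.  A '%' not followed by two hex digits is
   copied through unchanged.  Returns a malloc'd string the caller must
   free, or NULL on allocation failure. */
char *uri_decode_path(const char *in)
{
  size_t i;
  char *out, *p;

  assert(in != NULL);
  out = malloc(strlen(in) + 1);   /* decoding never makes the string longer */
  if (out == NULL)
    return NULL;

  p = out;
  for (i = 0; in[i] != '\0'; i++)
    {
      int hi, lo;
      /* Short-circuit evaluation stops before reading past the terminator. */
      if (in[i] == '%'
          && (hi = hex_value((unsigned char) in[i + 1])) >= 0
          && (lo = hex_value((unsigned char) in[i + 2])) >= 0)
        {
          *p++ = (char) (hi * 16 + lo);
          i += 2;                 /* skip the two hex digits just consumed */
        }
      else
        *p++ = in[i];
    }
  *p = '\0';
  return out;
}
```

With this sketch, `uri_decode_path("/%c3%a9")` yields the two bytes 0xC3 0xA9, which compare equal to a key stored as "/é" in UTF-8, whereas the raw "/%c3%a9" never does.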
>
> > Rather alarmingly, I can get non-URI encoded paths into the repository
>
> Oof, I don't like the sound of that.
>
> > $ svn import Makefile http://localhost:8888/obj/repo/%e9 -m ""
> > $ svnadmin dump -q repo | grep Node-path
> > Node-path: é
> >
> > which cause ra_dav to produce an error
> >
> > LANG=en_GB svn ls http://localhost:8888/obj/repo/
> > ../svn/subversion/libsvn_ra_dav/util.c:661: (apr_err=175002)
> > svn: PROPFIND request failed on '/obj/repo/!svn/bc/2'
> > ../svn/subversion/libsvn_ra_dav/util.c:647: (apr_err=175002)
> > svn: The PROPFIND request returned invalid XML in the response: XML parse
> > error at line 28: Bytes: 0xE9 0x22 0x3C 0x2F .. (/obj/repo/!svn/bc/2)
> >
> > and ra_local to enter an infinite loop
> >
> > LANG=en_GB svn ls file://`pwd`/repo
>
> Wow, can't say I like this either. Any recommendations on how we should
> solve this problem? I saw the discussion about performing UTF-8 encoding,
> and *then* URI encoding. But how are we to validate something like
> 'http://localhost:8888/obj/repo/%e9'? Do we need to URI decode it, UTF-8
> encode it, and then URI encode it again? Who should be responsible for
> doing this: the command line client, or the client library?

Well, I made a patch to try to fix this problem and discovered something
interesting, but not too surprising. I modified
svn_opt_args_to_target_array() to URI-decode, UTF-8 convert, and URI-encode
the strings... yeah, that didn't work. We need a way to just verify that the
string is UTF-8. Converting a string that is already UTF-8 simply converted
it a second time according to my current locale (which is not UTF-8), and
that is not what we want. Any ideas on how to write such a function?
-John
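One way to write the check John asks for is to inspect the bytes directly instead of running them through a locale-dependent conversion. The following is a minimal C sketch, assuming the check is applied to the URI-decoded path and that "valid" means well-formed UTF-8 as defined by RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). `utf8_is_valid()` is a hypothetical name, not an existing Subversion or APR function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an existing Subversion/APR function):
   returns 1 if the LEN bytes at S are well-formed UTF-8 per RFC 3629,
   0 otherwise.  Only the bytes are inspected; no locale conversion. */
int utf8_is_valid(const char *s, size_t len)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  assert(s != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = p[i];
      size_t n;                            /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        { i++; continue; }                 /* plain ASCII */
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1;
      else if (c == 0xE0)
        { n = 2; lo = 0xA0; }              /* reject overlong 3-byte forms */
      else if (c == 0xED)
        { n = 2; hi = 0x9F; }              /* reject UTF-16 surrogates */
      else if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF)
        n = 2;
      else if (c == 0xF0)
        { n = 3; lo = 0x90; }              /* reject overlong 4-byte forms */
      else if (c >= 0xF1 && c <= 0xF3)
        n = 3;
      else if (c == 0xF4)
        { n = 3; hi = 0x8F; }              /* reject code points > U+10FFFF */
      else
        return 0;    /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

      if (len - i <= n)
        return 0;                          /* sequence truncated at end */
      if (p[i + 1] < lo || p[i + 1] > hi)
        return 0;
      for (size_t k = 2; k <= n; k++)
        if (p[i + k] < 0x80 || p[i + k] > 0xBF)
          return 0;                        /* bad continuation byte */
      i += n + 1;
    }
  return 1;
}
```

Applied to the paths in Philip's session, the decoded form of %c3%a9 (bytes 0xC3 0xA9) passes, while the decoded form of %e9 (a lone 0xE9, Latin-1 é) is rejected as a truncated sequence, so a client could refuse such a URL instead of sending it to the repository. Because the function never consults the locale, it gives the same answer whether or not the user's locale is UTF-8.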
Received on Mon Dec 15 10:57:26 2003