
Re: svn blame and filenames with non-ascii chars

From: John Szakmeister <john_at_szakmeister.net>
Date: 2003-12-13 22:44:45 CET

On Saturday 13 December 2003 12:27, Philip Martin wrote:
> John Szakmeister <john@szakmeister.net> writes:
> > Index: subversion/libsvn_client/blame.c
> > ===================================================================
> > --- subversion/libsvn_client/blame.c (revision 7978)
> > +++ subversion/libsvn_client/blame.c (working copy)
> > @@ -378,7 +378,9 @@
> >
> > SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> >
> > - lmb.path = url + strlen (reposURL);
> > + /* Convert path from URI to UTF-8 before placing it in the baton */
> > + lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
>
> Hmmm...
>
> My first instinct was "the comment is redundant" since it duplicates
> the documented behaviour of the function. Then I realised that the
> documentation doesn't mention UTF-8, so I started thinking about the
> URL. It's already UTF-8 isn't it? Whether the decoded URL is UTF-8
> depends on what characters are URI encoded in the URL, where do we
> guarantee that the URL contains URI encoded UTF-8?

Philip, I have to say you have an amazing ability to go through and validate
*everything*. :-) It didn't even cross my mind to try the things you did.

> Next I tried a few things
>
> $ svnadmin create repo
> $ svn import Makefile http://localhost:8888/obj/repo/%c3%a9 -m ""
> $ LANG=en_GB svn ls http://localhost:8888/obj/repo
> é
> $ svn blame http://localhost:8888/obj/repo/%c3%a9
> ../svn/subversion/libsvn_client/blame.c:308: (apr_err=20014)
> svn: Missing changed-path information for revision 1 of '%a9'
> $ LANG=en_GB svn ls file://`pwd`/repo/%c3%a9
> ../svn/subversion/libsvn_client/ls.c:144: (apr_err=160013)
> svn: URL non-existent in that revision.

The problem above is due to the fact that log_message_receiver() is comparing
URI-encoded paths against non-URI-encoded ones. The keys for the
changed_paths hash were inserted in non-URI-encoded form. The patch will fix
this problem, but not the following ones. :-)
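
To make the mismatch concrete, here is a small sketch in Python (not the
actual C code in blame.c). The dictionary is a hypothetical stand-in for the
changed_paths hash, and the variable names are made up for illustration; the
only assumption carried over from the patch is that the keys are stored
decoded while the path derived from the URL is still URI-encoded.

```python
from urllib.parse import unquote

# Hypothetical stand-in for the changed_paths hash: keys are decoded UTF-8 paths.
changed_paths = {"/é": "A"}

repos_root = "http://localhost:8888/obj/repo"
url = "http://localhost:8888/obj/repo/%c3%a9"

# Equivalent of `url + strlen (reposURL)`: the path is still URI-encoded.
raw_path = url[len(repos_root):]

# Equivalent in spirit of svn_path_uri_decode(): undo the %XX escapes.
decoded_path = unquote(raw_path)

found_before = raw_path in changed_paths      # lookup with the encoded path
found_after = decoded_path in changed_paths   # lookup with the decoded path
print(found_before, found_after)
```

The lookup only succeeds once both sides of the comparison are in the same
(decoded) form, which is all the patch changes.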

> Rather alarmingly, I can get non-URI encoded paths into the repository

Oof, I don't like the sound of that.

> $ svn import Makefile http://localhost:8888/obj/repo/%e9 -m ""
> $ svnadmin dump -q repo | grep Node-path
> Node-path: é
>
> which cause ra_dav to produce an error
>
> LANG=en_GB svn ls http://localhost:8888/obj/repo/
> ../svn/subversion/libsvn_ra_dav/util.c:661: (apr_err=175002)
> svn: PROPFIND request failed on '/obj/repo/!svn/bc/2'
> ../svn/subversion/libsvn_ra_dav/util.c:647: (apr_err=175002)
> svn: The PROPFIND request returned invalid XML in the response: XML parse
> error at line 28: Bytes: 0xE9 0x22 0x3C 0x2F .. (/obj/repo/!svn/bc/2)
>
> and ra_local to enter an infinite loop
>
> LANG=en_GB svn ls file://`pwd`/repo

Wow, can't say I like this either. Any recommendations on how we should solve
this problem? I saw the discussion about performing UTF-8 encoding, and
*then* URI encoding. But how are we to validate something like
'http://localhost:8888/obj/repo/%e9'? Do we need to URI-decode it, UTF-8
encode it, and then URI-encode it again? Who should be responsible for doing
this: the command-line client or the client library?
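
For what it's worth, the decode / re-encode round trip asked about above
could look roughly like the following Python sketch. This is only an
illustration of the idea, not a proposal for where it should live:
canonicalize_path() and its fallback_encoding parameter are hypothetical
names, and treating non-UTF-8 bytes as Latin-1 is an assumption made purely
for the example (a real client would presumably use the user's locale).

```python
from urllib.parse import quote, unquote_to_bytes


def canonicalize_path(encoded_path, fallback_encoding="latin-1"):
    """Return encoded_path re-escaped so that its %XX bytes are valid UTF-8.

    Hypothetical helper: fallback_encoding is an assumed source encoding
    for escaped bytes that do not form valid UTF-8.
    """
    # Step 1: URI-decode to raw bytes, without guessing an encoding yet.
    raw = unquote_to_bytes(encoded_path)

    # Step 2: validate. Bytes that already form UTF-8 are kept as they are.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 (e.g. a lone 0xE9): reinterpret using the fallback.
        text = raw.decode(fallback_encoding)

    # Step 3: UTF-8 encode, then URI-encode again, leaving '/' unescaped.
    return quote(text.encode("utf-8"), safe="/")


already_utf8 = canonicalize_path("/obj/repo/%c3%a9")
latin1_escape = canonicalize_path("/obj/repo/%e9")
print(already_utf8, latin1_escape)
```

One caveat with this approach: some byte sequences are valid in both
encodings, so the validation step can only reject input that is definitely
not UTF-8; it cannot prove what the user meant.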

-John

Received on Sat Dec 13 22:40:16 2003

This is an archived mail posted to the Subversion Dev mailing list.
