
Re: [TSVN] url field in .svn/entries file is not %-escaped for non-ASCII chars

From: Hiroharu Tamaru <tamaru_at_myn.rcast.u-tokyo.ac.jp>
Date: 2005-05-13 19:10:46 CEST

That was so quick again.

At Fri, 13 May 2005 15:03:39 +0200, SteveKing wrote:

> > As I mentioned before, I am using https:// repository. In
> > the repo-browser, when I right click on a file and select
> > "open" from the context menu, a web browser is launched and
> > the corresponding URL is opened.
> >
> > But, this URL has wrong encoding, at least in Japanese MS
> > Windows XP environment. The actual encoding used in the
> > URL is the native encoding (Shift JIS), whereas the
> > repository expects utf-8. So it fails with "404 Not Found".
> > The %-escaping is being performed properly, by the way.
> [snip]
>
> So basically you say that a browser needs the URL encoded the same way
> as Subversion.
Yes.

> Last time I tried, even un-encoded URL worked when
> they're entered in the URL bar in a browser - seems they don't work when
> passed programmatically.

Well, I didn't test it thoroughly this time, but if I
remember correctly, it depended on the web browser and/or
its version. If the browser decides to convert the raw native URL
string to a raw utf-8 URL string before making a query to the
web server, then it works, so in that sense, you are right,
but this assumption is dependent on many things.

For example, if I prepare an HTML page with a raw utf-8 URL
link to the repository, then it works with firefox 1.0.4,
BUT ONLY IF the View->CharacterEncoding menu is set to utf-8
before clicking on the link (which practically means that
the whole page has to be in utf-8). Preparing the link in
other encodings (euc-JP, SJIS, ...) definitely fails as well
(with firefox 1.0.4). So it is difficult to get it right
without pre-%-escaping it.
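
To illustrate the point about pre-%-escaping, here is a minimal Python sketch. It is not TortoiseSVN code; the file name and the example.com URL are made up for illustration. It escapes the same name once from its utf-8 bytes and once from its Shift JIS bytes, then shows what a server that expects utf-8 would make of each.

```python
from urllib.parse import quote, unquote

# Hypothetical non-ASCII file name, chosen only for illustration.
name = "文字.txt"

# %-escape the same name from two different byte encodings.
utf8_path = quote(name, encoding="utf-8")
sjis_path = quote(name, encoding="shift_jis")

print("https://example.com/svn/" + utf8_path)
print("https://example.com/svn/" + sjis_path)

# A server that expects utf-8 decodes the first form back to the
# original name, but cannot recover it from the Shift JIS form.
decoded_ok = unquote(utf8_path, encoding="utf-8")
decoded_bad = unquote(sjis_path, encoding="utf-8", errors="replace")
print(decoded_ok == name)
print(decoded_bad == name)
```

Both escaped strings are pure ASCII, so nothing in the URL itself tells the server which encoding produced it. That is why the sender has to pick utf-8 before escaping.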

> Should be fixed in revision 3339.

Thank you! I'll test it with the next nightly.

> > One thought that came to me while writing this mail:
> > It'd be nice if there is a "Copy URL" feature that puts the
> > URL string to the paste buffer for files and directories in
> > http:// repositories in repo-browser.
>
> Done in revision 3341.

Wow, that's nice! Will test it too.

Now I noticed: there could be a very ugly corner case.
Imagine a URL: http://example.com/XXX/svn/YYY/ZZZ.txt
where http://example.com/XXX/svn/ is the repository root
and YYY/ZZZ.txt is the path inside the repository,
and both XXX and YYY are non-ASCII.
Then I assume the encoding of YYY is determined by mod_dav_svn
(which uses utf-8), but that of XXX is determined by httpd or the
local filesystem (which COULD be something other than
utf-8). So, chances are that it could require a URL with
mixed encodings, whether escaped or raw! ... Ugly.
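
A minimal Python sketch of such a mixed-encoding URL follows. The helper build_url is hypothetical, and the choice of Shift JIS for the part served by httpd is only an assumption for the sake of the example.

```python
from urllib.parse import quote

def build_url(base, segments):
    """Join (text, encoding) pairs into a %-escaped path under base."""
    escaped = [quote(text, safe="", encoding=enc) for text, enc in segments]
    return base.rstrip("/") + "/" + "/".join(escaped)

# The same text is used for both non-ASCII segments to make the
# difference in the escaped bytes visible.
url = build_url(
    "http://example.com",
    [
        ("文字", "shift_jis"),  # the "XXX" part: httpd / local filesystem (assumed Shift JIS)
        ("svn", "ascii"),
        ("文字", "utf-8"),      # the "YYY" part: path inside the repository (utf-8)
        ("ZZZ.txt", "ascii"),
    ],
)
print(url)
```

The two non-ASCII segments come out as different escape sequences even though they spell the same text, and a client cannot tell from the URL alone which encoding applies where.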

As far as I am aware, using natively encoded non-ASCII
strings for XXX, or in any other normal URL, is a very rare
practice, at least in Japan (but it exists). I assume this is
because it runs into the same encoding dilemma as the
non-escaped link URL strings mentioned above.

The above also applies to Subversion itself, and it is
really neither a Subversion nor a TortoiseSVN problem, but since,
for example, IDNs (Internationalized Domain Names; an encoding
scheme for putting non-ASCII chars in hostnames,
RFC 3490) have already started being used, and you cannot
%-escape an IDN in utf-8, it could become a practical issue
sometime soon.
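
For reference, a minimal Python sketch of how an IDN hostname is encoded. It uses Python's built-in "idna" codec, which implements RFC 3490; the hostname is made up for illustration.

```python
# Hypothetical internationalized hostname.
host = "日本語.jp"

# IDNA turns each non-ASCII label into an ASCII "xn--" label (Punycode)
# instead of %-escaping its utf-8 bytes.
ace = host.encode("idna")
print(ace)

# The ASCII form decodes back to the original hostname.
roundtrip = ace.decode("idna")
print(roundtrip == host)
```

So the hostname and the path of one URL follow two unrelated escaping schemes, and a client has to apply each to the right part.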

I am not requesting anything at this time, mind you; I
mentioned it just for the sake of record.

Thank you again for your efforts.

-- 
Hiroharu Tamaru
Received on Fri May 13 19:27:49 2005

This is an archived mail posted to the TortoiseSVN Dev mailing list.
