
Re: URI encoding URLs on the cmdline?

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2004-06-06 18:57:53 CEST

On Sun, 6 Jun 2004, Greg Hudson wrote:

> On Sun, 2004-06-06 at 08:56, Jim Correia wrote:
> > The rule would have to be more strict than "you can use any character
> > you want in a URI on the command line except for '%'". The rule would
> > have to apply to all URI reserved characters, not just %, since there
> > is no way to know whether the author of the URI intended for the
> > character to be escaped or not. In fact RFC 2396 says that one needs
> > to consider URIs always escaped (2.4.2) for this very reason.
>
> We're mostly talking about the path part here, where only a few reserved
> characters are disallowed. But you're right, characters like '?' and
> '#' are also at issue, not just '%'.
>
> I remain confident that we would be doing our users a service by
> auto-escaping international characters in URLs, even if we remain picky

I fully agree. Try using a non-ASCII charset for a while. Even if I know
how to URI-encode my räksmörgås in principle, I don't have a
UTF-8 encoder built into my head.

To make things simple, we could say:
Use every character you want, but if a pathname component contains one of
;?:@&=+$%, (/ excluded for obvious reasons), you have to escape it using
%hh. If you don't trust us to escape your characters, you are free to do
it yourself.
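
A minimal Python sketch of that rule, for illustration only (this is not
Subversion code, and the name auto_escape_path is hypothetical). It assumes
the trailing ',' in the list above belongs to the set, as it does in the RFC
2396 reserved characters, and that command-line input is encoded as UTF-8
before escaping:

```python
from urllib.parse import quote

# Characters the user must escape by hand under the proposed rule.
# Assumption: the trailing ',' in the list is part of the set.
USER_ESCAPED = ";?:@&=+$%,"
# RFC 2396 "mark" characters; they are unreserved and never need escaping.
MARKS = "-_.!~*'()"

def auto_escape_path(path):
    """Percent-encode everything except unreserved, reserved and '%'.

    Reserved characters and '%' pass through untouched, so an existing
    %hh sequence typed by the user keeps its meaning.
    """
    return quote(path, safe="/" + USER_ESCAPED + MARKS, encoding="utf-8")

print(auto_escape_path("/repos/räksmörgås/trunk"))
# /repos/r%C3%A4ksm%C3%B6rg%C3%A5s/trunk
```

Input that is already escaped, such as a%20b, comes out unchanged, which is
what makes the rule backwards compatible.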

This is backwards compatible and more user friendly. The above nine
characters should be quite uncommon in our URLs, but they can still be
used if necessary.

As Greg points out, for the path component, we could remove :@&+$, leaving
just ; (parameter info), ? (delimits the query string, which we don't use,
but anyway) and = (which is also reserved). But that's not necessary, and it
would be backwards-compatible if we want to do it in the future. This is
according to section 3.3 of RFC 2396.
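
To make the difference between the two sets concrete, here is a small Python
sketch (names are hypothetical, not part of any Subversion API) that reports
which characters in a single path component the user would have to escape by
hand. It assumes '%' stays in both sets, since it introduces an escape, and
that ',' is in the full set but not the reduced one:

```python
# Full list from the proposed rule (',' assumed to be included).
STRICT_SET = set(";?:@&=+$%,")
# Reduced list for path segments: only ; ? = remain, plus '%' itself.
RELAXED_SET = set(";?=%")

def chars_needing_manual_escape(component, relaxed=False):
    """Return the sorted characters in `component` the user must %hh-escape."""
    reserved = RELAXED_SET if relaxed else STRICT_SET
    return sorted({c for c in component if c in reserved})

print(chars_needing_manual_escape("v1.0:rc@home"))                # [':', '@']
print(chars_needing_manual_escape("v1.0:rc@home", relaxed=True))  # []
```

Anything accepted under the full set is still accepted under the reduced one,
which is why relaxing the rule later would not break existing usage.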

We could be smarter, like escaping a percent sign not followed by two
hex digits, but *this* is where the rules start to get confusing.
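
A sketch of what that "smarter" behaviour might look like in Python, only to
show where the confusion comes from (escape_stray_percent is a hypothetical
name, not an existing function):

```python
import re

# A '%' that is NOT followed by two hex digits cannot be an escape sequence.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def escape_stray_percent(url):
    """Turn every stray '%' into '%25'; leave valid %hh sequences alone."""
    return STRAY_PERCENT.sub("%25", url)

print(escape_stray_percent("100%"))    # 100%25
print(escape_stray_percent("a%20b"))   # a%20b
print(escape_stray_percent("100%25"))  # 100%25
```

The last call shows the ambiguity: a file literally named 100%25 is left as
is and would later be decoded as 100%, so the user cannot tell from the rule
alone whether a given percent sign will be escaped.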

I *really* can't see the problem with this at all. If we take I18N
seriously, we need to do this IMO.

Regards,
//Peter

Received on Sun Jun 6 18:47:11 2004

This is an archived mail posted to the Subversion Dev mailing list.
