On Fri, 11 Jun 2004, Pierre THIERRY wrote:
> No need of my 'knob' if we have a magical way of knowing when and what
> to encode without any hint from the user. I'm pretty impatient of
> discovering the guessing algorithm...
>
I don't know what guessing algorithm you are referring to. I'm not talking
about guessing anything. You should read up on IRIs. There is a draft for
an RFC by Martin Dürst. I don't have the URL handy, but you should be able
to find it via W3C or IETF.

As I explained earlier, to support IRIs, we just need to %-escape bytes
between 0x80 and 0xff. We already have the input in UTF-8. One concern is
that the IRI draft specifies that when converting from other encodings to
UTF-8, Unicode Normalization Form C should be applied. I don't know if
apr-iconv can guarantee this. In any case, we can't do this normalization
in the IRI -> URI function, since the spec also says that if the input is
already in a Unicode encoding (which it is if the user uses a UTF-8
locale, for example), we shall not do any normalization. But this isn't
about guessing, so let's leave it for now. So, IRI support is
straightforward.
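
As a rough sketch of the escaping step described above (Python for
brevity; the function names iri_to_uri and decode_and_normalize are
hypothetical and are not Subversion, APR or apr-iconv API):

```python
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Percent-escape every byte in the range 0x80-0xff of the UTF-8 form.

    ASCII bytes, including existing %-escapes, pass through untouched.
    No Unicode normalization is done here, matching the point above that
    input already in a Unicode encoding must not be normalized.
    """
    out = []
    for byte in iri.encode("utf-8"):
        if byte >= 0x80:
            out.append("%{:02X}".format(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


def decode_and_normalize(raw: bytes, encoding: str) -> str:
    """Assumed separate step for input in a non-Unicode encoding:
    decode it, then apply Normalization Form C, before iri_to_uri()."""
    return unicodedata.normalize("NFC", raw.decode(encoding))
```

For example, iri_to_uri("http://example.org/D\u00fcrst") gives
"http://example.org/D%C3%BCrst". Keeping the normalization in its own
function is only one way to keep it out of the IRI -> URI conversion.
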

Then, I also proposed to automatically escape some characters that are
always illegal in a URI. For example, space is illegal, so if we
encounter a space, we can just encode it as %20. Then, I suggested that we
can be smarter than that. For example, when we are in the path
component, [] are *not* reserved (they are in the earlier parts, for IPv6
addresses), so there we can escape them as well. This is no guessing,
since we can always parse the URI. But it is more complex, and may seem
less predictable to the user, so whether this is desirable can be debated
further.
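
A minimal sketch of both ideas, again in Python. The ALWAYS_ILLEGAL set
and the way the path start is located are assumptions made for
illustration (everything after the authority is treated as "path",
including any query), and escape_illegal is a hypothetical name:

```python
# Characters assumed to be illegal anywhere in a URI. '#' and '%' are left
# out on purpose because they carry meaning (fragment, existing escapes).
ALWAYS_ILLEGAL = set(' "<>\\^`{|}')


def escape_illegal(uri: str) -> str:
    """Escape always-illegal characters everywhere, and '[' / ']' only
    once we are past the authority part (where they delimit IPv6)."""
    scheme_end = uri.find("://")
    if scheme_end == -1:
        path_start = 0  # no authority: treat the whole string as a path
    else:
        path_start = uri.find("/", scheme_end + 3)
        if path_start == -1:
            path_start = len(uri)  # authority only, no path at all

    out = []
    for i, ch in enumerate(uri):
        is_control = ord(ch) < 0x20 or ord(ch) == 0x7F
        if ch in ALWAYS_ILLEGAL or is_control:
            out.append("%{:02X}".format(ord(ch)))
        elif ch in "[]" and i >= path_start:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)
```

With this, escape_illegal("http://[::1]/my file[1].txt") returns
"http://[::1]/my%20file%5B1%5D.txt": the brackets around the IPv6 address
are kept, while those in the path are escaped.
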

Maybe this clears things up, or I (and Greg) have missed something.
Regards,
//Peter
Received on Fri Jun 11 14:26:36 2004