Re: [RFC] Canonical Paths

From: Branko Čibej <brane_at_xbc.nu>
Date: 2002-08-29 20:37:12 CEST

Marcus Comstedt wrote:

>Here comes the trick. Notice that this range includes the range
>[0 .. 127], the ASCII characters. (In fact all UTF-8 multibyte
>escapes have a range which includes ASCII, since they all start at 0.)
>That is, although an ASCII character such as '/' is normally encoded
>as its ASCII representation (00101111), we could instead encode it as
>11000000 10101111, which would then be a kind of _escaped_ '/',
>distinguishable from (in fact completely unrelated to if you just look
>at single octets) a normal '/' used as path separator. In the same
>way, we could encode the problematic NUL character as 11000000
>10000000. In fact, this is exactly what Java does to NUL characters
>when storing them in UTF-8 strings, so there exists a precedent of
>using a scheme like this.
>

I disagree strongly. First, this "denormalized" representation is not
valid UTF-8. And second, looking for a two-byte sequence is a pain.
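
To illustrate the first point, here is a minimal sketch in C. The helper
names are hypothetical and are not Subversion APIs. It shows that a naive
two-byte decoder turns 11000000 10101111 (0xC0 0xAF) back into '/', and
that a strict check can flag such overlong (non-shortest-form) sequences,
which is why a conforming UTF-8 decoder would be expected to reject them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Subversion API: returns 1 if the bytes at p
   begin with an overlong (non-shortest-form) two-byte UTF-8 sequence. */
static int is_overlong_2byte(const unsigned char *p, size_t len)
{
    assert(p != NULL);
    if (len < 2)
        return 0;
    /* Lead bytes 0xC0 and 0xC1 can only encode the values 0..127,
       which already have a one-byte form. */
    if (p[0] != 0xC0 && p[0] != 0xC1)
        return 0;
    /* The second byte must be a continuation byte (10xxxxxx). */
    return (p[1] & 0xC0) == 0x80;
}

/* Naive decode of a two-byte sequence with no shortest-form check:
   5 payload bits from the lead byte, 6 from the continuation byte. */
static unsigned naive_decode_2byte(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned)(p[0] & 0x1F) << 6) | (unsigned)(p[1] & 0x3F);
}
```

Every scan for the "escaped" separator would have to look at byte pairs in
this way, rather than comparing a single octet.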

Now, if you want to do tricks like that: there are two single bytes that
are invalid in UTF-8: these are 0xfe (11111110) and 0xff (11111111), and
they also happen to cooperate quite happily with the C string functions.
We could use one of those as the canonical path separator.
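
As a rough sketch of what that could look like, assuming 0xff is the byte
chosen (PATH_SEP and both function names are hypothetical, not existing
Subversion code): because the separator is a single non-NUL byte, the
ordinary strchr() and strrchr() calls find it directly, and a literal '/'
inside a component is left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical separator choice: a byte that never occurs in UTF-8 text. */
#define PATH_SEP '\xff'

/* Count the components of a path whose components are separated by
   PATH_SEP. An empty path has zero components. */
static size_t count_components(const char *path)
{
    size_t n = 0;
    assert(path != NULL);
    while (*path != '\0') {
        const char *sep = strchr(path, PATH_SEP);
        n++;
        if (sep == NULL)
            break;
        path = sep + 1;   /* continue after the separator */
    }
    return n;
}

/* Return a pointer to the last component of the path. */
static const char *last_component(const char *path)
{
    const char *sep;
    assert(path != NULL);
    sep = strrchr(path, PATH_SEP);
    return sep != NULL ? sep + 1 : path;
}
```

For example, the path written in C as "a" "\xff" "b/c" "\xff" "d" has three
components under this scheme, and the middle one keeps its '/'. The string
literal is split into pieces so that the hex escape does not swallow the
following letter.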

I hope you do realize, of course, that you can't have '/'s in paths
anyway, because we still have to be able to generate valid URLs, and you
can't replace the path separator there.

-- 
Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/

Received on Thu Aug 29 20:38:08 2002
