Re: [RFC] Canonical Paths

From: Branko Čibej <brane_at_xbc.nu>
Date: 2002-08-29 20:37:12 CEST

Marcus Comstedt wrote:

>Here comes the trick. Notice that this range includes the range
>[0 .. 127], the ASCII characters. (In fact all UTF-8 multibyte
>escapes have a range which includes ASCII, since they all start at 0.)
>That is, although an ASCII character such as '/' is normally encoded
>as its ASCII representation (00101111), we could instead encode it as
>11000000 10101111, which would then be a kind of _escaped_ '/',
>distinguishable from (in fact completely unrelated to if you just look
>at single octets) a normal '/' used as path separator. In the same
>way, we could encode the problematic NUL character as 11000000
>10000000. In fact, this is exactly what Java does to NUL characters
>when storing them in UTF-8 strings, so there exists a precedent of
>using a scheme like this.

I disagree strongly. First, this "denormalized" representation is not
valid UTF-8. And second, looking for a two-byte sequence is a pain.
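
To illustrate the first objection: a minimal sketch, assuming a decoder that enforces UTF-8's shortest-form rule. Under that rule the lead bytes 0xC0 and 0xC1 can only begin an overlong two-byte encoding of an ASCII character, so a conforming decoder would reject the proposed "escaped" '/' (0xC0 0xAF). The function name has_overlong_2byte is hypothetical and not part of any Subversion or APR API.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) contains a two-byte
 * overlong form, i.e. a lead byte 0xC0 or 0xC1 followed by a
 * continuation byte (10xxxxxx). Such a pair would decode to a value
 * in 0..127, which shortest-form UTF-8 must encode as a single byte.
 * Returns 0 otherwise. This is not a full UTF-8 validator. */
int has_overlong_2byte(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i++) {
        if ((buf[i] == 0xC0 || buf[i] == 0xC1)
            && (buf[i + 1] & 0xC0) == 0x80)
            return 1;
    }
    return 0;
}
```

Note that the check has to look at byte pairs, which is the second objection in miniature.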

Now, if you want to do tricks like that: there are two single bytes that
are invalid in UTF-8: these are 0xfe (11111110) and 0xff (11111111), and
they also happen to cooperate quite happily with the C string functions.
We could use one of those as the canonical path separator.
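
As a rough sketch of what that could look like, assuming 0xff is the byte chosen: because it is a single non-NUL byte, plain strchr() can find it, and a literal '/' inside a component is left alone. PATH_SEP and count_components are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed separator: 0xff, a byte that never occurs in valid UTF-8. */
#define PATH_SEP '\xff'

/* Hypothetical helper: counts the components of a NUL-terminated path
 * whose components are separated by PATH_SEP. A path with no separator
 * (including the empty string) counts as one component. */
size_t count_components(const char *path)
{
    size_t n = 1;
    const char *p = path;

    /* strchr converts its second argument to char, so it matches
     * the raw 0xff byte regardless of whether char is signed. */
    while ((p = strchr(p, PATH_SEP)) != NULL) {
        n++;
        p++;    /* step past the separator just found */
    }
    return n;
}
```

For example, the path "a\xff" "b/c\xff" "d" has three components under this scheme, with "b/c" kept as a single name.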

I hope you do realize, of course, that you can't have '/'s in paths
anyway, because we still have to be able to generate valid URLs, and you
can't replace the path separator there.

Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
Received on Thu Aug 29 20:38:08 2002

This is an archived mail posted to the Subversion Dev mailing list.