Branko Čibej <brane@xbc.nu> writes:
> I disagree strongly. First, this "denormalized" representation is not
> valid UTF-8. And second, looking for a two-byte sequence is a pain.
But that's just the thing. You don't have to look for it. You still
look for the single octet 0x2f as the directory separator. The
escaping of non-path-separating slashes into a two-byte sequence is
precisely so that you _don't_ find them when looking for path
separators. See?
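A minimal C sketch of the escaping idea. It assumes the two-byte sequence is the overlong encoding 0xC0 0xAF, which a lenient decoder would turn back into 0x2F; the message does not name the exact bytes, so that choice is an assumption. The helper names escape_component and count_separators are hypothetical. Because neither escaped byte equals 0x2F, a plain strchr() scan for '/' only ever hits real separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one path component into OUT, replacing every
   literal '/' with the overlong two-byte sequence 0xC0 0xAF (assumed).
   OUT must have room for 2 * strlen(name) + 1 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
static size_t escape_component(const char *name, char *out)
{
    size_t n = 0;

    assert(name != NULL && out != NULL);
    for (; *name != '\0'; name++) {
        if (*name == '/') {
            out[n++] = (char)0xC0;  /* overlong lead byte */
            out[n++] = (char)0xAF;  /* continuation byte carrying 0x2F */
        } else {
            out[n++] = *name;
        }
    }
    out[n] = '\0';
    return n;
}

/* Count path separators. Only real 0x2F bytes are found, because
   neither byte of the escaped sequence is 0x2F. */
static size_t count_separators(const char *path)
{
    size_t count = 0;
    const char *p = path;

    while ((p = strchr(p, '/')) != NULL) {
        count++;
        p++;
    }
    return count;
}
```

Joining "dir/" with the escaped form of a file literally named "a/b" gives a path that still contains exactly one separator.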
And the fact that it's not "valid UTF-8" per se is also a plus: it means
we won't accidentally produce that sequence when encoding something else.
It's only used internally anyway. And as I said, Java does the same
thing.
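Why the escaped sequence cannot collide with real text: a shortest-form UTF-8 encoder never emits the lead bytes 0xC0 or 0xC1, because every code point below U+0080 is encoded as a single byte. The sketch below checks that exhaustively; utf8_encode and utf8_can_emit are illustrative names, not an existing API. For reference, Java's "modified UTF-8" (used by DataInput/DataOutput and JNI) encodes U+0000 as the overlong pair 0xC0 0x80, which is presumably the Java precedent meant here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard shortest-form UTF-8 encoder. Returns the number of bytes
   written to OUT, or 0 for surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* lead byte is 0xC2..0xDF */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Returns 1 if encoding any Unicode scalar value ever emits byte B. */
static int utf8_can_emit(unsigned char b)
{
    unsigned char buf[4];
    uint32_t cp;

    for (cp = 0; cp <= 0x10FFFF; cp++) {
        size_t n = utf8_encode(cp, buf), i;
        /* Only surrogates should fail to encode. */
        assert(n > 0 || (cp >= 0xD800 && cp <= 0xDFFF));
        for (i = 0; i < n; i++)
            if (buf[i] == b)
                return 1;
    }
    return 0;
}
```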
> Now, if you want to do tricks like that: there are two single bytes
> that are invalid in UTF-8: these are 0xfe (11111110) and 0xff
> (11111111), and they also happen to cooperate quite happily with the C
> string functions. We could use one of those as the canonical path
> separator.
Yes, but I don't see why it would be better. In my opinion, a
sequence that actually decodes as '/' would be the natural choice.
Are we concerned about saving one byte of memory somewhere?
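For comparison, a sketch of the quoted alternative, assuming 0xFF is used as the internal separator; INTERNAL_SEP and count_components are hypothetical names. Since 0xFF is not NUL, strchr() handles it like any other byte, and a literal '/' inside a component needs no escaping at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal separator from the quoted proposal: 0xFF never
   occurs in valid UTF-8 and is not NUL, so C string functions accept it. */
#define INTERNAL_SEP '\xff'

/* Count the components of a path whose components are joined with SEP.
   Any other byte, including '/', is ordinary component data. */
static size_t count_components(const char *path, char sep)
{
    size_t count = 1;
    const char *p = path;

    assert(path != NULL && sep != '\0');
    while ((p = strchr(p, sep)) != NULL) {
        count++;
        p++;
    }
    return count;
}
```

The internal path "a/b" 0xFF "c/d" has two components under this scheme but three if split on '/'. The trade-off is presumably that every such path has to be translated before it is shown to a user or put into a URL, whereas with the escaped-slash scheme ordinary paths stay byte-identical to their display form.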
> I hope you do realize, of course, that you can't have '/'s in paths
> anyway, because we still have to be able to generate valid URLs, and
> you can't replace the path separator there.
"foo%2fbar" is a valid (relative) URL for a path with '/' in it.
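The same split-then-encode discipline applies when generating URLs: percent-encode each path component on its own, then join the encoded components with literal '/'. A minimal sketch, where encode_segment is a hypothetical helper that passes only the RFC 3986 "unreserved" characters through and escapes everything else. That is stricter than strictly necessary, but safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Percent-encode one path segment. Unreserved characters
   (ALPHA, DIGIT, '-', '.', '_', '~') pass through; every other byte,
   including '/', becomes %XX. OUT needs 3 * strlen(seg) + 1 bytes. */
static void encode_segment(const char *seg, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(seg != NULL && out != NULL);
    for (; *seg != '\0'; seg++) {
        unsigned char c = (unsigned char)*seg;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
            c == '_' || c == '~') {
            out[n++] = (char)c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
}
```

This emits uppercase hex digits, as RFC 3986 recommends, so "foo/bar" becomes foo%2Fbar; %2f and %2F are equivalent.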
It can be observed that the ftp scheme is (or at least was, I haven't
checked in the latest RFCs) specified such that a request for
ftp://ftp.example.com/foo%2fbar/hi%2fho/away%2fwe_go;type=i
should have the operational semantics of
1) connect to ftp.example.com (and log in anonymously)
2) cd to "foo/bar"
3) cd to "hi/ho"
4) get "away/we_go" (in binary mode)
which is pretty much isomorphic to what we'd want for filenames with
'/' in them ("cd to" would correspond to "follow the tree one level
downwards along an arbitrarily named edge"; these operations would be
carried out by the server rather than the client).
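The FTP walk-through can be sketched as a parser for the url-path part (everything after the host's "/"), assuming the RFC 1738 layout cwd1/.../cwdN/name;type=X. The essential step is splitting on '/' before percent-decoding, so a %2f inside a segment never becomes a separator. The names ftp_plan, percent_decode and hexval are hypothetical, and the sketch only plans the CWD and RETR steps; it does not talk to a server.

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if C is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes in place. Returns 0 on a malformed escape or %00. */
static int percent_decode(char *s)
{
    char *w = s;

    while (*s != '\0') {
        if (*s == '%') {
            int hi = hexval(s[1]);
            int lo = (hi < 0) ? -1 : hexval(s[2]);
            if (lo < 0 || hi * 16 + lo == 0)
                return 0;
            *w++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
    return 1;
}

/* Hypothetical planner for an ftp url-path. Splits on '/' FIRST, then
   decodes each segment. Fills CWD with up to MAX_CWD directory arguments,
   sets *FILE to the final name and *TYPE to the type code ('\0' if absent).
   Modifies BUF in place; returns the number of CWD steps, or -1. */
static int ftp_plan(char *buf, char *cwd[], int max_cwd,
                    char **file, char *type)
{
    char *semi, *slash, *seg = buf;
    int n = 0;

    assert(buf != NULL && cwd != NULL && file != NULL && type != NULL);
    *type = '\0';
    semi = strstr(buf, ";type=");
    if (semi != NULL) {
        *type = semi[6];
        *semi = '\0';              /* strip the ;type= suffix */
    }
    while ((slash = strchr(seg, '/')) != NULL) {
        *slash = '\0';
        if (n == max_cwd || !percent_decode(seg))
            return -1;
        cwd[n++] = seg;            /* one CWD per segment */
        seg = slash + 1;
    }
    if (!percent_decode(seg))
        return -1;
    *file = seg;                   /* last segment is the RETR argument */
    return n;
}
```

For the example URL's path, foo%2fbar/hi%2fho/away%2fwe_go;type=i, this yields the CWD arguments "foo/bar" and "hi/ho", the file name "away/we_go", and the type code 'i' (image, i.e. binary).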
// Marcus
---------------------------------------------------------------------
Received on Thu Aug 29 21:50:18 2002