Re: [RFC] Canonical Paths

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-08-29 21:49:00 CEST

Branko Čibej <brane@xbc.nu> writes:

> I disagree strongly. First, this "denormalized" representation is not
> valid UTF-8. And second, looking for a two-byte sequence is a pain.

But that's just the thing. You don't have to look for it. You still
look for the single octet 0x2f as the directory separator. The
escaping of non-path-separating slashes into a two-byte sequence is
precisely so that you _don't_ find them when looking for path
separators. See?

And that it's not "valid UTF-8" per se is also a plus: it means that
we won't accidentally get that sequence when encoding something else.
It's only used internally anyway. And as I said, Java does the same
thing.
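
A minimal sketch of the escaping idea in Python. It assumes the two-byte
sequence is the overlong form 0xC0 0xAF (the bytes are not spelled out in
this message), and the helper names join_path and split_path are
hypothetical, not Subversion functions.

```python
# Assumption: a literal '/' inside a name is stored internally as the
# overlong two-byte form 0xC0 0xAF. The message does not give the bytes.
ESCAPED_SLASH = b"\xc0\xaf"

def join_path(components):
    """Join component names (str) into one internal byte path."""
    # In UTF-8 the octet 0x2f only ever encodes '/', so a byte-level
    # replace cannot damage a multi-byte character.
    escaped = [c.encode("utf-8").replace(b"/", ESCAPED_SLASH) for c in components]
    return b"/".join(escaped)

def split_path(path):
    """Split on the single octet 0x2f, then undo the escape per component."""
    # 0xC0 never occurs in valid UTF-8, so the escape is unambiguous.
    return [p.replace(ESCAPED_SLASH, b"/").decode("utf-8") for p in path.split(b"/")]

internal = join_path(["foo/bar", "hi/ho", "away/we_go"])
# Only the two real separators are found when scanning for 0x2f.
assert internal.count(b"/") == 2
assert split_path(internal) == ["foo/bar", "hi/ho", "away/we_go"]
```

The split never looks for the two-byte sequence; it only appears when a
single component is converted back for display or for the filesystem.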

> Now, if you want to do tricks like that: there are two single bytes
> that are invalid in UTF-8: these are 0xfe (11111110) and 0xff
> (11111111), and they also happen to cooperate quite happily with the C
> string functions. We could use one of those as the canonical path
> separator.

Yes, but I don't see why it would be better. In my opinion, a
sequence that actually decodes as '/' would be the natural choice.
Are we concerned about saving one byte of memory somewhere?
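
Both candidates can be checked against a strict decoder. The sketch below
uses Python's built-in UTF-8 codec and again assumes the "sequence that
decodes as '/'" is the overlong pair 0xC0 0xAF; lenient_two_byte_decode is
a hypothetical helper that shows what a non-strict decoder would compute.

```python
def is_valid_utf8(data):
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def lenient_two_byte_decode(b0, b1):
    """Decode a 110xxxxx 10xxxxxx pair without rejecting overlong forms."""
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))

# None of the three candidates is valid UTF-8, so none can be produced
# by encoding ordinary text.
assert not is_valid_utf8(b"\xfe")
assert not is_valid_utf8(b"\xff")
assert not is_valid_utf8(b"\xc0\xaf")

# Only the overlong pair carries the code point of '/' in its payload bits.
assert lenient_two_byte_decode(0xC0, 0xAF) == "/"
```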

> I hope you do realize, of course, that you can't have '/'s in paths
> anyway, because we still have to be able to generate valid URLs, and
> you can't replace the path separator there.

"foo%2fbar" is a valid (relative) URL for a path with '/' in it.
It can be observed that the ftp scheme is (or at least was, I haven't
checked in the latest RFCs) specified such that a request for

ftp://ftp.example.com/foo%2fbar/hi%2fho/away%2fwe_go;type=i

should have the operational semantics of

1) connect to ftp.example.com (and log in anonymously)
2) cd to "foo/bar"
3) cd to "hi/ho"
4) get "away/we_go" (in binary mode)

which is pretty much isomorphic to what we'd want for filenames with
'/' in them ("cd to" would correspond to "follow the tree one level
downwards along an arbitrarily named edge"; these operations would be
carried out by the server rather than the client).
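
The mapping from that URL to the four steps can be sketched with Python's
urllib.parse. The function name ftp_operations and the tuple layout are
hypothetical; the point is only the order of operations: split on the
literal '/' first, and decode %2f inside each segment afterwards.

```python
from urllib.parse import urlsplit, unquote

def ftp_operations(url):
    """Turn an ftp URL into the list of operations described above."""
    parts = urlsplit(url)
    # urlsplit leaves ";type=i" attached to the path; peel it off here.
    path, _, params = parts.path.partition(";")
    # Split on real separators, then percent-decode each segment on its own.
    names = [unquote(seg) for seg in path.lstrip("/").split("/")]
    ops = [("connect", parts.hostname)]
    ops += [("cd", name) for name in names[:-1]]
    mode = "binary" if params == "type=i" else "default"
    ops.append(("get", names[-1], mode))
    return ops

url = "ftp://ftp.example.com/foo%2fbar/hi%2fho/away%2fwe_go;type=i"
for op in ftp_operations(url):
    print(op)
```

Decoding the whole path before splitting would turn every %2f into a
separator and lose the distinction the escaping is meant to preserve.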

   // Marcus

Received on Thu Aug 29 21:50:18 2002

This is an archived mail posted to the Subversion Dev mailing list.