I'm considering some improvements to the "dirent/uri" API.
Lieven and Bert made some good moves this year towards untangling the
old "svn_path" APIs that tried to treat all kinds of paths the same way.
We needed to separate out the handling of local disk paths from the
handling of URLs, because even with our "Subversion internal form" they
still need to follow different rules. So came the "dirent/URI" API.
A good part of the concept is that URIs and native paths have to be
treated separately, but both kinds end with a series of path
"components" that can be added on, taken off, or copied from a native
path to a URL or from a URL to a native path. The functions for adding
and subtracting relpaths to and from each kind of path are (in a sense)
the core of the API.
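
To make the "take off here, add on there" idea concrete, here is a rough sketch in Python rather than in Subversion's C. The function names (url_skip_ancestor, dirent_join, url_join) are made up for illustration and are not the real svn_dirent_uri functions. The sketch assumes URLs are URI-encoded with no trailing slash and that local paths are kept in "/"-separated form.

```python
import posixpath
from urllib.parse import quote, unquote


def url_skip_ancestor(base_url, url):
    """Return the relpath of url below base_url, or None if it is not below it.

    Hypothetical helper: both URLs are assumed to be URI-encoded and to
    have no trailing slash.
    """
    if url == base_url:
        return ""
    prefix = base_url + "/"
    if not url.startswith(prefix):
        return None
    # Decode so the relpath is in a neutral form, not tied to URL syntax.
    return unquote(url[len(prefix):])


def dirent_join(base, relpath):
    """Add the relpath's components onto a '/'-separated local path."""
    return posixpath.join(base, relpath) if relpath else base


def url_join(base_url, relpath):
    """Add the relpath's components onto a URL, URI-encoding each one."""
    if not relpath:
        return base_url
    encoded = "/".join(quote(part, safe="") for part in relpath.split("/"))
    return base_url + "/" + encoded
```

The same relpath taken off one URL can then be added to either a working-copy path or another URL, with each kind applying its own rules (encoding for URLs, none for local paths).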
I did some thinking on the way home from SubConf. If we were writing the
code in a high-level language with a good cross-platform library of
support for URLs/URIs and operating-system native paths, how would we
expect it to behave? Let's try to define and provide the high-level
behaviour that we would like.
There still seems to be a lack of crispness about the new path kinds in
the new "svn_dirent_uri" API.
One meta-comment is that I feel the low-level path and URL functions
should be coming straight from APR (or other such support library)
whereas we seem to have written many of our own.
A "relpath" represents "an unrooted path that can be joined to any other
relative path, uri or dirent". Good, but let's specify it more
precisely. The terms "absolute" and "relative" are not clearly defined
when applied to partially-relative paths such as a Windows
"\rel-to-current-drive" or a URL "/rel-to-server".
A "dirent" represents a native operating-system path... but let's be
clear exactly what kinds of absolute and relative path this includes.
The representation seems a bit odd, using Subversion's "canonical path"
rules ("/" separator, etc.), rather than the native form, and so
requiring "to_internal_style" and "to_native_style" conversions. That is
a legacy from trying to use a single set of path functions for all kinds
of paths. There are certain benefits, mostly to do with being able to
construct paths manually by writing "foo/bar" in tests and so on. I don't
think we necessarily need to change this, but it seems like we might be
making problems by trying to store native paths in a non-native form.
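
A much-simplified Python sketch of what such a pair of conversions does, assuming the only differences between the two forms are the separator and a trailing separator. The real functions do more than this, and the native_sep parameter is an assumption added so the example runs on any platform.

```python
def to_internal_style(native_path, native_sep="\\"):
    """Convert a native path to a '/'-separated form (simplified sketch)."""
    path = native_path.replace(native_sep, "/")
    # Drop a trailing separator, but keep a bare root such as "/" or "C:/".
    if len(path) > 1 and path.endswith("/") and not path.endswith(":/"):
        path = path[:-1]
    return path


def to_native_style(internal_path, native_sep="\\"):
    """Convert a '/'-separated path back to the native separator."""
    return internal_path.replace("/", native_sep)
```

Even in this toy version the round trip is not exact: a Windows path written with mixed separators, such as "dir/sub\file", comes back as "dir\sub\file". That is the kind of small mismatch that storing native paths in a non-native form can introduce.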
And I don't much care for the name "dirent" :-) To me, "directory
entry" implies a single path component, and also implies status info
about the directory entry.
A "uri" in this API represents an "absolute path that starts with a '/'
or a schema definition"... which is gratuitously specialized, compared
with the official definition of a URI.
We use URLs a lot and rarely need to use more general URIs, so I think
the API should be geared specifically to URLs.
It is not clear whether the representation of a URI is URI-encoded. The
API should make a clear promise. I think it should be, both because
that's a valuable part of the utility of a URL API, and because it seems
unlikely to be possible to fully support URL manipulations without them
being URI-encoded. (The sort of thing that springs to mind that simply
would not work is if my password contains a "/" and I try to represent
"http://username:firstname.lastname@example.org/" without URI-encoding.)
These are some changes I'd like to make. Comments solicited.
RELATIVE PATHS:
* A RELPATH should represent a generic "path", not tied to being
interpreted as a URL path or an OS path, but freely able to be
interpreted as either. To convert a RELPATH to a relative URL or a
relative native path (dirent), we should always call an appropriate API
function, even if that doesn't change its in-memory representation. This
will allow us to decouple the representation from the API. (Actually
making such a change would in some places require extra function calls,
and so if there are more than a very few such places we may not want to
do it until we anticipate a particular benefit.)
* A RELPATH should always represent a forward path, with no
back-segments (".."), because a forward path is a nice clean concept and
is all we need to convert between Subversion local disk paths and
Subversion URLs. There will of course be some API available for
interpreting a relative path that might contain "..", but I think such a
path is nearly always user input, and we always know whether it is a URL
or a native OS path, and its interpretation always involves high-level
decisions about how to handle the "going too far back" case and the
ambiguity of whether "../foo" == ".". Therefore that interpretation
should be outside the scope of the defined "RELPATH" concept.
URLS:
* Define and name functions for URLs instead of URIs.
* The representation of a URL should always be URI-encoded.
* A URL shall be defined either as a full URL starting with a scheme, or
as an RFC-defined relative URL. Either definition would be better than
the current specification that it must start with a scheme or with "/".
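
The RELPATH and URL points above could be sketched like this in Python. RelPath and its methods are hypothetical names, not a proposal for the actual C signatures. The sketch assumes a RELPATH is stored unencoded and that conversion to a URL path or an OS path always goes through an explicit call.

```python
from urllib.parse import quote


class RelPath:
    """Hypothetical forward-only relative path such as 'a/b/c'.

    An empty string is the empty relpath. A leading or trailing '/',
    an empty component, '.' and '..' are all rejected.
    """

    def __init__(self, text):
        components = text.split("/") if text else []
        for comp in components:
            if comp in ("", ".", ".."):
                raise ValueError("not a forward relpath: %r" % text)
        self.components = tuple(components)

    def as_url_path(self):
        """Interpret as a relative URL path, URI-encoding each component."""
        return "/".join(quote(comp, safe="") for comp in self.components)

    def as_os_path(self, sep="/"):
        """Interpret as a relative native path using the given separator."""
        for comp in self.components:
            if sep in comp:
                raise ValueError("component contains separator: %r" % comp)
        return sep.join(self.components)
```

For example, RelPath("docs/my notes.txt") would give "docs/my%20notes.txt" as a URL path and "docs\my notes.txt" as a Windows path, while RelPath("../foo") raises an error and leaves the "going too far back" decision to a higher-level caller.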
NATIVE OPERATING-SYSTEM PATHS:
* Specify that an OSPATH can be absolute or relative or partially
relative, and that "relative" doesn't necessarily mean relative to the
process's CWD/current drive but relative to whatever its user wants it
to be relative to. Therefore, in convert-to-abs functions, the caller
should be able to specify (or the function doc string should state) what
it's relative to.
* (Advanced.) An OSPATH object should know whether it is case-sensitive.
The default would be according to the platform it's running on, but
different file systems have different case-sensitivity so eventually if
we want to get better at handling such issues we'll need this. I'm not
planning to do this. However it is an example of how we may need to
encapsulate the path in an object rather than always represent it as a
plain string.
* (Trivial.) Rename DIRENT to OSPATH. Alternative: FILEPATH, as used in
APR. But such a rename is the least of my concerns, and only makes sense
as a companion to changing the semantics.
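
A small Python sketch of the first two OSPATH points, using made-up function names and "/"-separated paths for simplicity. The caller states what a relative path is relative to, and comparison takes case-sensitivity as an explicit input. Using casefold() is only an approximation of how a case-insensitive file system compares names.

```python
import posixpath


def ospath_to_absolute(path, base):
    """Resolve path against an explicit absolute base, never the process CWD."""
    if not posixpath.isabs(base):
        raise ValueError("base must be absolute: %r" % base)
    # posixpath.join() keeps 'path' unchanged if it is already absolute.
    return posixpath.normpath(posixpath.join(base, path))


def ospath_equal(a, b, case_sensitive):
    """Compare two paths under the stated case-sensitivity rule."""
    if case_sensitive:
        return a == b
    return a.casefold() == b.casefold()
```

With an object-based OSPATH, the case_sensitive flag would presumably live in the object instead of being passed on every call.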
I think the changes would not negate the work currently being done to
move to the current new APIs. Even if the same calls need to be changed
again, the current work in discovering and distinguishing what kind of
paths are being handled will make that next step easier.
This all sounds like a lot, but I hope we can do something towards it.
Does it make sense?
Received on 2009-11-11 13:26:20 CET