
Re: URI-encoding on 1.7 repository?

From: Garret Wilson <garret_at_globalmentor.com>
Date: Fri, 20 Jan 2012 19:28:01 -0800

On 1/20/2012 7:00 PM, Daniel Shahaf wrote:
> Garret Wilson wrote on Fri, Jan 20, 2012 at 18:18:24 -0800:
>> On 1/20/2012 6:14 PM, Daniel Shahaf wrote:
>>> You don't care what FS backend the server runs. All you care is
>>> that the endpoint of svn_ra_open4() implements the Subversion RA
>>> API properly. Normal Subversion servers use svn_fs.h which in turn
>>> presents the same API _regardless of which backend is used_. I'll
>>> spell it out: the notion of 'valid pathname in a Subversion
>>> filesystem' does not depend on the FS backend in use.
>> All that is good news. So I guess the important question is: what
>> spells out "the notion of 'valid pathname on a Subversion
>> filesystem'"? Is it "any valid Unicode code point?" What I'm getting
> See my previous reply.

Right. So your previous reply said that a "valid pathname" is the same
on all platforms, and that the underlying implementation will take care
of the details. I'm asking what the rules for a "valid pathname" are.
I'm glad that these rules are the same across all platforms, but I don't
know what the rules are. In other words, what goes in the following
function?

boolean isValidSubversionPathname(String pathname);

>
>> at is that I need to know which characters, if any, I need to encode
>> before passing them to Subversion. If Subversion supports any
>> Unicode character, I can just pass the path decoded and sleep
>> soundly at night. If not, I need to know which ones to decode and
>> which ones to pass through.
> Err, that depends on what API layer you're working with. (For example:
> svn_fs.h is perfectly happy with :,*,\n as part of the basename, but
> libsvn_wc on windows, and the mergeinfo logic, aren't.)

Oh, that's bad news. In your previous reply you said, "the notion of
'valid pathname in a Subversion filesystem' does not depend on the FS
backend in use." Now you seem to be saying, "whether a pathname is valid
depends on whether you're on Windows or some other platform." (Even
worse, you seem to be saying that the notion of "valid pathname" isn't
even consistent across the API.)

> And 'what to encode/decode' is a rather vague question. I'm not sure if
> it means "Does `svn info uri:///foo bar` == `svn info uri:///foo%20bar`?"
> or something else. Can you be more concrete?

It doesn't matter. It's some black box that works like this:

String encode(String input);
String decode(String output);

I can come up with a thousand ways to encode/decode. I can use %hh. I
can use ^0xhh. The only two requirements are that 1) encode() provides
me with a string guaranteed to be a valid pathname, and 2) decode() will
take the encoded string and give me back the decoded string I started with.
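
To make the black box concrete, here is a minimal sketch of one such
encode()/decode() pair in Java, using %hh escapes over UTF-8 bytes. The
class name PathCodec and, above all, the SAFE character set are
assumptions made for illustration: nothing here reflects what Subversion
actually accepts, which is exactly the open question.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of any Subversion API.
final class PathCodec {
    // ASSUMPTION: a deliberately conservative "safe" set chosen for this
    // sketch. The real set of valid pathname characters is unknown here.
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._/";

    // Requirement 1: the result contains only SAFE characters plus '%'.
    static String encode(String input) {
        StringBuilder sb = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && SAFE.indexOf((char) v) >= 0) {
                sb.append((char) v);
            } else {
                // '%' itself is not in SAFE, so it is escaped as %25.
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // Requirement 2: decode(encode(s)) returns s for any well-formed string.
    static String decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The round trip holds for any string without unpaired surrogates, but
whether the output of encode() is a "valid pathname" still depends
entirely on whether the assumed SAFE set is correct.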

But to meet requirement #1, I have to know which characters are
considered valid and which aren't. That's what I don't know, and that's
what I'm asking:

 1. Does the API guarantee that a "valid pathname" (whatever that is) is
    the same across all platforms? I thought you said yes, but now it
    seems you're saying no. (If you say "no", then there's no point in
    answering question 2, because we're stuck---I can write code that
    may work with one repository on one platform, but suddenly fail when
    I move the same data to another platform.)
 2. What is the definition of "valid pathname"? Is it any sequence of
    Unicode characters? Is it only XML name characters? Is it any Unicode
    character except control characters and NUL (\u0000)?

Sorry if I'm not clear. It's a very simple question, and I hope I'm not
making it more complicated than it is.

Think about it this way: pretend you have an XML document with the
element <a-b>. You walk the DOM of that document on Windows, and it
works fine. But when you try to process the DOM on a Mac, it breaks, with
your XML processor saying, "sorry, an XML name cannot have a '-'
character". That will never happen. Why? Because (these are analogous
questions to the ones above concerning Subversion):

 1. The XML specification guarantees that all XML processors agree on
    what an XML name is.
 2. Specifically, an XML name is composed of a NameStartChar followed by
    zero or more NameChars, as defined here: http://www.w3.org/TR/REC-xml/#NT-Name
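
Because the XML rule is written down, the XML counterpart of
isValidSubversionPathname() can be implemented directly. The following
is a sketch in Java of the Name production from the XML 1.0
specification; the class and method names are hypothetical, and the
ranges are transcribed from the NameStartChar and NameChar productions,
so they should be checked against the specification before being relied
on.

```java
// Hypothetical helper illustrating the XML 1.0 "Name" production.
final class XmlNames {
    private static boolean in(int cp, int lo, int hi) {
        return cp >= lo && cp <= hi;
    }

    static boolean isNameStartChar(int cp) {
        return cp == ':' || cp == '_'
            || in(cp, 'A', 'Z') || in(cp, 'a', 'z')
            || in(cp, 0xC0, 0xD6) || in(cp, 0xD8, 0xF6) || in(cp, 0xF8, 0x2FF)
            || in(cp, 0x370, 0x37D) || in(cp, 0x37F, 0x1FFF)
            || in(cp, 0x200C, 0x200D) || in(cp, 0x2070, 0x218F)
            || in(cp, 0x2C00, 0x2FEF) || in(cp, 0x3001, 0xD7FF)
            || in(cp, 0xF900, 0xFDCF) || in(cp, 0xFDF0, 0xFFFD)
            || in(cp, 0x10000, 0xEFFFF);
    }

    static boolean isNameChar(int cp) {
        return isNameStartChar(cp)
            || cp == '-' || cp == '.' || in(cp, '0', '9') || cp == 0xB7
            || in(cp, 0x0300, 0x036F) || in(cp, 0x203F, 0x2040);
    }

    // Name ::= NameStartChar (NameChar)*
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        // Iterate by code point so supplementary characters are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

With this, isName("a-b") is true on every platform, which is the
property being asked about for Subversion pathnames.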

Does that make sense? Can we answer those same two questions concerning
Subversion pathnames?

Garret
Received on 2012-01-21 04:28:58 CET

This is an archived mail posted to the Subversion Dev mailing list.