
Re: URI-encoding on 1.7 repository?

From: Garret Wilson <garret_at_globalmentor.com>
Date: Fri, 20 Jan 2012 19:34:39 -0800

Oh, and in case I wasn't clear, I'm referring to a Subversion
repository, not a local copy. And I'm referring to the top-most API. If
some of the lower layers are more restrictive than the top-most API,
then they should use some encoding scheme (what, I don't care) to shield
this platform-specific restriction from the top-level API---which is
what I thought Daniel was saying at first.

Garret

On 1/20/2012 7:28 PM, Garret Wilson wrote:
> On 1/20/2012 7:00 PM, Daniel Shahaf wrote:
>> Garret Wilson wrote on Fri, Jan 20, 2012 at 18:18:24 -0800:
>>> On 1/20/2012 6:14 PM, Daniel Shahaf wrote:
>>>> You don't care what FS backend the server runs. All you care is
>>>> that the endpoint of svn_ra_open4() implements the Subversion RA
>>>> API properly. Normal Subversion servers use svn_fs.h which in turn
>>>> presents the same API _regardless of which backend is used_. I'll
>>>> spell it out: the notion of 'valid pathname in a Subversion
>>>> filesystem' does not depend on the FS backend in use.
>>> All that is good news. So I guess the important question is: what
>>> spells out "the notion of 'valid pathname on a Subversion
>>> filesystem'"? Is it "any valid Unicode code point?" What I'm getting
>> See my previous reply.
>
> Right. So your previous reply said that a "valid pathname" is the same
> on all platforms, and that the underlying implementation will take
> care of the details. I'm asking what are the rules for a "valid
> pathname". I'm glad that these rules are the same across all
> platforms, but I don't know what the rules are. In other words, what
> goes in the following function?
>
> boolean isValidSubversionPathname(String pathname);
>
>
>>
>>> at is that I need to know which characters, if any, I need to encode
>>> before passing them to Subversion. If Subversion supports any
>>> Unicode character, I can just pass the path decoded and sleep
>>> soundly at night. If not, I need to know which ones to decode and
>>> which ones to pass through.
>> Err, that depends on what API layer you're working with. (For example:
>> svn_fs.h is perfectly happy with :,*,\n as part of the basename, but
>> libsvn_wc on windows, and the mergeinfo logic, aren't.)
>
> Oh, that's bad news. In your previous reply you said, "the notion of
> 'valid pathname in a Subversion
> filesystem' does not depend on the FS backend in use." Now you seem to
> say "whether some pathname is valid or not depends on whether you're
> on Windows or some other platform." (Even worse, you seem to be
> saying that the notion of "valid pathname" isn't even consistent
> across the API.)
>
>> And 'what to encode/decode' is a rather vague question. I'm not sure if
>> it means "Does `svn info uri:///foo bar` == `svn info uri:///foo%20bar`?"
>> or something else. Can you be more concrete?
>
> It doesn't matter. It's some black box that works like this:
>
> String encode(String input);
> String decode(String output);
>
> I can come up with a thousand ways to encode/decode. I can use %hh. I
> can use ^0xhh. The only two requirements are that 1) encode() provides
> me with a string guaranteed to be a valid pathname, and 2) decode()
> will take the encoded string and give me back the decoded string I
> started with.
>
> But to meet requirement #1, I have to know which characters are
> considered valid and which aren't. That's what I don't know, and
> that's what I'm asking:
>
>  1. Does the API guarantee that a "valid pathname" (whatever that is)
>     is the same across all platforms? I thought you said yes, but now
>     it seems you're saying no. (If you say "no", then there's no point
>     in answering question 2, because we're stuck---I can write code
>     that may work with one repository on one platform, but suddenly
>     fail when I move the same data to another platform.)
>  2. What is the definition of "valid pathname"? Is it any Unicode
>     character? Is it only XML name characters? Is it any Unicode
>     character except control characters and NULL (\u0000)?
>
> Sorry if I'm not clear. It's a very simple question, and I hope I'm
> not making it more complicated than it is.
>
> Think about it this way: pretend you have an XML document with the
> element <a-b>. You walk the DOM of that document on Windows, and it
> works fine. But when you try to process the DOM on a Mac, it breaks, with your
> XML processor saying, "sorry, an XML name cannot have a '-'
> character". That will never happen. Why? Because (these are analogous
> questions to the ones above concerning Subversion):
>
>  1. The XML specification guarantees that all XML processors agree on
>     what an XML name is.
>  2. Specifically, an XML name is composed of a NameStartChar followed
>     by any NameChar, as defined here:
>     http://www.w3.org/TR/REC-xml/#NT-Name
>
> Does that make sense? Can we answer those same two questions
> concerning Subversion pathnames?
>
> Garret
>
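
For illustration, the encode()/decode() black box described in the quoted
message could be sketched roughly as follows, using the %hh scheme over UTF-8
bytes. This is only a sketch under stated assumptions: the class name PathCodec
and the isValid predicate are hypothetical, and the predicate stands in for the
very rule the thread is asking about (what Subversion actually accepts is not
known here). It also assumes that '%' and the hex digits are themselves valid
pathname characters, and that the input is well-formed UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of any Subversion API.
final class PathCodec {
    // Assumption: the caller supplies the "valid pathname character" rule.
    private final IntPredicate isValid;

    PathCodec(IntPredicate isValid) {
        this.isValid = isValid;
    }

    // Requirement 1: every character in the result is valid under isValid
    // (given that '%' and hex digits are valid).
    String encode(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            // '%' is always escaped so that decode() is unambiguous.
            if (cp != '%' && isValid.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    // Requirement 2: decode(encode(s)) gives back s.
    // Malformed input (e.g. a truncated "%4") throws a runtime exception.
    String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '%') {
                // Collect consecutive escaped bytes; they may form one
                // multi-byte UTF-8 sequence.
                pending.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                flush(pending, out);
                out.append(c);
                i++;
            }
        }
        flush(pending, out);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }
}
```

With a made-up rule that rejects control characters, ':' and '*' (the
characters mentioned in the quoted reply), new PathCodec(cp -> cp >= 0x20 &&
cp != ':' && cp != '*').encode("a:b*c\n") yields "a%3Ab%2Ac%0A", and decode()
restores the original string. The open question in the thread is what the real
predicate should be.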
Received on 2012-01-21 04:35:53 CET

This is an archived mail posted to the Subversion Dev mailing list.
