On Thu, Feb 25, 2010 at 2:06 PM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> Lieven Govaerts wrote:
>> [[[
>> ra_serf: Fix support for international characters in paths.
[..]
>
>> [[[
>> Index: subversion/libsvn_ra_serf/merge.c
>> ===================================================================
>> --- subversion/libsvn_ra_serf/merge.c (revision 911289)
>> +++ subversion/libsvn_ra_serf/merge.c (working copy)
>> @@ -364,7 +364,7 @@
>> info->prop_val = apr_pstrmemdup(info->pool, info->prop_val,
>> info->prop_val_len);
>> if (strcmp(info->prop_name, "href") == 0)
>> - info->prop_val = svn_uri_canonicalize(info->prop_val,
>> info->pool);
>> + info->prop_val = svn_ra_serf__uri_to_internal(info->prop_val,
>> info->pool);
>
> As I mentioned on IRC, you canonicalize here when the URI is being put
> in to the hash. But when it gets read out from the hash (about 70 lines
> higher up) you also need to canonicalize the URL that it is compared
> with. Here:
>
> [line 299]
> href = apr_hash_get(info->props, "href", APR_HASH_KEY_STRING);
> if (! svn_uri_is_ancestor(ctx->merge_url, href))
>
> ... ctx->merge_url comes from session->repos_url, and that comes
> from ... somewhere higher up.
>
> You could canonicalize it right here, but really we need to fix the
> global svn_uri_canonicalize() to do this kind of canonicalization
> throughout Subversion, not just in ra_serf. Currently,
> svn_uri_canonicalize() simply isn't producing a canonical URI.
>
Digging a bit further...
Looking at chapter 6 Normalization and Comparison in RFC 3986 there
are a few things svn_uri_canonicalize does, and a few things it
doesn't.
It does:
- Case normalization, partially:
- scheme and hostname are converted to lowercase
- file:/// components are converted to lowercase on Windows
- other case sensitive components are left untouched
- Path segment Normalization
removal of "." and ".." path segments and adjacent separator
characters where possible
- Scheme-based normalization, partially
http://example.com vs http://example.com/
It does not:
- case normalization of percent-encoding triplets, which should use
uppercase letters A-F.
- scheme based normalization: removal of the default port for a scheme
(http://example.com:80 vs http://example.com)
- Percent-encoding normalization: decoding of any percent-encoded
character that shouldn't have been encoded
What about making svn_uri_canonicalize support all these
normalizations? It will solve the issue at hand, and a few others that
are easy to run into.
On Thu, Feb 25, 2010 at 12:11 AM, Bert Huijben <bert_at_qqmail.nl> wrote:
>
> svn_uri_canonicalize is still used for more than urls/uris, so it doesn't handle any encoding itself.
Hm, why is it called svn_uri_canonicalize if it takes other things
than uri's? I think it should take already encoded uri's only (and we
should make that more explicit in the documentation).
>
> But besides that, I also think this is not the right way to fix this. For several characters in uris it is optional if they are escaped or not. We need a better fix than just fixing the casing of the escaped characters...
-> Decode this optional-escaped characters during canonicalization.
>
> Unescaping the paths would be an option that resolves it in the generic case, or unescape followed by a specific standard escaping. (This last method is used in some class libraries to avoid similar issues, but to provide a useful uri anyway)
Unescaping followed by escaping is probably the easiest way to
implement this, but not really the fastest. We do have to do this only
in places where svn gets uri's from outside though.
Lieven
Received on 2010-02-26 22:01:19 CET