Re: [RFC] Canonical Paths

From: Nuutti Kotivuori <naked_at_iki.fi>
Date: 2002-08-29 22:18:31 CEST

Marcus Comstedt wrote:
> Greg Stein <gstein@lyra.org> writes:
>> As a URL, it would be something like:
>>
>> http://svn.example.com/repos/subdir/gaz%2fonk
>>
>> i.e., use URL escaping to avoid the '/' interpretation.
>>
>> However, within our libraries... what to do? Beats the crap outta
>> me. We would need to use/invent an escaping mechanism. Personally,
>> I would simply say that the character is not allowed [on entry to
>> our libs], except as a path separator.
>
> Hm, I think I may actually have a solution to this problem.
> Slightly hackish, but it should be workable.
>
> We're using UTF-8 representation of the paths. In UTF-8, ASCII
> characters (such as '/') are encoded as themselves, a single octet
> with the MSB cleared. There also exist multibyte sequences of
> length 2-6, with each octet having the MSB set, thus making them
> easily distinguishable from ASCII characters.
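
To make the quoted property concrete, here is a minimal Python sketch. The helper names and the sample path are made up for illustration; nothing here is Subversion code. It only shows that a 0x2F octet in UTF-8 is always a real '/', because every octet of a multibyte sequence has the MSB set.

```python
def is_ascii_octet(octet: int) -> bool:
    """True when the most significant bit is clear, i.e. the octet is ASCII."""
    return (octet & 0x80) == 0


def separator_offsets(path: bytes) -> list:
    """Byte offsets of '/' (0x2F) in a UTF-8 encoded path."""
    return [i for i, octet in enumerate(path) if octet == 0x2F]


# Hypothetical sample path containing one non-ASCII character (U+00E9).
encoded = "subdir/gaz\u00e9/onk".encode("utf-8")

# Octets belonging to the multibyte sequence all have the MSB set,
# so none of them can be mistaken for the ASCII separator.
multibyte_octets = [octet for octet in encoded if not is_ascii_octet(octet)]
```

Scanning the raw bytes for 0x2F is therefore safe without decoding the string first.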

Argh! I almost shit my pants when I saw this. I was born into UTF-8
late, so I caught the latest specification, which mentioned this
explicitly and forbade its use - and I bought it, hook, line and
sinker - swearing to crucify any parser which didn't error out on
sequences like this and to do the same to anyone and her whole family
if something generated sequences like it.
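
For illustration, a small Python sketch of what "error out" looks like in practice. It assumes the sequences in question are overlong forms such as the two octets 0xC0 0xAF, which would map to '/' (U+002F) if a lenient decoder accepted them; that reading is an assumption, since the quoted proposal is cut off above. The helper name is hypothetical.

```python
def is_overlong_two_byte(lead: int, trail: int) -> bool:
    """True for a 2-octet sequence that encodes a code point below U+0080."""
    is_two_byte = (lead & 0xE0) == 0xC0 and (trail & 0xC0) == 0x80
    # Lead octets 0xC0 and 0xC1 can only produce values that fit in one octet.
    return is_two_byte and lead in (0xC0, 0xC1)


# Assumed example: the overlong two-octet form of '/'.
overlong_slash = bytes([0xC0, 0xAF])

# Python's built-in decoder is strict and refuses the sequence.
try:
    overlong_slash.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

A strict decoder of this kind is what the current specification asks for; anything that lets 0xC0 0xAF through as '/' is the behaviour being complained about here.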

When I think about it objectively - if it's entirely internal to
Subversion and we control the decoders as well, then who cares, might
as well do that. It's one alternative.
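
A rough sketch of how that alternative might look if both ends are under our control. This is only an assumption about the scheme (a literal '/' inside a path component is stored as the overlong pair 0xC0 0xAF); the function names are invented for the example and are not Subversion APIs.

```python
# Assumed internal marker: the overlong form of U+002F. It never occurs
# in valid UTF-8, so it cannot collide with real path data.
ESCAPED_SLASH = b"\xc0\xaf"


def escape_component(name: str) -> bytes:
    """Encode one path component, hiding any literal '/' it contains."""
    return name.encode("utf-8").replace(b"/", ESCAPED_SLASH)


def unescape_component(raw: bytes) -> str:
    """Reverse escape_component() before handing the name to a strict decoder."""
    return raw.replace(ESCAPED_SLASH, b"/").decode("utf-8")


def join_path(components) -> bytes:
    """Build an internal path; only real separators appear as 0x2F."""
    return b"/".join(escape_component(c) for c in components)


def split_path(path: bytes) -> list:
    """Split on real separators, then restore literal slashes."""
    return [unescape_component(part) for part in path.split(b"/")]


internal = join_path(["subdir", "gaz/onk"])
components = split_path(internal)
```

The internal form must never leak to anything that expects valid UTF-8, which is exactly why the idea only holds up while the decoders stay ours.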

But it still makes my neck hair think I should be a hedgehog.

-- Naked

Received on Thu Aug 29 22:25:12 2002

This is an archived mail posted to the Subversion Dev mailing list.