
Re: [RFC] Canonical Paths

From: Russ Allbery <rra_at_stanford.edu>
Date: 2002-08-29 20:50:30 CEST

Marcus Comstedt <marcus@mc.pp.se> writes:

> Here comes the trick. Notice that this range includes the range [0
> .. 127], the ASCII characters. (In fact all UTF-8 multibyte escapes
> have a range which includes ASCII, since they all start at 0.) That is,
> although an ASCII character such as '/' is normally encoded as its ASCII
> representation (00101111), we could instead encode it as 11000000
> 10101111, which would then be a kind of _escaped_ '/',

I'm not sure whether this matters for your purposes, but quite a lot of
UTF-8 software will reject sequences like this. There was much discussion
of this a while back, and variant (non-shortest) representations are now
explicitly banned in the UTF-8 spec, so data containing such sequences is
invalid UTF-8.

The justification was security worries about having multiple
representations for special characters.
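
As a rough illustration of the worry, here is a minimal Python sketch. The
helper names (naive_decode_two_byte, is_strict_utf8) are made up for this
example and have nothing to do with Subversion's code. It assumes only that
Python's built-in "utf-8" codec is a strict decoder, and contrasts it with a
lenient two-byte decoder that skips the overlong check:

```python
# Hypothetical helper names for illustration; not taken from Subversion.

def naive_decode_two_byte(data: bytes) -> str:
    """Decode a 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    lead, cont = data
    # 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
    code_point = ((lead & 0b00011111) << 6) | (cont & 0b00111111)
    return chr(code_point)


def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


plain_slash = bytes([0b00101111])                # ordinary ASCII '/'
escaped_slash = bytes([0b11000000, 0b10101111])  # the proposed "escaped" '/'

# A byte-level check for '/' does not see the two-byte form...
slash_visible_in_bytes = b"/" in escaped_slash

# ...but a lenient decoder still turns it into '/'.
leniently_decoded = naive_decode_two_byte(escaped_slash)

print(slash_visible_in_bytes)         # False
print(leniently_decoded)              # /
print(is_strict_utf8(plain_slash))    # True
print(is_strict_utf8(escaped_slash))  # False
```

So a path filter that looks for '/' at the byte level and a lenient decoder
further down the line would disagree about the same data, while a strict
decoder refuses the input outright.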

Russ Allbery (rra_at_stanford.edu)             <http://www.eyrie.org/~eagle/>
Received on Thu Aug 29 20:51:30 2002

This is an archived mail posted to the Subversion Dev mailing list.
