
Re: svn_xml_is_xml_safe() ... is not so safe?

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2005-02-20 23:50:20 CET

On Fri, 18 Feb 2005, Julian Foad wrote:

> Ben Collins-Sussman wrote:
> > I thought that the function svn_xml_is_xml_safe() was our magic ticket,
> > but upon looking at its implementation, it seems to be overly
> > restrictive. Look at the bitmask it uses: it won't allow valid UTF8
> > through!
> >
...
> This has bothered me for a while. I hope we can just improve its
> implementation rather than add another function, but we need to review our uses
> of it to be sure, and I haven't done that.
>
It is used to decide whether prop vals can be sent raw or need to be base64
encoded in mod_dav_svn. That's the only place I found. I don't see why it
shouldn't be safe to change it to allow valid UTF8 as well.

> I also haven't investigated what the exact implementation ought to be. (One or
> two sub-ranges of UTF are not allowed.)
>
We don't want certain control chars. Also, U+FFFE and U+FFFF are unwanted.
We need to work out the UTF8 representation of these and check for them
explicitly.

(Surrogates are also disallowed, but they aren't valid UTF8 either,
AFAICT, so they are already handled by the UTF8 validation routines.)
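
As a rough illustration of the explicit check, here is a minimal sketch in C.
It is not the existing svn_xml_is_xml_safe() implementation; the name
sketch_is_xml_safe() is hypothetical. It assumes the input has already passed
UTF8 validation (so surrogates and malformed sequences are rejected earlier),
and it assumes the "certain control chars" are the ones excluded by the XML 1.0
Char production, i.e. everything below 0x20 except TAB, LF and CR. In UTF8,
U+FFFE is the byte sequence EF BF BE and U+FFFF is EF BF BF.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Subversion API.
   Assumption: DATA (LEN bytes) has already been validated as UTF8.
   Returns 1 if no disallowed character is found, 0 otherwise. */
int
sketch_is_xml_safe(const char *data, size_t len)
{
  const unsigned char *p = (const unsigned char *) data;
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = p[i];

      /* Control characters below 0x20, other than TAB, LF and CR
         (assumed here to be the unwanted ones, per XML 1.0). */
      if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
        return 0;

      /* U+FFFE is EF BF BE and U+FFFF is EF BF BF.  In valid UTF8,
         0xEF only ever appears as the lead byte of a 3-byte sequence,
         so matching on it at any position is sufficient. */
      if (c == 0xEF && i + 2 < len
          && p[i + 1] == 0xBF
          && (p[i + 2] == 0xBE || p[i + 2] == 0xBF))
        return 0;
    }

  return 1;
}
```

With this sketch, ordinary multi-byte UTF8 such as C3 A9 passes, as does
U+FFFD (EF BF BD); only the two noncharacters and the listed control bytes
are rejected.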

Regards,
//Peter

Received on Sun Feb 20 23:49:48 2005

This is an archived mail posted to the Subversion Dev mailing list.
