Re: Bug: Control char in commit message

From: Peter Davis <peter_at_pdavis.cx>
Date: 2002-12-05 01:54:03 CET

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 04 December 2002 15:41, Philip Martin wrote:
> Look at libsvn_subr/xml.c:xml_escape, Subversion currently escapes the
> five characters &<>"'. In particular it doesn't escape the ^H that
> Andreas used. I find it odd that Subversion "escapes" a different set
> of characters from that "quoted" by apr_xml_quote_elem, but then I
> don't know much about XML or UTF8.

Basically, xml_escape is wrong. Those five are the only characters that need to
be escaped in normal human text, but control characters need to be as well. In
fact, I've seen some parsers just escape everything outside of 0x20 to 0x7F
(leaving newlines alone, except in XML attributes, and still escaping the 5
characters above). That's probably a bit overboard, but it's safe for
US-ASCII and all of the ISO-8859-* encodings AFAIK, as well as UTF-8.
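
A minimal sketch of that escape-everything approach, in C since xml.c is C.
This is not Subversion's xml_escape() or apr_xml_quote_elem;
sketch_xml_escape is a hypothetical name. It assumes the input is US-ASCII or
ISO-8859-1, so that each byte is one code point (UTF-8 would have to be
decoded first), and it treats 0x20 to 0x7E as the printable range. Note that
XML 1.0 does not allow most control characters even as numeric references, so
a strict parser may still reject the "&#8;" this produces for ^H.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Subversion's xml_escape(): writes an escaped
   copy of IN to OUT (capacity OUTSZ bytes, including the NUL).
   Returns 0 on success, -1 if OUT is too small.
   Assumption: each input byte is one ISO-8859-1 code point. */
static int sketch_xml_escape(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    const unsigned char *p;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        char buf[8];            /* large enough for "&#255;" plus NUL */
        const char *rep = buf;
        size_t len;

        switch (*p) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:
            if (*p == '\n' || (*p >= 0x20 && *p <= 0x7E)) {
                /* newline or printable ASCII: copy through unchanged */
                buf[0] = (char)*p;
                buf[1] = '\0';
            } else {
                /* everything else becomes a decimal character reference */
                sprintf(buf, "&#%u;", (unsigned)*p);
            }
            break;
        }

        len = strlen(rep);
        if (used + len + 1 > outsz)
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

The newline is passed through because this sketch targets element content;
attribute values would need it escaped too.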

Does apr_xml_quote_elem do a better job? Is there a reason why svn needs its
own xml_escape function instead of using the apr (or expat) versions?

Looking at the code, xml_escape() is wrong in another way. Inside a CDATA
section, you cannot escape a "]]>" as "]]&gt;", because entity references are
not recognized there. You have to write the "]]", exit the CDATA section,
write the ">" as "&gt;", and start a new section. Thus, you get
"]]]]>&gt;<![CDATA[". Outside a CDATA section, escaping the ">" that follows
"]]" is something XML 1.0 asks for only "for compatibility", and it does not
make the escape valid inside a CDATA section. That comment (line 47) should
be removed or corrected, and possibly there needs to be a separate
xml_escape_cdata() function, if apr doesn't already provide one.

Apart from that "]]>" case, ">" never needs to be escaped. The "&gt;"
character-entity is otherwise only provided for symmetry with "&lt;".
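
The CDATA handling described above could be sketched roughly as follows.
Both function names are hypothetical (this is neither the proposed
xml_escape_cdata() nor anything apr provides), and the fixed-size output
buffer is an assumption made to keep the example short. The literal "]]"
stays inside the section; only the ">" is moved outside and written as
"&gt;".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends S to OUT at offset *USED, keeping OUT
   NUL-terminated. Returns 0 on success, -1 if OUT is too small. */
static int sketch_append(char *out, size_t outsz, size_t *used,
                         const char *s)
{
    size_t len = strlen(s);

    if (*used + len + 1 > outsz)
        return -1;
    memcpy(out + *used, s, len + 1);
    *used += len;
    return 0;
}

/* Hypothetical helper: wraps IN in a CDATA section. Every literal "]]>"
   in IN is written as "]]", then the section is closed, "&gt;" is
   written outside it, and a new section is opened. */
static int sketch_cdata_wrap(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    const char *hit;

    assert(in != NULL && out != NULL);
    if (sketch_append(out, outsz, &used, "<![CDATA[") != 0)
        return -1;

    while ((hit = strstr(in, "]]>")) != NULL) {
        /* copy everything up to and including the "]]" */
        size_t n = (size_t)(hit - in) + 2;

        if (used + n + 1 > outsz)
            return -1;
        memcpy(out + used, in, n);
        used += n;
        out[used] = '\0';

        /* close the section, emit the ">" outside it, reopen */
        if (sketch_append(out, outsz, &used, "]]>&gt;<![CDATA[") != 0)
            return -1;
        in = hit + 3;
    }

    if (sketch_append(out, outsz, &used, in) != 0)
        return -1;
    return sketch_append(out, outsz, &used, "]]>");
}
```

For the input "a]]>b" this yields "<![CDATA[a]]]]>&gt;<![CDATA[b]]>", which
a parser reads back as "a]]" followed by ">" followed by "b".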

Ugh, I just looked at xml_unescape: it doesn't handle any numeric character
references such as "&#233;" or "&#xE9;" (it just ignores anything other than
the basic entities). Again, why re-invent the wheel (incorrectly)? Will apr or
some other library do this?
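
For reference, recognizing a numeric character reference is not much code.
The sketch below is an illustration only, not a patch to xml_unescape;
sketch_parse_char_ref is a hypothetical name. It handles the decimal
("&#NNN;") and hexadecimal ("&#xHHH;") forms and leaves the conversion of
the resulting code point to UTF-8 to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if S begins with a numeric character reference,
   stores the code point in *CP and returns the number of bytes consumed
   (including the ';'). Returns 0 if S does not begin with a well-formed
   reference or the value exceeds U+10FFFF. */
static size_t sketch_parse_char_ref(const char *s, unsigned long *cp)
{
    const char *digits;
    const char *p;
    unsigned long value = 0;
    unsigned long base = 10;

    assert(s != NULL && cp != NULL);
    if (s[0] != '&' || s[1] != '#')
        return 0;

    digits = s + 2;
    if (*digits == 'x') {       /* XML only allows a lowercase 'x' here */
        base = 16;
        digits++;
    }

    for (p = digits; ; p++) {
        unsigned long d;

        if (*p >= '0' && *p <= '9')
            d = (unsigned long)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned long)(*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned long)(*p - 'A') + 10;
        else
            break;

        value = value * base + d;
        if (value > 0x10FFFFUL)  /* beyond the Unicode range */
            return 0;
    }

    /* need at least one digit and a terminating ';' */
    if (p == digits || *p != ';')
        return 0;

    *cp = value;
    return (size_t)(p - s) + 1;
}
```

A caller would try this whenever it sees "&#" and fall back to the named
entities (or to copying the "&" through) when it returns 0.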

If not, then I'd be happy to correct both functions.

- --
Peter Davis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE97qOrhDAgUT1yirARAoloAJwNcjaAyoqMuO5d+CzcSK+7kCAWxwCeOzbY
m1BILqh0z//JkBUbjgj03xk=
=u805
-----END PGP SIGNATURE-----

Received on Thu Dec 5 01:54:49 2002

This is an archived mail posted to the Subversion Dev mailing list.