Re: Control characters in log message cause failure

From: Ben Reser <ben_at_reser.org>
Date: 2004-12-01 04:57:15 CET

On Mon, Nov 29, 2004 at 11:25:03PM +0000, Philip Martin wrote:
> One of us is confused, or perhaps it is just terminology.
>
> There is no "mixed UTF8/non-UTF8 string" and there are no "non-UTF8"
> characters that need to be "converted". There may be ASCII control
> codes in the log message, and if these are not valid XML then they
> need to be rejected or escaped, but the only place that UTF-8 comes in
> is that ASCII control codes are encoded unchanged in UTF-8.
>
> It looks like we have the same problem with paths in the entries file:
>
> $ svn mkdir wc/`printf "\x18"`
> $ svn st wc
> ../svn/subversion/libsvn_wc/entries.c:671: (apr_err=130003)
> svn: XML parser failed in 'wc'
> ../svn/subversion/libsvn_subr/xml.c:365: (apr_err=130003)
> svn: Malformed XML: not well-formed (invalid token) at line 13

I was under the impression that Unicode disallowed control characters
with the exception of tab, carriage return and line feed. The XML
specification certainly gives me that impression:
http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char
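To make the distinction concrete, here is a minimal sketch in Python of the
Char production from that section of the XML 1.0 specification. The function
names is_xml10_char and first_invalid_xml_char are hypothetical, chosen only
for illustration; this is not how Subversion checks log messages or paths.
It shows that U+0018 is a single unchanged byte in UTF-8 yet falls outside
what XML 1.0 allows as a character:

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_invalid_xml_char(text: str):
    """Return (index, code point) of the first character that XML 1.0
    does not allow, or None if every character is allowed.
    (Hypothetical helper, not part of any Subversion API.)
    """
    for index, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return index, ord(ch)
    return None


# The control code from the 'svn mkdir' example is one byte in UTF-8,
# identical to its ASCII value, so the encoding step itself succeeds.
utf8_bytes = "\x18".encode("utf-8")

# The same character is nevertheless rejected by the XML 1.0 rule.
problem = first_invalid_xml_char("log\x18message")
```

Under these assumptions, a caller could run first_invalid_xml_char on a log
message or path before writing it into XML and then reject or escape the
offending character, which are the two options mentioned in the quoted text.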

Unfortunately, I don't have access to the Unicode standard to be sure.

-- 
Ben Reser <ben@reser.org>
http://ben.reser.org
"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken
Received on Wed Dec 1 04:58:51 2004

This is an archived mail posted to the Subversion Dev mailing list.