
Re: Control characters in log message cause failure

From: Kalle Olavi Niemitalo <kon_at_iki.fi>
Date: 2004-12-01 07:00:34 CET

Ben Reser <ben@reser.org> writes:

> I was under the impression that Unicode disallowed control characters
> with the exception of tab, carriage return and line feed. The XML
> specification certainly gives me that impression:
> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char
>
> Unfortunately, I don't have access to the Unicode standard to be sure.

I have the Unicode 3.0 book, and the annexes bringing that to 3.2
are on the net. Sections 2.8 (Controls and Control Sequences)
and 13.1 (Control Codes) indicate that Unicode allows all control
codes 00-1F and 7F-9F but leaves the semantics of most of them to
be defined by higher-level protocols. If XML is such a protocol,
then I suppose it has the right to disallow those characters.
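
To make the difference concrete, here is a rough Python 3 sketch. It
assumes the Char production of XML 1.0 as given at the URL quoted above;
the function names are made up for illustration and are not taken from
Subversion or from any XML library.

```python
import unicodedata


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_unicode_control(cp: int) -> bool:
    """True if Unicode gives cp the general category Cc (control)."""
    return unicodedata.category(chr(cp)) == "Cc"


def disallowed_in_xml(text: str) -> list:
    """List (index, code point) pairs for characters XML 1.0 rejects."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch))
    ]
```

With these definitions, a log message containing a form feed, such as
"fix\x0cbug\n", is an ordinary Unicode string, yet disallowed_in_xml()
reports the U+000C at index 3. Note that the Char production rejects
only the 00-1F range (minus tab, LF and CR); 7F-9F pass it.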

The term "legal character" used in XML doesn't seem to be defined
in Unicode at all; there is "illegal code value sequence" but it
only applies to encoding forms such as UTF-8, not to individual
characters.
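
The same distinction can be shown in a few lines of Python 3. This is
only an illustration, relying on the strictness of Python's built-in
UTF-8 codec; the helper name is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 under Python's strict codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A control character is a perfectly good character: it encodes to a
# single byte and that byte decodes back without complaint.
control_bytes = "\x01".encode("utf-8")

# An ill-formed sequence is a property of the bytes, not of any
# character: 0xC0 0x80 is an overlong encoding and 0xFF never occurs
# in UTF-8 at all.
overlong = b"\xc0\x80"
stray = b"\xff"
```

Here is_well_formed_utf8(control_bytes) is true, while the overlong and
stray byte strings are both rejected by the decoder.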

Received on Wed Dec 1 07:01:52 2004

This is an archived mail posted to the Subversion Dev mailing list.