+ ------- Additional Comments From kfogel@tigris.org 2003-01-27 09:40 PST -------
+ This may be due to a long-standing problem (which was only theoretical
+ up till now) in our XML formats, namely, that we use attributes for
+ path names, instead of cdata. Do attribute values really allow any
+ UTF-8 data to pass through unmunged, or do they have tighter
+ restrictions? I think they might be a bit tighter, particularly when
+ it comes to things like whitespace. Not positive about that, though.
Not really tighter, just different. Whitespace characters in attribute
values are all converted to spaces -- unless they are escaped. So the
correct way to escape filewitha\ttab would be filewitha&#9;tab. Just
embedding the tab in there directly will not work.
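The difference shows up with Python's standard xml.etree.ElementTree parser (expat underneath). This is a small sketch; the element and attribute names `entry` and `path` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A literal tab inside an attribute value is normalized to a space by a
# conforming parser; the character reference &#9; survives as a real tab.
raw = '<entry path="filewitha\ttab"/>'        # literal tab character
escaped = '<entry path="filewitha&#9;tab"/>'  # tab as a character reference

literal_value = ET.fromstring(raw).get("path")
ref_value = ET.fromstring(escaped).get("path")

print(repr(literal_value))  # 'filewitha tab'
print(repr(ref_value))      # 'filewitha\ttab'
```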
This is another way of saying that you generally want two different
escaping functions, one for character data and one for attribute values.
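A minimal sketch of such a pair, assuming double-quoted attribute values; the function names `escape_cdata` and `escape_attr` are hypothetical, not an existing Subversion API:

```python
def escape_cdata(text: str) -> str:
    """Escape a string for use as element character data."""
    # '&' must go first so the later replacements are not double-escaped.
    # A literal CR would be turned into LF by end-of-line handling, so it
    # is written as a character reference too.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#13;"))


def escape_attr(text: str) -> str:
    """Escape a string for use inside a double-quoted attribute value."""
    # Start from the character-data escaping, then also protect the quote
    # delimiter and the whitespace that attribute-value normalization
    # would otherwise turn into plain spaces.
    return (escape_cdata(text)
            .replace('"', "&quot;")
            .replace("\t", "&#9;")
            .replace("\n", "&#10;"))
```

For example, escape_cdata("a<b & c") yields "a&lt;b &amp; c", while escape_attr("filewitha\ttab") yields "filewitha&#9;tab". A tab is left alone in character data because only attribute values get whitespace-normalized.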
Oh, one other nit: XML does not allow *arbitrary* Unicode characters,
only ones that it thinks of as text. In particular, all ASCII control
characters except the whitespace characters are forbidden in XML, *even
if they're escaped*. This took my breath away when I first realized it
-- actually it still does.
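One way to catch such characters before writing them out is to check each one against the XML 1.0 `Char` production (XML 1.1 relaxes this for character references). This is a sketch; `is_xml10_char` and `find_forbidden` are hypothetical helper names:

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)          # the only allowed C0 controls
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def find_forbidden(text: str) -> list[int]:
    """Return indices of characters that cannot appear in an XML 1.0
    document at all, whether written literally or as &#N; references."""
    return [i for i, ch in enumerate(text) if not is_xml10_char(ch)]
```

For instance, find_forbidden("ok\tname") returns an empty list, but find_forbidden("bell\x07") returns [4]: a BEL in a file name simply cannot be represented in XML 1.0, escaped or not.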
+ Anyway, I always thought the fundamental theorem of XML is that
+ attributes are for data under the control of the DTD (i.e., metadata),
+ and cdata is for user data (i.e., data), since the latter must pass
+ through unmunged (XML-escaping/unescaping doesn't count, of
+ course).
I'm unfamiliar with this dictum. I like using attributes for user data.
Luke
---------------------------------------------------------------------
Received on Sat Oct 14 02:20:20 2006