Re: Bug: Control char in commit message

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-12-05 17:31:25 CET

Peter Davis <peter@pdavis.cx> writes:

[...]
> fact, I've seen some parsers just escape everything outside of 0x20 to 0x127
> (not including newlines (except in XML attributes) and including the 5
> characters above). That's probably a bit overboard, but it's safe for
> US-ASCII and all of the ISO-9660-* encodings AFAIK, as well as UTF-8.

(I'm assuming you meant 127 == 0x7f, not 0x127.)

Escaping everything is of course safe, but if you want to escape
characters over 0x7f you have to take care: The UTF-8 octet sequence
0xc3 0xa4 (representing the character "ä") has to be escaped as
&#228; (or &#xe4; or &auml;), not &#195;&#164;. The escapes encode
characters, not octets. Therefore, in the case of UTF-8, it's better
_not_ to try to escape characters beyond ASCII.
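
To illustrate the character/octet distinction, here is a minimal Python
sketch. The function names escape_chars and escape_octets are made up for
this example and do not come from any Subversion code; the second one
deliberately shows the mistake described above.

```python
import html

def escape_chars(text: str) -> str:
    """Escape each non-ASCII *character* as a decimal character reference."""
    return "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in text)

def escape_octets(data: bytes) -> str:
    """WRONG for UTF-8: treats every octet as if it were a character."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in data)

utf8 = "ä".encode("utf-8")                   # the two octets 0xc3 0xa4
right = escape_chars(utf8.decode("utf-8"))   # decode first, then escape
wrong = escape_octets(utf8)                  # escape the raw octets

# A parser resolving the references gets "ä" back only in the first case.
print(right, "->", html.unescape(right))
print(wrong, "->", html.unescape(wrong))
```

Decoding the octets to characters before escaping is what makes the
first variant correct; the second yields two unrelated characters.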

The octets 0-127 can safely be encoded as &#nn; though, since in this
range the octet value and the Unicode code point of the character are
the same (this goes for UTF-8 as well as ISO-8859-* (ISO-9660 is the
CD-ROM filesystem standard :)).
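
The claim about the 0-127 range can be checked with a short Python
sketch. The helper name ascii_range_is_identity is hypothetical, and the
three encodings are just a sample chosen for illustration.

```python
def ascii_range_is_identity(encoding: str) -> bool:
    """True if every octet 0-127 decodes to the code point of the same value."""
    return all(ord(bytes([octet]).decode(encoding)) == octet
               for octet in range(128))

for enc in ("utf-8", "iso-8859-1", "iso-8859-15"):
    print(enc, ascii_range_is_identity(enc))

# Above 127 the identity no longer holds in general: in ISO-8859-15 the
# octet 0xa4 is the euro sign, whose code point is U+20AC.
euro_code_point = ord(bytes([0xA4]).decode("iso-8859-15"))
print(hex(euro_code_point))
```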

The best option would probably be to encode the characters 0-31
(except 10 and 13) and 127 as numeric character entities, and '"&<> as
named character entities (&apos; &quot; &amp; &lt; &gt;), leaving all
other characters/octets unescaped. If only one of the quote
characters is used to enclose all attributes, then the other one
doesn't need to be escaped.
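
The scheme suggested above could be sketched in Python roughly as
follows. The names NAMED and escape_xml are invented for this example;
it assumes the input has already been decoded to characters, and it
always escapes both quote characters for simplicity.

```python
# The five characters that get named entities.
NAMED = {"'": "&apos;", '"': "&quot;", "&": "&amp;", "<": "&lt;", ">": "&gt;"}

def escape_xml(text: str) -> str:
    """Escape control characters numerically and '"&<> by name."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in NAMED:
            out.append(NAMED[ch])
        elif (cp < 32 and cp not in (10, 13)) or cp == 127:
            # Control characters 0-31 (except LF and CR) and DEL.
            out.append(f"&#{cp};")
        else:
            # Everything else, including non-ASCII, is left untouched.
            out.append(ch)
    return "".join(out)

print(escape_xml('log: a<b & "c"\x07'))
```

Note that XML 1.0 only permits tab, LF and CR among the characters
below 32, so a strict parser may still reject a reference such as &#7;
this sketch only shows the escaping itself.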

// Marcus

Received on Thu Dec 5 17:33:04 2002
