Re: Control characters in log message cause failure

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-11-29 21:58:47 CET

kfogel@collab.net writes:

> Actually, I don't think we *can* fix this without UTF8 predicate
> functions, to say whether a given byte is part of a multibyte UTF8
> char, or just a stray control char. Which makes this unexpectedly
> related to the
>
> "Re: using isalpha/isalnum in locale-independent code"
>
> thread going on elsewhere :-).
>
> So I'd say yeah, file an issue, and maybe mention the connection to
> that thread? (Or, tell me I'm crazy for thinking that the UTF8
> functions are a prerequisite for solving this ctrl char problem.)

OK, you're crazy ;)

The UTF-8 encoding has the property that multibyte sequences consist
solely of bytes with the high bit set, i.e. non-ASCII values of 0x80 or
greater. A byte below 0x80 can therefore never be part of a multibyte
character. Provided the log message is valid UTF-8 (something we
already check), any ASCII control character must be a single-byte
character.

See the header in utf_validate.c for the gory details.
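
To illustrate, here is a minimal sketch of a byte-wise scan that relies on
that property. The name find_ascii_control is hypothetical, not an existing
Subversion function, and the choice to let tab, LF and CR through is my
assumption about what a log message should allow. It assumes the buffer
has already passed UTF-8 validation.

```c
#include <stddef.h>

/* Hypothetical helper (not a Subversion API): return the offset of the
   first disallowed ASCII control byte in DATA (LEN bytes), or -1 if
   there is none.  Assumes DATA is already known to be valid UTF-8, so
   every byte >= 0x80 belongs to a multibyte sequence and can be
   skipped without decoding. */
long
find_ascii_control(const char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) data[i];

      if (c >= 0x80)
        continue;   /* lead or continuation byte of a multibyte char */

      /* Assumption: tab, LF and CR are acceptable in a log message. */
      if (c == '\t' || c == '\n' || c == '\r')
        continue;

      if (c < 0x20 || c == 0x7f)
        return (long) i;   /* stray ASCII control character */
    }

  return -1;
}
```

No UTF-8 predicate functions are needed here: the scan never has to know
where a multibyte character starts or ends.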

-- 
Philip Martin
Received on Mon Nov 29 22:00:19 2004

This is an archived mail posted to the Subversion Dev mailing list.