[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: using isalpha/isalnum in locale-independent code

From: Branko ─îibej <brane_at_xbc.nu>
Date: 2004-11-25 05:27:56 CET

No, you're not being paranoid. Ideally, we'd need a set of
classification functions that are Unicode-aware. For example, the code
that parses a "-rN[:M]" option doesn't know that ASCII 0-9 aren't the
only digits in Unicode. Of course it's debatable whether or not
Subversion should accept such non-ASCII numbers.

I think it would be best if we created our own set of svn_utf_is*
functions that would only work on the ASCII subset and always use those
when dealing with UTF-8 strings.

 There are also a lot of places where we expect string literals or
characters to be encoded in ASCII, which they won't be on an EBCDIC
system. I suspect even path manipulation would break, because our
separator is '/' rather than '0x2f'. Not that I care about EBCDIC systems...

kfogel@collab.net wrote:

>I just got done with issue #1832, and realized that the problem
>described there might be present elsewhere in the code too.
>
>The Problem:
>
>In is_valid_prop_name(), we were using isalpha()/isalnum() to test a
>UTF8 string. (That's not as bad as it seems, because we were only
>interested in the subset of UTF8 that is the same as lower ASCII.)
>However, as Joe Orton pointed out, using isalpha(), etc, was bogus,
>because those can be sensitive to locale, whereas a UTF8 string is a
>UTF8 string no matter what locale it's in. Note that the APR
>equivalents apr_isalpha(), apr_isalnum(), etc, just fall back to the
>underlying C library implementation in most cases, and thus have the
>same problem.
>
>r12029 fixed is_valid_prop_name(). But where else might similar
>things be going on in Subversion? I'll take the sites one by one:
>
> * subversion/libsvn_ra_svn/marshal.c (read_item):
> [...]
> else if (apr_isalpha(c))
> {
> /* It's a word. */
> str = svn_stringbuf_ncreate(&c, 1, pool);
> while (1)
> {
> SVN_ERR(readbuf_getchar(conn, pool, &c));
> if (!apr_isalnum(c) && c != '-')
> break;
> svn_stringbuf_appendbytes(str, &c, 1);
> }
> item->kind = SVN_RA_SVN_WORD;
> item->u.word = str->data;
> }
> [...]
>
>This looks like it's potentially problematic. The ra_svn protocol
>should never be locale-sensitive. A solution similar to r12029 could
>be applied. Perhaps the r12029 changes could be abstracted out.
>(Note there are some apr_isdigit() calls in marshal.c, they would need
>to be fixed too.)
>
> * subversion/libsvn_subr/validate.c (svn_mime_type_validate):
> [...]
> if (! apr_isalnum (mime_type[len - 1]))
> return svn_error_createf
> (SVN_ERR_BAD_MIME_TYPE, NULL,
> _("MIME type '%s' ends with non-alphanumeric character"), mime_type);
> [...]
>
>Same problem here. The svn:mime-type property's value is UTF8,
>specifically the lower ASCII portion (or perhaps even a subset of
>that). Also, technically, that function is using strcspn() and
>strchr() with hardcoded C strings/chars ("; ", '/') as arguments. If
>we were really paranoid, we could rewrite it to search for the
>specific numeric values of the appropriate UTF8 characters, instead of
>depending on C not using (say) EBCDIC.
>
> * subversion/libsvn_subr/opt.c (parse_one_rev):
> [...]
> else if (apr_isalpha (*str))
> {
> end = str + 1;
> while (apr_isalpha (*end))
> end++;
> save = *end;
> *end = '\0';
> if (revision_from_word (revision, str) != 0)
> return NULL;
> *end = save;
> return end;
> }
> [...]
>
>This one is fine, because 'str' is in native encoding. Well,
>actually, it's kind of hard to tell. I would think it *should* be
>UTF8, and that the svn_opt* functions would take UTF8 strings just
>like the rest of subversion. However, tracing the code back, it does
>not look like this string is in UTF8. On the other hand, the doc
>strings of this function and all the other functions on the way back
>to the source don't seem to say anything about encoding :-(.
>
>One possible course is to do nothing about any of these. We haven't
>seen any actual bugs about them, unless I'm forgetting a report. If
>the problems are theoretical only, then who cares?
>
>Basically, I'm really not sure how urgent any of this is, and would
>like others' opinions. Am I being paranoid, or is there a problem
>here?
>
>-Karl
>
>
-- Brane

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Nov 25 05:28:47 2004

This is an archived mail posted to the Subversion Dev mailing list.