
Re: issue #1796

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Mon, 26 May 2008 23:23:09 +0200

Karl Fogel wrote:
> Neels Janosch Hofmeyr <neels_at_elego.de> writes:
>> Checking for LF is easy. For UTF-8, there is this function in
>> subversion/libsvn_subr/utf.c called check_utf8(..), which I gather I
>> cannot access from libsvn_repos unless it is made public in
>> subversion/include/svn_utf.h.
>> So, I want to rename check_utf8 to svn_utf_check_utf8, put a "@since New
>> in 1.6" tag on the public doc string, publish it in include/svn_utf.h,
>> adjust all callers and use it in libsvn_repos/fs-wrap.c, in function
>> validate_prop(..).
>> Am I on the right track here? :)
> You're on the right track, but you don't have to make the function
> public. Subversion has an intermediate level of inter-library privacy,
> to allow a symbol to be shared among Subversion's modules without
> publishing that symbol to the world. See here:
> http://svn.collab.net/repos/svn/trunk/subversion/include/private
> Does that help?

Well, it would have, but there isn't any UTF function published in
include/private/ either.

I could go on to publish check_utf8 in include/private/. But all the
other UTF-8 functions are declared in include/svn_utf.h. Wouldn't it be
silly to publish check_utf8 in a completely different place from the
rest of the UTF stuff? (I see check_utf8 in the category of "general
purpose tools that are nice to have around".)

If not, where in include/private/ would check_utf8 go? check_utf8 is
defined in libsvn_subr/utf.c, but there is no include/private/svn_subr.h
to add it to...

By the way, there is an alternative to check_utf8. Let's look at the
implementation of check_utf8() in libsvn_subr/utf.c:

/* Verify that the sequence DATA of length LEN is valid UTF-8 */
static svn_error_t *
check_utf8(const char *data, apr_size_t len, apr_pool_t *pool)
{
  if (! svn_utf__is_valid(data, len))
    return invalid_utf8(data, len, pool);
  return SVN_NO_ERROR;
}

check_utf8 calls svn_utf__is_valid(), defined in libsvn_subr/utf_validate.c:

svn_boolean_t
svn_utf__is_valid(const char *data, apr_size_t len)
{
  const char *end = data + len;
  int state = FSM_START;
  while (data < end)
    {
      unsigned char octet = *data++;
      int category = octet_category[octet];
      state = machine[state][category];
    }
  return state == FSM_START ? TRUE : FALSE;
}
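
To make the split between the two functions above concrete without the
octet_category and machine tables (which are not quoted here), the following
is a minimal standalone sketch. Everything in it is an assumption for
illustration: sketch_utf8_is_valid() and sketch_check_utf8() are hypothetical
names, the validator uses explicit byte-range checks (the well-formed ranges
of RFC 3629) instead of Subversion's table-driven state machine, and the
wrapper returns a message string where the real check_utf8() returns an
svn_error_t *. The exact set of sequences Subversion accepts may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for svn_utf__is_valid(): returns 1 if the LEN bytes
   at DATA are well-formed UTF-8, else 0.  Range checks only, no tables. */
static int
sketch_utf8_is_valid(const char *data, size_t len)
{
  const unsigned char *p = (const unsigned char *) data;
  const unsigned char *end = p + len;

  while (p < end)
    {
      unsigned char c = *p++;
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of next byte */
      int trailing;

      if (c < 0x80)
        continue;                          /* plain ASCII */
      else if (c >= 0xC2 && c <= 0xDF)
        trailing = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          trailing = 2;
          if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
          if (c == 0xED) hi = 0x9F;        /* reject UTF-16 surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          trailing = 3;
          if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
          if (c == 0xF4) hi = 0x8F;        /* reject values above U+10FFFF */
        }
      else
        return 0;                          /* stray continuation or bad lead */

      while (trailing-- > 0)
        {
          if (p == end || *p < lo || *p > hi)
            return 0;                      /* truncated or out of range */
          p++;
          lo = 0x80;                       /* only the first trailing byte */
          hi = 0xBF;                       /* has a narrowed range */
        }
    }
  return 1;
}

/* Hypothetical stand-in for check_utf8(): NULL on success, otherwise a
   static message (the real function builds an svn_error_t in a pool). */
static const char *
sketch_check_utf8(const char *data, size_t len)
{
  if (! sketch_utf8_is_valid(data, len))
    return "invalid UTF-8 sequence";
  return NULL;
}
```

A caller that only needs a yes/no answer would use the boolean function; a
caller that wants to propagate a failure, as validate_prop(..) presumably
would, is better served by the wrapper.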

The difference is that svn_utf__is_valid() returns a boolean, whereas
check_utf8() returns an svn_error_t *. Both of these functions are only
accessible within libsvn_subr, and are neither in include/ nor in

Any opinions on which one of these functions should be published (and

Currently, a lot of functions that convert from/to UTF-8 are declared in
include/svn_utf.h, but none that just verify. Should I rather attempt a
conversion and discard the resulting copied data? No, right?
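
For the LF half of the check mentioned at the top of this thread, here is a
minimal sketch. It assumes that "LF-only" simply means the value contains no
carriage return at all; the thread does not spell out the exact rule, so treat
that as an assumption. sketch_has_only_lf_eols() is a hypothetical name, not
an existing Subversion function. Like a plain validity check, it only scans
the buffer and copies nothing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the LEN bytes at DATA contain no
   carriage return, i.e. every line ending is a bare LF, else 0. */
static int
sketch_has_only_lf_eols(const char *data, size_t len)
{
  return memchr(data, '\r', len) == NULL;
}
```

For example, sketch_has_only_lf_eols("a\r\nb", 4) rejects the CRLF value,
while the same text with bare LF line endings passes.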


Neels Hofmeyr -- elego Software Solutions GmbH

Received on 2008-05-26 23:23:38 CEST

This is an archived mail posted to the Subversion Dev mailing list.
