[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

svn_kstring_t

From: Greg Stein <gstein_at_lyra.org>
Date: 2001-06-06 11:25:49 CEST

As most of you may be aware, I have a particular dislike(*) to our abundance
of svn_string_t use. In my mind, the problem is exposed by constructs such
as this:

    svn_string_create("some constant string", pool);

We are creating an svn_string_t on the heap, allocating more bytes for the
string on the heap, and copying that string to the heap. All in the name of
passing a *constant* to a function.

When C defines a perfectly reasonable alternative (const char *) for this,
it just pisses me off. :-)

Now, it is somewhat possible to "fake" a constant svn_string_t. Consider its
definition:

typedef struct svn_string_t
{
  char *data;
  apr_size_t len;
  apr_size_t blocksize;
  apr_pool_t *pool;
} svn_string_t;
                    
We could initialize a string like this:

  svn_string_t my_str = { CONSTANT, strlen(CONSTANT), 0, NULL };

The problem is that when we have a function such as:

  void some_function(svn_string_t *s);

As a caller, we don't know whether that function is going to modify the
string or not. If it *does*, then my_str's construction is going to cause a
segfault right quick.

We can ameliorate the problem somewhat by saying:

  svn_string_t my_str = { CONSTANT, strlen(CONSTANT), strlen(CONSTANT), pool };

Thus, when some_function() goes to change the string, it can at least
reallocate it from a pool. However, it still can't poke values into s->data
because those are bytes from CONSTANT which is (typically) mapped into
readonly memory.

As a result, for functions taking svn_string_t, even if they will treat them
as a constant, we must construct a "real" structure and pass that.

Even worse, if the function *will* treat it as a constant, we can't assume
that (based purely on the function signature) and must pass it an
svn_string_t that we are okay with having it changed. This could mean
creating a dup() of the string. (yes, it is possible to resolve the
situation in the function doc/comments, but that reliance causes
brittle-ness in the code)

So far we've discussed two problems:

1) a function taking an svn_string_t cannot tell its caller whether the
   argument will be modified or not (thus the caller must be conservative in
   the construction of the argument value)

2) because of (1), we must copy "STRING CONSTANTS" onto the heap, along with
   the supporting svn_string_t structure, to be passed to the function.

All that said... why do we have svn_string_t? Two words: counted strings.

The svn_string_t type means that we don't have to do a strlen() all over the
place; we can easily compute the length once, then pass it around.

Second: we can over-allocate the buffer so that we have space for various
operations (such as "append").

------------------

My recommendation is based on the assumption that Benefit #2 (over-allocate)
happens only in a few situations. Most of our operations are passing around
strings that will not be modified within the called function. With this in
mind, we can construct a more finely tuned structure:

typedef struct {
    const char *data;
    apr_size_t len;

} svn_kstring_t;

I'm calling it a "kstring" for "constant string" (some ancient programming
style uses "k" to mean "constant"). I avoided "cstring" because of confusion
with "C style strings". Another alternative might be "istring" for
"immutable string". Being counted strings, these could contain null bytes,
but the typical usage is for a C string, so I discounted names such as
"svn_membytes_t" or "svn_counted_t".

I'd like to propose that we introduce the svn_kstring_t type and begin
converting functions from "svn_string_t *" to "const svn_kstring_t *". Since
a kstring is immutable, there aren't a lot of functions to be added to
svn_string.h:

    svn_kstring_create()
    svn_kstring_ncreate()
    svn_kstring_createf()
    svn_kstring_createv()
    svn_kstring_dup()
    svn_kstring_isempty()
    svn_kstring_compare()
    svn_kstring_first_non_whitespace()
    svn_kstring_find_char_backward()

We can implement the other svn_string functions, but they would return a new
svn_kstring_t rather than modify-in-place. I don't have a particular feel
either way on that.

There are a number of other macros and functions for flipping between
svn_string_t and svn_kstring_t as needed.

As an aside: my particular desire to revisit this issue is because of the
language bindings issue. We'd like to be able to pass a language's native
string to our functions. However, when these functions take an svn_string_t,
we have two problems:

1) if the svn_string_t implies a modification (and we can't tell from the
   signature! we have to explicitly encode this knowledge into our binding
   code), then we must treat the parameter as IN/OUT and all that implies.

2) to create a proper svn_string_t, we need a pool. *what* pool should that
   be? This has two answers:
   a) most functions have a pool argument; encode knowledge for each
      function binding that says to use that pool
   b) create and toss a pool on each function invocation to hold the
      svn_string_t.

The above two problems are a real bitch for our language bindings. Yes, we
can solve it without a new svn_kstring_t, but it is a perfect demonstration
that svn_string_t is an improper type for many things. Consider
svn_fs_change_rev_prop:

svn_error_t *svn_fs_change_rev_prop (svn_fs_t *fs,
                                     svn_revnum_t rev,
                                     svn_string_t *name,
                                     svn_string_t *value,
                                     apr_pool_t *pool);

The binding language should be able to pass constant strings for name and
value. Not IN/OUT parameters. Nor should we have to add logic to our
bindings generation to say "well, svn_fs_change_rev_prop doesn't actually
change those values, so it is fine to construct pseudo-svn_string_t values)

If the above example used svn_kstring_t (or const char *), then we could
take the binding language's value and directly pass it in, without worry
about how to propagate changes back.

Okay... that is my opening salvo. I'll stop for now. :-)

Thoughts?

Cheers,
-g

(*) I find seven swear words in our codebase; all attributable to me; all in
    association with svn_string_t usage. :-)

-- 
Greg Stein, http://www.lyra.org/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:36:31 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.