[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: object-model: Wrapping Subversion C-structs in C++

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Fri, 24 Sep 2010 11:43:16 -0500

On Fri, Sep 24, 2010 at 1:02 AM, Branko Čibej <brane_at_xbc.nu> wrote:
>  On 24.09.2010 04:05, Hyrum K. Wright wrote:
>> On Thu, Sep 23, 2010 at 2:20 PM, Branko Čibej <brane_at_xbc.nu> wrote:
>>>  On 22.09.2010 21:41, Hyrum K. Wright wrote:
>>>> [ apologizes for the somewhat stream-of-conscious nature of these mails ]
>>>>
>>>> On Wed, Sep 22, 2010 at 7:16 PM, Hyrum K. Wright
>>>> <hyrum_wright_at_mail.utexas.edu> wrote:
>>>>> On Wed, Sep 22, 2010 at 5:35 PM, Hyrum K. Wright
>>>>> <hyrum_wright_at_mail.utexas.edu> wrote:
>>>>>> For the C++ folks out there, I've got a question about an approach to
>>>>>> take on the object-model branch.  At issue is how to wrap the various
>>>>>> C structures returned to callers, particularly in a backward
>>>>>> compatible manner.  Currently, I'm looking at svn_wc_notify_t *.  As I
>>>>>> see it, there are a few options:
>>>>>>
>>>>>> 1) Just wrap the pointer to the C struct as a member of the wrapper class.
>>>>>>    Pros: Easy to implement; lightweight constructor.
>>>>>>    Cons: Getters would need to translate to C++ types; would need to
>>>>>> implement a copy constructor which deep copies the C struct; would
>>>>>> also introduce pools, since creating and duplicating C structs
>>>>>> requires them.
>>>>>>
>>>>>> 2) Wrap each C struct member individually
>>>>>>    Pros: C->C++ complexity is constrained to the constructor,
>>>>>> everything else is C++ types
>>>>>>    Cons: Hard to extend for future compatibility
>>>>>>
>>>>>> 3) Just pass the C-struct pointer around; don't even bother with a class
>>>>>>    Pros: Dead simple.
>>>>>>    Cons: Requires more memory management thought by consumers; not
>>>>>> C++-y enough; may introduce wrapping difficulties.
>>>>>>
>>>>>> I'd like to come up with something consistent, which would be used
>>>>>> throughout the C++ bindings.  I'm also interested in a solution which
>>>>>> ensures the C++ bindings can be used as the basis for other
>>>>>> object-oriented bindings models (Python, Perl, etc.)
>>>>> After lunch, and some thought, it feels like #1 is the best solution.
>>>>> This doesn't change the external class interface, which is good, and
>>>>> can still provide C++ values to callers who want them.  The pool
>>>>> issues are a bit messy, but at least the object can manage it's own
>>>>> memory (albeit at a significant overhead).
>>>> This could get ugly.
>>>>
>>>> Creating and destroying pools all over the place could get ugly, but
>>>> it's necessary evil because all of our object creation / duplication
>>>> functions all require a pool.  An alternative would be a set of
>>>> functions returning the size of the object, and then another which
>>>> puts the object in a pre-allocated memory location.  (These could
>>>> theoretically replace the pool argument version of the API, but that'd
>>>> be *a lot* of churn.)
>>>>
>>>> The approach would let the C++ allocate the memory (of the correct
>>>> size) using whatever scheme it wants, and then do a "placement
>>>> initialize" using the second API.  If we do go this route, I'd
>>>> recommend exposing these as private-to-Subversion, at least initially.
>>>>
>>>> The other option is just pass the C-struct pointer around everywhere,
>>>> but then the bindings consumers have to work about this exact same
>>>> issue.  In other words, it solves it now, but really just pushes the
>>>> problem elsewhere.
>>>>
>>>> -Hyrum
>>> Memory management with pools and C++ -- keep away from doing it in
>>> per-object ctor/dtor pairs is all I can say. Lifetimes would get so
>>> messy and mucked up you could hardly believe it.
>>>
>>> IMHO the best way to wrap the C structures in C++ is to subclass 'em.
>>>
>>> class svn_some_thing : public svn_some_thing_t { .... };
>>>
>>> This way you can pass C++ pointers directly to the C implementation,
>>> most methods become just inlined wrappers. Callbacks are a bit more
>>> hairy, but not all that much.
>> This works for input types, but right now it's the output types (which
>> are much more common) that I'm concerned about.  Things like
>> svn_commit_info_t, or svn_wc_notify_t.  For my initial hack at it see:
>> http://svn.apache.org/repos/asf/subversion/branches/object-model/subversion/bindings/c++/include/Types.h?p=1000600
>
> If you're strict about the wrapping, i.e., do not add any data members
> or virtual functions to the wrappers, then a simple static_cast of the
> returned pointer will solve the type conversion for output parameters.
> It makes the wrapper methods a bit more complex, but still easily inlinable.

Well, except for the pool lifetime issues. The wrapped pointer would
still be allocated in a the result pool of the called function, but by
doing a cast, and returning the wrapper to the caller, we've now
hidden that fact (and given the consumer more than enough rope to hang
their entire team). Hence the desire to duplicate the returned value
into a pool managed outside of the one provided to the C API.

> Two things worry me about your approach:
>
>    * Addidional dereference overhead, and construction overhead ...
>      might or might not be important performance-wise, and certainly is
>      important bug-wise ...
>    * The pool-per-object that you already mentioned, made worse by the
>      fact that you implement operator= ... nice syntactic sugar, but
>      has serious side effects.
>
> The pool-per-object is especially bothersome because it creates the
> false sense that you don't have to worry about pool lifetime ... but you
> *do* because every pool has a parent, you've just hidden the fact away
> very obscurely. If you /do/ want to give each object a reference to its
> containing pool (by no means a bad thing), there are better ways, see below.
>
> Consider too what you'll do with destructors and pool destruction order.
> Clearly any destructor for objects such as yours cannot be allowed to
> touch the pointer to the object contained in the pool, but what happens
> if the parent pool is destroyed before the object that contains the
> child pool? (I guess you noticed that pools are decidedly un-C++-like.)

All of the Pools used to hold the child objects are children of the
global parent (created with NULL as the parent pool). As such, they
are independent of each other, and won't have destruction order
issues. It's pretty wasteful in terms of the memory overhead, but
meets the goal of having each object have it's own pool, and control
it's own lifetime independent of other objects.

Pool are certainly un-C++-like, but our APIs don't help to much
either. To allocate (or duplicate) these structures, we require
callers to provide a pool, instead of a generic allocation mechanism.
If we did the latter, we could easily put these objects into memory
managed "natively" by C++, but because we don't, a Pool is required.
That's really the only reason to use pools in these objects.

>>> As to pools ... I'd once thought that a proper C++ wrapper for a pool
>>> would be a reference-counted smart pointer. That turns out to be too
>>> much overhead and too much of a good thing, since you *cannot* allow
>>> pool lifetime to depend on some reference counting order of execution.
>>> Best just use plain pool pointers, or maybe wrap them minimally so that
>>> destructors do the pool cleanup reliably.
>> Yeah, we've got a Pool class which destroys the underlying APR pool in
>> the dtor, so that provides a handy way to clean things up (if it
>> weren't for the 8k overhead of the minimum pool size).  To the initial
>> hack, I've added a smart pointer to prevent multiple redundant copies
>> of the underlying C struct, since I expect it to be read-only (we
>> could probably add an explicit Clone() method if folks really wanted a
>> deep copy).  The smart pointer magic is here:
>> http://svn.apache.org/repos/asf/subversion/branches/object-model/subversion/bindings/c++/include/Types.h
>>
>> There's still a self-managed pool for allocating the C-struct, but
>> it's encapsulated enough to be changable should we have a better way
>> of duplicating objects in the future.
>>
>> Thanks for the insight, by the way.  These classes and paradigms are
>> in no way set in stone, and I welcome discussion.  I sense you've got
>> a bit more C++ design experience than I do. :)
>
> Mmmm ... wouldn't know about that, and it's been ages since I'd touched
> C++. Nevertheless ... the Pool class gives me the shivers. :) At the
> very least you forgot to declare the copy constructor private Your code
> allows passing Pools by value to functions ... not a good idea, as you
> noted yourself in the comment.

It wasn't actually a copy constructor, but a "create a child pool"
constructor. However, we weren't using it, and there are other ways
of implementing that functionality. The confusing with the
possibility of the copy constructor, for both humans and compilers,
was too great, so I've made the declaration private, and removed the
definition.

> I suggest you take a look at auto_ptr and auto_ptr_ref. You need a
> similar pair of classes that will deal with pools, then you can strictly
> control when you're passing a "handle" to a pool and where you're
> holding onto the pool itself. You can't use auto_ptr directly, but you
> can use a simple wrapper to put inside auto_ptr. You may even then be
> able to derive the real Pool and PoolRef from auto_ptr and auto_ptr_ref,
> respectively ... I forget how the inheritance rules for auto_ptr work,
> they might be tricky. Certainly wrapping will work, though.

I'm familiar with auto_ptr's, but not auto_ptr_ref, although I'm not
yet convinced that we need to use either. I'm going to need to think
about this for bit...

Thanks for the insight,
-Hyrum
Received on 2010-09-24 18:43:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.