Re: Proof of concept higher-level python bindings for SoC project

From: David James <james_at_cs.toronto.edu>
Date: 2007-04-07 06:57:46 CEST

On 4/6/07, Giovanni Bajo <rasky@develer.com> wrote:
> > Should we write some code to edit lower-level API functions and
> > upgrade them so that they throw exceptions instead of returning error
> > codes? I think that this is a good idea, as long as this difference is
> > carefully documented.
>
> Of course. IMO, it is fine for the lower layer to provide an API which is
> slightly modified from the C API, as long as the modifications are general and
> automatic, and thus can be remembered by users without having to look at a
> reference guide. Basically, you want your modifications *not* to require a
> specific Python reference documentation: the C reference documentation ought
> to be enough. Thus, the rules must be simple and general. For instance:
>
> - All functions returning an error through a svn_error_t automatically raise
> a SubversionException() instead.
> - All functions returning multiple return values through the usual
> C-language "pattern" of passing pointers to variables are modified to return
> tuples of values instead.
>
> These are good general rules which are easy to remember and can be
> unconditionally applied to all functions. When somebody has read them in the
> documentation, she can go ahead and use the API just by looking at the C
> reference.

I like the first rule, because it's simple, but I don't think the
second rule is as simple as it looks. It's not easy for users (or for
us) to figure out which parameters are output parameters.
- Most (but not all) doubly indirect pointers are output parameters
- Some regular pointers are output parameters
- Sometimes, the SWIG bindings treat a pair of input parameters as a
single output parameters (e.g. char * / apr_size_t * pairs are
sometimes treated as a single output string)
- For some functions, it isn't really clear whether a parameter is an
input or an output parameter. If you, for example, pass in an array to
a function so that the function can add elements to the end of the
array, is the array an input parameter or an output parameter? I would
probably treat this as an input parameter, but the SWIG bindings often
treat this This type of distinction is confusing for users.
- Some parameters are delayed output parameters (e.g., the pointer
that you pass in isn't initialized by this function. Instead the
function saves away the pointer you pass in and initializes it later
when you call a different function.)
- In many cases, the output parameters are optional (e.g., if you
pass in "NULL" as the location where the function should write the
data, the function will not bother writing the parameter and save some
computation time).

> [...]
> > I don't think that our low-level API should automatically, for
> > example, convert indirect pointers into return values, because this
> > isn't a safe conversion in general, and it's especially confusing for
> > users when we do the conversion automatically but wrong. I'd rather do
> > this conversion manually.
>
> Do you have an example of when you think the conversion isn't safe?

Here's an example case:

   svn_stream_t *svn_stream_checksummed(svn_stream_t *stream,
                                     const unsigned char **read_digest,
                                     const unsigned char **write_digest,
                                     svn_boolean_t read_all,
                                     apr_pool_t *pool);

The documentation for svn_stream_checksummed explains that this
function creates a stream. If you pass in a non-NULL value for
read_digest and/or write_digest, the stream will continuously update
the read_digest and write_digest values as you read and write to the
stream. When you finally close the stream using svn_stream_close, the
digest parameters will be complete.

How do we deal with the read_digest and write_digest parameters?

It's possible that we could update this function to return placeholder
pointers which are updated later, but it would be difficult to make
this creation of placeholder pointers optional. In any case, I don't
think that we will be able to find an easy translation rule for this
type of function that users can understand intuitively in their heads.
I think it would be better if the low-level API sticks as close as it
can to the C API.

> >> > - Our SWIG bindings convert APR arrays and APR hashes into Python
> >> > arrays and Python hashes, if you write a whole bunch of boilerplate
> >> > typemap code.
> >>
> >> I guess that depends on how well you manage to wrap APR data
> >> structures in
> >> Python. My own personal preference is to provide a wrapper class built
> >> over
> >> the C-level APR functions, and making the class fully duckable with
> >> regular
> >> Python lists and maps (as much as possible). This lets the users write
> >> natural
> >> Python code to manage APR containers, and also allow maximum performance
> >> because data structures must not be converted every time between
> >> Python and APR.
> >
> > Our higher-level Python bindings will certainly allow users to use
> > friendly Python lists and dictionaries, and will convert these Python
> > datatypes automatically into the appropriate APR datastructures.
> >
> > It's tempting for me to consider providing some automatic type
> > conversion in the low-level ctypes bindings, as well, but I'd like to
> > stay away from this: I think that any type conversions in our
> > low-level bindings should only happen when explicitly requested, so
> > that we don't make our Python bindings too "magical" and confusing,
> > like our SWIG bindings.
>
> OK, I agree with that. But what about providing a higher-level APR data
> structure Pythonic wrapper, which is duck-typeable to standard Python types,
> and thus can be directly used without going through conversion functions each
> time?
>
> > I'd like to stay away from automatic type conversions, and instead
> > convert the types manually in the higher-level layer by calling the
> > appropriate conversion functions.
>
> If you create Pythonic APR data structures, you don't need to convert them at
> all. Conversion functions are still useful for people passing in Python data
> types (eg: literals), but getting an APR data structure, modifying it, and
> passing it back the C level is much more efficient.

I think that this would be a nice feature. We should put it on the
TODO list for the ctypes bindings.

Cheers,

David

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Apr 7 06:57:58 2007

This message: [ Message body ]
Next message: Joe Swatosh: "Re: make_gem.rb"
Previous message: Kouhei Sutou: "Re: make_gem.rb"
In reply to: Giovanni Bajo: "Re: Proof of concept higher-level python bindings for SoC project"
Next in thread: Giovanni Bajo: "Re: Proof of concept higher-level python bindings for SoC project"
Reply: Giovanni Bajo: "Re: Proof of concept higher-level python bindings for SoC project"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]