Re: ctypes + Subversion + a few high level python modules = really great python bindings

From: David James <james_at_cs.toronto.edu>
Date: 2007-04-10 08:01:50 CEST

I've tried to respond to everyone's questions in a single post, so
please prepare for a long email :)

On 4/9/07, Hyrum K. Wright <hyrum_wright@mail.utexas.edu> wrote:
> > I think an interface like this is great: it's close to what the user
> > would do if they were doing the same thing themselves on the command
> > line as opposed to in code.
>
> I also think that such an interface would be great. I don't feel
> qualified to offer an opinion on whether or not to implement it using
> ctypes, but however we implement it, I *strongly* encourage a consistent
> interface across language bindings. This may mean a more thorough
> design discussion on this list before we finalize a generic high-level
> bindings interface.
>
> FWIW, I've been toying around with some C++ bindings for the last few
> weeks, and I think the object model used for the higher-level Python
> bindings would, or should, easily translate to other object-oriented
> languages. It would also make writing some generic cross-layer
> bindings documentation a reasonable goal.

+1. As soon as I have it ready, I would be happy to post the
documentation for my new classes to the list for review. Would you be
willing to think up some use cases? What kind of stuff would you like
to do with the Python bindings?

At present I am mainly familiar with Trac and MUCC, so I expect my
design to focus on the needs of these applications.
However if others on the list can explain their use cases I will take
them into consideration.

I am planning to write a complete SVN shell in Python which exposes
all of Subversion's functionality, including performing multiple
operations in a single commit. As such, my main focus is on abstracting
the Client, RA, Repos and FS layers.

So far, I have only designed four classes. I'll explain my design below.

- Client: Represents a client which can open sessions to a
                particular repository. You can't actually do much to
                a repository directly with this class, besides open
                a session.

- Session: Encapsulates an RA session. Using this class, you
                can perform any read-only operation on a Subversion
                repository, including reading files and mining the
                history. If you want to actually modify the repository
                you will need to open a transaction using the session's
                txn() method.

- Transaction: Encapsulates a single atomic commit. This class will
                record client-level operations (e.g. mv, cp, mkdir, put)
                in a local cache. When you execute the commit method,
                the batch of operations will be sent directly to the
                server.

- Repository: Represents a client which accesses the repository
                directly via the repos and fs layers. This class may
                allow you to perform some administrative actions
                which cannot be performed remotely (e.g. create
                repositories, dump repositories, etc.).

The Repository class is almost a mixin of the Client and Session
classes. It supports everything that the Client and Session classes
support, plus more. You can even open a transaction using the
Repository class. However, the Repository class can only access
repositories directly, through the repos and fs layers.
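To make the four-class design a little more concrete, here is a minimal sketch of how the classes might fit together. Every name in it (the method names, the `send`/`connect` hooks and the `InMemoryBackend` stand-in) is hypothetical and only illustrates the shape of the API; a real implementation would sit on top of the RA or repos/fs layers rather than a dictionary.

```python
# Hypothetical sketch of the proposed object model; not an existing API.


class Transaction:
    """Records client-level operations locally; commit() sends them as one batch."""

    def __init__(self, send):
        self._send = send  # callable(ops, message) -> new revision number
        self._ops = []

    def mkdir(self, path):
        self._ops.append(("mkdir", path))

    def put(self, path, data):
        self._ops.append(("put", path, data))

    def cp(self, src, dst):
        self._ops.append(("cp", src, dst))

    def mv(self, src, dst):
        self._ops.append(("mv", src, dst))

    def commit(self, message):
        ops, self._ops = self._ops, []
        return self._send(ops, message)


class Session:
    """Read-only access to one repository; changes go through txn()."""

    def __init__(self, url, backend):
        self.url = url
        self._backend = backend

    def cat(self, path):
        return self._backend.read(path)

    def txn(self):
        return Transaction(self._backend.commit)


class Client:
    """Opens sessions; can't do much else on its own."""

    def __init__(self, connect):
        self._connect = connect  # callable(url) -> backend

    def open(self, url):
        return Session(url, self._connect(url))


class Repository(Client, Session):
    """Direct repos/fs access: everything Client and Session offer, plus more."""

    def __init__(self, path, backend):
        Client.__init__(self, lambda url: backend)
        Session.__init__(self, path, backend)
    # Administrative actions (create, dump, ...) would be added here.


class InMemoryBackend:
    """Stand-in for a real RA or repos/fs connection, for illustration only."""

    def __init__(self):
        self.nodes = {"": None}  # path -> bytes for files, None for directories
        self.revision = 0

    def read(self, path):
        return self.nodes[path]

    def commit(self, ops, message):
        staged = dict(self.nodes)  # apply to a copy so a failing batch commits nothing
        for op in ops:
            kind = op[0]
            if kind == "mkdir":
                staged[op[1]] = None
            elif kind == "put":
                staged[op[1]] = op[2]
            elif kind == "cp":  # single node only; KeyError if the source is missing
                staged[op[2]] = staged[op[1]]
            elif kind == "mv":
                staged[op[2]] = staged.pop(op[1])
            else:
                raise ValueError("unknown operation: %r" % (kind,))
        self.nodes = staged
        self.revision += 1
        return self.revision
```

Usage would look roughly like `txn = Client(connect).open(url).txn()`, then `txn.mkdir(...)`, `txn.put(...)` and `txn.commit(message)`. Applying the batch to a staged copy is just one simple way to get the all-or-nothing behaviour that a single atomic commit needs.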

The client class might offer convenience methods which allow you to
perform simple client operations (e.g. ls, mkdir, cp) without opening
a session, but I'm not sure that these methods would be particularly
useful, since you can do the same things just as easily using the
session and transaction classes.

I haven't thought about the WC layer yet. It might be a good idea to
double-purpose the Client class so that it can also connect to a WC
and perform similar transactions, but I haven't thought through this
carefully.

I'm personally planning to work in Python, but I agree it would not be
terribly hard to convert my work into C++ so that folks who use other
languages can benefit without duplicating code. Perhaps that will be a
TODO for the future.

On 4/9/07, Blair Zajac <blair@orcaware.com> wrote:
> Are you basing the high level API on the Ruby API? I recall somebody
> stating that the Ruby high level API is very nicely designed. If
> we're going to do a rewrite of the Python API, why don't we model it
> after the Ruby one?

The SWIG/Ruby API is very good, but I think that it sticks too closely
to the C API, and doesn't abstract enough of the details. If you want
to, for example, create a directory on a remote server using the RA
API, you still need to drive the commit editor manually. I think that
our object model should abstract away those details so that folks can
think in terms of client-level operations instead of delta editors.
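As a rough illustration of the detail the object model should hide, the sketch below translates a client-level `mkdir` into delta-editor calls. The method names mirror callbacks in `svn_delta_editor_t` (`open_root`, `open_directory`, `add_directory`, `close_directory`, `close_edit`), but the simplified signatures and the `RecordingEditor` class are assumptions for illustration, not the real C or SWIG signatures. It also assumes all parent directories already exist.

```python
class RecordingEditor:
    """Logs editor calls. Method names mirror svn_delta_editor_t; signatures are simplified."""

    def __init__(self):
        self.calls = []
        self._last_baton = 0

    def _new_baton(self):
        self._last_baton += 1
        return self._last_baton

    def open_root(self, base_revision):
        self.calls.append(("open_root", base_revision))
        return self._new_baton()

    def open_directory(self, path, parent_baton, base_revision):
        self.calls.append(("open_directory", path))
        return self._new_baton()

    def add_directory(self, path, parent_baton):
        self.calls.append(("add_directory", path))
        return self._new_baton()

    def close_directory(self, dir_baton):
        self.calls.append(("close_directory", dir_baton))

    def close_edit(self):
        self.calls.append(("close_edit",))


def drive_mkdir(editor, base_revision, path):
    """Translate a client-level 'mkdir path' into a sequence of editor calls.

    Assumes every parent of `path` already exists (no --parents behaviour).
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("cannot mkdir the repository root")
    batons = [editor.open_root(base_revision)]
    # Open the existing parent directories, outermost first.
    for depth in range(1, len(parts)):
        batons.append(editor.open_directory("/".join(parts[:depth]), batons[-1], base_revision))
    # Add the new directory, then close every directory innermost first.
    batons.append(editor.add_directory("/".join(parts), batons[-1]))
    for baton in reversed(batons):
        editor.close_directory(baton)
    editor.close_edit()
```

With a high-level Transaction class, the user would only ever write the equivalent of `txn.mkdir("trunk/docs/api")`; the open/add/close bookkeeping above is exactly what it would generate on their behalf.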

On 4/9/07, Blair Zajac <blair@orcaware.com> wrote:
> On Apr 9, 2007, at 6:46 PM, David James wrote:
> > If you're familiar with our Python SWIG bindings, you'll know that our
> > low-level bindings are, unfortunately, full of bugs. If you try to
> > write a simple Python program which uses the RA layer, you'll find,
> > unfortunately, that your program may crash with little explanation
> > as to why. As a developer who loves working with Python, I find
> > these unexplained crashes very frustrating.
> >
> > These crashes in our SWIG bindings are usually caused by bugs in our
> > SWIG typemaps, so they shouldn't be blamed on SWIG itself.
> > Fortunately, if we switch from SWIG to ctypes, we won't have to
> > spend any more time writing or debugging typemaps anymore!
>
> Are these bugs in the typemaps specific to the Python SWIG typemaps
> or for all SWIG bindings? I'd rather not see effort split between a
> ctypes build and a SWIG bindings that Perl and Ruby depend upon.

It's a bit of a long story. Unfortunately, Subversion's SWIG bindings
weren't designed very carefully. I'm not knocking SWIG itself
-- it's a useful tool -- but by misusing SWIG in Subversion we've
basically shot ourselves in the foot.

Our SWIG bindings attempt to solve a hard problem: general two-way
conversion between Subversion datatypes and Python datatypes. We try
to map every possible Subversion datatype one-to-one onto a native
Python datatype, and some of these mappings make assumptions about
Subversion's behaviour that aren't always true.

In each successive version of Subversion, we tried to write more
thorough SWIG bindings, which wrapped our Subversion C APIs more
completely, but this transition required a lot of manual work due to
some showstopper deficiencies in SWIG:
1) SWIG does not automatically wrap arguments to callback
functions. Instead, you must create a thunk function which
manually converts arguments between Python and C using SWIG's APIs.
2) SWIG does not automatically wrap pointers which are contained
inside arrays or hashes. Instead, you must write a typemap which
manually converts these pointers between Python and C using SWIG's
APIs.
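For comparison, here is a small sketch of how ctypes covers both (1) and (2) without hand-written thunks or typemaps: `CFUNCTYPE` generates the C-callable thunk, and a ctypes array holds the pointers. The callback prototype is made up (it is not a real `svn_*` typedef), and `walk()` only stands in for a C function that iterates an array and invokes a callback.

```python
import ctypes

# C prototype: int (*)(void *baton, const char *path).
# A made-up signature in the baton-passing style Subversion uses; not a real svn_* typedef.
PathCallback = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_char_p)


def make_collector():
    """Wrap a Python function as a C callback; no hand-written thunk needed."""
    seen = []

    def on_path(baton, path):
        # ctypes has already converted the C arguments: NULL -> None, char * -> bytes.
        seen.append(path.decode("utf-8"))
        return 0

    # The caller must keep this object alive while C code may still call it.
    return PathCallback(on_path), seen


def to_c_string_array(strings):
    """Build a NULL-terminated const char *[] from Python strings."""
    arr = (ctypes.c_char_p * (len(strings) + 1))()
    for i, s in enumerate(strings):
        arr[i] = s.encode("utf-8")  # the array keeps the bytes objects alive
    arr[len(strings)] = None  # NULL terminator
    return arr


def walk(paths, callback, baton=None):
    """Plays the part of a C function that calls `callback` once per array entry."""
    i = 0
    while paths[i] is not None:
        if callback(baton, paths[i]) != 0:  # calls through the C function pointer
            break
        i += 1
```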

We have already written a large number of conversion functions which
accomplish (1) and (2), but they're really not much fun to write:
they involve a lot of careful work and error checking.

The Ruby bindings perhaps suffer less from typemap bugs because Kouhei
has maintained them very carefully. The Python typemaps have not
received as much attention.

Even once the SWIG typemaps are complete, we suffer from another
problem: the generated SWIG bindings are undocumented. If you read
through the Subversion include files, you can understand a great deal
about how the Subversion bindings work, but your understanding will
not be complete until you understand the SWIG bindings as well.

When I write functions which use the Python bindings, I often consult
the source code of the SWIG interface files to see how the Python
datatypes will be converted into Subversion datatypes, so as to make
sure that I am providing the correct arguments. We aren't completely
consistent about how we convert datatypes between Python and C, so you
may often have to consult the source code to make sure that it behaves
as you expect.

ctypes is much simpler than SWIG, and supports many key features that
SWIG does not. I ran our Subversion include files through the ctypes
code generator, and it generated a complete low-level API for
Subversion in Python. ctypes handles everything that SWIG doesn't
handle: callbacks, composite datatypes, vtable invokers, reflection,
etc.
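A minimal sketch of the composite-type and vtable support described above: a struct whose fields are function pointers, filled in with Python functions and invoked through those fields, plus a bit of reflection over the layout. The `mini_editor_t` struct is a hypothetical, cut-down stand-in for something like `svn_delta_editor_t`, not its real layout.

```python
import ctypes

# A hypothetical, cut-down vtable in the style of svn_delta_editor_t:
# a C struct whose fields are function pointers.
OPEN_ROOT = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_long)
CLOSE_EDIT = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p)


class mini_editor_t(ctypes.Structure):
    _fields_ = [("open_root", OPEN_ROOT),
                ("close_edit", CLOSE_EDIT)]


calls = []


def py_open_root(edit_baton, base_revision):
    calls.append(("open_root", base_revision))
    return 0


def py_close_edit(edit_baton):
    calls.append(("close_edit",))
    return 0


# Keep module-level references to the callback objects for as long as the struct is used.
_open_root_cb = OPEN_ROOT(py_open_root)
_close_edit_cb = CLOSE_EDIT(py_close_edit)
editor = mini_editor_t(_open_root_cb, _close_edit_cb)

# "Vtable invoker": call straight through the function-pointer fields.
editor.open_root(None, 42)
editor.close_edit(None)

# Reflection: the struct layout is ordinary Python data.
field_names = [name for name, _ in mini_editor_t._fields_]
```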

ctypes is also much easier to develop in. I like to use Windows, but I
don't have a Subversion development environment set up here. If I want
to play with the ctypes bindings on my Windows machine, all I have to
download is Python 2.5 and Subversion 1.4.1, plus my ctypes bindings.
I don't need a compiler or anything. I can just edit the high-level
Python code to my liking, and if I find a bug in a low-level Python
binding I can fix it, without ever leaving my Python editor, or
touching a C compiler.
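As a sketch of what "no compiler needed" looks like in practice, the snippet below loads libsvn_subr at runtime and calls `svn_subr_version()`, which returns a pointer to an `svn_version_t` as declared in `svn_version.h`. The library names passed to `ctypes.util.find_library` are assumptions that depend on how Subversion was installed, and the loader simply returns `None` when the library can't be found.

```python
import ctypes
import ctypes.util


# Same layout as struct svn_version_t in svn_version.h:
#   { int major; int minor; int patch; const char *tag; }
class svn_version_t(ctypes.Structure):
    _fields_ = [("major", ctypes.c_int),
                ("minor", ctypes.c_int),
                ("patch", ctypes.c_int),
                ("tag", ctypes.c_char_p)]


def load_svn_subr():
    """Find and load libsvn_subr at runtime, or return None if it isn't installed.

    The candidate names are assumptions: libsvn_subr-1.so* on Unix-like systems,
    libsvn_subr-1.dll on Windows.
    """
    for name in ("svn_subr-1", "libsvn_subr-1"):
        path = ctypes.util.find_library(name)
        if path:
            try:
                return ctypes.CDLL(path)
            except OSError:
                continue
    return None


def subr_version(lib):
    """Call `const svn_version_t *svn_subr_version(void)` and unpack the result."""
    fn = lib.svn_subr_version
    fn.argtypes = []
    fn.restype = ctypes.POINTER(svn_version_t)
    v = fn().contents
    return v.major, v.minor, v.patch, (v.tag or b"").decode("ascii", "replace")
```

If a prototype is wrong, fixing it means editing `argtypes`/`restype` in Python and re-running; nothing has to be rebuilt.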

On 4/9/07, Blair Zajac <blair@orcaware.com> wrote:
> I'm really -0 to -1 on this unless we determine how this impacts the
> other bindings. I feel we'll be splitting our energy on different
> bindings now. If there's work that can be done to the SWIG bindings
> that will help Perl and Ruby that would also help Python, I'd rather
> see energy go in that direction.

Unfortunately, I don't think there's anything that can be done to save
the Python SWIG bindings. Compared to our ctypes bindings, our
SWIG/Python bindings are a lost cause.

Put simply, I don't see any reason to waste time writing pages and
pages of low-level wrapper code for Python if ctypes will generate
said code automatically.

Unfortunately, unlike ctypes, SWIG does not generate wrappers for
callbacks and this is a showstopper for the Python bindings. Perhaps
SWIG can be extended to support this feature but until it does it will
be a major handicap for Subversion's SWIG bindings.

I don't think that we will be hurting the Ruby or Perl bindings by
converting our Python bindings to ctypes. In fact, I think it will
help our Ruby and Perl bindings. Here's why:

  - After we convert our Python bindings to ctypes, we won't have to
    spend so much time focusing on fixing little bugs in the low-level
    bindings. Therefore, our Python developers can focus on building
    really great high-level bindings.

  - After taking a look at the great new Python bindings, our Ruby and
    Perl developers will borrow all of the cool ideas, and make the Ruby
    and Perl bindings great as well.

> What Python versions does ctypes support? I understand it comes with
> Python 2.5, but what's the oldest it supports.

According to the ctypes homepage, ctypes supports Python 2.3 and
later. See http://python.net/crew/theller/ctypes/

> If ctypes is as great as it sounds, a fun project would be to port it
> to Ruby and Perl :)

Perl has Inline::C and Ruby has DL. I don't know if they are as good
as ctypes. I saw this post on the users@ list:
http://svn.haxx.se/users/archive-2005-06/0573.shtml

I don't think that Perl or Ruby need to switch away from SWIG in order
to benefit from our higher-level object model. I only wish to upgrade
our Python bindings to use ctypes because ctypes is incredibly good
and makes everything easy in the Python bindings. :)

Received on Tue Apr 10 08:02:05 2007