Re: Merging Ctypes Python Branch

From: David James <james_at_cs.toronto.edu>
Date: 2007-09-01 23:00:27 CEST

In this mail, I've attempted to answer all the questions posed in the
thread so far. So, it's a long mail -- feel free to skip down to the
parts you find interesting :)

On 8/31/07, Eric Gillespie <epg@pretzelnet.org> wrote:
> I've finished reviewing the high-level interface. A lot of these
> things are probably on someone's TODO list already.
> Nevertheless, I think this needs more work before merging onto
> trunk, especially now that we're starting to firm up the 1.5
> release schedule.

Agreed.

> - Style nits:

+1 on fixing all the style nits you mentioned. I'll try to get around
to this soon and fix everything in one big style commit.

> - WC._build_path_list
>
> This sort of magic is generally a bad idea. It isn't at all
> easier for callers to pass in "file" than ["file"], but
> allowing both can lead to hidden bugs. I looked to see if
> other functions taking path lists behave this way, but it seems
> to be limited to wc.py.

Currently, users can type any of the following:
  - wc.update("foo")
  - wc.update(["foo", "bar"])
  - wc.update(("foo", "bar", "baz"))

You think that we should only allow the last two forms? Thinking about
this carefully, I guess I agree -- it'll probably be easier to explain
the API if we don't have to explain those special cases.

> Not all methods use _build_path_list; lock and unlock, for
> example, just take a list.

That's a bug. Our API should be consistent.

> - WC.pool and WC.iterpool
>
> We use the name "iterpool" for pools we clear on each iteration
> of a loop. I'm not sure of its usage here. Some methods use
> it, others use self.pool (which is never cleared). Some
> methods that use self.iterpool clear it, some don't. For those
> that clear it, I wonder about the lifetime of objects returned.
>
> Specifically, I was thinking about commit_info. I see that
> WC.commit does not clear self.iterpool; is that so the returned
> commit_info will still work? It's no good to have these
> methods vary so.

self.iterpool is intended to be used for temporary allocations only.
If you use iterpool in a public method, you should clear the pool
before returning from the method.

It looks like wc.commit allocates commit_info from iterpool. This is
bad, because, as you suspect, the commit info will be invalidated as
soon as the iterpool is cleared.

Looking around the bindings, it seems like many other functions suffer
from this problem.

To solve the problem, we should either allocate return values in their
own, dedicated pool, or -- even better -- convert the return values
into true python datastructures which don't require pools at all.

> I'm not sure we can do anything to save quick scripts from pool
> management except allocating everything in one pool that we
> never clear. I've always assumed that's what we do in swig,
> though I'm not sure, as I manage pools myself.

Ideally, users who use high-level ctypes Python bindings should never
have to worry about pool management.

To isolate users from pool management, I've setup the following system:

  1. Large, long-lived C objects (such as repository objects) should be
     allocated in their own dedicated memory pool. Any private helper
     objects which share the same lifetime as the connection should also
     be allocated out of the same pool. When Python determines that
     the object should be deleted, we should destroy the entire memory
     pool.

  2. Temporary objects should be allocated in an iterpool. When
     our functions exit, we should clear the iterpool, so as to clear
     out the temporary objects.

  3. Typical Python objects (such as strings, lists and dictionaries)
     shouldn't be allocated in pools at all. They should just be regular
     Python objects. If Subversion returns an APR array or an APR hash,
     for example, our higher level API should convert these arrays and
     hashes into Python lists and dictionaries which are allocated
     in Python memory.

Does the above system make sense?

> - I don't understand this:
>
> # As of ctypes 1.0, ctypes does not support custom error-checking
> # functions on callbacks, nor does it support custom datatypes on
> # callbacks, so ctypesgen sets the return values for callbacks
> # to c_void_p.
> #
> # Callback functions, therefore, won't have their errors checked
> # automatically. Users who call these functions should wrap
> # their function call using SVN_ERR so as to ensure that any
> # exceptions are thrown appropriately.
>
> Does it mean that I must wrap all svn calls I make inside my
> progress_func callback with SVN_ERR()? Or does it mean I must
> wrap all calls I make to an svn function takes a callback in
> SVN_ERR()? Also, this needs to be in user-facing documentation
> somewhere, not buried in a comment down here.

No, that's not it. Let me explain.

If you call a regular Subversion function from Python, and an error is
reported, Python will throw an appropriate exception. This is great --
nice and Pythonic.

If you invoke a low-level function pointer, and an error is reported,
the function will simply return an error code. If you want to convert
this error code into a Python exception, you'll need to wrap the
function pointer invocation with SVN_ERR. This is unfortunately a
limitation of ctypes.

To see a list of the low-level function pointers accessible via the
ctypes python bindings, grep functions.py for CFUNCTYPE. You'll note
that all of the return values of these functions are marked as
UNCHECKED because ctypes doesn't support any error checks on function
pointer invocations.

Yes, this does need to be documented somewhere more obvious. In the
README? We don't really have a getting started guide yet but that's
probably where it would belong.

> - Why the staticmethod on the callback wrappers? That shouldn't
> be necessary. If you drop that, you'll get self as the first
> argument as normal, and can ignore the baton entirely. Batons
> are just a way to fake closures in C; Python code should never
> have to bother with them.

Very cool. Nice suggestion! I didn't think that ctypes would handle
instance methods correctly, but it happily turns out I'm wrong.

> - cancel_func
>
> How is this supposed to work? It seems to me that for the
> high-level side, my callback should do something like 'raise
> csvn.CancelError("Caught signal")' and the high-level wrapper
> should catch that and return an appropriate svn_error_t. The
> current high-level cancel wrapper doesn't do anything like
> that, but it also ignores my callback's return value, so I
> can't even do it the low-level way. I don't think these work
> right now.

+1 on fixing both bugs. I don't care whether users return an error or
raise an error -- whatever you think is best is fine.

> For details of apr_off_t objects, see the apr_off_t
> documentation."""
>
> Cute ;->. The high-level cb wrapper should translate to normal
> Python longs before calling the real cb.

> WC.checkout should return result_rev

> - RepositoryURI.__init__
>
> More of the magic with the uri argument, this time with an
> encoded parameter so callers can work around it. As with
> character encoding, the barriers should be at clearly defined
> places (usually at end-uesr interaction), not at random layers.
> Just require a URI-encoded argument.

We already require a URI-encoded argument by default. +1 on removing
the auto-encode feature, as it currently isn't used anywhere.

> I'm not sure this class is a good idea in the first place,
> though. It seems to me that a high-level, Pythonic interface
> should integrate well with standard Python modules and idioms,
> not export the bits of svn and apr that Python already has.
> It's the same reason the high level interface should take
> lists, strs, dicts, and so on as opposed to ctyped apr
> equivalents. In this case, I want to use urlparse.

+1 on the concept here. I mainly wrote the RepositoryURI class for the
benefit of mucc.py. Is there an equivalent to
svn_path_get_longest_ancestor in urlparse?

> - RemoteRepository -- well, it's not always remote. I suggest
> just Repository and LocalRepository. Furthermore, the Local
> version is missing some things, such as the import method.
> LocalRepository should be a superset of Repository. I suggest
> making Local a subclass of Repository; it can just prepend
> file:/// and let the superclass methods do all the work.

+1.

> - Rename RemoteRepository.log's verbose argument to
> discover_changed_paths. Same for _LogMessageReceiver. Also,
> _LogMessageReceiver.receiver probably ought to set
> entry.changed_paths based on whether the changed_paths hash is
> NULL or not.

+1.

> - Are you planning to use CallbackReceiver to turn all functions
> which call receiver callbacks into generators?

Sure. We could upgrade wc.info to be a generator, so that users can
iterate over all of the different elements in the WC using a for loop.

Any other candidates?

> - RemoteRepository.revprop_get should use get_dir or get_file.

Why should we use get_dir or get_file instead of revprop_list?

> - RemoteRepository._log_func_wrapper is duplicated in wc.py

Oops. We should fix that and only provide one implementation
(especially since the version in wc.py is now outdated.)

> - LocalRepository.__init__
>
> A separate create constructor would be better than having a
> create argument here.

To be clear, you think we should create a static method like
'LocalRepository.create' instead, which returns a repos object?

You would invoke it like this:
repos = LocalRepository.create("some/path")

How would the static method tell __init__ that we want to create a
repository? The current boolean that's already in the __init__ method
would suffice for the job -- or do you think we should use some other
method of communication?

> - LocalRepository vs. RemoteRepository
>
> [...]
>
> Yeah, I know, svn is awful about this, but no reason to
> propagate the mess into the high-level layer. I vote for
> revprop and verb_noun.

> - inconsistent assert
>
> IMO, just drop all the asserts. If someone fails to provide a
> working User instance, for example, that's their problem, and
> they'll get appropriate exceptions. It's no different from
> passing a bool as get_rev_prop's name argument.
>
> If we're going to go with the asserts, though, we should use
> them everywhere. For example, LocalRepository.txn asserts
> self.user but _get_commit_editor does not.

Personally I find the assertions helpful because they point out that
it's user error rather than a problem in the bindings. These
assertions are especially helpful when they're adorned with a comment
explaining the mistake the user made.

I don't think we need to have full coverage for the assertions, but I
do agree it's a good idea to be consistent.

_get_commit_editor is a private function so it doesn't need the
assertion. It's already protected via LocalRepository.txn.

> - _get_commit_editor (and others) have comments that should
> become docstrings.

I typically don't write docstrings for private functions. Should I?

> - _fs and _fs_root can lose the NOTE about being private; the
> leading _ already indicates that.
>
> - _txn_operation.__init__ should use SVN_INVALID_REVNUM instead
> of -1 for copyfrom_rev default argument.

> _txn_operation had me scratching my head for a while, but I
> think I understand now. So, the Txn class is all about
> building up a transaction in client memory and only driving the
> editor when the caller commits? Why?
>
> I'm not saying it's never a good idea, but it seems odd for
> this to be the default mode of operation, much less the only
> mode.

This code is borrowed from mucc.c. We build the transaction up first
before driving the editor so that we can make sure that all of the
editor operations happen in the right order.

If users want to drive the editor manually, they can use the low-level
interface.

> Also, the open method isn't open at all; rename it
> (add_op? queue_op? dunno).

+1 on queue_operation

> - Most of the Txn methods do a check_path first, which is really
> going to hurt performance, but is redundant besides. If I try
> to mkdir an existing path, I'll get an error. SVN_ERR_BAD_URL
> is the wrong code anyway; it should be SVN_ERR_FS_NOT_FOUND.
> Regarding "No such file or directory", I think we normally just
> say "path not found".

We could possibly get rid of the check_path calls in 'delete',
'mkdir', and 'propset'. It looks like we'll still need at least one
check_path call in copy and _upload_file.

> - Txn check_path and copy use peg rev syntax in their doc
> strings, which is not right; I think they should say "PATH in
> REV" instead, to reduce confusion.

Personally, I think that PATH@REV is more precise than PATH in REV. If
we don't specify the peg rev, folks might think we're talking about
PATH@HEAD in REV, since the peg rev defaults to HEAD

> - Txn.copy doesn't need an if for the path kind; both blocks are
> identical anyway.

Really? They look different to me.

> - Txn.commit -- the entire span when the edit is open needs to be
> wrapped in a try/except/abort, and for all exceptions. And
> since an exception is already in progress, the inner except
> should also trap all exceptions:

Should we trap all exceptions? I suppose it's a good idea to abort the
edit regardless of what exception happened, so +1.

> - SvnDate -- What I want in a high-level interface is easy
> integration with standard Python date/time modules, not
> svn_time_to_human_string.

+1, that's a TODO.

On 8/30/07, Eric Gillespie <epg@pretzelnet.org> wrote:
> remoterepos.RemoteRepositoryTestCase segfaults for me:

Thanks for the detailed bug report! Fixed in r26414.

On 8/31/07, Eric Gillespie <epg@pretzelnet.org> wrote:
> I have time to look at this again now. If I add the correct -L
> flag to your gcc command, it works. So, I think we need to merge
> my patch with your change: take the libdir option stuff from
> mine, drop my load_library change, and then your new find_library
> should make things work.

Thanks Eric. I committed this patch to ctypesgen in r38. I committed
your companion patch to the ctypes-python-bindings branch in r26415.

I made a few tweaks to your patch:

  * I updated get_apr_config to add appropriate -L flags for Subversion
    in case the Subversion lib directory is different from the APR lib
    directory, as it is on my system.

  * I removed the code which set the "LIBRARYPATH" variable, as it is
    no longer necessary, and has been replaced by your new, more
    portable, -L directives.

  * I conditionalized your new error checking code on
    os.name == "posix", because os.WIFEXITED are not available on
    Windows.

On 8/31/07, masaru tsuchiyama <m.tmatma@gmail.com> wrote:
> I tested it on Windows XP and got the following error.
> [...]
> raise Exception("Cannot find apr-1-config or apr-config. Please specify\n"
> Exception: Cannot find apr-1-config or apr-config. Please specify
> the full path to your apr-1-config or apr-config script
> using the --apr-config option.
>
> I think autogen.py should have the following options like as gen-make.py.
> --with-apr=DIR
> --with-apr-util=DIR
> --with-httpd=DIR
> [...]

We will definitely need to make a few changes to autogen.py to make it
work on Windows. I think that we will probably need to update more
than just the "--with-apr" option here though.

Do you happen to have a copy of some recent DLLs for Subversion that
work on Windows? I don't have Visual Studio on my machine but I might
be able to get the ctypes bindings working if I had a copy of the DLLs
at least.

> But If autogen.py will be merged to gen-make.gy when merging the
> branch to trunk, it may not be a problem.

That's a good idea. After we merge to trunk, I'm planning to update
gen-make.py to also tell the ctypes python bindings where APR and
APR-util live so that users don't need to configure this twice.

For now, though, I'd like to keep the ctypes python bindings as
self-contained as possible, so that I don't have to constantly merge
changes between trunk and the ctypes python bindings branch.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Sep 1 22:57:36 2007

This message: [ Message body ]
Next message: Hyrum K. Wright: "Re: [PATCH] Localize "Cannot revert.""
Previous message: Karl Fogel: "Re: Let's set a date/goal for the 1.5 Branch"
In reply to: Eric Gillespie: "Re: Merging Ctypes Python Branch"
Next in thread: masaru tsuchiyama: "Re: Merging Ctypes Python Branch"
Reply: masaru tsuchiyama: "Re: Merging Ctypes Python Branch"
Reply: Eric Gillespie: "Re: Merging Ctypes Python Branch"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]