Re: Merging Ctypes Python Branch

From: Eric Gillespie <epg_at_pretzelnet.org>
Date: 2007-09-04 23:33:57 CEST

"David James" <james@cs.toronto.edu> writes:

> Currently, users can type any of the following:
> - wc.update("foo")
> - wc.update(["foo", "bar"])
> - wc.update(("foo", "bar", "baz"))
>
> You think that we should only allow the last two forms? Thinking about
> this carefully, I guess I agree -- it'll probably be easier to explain
> the API if we don't have to explain those special cases.

Yeah, the others are just needless permutations, not only
confusing, but more importantly, hiding bugs. I see that Sage
disagrees, so I'm not sure where you guys will end up taking
this. Hard experience has taught me that magic like this only
seems nice in the short term, and in the long term will bite
someone in the ass. And for what? So you can avoid typing
brackets, or avoid thinking about 1-element lists? Bah.

Finally, I just don't think it's Pythonic: "Explicit is better
than implicit" and "There should be one-- and preferably only one
--obvious way to do it". All the ass-biting magic I've seen over
the years has been in Perl. Well, almost; Python has one
ass-biting misfeature: the str % operator optionally taking
non-iterables: '%s' % foo vs. '%s' % (foo,). Works just fine
until foo is a sequence, and note that strings are sequences...

> self.iterpool is intended to be used for temporary allocations only.
> If you use iterpool in a public method, you should clear the pool
> before returning from the method.

It shouldn't be called "iterpool" unless iteration is involved.

> To solve the problem, we should either allocate return values in their
> own, dedicated pool, or -- even better -- convert the return values
> into true python datastructures which don't require pools at all.

That might work. I'm still skeptical that all but the simplest
scripts can avoid pool management. Not that we shouldn't offer
what freedom from pools we can; there are quick scripts that just
won't care.

> 2. Temporary objects should be allocated in an iterpool. When
> our functions exit, we should clear the iterpool, so as to clear
> out the temporary objects.

As long as we don't end up with some methods clearing the pool
upon return and others not.

Bleh, I just thought of another issue: exceptions. If I'm
passing my own (possibly iter)pool into methods, I can clear it
or take the appropriate action when I get an exception. If I
have no control over pools, then I might just start leaking.
That's why I'm skeptical; I just don't think all the issues are
really going to be obvious up front.

> If you invoke a low-level function pointer, and an error is reported,
> the function will simply return an error code. If you want to convert
> this error code into a Python exception, you'll need to wrap the
> function pointer invocation with SVN_ERR. This is unfortunately a
> limitation of ctypes.

I understand this, or at least I think I do. I don't understand
what it has to do with callbacks, which is what the comment is
talking about.

> To see a list of the low-level function pointers accessible via the
> ctypes python bindings, grep functions.py for CFUNCTYPE. You'll note
> that all of the return values of these functions are marked as
> UNCHECKED because ctypes doesn't support any error checks on function
> pointer invocations.

They all seem to be callback types, which are called by svn, not
by my Python app.

> +1 on fixing both bugs. I don't care whether users return an error or
> raise an error -- whatever you think is best is fine.

For the high-level, I should be raising an exception. I have a
partly working patch to generate separate exception classes (and
a hierarchy) from the SVN_* error codes. Then I'll be able to
raise csvn.errors.Cancel("Caught signal").

> +1 on the concept here. I mainly wrote the RepositoryURI class for the
> benefit of mucc.py. Is there an equivalent to
> svn_path_get_longest_ancestor in urlparse?

posixpath.commonprefix

> > - Are you planning to use CallbackReceiver to turn all functions
> > which call receiver callbacks into generators?
>
> Sure. We could upgrade wc.info to be a generator, so that users can
> iterate over all of the different elements in the WC using a for loop.

I think log, info, and status are the ones.

> > - RemoteRepository.revprop_get should use get_dir or get_file.
>
> Why should we use get_dir or get_file instead of revprop_list?

Heh, I guess I was more tired than I thought when I was writing
some of this. I meant that it should use svn_ra_rev_prop to get
only the request revprop rather than fetching them all.

> > - LocalRepository.__init__
> >
> > A separate create constructor would be better than having a
> > create argument here.
>
> To be clear, you think we should create a static method like
> 'LocalRepository.create' instead, which returns a repos object?
>
> You would invoke it like this:
> repos = LocalRepository.create("some/path")
>
> How would the static method tell __init__ that we want to create a
> repository? The current boolean that's already in the __init__ method
> would suffice for the job -- or do you think we should use some other
> method of communication?

Yeah, it would need a bit of reworking in order to dump the
create argument, and opinions differ whether it's worth it.
Sometimes separate named constructors are provided, .open and
.create, with the documentation saying not to use cls() at all,
but I prefer lazy instantiation, which also makes this easier:

Index: repos.py
===================================================================
--- repos.py (revision 26445)
+++ repos.py (working copy)
@@ -344,7 +344,7 @@
        and implement the allow_access function.
     """

- def __init__(self, path, create=False, user=User()):
+ def __init__(self, path, user=None):
         """Open the repository at PATH. If create is True,
            create a new repository.

@@ -353,14 +353,26 @@
         self.pool = Pool()
         self.iterpool = Pool()
         self._as_parameter_ = POINTER(svn_repos_t)()
- self.user = user
- if create:
- svn_repos_create(byref(self._as_parameter_), path,
- None, None, None, None, self.pool)
+ if self.user is None:
+ self.user = User()
         else:
- svn_repos_open(byref(self._as_parameter_), path, self.pool)
- self.fs = _fs(self)
+ self.user = user

+ @classmethod
+ def create(cls, path, user=None):
+ self = cls(path, user)
+ self._fs = svn_repos_create(byref(self._as_parameter_), self.path,
+ None, None, None, None, self.pool)
+ return self
+
+ _fs = None
+ def _get_fs(self):
+ if self._fs is None:
+ self._fs = svn_repos_open(byref(self._as_parameter_), self.path,
+ self.pool)
+ return self._fs
+ fs = property(_get_fs)
+
     def latest_revnum(self):
         """Get the latest revision in the repository"""
         return self.fs.latest_revnum()

Another common approach is to provide an open constructor as
well, with callers never meant to use cls().

Note the user= default argument fix, too; user=User() means all
instances share the same default object.

> > - inconsistent assert
> >
> > IMO, just drop all the asserts. If someone fails to provide a
> > working User instance, for example, that's their problem, and
> > they'll get appropriate exceptions. It's no different from
> > passing a bool as get_rev_prop's name argument.
> >
> > If we're going to go with the asserts, though, we should use
> > them everywhere. For example, LocalRepository.txn asserts
> > self.user but _get_commit_editor does not.
>
> Personally I find the assertions helpful because they point out that
> it's user error rather than a problem in the bindings. These
> assertions are especially helpful when they're adorned with a comment
> explaining the mistake the user made.

So how is this any different than passing in a bool as
get_rev_prop's name argument? He's a poor programmer who assumes
that an exception in his program is someone else's fault ;->.

> _get_commit_editor is a private function so it doesn't need the
> assertion. It's already protected via LocalRepository.txn.

I see now; I missed it earlier.

> > - _get_commit_editor (and others) have comments that should
> > become docstrings.
>
> I typically don't write docstrings for private functions. Should I?

I'm sure opinions differ. Documentation generators know to
separate public interfaces from private (those with a leading _
in the name). I'm with those who say that doc strings are useful
not only to consumers but also implementors, and doc strings on
the private interfaces can help me as I work on the class itself.
Not to mention help code reviewers ;->.

> This code is borrowed from mucc.c. We build the transaction up first
> before driving the editor so that we can make sure that all of the
> editor operations happen in the right order.

Ahh, so that's the purpose. You want to let callers not have to
worry about ordering. Hm, I'm not sure. We can disregard the
mucc program in thinking about the right thing to do here, as it
can sort the operations itself before driving the editor, as I
always assumed it did.

> If users want to drive the editor manually, they can use the low-level
> interface.

If we go this route, then we need a medium-level interface.
Currently, the choice is between the constraining high-level
interface and the insane, worse-than-swig ctypes interface, with
all the POINTER stuff that I don't yet understand.

> > - Txn check_path and copy use peg rev syntax in their doc
> > strings, which is not right; I think they should say "PATH in
> > REV" instead, to reduce confusion.
>
> Personally, I think that PATH@REV is more precise than PATH in REV. If
> we don't specify the peg rev, folks might think we're talking about
> PATH@HEAD in REV, since the peg rev defaults to HEAD

If you want to be completely precise, say "PATH@REV in REV";
currently, it's saying "PATH@REV in HEAD", which is what I found
confusing.

> > - Txn.copy doesn't need an if for the path kind; both blocks are
> > identical anyway.
>
> Really? They look different to me.

Yeah, I missed the local_path=local_path part. See earlier
comment about being tired ;->.

> > - Txn.commit -- the entire span when the edit is open needs to be
> > wrapped in a try/except/abort, and for all exceptions. And
> > since an exception is already in progress, the inner except
> > should also trap all exceptions:
>
> Should we trap all exceptions? I suppose it's a good idea to abort the
> edit regardless of what exception happened, so +1.

I mean trap, abort, and re-raise.

> On 8/30/07, Eric Gillespie <epg@pretzelnet.org> wrote:
> > remoterepos.RemoteRepositoryTestCase segfaults for me:
>
> Thanks for the detailed bug report! Fixed in r26414.

I haven't looked at the core dump (probably won't this week), but
it still segfaults:

0 ctypes-python-bindings% python test/run_all.py -v
test_local_check_path (localrepos.LocalRepositoryTestCase) ... ok
test_local_get_prop (localrepos.LocalRepositoryTestCase) ... ok
test_local_latest_revnum (localrepos.LocalRepositoryTestCase) ... ok
test_remote_check_path (remoterepos.RemoteRepositoryTestCase) ... ok
test_remote_latest_revnum (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_get (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_list (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_set (remoterepos.RemoteRepositoryTestCase) ... ok
test_svnimport (remoterepos.RemoteRepositoryTestCase) ... zsh: segmentation fault (core dumped) python test/run_all.py -v

> * I conditionalized your new error checking code on
> os.name == "posix", because os.WIFEXITED are not available on
> Windows.

Hm, so it's still ignoring errors and printing "Generated OK" on
Windows. Maybe some Windows person can chime in with the correct
equivalent of the W* macros on Windows. At the very least, you
should print out the status code and exit non-zero, not continue
on to print the "OK" message.

> For now, though, I'd like to keep the ctypes python bindings as
> self-contained as possible, so that I don't have to constantly merge
> changes between trunk and the ctypes python bindings branch.

BTW, it may not sound like it from my posts so far, but it's
*because* I'm so excited by Sage's and your work that I'm looking
into this in such detail :). And I don't feel strongly about
most of these, they're just opinions.

Thanks.

--
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Sep 4 23:30:54 2007

This message: [ Message body ]
Next message: Eric Gillespie: "Re: Let's set a date/goal for the 1.5 Branch"
Previous message: Matt Bodley: "patch SSL certificate warning"
In reply to: David James: "Re: Merging Ctypes Python Branch"
Next in thread: Ben Collins-Sussman: "Re: Merging Ctypes Python Branch"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]