[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Mon, 14 Jan 2019 06:10:43 GMT

In article <CAEZNtZ+SKqiATLwDvzyVf=SULPGhT7usxhZEawNfextPWsbEag_at_mail.gmail.com>
troycurtisjr_at_gmail.com writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <futatuki_at_yf.bsdclub.org> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>> typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>> - It isn't allow Unicode for input, however 's' and 'z' format units
>> PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>> it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>> add code to conversion for py2)
>
>
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode! The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library. The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
>
> PyBytes: Sequence of byte values, e.g. "raw data"
> In Py2: str
> In Py3: bytes
>
> PyStr: Character data
> In Py2: Unicode
> In Py3: str

Unfortunately, PyStr in py3c compatibility layer API is the intersection
of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
use PyUnicode_* for handling Unicode in py2.

 
>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>> but current implementation of conversion function doesn't allows. Isn't
>> it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>> hash value)
>
>
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.

Then I'll fix it. Does it need to separate typemap that allow NULL
as hash value and dones not allow?
 
>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>> hash value)
apr_array_header_t is't hash, so this was '(for values)' :)

 
> And here is the start of my patch review:
>
> Don't forget to put a brief description of the over overall change in
> the log message here. And as this is such large change, a detail
> section is probably also in order as well.
 
Yes, I see.

> > [In subversion/bindings/swig]
> >
> > * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3,
> > allow Str as well for data argment of svn_stream_write()
> >
> > * include/svn_global.swg
> > (remove)(%typemap(in) char *, char const *, char * const,
> > char const * const): Move this typemap into include/svn_strings as
> > typemap (in) IN_STRING
> >
> [snip]
> >
> > diff --git a/subversion/bindings/swig/core.i b/subversion/bindings/swig/core.i
> > index c309d48070..6993ef4c9c 100644
> > --- a/subversion/bindings/swig/core.i
> > +++ b/subversion/bindings/swig/core.i
> > @@ -442,18 +442,31 @@
> > */
> > #ifdef SWIGPYTHON
> > %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> > - char *tmpdata;
> > Py_ssize_t length;
> > - if (!PyBytes_Check($input)) {
> > - PyErr_SetString(PyExc_TypeError,
> > - "expecting a bytes object for the buffer");
> > - SWIG_fail;
> > + if (PyBytes_Check($input)) {
> > + if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) {
> > + SWIG_fail;
> > + }
> > + }
> > +%#if IS_PY3
> > + else if (PyStr_Check($input)) {
> > + $1 = PyStr_AsUTF8AndSize($input, &length);
> > + if (PyErr_Occurred()) {
> > + SWIG_fail;
> > + }
> > }
> > - if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
> > +%#endif
>
> Don't use the IS_PY3 conditional here, and you get Unicode (on Py2)
> conversion automatically.

I'll try to use Unicode_* API here for both of py2 and py3.
 
> > + else {
> > + PyErr_SetString(PyExc_TypeError,
> > +%#if IS_PY3
> > + "expecting a bytes or str object for the buffer"
> > +%#else
> > + "expecting a string for the buffer"
> > +%#endif
>
> I think it is valid to say "bytes or string" object for both Py2 and
> Py3, thus getting rid of another conditional.

I see.
 
> > + );
> > SWIG_fail;
> > }
> > temp = ($*2_type)length;
> > - $1 = tmpdata;
> > $2 = ($2_ltype)&temp;
> > }
> > #endif
>
> [snip]
>
> > diff --git a/subversion/bindings/swig/include/svn_string.swg b/subversion/bindings/swig/include/svn_string.swg
> > index 0fc64ebdcc..8be4c3d746 100644
> > --- a/subversion/bindings/swig/include/svn_string.swg
> > +++ b/subversion/bindings/swig/include/svn_string.swg
> > @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t;
> > }
> > #endif
> >
> > + /* -----------------------------------------------------------------------
> > + Type: char * (input)
> > +*/
> > +#ifdef SWIGPYTHON
> > +%typemap (in) IN_STRING
> > +{
> > + $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", "$1_name");
> > + if (PyErr_Occurred()) SWIG_fail;
> > +}
> > +
> > +%typemap (freearg) IN_STRING "";
>
> What does freearg here do? Looking at the docs, it seems this supposed
> to be used to free allocated resources once the wrapper function
> exits.

This is the replacement of %typemap(in, parse='s')
and %typemap(in, parse='z') which uses internal char * buffer of
Python object. This typemap also uses py object internal buffer,
so it doesn't need typemap for freearg. As far as I read the code
SWIG produed for %typemap(in, parse=...), it seems undefine
typemap(freearg) corresponding it. If `%typemap (freearg) IN_STRING "";'
doesn't exist here, pre-defined typemap for char * in swig library
becomes effective, then extra codes are produced.

 
> > +
> > +%apply IN_STRING {
> > + const char *,
> > + char *,
> > + char const *,
> > + char * const,
> > + char const * const
> > +};
> > +#endif
> > /* -----------------------------------------------------------------------
> > define a way to return a 'const char *'
> > */
> > diff --git a/subversion/bindings/swig/include/svn_types.swg b/subversion/bindings/swig/include/svn_types.swg
> > index 319f7daa6b..7c933b1ac7 100644
> > --- a/subversion/bindings/swig/include/svn_types.swg
> > +++ b/subversion/bindings/swig/include/svn_types.swg
> > @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
> > #ifdef SWIGPYTHON
> > %typemap(in) const char *MAY_BE_NULL
> > {
> > - if ($input == Py_None) {
> > - $1 = NULL;
> > - } else {
> > - $1 = PyBytes_AsString($input);
> > - if ($1 == NULL) SWIG_fail;
> > - }
> > + $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", "$1_name");
> > + if (PyErr_Occurred()) SWIG_fail;
> > }
> > #endif
> >
> > diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > index 8a4ec631be..58cfec30a3 100644
> > --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t *error_chain)
> >
> > /*** Helper/Conversion Routines ***/
> >
> > +/* Function to get char * representation of bytes/str object */
> > +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null,
> > + const char * funcsym, const char * argsym)
>
> I think we need to be very careful with the return value here, both
> the type and the lifetime. The trouble with using the value returned
> directly from the C api below is that they typically return a pointer
> to an internal buffer which must not be modified. Returning it as
> "char *" is probably not the right idea.

I see.

> Additionally, these are
> internal to the "input" object, so if something like:
>
> PyObject *foo = <create Py String>;
> char *my val = svn_swig_py_string_to_cstring(foo, ...);
> Py_DECREF(foo);
>
> svn_awesome_api(my_val, ...); /* This could explode */
>
> Of course, we do need to handle the case of an input "char *"
> argument, so I think the best option is probably to use
> "apr_pstrdup()" to allocate a new string from the application pool.

I don't suppose that this function is called other than
"%typemap(in) IN_STRING" and "%typemap(in) char *MAY_BE_NULL",
so I think the comment for this function is insufficient about it.
(APIs which need to use apr_pstrdup() are already separated
like "%typemap(in) const void *value" in core.i, I think.)
 
> > +{
> > + char *retval = NULL;
> > + if (PyBytes_Check(input)) {
> > + retval = PyBytes_AsString(input);
> > + }
> > +#if IS_PY3
> > + else if (PyStr_Check(input)) {
> > + retval = (char *)PyStr_AsUTF8(input);
> > + }
> > +#endif
>
> Remove this IF_PY3 conditional as well.
>
> > + else if (input != Py_None || ! maybe_null) {
> > + PyErr_Format(PyExc_TypeError,
> > +#if IS_PY3
> > + "%s() argument %s must be bytes or str%s, not %s",
> > +#else
> > + "%s() argument %s must be string%s, not %s",
> > +#endif
> > + funcsym, argsym, maybe_null?" or None":"",
> > + Py_TYPE(input)->tp_name);
> > + }
> > + return retval;
> > +}
> > +
>
> [ As far as my review got for now]
>
> I'll continue reviewing over the next few days, that is all the time I
> have for tonight!
>
> Troy

Thank you for review. I'll start to fix pointed out.

-- 
Yasuhito FUTATSUKI
Received on 2019-01-14 07:12:28 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.