[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [swig-py3][patch] interfacing bytes object instead of str

From: Troy Curtis <troy_at_troycurtisjr.com>
Date: Wed, 26 Dec 2018 23:01:43 -0500

On 12/26/18 11:51 AM, futatuki_at_poem.co.jp wrote:
> On 12/25/18 12:20, Troy Curtis Jr wrote:
>> On Mon, Dec 24, 2018 at 1:11 PM Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>
>> wrote:
>
>> Awesome! I'm really glad you are helping work on this, helps to keep me
>> motivated as well ;)
>
> I'm glad if I could some help, too :)

I've committed your patch as is at r1849784, so that any of these
enhancements we've discussed can be smaller, more focused patches. I
did tweak some formatting for the log message to better match the
typical guidelines. Mostly spacing and converting to complete sentences.
I also converted the symbol references in python code from
<Class::method> to more the more Pythonic <Class.method> format for
better grepping.

The only real functionality changes I'm concerned with on the changes
are dropping str() conversion on a couple of the API inputs since
previously providing any object with a suitable __str__ method would
work, but would now be broken. But a simple way of supporting Py2 and
Py3 isn't a "gimme", so didn't want to block this patch on that.

>
>> So I wonder, as a guiding principle if we could support either Str or
>> Bytes
>> as an input? That would be akin to most
>> of the standard module interfaces.  I think that would be fairly easy to
>> support, and would help ensure that the
>> behavior is much closer to the typical expectation. it would also reduce
>> the number of b"" that are necessary :P Of
>> course, in many cases for the standard library interfaces, the input type
>> is used as a hint to indicate what type to
>> return, which is much more difficult/impossible to fully support in these
>> bindings.  That seems like it would provide
>> a way to make using the bindings more familiar, though we'd still have to
>> have the policy of just always returning Bytes
>> objects as output. That input/output disconnect might undo any ease of
>> use
>> advantage than just always using Bytes
>> ...I'm not sure exactly what my opinion on that is...
>
> It is just I wanted to confirm, how deal with inputs :)
>
> I also want that both of Bytes and Str is acceptable input values,
> especially for paths on file system (not Subversion's internal paths).
> So, I'll tweak typemaps for it.
>
>> The patch inline for comments:
>>
>> diff --git a/build/ac-macros/swig.m4 b/build/ac-macros/swig.m4
>>> index d865689de5..c797ae69d8 100644
>>> --- a/build/ac-macros/swig.m4
>>> +++ b/build/ac-macros/swig.m4
>>> @@ -154,7 +154,7 @@ AC_DEFUN(SVN_FIND_SWIG,
>>>             ])
>>>
>>>             if test "$ac_cv_python_is_py3" = "yes"; then
>>> -             SWIG_PY_OPTS="-python -py3"
>>> +             SWIG_PY_OPTS="-python -py3 -DPY3"
>>>             else
>>>                SWIG_PY_OPTS="-python -classic"
>>>             fi
>>>
>>
>> We shouldn't define a new preprocessor value here, but rather make use of
>> the existing IS_PY3 value defined by py3c.
>
> It was really needed because IS_PY3 macro symbol is not recognized to
> SWIG, because it is C preprocessers macro, not SWIG macro. I've looked
> for distinguish py2 and py3 on SWIG context, but I couldn't find suitable
> macro. (however this is no more needed)
>
>> diff --git a/subversion/bindings/swig/core.i
>>> b/subversion/bindings/swig/core.i
>>> index e5a234c5e2..c309d48070 100644
>>> --- a/subversion/bindings/swig/core.i
>>> +++ b/subversion/bindings/swig/core.i
>>>
>>
>> [snip]
>>
>> @@ -442,16 +442,18 @@
>>>   */
>>>   #ifdef SWIGPYTHON
>>>   %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
>>> -    if (!PyStr_Check($input)) {
>>> +    char *tmpdata;
>>> +    Py_ssize_t length;
>>> +    if (!PyBytes_Check($input)) {
>>>           PyErr_SetString(PyExc_TypeError,
>>> -                        "expecting a string for the buffer");
>>> +                        "expecting a bytes object for the buffer");
>>>           SWIG_fail;
>>>       }
>>> -    /* Explicitly cast away const from PyStr_AsUTF8AndSize (in
>>> Python 3).
>>> The
>>> -       swig generated argument ($1) here will be "char *", but since
>>> its
>>> only
>>> -       use is to pass directly into the "const char *" argument of the
>>> wrapped
>>> -       function, this is safe to do. */
>>> -    $1 = (char *)PyStr_AsUTF8AndSize($input, &temp);
>>>
>>
>>
>>> +    if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
>>>
>>
>> Why not just use "&$1" here instead of introducing a new temp variable?
>
> It's to avoid C compiler warning to discard 'const' quarifier, but this
> can be also safely accomplished by cast.
>
>> +        SWIG_fail;
>>> +    }
>>> +    temp = ($*2_type)length;
>>> +    $1 = tmpdata;
>>>       $2 = ($2_ltype)&temp;
>>>   }
>>>   #endif
>>>
>> @@ -504,8 +506,8 @@
>>>          SWIG_fail;
>>>       }
>>>
>>> -    if (PyStr_Check($input)) {
>>> -        const char *value = PyStr_AsString($input);
>>> +    if (PyBytes_Check($input)) {
>>> +        const char *value = PyBytes_AsString($input);
>>>           $1 = apr_pstrdup(_global_pool, value);
>>>       }
>>>       else if (PyLong_Check($input)) {
>>>
>>
>> Since this function does a whole list of checks and conversions,
>> perhaps it
>> makes sense to support Str here in
>> addition to bytes? Of course in Py2 the "PyStr_Check()" would effectively
>> be performed twice if we did that, but
>> I don't expect that to be a tremendous impact, and with the "cost" of an
>> addition "IS_PY3" preprocessor check,
>> that could be eliminated from the Py2 build.
>
> I Agreed.
>
>>> diff --git a/subversion/bindings/swig/include/svn_global.swg
>>> b/subversion/bindings/swig/include/svn_global.swg
>>> index 191bbdd531..f7364be28f 100644
>>> --- a/subversion/bindings/swig/include/svn_global.swg
>>> +++ b/subversion/bindings/swig/include/svn_global.swg
>>> @@ -31,6 +31,12 @@
>>>   #define SVN_DEPRECATED
>>>   #endif
>>>
>>> +#ifdef SWIGPYTHON
>>> +%begin %{
>>> +#define SWIG_PYTHON_STRICT_BYTE_CHAR
>>> +%}
>>> +#endif
>>> +
>>>   %include typemaps.i
>>>   %include constraints.i
>>>   %include exception.i
>>> @@ -136,9 +142,15 @@ static PyObject * _global_py_pool = NULL;
>>>   /* Python format specifiers. Use Python instead of SWIG to parse
>>>      these basic types, because Python reports better error messages
>>>      (with correct argument numbers). */
>>> +#if defined(PY3)
>>>
>>
>> This is where we'd want to use py3c's "IS_PY3" value.
>
> including header file py3c/compath.h is done by indirectly as
>
> #ifdef SWIGPYTHON
> %{
> #include "swigutil_py.h"
> #include "swigutil_py3c.h"
> %}
> #endif
>
> however, it is enclosed by %{ %} so it is not defined as SWIG macro
> symbol. and following
>
>>> +%typemap (in, parse="y")
>>> +  char *, char const *, char * const, char const * const "";
>>> +#else
>>>   %typemap (in, parse="s")
>>>     char *, char const *, char * const, char const * const "";
>>> +#endif
>
> is a SWIG context switch. this is why -DPY3 was needed.
> However, if we want to allow both of Bytes and Str for input, `parse'
> directive cannot be used here and must write code here to type check
> and to convert ourself, it isn't needed.

Ah, I see your point!

>
> Rest of all your suggestions convinced me, so I'll start to modify. >
> Cheers,

Thanks!
Troy
Received on 2018-12-27 05:01:58 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.