[PROPOSAL] Safe Memory Management for the Python/SWIG bindings

From: David James <james82_at_gmail.com>
Date: 2005-07-20 04:07:13 CEST

Currently, if you make a memory management mistake in your Python (or
Perl or Java or Ruby) code that uses the Subversion SWIG bindings,
your program will crash with a segfault. Segfaults are difficult to
debug and can also be a source of security holes.

Here's a two-pronged proposal that aims to stop SWIG segfaults:

1. Track reference counts to each pool.
- When a variable is allocated from a pool, increment the pool reference count.
- When a variable is destroyed, decrement the pool reference count.
2. Validate pool memory (as a runtime configurable debugging option,
enabled by default for safety purposes)
- Before allocating memory (inside the SWIG/C bindings)
- Before accessing memory (using a __getattribute__ handler)
- Before writing to memory (using a __setattr__ handler)

As of r15370, we already verify that pools are correct before
allocating memory from them (in the python-bindings-improvements
branch). So we're already part-way there. I've described the
implementation for the rest of the safety features below.

IMPLEMENTATION

Here's a quick and dirty prototype for a "CheckedObject" class which
maintains a reference to a pool and validates that the pool is correct
before allowing it to be accessed. This object accomplishes tasks 1
and 2. When variables inside the CheckedObject class are accessed or
modified, we check the validity of the CObject before allowing access
to the underlying variable. We could modify the SWIG proxy wrappers
for CObjects to contain these functions.

class CheckedObject(object):

  def __getattribute__(self, name):
    if name != "assert_valid":
      self.assert_valid()
    return object.__getattribute__(self, name)

  def __setattr__(self, name, value):
    self.assert_valid()
    return object.__setattr__(self, name, value)

def assert_valid(self):
object.__getattribute__(self,"pool").assert_valid()

We could write a script to modify the generated Python versions of the
SWIG wrappers to contain the above functions. Furthermore, whenever
swig objects are created, we would call object.__setattr__(obj,
"pool", pool), to add a reference to the pool that created the SWIG
object.

In order to detect when pools are cleared in pure C code, we will need
to add an APR cleanup function for each pool which is managed by C
code. This cleanup function would notify the Python bindings that the
C code has deleted the pool and would therefore mark the Python
version of the pool as invalid. If we did this, then users who attempt
to access a python variable after its memory has been deallocated,
would get a Pythonic error message instead of a segfault. (It would
solve issue #2172)

Whenever a checked object is passed into a SWIG/C function, the SWIG/C
function would call the "assert_valid" function to verify that the
object is valid.

CHALLENGES

In order for the above implementation to work, we will need to have a
Python proxy object for every C object that is accessible from Python.
Currently, SWIG only creates Python proxy objects if we provide it
with the details of the internals of the C struct.

Currently, we do not provide SWIG with all the details of how the
internals of Subversion are implemented, so SWIG treats some
Subversion datastructures as opaque pointers. We can fix this problem
by using the %include directive -- just %include the internal .h or .c
files which describe the structure. If we do this, then SWIG will
generate real Python objects for our "opaque pointers" -- not just C
Objects.

Subversion developers: What do you think of this idea?

Cheers,

David

-- 
David James -- http://www.cs.toronto.edu/~james
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Wed Jul 20 04:08:17 2005

This message: [ Message body ]
Next message: Brian Holmes: "Re: Re: libsvn_repos logging -- rethunk"
Previous message: Brian Holmes: "Re: libsvn_repos logging -- rethunk"
Next in thread: David James: "Re: [PROPOSAL] Safe Memory Management for the Python/SWIG bindings"
Reply: David James: "Re: [PROPOSAL] Safe Memory Management for the Python/SWIG bindings"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]