Hey, all. I'm wondering if I can get some extra eyes/brains on a
particular usage of our Python bindings.
The attached tarball contains a directory in which live two files:
- run-test.sh - a shell script to drive the reproduction recipe
- pysvnget - a Python program that uses the bindings and a
generator-based wrapper of the FS's file content access APIs
If you explode the tarball, cd into the resulting directory, and run the
shell script, it should create a test repository and working copy within
that sandbox and start a loop. The loop will...
1. add text (a datestamp) to a single file in the working copy,
2. ensure the file is under version control,
3. commit the file, then
4. try to dump the content of the file from the repository using the
Python program.
The problem that I see when I do this is that after a few iterations of the
loop, the Python program starts to SEGFAULT.
I suspect there's some misinteraction with the APR pool subsystem at work
here -- my Python program is (intentionally) taking advantage of the
bindings' pool self-management logic. If I had to guess, I'd say that the
delayed access to the FS via the generator is causing reads from memory
that once lived in pools that have since been destroyed. Unfortunately, I
don't think I ever really understood how that magic worked in the first
place.
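To make that guess concrete, here is a pure-Python analogy of the lifetime problem I suspect. It does not use the real bindings: Pool, Stream, lazy_chunks_unsafe and lazy_chunks_safe are all names made up for illustration, and the RuntimeError stands in for what would be a read from freed memory in C. Whether this is actually what happens inside the bindings is exactly what I'm unsure about.

```python
class Pool:
    """Hypothetical stand-in for an APR pool: valid until destroyed."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class Stream:
    """Hypothetical stand-in for a stream whose memory lives in a pool."""

    def __init__(self, pool, data):
        self._pool = pool
        self._data = data
        self._pos = 0

    def read(self, size):
        if not self._pool.alive:
            # In C this would be a read from freed memory, not an exception.
            raise RuntimeError("read from a destroyed pool")
        chunk = self._data[self._pos:self._pos + size]
        self._pos += size
        return chunk


def _read_all(stream, chunk_size):
    """Generator: nothing is read until the caller starts iterating."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def lazy_chunks_unsafe(data, chunk_size=4):
    """The pool dies before the generator is ever consumed."""
    pool = Pool()
    chunks = _read_all(Stream(pool, data), chunk_size)
    pool.destroy()  # simulates the pool being cleaned up too early
    return chunks


def lazy_chunks_safe(data, chunk_size=4):
    """The generator itself owns the pool for as long as it is running."""
    pool = Pool()
    try:
        yield from _read_all(Stream(pool, data), chunk_size)
    finally:
        pool.destroy()
```

In the unsafe variant the first next() call fails, because the read is deferred until after the pool is gone. In the safe variant the pool's lifetime is tied to the generator's, so the deferred reads stay valid.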
While this is a simple scenario where "don't do that" might seem an easy
enough response, what is represented by the Python program is
much-distilled logic that is live in production in some of The Company
Formerly Known As CollabNet's products. The generator approach exists to
keep server-side memory use constant while allowing HTTP-based reads of
arbitrarily large versioned files. Moreover, the size and nature of the
codebase is such that I'd really prefer NOT to start manually doing pool
management (though as a last resort, it's not off the table).
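For anyone unfamiliar with why the generator matters, here is a rough sketch of the general pattern, not the production code. It assumes a WSGI-style server and an ordinary file-like object in place of the FS stream; iter_chunks, make_app, open_file and CHUNK_SIZE are hypothetical names.

```python
import io

CHUNK_SIZE = 16384  # hypothetical chunk size, chosen only for illustration


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield the file's content one chunk at a time.

    Only one chunk is held in memory at once, regardless of file size.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def make_app(open_file):
    """Build a WSGI-style app that streams whatever open_file(path) returns."""

    def app(environ, start_response):
        start_response("200 OK",
                       [("Content-Type", "application/octet-stream")])
        # The server pulls chunks lazily, after this function has returned.
        return iter_chunks(open_file(environ.get("PATH_INFO", "")))

    return app


# Usage: an in-memory file stands in for a large versioned file.
demo_app = make_app(lambda path: io.BytesIO(b"x" * 40000))
```

The point is that the response body is consumed after app() has returned, which is the same delayed access that the Python program in the tarball relies on.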
Anything stand out as obviously wrong with my code?
-- Mike
Received on 2020-09-29 15:26:48 CEST