Sorry for the self-reply, but I thought I'd note that if I tweak
pysvnget thusly, the SEGFAULTs stop:
--- pysvnget 2020-09-29 09:34:07.918002584 -0400
+++ pysvnget.pools 2020-09-29 09:33:54.278153037 -0400
@@ -21,17 +21,17 @@
             yield chunk
         svn.core.svn_stream_close(self.stream)
 
-def get_generator(repos_path, peg_revision, path):
-    fs = svn.repos.fs(svn.repos.open(repos_path))
+def get_generator(repos_path, peg_revision, path, pool):
+    fs = svn.repos.fs(svn.repos.open(repos_path, pool))
     peg_revision = peg_revision or svn.fs.youngest_rev(fs)
     fsroot = svn.fs.revision_root(fs, peg_revision)
     return SvnContentProxy(fsroot, path).get_generator()
 #
 # --------------------------------------------------------------------------
-
+pool = svn.core.svn_pool_create()
 if len(sys.argv) < 3:
     sys.stderr.write("Usage: REPOS-PATH PATH-IN-REPOS [PEGREV]\n")
     sys.exit(1)
 peg_revision = len(sys.argv) > 3 and int(sys.argv[3]) or None
-generator = get_generator(sys.argv[1], peg_revision, sys.argv[2])
+generator = get_generator(sys.argv[1], peg_revision, sys.argv[2], pool)
 print(b''.join(generator).decode('utf-8'))
On Tue, Sep 29, 2020 at 9:26 AM C. Michael Pilato <cmpilato_at_red-bean.com>
wrote:
> Hey, all. I'm wondering if I can get some extra eyes/brains on a
> particular usage of our Python bindings.
>
> The attached tarball contains a directory in which live two files:
>
> - run-test.sh - a shell script to drive the reproduction recipe
> - pysvnget - a Python program that uses the bindings and a
> generator-based wrapper of the FS's file content access APIs
>
> If you explode the tarball, cd into the resulting directory, and run the
> shell script, it should create a test repository and working copy within
> that sandbox and start a loop. The loop will...
>
> 1. add text (a datestamp) to a single file in the working copy,
> 2. ensure the file is under version control,
> 3. commit the file, then
> 4. try to dump the content of the file from the repository using the
> Python program.
>
> The problem that I see when I do this is that after a few iterations of
> the loop, the Python program starts to SEGFAULT.
>
> I suspect there's some misinteraction with the APR pool subsystem at work
> here -- my Python program is (intentionally) taking advantage of the
> bindings' pool self-management logic. If I had to guess, I'd say that the
> delayed access to the FS via the generator is causing reads from memory
> that once lived in pools that have since been destroyed. Unfortunately, I
> don't think I ever really understood how that magic worked in the first
> place.
>
> While this is a simple scenario where "don't do that" might seem an easy
> enough response, what is represented by the Python program is
> much-distilled logic that is live in production in some of The Company
> Formerly Known As CollabNet's products. The generator approach exists to
> keep server-side memory use constant while allowing HTTP-based reads of
> arbitrarily large versioned files. Moreover, the size and nature of the
> codebase are such that I'd really prefer NOT to start managing pools
> manually (though as a last resort, it's not off the table).
>
> Anything stand out as obviously wrong with my code?
>
> -- Mike
>
Received on 2020-09-29 15:36:16 CEST