So, I've been banging my head against the brick wall that is issue
2564 for much of yesterday and today, and I'm starting to run out of
ideas. This is my attempt to throw what I've figured out towards the
crowd on dev@subversion.t.o and see if anyone else sees something I'm
missing.
The problem manifests itself as commits that seem to succeed but are
actually rolled back by BDB, because it thinks that the process
exited without cleanly detaching from the BDB environment.
This only happens when svnserve is in forking mode; if you switch to
threaded mode, it works fine.
The problem goes away if you explicitly destroy the connection pool
after running the 'serve()' function.
Ok, so you might ask, why not just add that explicit destroy and move
on with life? Well, it's not quite so simple. You see, we shouldn't
actually have to do that. When you create a new top-level pool (like
connection_pool) it gets registered as a child of the global pool,
which is destroyed during apr_terminate(), so the connection pool
should be destroyed along with it.
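
To make that expectation concrete, here is a toy model of a pool tree in plain C. It is only a sketch: toy_pool, toy_pool_create, toy_pool_cleanup_register, toy_pool_destroy and toy_demo are made-up names for illustration, not APR's real data structures or API. The one thing it shows is that destroying the root should reach every registered child and run its cleanup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a pool. This is NOT APR's real structure. */
typedef struct toy_pool {
    const char *tag;            /* debugging label, like apr_pool_tag() sets */
    struct toy_pool *parent;
    struct toy_pool *child;     /* first child */
    struct toy_pool *sibling;   /* next child of the same parent */
    void (*cleanup)(void *);    /* at most one cleanup, to keep this short */
    void *cleanup_data;
} toy_pool;

/* Create a pool and link it into its parent's child list. */
toy_pool *toy_pool_create(toy_pool *parent, const char *tag)
{
    toy_pool *p = calloc(1, sizeof(*p));
    assert(p != NULL);
    p->tag = tag;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->child;
        parent->child = p;
    }
    return p;
}

void toy_pool_cleanup_register(toy_pool *p, void (*fn)(void *), void *data)
{
    p->cleanup = fn;
    p->cleanup_data = data;
}

/* Destroy children first, then run this pool's cleanup, then unlink it. */
void toy_pool_destroy(toy_pool *p)
{
    while (p->child != NULL)
        toy_pool_destroy(p->child);   /* each call unlinks that child */

    if (p->cleanup != NULL)
        p->cleanup(p->cleanup_data);

    if (p->parent != NULL) {
        toy_pool **link = &p->parent->child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    fprintf(stderr, "destroyed pool '%s'\n", p->tag);
    free(p);
}

static void count_cleanup(void *data)
{
    ++*(int *)data;
}

/* Returns how many cleanups ran when only the root was destroyed. */
int toy_demo(void)
{
    int cleanups_run = 0;
    toy_pool *global = toy_pool_create(NULL, "global");
    toy_pool *conn = toy_pool_create(global, "connection_pool");

    toy_pool_cleanup_register(conn, count_cleanup, &cleanups_run);
    toy_pool_destroy(global);   /* plays the role of apr_terminate() */
    return cleanups_run;
}
```

In this model toy_demo() returns 1: the connection pool's cleanup runs even though nobody destroyed that pool explicitly.
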
You might think "ahh, it's just some weird behavior from APR's fork
code, it probably magically unregisters cleanups behind your back",
but as far as I can tell, it doesn't. apr_proc_fork() is about the
simplest wrapper around the underlying fork() call that you can think
of, and there are no magical atfork callbacks registered anywhere that
I can find.
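
A standalone sanity check of that claim, using plain POSIX rather than APR, might look like the sketch below. The names exit_handler and child_ran_exit_handler are hypothetical, and the handler merely stands in for apr_terminate() on the assumption that it is run as an exit handler. The child reports through a pipe whether its handler ran.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe; only set in the forked child. */
static int report_fd = -1;

/* Stand-in for apr_terminate(): writes one byte so the parent can see it ran. */
static void exit_handler(void)
{
    if (report_fd >= 0) {
        char c = 'x';
        if (write(report_fd, &c, 1) != 1) {
            /* nothing useful to do this late in the process */
        }
    }
}

/*
 * Fork a child and report whether its exit handler ran.
 * Returns 1 if it ran, 0 if it did not, -1 on error.
 * If use_underscore_exit is nonzero the child leaves via _exit(),
 * which skips atexit() handlers; otherwise it leaves via exit().
 */
int child_ran_exit_handler(int use_underscore_exit)
{
    static int registered = 0;
    int fds[2];
    pid_t pid;
    char c;
    ssize_t n;

    if (!registered) {
        if (atexit(exit_handler) != 0)
            return -1;
        registered = 1;
    }
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);               /* avoid duplicated stdio output in the child */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the handler registered before fork() is still registered. */
        close(fds[0]);
        report_fd = fds[1];
        if (use_underscore_exit)
            _exit(0);
        exit(0);
    }

    /* Parent: one byte means the handler ran, EOF means it did not. */
    close(fds[1]);
    n = read(fds[0], &c, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n < 0)
        return -1;
    return n == 1 ? 1 : 0;
}
```

Here child_ran_exit_handler(0) returns 1 and child_ran_exit_handler(1) returns 0. In other words, fork() by itself leaves exit handlers in place, and only leaving through _exit() skips them.
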
Now, I've hacked the pool creation code, so I can see that it's
getting linked into the global pool, and I've hacked the terminate
code, so I can see that apr_terminate is getting called in the child
process when it exits, but for some reason it never gets around to our
pool (I added an apr_pool_tag when we create the connection pool, and
I look for that tag in apr_pool_destroy, and we never hit it).
Well, it's worse than that actually. It's not that we never hit it,
it's that we /almost/ never hit it. If I run basic_tests.py, then
my log file shows that the pool actually gets cleaned up once, but that's
out of maybe 20 or so child processes that were spawned off, which is
not especially comforting.
So at this point, I'm worried. Either there's something wrong in the
APR pool cleanup code, which seems unlikely, or something is
corrupting the global pool's list of children, which is quite
disturbing.
Any thoughts?
-garrett
Received on Wed Jun 7 16:28:57 2006