Re: Another 1.4 release critical bug

From: Garrett Rooney <rooneg_at_electricjellyfish.net>
Date: 2006-08-01 23:07:25 CEST

On 7/31/06, Philip Martin <philip@codematters.co.uk> wrote:
> "Garrett Rooney" <rooneg@electricjellyfish.net> writes:
>
> > I'm now able to run basic_tests.py over svnserve with a bdb 4.4 repos
> > with apr built with pool debugging and under valgrind with no errors,
> > so I think we're on the right track ;-)
>
> Unfortunately it doesn't work with --enable-dso which is how some
> (most?) distributions build Subversion. I think the problem is that
> the callbacks happen after the DSO has been unloaded and so the
> callback code is no longer part of the executable. Using trunk:
>
> $ valgrind -q svnadmin create --fs-type bdb repo
> ==16619== Jump to the invalid address stated on the next line
> ==16619== at 0x44CF79D: ???
> ==16619== by 0x41DC43B: pool_clear_debug (apr_pools.c:1372)
> ==16619== by 0x41DC624: pool_destroy_debug (apr_pools.c:1457)
> ==16619== by 0x41DC423: pool_clear_debug (apr_pools.c:1369)
> ==16619== by 0x41DC624: pool_destroy_debug (apr_pools.c:1457)
> ==16619== by 0x41DC6EC: apr_pool_destroy_debug (apr_pools.c:1499)
> ==16619== by 0x41DC241: apr_pool_terminate (apr_pools.c:1269)
> ==16619== by 0x41DD6B7: apr_terminate (start.c:82)
> ==16619== by 0x42ACAE1: exit (exit.c:54)
> ==16619== by 0x4296E3D: (below main) (libc-start.c:245)
> ==16619== Address 0x44CF79D is not stack'd, malloc'd or (recently) free'd
> ==16619==
> ==16619== Process terminating with default action of signal 11 (SIGSEGV)
> ==16619== Access not within mapped region at address 0x44CF79D
> ==16619== at 0x44CF79D: ???
> ==16619== by 0x41DC43B: pool_clear_debug (apr_pools.c:1372)
> ==16619== by 0x41DC624: pool_destroy_debug (apr_pools.c:1457)
> ==16619== by 0x41DC423: pool_clear_debug (apr_pools.c:1369)
> ==16619== by 0x41DC624: pool_destroy_debug (apr_pools.c:1457)
> ==16619== by 0x41DC6EC: apr_pool_destroy_debug (apr_pools.c:1499)
> ==16619== by 0x41DC241: apr_pool_terminate (apr_pools.c:1269)
> ==16619== by 0x41DD6B7: apr_terminate (start.c:82)
> ==16619== by 0x42ACAE1: exit (exit.c:54)
> ==16619== by 0x4296E3D: (below main) (libc-start.c:245)
> Segmentation fault
>
> $ valgrind -q svn ls file://`pwd`/repo
> ==16621== Jump to the invalid address stated on the next line
> ==16621== at 0x47BC79D: ???
> ==16621== by 0x422943B: pool_clear_debug (apr_pools.c:1372)
> ==16621== by 0x4229624: pool_destroy_debug (apr_pools.c:1457)
> ==16621== by 0x4229423: pool_clear_debug (apr_pools.c:1369)
> ==16621== by 0x4229624: pool_destroy_debug (apr_pools.c:1457)
> ==16621== by 0x42296EC: apr_pool_destroy_debug (apr_pools.c:1499)
> ==16621== by 0x4229241: apr_pool_terminate (apr_pools.c:1269)
> ==16621== by 0x422A6B7: apr_terminate (start.c:82)
> ==16621== by 0x4314AE1: exit (exit.c:54)
> ==16621== by 0x42FEE3D: (below main) (libc-start.c:245)
> ==16621== Address 0x47BC79D is not stack'd, malloc'd or (recently) free'd
> ==16621==
> ==16621== Process terminating with default action of signal 11 (SIGSEGV)
> ==16621== Access not within mapped region at address 0x47BC79D
> ==16621== at 0x47BC79D: ???
> ==16621== by 0x422943B: pool_clear_debug (apr_pools.c:1372)
> ==16621== by 0x4229624: pool_destroy_debug (apr_pools.c:1457)
> ==16621== by 0x4229423: pool_clear_debug (apr_pools.c:1369)
> ==16621== by 0x4229624: pool_destroy_debug (apr_pools.c:1457)
> ==16621== by 0x42296EC: apr_pool_destroy_debug (apr_pools.c:1499)
> ==16621== by 0x4229241: apr_pool_terminate (apr_pools.c:1269)
> ==16621== by 0x422A6B7: apr_terminate (start.c:82)
> ==16621== by 0x4314AE1: exit (exit.c:54)
> ==16621== by 0x42FEE3D: (below main) (libc-start.c:245)
> Segmentation fault

Ok, so here's my first cut at a patch. The fs loader stuff seems
pretty reasonable, but for some totally unclear reason it is not
enough to make the problem go away. I also have to make the ra loader
code allocate DSOs in a global pool; if I don't, I continue to get
segfaults during global pool destruction. Why? I have no clue. Did
this stuff ever work?

Note that this is totally not committable at this point, but I'd LOVE
it if someone else could look at it and tell me what they think...

-garrett

[[[
Make a valiant attempt at curing the insanity of the dso loading code.

* subversion/libsvn_fs/fs-loader.c
  (dso_cache): New global hash of dso objects.
  (load_module): Add locking, use the global common pool to hold the
   dso, and use the new cache.

* subversion/libsvn_ra/ra_loader.c
  (load_ra_module): Use a global pool to hold the dso for the ra layer
   since for some reason we crash during apr_terminate otherwise.
]]]
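
For anyone following along, here is a toy model of the failure mode
Philip suggests and of the global-pool approach in the log message.
It is plain C, not APR: toy_pool_t, toy_dso_t, toy_cleanup_register
and run_scenario are made-up names for illustration only. It assumes
just two things: a pool runs its cleanups in reverse order of
registration, and unloading a DSO is itself a pool cleanup. It does
not attempt to explain why the ra loader needs the same treatment.

```c
#include <stdlib.h>

/* Toy stand-in for a pool: a LIFO list of cleanups. NOT the APR API. */
typedef void (*cleanup_fn)(void *baton);

typedef struct cleanup_t {
    cleanup_fn fn;
    void *baton;
    struct cleanup_t *next;
} cleanup_t;

typedef struct toy_pool_t {
    cleanup_t *cleanups;
} toy_pool_t;

/* Push a cleanup; the most recently registered one runs first. */
static void toy_cleanup_register(toy_pool_t *pool, cleanup_fn fn, void *baton)
{
    cleanup_t *c = (cleanup_t *)malloc(sizeof(*c));
    if (c == NULL)
        abort();
    c->fn = fn;
    c->baton = baton;
    c->next = pool->cleanups;
    pool->cleanups = c;
}

/* Run and free every cleanup registered on the pool, newest first. */
static void toy_pool_destroy(toy_pool_t *pool)
{
    while (pool->cleanups != NULL) {
        cleanup_t *c = pool->cleanups;
        pool->cleanups = c->next;
        c->fn(c->baton);
        free(c);
    }
}

/* Toy stand-in for a loaded DSO. 'mapped' models whether the module's
   code is still in the address space. */
typedef struct toy_dso_t {
    int mapped;
    int bad_calls;
} toy_dso_t;

/* Models the cleanup that unloads the DSO. */
static void unload_dso(void *baton)
{
    ((toy_dso_t *)baton)->mapped = 0;
}

/* Models a cleanup whose code lives inside the DSO. In a real process,
   running it after the unload is a jump to an unmapped address. */
static void module_cleanup(void *baton)
{
    toy_dso_t *dso = (toy_dso_t *)baton;
    if (!dso->mapped)
        dso->bad_calls++;
}

/* Returns how many module cleanups ran after the DSO was unloaded. */
static int run_scenario(int dso_in_global_pool)
{
    toy_pool_t global = { NULL };
    toy_pool_t scratch = { NULL };
    toy_dso_t dso = { 1, 0 };

    /* The unload cleanup goes on whichever pool holds the DSO. */
    toy_cleanup_register(dso_in_global_pool ? &global : &scratch,
                         unload_dso, &dso);
    /* Later, the module registers its own cleanup on the global pool. */
    toy_cleanup_register(&global, module_cleanup, &dso);

    toy_pool_destroy(&scratch);   /* the short-lived pool dies first */
    toy_pool_destroy(&global);    /* the global pool dies at exit */
    return dso.bad_calls;
}
```

In this model, run_scenario(0) (DSO held in the short-lived pool)
reports one callback run after the unload, while run_scenario(1) (DSO
held in the global pool) reports none, because the unload was
registered first and therefore runs last.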

Received on Tue Aug 1 23:08:10 2006

This is an archived mail posted to the Subversion Dev mailing list.