Peter Lundblad found a thread-safety problem in the patch below.
(<embarrassment> We did think about thread-safety issues for previous
versions of the patch, versions that used a boolean instead of a
struct pointer. But when we switched to the struct, those previous
thoughts became irrelevant. Call it the mental equivalent of a cache
invalidation problem.)
Here's the thread issue:
In the block that starts 'if (APR_STATUS_IS_ENOTIMPL (apr_err))',
assume we assign enotimpl_handle_node = apr_palloc(), and then the
thread switches out. Now another thread switches in and reaches the
earlier check, 'if (enotimpl_handle_node)'. The new thread sees that
enotimpl_handle_node is not NULL, and proceeds into the conditional
body. But the first thread never finished initializing
enotimpl_handle_node->handle, enotimpl_handle_node->next, etc.
Badness results!
Peter's coming up with a patch that uses the non-global caching system
that was already present in the code anyway. Even without the
thread-safety issue, this would be a better solution.
-Karl
kfogel@collab.net writes:
> Background:
>
> An attempt to load a 90GB dumpfile (of the ASF repository) into FSFS
> resulted in intolerable linear memory growth. The problem turned out
> to be in the translation code. Because APR was compiled without
> APR_HAS_XLATE, apr_xlate_open() kept returning APR_ENOTIMPL, which
> meant that there was no resulting xlate_handle to cache, which meant
> that instead of getting it out of the cache the next time, we'd try to
> generate it again... and again... and again... each attempt using up
> some more memory. This is in libsvn_subr/utf.c:get_xlate_handle_node().
>
> Justin Erenkrantz temporarily solved this at the ASF by recompiling
> APR with iconv (only to reveal another, slower memory leak, but that's
> a separate issue not discussed here). We still needed a fix for
> APR_ENOTIMPL in Subversion, so Justin and Ben Collins-Sussman and I
> discussed a strategy, and came up with this patch. They've reviewed
> prior versions of the patch, but not seen this final one, so they are
> absolved of responsibility for any brainos here.
>
> Review welcome!
>
> Thanks,
> -Karl
>
> [[[
> Plug a memory leak by avoiding repeated failing attempts at apr_xlate setup.
>
> Patch by: jerenkrantz
> kfogel
> sussman
> Suggested by: jerenkrantz
> dberlin
> ghudson
> brane
> striker
>
> * subversion/libsvn_subr/utf.c
> (enotimpl_handle_node): New file-static variable.
> (get_xlate_handle_node): Use enotimpl_handle_node to determine
> whether to even bother calling apr_xlate_open again. Update and fix
> doc string, which was wrong before anyway.
> ]]]
>
> Index: subversion/libsvn_subr/utf.c
> ===================================================================
> --- subversion/libsvn_subr/utf.c (revision 15373)
> +++ subversion/libsvn_subr/utf.c (working copy)
> @@ -68,6 +68,17 @@
> memory leak. */
> static apr_hash_t *xlate_handle_hash = NULL;
>
> +/* Non-NULL iff apr_xlate_open() ever returned APR_ENOTIMPL. Since
> + there would be no point ever calling apr_xlate_open() again in this
> + process, we can save ourself some setup work (including memory
> + allocations) by caching a handle_node to return in that case;
> + enotimpl_handle_node->handle will be NULL, of course. Note that
> + apr_xlate_open() could return APR_ENOTIMPL because APR_HAS_XLATE is
> + not defined, or for some other reason. For our purposes it doesn't
> + matter, we just want to remember that it is not implemented for
> + this process. */
> +static xlate_handle_node_t *enotimpl_handle_node = NULL;
> +
> /* Clean up the xlate handle cache. */
> static apr_status_t
> xlate_cleanup (void *arg)
> @@ -122,12 +133,13 @@
> }
> }
>
> -/* Return an apr_xlate handle for converting from FROMPAGE to
> - TOPAGE. Create one if it doesn't exist in USERDATA_KEY. If
> - unable to find a handle, or unable to create one because
> - apr_xlate_open returned APR_EINVAL, then set *RET to null and
> - return SVN_NO_ERROR; if fail for some other reason, return
> - error. */
> +/* Set *RET to a handle node for converting from FROMPAGE to TOPAGE,
> + creating the handle node if it doesn't exist in USERDATA_KEY.
> + If unable to find a handle, or unable to create one because
> + apr_xlate_open returned APR_EINVAL or APR_ENOTIMPL, then
> + *RET->handle will be NULL (and in the ENOTIMPL case, allocate *RET
> + itself in a child of the top-level pool, not in POOL). If fail for
> + any other reason, return error. */
> static svn_error_t *
> get_xlate_handle_node (xlate_handle_node_t **ret,
> const char *topage, const char *frompage,
> @@ -199,6 +211,14 @@
> if (userdata_key && xlate_handle_hash)
> pool = apr_hash_pool_get (xlate_handle_hash);
>
> + /* If we failed due to ENOTIMPL before, then we'd certainly fail
> + now, so just return without trying. */
> + if (enotimpl_handle_node)
> + {
> + *ret = enotimpl_handle_node;
> + return SVN_NO_ERROR;
> + }
> +
> /* Try to create a handle. */
> *ret = apr_palloc (pool, sizeof(xlate_handle_node_t));
> apr_err = apr_xlate_open (&(*ret)->handle, topage, frompage, pool);
> @@ -229,6 +249,19 @@
>
> if (APR_STATUS_IS_EINVAL (apr_err) || APR_STATUS_IS_ENOTIMPL (apr_err))
> {
> + if (APR_STATUS_IS_ENOTIMPL (apr_err))
> + {
> + /* Remember for next time that we needn't bother trying.
> + Use the top-level pool because we have no lifetime
> + guarantees about our local pool parameter. */
> + enotimpl_handle_node = apr_palloc (svn_pool_create (NULL),
> + sizeof (*enotimpl_handle_node));
> + enotimpl_handle_node->handle = NULL;
> + enotimpl_handle_node->frompage = NULL;
> + enotimpl_handle_node->topage = NULL;
> + enotimpl_handle_node->next = NULL;
> + }
> +
> (*ret)->handle = NULL;
> return SVN_NO_ERROR;
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jul 20 23:03:49 2005