Re: svn commit: r1648253 - /subversion/trunk/subversion/libsvn_fs_fs/tree.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 29 Dec 2014 15:39:40 +0100

On Mon, Dec 29, 2014 at 1:46 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan2_at_apache.org> writes:
>
> > In FSFS, use sub-pools for temporaries in those API function
> implementations
> > that call open_path().
> >
> > All these functions are complex enough to make the overhead of creating a
> > temporary sub-pool be outweighed by the benefits of tighter memory usage
> and
> > better isolation from pool usage scheme.
> >
> > * subversion/libsvn_fs_fs/tree.c
> > (fs_change_node_prop,
> > fs_make_dir,
> > fs_delete_node,
> > fs_make_file): These functions don't return allocated memory, so use
> > the new sub-pool for all allocations.
> > (apply_textdelta): Allocate the node to be stored in the baton in the
> > baton's pool. That way, all remaining allocations
> > from POOL are temporaries.
> > (fs_apply_textdelta): Pass a temporary sub-pool to the above function.
> > (apply_text,
> > fs_apply_text): Same as for apply_textdelta and fs_apply_textdelta.
> > (fs_closest_copy): Use a temporary sub-pool for all queries that don't
> > produce the objects to return.
>
> I have several concerns about this change (including the following-up
> r1648272,
> r1648243 and other related changes):
>
> 1) It violates the APR pool usage conventions, as per Subversion Community
> Guide [1]:
> [[[
> Functions should not create/destroy pools for their operation; they
> should
> use a pool provided by the caller. Again, the caller knows more about
> how
> the function will be used, how often, how many times, etc. thus, it
> should
> be in charge of the function's memory usage.
> ]]]
>

That rule cannot be violated as it states an optional requirement.
It says "should", not "shall" or "must" - and for a good reason.
As they stand, the rules a incomplete and have some unintended
consequences (think recursion without loops). I plan on writing
a post on that topic tomorrow.

> Functions within this patch, like fs_change_node_prop(), allocate a
> constant
> amount of memory and do not require a subpool.

How do you know? Is there some expressed guarantee about
this in the code.

For the code at hand, you are also wrong. get_dag() needs at
least O(path length) memory in the general case. How much
memory will be needed at each directory level to read the
respective noderev is everyone's guess.

> The real reason why we added
> these subpools is the DAG node cache happening to bind its lifetime to
> the
> passed-in pools, so this is a workaround, and it does not follow our
> set of
> established rules.
>

That is true. Although I would not call it work-around. Is is
necessary to isolate API users from implementation details.

> It is the cache locking concept that *is broken*, not that we need
> subpools
> everywhere.

Demonstrate that reference counting is a broken concept.

You may not agree with its use here. You may also argue
that *not* always locking the cache is the broken element
of the design.

You might also demonstrate that it is / has become now
basically ineffective even in the high-load scenarios that it
is meant to help with.

> If we choose to "fix" the problem this way, we will quickly find
> ourselves in a very dangerous situation. Miss one subpool â€” you're
> dead.
>

That is why isolating the implementation from the API
users is important - with or without cache locking. The
queries we need to run may look trivial but often involve
various file buffers and other "heavy" structures. Because
the user may call sequences of those seemingly simple
FS functions using the same pool, they may pile up.

Bottom line: cleaning up at the FS API boundary is a
idea in any case.

> Pass a potentially long-living DAV resource pool as a scratch pool for a
> one-time FSFS DAG operation â€” you're dead again. How should anyone know
> that all the calls to open_path() or get_dag() that happen to lock the
> DAG
> node cache now require a subpool? Should we constantly review all these
> places?
>

Yes, that is a potential problem. Possible solutions include

* Making the DAG cache work without locking (see below).
* Make the FS API function implementations use sub-pools.
* Switch the FS API to the two-pool paradigm.

Ideally, we would do all of them because each one provides
benefits in its own rights. As of r1648272, the second point
should be fulfilled either by using explicit sub-pools or using
ordinary iterpools. I probably missed a few, so please, review.

The whole point of the refcounting DAG cache is that DAG
nodes are relatively heavy objects (due to the noderev struct
in them) while we usually need only a single element, e.g.
the location of the next directory level. So, this is what I will
try and report back as soon as the results are in:

* Split open_path into one producing the full parent path.
This one copies all DAG nodes from the cache along the path.
Remove the get_dag-related shortcuts from it.

* The other contains all the current short paths and only returns
the final node. Also as a copy of the cached DAG node if
it came from cache.

* Remove the cache locking code.

> 2) I am now witnessing random Apache HTTPD Server segfaults (Subversion
> 1.9.0-dev built from r1648272), and I suspect them to be a consequence
> of
> these recent pool-related changes. Here is a sample backtrace of the
> crash:
>

This problem is most likely caused by some returned path
not being allocated in the result pool. You may set BUCKET_COUNT
in tree.c to 1. That should make it fail more consistently.

> libsvn_fs-1.dll!get_node_revision_body()
> libsvn_fs-1.dll!svn_fs_fs__dag_get_node()
> libsvn_fs-1.dll!open_path()
> libsvn_fs-1.dll!svn_fs_fs__node_id()
> libsvn_fs-1.dll!svn_fs_fs__check_path()
> mod_dav_svn.so!prep_regular()
> mod_dav_svn.so!get_resource()
> mod_dav.so!dav_get_resource()
> mod_dav.so!dav_method_get()
> ...
>
> Given the above, I am -1 on doing this. Please revert this change and
> other
> related changes that were supposed to fix the problem.
>

I will keep the added sub-pools in place for now. The problems
they cause now will always occur when we move code to the
two-pool paradigm. The DAG cache issue is simply the trigger
to tighten our pool usage in FSFS.

-- Stefan^2.
Received on 2014-12-29 15:40:43 CET

This message: [ Message body ]
Next message: Branko ÄŒibej: "Re: [RFC] Refining our naming rules"
Previous message: Anatol Pomozov: "ruby 2.2: check-swig-rb test fails"
In reply to: Evgeny Kotkov: "Re: svn commit: r1648253 - /subversion/trunk/subversion/libsvn_fs_fs/tree.c"
Next in thread: Ivan Zhakov: "Re: svn commit: r1648253 - /subversion/trunk/subversion/libsvn_fs_fs/tree.c"
Reply: Ivan Zhakov: "Re: svn commit: r1648253 - /subversion/trunk/subversion/libsvn_fs_fs/tree.c"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]