On Mon, Jul 28, 2014 at 1:48 AM, Branko Čibej <brane_at_wandisco.com> wrote:
> On 28.07.2014 00:23, stefan2_at_apache.org wrote:
> Author: stefan2
> Date: Sun Jul 27 22:23:26 2014
> New Revision: 1613854
> URL: http://svn.apache.org/r1613854
> On the authzperf branch: Very first step towards faster authz.
> Determine the user's aliases and group memberships once and
> then use this to evaluate rules in O(1).
> Right now, that information is constructed on the fly for each
> request and used from the existing matching code. This overhead
> will be eliminated in later commits. Right now, this is to
> introduce new, reviewable / testable code.
> +/*** Utilities. ***/
> +/* We are using hashes to represent sets. This glorified macro adds the
> + * string KEY to SET. */
> +static void
> +set_add_string(apr_hash_t *set,
> + const char *key)
> + apr_hash_set(set, key, strlen(key), "");
> svn_hash_sets. APR's default hash function avoids the strlen overhead
> completely; our hash function already does the strlen.
> +/*** Users, aliases and groups. ***/
> +/* Callback baton used with add_alias(). */
> +typedef struct add_alias_baton_t
> + /* Store the alias names here (including the '&' prefix). */
> + apr_hash_t *aliases;
> + /* The user we are lookup up. */
> Maybe change this to: "The user we lookup are up to look up"? :)
Writing commentary is always the hardest part ... ;)
Fixed in r1613867.
> +/* Return a hash set of all name keys (plain user name, decorated aliases
> + * and decorated group names) that refer to USER in the authz CONFIG.
> + * This include indirect group memberships. */
> I'm confused (mostly by the lack of documentation of the structure of the
> in-memory representation of the parsed authz file). Is your intention to
> statically generate, at parsing time, a correlation between each possible
> user name and the set of access control entries that are valid for that
> user? I can easily imagine such a thing exploding in size very quickly, even
> though most ACEs are not likely to ever be used during the lifetime of the
> parsed authz representation.
Right now (old code), for every path, we walk up the tree until we find
a relevant authz entry. Other requests scan all rules in the authz file.
For every path rule, all rules (individual lines in the authz sections),
we check whether it mentions the current user directly or indirectly.
My patch determines all group memberships and aliases in O(size of
groups and alias sections). The final code will do this only twice per
connection (anonymous user and actual user). The lookup is trivial
and also covers special groups like "*" and "$authenticated".
> That last is true for paths in the authz file, too; but memory required for
> them is proportional to the number of paths in the authz file, whereas for
> the user->ACE correlation, it's proportional to the nuber of paths (ACEs)
> times the number of users.
The old code reads the authz file once per connection. The rationale
behind the new code is that we may as well to go over the resulting
svn_config_t once more and create a filtered tree structure plus
preprocessed user data. A single call to authz_get_any_access or
authz_get_tree_access can fully amortize the initial overhead.
> (Of course, if I were writing this, I'd concentrate on the path lookup tree
> first, and tackle users later ...).
The whole thing is already implemented and initial performance numbers are
gigantic (7.5M path checks /s against multiple path rules of the
Users were simply the most easy part to start at, minimizing the amount of
Received on 2014-07-28 02:57:11 CEST