[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Authz perf regression 1.9 -> 1.10

From: Branko Čibej <brane_at_apache.org>
Date: Thu, 12 Sep 2019 14:01:38 +0200

On 12.09.2019 13:30, Sam Toliman wrote:
> >> (~3500k in my case)
> > You have more than 3 million groups?
> Misprinted. There are ~3500 groups in my case.
>
> > The best thing you can do is prepare a patch, with tests, that
> > implements your suggestions. Considering of course that correctness is
> > more important than performance. :)
> I will try to do it properly, in accordance with the algorithm
> described above.
>
> I have played with code base and ended up with minimal ad hoc diff
> required to speed up svn_authz__parse() [1]. It's certainly not a
> patch candidate, but can you please verify that this user filter
> doesn't break any implicit contracts or semantic agreements in the
> auth logic (see prepare_global_rights())?

Your change might break the semantics of the '*', '$anonymous' and
'$authenticated' ACE selectors. Any username that's not explicitly
mentioned in the authz file and group definitions is still subject to
those access entries. I'd have to re-read the code ... IIRC we do have
precooked global rights for those categories and check those first
before user-specific rights, but it's worth double-checking.

The other thing I can think of is that, even if this is only used in
tunnel mode, you should still make sure that the parsed internal
representation isn't cached. It would be very wrong indeed if the
user-filtered representation was reused for another user.

On that point, I'd suggest that the parser itself is *not* made aware
that it's running in some kind of "tunnel mode" -- that's far too much
information at too low a level, i.e., a layering violation. Renaming
'tunnel_user' to something else (e.g., 'only_user' or 'constrain_user'
or similar) would do the trick, as well as moving comments about tunnel
mode to the place that's actually aware of it and what it means.

> It gave some performance improvements (best of 10 runs):
> 1.10:
> time SVN_SSH='/tmp/svnserve1.10.sh <http://svnserve1.10.sh>' svn info
> svn+ssh://{server}/arc >/dev/null
>  real 0m0,501s
>  user 0m0,009s
>  sys 0m0,008s
>
> 1.10 with patch:
> time SVN_SSH='/tmp/svnserve1.10_patch.sh
> <http://svnserve1.10_patch.sh>' svn info svn+ssh://{server}/arc >/dev/null
>  real 0m0,134s
>  user 0m0,001s
>  sys 0m0,014s

That's a very nice result indeed, if only we can be certain of
correctness. (Hint: we don't have nearly enough tests for authz, and the
current ones almost certainly don't exercise tunnel mode).

(More comments below.)

> Index: subversion/libsvn_repos/authz_parse.c
> ===================================================================
> --- subversion/libsvn_repos/authz_parse.c       (revision 5646686)
> +++ subversion/libsvn_repos/authz_parse.c       (working copy)
> @@ -133,6 +133,10 @@
>  
>       N.B.: The result pool is AUTHZ->pool. */
>    apr_pool_t *parser_pool;
> +
> +  /* Stores tunnel_user to speed up ACL loading in tunnel mode.
> +     For more info see ARCADIA-1621. */
> +  const char *tunnel_user;
>  } ctor_baton_t;

Please keep the 'parser_pool' member last in this structure. It's not a
public type, being defined in the implementation file, so reordering
members is OK. And I'm sure you'll come up with more generic wording for
the comment. :)

> @@ -1298,10 +1307,11 @@
>                   svn_stream_t *rules,
>                   svn_stream_t *groups,
>                   apr_pool_t *result_pool,
> -                 apr_pool_t *scratch_pool)
> +                 apr_pool_t *scratch_pool,
> +                 const char *tunnel_user)

Similarly, we have a convention that pool parameters are always last in
an argument list. I won't repeat this observation for the other cases in
the diff.

>  {
>    ctor_baton_t *const cb = create_ctor_baton(result_pool, scratch_pool);
> -
> +  cb->tunnel_user = tunnel_user;

Anything that's stored in the constructor baton should be allocated in
its pool, so use something like this instead:

    cb->tunnel_user = apr_pstrdup(cb->parser_pool, tunnel_user);

(Also leave the empty line.)

> Index: subversion/include/svn_repos.h
> ===================================================================
> --- subversion/include/svn_repos.h      (revision 5646686)
> +++ subversion/include/svn_repos.h      (working copy)
> @@ -4170,7 +4170,8 @@
>                        svn_boolean_t must_exist,
>                        svn_repos_t *repos_hint,
>                        apr_pool_t *result_pool,
> -                      apr_pool_t *scratch_pool);
> +                      apr_pool_t *scratch_pool,
> +                      const char *tunnel_user);

We need an API revision here, you can't just add parameters to an
existing public function. You'd have to create a new function,
'svn_repos_authz_read5', with the new parameter, then deprecate and
reimplement 'svn_repos_authz_read4' in deprecated.c as a wrapper for
'svn_repos_authz_read4' (there are several examples of how that's done
right there).

-- Brane
Received on 2019-09-12 14:01:41 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.