Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Tue, 29 Jul 2014 13:34:37 +0200

On Mon, Jul 28, 2014 at 7:38 PM, Branko ÄŒibej <brane_at_wandisco.com> wrote:
> On 28.07.2014 19:19, stefan2_at_apache.org wrote:
>>
>> Author: stefan2
>> Date: Mon Jul 28 17:19:47 2014
>> New Revision: 1614080
>>
>> URL: http://svn.apache.org/r1614080
>> Log:
>> On the authzperf branch: Implement the notion of path rule ordering
>> by making svn_config_t iterate through sections in declaration order.
>> This is done using a simple linked list because we can't remove
>> sections but only add them.
>
>
> No no no no!

What exactly did my change break?

Right now, the svn_config interface returns the sections in random order.
I changed the code such that this random order happens to be the same
as the (static and dynamic) declaration order.

> Changing the svn_config_t structure is not the right to do this. The correct
> approach here is to parametrise the svn_config parser with a constructor
> method, then create a new constructor for the authz parser which will then
> get all entries in the file order. Please revert these svn_config changes.

I reverted the changes for now as I agree that the new behaviour
should be somewhat more explicit.

However, since this is not about construction (the result would
still be a hash) but about iteration, I see only two basic options:

1. Invent a new data structure that duplicates 80+% of svn_config_t
  to provide the same functionality but different ordering through
  its 'enumerate sections' interface. This basically would be r1614080
  minus a few bits that we happen not to call in the authz context.

2. Reapply r1614080 but let 'enumerate sections' return data in
hash order as on /trunk. Bump the 'enumerate sections' API
to include a "use declaration order' option.

IMO, number 2 is the better choice as it gives full control over the
added functionality without adding significant maintenance costs.

>> Without support for wildcards or other patterns, the config struct
>> will only contain a single section for each path. With wildcards,
>> there may be more than one. All three of the follwing path rules
>> are equally applicable:
>>
>> [/foo/*.doc]
>> * = r
>>
>> [/foo/bar.*]
>> * = rw
>>
>> [/foo/bar.doc]
>> jrandom =
>>
>> To make conflicts managable, always pick the last path rule. That
>> means users should specify general rules first, followed by exceptions
>> and finally (and optionally) critical rules that deny certain access,
>> potentially globally.
>
> Are you saying that a wildcard path should match before a concrete path when
> it happens to appear later in the authz file? Perish the thought. Exact
> matches should always override fuzzy matches, anything else is not intuitive
> at all and we'll receive a ton of spurious bug reports about how authz files
> don't work.

No, that is not what I'm saying. My rules are simple:

* Select the path rule(s) that cover the largest section of the path.
This is what we do today and there can be only one such match
w/o the use of wildcards.

* If there is more than one such rule, apply the one declared last.
This seems to be the easiest rule to follow and understand.

Those rules suggest the following ordering:

* Declare old-style path rules in tree order. That makes it easy
  to understand what applies where. I expect most people today
  do it that way. Everything else leads to "surprising" rules, in
  particular as later declarations for the exact same path will
  be merged with previous ones and rules for the same user /
  alias / group be overwritten.

* Specify generic rules first. These define system or repo-wide
  default rules and shall be overruled by local exceptions.
  For example make "*/**/*.docx" r/o for everybody. Specific
  people may later get write access to individual docx files or
  all docx files in a given sub-tree etc. To the user, this looks
  like a natural outcome of the previous ordering.

* Sub-tree specific rules wildcard rules should be declared
  immediately before the respective old-style rules for that
  sub-tree. This is not a new rule but a mere result of the
  previous two.

* "Catch-all" rules shall be placed last. These should be seen
  as a "last resort" to e.g. deny access to certain data irrespective
  of any previous declaration. E.g. "*/**/mysecret.docx" will catch
  all copies everywhere as long as they share the same name.
  For the same reason, this must also be the last resort because
  it may unintentionally break things for other people.

> Precedence ordering is fine between identical exact matches or between
> equivalent wildcard matches for the same path, but not between these two
> categories.

Why? The only difference is that you can't do the "catch all"
anymore while not providing any tangible benefit: Ordering
must still be evaluated and if people don't sort their path rules
by path, they will still get into trouble due to losing overview.

-- Stefan^2.
Received on 2014-07-29 13:35:07 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]