[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1614703 - in /subversion/branches/authzperf: BRANCH-README subversion/include/svn_string.h subversion/libsvn_repos/authz.c subversion/libsvn_subr/string.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 11 Aug 2014 18:25:45 +0200

On Mon, Aug 4, 2014 at 2:53 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:

> On 30 July 2014 20:27, <stefan2_at_apache.org> wrote:
> > Author: stefan2
> > Date: Wed Jul 30 16:27:44 2014
> > New Revision: 1614703
> >
> > URL: http://svn.apache.org/r1614703
> > Log:
> > On the authzperf branch: Implement support for generic / complex
> > wildcard patterns like "/foo*bar*baz/".
> >
> > To do this efficiently requires normalized pattern paths, so we
> > add the normalization logic as described here:
> > https://wiki.apache.org/subversion/AuthzImprovements
> >
> Stefan,
>
> Please find my comments inline.
>

[...]

> +normalize_wildcards(const char *path,
> > + apr_pool_t *result_pool)
> > +{
> > + const char *pos;
> > + svn_stringbuf_t *buffer = svn_stringbuf_create(path, result_pool);
> > +
> > + /* Reduce sequences variable-length segment matches to single segment
> > + * matches with the other segment patterns reduced to "*". */
> > + while (svn_stringbuf_replace_all(buffer, "/**/**/", "/*/**/"))
> > + ;
> > +
> > + /* Our tree traversal is more efficient if we put variable segment
> count
> > + * wildcards last. */
> > + while (svn_stringbuf_replace_all(buffer, "/**/*/", "/**/*/"))
> > + ;
> > +
> > + /* Reduce trailing "**" a single "*". */
> > + while ( buffer->len > 1
> > + && buffer->data[buffer->len-1] == '*'
> > + && buffer->data[buffer->len-2] == '*')
> > + svn_stringbuf_remove(buffer, buffer->len-1, 1);
> > +
> > + /* Reduce "**" _inside_ a segment to a single "*". */
> > + for (pos = strstr(buffer->data, "**"); pos; pos = strstr(pos, "**"))
> > + if ((pos > buffer->data && pos[-1] != '/') || (pos[2] != '/'))
> > + svn_stringbuf_remove(buffer, pos - buffer->data, 1);
> > + else
> > + ++pos;
> Quick quiz: I'm only one dev@ list who do not understand magic peace
> of code above?

I honestly hope you are. Nice typo, though ;)

> I'm fine if majority of Subversion devs do not consider
> such code as cryptic unreviewable magic.
>

[...]

> > +static svn_boolean_t
> > +match_pattern(const char *s,
> > + const char *pattern)
> > +{
> > + /* Matching a wildcard pattern is trival:
> > + * PATTERN can be considered a list of literal strings separated by
> '*'.
> > + * We simply have to find all sub-strings in that order, i.e. we can
> do
> > + * so greedily. Be careful to match prefix and suffix correctly.
> > + */
> > + char pattern_char, s_char;
> > +
> > + /* The prefix part of PATTERN needs special treatment as we can't just
> > + * match any substring of S. */
> > + if (pattern[0] != '*')
> You'are using inconsitent style to access first char in pattern:
> pattern[0] and *pattern. It makes code review harder. Why you're doing
> this way?
>

You are right that both are interchangeable.
However, I prefer the index notation when treating
something as a string and the pointer notation when
it is (moving) pointer. Since both are the same type
in C, there is some gray area where I would useither.

>
> > + {
> > + apr_size_t match_len;
> > +
> > + /* match_to_next_wildcard() assumes a that the first char
> matches. */
> > + if (pattern[0] != s[0])
> > + return FALSE;
> > +
> > + /* Match up to but not beyond the next wildcard. */
> > + match_len = match_to_next_wildcard(s, pattern);
> > + if (match_len == 0)
> > + return FALSE;
> > +
> > + /* Continue at next wildcard or end-of-string. */
> > + pattern += match_len;
> > + s += match_len;
> > + }
> > +
> > + /* Process all of PATTERN and match it against S char by char. */
> > + while ((pattern_char = *pattern))
> > + {
> > + apr_size_t match_len = 0;
> > +
> > + /* If PATTERN ended on a wildcard, S can be nothing but a match.
> */
> > + pattern_char = *++pattern;
> You're assigning patern_char in while() condition and in while body.
> What is the point for such code?
>

That's a remnant of an older version of this function.

> I'm also find expression like "pattern_char = *++pattern" very hard to
> follow.
>

[...]

As I learnt recently, there is an APR function that already covers
all the odd pattern matching that we need here. Since r1617148.
the above code has been completely removed.

>
> > +apr_size_t
> > +svn_stringbuf_replace_all(svn_stringbuf_t *str,
> > + const char *to_find,
> > + const char *replacement)
> > +{
> Please add tests for this function. Code doesn't seems trivial and
> does potential dangerous things like direct manipulation of stringbuf
> using memcpy.
>

Done in r1617144.

>
> > + apr_size_t replacements = 0;
> > +
> > + apr_size_t current = 0;
> > + apr_size_t original_length = str->len;
> > +
> > + apr_size_t to_copy;
> > + apr_size_t to_find_len;
> > + apr_size_t replacement_len;
> > + apr_size_t new_length;
> > +
> > + /* Early exit. */
> > + const char *pos = strstr(str->data, to_find);
> > + if (pos == NULL)
> > + return 0;
> > +
> > + to_find_len = strlen(to_find);
> > + replacement_len = strlen(replacement);
> > +
> > + /* We will store the new contents behind the NUL terminator of the
> current
> > + * data and track the total length in STR->LEN to make the
> reallocation
> > + * code preserve both bits. However, we need to keep the NUL between
> them
> > + * to make strstr stop at that boundary. */
> Why you've implemented so complicated algorithm instread of calling
> existing svn_stringbuf_replace() for every match?
>

O(N) vs. O(N^2). There is certainly a lot of further tuning
potential in various cases (to_find_len == replacement_len etc.)
but the current implementation has decent performance and
good worst-case behaviour.

You're still reallocating buffer multiple times and moving data
> arround, so I doubt that it significantly more performant that simple
> implementation based on svn_stringbuf_replace().
>

You are mistaken. svn_stringbuf_t has O(N) guarantees for
growing to string length N.

[...]

> I'm also +1 Brane comment about using maintaner mode and do not commit
> code with compilation warnings.
>

I accidentally hadn't activated it for that branch (I often switch
between configuration options). Once activated it showed
exactly two warnings that were not deprecations. So, on my
machine, the code *is* clean.

To get any of the size conversion warnings that I fix occasionally,
I have to switch to macos.

-- Stefan^2.
Received on 2014-08-11 18:26:15 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.