[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Fwd: Effects of importing over 18000 items into repository as one commit transaction

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2006-01-11 18:28:51 CET

Ben Collins-Sussman wrote:
> On 1/11/06, Julian Foad <julianfoad@btopenworld.com> wrote:
>
>>That document mentions the problem of O(N) behaviour for copy operations, in
>>section 3 of each "chapter". It doesn't mention such a problem for any other
>>situation. Do you want to patch it to warn of O(N) behaviour in other or all
>>cases?
>
> I'm talking about the section on revision properties:

Oh, thanks, Ben - I get it now: I thought our document specifically warned of
an O(N) speed problem in the case of copies, implying that there is no such
problem in other cases. But what it means is just that O(N) behaviour is
surprising for copies because we otherwise expect them to have O(1) behaviour,
whereas in other cases O(N) behaviour should be expected.

> In order to decide which revprops are okay to display, all of the
> changed-paths in a given revision have to be examined. This isn't
> something that can be optimized away with clever uses of trees,
> red-black elimination, etc. What must be determined is whether
>
> * all the changed-paths are readable
> * some of the changed-paths are readable
> * none of the changed-paths are readable.
>
> And that's just an O(N) search (where N == number of changed-paths in
> the revision), no matter what you do.

Obviously when we say "the operation is O(N)" that's a simplification: some
parts of the operation certainly have to iterate over the N paths, but the
question is whether a significant amount of time is spent doing something that
could be done more quickly. For example, if the authz check could skim through
the list of paths to find the root-most one and then perform its expensive ACL
check on that, it might then be able to eliminate many other paths from needing
an expensive check. Note: "might" - it wouldn't work in all cases or with all
authz systems. And note that I'm not saying mod_dav_svn could do this; it
would have to be done in the applicable authz module.

> Our authz_read() callback operates on single paths, and that's it.

OK, I see. I guess you're saying we can't change that API, so we're stuck. If
we could add and use a "check this list of paths" API it could potentially be
much faster, but I guess that API is external to Subversion and not easily
influenced by us. (Sorry for trying to discuss it without knowing the
architecture.)

So, the bottom line is (probably) that if one configures one's repository to
use an authz system that doesn't have very fast performance, it will slow down
certain Subversion operations that consider a large set of paths in the
repository, one example being "svn log".

There are of course further ideas that could be explored if this becomes a big
problem. We could reconsider the mapping of how many paths are readable to
which rev-props are readable. We could consider partly caching the readability
of rev-props (may be impossible to do correctly, I know).

Anyway, thanks for the insights. That's the limit of my interest in the
matter. I thought to begin with that the slow-down was implied to be with FSFS
only, and therefore that it might be a simple algorithmic inefficiency under
our control. In fact, we still don't know whether this authorization behaviour
was the cause of the original poster's problem.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jan 11 18:42:20 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.