[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1442088 - Improve DAG node / path lookup performance in FSFS

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Sat, 23 Feb 2013 01:26:39 +0100

On Fri, Feb 22, 2013 at 5:43 PM, Julian Foad <julianfoad_at_btopenworld.com>wrote:

> Bert Huijben wrote:
> > [Julian]
> > I don't like this 'is_canonical' flag. In the rest of Subversion, nearly
> > all APIs assume paths are canonical already [...]
> > I see that many of the functions in this file currently appear to be
> > accepting and dealing with non-canonical paths, but I think the way to
> go is
> > to change that.
> > <<
> >
> > [Stefan]
> > Turned out not to be too much work to fix all callers.
> > <<
> >
> > I still see that we canonicalize the paths on +- every api. (About 20
> calls
> > in tree.c and dag.c)
>

We need to do that on API level but not internally.
Julian's suggestion to push the canonicalization
(canonization?) to the FS loader is properly the
right thing to do and will simplify BDB code as well.

> > Eventually this api should move to something like an
> > assert(is_canonical(path,…)) on all functions and just assume that
> callers
> > do the right thing, like we do everywhere else
> >
> > Canonicalizing is everywhere is expensive and error prone.
>
> Yes. I've just been looking at this, and I'd like to expand on the "error
> prone" aspect.
>
> I don't know if we've ever fully specified what is a valid FS path, but I
> noticed that svn_fs__canonicalize_abspath() doesn't treat a "." path
> segment as special, whereas all our other path APIs treat a "." segment as
> non-canonical.
>
> That means that if anything passes a path with a "." segment into the FS
> layer API, this canonicalization will not remove it and then some other
> function will throw an assertion failure or produce the wrong result, for
> some definition of 'wrong'. Indeed I tried inserting a '.' in a path in a
> call to svn_fs_node_id() in fs-base-test.c, and an assertion faliure was
> thrown.
>

Hm. "." segments should never occur on the FS level.
My guess is that the main reason for all the c14n was
some caller beings unclean when it comes to adding
'/' to the beginning and the end may maybe producing
"//" artifacts back when paths got glued together by hand.

Also svn_fs__canonicalize_abspath() accepts NULL but no code seems to
> require that -- at least the test suite still passes using FSFS when I
> remove that special case (and remove a test case explicitly testing that
> function with NULL).
>

NULL might also be an artifact from "way back when".

> PROPOSAL
>
> 1. Agree and document the spec for an FS layer path. Something like:
> sequence of non-null UTF-8 characters, leading '/', no trailing '/' unless
> the empty path, no '//', no '.' segments, no '..' segments. (There's no
> need for the FS layer to treat '.' and '..' as special, but since the
> higher layers do it's just easier if we choose the same rules and so can
> use the same path functions.)
>

+1 on specifying what the API wants.

> 2. Delete svn_fs__canonicalize_abspath(); use svn_fspath__canonicalize()
> instead; optimize svn_fspath__canonicalize(), and consider whether we want
> a variant of that function that doesn't always allocate in the new pool.
>

FYI, svn_fspath__canonicalize() does not handle ".." segments
(not too important here but raised an eyebrow when I read the code).

 3. Move the canonicalization calls out of the vtable implementation
> functions, into the FS vtable wrapper functions. Consider whether it needs
> to canonicalize all non-canonical paths. May be better to throw an error
> here for some cases -- especially for a '..' segment and perhaps also '.'.
>

It seems that there are FSFS-internal and maybe BDB-internal
callers of those functions that don't pass paths in canonical form.
I can't think of a reliable way to find all direct and indirect producers
of paths in FSFS, so we break some rarely used functionality.

Therefore, I suggest postponing the change to 1.9.

> API entry points
> > that must assume non canonical paths should be deprecated, with the
> > deprecated function handling the canonicalization. (Like
> > svn_path_url_add_component which had some assumptions built in).
>
> 4. Consider whether the FS API still needs to canonicalize its input paths
> in this way. I guess at the moment, for backward compatibility, it does,
> but we could rev the FS API and change that.
>

I think we should rev the API in 1.9 and eliminate internal
c14n from FSFS and BDB code. I can see that as being
another aspect in the general refactoring of FSFS code.

> > All our apis assume paths are canonical and I currently see code
> (get_dag())
> > that tries something and if it fails canonicalizes the path and tries
> again.
>

Looks weird but it speeds up typical usage of our FS API
layer: If we hit the cache, the path has been in canonical
form - otherwise, we would not have found the entry keyed
by this path. DAG lookup is really computationally expensive
these days compared to other parts.

> We should really get away from this kind of patterns wherever we can.
> > (Security boundaries might be an exception)
>
> Agreed.
>

No disagreement here.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*
http://www.wandisco.com/subversion/download
*
Received on 2013-02-23 01:27:11 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.