[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1442088 - Improve DAG node / path lookup performance in FSFS

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 22 Feb 2013 16:43:32 +0000 (GMT)

Bert Huijben wrote:
> [Julian]
> I don't like this 'is_canonical' flag.  In the rest of Subversion, nearly
> all APIs assume paths are canonical already [...]
> I see that many of the functions in this file currently appear to be
> accepting and dealing with non-canonical paths, but I think the way to go is
> to change that.
> <<
>
> [Stefan]
> Turned out not to be too much work to fix all callers.
> <<
>
> I still see that we canonicalize the paths on +- every api. (About 20 calls
> in tree.c and dag.c)
>
> Eventually this api should move to something like an
> assert(is_canonical(path,…)) on all functions and just assume that callers
> do the right thing, like we do everywhere else
>
> Canonicalizing is everywhere is expensive and error prone.

Yes.  I've just been looking at this, and I'd like to expand on the "error prone" aspect.

I don't know if we've ever fully specified what is a valid FS path, but I noticed that svn_fs__canonicalize_abspath() doesn't treat a "." path segment as special, whereas all our other path APIs treat a "." segment as non-canonical.

That means that if anything passes a path with a "." segment into the FS layer API, this canonicalization will not remove it and then some other function will throw an assertion failure or produce the wrong result, for some definition of 'wrong'.  Indeed I tried inserting a '.' in a path in a call to svn_fs_node_id() in fs-base-test.c, and an assertion faliure was thrown.

Also svn_fs__canonicalize_abspath() accepts NULL but no code seems to require that -- at least the test suite still passes using FSFS when I remove that special case (and remove a test case explicitly testing that function with NULL).

PROPOSAL

1. Agree and document the spec for an FS layer path.  Something like: sequence of non-null UTF-8 characters, leading '/', no trailing '/' unless the empty path, no '//', no '.' segments, no '..' segments.  (There's no need for the FS layer to treat '.' and '..' as special, but since the higher layers do it's just easier if we choose the same rules and so can use the same path functions.)

2. Delete svn_fs__canonicalize_abspath(); use svn_fspath__canonicalize() instead; optimize svn_fspath__canonicalize(), and consider whether we want a variant of that function that doesn't always allocate in the new pool.

3. Move the canonicalization calls out of the vtable implementation functions, into the FS vtable wrapper functions.  Consider whether it needs to canonicalize all non-canonical paths.  May be better to throw an error here for some cases -- especially for a '..' segment and perhaps also '.'.

> API entry points
> that must assume non canonical paths should be deprecated, with the
> deprecated function handling the canonicalization. (Like
> svn_path_url_add_component which had some assumptions built in).

4. Consider whether the FS API still needs to canonicalize its input paths in this way.  I guess at the moment, for backward compatibility, it does, but we could rev the FS API and change that.

> All our apis assume paths are canonical and I currently see code (get_dag())
> that tries something and if it fails canonicalizes the path and tries again.
>
> We should really get away from this kind of patterns wherever we can.
> (Security boundaries might be an exception)

Agreed.

- Julian
Received on 2013-02-22 17:44:09 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.