On 12.02.2012 02:52, Stefan Fuhrmann wrote:
> The silly part of FSFS is that it does not optimize access
> paths, yet, but stores changes individually. The challenge
> is our two-dimensional key space and the fact that different
> operations traverse the data along different dimensions
> (e.g. log ./. checkout).
Interestingly enough, the 2D keyspace isn't that big a problem. The real
issue is that we don't even represent all the relevant keys, because of
the lazy copying of subtrees. That's what actually prevents us from
doing one-shot lookups of arbitrary path_at_rev; and even then, we'd only
really have to do a step-by-step top-down resolve if the initial fast
lookup failed.
> Question: how many entries would a direct lookup structure
> need to have (i.e. path_at_rev -> data pointer)? Keep in
> mind that may valid paths like /branches/foo/bar will never
> be mentioned anywhere in a SVN repository because they
> never got touched under that name. A rough estimation for
> a fully expanded list of entries is
>
> #nodesInTrunk * #openBranches * #revisions
>
> This yields 10^9 entries for small repositories and >10^14
> for KDE-sized ones. Clearly impractical.
So that's not how you do it. :)
You'd only need one reference per actual node, not per possible node
lookup paths including revisions. Obviously you have to resolve path_at_rev
to a concrete node before you can do anything with its attributes. With
the caveat that there's a nasty edge case involving not-yet-lazy-copied
child nodes; but given the way we currently crawl the tree, that's not
really an issue.
-- Brane
Received on 2012-02-12 06:28:53 CET