Stefan Fuhrmann wrote on Sun, Feb 12, 2012 at 13:57:23 +0100:
> On 12.02.2012 06:27, Branko Čibej wrote:
> >On 12.02.2012 02:52, Stefan Fuhrmann wrote:
> >
> >>The silly part of FSFS is that it does not optimize access
> >>paths, yet, but stores changes individually. The challenge
> >>is our two-dimensional key space and the fact that different
> >>operations traverse the data along different dimensions
> >>(e.g. log ./. checkout).
> >Interestingly enough, the 2D keyspace isn't that big a problem.
>
> On the physical level, it becomes a problem as files
> model a linear data stream, i.e. you have to find a
> mapping onto a 1-dimensional structure that keeps
> related / dependent information closely together.
>
So perhaps solve _that_ problem?
Write a filesystem/disk driver for two-dimensional files, that store
their bytes in the same physical sector on multiple platters. A reader
can then read either along a sector, or read the same sector on
cosecutive platters...
> > The real
> >issue is that we don't even represent all the relevant keys, because of
> >the lazy copying of subtrees. That's what actually prevents us from
> >doing one-shot lookups of arbitrary path_at_rev; and even then, we'd only
> >really have to do a step-by-step top-down resolve if the initial fast
> >lookup failed.
>
> How do you verify that path_at_rev even exists?
>
> For instance, you could maintain an index of (only)
> every path ever touched. As long as we want to keep
> our O(1) delete operations for paths, entries in that
> index cannot be updated / removed. So, we need
If the cache contains only fspaths, with no metadata about them, sure.
But if the cache can map fspaths to a set of revision ranges...
> to always verify that there is actually a path from /@rev
> to path_at_rev.
>
> >>Question: how many entries would a direct lookup structure
> >>need to have (i.e. path_at_rev -> data pointer)? Keep in
> >>mind that may valid paths like /branches/foo/bar will never
> >>be mentioned anywhere in a SVN repository because they
> >>never got touched under that name. A rough estimation for
> >>a fully expanded list of entries is
> >>
> >>#nodesInTrunk * #openBranches * #revisions
> >>
> >>This yields 10^9 entries for small repositories and>10^14
> >>for KDE-sized ones. Clearly impractical.
> >So that's not how you do it. :)
> >
> >You'd only need one reference per actual node, not per possible node
> >lookup paths including revisions. Obviously you have to resolve path_at_rev
> >to a concrete node before you can do anything with its attributes. With
> >the caveat that there's a nasty edge case involving not-yet-lazy-copied
> >child nodes; but given the way we currently crawl the tree, that's not
> >really an issue.
>
> We simply follow the tree, i.e. one lookup per parent.
> With any luck most of the intermediate nodes have
> already been cached, which is exactly what makes our
> differential trees so efficient: as long as a sub-tree is
> not being modified, all copies and future revisions
> share the *same* representation.
>
> But maybe we are simply talking past each other here.
> So, could you give a short summary of what you consider
> wrong / inefficient about SVN / FSFS and how you would
> like to see this addressed.
>
> -- Stefan^2.
Received on 2012-02-12 17:59:30 CET