Greg Hudson wrote:
>(Incidentally, repeating yourself over and over again on this point,
>without presenting any new arguments, as you did in
><40B52A9E.1040002@xbc.nu> in response to Ben, isn't really helping your
>argument.)
>
>
As usual, I keep forgetting people can't read my mind, and I'm drawing
on my experience in implementing "real" filesystems, which is probably
not common... This explanation could get a bit longish.
I'm looking at the lock lookup table as a transition path towards fixing
one of the deficiencies of the current FS schema (at least the
database-backend version), and that's inefficient representation of the
revision/path index.
In a filesystem -- Subversion's FS is no exception -- most operations
can be broken down into two stages: a) resolve the path into a
filesystem object (node revision), and b) do your stuff with that
object. All access checks, symlink translation, etc. happen during the
first phase, which is usually the most time-critical part of any
filesystem implementation.
In the current BDB schema, directory contents are represented as
variable-sized, unsorted list of entries. In order to find a name in the
directory, the FS has to read the whole list and scan it sequentially,
which means that the cost of path lookups is proportional to directory
size in both time and memory. It also means that we have to write a
complete new copy of the entries list if a single entry changes, which
uses up a fair amount of space in the repository.
Incidentally, many "real" filesystems make the same mistake, most
notably older Unix filesystems. Modern FS's (e.g., NTFS, ReiserFS, XFS)
use some sort of ordering/indexing to make each lookup cost O(log N)
time (NTFS and XFS use a B-tree). Another common way to speed up the
first phase is to put all the information needed for the lookup, access
control, etc. into the same index, increasing locality of reference.
NTFS, for example, puts directory entries, ACLs, and even the contents
of small files into a single B-tree.
In the near future, we'll be adding locking and later ACLs to
Subversion. Both features require a path->object lookup. Assuming we
have directory locks, both require a cheap way to propagate inherited
values. Another feature that's on our list somewhere are unversioned
properties, perhaps even unversioned files(!). All of these could use a
single path-based lookup index. My vision is to slowly evolve this index
until the point where, by 2.0, it can replace the current directory
entries list.
Of course, life wouldn't be interesting if this index could be a simple
path->info mapping. BDB supports variable-length keys, but table-based
databases (which we want to support in future) are notoriously cranky
about rejecting such, and they're not very efficient in BDB, either. So
the structure of the index would have look like this:
(dir-id, name, [txn-id]) -> info
The txn-id is needed for ACLs, for example, because the same path can
have different ACLs in different revisions. The lookup would take a
reference txn id (e.g., resolved from a revision number) and do a "find
first smaller than" type of key search. HEAD would have to be
represented by a magic value; locks are always in HEAD.
The dir-id is a unique directory identifier that sets the context for
the name. We may be able to use a (node-id, copy-id) pair for that; I
haven't thought through all the ramifications. If not, we'll have to
generate a unique directory ID for this index, which is not so nice.
The info contains the bits and pieces attached to the path:
* lock object, ACL, unversioned prop,... for this entry
* If the entry represents a directory:
o the dir-id of this directory, used by the next level of lookup
o a flag indicating whether the lock/acl/prop/... is inheritable
o maybe a flag if a lock exists anywhere _beneath_ this directory
The last item could be an optimisation for recursive directory locking:
if this flag is set, then a recursive lock on the directory would
conflict with an existing lock somewhere beneath it (yes, even if you
own the lock). Maintaining this flag is a bit tricky, but probably
cheaper than scanning the whole subtree for possible lock conflicts. It
might require a few tweaks in the structure of the index to get an
optimal implementation, though.
You'll notice that I didn't spend too much time worrying about how this
design scales from locks/acls -- which are expected to be fairly
"sparse" in the tree -- to a full-blown directory index. There are some
tricky questions to answer; for example, I'm still not sure how to avoid
having to write a zillion index entries every time a directory changes,
either directly or through bubble-up; or how to keep the constant-time
nature of directory copies. It might be possible to even avoid
"physical" bubble-up completely. Anyway, these ruminations need a lot of
evolving first.
There. I've put my brane on display. Hope you enjoy mucking through it. :-)
-- Brane
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri May 28 08:53:15 2004