Branko =?ISO-8859-2?Q?=C8ibej?= <firstname.lastname@example.org> writes:
> > Well, keep in mind that Berkeley DB is caching stuff for us. So we
> > won't usually be doing any I/O.
> I/O isn't the problem, linear searching could be. You don't even need an
> SVN filesystem to thest that: For typical path lookups, NTFS (which
> stores directory entries in a B+tree) beats the average Unix filesystem
> (which uses a list).
> All of this isn't important for small directories. But ... when I was
> working on that other versionable filesystem, we saw reports of people
> using directories with 5-10000 files -- all of them COBOL source. That
> stressed our caches a bit.
Yeah, this is something I'm interested in addressing in the long
term. For 1.0, though, I think we should leave it as is. We know the
Linux filesystem is good enough for a lot of stuff.
> How about
> (<dir node id>, <entry name>) -> <entry node id>
> You still get duplicates (although not so many), and a rename won't
> bother you.
> But any sort of index would mean duplicating the directory entry list
> and having to keep the two in sync. Bad, bad. *Except* if the index is
> the only place where you store this; and you rip out the ENTRIES list
> from DIR skels.
Yes, when the time comes, I think something like this is the way to
go: simply move directory entries out into a table somewhere. Choose
the right ordering predicate, and you'll get Berkeley DB's btrees
managing your directory searches for you. (Does this mean
transactions needn't lock the whole directory to change one entry?
Even better!) The structure still needs to allow sharing of entries
between similar revisions of a directory, of course, but there are
ways to make that work.
Perhaps the table should be
DIR-NODE-ID ENTRY-NAME REVISION -> ENTRY-NODE-ID
where REVISION is the filesystem revision in which this directory
entry was last changed. Use a table order that makes it easy to
search for "the last rev no later than N". To delete a directory
entry, you'd add a table entry of the form:
DIR-NODE-ID ENTRY-NAME REVISION -> deleted
which would mask table entries for the younger directory entries.
This is very much like SCCS, which tags every line that ever appeared
in the file with the set of the revisions in which it appears.
When the time comes...
Received on Sat Oct 21 14:36:21 2006