[Various] filesystem changes
From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2001-02-09 23:24:07 CET
I should have posted to the list from the beginning. Mea culpa.
------- Start of forwarded message -------
Topics:
----------------------------------------------------------------------
Date: Wed, 7 Feb 2001 17:09:58 -0500 (EST)
Branko and Greg,
As a result of:
- a very long phone conversation with Karl, ending with the conclusion
yesterday, I've decided to make some pretty fundamental changes in the
I'm explaining it to you guys instead of the list because I think
The essential idea is:
This basically means that nodes in a transaction are identified solely
The first step is to fix up `svn_fs.h' and `structure' to explain how
------------------------------
Date: Fri, 09 Feb 2001 19:43:32 +0100
Jim Blandy wrote:
> yesterday, I've decided to make some pretty fundamental changes in the
Oooh, I like that. Cloning is frowned upon in the civilised world. :-)
> I'm explaining it to you guys instead of the list because I think
Be careful here. Right now we'll be doing a linear search through the
I'd suggest creating an index out of the whole filesystem structure,
(<path>, <revision>) -> <node revision id>
or something like that, for every object in the filesystem. We can hang
> This means that the sort of inter-process clone tracking
Creating a new mutable node will bubble up to the root of the FS within
> The first step is to fix up `svn_fs.h' and `structure' to explain how
I can hardly wait!
-- Brane �ibej home: <brane_at_xbc.nu> http://www.xbc.nu/brane/ work: <branko.cibej_at_hermes.si> http://www.hermes-softlab.com/ ACM: <brane_at_acm.org> http://www.acm.org/ ------------------------------ Date: 09 Feb 2001 15:48:58 -0500 From: Jim Blandy <jimb@zwingli.cygnus.com> To: Branko =?ISO-8859-2?Q?=C8ibej?= <brane@xbc.nu> Cc: Jim Blandy <jimb@zwingli.cygnus.com>, Greg Hudson <ghudson@mit.edu>, Karl Fogel <kfogel@galois.ch.collab.net>, Ben Collins-Sussman <sussman@newton.ch.collab.net>, Mike Pilato <cmpilato@collab.net>, Greg Stein <gstein@lyra.org> Subject: Re: filesystem changes Message-ID: <npvgqjh9ad.fsf@zwingli.cygnus.com> References: <200102072209.RAA09614@zwingli.cygnus.com> <3A843A54.7040809@xbc.nu> Hmm, now we're getting into stuff that should be discussed on the public list. I guess it wasn't a good idea to post privately. Oh well. > > The essential idea is: > > - Remove svn_fs_node_t. > > - Introduce svn_fs_root_t, representing the root of some revision or > > transaction. > > - Change all operations to refer to files by an svn_fs_root_t and a path > > instead of an svn_fs_node_t. > > > > This basically means that nodes in a transaction are identified solely > > by their path, which we traverse from the root every time we access > > the file. > > Be careful here. Right now we'll be doing a linear search through the > ENTRIES list for every name lookup, and reading record data every time, > too. That could slow things down considerably. Well, keep in mind that Berkeley DB is caching stuff for us. So we won't usually be doing any I/O. > I'd suggest creating an index out of the whole filesystem structure, > like we talked about when we were discussing ACLs: > > (<path>, <revision>) -> <node revision id> Right, but when you do a rename of something high up in the directory hierarchy, now you have an unbounded number of index entries to revise... :( Keep in mind also that, because of the sharing, the number of entries in such an index would be, in general, exponential in the number of actual nodes in the repository. :( :( > > This means that the sort of inter-process clone tracking > > the `clones' table was supposed to permit is now completely > > unnecessary, as the parents of a node will never change or be cloned > > while we're working on it. > > Creating a new mutable node will bubble up to the root of the FS within > one (database) transaction, right? Yep. It'll bubble up from the node to that transaction's root, or to the first already-cloned parent node. > > The first step is to fix up `svn_fs.h' and `structure' to explain how > > things will work. For `structure', you'll mostly see deletions. I'll > > be committing those changes today or tomorrow. > > I can hardly wait! (Translation: "Jim's completely lost his marbles!") :) ------------------------------ Date: Fri, 09 Feb 2001 22:36:48 +0100 From: Branko =?ISO-8859-2?Q?=C8ibej?= <brane@xbc.nu> To: Jim Blandy <jimb@zwingli.cygnus.com> CC: Greg Hudson <ghudson@mit.edu>, Karl Fogel <kfogel@galois.ch.collab.net>, Ben Collins-Sussman <sussman@newton.ch.collab.net>, Mike Pilato <cmpilato@collab.net>, Greg Stein <gstein@lyra.org> Subject: Re: filesystem changes Message-ID: <3A8462F0.9020203@xbc.nu> References: <200102072209.RAA09614@zwingli.cygnus.com> <3A843A54.7040809@xbc.nu> <npvgqjh9ad.fsf@zwingli.cygnus.com> Jim Blandy wrote: > Hmm, now we're getting into stuff that should be discussed on the > public list. I guess it wasn't a good idea to post privately. Oh well. I'd say just go ahead. gather up these mails and post them to the list, if everybody agrees. (The +1 from me is automagic.) >> Be careful here. Right now we'll be doing a linear search through the >> ENTRIES list for every name lookup, and reading record data every time, >> too. That could slow things down considerably. > > > Well, keep in mind that Berkeley DB is caching stuff for us. So we > won't usually be doing any I/O. I/O isn't the problem, linear searching could be. You don't even need an SVN filesystem to thest that: For typical path lookups, NTFS (which stores directory entries in a B+tree) beats the average Unix filesystem (which uses a list). All of this isn't important for small directories. But ... when I was working on that other versionable filesystem, we saw reports of people using directories with 5-10000 files -- all of them COBOL source. That stressed our caches a bit. >> I'd suggest creating an index out of the whole filesystem structure, >> like we talked about when we were discussing ACLs: >> >> (<path>, <revision>) -> <node revision id> > > > Right, but when you do a rename of something high up in the directory > hierarchy, now you have an unbounded number of index entries to > revise... :( Ouch. > Keep in mind also that, because of the sharing, the > number of entries in such an index would be, in general, exponential > in the number of actual nodes in the repository. :( :( Ouch ouch. You're right, have to think of something else. It might be enough to keep the entries list sorted, to get O(log N) searches. No, wait -- the skel structure practiaclly forces sequential access ... But no, you don't have to keep a full index! And you don't want to, or you'd have to insert a whole repository's worth of entries on every commit. It's enough to record a new index entry only on those commits where the node changes. Say you have two versions of foo/bar.c: the index would contain (foo/bar.c 213) -> 47.1 (foo/bar.c 329) -> 47.2 Even though you don't have an entry for revision 291, you can find the rigth node revision by looking for the youngest revision prior to that -- and get 213. Before 213, there was no foo/bar.c. Hmm. You'd have to record deletions, of course ... Oh damn, a rename sonwhere in the path would still burn you. How about (<dir node id>, <entry name>) -> <entry node id> ? You still get duplicates (although not so many), and a rename won't bother you. But any sort of index would mean duplicating the directory entry list and having to keep the two in sync. Bad, bad. *Except* if the index is the only place where you store this; and you rip out the ENTRIES list from DIR skels. -- Brane �ibej home: <brane_at_xbc.nu> http://www.xbc.nu/brane/ work: <branko.cibej_at_hermes.si> http://www.hermes-softlab.com/ ACM: <brane_at_acm.org> http://www.acm.org/ ------------------------------ End of forwarda000In Digest *************************** ------- End of forwarded message -------Received on Sat Oct 21 14:36:21 2006 |
This is an archived mail posted to the Subversion Dev mailing list.
This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.