
Re: Another working copy library

From: Jonathan Gilbert <o2w9gs702_at_sneakemail.com>
Date: 2007-01-17 17:37:42 CET

At 11:36 PM 1/16/2007 -0800, David Anderson wrote:
>I've been kicking the thought around for a while now, so I'll get it
>out here in the open.
>I think we all know about the "organic" growth of libsvn_wc. As more
>large projects like gcc or KDE adopt Subversion, they are starting to
>also run into scalability issues with the working copy library that
>cannot be resolved easily (not to mention companies that have groaned
>a little about this).
> - Doesn't play well with other commandline tools. When I do find or
>grep runs over a working copy, I always have to pipe that through
>`grep -v .svn` to filter out all the dupes. The tool still has to
>crawl twice the number of inodes and output twice the actual amount of
>data. And yes, I'm sure there is a nifty hidden switch in both find
>and grep that would let me exclude this intelligently. I'm sure I can
>find another tree-crawling tool that we equally break and that doesn't
>have an exclusion capability.
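For reference, `find` can already skip the metadata with the standard prune idiom (`find . -name .svn -prune -o -type f -print`). The sketch below does the same thing in Python, the way a custom tree-crawling tool could. It is only an illustration, and `walk_versioned` is a hypothetical helper, not part of any Subversion API.

```python
import os


def walk_versioned(root):
    """Yield file paths under root, skipping any .svn metadata directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into .svn,
        # so nothing inside the metadata directory is ever visited.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

This only works for tools you control, which is exactly the complaint above: every other crawler has to be taught the same exclusion.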

The replacement of the WC library is clearly a long-term goal, probably not
even in the same major version. For a shorter-term goal, what about storing
the text-base inside a single file with a format similar to ZIP but
compressed with the QuickLZ library? This would eliminate the majority of
false positives with file searching tools and save disk space with probably
no noticeable speed impact (it might even be faster due to improved
locality, if the archive were kept defragmented).

Personally, I don't want to see text-base go away. I don't have any
objections to it being optional, but I'd be unhappy if some future version
of Subversion no longer had the capability to do diffs quickly.
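A rough sketch of the single-file text-base idea follows. It assumes Python's standard `zipfile` module with DEFLATE (zlib) compression as a stand-in for QuickLZ, which the standard library does not provide. The archive layout and all function names are hypothetical. It also shows that a text-base kept inside an archive still allows a diff to be computed locally, without contacting the repository.

```python
import difflib
import zipfile


def write_text_base(archive_path, pristine):
    """Write every pristine copy into one compressed archive file.

    pristine maps a working-copy-relative path to that file's base bytes.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for relpath, data in sorted(pristine.items()):
            zf.writestr(relpath, data)


def read_text_base(archive_path, relpath):
    """Return the pristine bytes for one file without unpacking the rest."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(relpath)


def diff_against_base(archive_path, relpath, working_text):
    """Produce a unified diff of the working text against its text-base."""
    base = read_text_base(archive_path, relpath).decode("utf-8")
    return "".join(difflib.unified_diff(
        base.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=relpath + " (base)",
        tofile=relpath,
    ))
```

Because ZIP keeps a central directory, reading one member does not require decompressing the others, which is what keeps the diff path cheap.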

>libsvn_wc_sqlite stores all the metadata for a working copy in a
>*single* SQLite database. This sqlite database is located in a .svn
>subdirectory inside the root of the working copy. So, for example, if
>you were to check out the svn trunk from svn.collab.net, you would
>have trunk/.svn containing wc.db (and probably some other very
>lightweight stuff, like a wc version file). There is no other .svn
>directory anywhere else in the working copy. When you invoke an svn
>command that needs to look at the working copy, libsvn_wc_sqlite walks
>back up the tree from the cwd until it finds a .svn directory, and
>uses that metadata for the entire tree rooted at that directory.
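The upward search described above could look roughly like this. `find_wc_root` is a hypothetical name, and the sketch assumes that the presence of a `.svn` directory is the only marker of a working-copy root; the single database would then be found at `<root>/.svn/wc.db`.

```python
import os


def find_wc_root(start):
    """Walk from start toward the filesystem root and return the first
    directory containing a .svn subdirectory, or None if there is none."""
    path = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(path, ".svn")):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached "/" (or a drive root on Windows)
            return None
        path = parent
```

Each step costs one `stat` call, so the lookup is proportional to the depth of the current directory, not to the size of the working copy.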

As others have mentioned, I can see GUI tools continuously crawling up the
tree. Perhaps that could be handled with caching heuristics, with the
downside that there would be a window between a change to the WC structure
and the GUI noticing it.
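One possible caching heuristic is sketched below. It assumes the GUI can tolerate bounded staleness: root lookups are memoized for `ttl` seconds, and that interval is the window described above. `WcRootCache` and its parameters are hypothetical; `lookup` can be any function that walks up the tree looking for `.svn`.

```python
import os
import time


class WcRootCache:
    """Memoize working-copy root lookups for a limited time.

    Entries older than ttl seconds are recomputed, so a change to the
    working-copy structure can go unnoticed for up to ttl seconds.
    """

    def __init__(self, lookup, ttl=5.0, clock=time.monotonic):
        self._lookup = lookup   # function: directory -> WC root (or None)
        self._ttl = ttl
        self._clock = clock     # injectable so the behaviour can be tested
        self._entries = {}      # directory -> (root, time it was cached)

    def root_for(self, directory):
        directory = os.path.abspath(directory)
        now = self._clock()
        hit = self._entries.get(directory)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]       # fresh enough: no crawl at all
        root = self._lookup(directory)
        self._entries[directory] = (root, now)
        return root
```

A shorter `ttl` narrows the staleness window but brings back more of the crawling.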

>Another thing I'd very much like is to completely eliminate all
>implicit tree crawls. The metadata is the working copy, unless the
>user requests a forced crawl to update metadata for some reason.

Note that crawling up the tree is still a crawl of some sort :-)

>This implies telling Subversion about all operations on versioned
>data. We already do that for all operations, except for edits. I'd
>like to change that. libsvn_wc_sqlite checks out the working copy
>entirely read-only, and you have to tell svn (through something like
>`svn edit file`... Yes, I have been using perforce lately) that you
>are touching it, at which point it'll record that in the metadata and
>flip the file to be writable.
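To make the proposed read-only workflow concrete, here is a minimal sketch. The `edited` table, the function names, and the use of Python's `sqlite3` module are assumptions for illustration. None of it is taken from libsvn_wc_sqlite or Perforce.

```python
import os
import sqlite3
import stat


def open_metadata(db_path):
    """Open the (hypothetical) wc.db and make sure the edits table exists."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS edited (path TEXT PRIMARY KEY)")
    return db


def make_read_only(path):
    """What checkout would do: clear every write bit on the file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def edit(db, path):
    """Model of an `svn edit file` command: record the edit, then unlock."""
    with db:  # commit the metadata change before touching permissions
        db.execute("INSERT OR IGNORE INTO edited (path) VALUES (?)", (path,))
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


def modified_files(db):
    """Status needs no tree crawl: the metadata already lists edited files."""
    return [row[0] for row in db.execute("SELECT path FROM edited ORDER BY path")]
```

The trade-off is visible in `modified_files`: status becomes a single query, but only because every editor or tool must go through `edit` first.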

I really don't like this idea :-) What I do like is this:

At 03:51 AM 1/17/2007 -0800, David Anderson wrote:
>So SQLite is not as efficient as writing our own raw database format
>fine-tuned for svn, or for that matter, our own OS-level filesystem
>specially built to handle svn working copies.

I think there is great potential for a Subversion filesystem. It need not
be anything really complicated, and could even be in part implemented on
top of the existing libraries and WCs. Within the tree mounted through the
Subversion filesystem, the .svn directories would of course actually be
hidden, not merely dot files (I wonder if UNIX systems can support 'cd'ing
into a directory that the filesystem driver didn't report as part of an
'ls -a'...), so that file search utilities would never pick up metadata.
More importantly, changes made to files would *always* be detectable, even
if the application making the change sets the file time back to precisely
what it was before the change.
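The point about timestamps can be demonstrated with a small sketch. It assumes a status check that compares only file size and modification time, a common shortcut for avoiding a full read of every file; `snapshot`, `sneaky_edit` and the other names are hypothetical. An application that rewrites a file with same-length content and then restores the old timestamp defeats that check, while a content hash still catches the change, at the cost of reading the whole file.

```python
import hashlib
import os


def snapshot(path):
    """Record what a size-and-timestamp status check would look at."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def looks_modified_by_timestamp(path, snap):
    return snapshot(path) != snap


def content_hash(path):
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()


def sneaky_edit(path, new_bytes):
    """Simulate an application that rewrites a file, then restores its times."""
    st = os.stat(path)
    with open(path, "wb") as fh:
        fh.write(new_bytes)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

A filesystem driver would not need either check, because it sees the write itself.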

I would absolutely love to see a filesystem driver for SVN :-)

On UNIX systems, FUSE would be the natural choice, so that the Subversion
filesystem code could run in user mode. As for Windows systems, I did a
quick search
and found that there has been at least some work done on user-mode
filesystem drivers. I'm not sure if the results of that work are available,
but it has at least been proven that the FUSE concept works on Windows too.
(See: "Creating User-Mode Device Drivers with a Proxy", Galen C. Hunt)

Jonathan Gilbert

Received on Wed Jan 17 17:47:00 2007

This is an archived mail posted to the Subversion Dev mailing list.