
Re: Another working copy library

From: Ivan Zhakov <chemodax_at_gmail.com>
Date: 2007-01-17 10:25:10 CET

On 1/17/07, David Anderson <dave@natulte.net> wrote:
> I've been kicking the thought around for a while now, so I'll get it
> out here in the open.
> I think we all know about the "organic" growth of libsvn_wc. As more
> large projects like gcc or KDE adopt Subversion, they are starting to
> also run into scalability issues with the working copy library that
> cannot be resolved easily (not to mention companies that have groaned
> a little about this).
> The ones I can think of right now:
> - Having to crawl the entire tree on most operations to compute local
> changes. While not too bad for a vast majority of users, large trees
> take a long time to crawl, and don't even get me started on large tree
> + nfs.
> - Storing metadata all over the place. This is by design, to allow
> for working copy severability. It's a nice feature, but it's many more
> inodes than is reasonable on large trees. And it's another component
> of having to recrawl a huge tree for most operations.
> - Text-base storage. Being able to forego these, or even fetch them
> on-demand, has been a feature request for a long, long time. It
> doesn't look to me like we'll be implementing this any time soon,
> because of the state of libsvn_wc.
> - Doesn't play well with other commandline tools. When I do find or
> grep runs over a working copy, I always have to pipe that through
> `grep -v .svn` to filter out all the dupes. The tool still has to
> crawl twice the number of inodes and output twice the actual amount of
> data. And yes, I'm sure there is a nifty hidden switch in both find
> and grep that would let me exclude this intelligently. I'm sure I can
> find another tree-crawling tool that we equally break and that doesn't
> have an exclusion capability.
> So, basically, I think that our working copy design has worked okay
> for most people, but it's now shown its limits, and it might be time
> for a change.
> So, I want to break libsvn_wc.
> Okay, now, calm down, and read through before killing me.
> I've been thinking about an alternative to libsvn_wc. The semantics of
> this alternative library would be slightly different, enough that it
> would not be perfectly compatible with the existing libsvn_wc. This is
> why I propose this as an alternative library that follows as much as
> possible of the current svn_wc.h API, that would be selected/enabled
> at runtime, using a dynlib mechanism similar to the one we have for ra
> and fs backends. From here on, I'll refer to this alternative
> implementation as libsvn_wc_sqlite.
Yes, yes, yes! I had the same idea some time ago. Count me in as a
volunteer for this feature.
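
As an illustration of the runtime selection described above, here is a minimal Python sketch of a name-keyed backend registry, in the spirit of how the RA and FS layers pick an implementation. Everything in it (WCBackend, register_backend(), open_backend(), the "legacy" and "sqlite" names, the status() strings) is a hypothetical stand-in, not the real svn_wc.h API or Subversion's C loader.

```python
# Hypothetical sketch: selecting a working-copy backend by name at runtime,
# loosely modeled on how Subversion picks RA and FS backends. WCBackend,
# register_backend(), open_backend() and the "legacy"/"sqlite" names are
# illustrative assumptions, not part of svn_wc.h.
from typing import Callable, Dict, Protocol


class WCBackend(Protocol):
    """Operations every backend must provide (a tiny subset, for illustration)."""

    name: str

    def status(self, path: str) -> str: ...


class LegacyWC:
    name = "legacy"

    def status(self, path: str) -> str:
        # A real backend would read the per-directory .svn areas here.
        return f"{self.name}: crawled .svn directories under {path}"


class SqliteWC:
    name = "sqlite"

    def status(self, path: str) -> str:
        # A real backend would query the single wc.db at the WC root here.
        return f"{self.name}: queried wc.db for {path}"


_REGISTRY: Dict[str, Callable[[], WCBackend]] = {}


def register_backend(name: str, factory: Callable[[], WCBackend]) -> None:
    """Make a backend available under `name`."""
    _REGISTRY[name] = factory


def open_backend(name: str) -> WCBackend:
    """Instantiate the backend chosen by, e.g., a runtime config option."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown working-copy backend: {name!r}") from None
    return factory()


register_backend("legacy", LegacyWC)
register_backend("sqlite", SqliteWC)
```

With this, open_backend("sqlite").status("trunk") returns "sqlite: queried wc.db for trunk". In C the registry would more naturally be a table of function-pointer structs; a dict just keeps the sketch short.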

> libsvn_wc_sqlite stores all the metadata for a working copy in a
> *single* SQLite database. This sqlite database is located in a .svn
> subdirectory inside the root of the working copy. So, for example, if
> you were to check out the svn trunk from svn.collab.net, you would
> have trunk/.svn containing wc.db (and probably some other very
> lightweight stuff, like a wc version file). There is no other .svn
> directory anywhere else in the working copy. When you invoke an svn
> command that needs to look at the working copy, libsvn_wc_sqlite walks
> back up the tree from the cwd until it finds a .svn directory, and
> uses that metadata for the entire tree rooted at that directory.
> This preserves working copy portability, and breaks working copy
> severability. That is, you can still move an entire checkout ('trunk'
> in my previous example) somewhere else and have a working WC, but you
> can't move a subtree out (say 'trunk/subversion') and have that be a
> functioning working copy for that subtree. In fact, doing the latter
> would have the effect of exporting that subtree: no metadata, just the
> files.
> I have never really used WC severability, but I understand there are
> use cases, and more importantly, users of this feature. Supporting it
> would need a first API exclusive to libsvn_wc_sqlite, something like
> svn_wc_sever(), which takes a subtree of a WC and makes it into its own,
> standalone WC by creating a .svn/wc.db there and entering the relevant
> metadata from the parent database. I haven't worked out the exact
> behavior yet from the user's POV, but it would therefore mean
> something along the lines of `svn sever wc_subdir; mv wc_subdir
> somewhere_else`, instead of the current `mv wc_subdir somewhere_else`.
I don't think users really need this feature, at least not
initially.
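
The two mechanisms in the quoted proposal can be sketched in Python: finding the single .svn/wc.db by walking up from the current directory, and a hypothetical sever operation that copies a subtree's rows into a new wc.db with rebased paths. The one-table schema is an assumption invented for the example; real metadata would carry much more per node.

```python
# Sketch, under stated assumptions, of locating the single .svn/wc.db and of
# a hypothetical "sever" operation. The one-table schema below (nodes with a
# root-relative path and a revision) is invented for illustration; it is not
# the real libsvn_wc metadata format.
import os
import sqlite3
from typing import Optional

SCHEMA = "CREATE TABLE IF NOT EXISTS nodes (relpath TEXT PRIMARY KEY, revision INTEGER)"


def find_wc_root(start: str) -> Optional[str]:
    """Walk up from `start` until a directory holding .svn/wc.db is found."""
    path = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(path, ".svn", "wc.db")):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # hit the filesystem root: not in a working copy
            return None
        path = parent


def sever(wc_root: str, subdir: str) -> str:
    """Turn `subdir` (relative to wc_root) into a standalone working copy.

    Metadata rows for the subtree are copied into subdir/.svn/wc.db with their
    paths rebased, so that `subdir` itself becomes the new root ("").
    The parent database is left untouched in this sketch.
    """
    subdir = subdir.strip("/")
    prefix = subdir + "/"
    new_root = os.path.join(wc_root, subdir)
    os.makedirs(os.path.join(new_root, ".svn"), exist_ok=True)
    src = sqlite3.connect(os.path.join(wc_root, ".svn", "wc.db"))
    dst = sqlite3.connect(os.path.join(new_root, ".svn", "wc.db"))
    try:
        dst.execute(SCHEMA)
        # substr() is used instead of LIKE so '_' and '%' in names are not wildcards.
        rows = src.execute(
            "SELECT relpath, revision FROM nodes"
            " WHERE relpath = ? OR substr(relpath, 1, ?) = ?",
            (subdir, len(prefix), prefix),
        ).fetchall()
        for relpath, revision in rows:
            new_relpath = "" if relpath == subdir else relpath[len(prefix):]
            dst.execute("INSERT INTO nodes VALUES (?, ?)", (new_relpath, revision))
        dst.commit()
    finally:
        src.close()
        dst.close()
    return new_root
```

Under these assumptions, `svn sever wc_subdir; mv wc_subdir somewhere_else` would amount to sever(root, "wc_subdir") followed by an ordinary move (for example shutil.move). The sketch leaves the subtree's rows in the parent database; a real implementation would still have to decide what the parent WC records for the severed directory.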

> Another thing I'd very much like is to completely eliminate all
> implicit tree crawls. The metadata is the working copy, unless the
> user requests a forced crawl to update metadata for some reason.
> This implies telling Subversion about all operations on versioned
> data. We already do that for all operations, except for edits. I'd
> like to change that. libsvn_wc_sqlite checks out the working copy
> entirely read-only, and you have to tell svn (through something like
> `svn edit file`... Yes, I have been using perforce lately) that you
> are touching it, at which point it'll record that in the metadata and
> flip the file to be writable.
> This behavior is off by default, however. The default is to crawl the
> subtree rooted at cwd to work out what was edited, and to sanity check
> metadata as you go. An option passed to svn checkout makes all WC
> files read-only, and relies solely on the metadata to operate on the
> wc, unless a particular operation forces a crawl.
Hmm. Why do you need this feature? IMHO, a tree crawl spends most of
its time reading entries and creating locks, not reading files.
I think most users like being able to edit anything without running
any extra commands.
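
For concreteness, the opt-in read-only mode from the quoted paragraph might look roughly like this Python sketch: a checkout clears the write bits, and an `svn edit`-style call records the edit in wc.db before making the file writable, so local changes can be listed from metadata alone. The edits table and all function names are hypothetical, and POSIX permission bits are assumed.

```python
# Hypothetical sketch of the opt-in read-only checkout mode: files start
# without write permission, and an "svn edit"-style call records the edit in
# wc.db before making the file writable. The `edits` table and the function
# names are assumptions for illustration. POSIX permission bits are assumed.
import os
import sqlite3
import stat


def open_db(wc_root: str) -> sqlite3.Connection:
    conn = sqlite3.connect(os.path.join(wc_root, ".svn", "wc.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS edits (relpath TEXT PRIMARY KEY)")
    return conn


def make_read_only(path: str) -> None:
    """What a read-only checkout would do to every file it writes."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def mark_edited(conn: sqlite3.Connection, wc_root: str, relpath: str) -> None:
    """Record the edit in the metadata first, then flip the file to writable."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO edits (relpath) VALUES (?)", (relpath,))
    path = os.path.join(wc_root, relpath)
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


def edited_files(conn: sqlite3.Connection) -> list:
    """Local changes come straight from the metadata: no tree crawl needed."""
    return [r for (r,) in conn.execute("SELECT relpath FROM edits ORDER BY relpath")]
```

Recording the edit before changing permissions means a crash in between leaves a file that is listed as edited but still read-only, which is the safer of the two possible inconsistencies.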

> Text-bases now. By default, they are stored in the metadata sqlite
> database (or maybe in a separate text-base sqlite DB alongside the
> regular metadata. Details.). I would however like to have a clear line
> drawn in the internals of libsvn_wc_sqlite, where we could add other
> behaviors in the future. Say, no text-bases and fail all operations
> that require them for ultra lightweight working copies, or no
> text-bases but retrieved via the ra api when needed (which opens the
> way for webdav caching proxies to work their magic).
Btw, are you sure that SQLite is ready to store very large data?
Like a 100 MB field?
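
On the size question: SQLite's default cap on a single string or BLOB (SQLITE_MAX_LENGTH) is 1,000,000,000 bytes, so a 100 MB value fits, but binding it as one value means holding the whole file in memory at once. One way to sidestep that, and to draw the "clear line" the proposal asks for, is this Python sketch: a small text-base interface with interchangeable strategies, where the SQLite-backed one splits each pristine copy into fixed-size chunks. SQLite's incremental BLOB I/O would be another option. Class names, the table layout and CHUNK_SIZE are assumptions for illustration.

```python
# Sketch of a pluggable text-base layer. SqliteTextBases keeps pristine copies
# in wc.db split into fixed-size chunks, so a large file is never bound as one
# huge value; NoTextBases models the "ultra lightweight" mode that keeps only
# checksums and fails anything needing a pristine copy. A third strategy could
# fetch pristines over the RA layer on demand (not shown). Class names, the
# table layout and CHUNK_SIZE are illustrative assumptions.
import hashlib
import sqlite3
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB per row; an arbitrary choice for this sketch


class TextBaseUnavailable(Exception):
    """Raised when an operation needs a text-base this WC does not keep."""


class SqliteTextBases:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS textbase ("
            " relpath TEXT, seq INTEGER, data BLOB, PRIMARY KEY (relpath, seq))"
        )

    def store(self, relpath: str, stream: BinaryIO) -> str:
        """Replace the pristine copy of relpath, chunk by chunk; return its SHA-1."""
        digest = hashlib.sha1()
        with self.conn:  # one transaction for the whole file
            self.conn.execute("DELETE FROM textbase WHERE relpath = ?", (relpath,))
            seq = 0
            while chunk := stream.read(CHUNK_SIZE):
                digest.update(chunk)
                self.conn.execute(
                    "INSERT INTO textbase VALUES (?, ?, ?)", (relpath, seq, chunk)
                )
                seq += 1
            if seq == 0:  # keep a row for empty files so they still exist
                self.conn.execute(
                    "INSERT INTO textbase VALUES (?, 0, ?)", (relpath, b"")
                )
        return digest.hexdigest()

    def read(self, relpath: str) -> Iterator[bytes]:
        """Yield the pristine copy of relpath chunk by chunk, in order."""
        found = False
        for (data,) in self.conn.execute(
            "SELECT data FROM textbase WHERE relpath = ? ORDER BY seq", (relpath,)
        ):
            found = True
            yield data
        if not found:
            raise TextBaseUnavailable(relpath)


class NoTextBases:
    """Keeps no pristine copies: only checksums, computed while streaming."""

    def store(self, relpath: str, stream: BinaryIO) -> str:
        digest = hashlib.sha1()
        while chunk := stream.read(CHUNK_SIZE):
            digest.update(chunk)
        return digest.hexdigest()

    def read(self, relpath: str) -> Iterator[bytes]:
        raise TextBaseUnavailable(f"{relpath}: this working copy keeps no text-bases")
```

Callers would only see store() and read(), so swapping in a different strategy (or an RA-backed fetcher) would not touch the rest of the library.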

> I think that libsvn_wc_sqlite addresses the issues I pointed out at
> the beginning of this mail: tree crawls are minimized, inode count
> goes way down, commandline tools don't find text-base dupes all over
> the place, and we have a clear internal API where we can handle the
> text-base storage problem cleanly. And, hopefully, most operations are
> reduced to an SQL select statement, which can be blindingly fast if
> the database is indexed properly.
> I'm not claiming that it is perfect, it is a different tradeoff on
> various points. I do think, however, that it is worth it.
> You're now all invited to shoot this idea down with sensible
> arguments. I will now go and make peace with myself while you assemble
> the firing squad :-). Oh, and I am willing to attempt to put
> changesets where my mouth is, this isn't a rant calling for someone
> else to do it.
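
To make "reduced to an SQL select statement" concrete, here is a Python sketch of per-node metadata in one table with indexes on the parent directory and on the local status, so that listing a directory or finding modified files in a subtree is a single query instead of a crawl. The schema, column names and status values are assumptions, not a real libsvn_wc format. Whether SQLite uses an index for the subtree part of the second query depends on its query planner; the filter here is a plain string comparison.

```python
# Sketch of "most operations reduced to an SQL select": per-node metadata in
# one indexed table, so listing a directory or finding local modifications is a
# query rather than a filesystem crawl. Schema, column names and status values
# are assumptions for illustration, not libsvn_wc's actual format.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    relpath  TEXT PRIMARY KEY,                -- path relative to the WC root
    parent   TEXT,                            -- relpath of the containing dir
    revision INTEGER,
    status   TEXT NOT NULL DEFAULT 'normal'   -- e.g. 'normal', 'modified'
);
CREATE INDEX IF NOT EXISTS nodes_parent ON nodes (parent);
CREATE INDEX IF NOT EXISTS nodes_status ON nodes (status);
"""


def open_wc_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def list_dir(conn: sqlite3.Connection, relpath: str) -> list:
    """Children of one directory, answered via the nodes_parent index."""
    cur = conn.execute(
        "SELECT relpath FROM nodes WHERE parent = ? ORDER BY relpath", (relpath,)
    )
    return [r for (r,) in cur]


def modified_under(conn: sqlite3.Connection, relpath: str) -> list:
    """Locally modified nodes in the subtree at relpath ("" means the whole WC)."""
    if relpath == "":
        cur = conn.execute(
            "SELECT relpath FROM nodes WHERE status = 'modified' ORDER BY relpath"
        )
    else:
        prefix = relpath + "/"
        cur = conn.execute(
            "SELECT relpath FROM nodes WHERE status = 'modified'"
            " AND (relpath = ? OR substr(relpath, 1, ?) = ?) ORDER BY relpath",
            (relpath, len(prefix), prefix),
        )
    return [r for (r,) in cur]
```

Matching on `prefix` (with the trailing slash) rather than on the bare directory name keeps a sibling such as "subversionX" out of the results for "subversion".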

Ivan Zhakov
Received on Wed Jan 17 10:25:33 2007

This is an archived mail posted to the Subversion Dev mailing list.