Why rewrite the Subversion working copy?

From: David Glasser <glasser_at_davidglasser.net>
Date: Tue, 22 Apr 2008 16:28:47 -0700

It's generally agreed by now that libsvn_wc is the most painful part
of Subversion to deal with. Reasons for this, and the start of a
design for a rewrite, are enumerated by Erik Huelsmann and others in
notes/wc-ng-design.

But this is going to be a lot of work! Is it really worth it?

It's been over four years since Subversion 1.0, and over seven years
since the first milestone release. Subversion showed the world that
there could be open source version control software that was better
than CVS; since then, there has been an explosion of new open source
version control systems.

So when it comes time to consider sinking a large amount of time and
effort into rewriting the hairiest part of Subversion, a natural
question is: why bother? Why put the effort into improving Subversion
instead of working on git, Mercurial, or <insert your favorite new VCS
here>?

Now, Subversion has many obvious advantages over, say, git:

* more user-friendly UI (except all the weirdnesses that come from the
  .svn-in-each-directory decision)
* commitment to portability (specifically, including Windows)
* library design with a stable and documented API for integration into
  other systems

But, you know, closing these gaps in git might not be much harder
than rewriting the Subversion working copy. And by now, I'm pretty
confident that, *in situations where it's feasible*, the "having all
of the historical data on the client" model is superior to the
Subversion client/server model. We spend so much time in Subversion
development thinking about "roundtrips" and "putting load on the
server" and so on, which isn't even a consideration for a system like
git.

But the key phrase there is "in situations where it's feasible".

I'm pretty confident that, for a new open source project of non-huge
size, I would not choose Subversion to host it, at least not for
reasons directly associated with the version control system itself
(e.g., I might choose Subversion because I like Google Code Project
Hosting; GitHub looks like it might be good competition eventually,
though).

So does that mean Subversion is dead? That we should all jump ship
and just write a new front-end for git and make sure it runs on
Windows?

Nah. Centralized version control is still good for some things:

* Working on huge projects where putting all of the *current* source
  code on everyone's machine is infeasible, let alone complete
  history (but where atomic commits across arbitrary pieces of the
  project are required).
* Read authorization! A client/server model is pretty key if you
  just plain aren't allowed to give everyone all the data. (Sure,
  there are theoretical ways to do read authorization in distributed
  systems, but they aren't that easy.)

My opinion? The Subversion project shouldn't spend any more time
trying to make Subversion a better version control tool for non-huge
open source projects. Subversion is already decent for that task, and
other tools have greater potential there. We need to focus on
making Subversion the best tool for organizations whose users need to
interact with repositories in complex ways, like:

* Working on enormous repositories, where you don't want to check out
  the entire project

  - checkouts below the branch root: we have that!
  - sparse directories: we mostly have that!

* Working on repositories with enormous files, where you don't want an
  extra "base" copy of every file if you're only editing a few at a
  time

  - baseless wcs: we don't have that yet, but it should be easy in
    wc-ng

* Workspaces where different parts come from different branches

  - switching: we have that, but it's really easy to pass the wrong
    URL to "svn switch" and break your working copy

* Workspaces containing trees from different repositories, or from
  different parts of the same repository nested inside each other

  - externals: we have it, but it's a tacked-on wart of a feature with
    shoddy semantics, a shoddy UI, and big limitations

* Workspaces containing multiple parts of one repository, side by
  side, which should be committed together atomically

  - this sort of works, sometimes, if you happen to hit one of the
    codepaths that don't try to find a common parent of the commit
    targets, or you do awful hacks like putting a fake ".svn"
    directory in the parent directory

A great deal of the power of tools like git comes from their ability
to assume that situations like the above aren't worth dealing with.
(And for lots of projects, they're absolutely correct! We don't need
version control systems to be one-size-fits-all.)

The general Subversion architecture *should* be able to deal with all
of the above use cases. Some of them are already achieved, to some
degree or another. None of them should require any server-side
changes at all. They all could theoretically be achieved without a
full wc rewrite (see Blair's message today about "file externals"),
but dealing with the current wc is full of pain.

I'd like to put time and energy into fixing the working copy
situation. But I don't want to fix it just in order to make easy
things still easy. I want to fix it to make hard things feasible.

There are many levels that need to be designed. Erik (and others)
have done an excellent job in the notes/wc-ng-design file of analyzing
what I think of as the "middle" layer: the layer most similar to the
current libsvn_wc. I've been doing a lot of thinking about the lower
and higher layers.

For the lowest layer, I've started implementing a prototype in Python
of a low-level Subversion metadata store, with full unit tests and all
that jazz. Currently I've designed the API for a refcounted blobstore
and am working on tree and treestore abstractions. I'll put it
somewhere (http://svn.collab.net/repos/svn/experimental/svnws? gvn
repository? whatever) sometime soon, once I've got a little more
implemented. The key goal here is to create a *non-brittle* working
copy, where we don't have to be scared that typing "svn switch" with
the wrong URL will corrupt the working copy irretrievably. Efficiency
is nice too. (Hopefully, while the code itself will probably need to
be backported to C, the tests might end up being executable against
the "real" code.)
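
To make the "refcounted blobstore" idea concrete, here is a minimal
sketch of what such a store might look like. It is not the prototype
mentioned above, which this message does not show; the class name, the
method names, the in-memory dictionaries, and the choice of SHA-1 as
the content key are all assumptions made for illustration.

```python
import hashlib


class RefcountedBlobStore:
    """Hypothetical in-memory blob store keyed by content hash.

    Identical content is stored once; each put() adds a reference and
    each release() drops one. A blob is deleted when its count hits 0.
    """

    def __init__(self):
        self._blobs = {}  # key -> bytes
        self._refs = {}   # key -> reference count

    def put(self, data: bytes) -> str:
        # Assumption: SHA-1 of the content serves as the blob key.
        key = hashlib.sha1(data).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = data
            self._refs[key] = 0
        self._refs[key] += 1
        return key

    def get(self, key: str) -> bytes:
        # Raises KeyError if the blob was never stored or already freed.
        return self._blobs[key]

    def refcount(self, key: str) -> int:
        return self._refs.get(key, 0)

    def release(self, key: str) -> None:
        if key not in self._refs:
            raise KeyError(key)
        self._refs[key] -= 1
        if self._refs[key] == 0:
            # Last reference gone: drop the content itself.
            del self._refs[key]
            del self._blobs[key]
```

With a store like this, two files with the same contents share one
blob, and the blob disappears only after both references are released.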

For the higher layer, I've been thinking about the design of
"libsvn_workspace" and an "svn workspace" command, which allows users
to define non-trivial working copy layouts, mapping one or more
repository subtrees into their workspace. "switched directories" and
"externals" wouldn't be special cases any more: they'd just be normal
things that show up in workspaces that aren't just a single repository
subtree. Projects can be configured ad hoc using "svn workspace"
commands, or in a version-controlled file (or property?), similar to
svn:externals or the SVK svk:project:* property. Perhaps I (we?) can
add this layer to the prototype I'm starting, before backporting it to
C for the real svn_ws.
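
As a rough illustration of the workspace idea, the sketch below maps
workspace paths onto repository subtrees and resolves any path to the
URL it would come from. Nothing here is a real Subversion API: the
Workspace class, its map() and resolve() methods, and the
example.invalid URLs are hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath


class Workspace:
    """Hypothetical workspace: local paths mapped to repository URLs."""

    def __init__(self):
        self._mounts = {}  # PurePosixPath -> repository URL

    def map(self, local_path: str, url: str) -> None:
        # "" (or ".") denotes the workspace root.
        self._mounts[PurePosixPath(local_path)] = url.rstrip("/")

    def resolve(self, local_path: str) -> str:
        """Return the repository URL backing a workspace path.

        The deepest enclosing mapping wins, so a nested mapping behaves
        like a switched directory or an external without special cases.
        """
        path = PurePosixPath(local_path)
        # path.parents runs from the nearest ancestor up to the root.
        for candidate in [path, *path.parents]:
            if candidate in self._mounts:
                base = self._mounts[candidate]
                rel = path.relative_to(candidate)
                return base if str(rel) == "." else f"{base}/{rel}"
        raise KeyError(local_path)


ws = Workspace()
ws.map("", "http://example.invalid/repo/trunk")
ws.map("vendor/lib", "http://example.invalid/other/lib/tags/1.0")
print(ws.resolve("src/main.c"))
print(ws.resolve("vendor/lib/x.h"))
```

The point of the design is that the root checkout, a "switched"
subtree, and an "external" from another repository are all just
entries in the same table.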

So yeah, this is some of what I've been thinking.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
Received on 2008-04-23 01:29:02 CEST

This is an archived mail posted to the Subversion Dev mailing list.