
Re: Subversion, decentralized version control, and the future.

From: Garance A Drosihn <drosih_at_rpi.edu>
Date: 2007-07-03 03:54:47 CEST

At 4:56 PM -0700 6/28/07, Karl Fogel wrote:
>I've been wanting to post this for a while, but was waiting
>for the dust from Linus Torvald's GIT talk to settle first

Thanks for starting this thread. I think discussion of some
of these ideas will be helpful to subversion development. I
have moved to subversion for many things, and I'm not eager
to move off of subversion to git or other alternatives, but
there are some features I'd love to have.

Apologies that this is going to be so long-winded, but I'm
trying to build up a case (in my own mind) for exactly what
I would like to see, and why I would like to see it.

>In his talk, Torvalds explained why he thinks decentralized
>version control systems (like GIT and Mercurial) are the way
>of the future, ...

While his presentation was rather grating to listen to, I
think he did cover some interesting points.

>Since I'd like to use some of his arguments as a jumping-off
>point for thoughts on Subversion's future, here they are in
>brief:
>
> * Optimizing merging is as important as optimizing
> branching (if not more so).
>
> * Speed matters: ....
>
> * Having all history locally (...) is useful.
>
> * Reducing the thickness of the "commit access" wall is
> good for development. Torvalds didn't make this argument
> terribly well, so I'll try to restate what I think was
> his point:

> The important question is, who can put changes into the
> repository that the project is publishing releases from?

I think you're approaching this from a different direction than
Linus does, and I think *most* projects don't work in the same
way that Linus likes to work. Git is obviously successful for
how Linus does work, and I think the key reason is that Linus
works in a "pull" model of updating, while most of us work in
a "push" model.

Linus has his repository, and he has no intention of giving the
world access to it. However, he's quite willing to pull specific
updates from other people. Strictly speaking, *Linus* is the
only one who can put changes into the repository "the project"
is publishing releases from. This works, because what he does
is let everyone else create *their* repositories, and thus they
have access to their repository -- not his. And then he pulls
changes from their repo into his repo. He pulls what he wants,
and when he feels like pulling it.

In a project like FreeBSD, there is a central repository, and a
few hundred developers have push access into it. Oh, there's
code-review, and you can get bashed around the head if enough
developers are not happy with some commit that you pushed into
the central repo, but each developer is pushing his own updates
into the common repository.

I cannot see Linus giving 400+ people access to his personal
repository, and I cannot see FreeBSD switching to have a "pull"
model for bringing updates into the "official" project repository.

>When I talked to Brian Fitzpatrick about this, he listed three
>things as top priorities:
>
> - Faster. Subversion does need to be faster for many ops.
> - Offline commits.
> - Local branches.
>
>I would add "better merging", but basically agree with Fitz
>(note that we're getting much-improved merging in Subversion 1.5).

What I think I would like is more than just local branches. I
want branches which are recognized as being totally separate
from the original repository. Disjoint, separately administered.
People who have zero write access to a master repository should
still be able to track their changes, and track those changes
wrt the official repository.

Also, from what I can tell, what we're getting is much-improved
tracking of merges made within a single repository. That is a
major benefit, of course, but we're still in the mindset of
working from a single master repository.

>But there's another factor...
>
>True decentralized systems are really hard for most people to
>wrap their heads around.
>
>... One of Subversion's biggest advantages, and one of the
>reasons it's taking over the world, is that it's really easy
>to understand. There's a repository; you check stuff out;
>you modify the stuff; you check it back in.

I think a full-blown distributed change system is harder for
the average person to understand, but I think subversion could
provide a subset of it that would be really easy to explain.

>[....] For many organizations, including open source projects,
>centralization is a feature: you want changes (and branches)
>to end up in the master repository sooner rather than later,
>so they'll be visible to everyone, so they'll be backed up,
>so they'll go through the central hook system, etc. It
>focuses the community on a shared object (Ben Collins-Sussman
>makes this argument in more detail at
>http://blog.red-bean.com/sussman/?p=20).

To quote from that blog:

    So while most people say: "isn't it great that I can fork
    the whole project without anyone knowing!" My reaction is,
    "yikes, why aren't you working with everyone else? why
    aren't you asking for commit access?" This is a problem
    that is solved socially: projects should actively encourage
    side-work of small teams and grant access to private
    branches early and often

This overlooks one key point. Even though CVS and subversion
do not support any kind of distributed model, developers *WILL*
do it. They *ALREADY* do it. I have commit access to the
FreeBSD project, for instance, but there are small works-in-
progress that I simply do not want to clutter up the main
repository with. No matter how frilly and wonderful you make
"collaboration" sound, the fact is that it is counter-productive
to put things in the official repository until you have some
idea how it's going to work out. I'm not going to commit
something and have that CVSUP-ed to 100,000 machines across
the planet, if I think I might rip out all those changes
tomorrow just to do it in a completely different manner.

Is it frightening? Well, that's too bad. It is what is in
fact happening, and it's absurd to pretend that it is not.
Or to pretend it will go away if your SCM doesn't support it.

It happens that many developers in the FreeBSD project get
around this by having a *second* repository, which is sync'ed
from the official CVS repository and is in perforce. But I
don't feel like learning perforce just for this benefit. (The
developers who use it, use it due to other benefits. But for
*me* this would be the only benefit from using the p4-based
repository).

And I *have* access to the FreeBSD repository. What about
someone who doesn't have any access, but wants to work on
some changes before sending in a PR? How are they supposed
to come up with a patch that they are confident in? Should
they send in one PR one day, and then send in another PR the
next day saying "Well, my previous patch was all screwed up,
but how about this one?". This is not a good way to build
credibility.

   - - - - - - - - - - - - - - - - -
   All of the above rambling is just background for:
   - - - - - - - - - - - - - - - - -

>And now I'm going to hand-wave on a lot of details. I don't
>mean to start the Subversion 2.0 design thread now, just to
>offer some thoughts on general goals.

I also have to hand-wave on the details of what I want, because
I'm still not completely sure what would work right...

I think what I want is to be able to create my own repository,
and maybe say that the HEAD branch in this new repository is a
read-only copy of HEAD in the master repository. The local
HEAD branch would automatically sync-up with the original.
This obviously presents some technical issues wrt revision
numbers, etc. Perhaps that could be tracked via a second
revprop for the mirrored version of the HEAD branch.
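
To make the revprop idea a bit more concrete, here is a rough
sketch in Python. It is only a toy model, not anything that
subversion does today: the LocalRepo class and the property
name "mirror:upstream-rev" are made up for illustration. The
point is just that local revision numbers can keep counting on
their own, while each mirrored revision remembers which upstream
revision it came from.

```python
from dataclasses import dataclass, field

# Hypothetical property name, invented for this sketch.
UPSTREAM_REV_PROP = "mirror:upstream-rev"


@dataclass
class LocalRepo:
    # revprops[n] holds the revision properties of local revision n.
    # Revision 0 is the empty starting revision.
    revprops: list = field(default_factory=lambda: [{}])

    def commit(self, log, upstream_rev=None):
        """Add a local revision; tag it if it mirrors an upstream one."""
        props = {"log": log}
        if upstream_rev is not None:
            props[UPSTREAM_REV_PROP] = upstream_rev
        self.revprops.append(props)
        return len(self.revprops) - 1

    def last_mirrored(self):
        """Highest upstream revision already copied, or 0 if none."""
        for props in reversed(self.revprops):
            if UPSTREAM_REV_PROP in props:
                return props[UPSTREAM_REV_PROP]
        return 0

    def sync(self, upstream_log):
        """Copy upstream revisions we have not seen yet.

        upstream_log maps upstream revision number -> log message.
        """
        start = self.last_mirrored()
        for rev in sorted(r for r in upstream_log if r > start):
            self.commit(upstream_log[rev], upstream_rev=rev)
```

With this, a private commit made between two syncs simply takes
the next local number, and the following sync picks up where the
last tagged revision left off.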

And really I'd like to specify more than one branch.
Something like "Create me a new repository which is a fork
of <OtherRepository>, starting at July 1, 2006, and limit
that mirroring to just the HEAD and 6.x branches".
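
As a sketch of what such a request boils down to, assuming a
fork is described by a source, a start date, and a list of
branches (the ForkSpec and Commit types below are hypothetical,
not an existing subversion interface):

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    rev: int
    day: date
    branch: str


@dataclass(frozen=True)
class ForkSpec:
    source: str      # where to mirror from
    start: date      # ignore history before this day
    branches: tuple  # only mirror these branches

    def wants(self, commit):
        """True if this upstream commit belongs in the fork."""
        return commit.day >= self.start and commit.branch in self.branches


def select(spec, history):
    """Return the upstream commits the fork would mirror, in order."""
    return [c for c in history if spec.wants(c)]
```

Everything outside the start date and the named branches is
simply never copied, which keeps the private repository small.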

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu
---------------------------------------------------------------------
Received on Tue Jul 3 03:54:42 2007

This is an archived mail posted to the Subversion Dev mailing list.