On Sun, 2002-05-19 at 03:44, Colin Putney wrote:
>
> On Saturday, May 18, 2002, at 10:28 AM, Mark C. Chu-Carroll wrote:
>
> > Actually, there's a whole lot to gain by integrating build support into
> > the SCM system.
>
> Well, that depends on what your goals are. The reason CVS and Subversion
> are so useful is that they know as little as possible about the contents
> of the files and directories they are versioning. They make a distinction
> between text and binary files and do simple line-ending conversions.
> That's it. As a result, you can use them to version just about anything.
>
> (Side note to Karl: I think this is a good example of the
> worse-is-better philosophy. I too am wary of it, because it has a short
> half-life and quickly degrades to easier-is-good-enough if you're not
> careful. But every now and then you run into a case like this!)
>
> If I understand correctly, Stellation takes the opposite approach, and
> parses out the contents of the source files in order to do more
> intelligent diffing and patching. That does afford some advantages, but
> it also limits flexibility as you can't work with datatypes the system
> doesn't understand.
Actually, Stellation takes an in-between approach. If it knows something
about the semantics of the objects that it manages, it takes advantage
of it. If it doesn't, it treats them as either text or binary data
objects. But I really didn't intend to turn this into a Subversion
vs Stellation discussion; I try very hard to avoid doing that. I was
trying to point out some of the advantages of making it possible to
do things like tie builds into the system.
> So I can see how integrating build support into Stellation could be a
> win. You're already committed to understanding the semantics of the
> source tree. But with Subversion (and CVS) you lose a lot.
Not necessarily. If you look at a system like ClearCase, they've
kept artifact semantics out of the system, but they provide an
integrated build mechanism that works extremely well. What they
did was build a version of make that understands the repository. When
clearmake is ready to issue a command, it extends that command with
version information, so that every input to the command is marked with
a version label. If there's a file in the repository that was
generated with the same version labels on all of its inputs, then
clearmake uses that as the result of the command; otherwise, it runs
the command and caches the result.
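To make that concrete, here's a rough sketch of the reuse check at the
heart of that scheme. It's my own illustration in Python, not clearmake
itself (which is a real tool with a much more elaborate derived-object
database); the names and the run() callback are invented:

    # Sketch of clearmake-style build avoidance ("winking in" a derived
    # object). Every input carries a version label; a cached result is
    # reusable only if it was built from exactly the same labeled
    # versions of all of its inputs.
    def build_step(command, inputs, selected_version, cache, run):
        key = (command,
               tuple(sorted((path, selected_version[path]) for path in inputs)))
        if key in cache:
            return cache[key]     # reuse the existing derived object
        result = run(command)     # otherwise really run the compiler/linker
        cache[key] = result       # remember it for later builds
        return result

The key is built from version labels rather than timestamps, so it's the
repository, not the filesystem, that decides whether a derived object is
still valid.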
> > The main factor is that by building things into the system, you
> > can do automated work that would be intractable for a human build
> > manager.
> >
> > For example... Andreas Zeller did some work with changeset-based
> > systems on identifying problems. The basic idea is that a bunch
> > of programmers all check in changes. Then, during the nightly
> > build, you discover that the system no longer compiles correctly,
> > or that it no longer passes the standard tests.
> >
> > But there are 30 changesets. That means 30 tests to determine if
> > one of those changesets is the one that broke the build. But what
> > if what breaks the build is a *combination* of the changes in more
> > than one changeset?
> >
> > Zeller's system did a binary-search-like process to try to determine
> > the minimal group of changesets that causes the breakage.
> >
> > A system like that could be implemented outside of the SCM system;
> > but it's a heck of a lot nicer to tie enough of it into the system
> > that it can be done fully automatically.
>
> I have two somewhat contradictory responses to this.
>
> On the one hand, I'd say that this system is not significantly better
> than the one used by the Subversion team. Instead of concentrating the
> task of detecting problems at one point, that responsibility is
> distributed among the developers, each of whom is responsible for
> ensuring that his changes don't break anything. Since commits are
> atomic, each developer doesn't have to worry about a combinatorial
> explosion of changeset permutations. He just has to make sure that his
> changes, when applied to the current state of the tree, don't break
> anything.
That's fine up to a point, but it doesn't scale. That's what leads to
the kind of build process I was describing for a lot of really large
products. What happens when it takes three hours to do a build+test of
the system, but checkins occur, on average, once an hour? It becomes
impossible for each programmer to ensure that they are testing against
the latest version of the system.
Try watching the build process on eclipse.org. They're using CVS. The
way that they make things work is to have each component of the system
developed separately, so that individual programmers only need to stay
up-to-date with their own component. Then, once or twice a week, there
is an integration build, where they put the latest of all the components
together, and do a full build and test. If it's successful, it becomes
the new baseline, and everyone switches to it. It's a cumbersome system,
but it's the only way to make it work for something so large with so
many people constantly changing it.
> At the same time, build breakage is probably the easiest to detect and
> easiest to fix of all the possible problems a changeset could introduce.
> To really detect problems you need to test the behaviour of the software
> once it's built.
>
> The Subversion team does this with a suite of automated tests developed
> alongside Subversion itself. Each developer ensures that his
> changes not only don't break the build, but also don't break any of the
> tests, again avoiding the need to test changeset permutations. This type
> of automated testing can't be built into the version control system
> because it's too domain-specific. So much so that it's part of the
> project being versioned.
>
> On the other hand, where automated testing along the lines Zeller
> proposes *is* useful, it's quite possible to build it on top of a
> version control system that knows nothing about the build process. The
> svn-breakage mailing list is a good example. Various machines with
> different CPU architectures and operating systems do automated
> checkouts, builds and tests of each changeset and mail the results to
> svn-breakage. It's simple, effective and flexible.
>
> Zeller's system or the user-work/user-commit system you propose could be
> implemented on top of Subversion as easily as within it.
That's mostly true, except that to make it work without space explosion
on the server, you need a make tool that temporarily caches build
products on the server. If you make derived products part of the normal
checkin, then you end up with a terrible amount of wasted space on the
server, and dramatically degraded checkin/checkout times.
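By "temporarily caches" I mean something along these lines (again, just
a hypothetical sketch, not an existing tool): the server keeps derived
products in a bounded store that it's free to evict from and rebuild,
rather than versioning them forever the way ordinary checkins are:

    # Hypothetical bounded store for build products, keyed by the version
    # labels of their inputs. Unlike checked-in derived products, entries
    # can be thrown away at any time and rebuilt on demand, so old
    # versions don't pile up in the repository.
    class DerivedObjectCache:
        def __init__(self, max_entries=1000):
            self.max_entries = max_entries
            self.products = {}    # key -> build product
            self.order = []       # keys, oldest first

        def get(self, key):
            if key in self.products:
                self.order.remove(key)
                self.order.append(key)       # mark as recently used
                return self.products[key]
            return None

        def put(self, key, product):
            if key in self.products:
                self.order.remove(key)
            self.products[key] = product
            self.order.append(key)
            while len(self.products) > self.max_entries:
                oldest = self.order.pop(0)   # evict the least recently used
                del self.products[oldest]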
Now, I'm not arguing that Subversion should be trying to build
functionality like this today. All I'm saying is that it's worth keeping
things like this in mind, so that you don't make design decisions that
will make it prohibitively difficult to add later, if you want to.
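And since I brought up Zeller's work above, the core of that
changeset-minimization loop is small enough to sketch here. This is a
simplified illustration of the binary-search idea, not his actual
implementation; build_breaks() stands in for whatever applies a set of
changesets to the baseline and runs the build and tests:

    def minimize(changesets, build_breaks):
        # Narrow a list of changesets down to a small subset that still
        # breaks the build. build_breaks(subset) must return True when
        # the build or tests fail with just that subset applied.
        assert build_breaks(changesets)
        n = 2
        while len(changesets) >= 2:
            chunk = max(1, len(changesets) // n)
            subsets = [changesets[i:i + chunk]
                       for i in range(0, len(changesets), chunk)]
            for subset in subsets:
                complement = [c for c in changesets if c not in subset]
                if build_breaks(subset):
                    changesets, n = subset, 2    # breakage is inside this chunk
                    break
                if complement and build_breaks(complement):
                    changesets, n = complement, max(n - 1, 2)
                    break
            else:
                if n >= len(changesets):
                    break                        # already down to single changesets
                n = min(len(changesets), n * 2)  # try a finer split
        return changesets

Every call to build_breaks() is a full build-and-test cycle, which is
exactly where caching of build products starts to pay for itself.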
-Mark
--
Mark Craig Chu-Carroll, IBM T.J. Watson Research Center
<mcc@watson.ibm.com>
*** The (recently renamed) Stellation project:
*** http://domino.research.ibm.com/synedra/synedra.nsf
*** GPG Public key available at keyserver.net