Re: Sparse Directories vs Externals

From: David Chapman <dcchapman_at_earthlink.net>
Date: Wed, 25 Feb 2009 13:06:49 -0800

Les Mikesell wrote:
> David Chapman wrote:
>>> Method #1 -- the way your co-worker believes: You have two "streams",
>>> a development stream and an integration stream. Developers create
>>> their own independent development stream off of the integration
>>> stream, do their work, and then merge their changes back into the
>>> integration stream when completed. Advantages: Developers work in
>>> their own world that is unaffected by other developers. Disadvantage:
>>> Developers work in their own world that is unaffected by other
>>> developers.
>>> Method #2 -- The right way: Developers work all in the same stream
>>> which is where the code lives. This forces developers to work
>>> together, make sure they know what everyone is doing, and take small
>>> bites of code changes.
>> Don't try to speak for all of us. I learned very quickly when
>> writing optimization software that private branches are a Good
>> Thing. Otherwise my results (which took hours to compute) were
>> constantly polluted by seemingly harmless updates from other members
>> of the group. Yes, merging from trunk and reintegrating at the end
>> of feature development were not fun, but it was a lot better than
>> trying to determine, every day, which of the daily changes (mine or a
>> colleague's) caused a degradation in results.
> If you each make more than one change, shouldn't the point be to find
> how each specific change affects performance more than the ownership
> of the code?

Electronic Design Automation (EDA) software is a series of heuristic
optimizations applied to non-linear, discontinuous functions. An
experimental optimization might make some results better and some
results worse. If, for example, an evaluation of a change over 50
designs resulted in an average 0.5% improvement with no design degrading
by more than 0.2%, the change looked good and would probably be
accepted. Merging other changes during the course of the experiment
would increase the noise and make the analyses harder. I had to learn
this the hard way.
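
To make the acceptance rule above concrete, here is a minimal sketch in Python. It is an illustration only, not the script we actually used: the function name, the parameter names, and the convention that a positive number means "better" are all assumptions made for the example. The 0.5% and 0.2% thresholds are just the ones from the example above.

```python
from statistics import mean


def change_looks_good(deltas_pct, min_avg_gain=0.5, max_single_loss=0.2):
    """Apply the example acceptance rule to one experiment.

    deltas_pct: per-design QoR change in percent (hypothetical convention:
    positive = improvement, negative = degradation).
    """
    if not deltas_pct:
        return False  # no designs evaluated, nothing to accept
    average_gain = mean(deltas_pct)
    worst_design = min(deltas_pct)
    # Accept only if the average improves enough AND no single design
    # degrades by more than the allowed amount.
    return average_gain >= min_avg_gain and worst_design >= -max_single_loss
```

Unrelated changes merged in mid-experiment would shift every entry in deltas_pct by an unknown amount, which is why a check like this is only meaningful against a fixed baseline.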

>> We also had our share of partial commits to trunk, especially at the
>> end of the work day. Since developers' schedules were not always the
>> same, a workspace update at the start of the work day might result in
>> a couple hours in which the workspace was unbuildable or would fail
>> on numerous regressions. Since the full regression suite took hours
>> of time on a server farm, it was impractical to run them for every
>> commit (or even every hour). Even the developer's regression suite
>> required an hour of CPU time (and a minimum 10 minutes of clock time
>> even on a server farm). If there was a commit while the regression
>> suite was running, then what? (Our answer was "update, recompile,
>> and commit your changes, but then rerun the developer regression
>> suite immediately".)
> Is there some point to testing head/trunk, knowing that's not a
> reproducible state? Your "If there's a commit" question doesn't make
> much sense. Shouldn't your regression be running against a branch or
> tag made for the purpose - or at least a specific revision of trunk?
> There are plenty of ways to get reproducible results, but trunk/head
> isn't one of them.

The developer's regression suite ensured that basic functionality was
not damaged by the changes in a local workspace. It was a precondition
to every commit. It was not designed to verify Quality of Results
(QoR); a separate overnight run on a server farm did that. Typically
developers would rotate duty for daily QoR analysis of the previous
night's runs. It might take one developer most of a day to analyze the
results when there was a lot of noise in the numbers.

>> Since we were using CVS, when a bad commit by another developer
>> messed up a workspace we were stuck - reverting to a previous
>> revision wasn't possible.
> Why can't you get any revision you want back from CVS? And why didn't
> you have tags at the likely places you'd want?

As I understand it, tagging is an expensive operation in CVS. There
were nightly tags, but not for every commit (50+ per day across all
branches in my group alone, hundreds of commits per day company-wide).

>> We had a binary search script that could determine which commit
>> caused the problem, and we had home phone numbers, but it still took
>> time to resolve these problems.
> I don't understand. Are you saying you actually broke CVS with a
> commit or that you couldn't do what you want with a known revision?
> Or that since you didn't tag, you didn't know which revision to get?

Finding compile errors was relatively straightforward, but still
required a call to the developer. Finding QoR degradations required
actually running test suites.
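
For readers who have not seen one, the binary search over commits works roughly like the Python sketch below. This is not our actual script. The names first_bad_commit and is_good are hypothetical, and it assumes the oldest commit in the list is known good, the newest is known bad, and the breakage persists once introduced. In practice is_good would check out that commit and either compile it or run a test suite.

```python
def first_bad_commit(commits, is_good):
    """Return the earliest commit for which is_good() is False.

    commits: ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad.
    is_good: callable that builds/tests one commit and returns True or False.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] is good, commits[bad] is bad.
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_good(commits[middle]):
            good = middle
        else:
            bad = middle
    return commits[bad]
```

The search needs only about log2(N) checks, which is cheap when the check is a compile. When each check is a QoR run that takes hours, even a handful of steps is expensive.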

>> In short, there is no One Right Way. For agile development with
>> "shallow" code, development in a single line is probably better. For
>> optimization code that uses "deep" algorithms, private branches work
>> better. Branching in CVS was painful enough that developers often
>> had all their changes on their local hard disk for days on end;
>> Subversion would have made this much easier to deal with.
> With CVS you probably needed to tag 'expected good' points while the
> trunk advances, knowing the head may be unstable at times. You can do
> the same in subversion or branch for testing/release - or you can do
> the reverse and branch for development but it just seems natural to
> expect development and changes to be happening on trunk/head and
> specify a tag or revision number when you want something predictable.

Developing on trunk was always an experience; compilation was never a
given (with 200 developers company-wide, we never knew whether the tree
would still compile after an update) and QoR was highly unstable.
Branching occurred
whenever feature development was close enough to being done that QoR
stability became a higher priority than keeping trunk continuously up to
date. Branch lifetime was weeks at best with multiple developers
continuing to make changes. And those changes would of course have to
be reintegrated into trunk.

"Expected good" would be a judgment call; an optimization that improved
four product lines might be disastrous for a fifth (where "disastrous"
means more than 1% worse QoR). And that change could be in an early
optimization, yet not show worse results until very late in the
optimization process as performance estimates were refined.

    David Chapman         dcchapman_at_earthlink.net
    Chapman Consulting -- San Jose, CA
Received on 2009-02-25 22:07:29 CET

This is an archived mail posted to the Subversion Users mailing list.