RE: branching several times a day (was Re: Sourcesafe user needs primer on branching source control)

From: Rob van Oostrum <rob.vanoostrum_at_blastradius.com>
Date: 2004-03-12 20:37:35 CET

The potential problems with merging from the trunk into the private branch
before merging the private branch into the trunk:

- it destroys the integrity of the private branch. Instead of being able to
easily identify the changes made on the branch for the purpose for which the
branch was created (by diff-ing between the top and the bottom of the
branch, or between any 2 tags on that branch), you build up a collection of
change sets (start of branch -> before merge from trunk, after merge from
trunk -> before next merge from trunk, etc etc). This is a perfectly valid
mode of working as long as you're not too fussy about keeping your change
sets isolated and easy to identify.
- it duplicates changes over a multitude of branches. In large development
environments with many developers and lots of development activity, this
leads to a LOT of overhead in storage. Yes, diskspace is cheap these days,
but why introduce a scalability issue when there's no need to do so?
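
To make the first point concrete, here is a toy sketch in Python. It is not
tied to CVS or SVN: a line of development is modelled as a plain list of
change ids, and every name in it (diff, t1, b1, ...) is made up for the
illustration. It only shows how the bottom-to-top diff of the private branch
stops being "just my task" once trunk changes are merged in.

```python
# Toy model (an assumption, not how any VC tool stores data): a line of
# development is the ordered list of change ids it contains.

def diff(bottom, top):
    """Changes present at `top` that were not present at `bottom`."""
    return [c for c in top if c not in bottom]

trunk_at_branch_point = ["t1", "t2"]

# The private branch starts as a copy of the trunk and gets two task changes.
branch = trunk_at_branch_point + ["b1", "b2"]

# Meanwhile other people commit to the trunk.
trunk = trunk_at_branch_point + ["t3", "t4"]

# Before any merge from the trunk, bottom-to-top is exactly the task's work.
clean = diff(trunk_at_branch_point, branch)

# Merge trunk -> branch, then keep working on the task.
branch_after_merge = branch + diff(branch, trunk) + ["b3"]
mixed = diff(trunk_at_branch_point, branch_after_merge)

print(clean)  # ['b1', 'b2']
print(mixed)  # ['b1', 'b2', 't3', 't4', 'b3']
```

In the second diff the task's own changes (b1, b2, b3) are interleaved with
changes that were only carried forward (t3, t4), and those carried-forward
changes now exist on both lines.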

On the other hand, I agree with you that you shouldn't dump
changes into the main line of development if the baseline of a private
branch is significantly out-of-date. So what I like to do in these cases is
set up a temporary integration branch off the main line of development and
merge the private branch changes into it. Creating that branch doesn't
duplicate any changes on the main line (which are likely to be
larger in number and size than the changes on the private branch); it only
adds tags (cvs tag) or creates pointers (svn copy) for these files. After
sorting out any integration issues, the changes made on the integration
branch (private branch merge + merge conflict resolutions) are merged back
into the main line. This approach has a number of advantages over the
mainline->branch->mainline and branch->mainline approaches:

- it preserves the integrity of the changeset on the private branch
- it isolates the integration activity from both the mainline and the branch
- it offers scalability of approach by not creating duplicate delta entries
of mainline changes and offering a single integration stream for multiple
branches that may be looking to deliver into the mainline around the same
time.

After the integration into the mainline is complete, the integration branch
is removed (SVN) or documented as closed (CVS).
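
The sequence of steps can be sketched with a small Python model. This is an
illustration under simplifying assumptions, not real CVS or SVN behaviour:
lines of development are plain lists of change ids, the trunk does not move
while the integration is in progress, and all names (merge, diff, fix1, ...)
are hypothetical.

```python
# Toy model of the temporary integration branch workflow.
# Assumption: a line of development is an ordered list of change ids.

def diff(bottom, top):
    """Changes present at `top` that were not present at `bottom`."""
    return [c for c in top if c not in bottom]

def merge(target, changes):
    """Return `target` plus any of `changes` it does not contain yet."""
    return target + [c for c in changes if c not in target]

branch_point = ["t1", "t2"]
private = branch_point + ["b1", "b2"]   # task work on the private branch
trunk = branch_point + ["t3", "t4"]     # mainline has moved on

# 1. Cut a temporary integration branch off the current mainline.
integration_base = list(trunk)
integration = list(trunk)

# 2. Merge only the private branch's own changes into it.
integration = merge(integration, diff(branch_point, private))

# 3. Sort out integration issues on the integration branch.
integration = integration + ["fix1"]

# 4. Merge what changed on the integration branch back into the mainline.
delivered = diff(integration_base, integration)
trunk = merge(trunk, delivered)

print(delivered)                    # ['b1', 'b2', 'fix1']
print(diff(branch_point, private))  # ['b1', 'b2']
print(trunk)                        # ['t1', 't2', 't3', 't4', 'b1', 'b2', 'fix1']
```

The private branch never receives t3 or t4, so its changeset stays intact,
and the mainline only receives the branch changes plus the conflict
resolutions made on the integration branch.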

Either way may work, depending on the characteristics of a project ...

just my $.02

Rob

> -----Original Message-----
> From: Brad Appleton [mailto:brad@bradapp.net]
> Sent: Friday, March 12, 2004 2:09 PM
> To: users@subversion.tigris.org
> Subject: Re: branching several times a day (was Re: Sourcesafe user
> needs primer on branching source control)
>
>
> On Fri, Mar 12, 2004 at 11:18:12AM +0100, Stefan Haller wrote:
> > Brad Appleton <brad@bradapp.net> wrote:
> >
> > > So if you primarily work on one task at a time, you have a
> > > single branch all to yourself. When you are done with your
> > > change (and after you have "updated" from the main trunk)
> > > you "commit" your change to the main trunk.
> >
> > I'm not sure I really understand what you mean. Are you
> > saying you would first merge from the trunk to your branch
> > (the changes that other people have committed to the trunk
> > in the meantime), and then merge back from your branch to
> > the trunk?
>
> Yes. This is a commonly recurring standard best practice in
> most VC tool "communities" where the tool has decent branching
> support (i.e., not VSS :). In CVS and SVN the "update" command
> does this. In ClearCase/UCM it is called rebase (short for
> "rebaseline"), I think Perforce uses "sync". Bitkeeper calls it
> "pull". Other tools call it "import" or "merge-in".
>
> The idea is that, you are about to commit your changes to the
> codeline. If other changes have been committed to the codeline
> since you started your change, then your sandbox is not "up to
> date" with the latest "good" state of the codeline. Hence if you
> commit your changes, you will have potential inconsistencies and
> even merge-conflicts to reconcile, and you may "break" the build
> of the codeline. If you break the build, it impacts the whole
> team because none of them can commit their changes now either.
>
> So the prevailing wisdom that has emerged says, find a way to
> test the result of my changes + the codeline such that if the
> result fails, it only impacts me and my sandbox and not the rest
> of the team. There are two ways to do this:
>
> A) Don't use "Latest-and-Greatest"!
> ----------------------------------
> Instead, only use the most recently "blessed" (e.g. promoted)
> baseline (label or tag). This offloads a lot of merge and
> build work and resultant labeling+promoting to a buildmeister
> and/or build-blesser. Having changes that are not "in sync"
> with the latest stuff becomes increasingly more common and
> it takes increasingly longer for builds to be blessed and for
> the codeline and sandboxes to be "in sync".
>
> The upside is that it is easier to isolate the set of
> changes that you had to make, because you don't have to
> checkout/merge/add any files/lines for changes that you had to
> merge-in from elsewhere. If your VC tool has decent support
> for being able to figure out which changes were REALLY made
> by you and which ones were simply carried-forward by you,
> this is less of a traceability concern.
>
> OTOH, it might be easier to "reuse" the un-synced changes
> in your workspace to "propagate forward" into a subsequent
> parallel supported release. (Then again, it might not be any
> easier, and could even be harder).
>
> B) Update your Sandbox to Keep Current
> --------------------------------------
> Use latest-and-greatest. Do an update as often as desired
> when there are new commits to the codeline. Keep your sandbox
> (and branch) in sync with the latest state of the codeline so
> that you don't have a "big bang" merge at the end of your task
> and have to reconcile a maximal number of changes and your own
> rework efforts. Instead do regular, frequent, and incremental
> integration into your own sandbox so you only merge small and
> easy chunks at a time, and decrease the amount of time and the
> likelihood of occurrence that the codeline may be broken and that
> you will have to do major rework before committing your changes.
>
> The upside is that frequent incremental integration helps keep
> everyone current and reduces the size and complexity of merge
> conflicts and eases their reconciliation. It also minimizes
> the window of time between when you are ready to commit your
> changes and when you have finished committing them and have
> verified the result is still consistent/correct.
>
> The downside is your branch contains lots of changes that were
> carried forward by you but not necessarily made by you. Again,
> this is more of a traceability concern. Some would say it also
> makes it harder to "subtract" the added functionality from the
> codeline if desired at a later date - and this is true to some
> extent. At the same time, following this practice decreases the
> likelihood that it will be necessary as well as the likelihood
> that a change will "break the build" (whereas if you haven't
> done it, and you hear about this, you worry about how to undo a
> broken build because you are more used to it happening because
> you don't sync as frequently - a bit of a catch-22)
>
> So which is best?
> =================
> In general, most small and medium projects prefer the Frequent
> Incremental Update approach - what I call "Continuous Update"
> in my article "Codeline Merging and Locking: Continuous Update,
> Two-Phased Commits" in Nov'03 CMCrossroads news at:
> <http://www.cmcrossroads.com/newsletter/articles/agilenov03.pdf>
>
> Larger projects, particularly those that have dedicated
> build-meisters that typically don't let developers commit their
> own changes tend to eschew the "Latest-and-Greatest" and insist
> on using static, formally identified/blessed labels. It is
> more careful and controlled but also adds a lot of development
> "friction" and wait-time at the benefit of reducing the cost of
> rework by preventing the "big merges" (rather than amortizing
> them over small frequent chunks :-).
>
> In the end, both are different risk-management approaches that
> have their own appeal to their own audiences. There is "pay
> now" (the static baseline), and there is "pay later" (don't
> use anything and wait till it burns you), and there is "pay
> as you go" (the frequent and disciplined use of incremental
> integration, even during one's own change-task).
>
> However, I have noticed in the last 5 years that more and more
> shops are leaning toward developer "push"-style integration
> (allowing developers to merge/commit their own changes), and
> requiring them to rebase before committing. To mitigate the
> risk, they use what I call a "docking line" and the developers
> push ("dock") their changes to this "active" development line,
> and then the SCM/Build folks can preview/audit the stability
> of what is there before deciding to "pull" the "docked" changes
> from the active development line over to the mainline or
> release-line branch.
>
> I personally find that in my experience, the more frequent and
> more incremental approach gives better overall stability and
> suitability PROVIDED that developers are disciplined about
> making sure their stuff works and won't-break-the-build before
> merging it and learn how to successfully merge, and generally
> do a good job of using encapsulation and modularity in their
> coding. It also means "code ownership" (e.g. of a module/class)
> can not be "exclusive" but is more like "stewardship" than
> ownership (exclusive code ownership makes it difficult to do
> this, and forces a more sequential-locking approach, and more
> "wait-time" for the code-owner to make the changes you would
> otherwise get their help on when reconciling merge conflicts).
>
> Good design, discipline and collaboration keep codelines
> consistent, correct and coherent, and make LATEST-AND-GREATEST
> with continuous/hyperfrequent integration+updates be very
> effective and HIGHLY productive. If you don't have all three
> of those things (the encapsulation/modularity,
> the discipline to test what you have to ensure you don't break
> the build, the ability to collaborate well to resolve merging
> concurrent changes) then you break something for either
> the SCM/Buildmeisters, or the QA/V&V, or the code-owners,
> and ultimately for project management. In those cases the
> formal static baselining and throw-it-over-the-wall "pull
> model" of integration is more rampant, and takes more time,
> but gives more reliable quality results (and results in more
> adversarial relationships between those competing roles).
>
> For more info on the "Docking Line" pattern, you can see the
> two sets of powerpoint slides from previous RUC conference
> presentations I've given at:
> http://acme.bradapp.net/#ClearCase
>
> For more info on "Active Development Line", "Release Line"
> and "Mainline" patterns you can see the "SCM Patterns" book
> (www.scmpatterns.com) and also see precursor descriptions
> of them in a rather comprehensive (and lengthy :) branching
> best practices paper at:
> http://acme.bradapp.net/branching/
>
> For more info in particular on "Continuous Update" and
> several companion practices that accompany it, see the
> aforementioned paper on codeline merging and locking
> http://www.cmcrossroads.com/newsletter/articles/agilenov03.pdf
>
> It talks about the following dozen or so locking-related
> practices and the circumstances (context) in which each is
> appropriate to use. Alternatives range from no locking and
> a single integration machine, to an integration token, to
> various forms of codeline locking.
>
> Continuous Workspace Update
> * Workspace Update
> + Post-Commit Notification
> * Private Checkpoint/Versions
> + Private Archive
> + Private Branch
> + Task Branch
> + Checkpoint Label
>
> Two-phased "Commit"
> (where the commit "transaction" is viewed as having two phases:
> a commit-phase, and a "preparation" phase that consists of:
> rebase+reconcile, rebuild+retest, resolve)
> * Pre-Commit Validation
> * Codeline Locking (and factors of team-size, build/test-time,
> parallel tasks, likelihood of collisions/conflicts,
> commit-duration and overlap)
> + Single Release Point (e.g., single integration machine)
> + Integration Token
> + Codeline Write-Lock
> - Full Codeline Lock
> - Partial Codeline Lock
> - Double-Checked Codeline Lock
> - Phased Codeline Lock
> It discusses appropriate context for the locking patterns
> based on the above mentioned factors.
>
> All of those locking-related patterns are successfully recurring
> solutions in common practice. But the context is important. Use
> a pattern in the wrong context, and at best you might simply
> be doing more than you really need to (at worst you could really
> foul things up).
>
> Hope that helps!!!
> --
> Brad Appleton <brad@bradapp.net> www.bradapp.net
> Software CM Patterns (www.scmpatterns.com)
> Effective Teamwork, Practical Integration
> "And miles to go before I sleep." -- Robert Frost
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
>

Received on Fri Mar 12 20:38:19 2004
