Re: Identifying branch roots

From: Trent Nelson <trent_at_snakebite.org>
Date: Mon, 10 Oct 2011 11:57:15 -0400

On 07-Oct-11 6:59 AM, Julian Foad wrote:
> On Fri, 2011-10-07 at 11:29 +0100, Julian Foad wrote:
>> Stefan Sperling wrote:
>>> julianfoad wrote:
>>>> +/* This property marks a branch root. Branches with the same value of this
>>>> + * property are mergeable. */
>>>> +#define SVN_PROP_BRANCHING_ROOT "svn:ignore" /* ### should be "svn:branching-root" */
>>
>> Hi Stefan. Thanks for picking up on this.
>>
>>> I think your addition of a 'branch root' property is quite a significant
>>> step. Is this really necessary in order to improve the output of
>>> 'svn mergeinfo' or do you have additional steps planned that go beyond
>>> tuning output?
>>
>> Both. I think knowing whether the (requested) merge source and target
>> are branch roots (and indeed branches of the *same* "project" or tree)
>> is important for improving the output and diagnostics of "svn mergeinfo"
>> and "svn merge" commands.
>>
>> It could of course enable other new behaviours relating to branches, and
>> I don't know what those are yet (apart from trivial UI things like
>> answering "is this a branch?").
>>
>> So I'm working on the idea that it would be useful to have branch roots
>> identifiable by some mechanism, so I'll add "some mechanism" (currently
>> this property, but I'm totally open to a different mechanism such as
>> branch points being defined in a config file) and see what useful
>> behaviours I can come up with.
>>
>>> There has been some discussion about adding a property for this
>>> and similar purposes in the past, see
>>> http://svn.haxx.se/dev/archive-2009-09/0156.shtml
>>> (there are probably more threads about this topic)
>>
>> Yes, and it's time to figure out what we can usefully do with such
>> information and then we'll know exactly what branch configuration
>> information we need and what's a good way to store it.
>>
>> I'll reply to the rest in a further email.
>>
>> - Julian

Welp, I'm never going to get a better lead in than that, so, hi, folks!
Freelance SCM consultant here; used to specialise in ClearQuest, of all
things, but my last two gigs ended up revolving around Subversion.

Specifically, Subversion merges, in the enterprise, and the, uh, quirks
involved. Each client had different requirements, and thus, the
solution I ended up delivering to each one differed a bit. The first
solution was neat, and did all kinds of funky ClearQuest integration
and merge validation, but the second one is more applicable to this
discussion, so I'll describe that first.

In essence, it's a hook framework that attempts to enforce Subversion
best-practices by blocking* incoming commits if it detects one or more
of the following:

         (*) Sometimes it'll block, but phrase the error message
             along the lines of "if you *really* want to do this,
             re-try your commit with the phrase 'CONFIRM MULTI-ROOT
             RENAME' somewhere in your commit message".

     TagCopied
     TagRenamed
     TagRemoved
     TagModified
     TagReplaced
     TagSubtreeCopied
     TagSubtreeRenamed
     TagSubtreeRemoved
     TagSubtreeModified
     TagSubtreeReplaced
     MultipleUnknownAndKnownRootsModified
     MixedRootNamesInMultiRootCommit
     MixedRootTypesInMultiRootCommit
     SubversionRepositoryCheckedIn
     MergeinfoAddedToRepositoryRoot
     MergeinfoModifiedOnRepositoryRoot
     SubtreeMergeinfoAdded
     RootMergeinfoRemoved
     DirectoryReplacedDuringMerge
     EmptyMergeinfoCreated
     TagDirectoryCreatedManually
     BranchDirectoryCreatedManually
     BranchRenamedToTrunk
     TrunkRenamedToBranch
     TrunkRenamedToTag
     BranchRenamedToTag
     BranchRenamedOutsideRootBaseDir
     TagSubtreePathRemoved
     RenameAffectsMultipleRoots
     UncleanRenameAffectsMultipleRoots
     MultipleRootsCopied
     UncleanCopy
     FileRemovedFromTag
     CopyKnownRootSubtreeToValidAbsRootPath
     MixedRootsNotClarifiedByExternals
     CopyKnownRootToIncorrectlyNamedRootPath
     CopyKnownRootSubtreeToIncorrectlyNamedRootPath
     RenamedKnownRootToIncorrectlyNamedRootPath
     MixedChangeTypesInMultiRootCommit
     CopyKnownRootToKnownRootSubtree
     UnknownPathCopiedToIncorrectlyNamedNewRootPath
     RenamedKnownRootToKnownRootSubtree
     FileUnchangedAndNoParentCopyOrRename
     DirUnchangedAndNoParentCopyOrRename
     EmptyChangeSet
     CopyKnownRootToUnknownPath
     CopyKnownRootSubtreeToInvalidRootPath
     NewRootCreatedByRenamingUnknownPath
     UnknownPathCopiedToKnownRootSubtree
     NewRootCreatedByCopyingUnknownPath
     PathCopiedFromOutsideRootDuringNonMerge
     UnknownDirReplacedViaCopyDuringNonMerge
     DirReplacedViaCopyDuringNonMerge
     DirectoryReplacedDuringNonMerge
     PreviousPathNotMatchedToPathsInMergeinfo
     PreviousRevDiffersFromParentCopiedFromRev
     PreviousPathDiffersFromParentCopiedFromPath
     PreviousRevDiffersFromParentRenamedFromRev
     PreviousPathDiffersFromParentRenamedFromPath
     KnownRootPathReplacedViaCopy
     BranchesDirShouldBeCreatedManuallyNotCopied
     TagsDirShouldBeCreatedManuallyNotCopied
     CopiedFromPathNotMatchedToPathsInMergeinfo
     InvariantViolatedModifyContainsMismatchedPreviousPath
     InvariantViolatedModifyContainsMismatchedPreviousRev
     InvariantViolatedCopyNewPathInRootsButNotReplace
     MultipleRootsAffectedByRemove
     AbsoluteRootOfRepositoryCopied
     PropertyChangedButOldAndNewValuesAreSame
     CopiedOrRenamedUnknownPathToIncorrectlyNamedNewRootPath
     UnknownPathRenamedViaReplaceToExistingKnownRoot
     UnknownPathCopiedViaReplaceToExistingKnownRoot
     UnknownPathRenamedToKnownRootSubtree
     UnknownPathCopiedToKnownRootSubtree
     KnownRootSubtreeRenamedViaReplaceToExistingKnownRoot
     UncleanRenameOfRootAncestorPath
     RenamedKnownRootViaReplaceToExistingKnownRoot
     RootPathAncestorRenamedViaReplaceToExistingKnownRoot
     RenamedKnownRootViaReplaceToRootAncestorPath
     RootPathAncestorRenamedToValidAbsoluteRootPath
     RootPathAncestorRenamedToValidRootPathSubtree
     RootPathAncestorRenamedToKnownRootSubtree
     RootPathAncestorRenamedViaReplaceToRootAncestorPath
     RenamedKnownRootToUnknownPath
     RenamedKnownRootSubtreeToUnknownPath
     RenamedKnownRootSubtreeToValidRootPath
     RenamedKnownRootSubtreeToIncorrectlyNamedRootPath
     UncleanRename
     RenameRelocatedPathOutsideKnownRoot

         (There's probably room for another e-mail thread just
          discussing all of these conditions; let's just say,
          Subversion repositories in the enterprise rarely look
          like their usually well-laid-out open source repository
          brethren. What was the Blade Runner line? "I've seen
          things you people wouldn't believe."? ;-) My personal
          favorite: 'SubversionRepositoryCheckedIn'.)
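
A minimal sketch of the "block unless confirmed" behaviour from the
footnote above. The confirmation phrase is the one quoted there;
the function name, its arguments, and the pairing of that phrase with
RenameAffectsMultipleRoots are assumptions made for illustration, not
the framework's actual code.

```python
# Phrase taken from the example error message; everything else here
# (function name, arguments, error wording) is hypothetical.
CONFIRM_PHRASE = 'CONFIRM MULTI-ROOT RENAME'

def check_multi_root_rename(commit_message, roots_renamed):
    """Return an error string to block the commit, or None to allow it."""
    if roots_renamed > 1 and CONFIRM_PHRASE not in commit_message:
        return ("RenameAffectsMultipleRoots: if you *really* want to do "
                "this, re-try your commit with the phrase '%s' somewhere "
                "in your commit message" % CONFIRM_PHRASE)
    return None

print(check_multi_root_rename('tidy up layout', 2))
print(check_multi_root_rename('tidy up layout, CONFIRM MULTI-ROOT RENAME', 2))
```

A pre-commit hook would print the returned string to stderr and exit
non-zero, which is how Subversion hooks reject a commit.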

So, as you can see, most of these conditions involve the concept of a
root. Thus, the ability to accurately discern what constitutes a root
took up a large portion of my time.

Hard-coding regexes and forcing all repositories to conform to a
predefined repository layout worked like a charm for my first client, as
I was coming in before they had any Subversion repositories rolled out
into production. (Well, sort of.)

That unfortunately wasn't feasible for my second client. They were a
*huge* Subversion shop. At the time I came in they had something like
960 production repositories, and I wouldn't be surprised if they were
well over 1,000 by now. There was no standard layout between repos,
and a lot of repos used non-standard branches/tags/trunks paths so
trying to manage 'root detection' via regexes was a non-starter.

For example, a number of repos had layouts like this:

     /foo/trunk
     /foo/branches/1.0.x
     /foo/branches/bugzilla/1081

i.e. 'bugzilla' was just some random directory they created to hold
developer branches related to bugs. A regex approach would have
matched 'bugzilla' as the branch root, whereas, in fact, the branch
root would have been 1081.
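
A minimal sketch of how such a regex goes wrong. The pattern and the
function name are made up for illustration (they are not taken from
either client's hooks); the paths are the ones from the layout above
with a file name appended.

```python
import re

# Hypothetical pattern: treat the first path component under
# 'branches' as the branch root.
NAIVE_BRANCH_ROOT = re.compile(r'^(/.*?/branches/[^/]+)/')

def naive_branch_root(path):
    """Return the branch root the naive regex would pick, or None."""
    m = NAIVE_BRANCH_ROOT.match(path)
    return m.group(1) + '/' if m else None

# Correct: the root is /foo/branches/1.0.x/
print(naive_branch_root('/foo/branches/1.0.x/src/main.c'))
# Wrong: picks /foo/branches/bugzilla/ instead of .../bugzilla/1081/
print(naive_branch_root('/foo/branches/bugzilla/1081/src/main.c'))
```

Making the pattern smarter only moves the problem: the regex has no way
of knowing which intermediate directories are plain containers.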

The other non-starter was requiring the admin staff to go in and
manually specify what constituted a branch, i.e. setting a 'branch
root' property on relevant paths. With ~1,000 repositories and
hundreds, if not thousands, of differently named branches/roots, that
isn't particularly easy to automate reliably, and the manual overhead
was not acceptable (for many enterprisey reasons, mainly surrounding
cost).

So, I needed to design the branch detection logic in such a way that
it didn't require any hand-holding from the admins or support staff.

It took two attempts.

For the first attempt, I played around with the notion of a root *base*
directory, i.e. /branches and /tags. The first thing the framework
would do when processing a pre-commit was create a 'RepositoryRoots'
class (the framework was written in Python FWIW), which would recurse
through the repo up to N-levels deep in order to determine the valid
root base directories. Except for trunk, which was special, if a
directory had subdirectories that were created by copying another path
(i.e. how tags or branches are created), then the directory would be
considered a root base dir.

That lasted... about a day or two. It was a leaky abstraction at best,
and broke when I encountered repos with the more non-standard layouts.
(I'm not even sure if I've described it accurately above; but eh, who
cares, it's gone now.)

The problem with the regex and base-root-dir discovery approaches was
that they were essentially heuristic based. "This directory features
lots of subdirectories that were copies of other paths, therefore, it's
a good chance it's a valid root base directory."

In most cases, yes, that was a valid assumption, but not always. The
root detection logic was the most critical piece of my solution -- I
wasn't getting paid to correctly detect roots 70% of the time in 60% of
the repos. It needed to be 100% in 100%.

So, I thought to myself, how can I correctly and autonomously identify
a root with 100% accuracy? What one property did valid roots share
that I could interrogate? Heck, what even constitutes a root? A branch
is a root, so is a tag, so is trunk.

....and then it dawned on me. It seems so simple now, in retrospect:

     In the beginning, there was one root: trunk. Then it was copied
     elsewhere, and became a branch, or maybe a tag. These copies are
     also roots, and copies of them should also be considered roots.

Ah, so simple! I just need to start at revision 0 and work my way up to
HEAD, whilst keeping a record of roots I encounter along the way. And
that's pretty much it ;-)
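
The idea can be sketched in a few lines of Python. This is a toy
version under stated assumptions, not the framework's code: the history
is a hand-written list rather than something read from a repository,
the record layout and function name are invented, and "a directory
named trunk created via mkdir is a root" is a stand-in for however
trunk is actually special-cased.

```python
def discover_roots(history):
    """Replay (rev, action, path, copied_from) records in revision
    order and return {root_path: {'created': rev}} for the roots that
    exist after the last record.

    Simplifying assumptions (illustrative only): a directory named
    'trunk' created via mkdir is a root; a copy whose source is a
    known root is a root; everything else is ignored.
    """
    roots = {}
    for rev, action, path, copied_from in sorted(history, key=lambda r: r[0]):
        if action == 'mkdir' and path.rstrip('/').endswith('/trunk'):
            roots[path] = {'created': rev}
        elif action == 'copy' and copied_from in roots:
            roots[path] = {'created': rev}
        elif action == 'delete':
            roots.pop(path, None)
    return roots

# Made-up history for a repo laid out like the /foo example earlier.
history = [
    (1, 'mkdir', '/foo/trunk/', None),
    (2, 'mkdir', '/foo/branches/', None),
    (3, 'mkdir', '/foo/branches/bugzilla/', None),
    (4, 'copy', '/foo/branches/1.0.x/', '/foo/trunk/'),
    (5, 'copy', '/foo/branches/bugzilla/1081/', '/foo/trunk/'),
    (6, 'copy', '/foo/tags/1.0.0/', '/foo/branches/1.0.x/'),
]

for path, info in sorted(discover_roots(history).items()):
    print(path, info['created'])
```

Note that /foo/branches/bugzilla/ never shows up as a root: it was
created by mkdir, not by copying a root, so no naming heuristic is
needed to rule it out.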

Turns out, that approach has worked surprisingly well. It's been in
production at the second client's site for nearly a year now. They just
run the 'repo analysis' part of the code against new repositories before
enabling the hooks, and voilà, they get instant root detection and
prevention of some 80-something erroneous conditions.

Here are some techie details about the implementation. So, the script
stores root information in a revision property called 'evn:roots' (set
against the root of the repository). The value of evn:roots at any
given revision will list all of the known roots in the repo at that
revision:

% svn pg --revprop -r26503 evn:roots svn://client.com/repos/foo
{'/build/branches/3.0.1/': {'created': 22323},
  '/build/branches/3.0.2/': {'created': 23129},
  '/build/branches/3.1.0/': {'created': 25804},
  '/build/branches/cvs/0.0.1/': {'created': 26389},
  '/build/branches/bugzilla/4144/': {'created': 22121},
  '/build/branches/bugzilla/6952/': {'created': 17661},
  '/build/release//3.0.0/': {'created': 20774},
  '/build/release/paris/3.0.0/': {'created': 20307},
  '/build/release/rome/3.0.1/': {'created': 22473},
  '/build/trunk/': {'created': 2919},
  '/src/trunk/': {'created': 9353},
  ...

The 'created' revision refers to the revision that the root was created
in. That's important, 'cause we store special metadata against the root
in the revprop for the revision it was created in:

% svn pg --revprop -r9353 evn:roots svn://client.com/repos/foo
  ...
  '/src/trunk/': {
     'copies': {
         9834: [('/src/branches/2.1/', 9835)],
         9997: [('/src/branches/bugzilla/2800/', 9998)],
         10211: [('/src/branches/bugzilla/3326/', 10212)],
         10252: [('/src/branches/bugzilla/2160/', 10253)],
         10468: [('/src/branches/2.2/', 10469)],
         11148: [('/src/branches/2.3/', 11149)],
         11420: [('/src/branches/bugzilla/3720/', 11421)]},
     'created': 9353,
     'creation_method': 'created'},
  ...

i.e. we store all the subsequent forward-copies of this root, as well as
details of how it was created (which isn't very interesting in this
example, as it's trunk and was created via mkdir, but if it were a
branch or tag, it would contain details about where it was copied from).
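
The values shown above look like Python dict literals. Assuming that is
how they are serialised, a consumer could read them back with
ast.literal_eval, as in the sketch below. The sample string is a
trimmed copy of the r9353 output; the helper name is invented, and
reading each 'copies' key as the source revision and each tuple as
(destination path, destination revision) is an interpretation of the
output, not something stated in it.

```python
import ast

# Trimmed-down sample in the same shape as the r9353 output.
raw = """{'/src/trunk/': {
    'copies': {
        9834: [('/src/branches/2.1/', 9835)],
        9997: [('/src/branches/bugzilla/2800/', 9998)]},
    'created': 9353,
    'creation_method': 'created'}}"""

# literal_eval only accepts literals, so it is safe on untrusted text
# in a way that eval() is not.
roots = ast.literal_eval(raw)

def forward_copies(roots, root_path):
    """Yield (src_rev, dest_path, dest_rev) for each copy of a root."""
    copies = roots[root_path].get('copies', {})
    for src_rev, targets in sorted(copies.items()):
        for dest_path, dest_rev in targets:
            yield src_rev, dest_path, dest_rev

for src_rev, dest_path, dest_rev in forward_copies(roots, '/src/trunk/'):
    print(src_rev, dest_path, dest_rev)
```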

Let's say I delete /src/trunk in r26504. The entry for it in evn:roots
in that revision will be gone; but a note will be made against the r9353
creation revprop to indicate which rev it was deleted in.

The importance of storing data like this becomes apparent when you deal
with situations like this:

  *hooks are turned off*
     r2: svn cp ^/trunk ^/branches/foo
     r3: svn rm ^/branches/foo
     r4: svn mkdir ^/branches/foo
  *repo is analysed, evn:roots are set, hooks are turned on*

An attempt to do the following would be blocked, because r4/HEAD of
/branches/foo was not created correctly (i.e. wasn't copied from an
existing root), and thus, isn't considered a root either:

svn cp ^/branches/foo ^/branches/bar

However, the following *would* work, because /branches/foo *was* a valid
root in r2:

svn cp -r2 ^/branches/foo ^/branches/bar
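
The check behind those two outcomes can be sketched as a lookup against
per-revision root snapshots. The table below is hand-written for the
scenario above (it assumes r1 created /trunk), and the function name is
invented. A real hook would read the evn:roots revprop for the source
revision and would apply many more checks than this one.

```python
# Hypothetical per-revision snapshots of evn:roots for the scenario
# above; r1 is assumed to be the creation of /trunk.
ROOTS_BY_REV = {
    1: {'/trunk/': {'created': 1}},
    2: {'/trunk/': {'created': 1}, '/branches/foo/': {'created': 2}},
    3: {'/trunk/': {'created': 1}},   # r3: branch deleted
    4: {'/trunk/': {'created': 1}},   # r4: plain mkdir, not a root
}
HEAD = 4

def copy_allowed(src_path, src_rev=HEAD):
    """Allow a copy only if the source was a known root at src_rev."""
    return src_path in ROOTS_BY_REV[src_rev]

print(copy_allowed('/branches/foo/'))      # False: r4 version is not a root
print(copy_allowed('/branches/foo/', 2))   # True: it was a root in r2
```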

Thoughts?

Trent.
Received on 2011-10-10 19:39:44 CEST
