Re: RFC: Standardizing a 'svn:branch' (boolean) property

From: Trent Nelson <trent_at_snakebite.org>
Date: Mon, 16 Jul 2012 22:14:06 -0700

On 7/16/12 8:57 PM, "Johan Corveleyn" <jcorvel_at_gmail.com> wrote:

>On Mon, Jul 16, 2012 at 3:33 PM, C. Michael Pilato <cmpilato_at_collab.net>
>wrote:
>> On 07/16/2012 08:11 AM, Bert Huijben wrote:
>>> As we couldn't think of a usage of the content I would suggest that we
>>> just always set the property to '*', just like how we handle
>>> svn:executable, svn:needs-lock, etc. This would also make sure that
>>> merges of this property won't need special handling.
>>
>> +1. Let's get the mechanics for recognizing branch roots in place first.
>> We can worry about additional policy matters later when we have a better
>> idea what our users might require.
>
>+1 to some support for branch identification. But I'm not sure if such
>a simple property will provide that in a good way. OTOH I don't have a
>better suggestion now.
>
>First, a couple of use cases I have in mind if branch-roots can be
>identified:

I've noticed this thread has a lot of focus on "how to identify branch
roots" but not so much on what to do with that information. What are the
specific use cases we want to address?

In my case, I had a client with a *huge* Subversion deployment; nearly a
thousand production repos, tens of thousands of sub-teams, hundreds of
gigabytes of data, etc.

They had an awful experience with merging in particular -- one of the most
prominent teams had to essentially stop development for a few weeks whilst
the SCM admins tried to unbreak their mergeinfo (amongst other things).

50+ developers at 10% productivity x 2+ weeks = expensive.
The-CIO-has-heard-about-this-and-is-super-pissed expensive.

My remit was simple: prevent this sort of problem from ever happening
again. From that, I extrapolated I needed to block a good 20-30 different
types of commits that contributed to "repo entropy". For example:

    - TagDirectoryCreatedManually
    - BranchDirectoryCreatedManually
    - BranchRenamedToTrunk
    - TrunkRenamedToBranch
    - TrunkRenamedToTag
    - BranchRenamedToTag
    - BranchRenamedOutsideRootBaseDir
    - TagSubtreePathRemoved
    - RenameAffectsMultipleRoots
    - UncleanRenameAffectsMultipleRoots
    - MultipleRootsCopied
    - TagCopied
    - UncleanCopy
    - FileRemovedFromTag
    - CopyKnownRootSubtreeToValidAbsRootPath
    - MixedRootsNotClarifiedByExternals
    - CopyKnownRootToIncorrectlyNamedRootPath
    - CopyKnownRootSubtreeToIncorrectlyNamedRootPath
    - UnknownPathRenamedToIncorrectlyNamedNewRootPath
    - RenamedKnownRootToIncorrectlyNamedRootPath
    - MixedChangeTypesInMultiRootCommit
    - CopyKnownRootToKnownRootSubtree

(Full list of events currently blocked by Enversion:
http://people.apache.org/~trent/events.py)
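
To make a couple of those event names concrete, here is a minimal sketch of how a hook might map a single changed path to an event. It is not taken from events.py: the exception-class modelling, the `BlockedCommit` base class, the `check_change` helper and its arguments are all assumptions made for illustration, and the meanings given to TagCopied and FileRemovedFromTag are inferred from their names only.

```python
class BlockedCommit(Exception):
    """Hypothetical base class for a commit the hook refuses."""


class TagCopied(BlockedCommit):
    """Assumed meaning: a tag root was used as the source of a copy."""


class FileRemovedFromTag(BlockedCommit):
    """Assumed meaning: a file beneath a tag root was deleted."""


def check_change(action, path, tag_roots, copied_from=None):
    """Raise the matching event for one changed path, else return None.

    Directory paths carry a trailing slash; `tag_roots` is a set of such
    directory paths that are already known to be tags.
    """
    if action == 'copy' and copied_from in tag_roots:
        raise TagCopied('%s -> %s' % (copied_from, path))
    if action == 'delete' and not path.endswith('/'):
        for root in tag_roots:
            if path.startswith(root):
                raise FileRemovedFromTag(path)
    return None


tags = {'/tags/trac-0.5-rc1/'}
try:
    check_change('delete', '/tags/trac-0.5-rc1/setup.py', tags)
except BlockedCommit as event:
    print('blocked:', type(event).__name__)
```

The sketch takes the set of tag roots as given. Knowing that set reliably for every revision is the hard part, and it is what the rest of this mail is about.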

Other requirements:

    - Minimal administrative overhead. Requiring administrators to
      manually specify branch roots (or trying to use regexes when each
      repo had an entirely different layout) does not scale when you're
      dealing with thousands of repositories. Ditto for requiring dump/
      load dances.

    - 100% accuracy for branch identification.

    - No false negatives. Confidence in Subversion was at an all time
      low -- many teams were threatening to ditch it and set up their
      own P4/git repo. If anything was introduced that made the user
      experience more painful, there would be anarchy.

With all of those requirements set in stone, I came up with the evn:roots
revprop approach. Which, I'm happy to report, has been chugging along in
production at this client's site for about 18 months now. Enversion will
now block some... $(cat events.py | grep '^class ' | wc -l)... 100
different types of commits that contribute to "repo entropy".

For the record, here's an outline of Enversion's evn:roots approach. It
took a few failed attempts before I came up with the design below... but,
as I mentioned, it's been in production for 18+ months on just under a
thousand repos and has met all the original requirements, so I'm pretty
happy with it.

    1. Analyze the repository via `evnadmin analyze <repo>`. This
       processes rev 0 to HEAD sequentially.

    2. "In the beginning, there was /trunk". I'm amazed how much
       mileage I got out of this idiom. Essentially, the only way
       to 'create' a root from scratch is to `svn mkdir .*/trunk`.

    3. Once a .*/trunk mkdir is detected, an evn:roots entry is added
       for it in the revprop it was created in. For example, after
       analyzing r1 of the trac repo:

        % svn pg evn:roots --revprop -r1 `gru trac`
        {'/trunk/': {'copies': {},
                     'created': 1,
                     'creation_method': 'created'}}

    4. When processing the next revision, the roots from the previous
       revprop are inherited in a simplified format:

        % svn pg evn:roots --revprop -r2 `gru trac`
        {'/trunk/': {'created': 1}}

       i.e. the root name and the revision it was created in.

    5. Analysis of each subsequent revision always inherits the previous
       revision's roots (in the simplified format).

    6. With an up-to-date, definitive list of repository roots on hand
       each time we process a new revision, we can easily detect if a
       revision affects a root. A root can be affected in the following
       ways:

            - Copied (directly and indirectly).
            - Renamed (directly and indirectly).
            - Replaced (directly and indirectly).
            - Removed (directly and indirectly).

        During analysis, we process the revision and update the roots
        regardless of the action. However, once analysis is complete
        and the hooks are enabled, we can block all the crazy stuff.

        This is an important point -- even though the end goal is to
        eventually block dodgy commits, we have to process such commits
        during analysis and update evn:roots accordingly. You have no
        idea how complicated this actually is. There are about seven
        extreme corner cases that Enversion still bombs out on --
        commits that I never would have thought even remotely possible
        until I saw them in the wild.

    7. Once we detect a root is affected, evn:roots is updated
       accordingly. In trac@r175, a new tag is created. Specifically,
       trunk@175 is copied to /tags/trac-0.5-rc1. That results in two
       changes.

        First, the evn:roots of r175's revprop includes the new root:

            % svn pg evn:roots --revprop -r175 `gru trac`
            {'/tags/trac-0.5-rc1/': {'copied_from': ('/trunk/', 174),
                                     'copies': {},
                                     'created': 175,
                                     'creation_method': 'copied'},
             '/trunk/': {'created': 1}}

        Second, we record that trunk was copied. This sort of metadata
        is always stored back in the revprop where the root was created,
        in this case, r1:

            % svn pg evn:roots --revprop -r1 `gru trac`
            {'/trunk/': {'copies': {174: [('/tags/trac-0.5-rc1/', 175)]},
                         'created': 1,
                         'creation_method': 'created'}}

        As analysis continues, the entry for /trunk/ in r1's evn:roots
        gets continually updated with relevant actions that affect it.

        If a root is detected as being removed (directly or indirectly)
        during analysis, a note is made in the originating evn:roots
        revprop that it was deleted (with reference to the rev it was
        removed in and the type of removal, i.e. removed directly,
        removed indirectly due to an ancestor path being removed, etc.),
        and the root will no longer be inherited in future evn:roots.
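
The bookkeeping in steps 2 through 7 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Enversion's code: an in-memory dict stands in for the per-revision evn:roots revprops, each change is a hypothetical `(action, path, copied_from)` tuple, directory paths carry a trailing slash, and the `'removed'` key is an invented name for the deletion note. Only whole-root copies and deletes are handled; renames, replaces and subtree copies are left out.

```python
import re

# "In the beginning, there was /trunk": only a mkdir of .*/trunk creates a root.
TRUNK_RE = re.compile(r'^(?:/.*)?/trunk/$')


def simplify(roots):
    """Inherited form: just the root name and the revision it was created in."""
    return {path: {'created': info['created']} for path, info in roots.items()}


def analyze(head, changes_by_rev):
    """Replay r1..head; return {rev: roots}, a stand-in for evn:roots revprops."""
    revprops = {0: {}}
    for rev in range(1, head + 1):
        # Every revision starts from the previous revision's simplified roots.
        roots = revprops[rev] = simplify(revprops[rev - 1])
        for action, path, copied_from in changes_by_rev.get(rev, []):
            if action == 'mkdir' and TRUNK_RE.match(path):
                roots[path] = {'copies': {}, 'created': rev,
                               'creation_method': 'created'}
            elif action == 'copy' and copied_from[0] in roots:
                src_path, src_rev = copied_from
                roots[path] = {'copied_from': (src_path, src_rev),
                               'copies': {}, 'created': rev,
                               'creation_method': 'copied'}
                # Record the copy in the revprop where the source root was created.
                origin = revprops[roots[src_path]['created']][src_path]
                origin['copies'].setdefault(src_rev, []).append((path, rev))
            elif action == 'delete':
                # A root is hit directly (same path) or indirectly (ancestor removed).
                for root in [r for r in roots if r.startswith(path)]:
                    origin = revprops[roots[root]['created']][root]
                    # 'removed' is a hypothetical key name for the deletion note.
                    origin['removed'] = (rev, 'direct' if root == path else 'indirect')
                    del roots[root]  # no longer inherited by later revisions
    return revprops


revprops = analyze(175, {
    1: [('mkdir', '/trunk/', None)],
    175: [('copy', '/tags/trac-0.5-rc1/', ('/trunk/', 174))],
})
print(revprops[2])  # {'/trunk/': {'created': 1}}
```

With the two trac changes above as input, `revprops[1]` and `revprops[175]` come out in the same shape as the `svn pg` output shown in steps 3 and 7.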

As for Enversion, the good news is that it's free, open source, Apache 2.0
licensed and available on github. The bad news is that it's poorly
documented at the moment and the installation is a bit fiddly:

    % git clone https://github.com/tpn/enversion.git
    Cloning into 'enversion'...
    remote: Counting objects: 56, done.
    remote: Compressing objects: 100% (44/44), done.
    remote: Total 56 (delta 6), reused 55 (delta 5)
    Unpacking objects: 100% (56/56), done.
    % export PYTHONPATH=$PYTHONPATH:`pwd`/enversion
    % export PATH=$PATH:`pwd`/enversion/bin
    % evnadmin
    Type 'evnadmin <subcommand> help' for help on a specific subcommand.

    Available subcommands:
        analyze
        create
        disable-remote-debug (drd)
        doctest

    [snip]

If you get that far, you'll be able to create a new Enversion-enabled
repository, or analyze an existing one. See (the incredibly terse)
https://github.com/tpn/enversion/blob/master/doc/quick-start.rst for a few
more hints.

FWIW, Snakebite sucks up 110% of my time at the moment, so I'm having to
neglect Enversion a bit. I'll be ramping back up on it soon, though. I'd
still love to hear from people having a play with it. It's production
ready from a functionality perspective, but definitely alpha quality from
an installer/documentation/docstrings/unit-tests perspective. Although
that's primarily a result of only being funded for 20 days rather than
negligence on my part ;-)

Regards,

Trent.
Received on 2012-07-17 07:14:44 CEST
