Re: Commit Hooks Asynchronous?

From: Trent Nelson <trent_at_snakebite.org>
Date: Fri, 3 Feb 2012 19:51:49 -0500

On Fri, Feb 03, 2012 at 04:11:56PM -0800, Cory Finger wrote:
> Is there a way in subversion to disable the ability of the post-commit
> hook to run concurrently?
>
> Say Jim and Bob both commit a file at nearly the same time, I would
> like to disable the asynchronous capabilities of commit hooks so that
>
> Jim's commit hooks process, then Bob's commit hooks process.

There's no way of doing that out-of-the-box, as far as I know.

    You *can* manually enforce sequential access by blocking from within
    the context of a start-commit or pre-commit hook, but... well... it
    turns into the mother of all race conditions pretty quickly.

    I've got a hook framework* that analyzes commits; pre-commits are
    analyzed to see if they should be blocked based on a set of rules,
    and post-commits are used to update metadata about the commit which
    is stored in the said post-commit's revision property.

    The revprop metadata is inherently sequential; r6 depends on r5,
    etc. This design decision bit me in the ass a few months later
    because of the asynchronous nature of hooks; I was getting into
    situations where r6's post-commit would run before I'd finish
    processing r5.

    The end result was masses of ugly knee-jerk locking code, which I've
    included below as a testament to how fiddly it can be trying to
    enforce order with commits.

[*]: framework is called 'Enversion' (Enterprise Subversion). I'm in
     the process of getting it ready to open source under an Apache 2.0
     license. I mention it in this thread if you're interested in what
     it attempts to do:

http://mail-archives.apache.org/mod_mbox/subversion-dev/201110.mbox/%3C4E9315DB.20704@snakebite.org%3E

Regards,

Trent.

Hideous locking code below. Note that this isn't indicative of the
general code quality ;-)

--
    def _init_evn(self):
        self.is_rev_for_empty_repo = self.is_rev and self.rev == 0
        self.is_rev_for_first_commit = self.is_rev and self.rev == 1
        self.is_txn_for_first_commit = self.is_txn and self.base_rev == 0
        self.is_normal_rev_or_txn = (
            (self.is_txn and self.base_rev >= 1) or
            (self.is_rev and self.rev > 1)
        )
        # Test mutually-exclusive invariants.
        assert (
            (self.is_rev_for_empty_repo and (
                not self.is_txn_for_first_commit and
                not self.is_rev_for_first_commit and
                not self.is_normal_rev_or_txn
            )) or (self.is_rev_for_first_commit and (
                not self.is_rev_for_empty_repo and
                not self.is_txn_for_first_commit and
                not self.is_normal_rev_or_txn
            )) or (self.is_txn_for_first_commit and (
                not self.is_rev_for_empty_repo and
                not self.is_rev_for_first_commit and
                not self.is_normal_rev_or_txn
            )) or (self.is_normal_rev_or_txn and (
                not self.is_rev_for_empty_repo and
                not self.is_rev_for_first_commit and
                not self.is_txn_for_first_commit
            ))
        )
        rc0 = self.r0_revprop_conf
        if self.is_rev_for_empty_repo and 'version' not in rc0:
            rc0.last_rev = 0
            rc0.version = ESVN_VERSION
        version = self._load_evn_revprop_int('version', 1)
        if version != ESVN_VERSION:
            self.die(e.VersionMismatch % (ESVN_VERSION, version))
        if version == 1:
            self._init_evn_v1()
        else:
            raise UnexpectedCodePath
    def _init_evn_v1(self):
        if self.is_rev_for_empty_repo:
            return
        if self.is_txn_for_first_commit or self.is_rev_for_first_commit:
            assert self.base_rev == 0
            k = Dict()
            if self.is_txn:
                k.base_rev = 0
            else:
                k.rev = 1
            c = self.rconf(**k)
            self.__roots = (c, {})
            return
        assert self.base_rev > 0
        if self.is_rev:
            assert self.rev >= 2
        else:
            assert self.base_rev >= 1
        max_revlock_waits = self.conf.get('general', 'max-revlock-waits')
        # Quick sanity check of last_rev to make sure it's not higher than the
        # highest rev of the repository.
        self._reload_last_rev()
        highest_rev = svn.fs.youngest_rev(self.fs, self.pool)
        if self.last_rev > highest_rev:
            self.die(e.LastRevTooHigh % (self.last_rev, highest_rev))
        if self.is_rev and isinstance(self, RepositoryHook):
            with open(self.rev_lockfile, 'w') as f:
                f.write(str(os.getpid()))
                f.flush()
                f.close()
        if self.good_last_rev and self.good_base_rev_roots:
            self._last_rev_and_base_rev_roots_are_good()
            return
        dbg = self.log.debug
        a = (str(self.rev_or_txn), self.base_rev, self.last_rev)
        dbg('rev_or_txn: %s, base_rev: %d, last_rev: %d' % a)
        found = False
        fn = self.base_rev_lockfile
        count = itertools.count()
        while True:
            c = count.next()
            dbg('looking for revlock file: %s (attempt: %d)' % (fn, c))
            if os.path.isfile(fn):
                found = True
                dbg('found revlock after %d attempts' % c)
                break
            time.sleep(1)
            if c == max_revlock_waits:
                dbg('no revlock found after %d attempts' % c)
                break
        if self.good_last_rev and self.good_base_rev_roots:
            self._last_rev_and_base_rev_roots_are_good()
            return
        if found:
            s = None
            try:
                s = open(fn, 'r').read()
            except:
                pass
            if not s:
                dbg('failed to open/read lock file %s' % fn)
            else:
                try:
                    pid = int(s)
                    dbg("pid for lock file %s: %d" % (fn, pid))
                except:
                    dbg("invalid pid for lock file %s: %s" % (fn, s))
                else:
                    count = itertools.count()
                    still_running = pid_exists(pid)
                    while still_running:
                        a = (count.next(), pid)
                        dbg("[%d] pid %d is still running" % a)
                        time.sleep(1)
                        still_running = pid_exists(pid)
        good_last_rev = self.good_last_rev
        good_base_rev_roots = self.good_base_rev_roots
        if good_last_rev and good_base_rev_roots:
            self._last_rev_and_base_rev_roots_are_good()
            return
        if not good_last_rev:
            self._last_rev_is_bad()
        if not good_base_rev_roots:
            self._base_rev_roots_is_bad()
        raise UnexpectedCodePath
    @property
    def good_last_rev(self):
        self._reload_last_rev()
        if isinstance(self, RepositoryHook):
            return bool(self.last_rev == self.base_rev)
        else:
            assert self.is_rev
            return bool(self.last_rev >= self.base_rev)
    @property
    def good_base_rev_roots(self):
        brc = self.base_revprop_conf
        brc._reload()
        return bool(isinstance(brc.roots, Roots))
    def _last_rev_is_bad(self):
        if self.is_rev:
            a = (self.rev, self.base_rev, self.last_rev, self.latest_rev)
            if isinstance(self, RepositoryHook):
                m = e.LastRevNotSetToBaseRevDuringPostCommit
            else:
                m = e.OutOfOrderRevisionProcessingAttempt
        else:
            assert self.is_txn
            a = (self.base_rev, self.last_rev, self.latest_rev)
            if self.base_rev < self.last_rev:
                m = e.StaleTxnProbablyDueToHighLoad
            else:
                m = e.RepositoryOutOfSyncTxn
        self.die(m % a)
    def _base_rev_roots_is_bad(self):
        if self.is_rev:
            a = (self.rev, self.base_rev, self.last_rev, self.latest_rev)
            if isinstance(self, RepositoryHook):
                m = e.RootsMissingFromBaseRevDuringPostCommit
            else:
                m = e.InvalidRootsForRev
            self.die(m % a)
        else:
            assert self.is_txn
            a = (self.txn_name, self.base_rev, self.last_rev, self.latest_rev)
            m = e.RootsMissingFromBaseRevTxn
        self.die(m % a)
    def _last_rev_and_base_rev_roots_are_good(self):
        try:
            os.unlink(self.base_rev_lockfile)
        except:
            pass
        brc = self.base_revprop_conf
        assert isinstance(brc.roots, Roots)
        if self.is_txn:
            self.__roots = (brc, brc.roots)
            return
        assert self.is_rev
        rc = self.revprop_conf
        # XXX force override for now.
        self.__roots = (rc, self._inherit_roots(brc.roots))
        return
        if rc.roots is not None:
            # We must be re-processing an already-processed revision.
            assert isinstance(rc.roots, Roots)
            self.__roots = (rc, dict(rc.roots))
        else:
            # Processing a revision for the first time outside the context
            # of a post-commit hook, i.e. repo is being analysed.  As with
            # the post-commit processing logic above, we bring over a sim-
            # plified version of the base_rev's roots via _inherit_roots.
            self.__roots = (rc, self._inherit_roots(brc.roots))

Received on 2012-02-04 01:52:24 CET

This message: [ Message body ]
Next message: Branko ÄŒibej: "Re: Commit Hooks Asynchronous?"
Previous message: Cory Finger: "Re: Commit Hooks Asynchronous?"
In reply to: Cory Finger: "Re: Commit Hooks Asynchronous?"
Next in thread: Branko ÄŒibej: "Re: Commit Hooks Asynchronous?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]