Re: isn't variance adjusted patching horribly dangerous?

From: Tom Lord <lord_at_emf.net>
Date: 2003-04-09 02:46:41 CEST

        Fixing the same bug in two different ways is a mistake; it
        should be fixed in one place and merged to the other.

That detail of the example -- that a bug was fixed in two different
ways -- is in no way essential to the analysis. I mentioned it only
to make it easier to read all the code fragments without getting lost.

We could easily construct a realistic example that had nothing to do
with redundant bug fixes. For example, the context lines may be
obtaining and releasing various locks -- the code in the middle
requiring certain locks to be held. A patch from T:19-T:20 that
relies on one set of locks being held may, by the same merging trick,
be silently applied to a B:15 in which different locks are held.

        Context diffs could be viewed as a safety measure, but they
        can also be viewed as a way of allowing a patch to be applied to
        a modified source. Personally I value the latter more than
        the former.

[....]

        Whether it is a bug depends on the interpretation of the code,
        and that is beyond the scope of version control. There will
        be cases where adjacent changes should behave exactly as you
        show above.

Indeed there may be. The context mechanism and the finite window for
hunk offsets are proven good heuristics for detecting cases where a
tool, such as `patch' or a version control system, should not presume to
guess whether or not the changes should be combined. It was
considered quite a cool thing -- what 15 or 20 years ago? -- when
those heuristics were realized. Variance adjusted patching is a
mechanism designed specifically to disable those heuristics in exactly
the kind of context where they are almost always winning.
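To make the two heuristics concrete, here is a minimal Python sketch of applying one hunk under them: the hunk's context and removed lines must match the target exactly, and that match is searched for only within a bounded offset of the hunk's recorded position. The names `Hunk`, `find_hunk`, `apply_hunk`, and the `max_offset` parameter are hypothetical, chosen for illustration; this is not how any particular version of `patch` is implemented, and fuzz factors are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hunk:
    """A hypothetical hunk: expected start line (0-based) plus old and new text.

    `old` holds the context and removed lines as the diff recorded them;
    `new` holds the same context with the added lines in place of the removed ones.
    """
    start: int
    old: List[str]
    new: List[str]


def find_hunk(lines: List[str], hunk: Hunk, max_offset: int) -> Optional[int]:
    """Return where `hunk.old` matches exactly, searching outward from
    `hunk.start` but never more than `max_offset` lines away; None if no match."""
    n = len(hunk.old)
    for delta in range(max_offset + 1):
        # Try the recorded position first, then equal distances either side.
        candidates = (hunk.start + delta, hunk.start - delta) if delta else (hunk.start,)
        for pos in candidates:
            if 0 <= pos <= len(lines) - n and lines[pos:pos + n] == hunk.old:
                return pos
    return None


def apply_hunk(lines: List[str], hunk: Hunk, max_offset: int) -> Optional[List[str]]:
    """Apply the hunk if its context is found inside the window, else None (conflict)."""
    pos = find_hunk(lines, hunk, max_offset)
    if pos is None:
        return None  # context changed, or the hunk drifted too far: don't guess
    return lines[:pos] + hunk.new + lines[pos + len(hunk.old):]
```

If the context has changed, or the hunk would have to move further than `max_offset` lines, the sketch returns None, which is the point where a tool reports a conflict instead of presuming to combine the changes.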

That said, you've given me a good idea for something better than
variance adjusted patching. When adjusting a patch, there are two
kinds of changes made: the line numbers of hunks may be altered, and
the contents of hunks may be altered.

A variation on variance adjusted patching would follow these three
rules:

        1) Whenever the _contents_ of a hunk are modified, the
           hunk is marked "soft conflict".

        2) Whenever the line numbers of a hunk are modified
           such that, instead of being a mere performance enhancement,
           application of the hunk would violate the usual search
           window of the patch algorithm, the hunk is marked "soft
           conflict".

        3) The outcome of patching has three, instead of two, kinds of
           result for each hunk. A hunk may generate a conflict or
           apply cleanly, as before -- but if a hunk is marked "soft
           conflict", then it is treated as a conflict, but the
           conflict record is distinct from ordinary conflicts. (In
           terms of conflict markers, instead of "mine" and "yours"
           sections, you'd have "mine", "yours", and "variance
           adjusted".)
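The three rules above could be sketched in Python roughly as follows. This assumes the adjustment step records, for each hunk, whether its contents were rewritten and how far its start line was moved; `AdjustedHunk`, `classify`, `soft_conflict_markers`, and the exact marker strings are all hypothetical. It also assumes that a hunk which fails to match at all is reported as an ordinary conflict, a case the rules leave open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    CLEAN = "clean"
    CONFLICT = "conflict"
    SOFT_CONFLICT = "soft conflict"


@dataclass
class AdjustedHunk:
    original_start: int      # start line recorded in the patch as written
    adjusted_start: int      # start line after variance adjustment
    contents_modified: bool  # True if the adjustment rewrote the hunk body
    matches: bool            # True if the adjusted hunk matches the target text


def classify(hunk: AdjustedHunk, window: int) -> Outcome:
    """Classify one hunk; `window` is the usual offset search window."""
    # Assumption: a hunk that does not match at all is an ordinary conflict.
    if not hunk.matches:
        return Outcome.CONFLICT
    # Rule 1: any change to the hunk's contents marks it a soft conflict.
    if hunk.contents_modified:
        return Outcome.SOFT_CONFLICT
    # Rule 2: a line-number change larger than the search window means the
    # adjustment, not the context match, is deciding where the hunk lands.
    if abs(hunk.adjusted_start - hunk.original_start) > window:
        return Outcome.SOFT_CONFLICT
    return Outcome.CLEAN


def soft_conflict_markers(mine: List[str], yours: List[str],
                          adjusted: List[str]) -> List[str]:
    """Rule 3: a soft conflict record has a third, "variance adjusted" section.
    The marker strings here are hypothetical."""
    return (["<<<<<<< mine"] + mine
            + ["======= yours"] + yours
            + ["======= variance adjusted"] + adjusted
            + [">>>>>>> end soft conflict"])
```

Because a soft conflict is still a conflict, nothing is applied silently, but the "variance adjusted" section hands the reviewer the current algorithm's answer as a candidate resolution.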

Those rules would give users the opportunity to review those cases
where variance adjusted patching has overridden the usual semantics of
patch, while the tool could still offer the result of the current
variance adjusted patching algorithm as a suggested resolution.

        It's not as if merge affects the repository; the user gets to
        compare the sources of the merge before merging, and also gets
        to view the result of the merge before committing.

Indeed they do. But such reviews always imply a cost/benefit
trade-off. In a large merge, the cost of a detailed review can be
quite high. Context is used to help optimize the use of resources
dedicated to review. Conflicts highlight areas of the code where,
probabilistically, detailed review and human intervention are most
appropriate. Soft conflicts, as described above, could enhance this.

        It boils down to the question: how often should adjacent,
        non-overlapping changes result in a conflict? I prefer the
        answer "never", you appear to prefer "always". I don't
        suppose either of us can prove which is the most likely to be
        correct in the real world.

There is considerable empirical evidence that I'm basically right.
For example, for how many years now has the documentation for Larry
Wall's patch suggested -- what is it ... uh -- three lines of context?
Or for how many years has that been the default in various versions of
diff? And in that time, how much folklore has spread around to the
effect that that's a lousy default and you should override those
parameters when using those tools?

Just in general, textual patching is inherently risky business.
Variance adjusted patching increases that risk in some notable ways.
While for many projects the costs of introducing a bug are
inconsequential, for many others they are quite high. It seems
to me a cavalier attitude to regard these costs as a mere matter
of subjective opinion -- and more sensible to manage them
with care.

-t

Received on Wed Apr 9 02:37:55 2003

In reply to: Philip Martin: "Re: isn't variance adjusted patching horribly dangerous?"