This is a _pretty good_ example of a proof:
From: Tom Lord <lord@regexps.com>
To: arch-dev@regexps.com
Subject: [arch-dev] towards formal patch set semantics
Sender: arch-dev-admin@regexps.com
X-BeenThere: arch-dev@regexps.com
X-Mailman-Version: 2.0.9
Precedence: bulk
List-Help: <mailto:arch-dev-request@regexps.com?subject=help>
List-Post: <mailto:arch-dev@regexps.com>
List-Subscribe: <http://www.regexps.com/mailman/listinfo/arch-dev>,
<mailto:arch-dev-request@regexps.com?subject=subscribe>
List-Id: a discussion list for arch developers <arch-dev.regexps.com>
List-Unsubscribe: <http://www.regexps.com/mailman/listinfo/arch-dev>,
<mailto:arch-dev-request@regexps.com?subject=unsubscribe>
List-Archive: <http://www.regexps.com/pipermail/arch-dev/>
Date: Mon, 8 Jul 2002 14:03:16 -0700 (PDT)
X-UIDL: '$9"!0GF"!a7X!!-W_"!
! How Whole Tree Patching Works
This note contains a formalized description of how whole tree
patch sets handle tree rearrangements.
There are two limitations: the math of the formalization is "pretty
good", but could use a little more precision for a final draft
patch set standard; only the (interesting) problem of tree
rearrangements is addressed -- other details of patching aren't
addressed (yet).
This is long and rather dry with lots of equations -- so it isn't
for all readers.
On the other hand, there's an important theorem proven here
({theorem: rearrangement inventory equivalence}) whose significance
is that it proves that our patch set format contains enough
information for exact patching (in spite of not containing complete
inventories for the `orig' and `mod' trees).
Additionally, if you want to try formally defining operations such
as patch set composition, this document contains enough formalism
for you to work out how to compose tree rearrangements.
* overview
In whole tree patching, we have three filesystem trees: `orig',
`mod', and `target'.
First, we'll compare `orig' and `mod' to produce a data stream
called a "patch set" the describes the changes between them.
Second, we'll "apply" the patch set to the target -- which means
we want to make the same set of changes to the target tree.
Two difficulties arise: one, the `target' tree may not be identical
to the `orig' tree, in which case it isn't a priori clear what it
means to apply the "same changes"; two, we want to include file tree
structural rearrangements among the changes we record when computing
patch sets and replay while applying them.
* the focus of this document is on tree-structure patching
** patch sets have two parts
A whole tree patch set has two parts:
*) a set of __individual file changes__
*) a __tree rearrangement__
** individual file changes
Patch set contents for individual programs are created by programs
such as `diff' and applied by programs such as `patch'.
The details are fairly banal and aren't important to this note.
** tree rearrangements
Tree rearrangements are based on comparing `orig' and `mod' to see
how files and directories have been rearranged, and
applying that "same" transformation to `target'.
The problem is defining what "same" means when the target tree
does not precisely match the original tree. So, exploring that
topic is what comes next:
* what's a tree structure (formally)?
Let's do some intuitive math.
** tag sets
<<<
DirTags := "the set of directory tags"
FileTags := "the set of file tags"
SymlinkTags := "the set of symlink tags"
AllTags := DirTags union FileTags union SymlinkTags
>>>
The `StudlyCaps' words are set names. The `:=' means "is defined to
be". The definition is in quotes which means that it's an
informal definition -- you're supposed to be able to fill in the
details for yourself (e.g., you know what a tag string is because
you are familiar with `arch').
** Tag Set Disjointness Axiom
<<<
Axiom:
DirTags int FileTags
== DirTags int SymlinkTags
== FileTags int SymlinkTags
== -empty-
>>>
`int' is the set intersection operator. `-empty-' is the
empty set. Along with the axiom, we can define some predicates:
*** tag predicates
<<<
dir_tag?(x) := x member DirTags
file_tag?(x) := x member FileTags
symlink_tag?(x) := x member SymlinkTags
>>>
** relative path names
<<<
RelPaths := "the set of simplified down-relative paths"
>>>
`Simplified' means that the paths do not contain `./' or `/..'
components. `down-relative' means that the the paths do not
begin with '/' or `..'.
** so what's a tree (structure)?
Each of our three trees is characterized by an _inventory_ function
whose type we approximately know:
<<<
inventory_orig : AllTags <-> RelPaths
inventory_mod : AllTags <-> RelPaths
inventory_target : AllTags <-> RelPaths
>>>
(The notation means that each `inventory_*' is an invertible
function between a subset of `AllTags' and a subset of `RelPaths'.)
The precise domains and ranges of those functions are of
interest:
<<<
orig_tags := Dom (inventory_orig)
mod_tags := Dom (inventory_mod)
target_tags := Dom (inventory_target)
orig_paths := Rng (inventory_orig)
mod_paths := Rng (inventory_mod)
target_paths := Rng (inventory_target)
>>>
** comparing two trees
Let's start to compare `orig' and `mod' in order to produce a
patch set.
*** interesting sets while comparing trees
A few interesting sets are:
<<<
removed_tags := Dom (inventory_orig) - Dom (inventory_mod)
added_tags := Dom (inventory_mod) - Dom (inventory_orig)
common_tags := Dom (inventory_mod) int Dom (inventory_orig)
>>>
The sets `added_' and `removed_tags' are the deleted and new files,
symlinks and directories. The set `common_tags' are files found
in both `orig' and `mod' (whether moved or not).
*** lemma: disjointness of tree comparison sets
<<<
Lemma:
removed_tags int added_tags
== removed_tags int common_tags
== added_tags int common_tags
== -empty-
Proof:
From the definitions, we know that:
[i] removed_tags == A - B
[ii] added_tags == B - A
[iii] common_tags == A int B
From that it follows that:
[iv] removed_tags int B == empty (from [i])
[v] added_tags int A == empty (from [ii])
[vi] removed_tags is_subset A (from [i])
[vi] added_tags is_subset B (from [ii])
[vii] common_tags is_subset A (from [iii])
[viii] common_tags is_subset B (from [iii])
From that it follows that __each of the following sets
is `-empty-'__:
[ix] removed_tags int added_tags (from [iv] + [vi])
[x] removed_tags int common_tags (from [iv] + [viii])
[xi] added_tags int added_tags (from [v] + [vii])
QED
>>>
*** the set of moved files, directories, and symlinks
Another set of interest is:
<<<
moved_tags := { x |
(x is_in common_tags)
and
(inventory_orig(x) != inventory_mod(x))
}
>>>
*** lemma: disjointedness of interesting tree comparison sets
<<<
Lemma:
removed_tags int added_tags
== removed_tags int moved_tags
== added_tags int moved_tags
== -empty-
Proof:
From the definitions, we know that:
moved_tags is_subset common_tags
thus this lemma follows trivially from
the lemma "disjointness of tree comparison sets".
QED
>>>
*** the two tree-relative sets of interesting tags
We just constructed three sets of tags: `added_', `removed_', and
`moved_' tags. Not all of those tags are present in both the `orig'
and `mod' trees. Just these:
<<<
orig_rearrangement_tags := removed_tags union moved_tags
mod_rearrangement_tags := added_tags union moved_tags
>>>
*** theorem: rearrangement set equivalences
This theorem is useful because it means that given only
the two `_rearrangement_tags' sets, we can recover the
three `added_/remove_/moved_' sets.
<<<
Theorem:
[I] moved_tags == orig_rearrangement_tags int mod_rearrangement_tags
[II] removed_tags == orig_rearrangement_tags - mod_rearrangement_tags
[III] added_tags == mod_rearrangement_tags - orig_rearrangement_tags
Proof:
With shorthand:
A := added_tags
M := moved_tags
R := removed_tags
G := orig_rearrangement_tags
D := mod_rearrangement_tags
of [I]...
[i] (from defs)
(G int D)
== ((R union M) int (A union M))
[ii] (from set algebra)
== (A int (R union M)) union (M int (R union M))
== (A int R) union (A int M) union M
[iv] (from {lemma: disjointedness of interesting tree comparison sets})
== -empty- union -empty- union M
== M
of [II] (and [III] by symmetry):
[i] (from defs)
(G - D)
== (R union M) - (A union M)
[ii] (from set algebra)
== R - (A union M)
== (R - A) - M
[iii] (from {lemma: disjointedness of interesting tree
comparison sets})
== R - M
== R
QED
>>>
*** interesting functions while comparing trees
Given those sets, we can define some new sets functions by restricting
the domains of the `inventory_' functions:
<<<
rearrangement_inventory_orig :=
restrict(inventory_orig, orig_rearrangement_tags)
rearrangement_inventory_mod :=
restrict(inventory_mod, mod_rearrangement_tags)
>>>
*** theorem: rearrangement inventory equivalence
Here is an interesting and really important equation:
<<<
Theorem:
inventory_mod(x)
== by-cases:
-undefined- if x is_in removed_tags
rearrangement_inventory_mod(x) if x is_in moved_tags
rearrangement_inventory_mod(x) if x is_in added_tags
inventory_orig(x) otherwise
inventory_orig(x)
== by-cases:
-undefined- if x is_in added_tags
rearrangement_inventory_orig(x) if x is_in moved_tags
rearrangement_inventory_orig(x) if x is_in removed_tags
inventory_mod(x) otherwise
Proof:
[omitted for brevity but simple]
>>>
What that tells us is that if we have `rearrangement_inventory_mod',
the `added_/removed_/moved_tags' sets, and `inventory_orig', then
we can reconstruct `inventory_mod'.
Symmetrically, with `rearrangement_inventory_orig', the `added_/...'
sets, and `inventory_mod', we can reconstruct `inventory_orig'.
What that means in english is that if we have a patch set plus a
copy of either `orig' or `mod', and the patch contains rearrangement
inventories for the `orig' and `mod' trees, then we have all the
information we need for forwards or backwards exact tree
rearrangements. (We know we have all the information we need
because the theorem above tells us how to reconstruct all of the
relevant information there is from just the inventory of the tree we
have plus the contents of the patch set.)
* what goes in a patch set?
Patch set syntax (described elswhere) includes `rename' directives
and `change' directives. Collectively, these include (as a subset
of the useful information they contain), specifications
of the `rearrangement_inventory_orig' and
`rearrangement_inventory_mod' tables.
That design is at least _plausible_ since (as we just proved) it is
sufficient for forwards and backwards exact tree rearrangements.
Whether or not it's sufficient for inexact patching is a matter
of subjective judgement. The semantics of inexact patching
haven't been formally (only informally) specified so far. Is it
a usefully specified semantics?
* what's missing?
Still needed are formal models of (in terms of impact on inventory and
file contents) inexact tree rearrangements and individual file change
application.
-----
Offer feedback/support:
http://svcs.affero.net/rm.php?m=lord%40regexps%2Ecom&ls=towards%20formal%20patch%20set%20semantics&ll=arch-dev@regexps.com
_______________________________________________
arch-dev mailing list
arch-dev@regexps.com
http://www.regexps.com/mailman/listinfo/arch-dev
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Aug 10 05:05:10 2002