what proofs look like (relates to why i find you guys ridiculous to try to work with in spite of the fact we both work on "open source")

From: Tom Lord <lord_at_regexps.com>
Date: 2002-08-10 05:12:00 CEST

This is a _pretty good_ example of a proof:

From: Tom Lord <lord@regexps.com>
To: arch-dev@regexps.com
Subject: [arch-dev] towards formal patch set semantics
Sender: arch-dev-admin@regexps.com
X-BeenThere: arch-dev@regexps.com
X-Mailman-Version: 2.0.9
Precedence: bulk
List-Help: <mailto:arch-dev-request@regexps.com?subject=help>
List-Post: <mailto:arch-dev@regexps.com>
List-Subscribe: <http://www.regexps.com/mailman/listinfo/arch-dev>,
<mailto:arch-dev-request@regexps.com?subject=subscribe>
List-Id: a discussion list for arch developers <arch-dev.regexps.com>
List-Unsubscribe: <http://www.regexps.com/mailman/listinfo/arch-dev>,
<mailto:arch-dev-request@regexps.com?subject=unsubscribe>
List-Archive: <http://www.regexps.com/pipermail/arch-dev/>
Date: Mon, 8 Jul 2002 14:03:16 -0700 (PDT)
X-UIDL: '$9"!0GF"!a7X!!-W_"!

! How Whole Tree Patching Works

This note contains a formalized description of how whole tree
patch sets handle tree rearrangements.

  There are two limitations: the math of the formalization is "pretty
  good", but could use a little more precision for a final draft
  patch set standard; only the (interesting) problem of tree
  rearrangements is addressed -- other details of patching aren't
  addressed (yet).

This is long and rather dry with lots of equations -- so it isn't
for all readers.

  On the other hand, there's an important theorem proven here
  ({theorem: rearrangement inventory equivalence}) whose significance
  is that it proves that our patch set format contains enough
  information for exact patching (in spite of not containing complete
  inventories for the `orig' and `mod' trees).

  Additionally, if you want to try formally defining operations such
  as patch set composition, this document contains enough formalism
  for you to work out how to compose tree rearrangements.

* overview

In whole tree patching, we have three filesystem trees: `orig',
`mod', and `target'.

First, we'll compare `orig' and `mod' to produce a data stream
called a "patch set" the describes the changes between them.

Second, we'll "apply" the patch set to the target -- which means
we want to make the same set of changes to the target tree.

  Two difficulties arise: one, the `target' tree may not be identical
  to the `orig' tree, in which case it isn't a priori clear what it
  means to apply the "same changes"; two, we want to include file tree
  structural rearrangements among the changes we record when computing
  patch sets and replay while applying them.

* the focus of this document is on tree-structure patching

** patch sets have two parts

A whole tree patch set has two parts:

*) a set of __individual file changes__
*) a __tree rearrangement__

** individual file changes

  Patch set contents for individual programs are created by programs
  such as `diff' and applied by programs such as `patch'.
  The details are fairly banal and aren't important to this note.

** tree rearrangements

  Tree rearrangements are based on comparing `orig' and `mod' to see
  how files and directories have been rearranged, and
  applying that "same" transformation to `target'.

  The problem is defining what "same" means when the target tree
  does not precisely match the original tree. So, exploring that
  topic is what comes next:

* what's a tree structure (formally)?

Let's do some intuitive math.

** tag sets

<<<
        DirTags := "the set of directory tags"
        FileTags := "the set of file tags"
        SymlinkTags := "the set of symlink tags"

AllTags := DirTags union FileTags union SymlinkTags
>>>

  The `StudlyCaps' words are set names. The `:=' means "is defined to
  be". The definition is in quotes which means that it's an
  informal definition -- you're supposed to be able to fill in the
  details for yourself (e.g., you know what a tag string is because
  you are familiar with `arch').

** Tag Set Disjointness Axiom

<<<
Axiom:

                   DirTags int FileTags
                == DirTags int SymlinkTags
                == FileTags int SymlinkTags
                == -empty-
>>>

`int' is the set intersection operator. `-empty-' is the
empty set. Along with the axiom, we can define some predicates:

*** tag predicates

<<<
        dir_tag?(x) := x member DirTags
        file_tag?(x) := x member FileTags
        symlink_tag?(x) := x member SymlinkTags
>>>

** relative path names

<<<
RelPaths := "the set of simplified down-relative paths"
>>>

  `Simplified' means that the paths do not contain `./' or `/..'
  components. `down-relative' means that the the paths do not
  begin with '/' or `..'.

** so what's a tree (structure)?

Each of our three trees is characterized by an _inventory_ function
whose type we approximately know:

<<<
        inventory_orig : AllTags <-> RelPaths
        inventory_mod : AllTags <-> RelPaths
        inventory_target : AllTags <-> RelPaths
>>>

(The notation means that each `inventory_*' is an invertible
function between a subset of `AllTags' and a subset of `RelPaths'.)

The precise domains and ranges of those functions are of
interest:

<<<
        orig_tags := Dom (inventory_orig)
        mod_tags := Dom (inventory_mod)
        target_tags := Dom (inventory_target)

        orig_paths := Rng (inventory_orig)
        mod_paths := Rng (inventory_mod)
        target_paths := Rng (inventory_target)

>>>

** comparing two trees

Let's start to compare `orig' and `mod' in order to produce a
patch set.

*** interesting sets while comparing trees

A few interesting sets are:

<<<
        removed_tags := Dom (inventory_orig) - Dom (inventory_mod)
        added_tags := Dom (inventory_mod) - Dom (inventory_orig)
        common_tags := Dom (inventory_mod) int Dom (inventory_orig)
>>>

  The sets `added_' and `removed_tags' are the deleted and new files,
  symlinks and directories. The set `common_tags' are files found
  in both `orig' and `mod' (whether moved or not).

*** lemma: disjointness of tree comparison sets

<<<
        Lemma:
                   removed_tags int added_tags
                == removed_tags int common_tags
                == added_tags int common_tags
                == -empty-

Proof:

From the definitions, we know that:

                [i] removed_tags == A - B
                [ii] added_tags == B - A
                [iii] common_tags == A int B

From that it follows that:

[iv] removed_tags int B == empty (from [i])
[v] added_tags int A == empty (from [ii])

                  [vi] removed_tags is_subset A (from [i])
                  [vi] added_tags is_subset B (from [ii])
                  [vii] common_tags is_subset A (from [iii])
                  [viii] common_tags is_subset B (from [iii])

From that it follows that __each of the following sets
is `-empty-'__:

                [ix] removed_tags int added_tags (from [iv] + [vi])
                [x] removed_tags int common_tags (from [iv] + [viii])
                [xi] added_tags int added_tags (from [v] + [vii])

QED
>>>

*** the set of moved files, directories, and symlinks

Another set of interest is:

<<<
        moved_tags := { x |
                                 (x is_in common_tags)
                                and
                                 (inventory_orig(x) != inventory_mod(x))
                           }
>>>

*** lemma: disjointedness of interesting tree comparison sets

<<<
        Lemma:
                   removed_tags int added_tags
                == removed_tags int moved_tags
                == added_tags int moved_tags
                == -empty-

Proof:

From the definitions, we know that:

moved_tags is_subset common_tags

thus this lemma follows trivially from
the lemma "disjointness of tree comparison sets".

QED
>>>

*** the two tree-relative sets of interesting tags

  We just constructed three sets of tags: `added_', `removed_', and
  `moved_' tags. Not all of those tags are present in both the `orig'
  and `mod' trees. Just these:

<<<
orig_rearrangement_tags := removed_tags union moved_tags
mod_rearrangement_tags := added_tags union moved_tags
>>>

*** theorem: rearrangement set equivalences

  This theorem is useful because it means that given only
  the two `_rearrangement_tags' sets, we can recover the
  three `added_/remove_/moved_' sets.

<<<
Theorem:

[I] moved_tags == orig_rearrangement_tags int mod_rearrangement_tags

          [II] removed_tags == orig_rearrangement_tags - mod_rearrangement_tags

          [III] added_tags == mod_rearrangement_tags - orig_rearrangement_tags

Proof:

With shorthand:

                A := added_tags
                M := moved_tags
                R := removed_tags

G := orig_rearrangement_tags
D := mod_rearrangement_tags

of [I]...

[i] (from defs)

(G int D)
== ((R union M) int (A union M))

           [ii] (from set algebra)

== (A int (R union M)) union (M int (R union M))
== (A int R) union (A int M) union M

[iv] (from {lemma: disjointedness of interesting tree comparison sets})

== -empty- union -empty- union M
== M

of [II] (and [III] by symmetry):

[i] (from defs)

(G - D)
== (R union M) - (A union M)

[ii] (from set algebra)

== R - (A union M)
== (R - A) - M

[iii] (from {lemma: disjointedness of interesting tree
comparison sets})

== R - M
== R

QED

>>>

*** interesting functions while comparing trees

Given those sets, we can define some new sets functions by restricting
the domains of the `inventory_' functions:

<<<

rearrangement_inventory_orig :=
restrict(inventory_orig, orig_rearrangement_tags)

rearrangement_inventory_mod :=
restrict(inventory_mod, mod_rearrangement_tags)

>>>

*** theorem: rearrangement inventory equivalence

Here is an interesting and really important equation:

<<<
Theorem:

                inventory_mod(x)

             == by-cases:

                -undefined- if x is_in removed_tags
                rearrangement_inventory_mod(x) if x is_in moved_tags
                rearrangement_inventory_mod(x) if x is_in added_tags
                inventory_orig(x) otherwise

                inventory_orig(x)

             == by-cases:

                -undefined- if x is_in added_tags
                rearrangement_inventory_orig(x) if x is_in moved_tags
                rearrangement_inventory_orig(x) if x is_in removed_tags
                inventory_mod(x) otherwise

Proof:

[omitted for brevity but simple]
>>>

  What that tells us is that if we have `rearrangement_inventory_mod',
  the `added_/removed_/moved_tags' sets, and `inventory_orig', then
  we can reconstruct `inventory_mod'.

Symmetrically, with `rearrangement_inventory_orig', the `added_/...'
sets, and `inventory_mod', we can reconstruct `inventory_orig'.

  What that means in english is that if we have a patch set plus a
  copy of either `orig' or `mod', and the patch contains rearrangement
  inventories for the `orig' and `mod' trees, then we have all the
  information we need for forwards or backwards exact tree
  rearrangements. (We know we have all the information we need
  because the theorem above tells us how to reconstruct all of the
  relevant information there is from just the inventory of the tree we
  have plus the contents of the patch set.)

* what goes in a patch set?

  Patch set syntax (described elswhere) includes `rename' directives
  and `change' directives. Collectively, these include (as a subset
  of the useful information they contain), specifications
  of the `rearrangement_inventory_orig' and
  `rearrangement_inventory_mod' tables.

That design is at least _plausible_ since (as we just proved) it is
sufficient for forwards and backwards exact tree rearrangements.

  Whether or not it's sufficient for inexact patching is a matter
  of subjective judgement. The semantics of inexact patching
  haven't been formally (only informally) specified so far. Is it
  a usefully specified semantics?

* what's missing?

  Still needed are formal models of (in terms of impact on inventory and
  file contents) inexact tree rearrangements and individual file change
  application.

-----
Offer feedback/support:
http://svcs.affero.net/rm.php?m=lord%40regexps%2Ecom&ls=towards%20formal%20patch%20set%20semantics&ll=arch-dev@regexps.com

_______________________________________________
arch-dev mailing list
arch-dev@regexps.com
http://www.regexps.com/mailman/listinfo/arch-dev

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Aug 10 05:05:10 2002

This message: [ Message body ]
Next message: Peter Davis: "Deleting an Added Directory"
Previous message: Mix: "Re: Default ignores, cmdline ignores"
Next in thread: Greg Hudson: "Re: what proofs look like"
Reply: Greg Hudson: "Re: what proofs look like"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]