Re: Checkpointing mock-up option 3 (backed by a local repo)

From: Julian Foad <julianfoad_at_apache.org>
Date: Tue, 1 Aug 2017 14:19:24 +0100

Nathan Hartman wrote:
> Julian Foad wrote:
>> Performing an 'update' with a checkpoint series is a bigger ask than it
>> might at first seem. In effect, it requires rebasing the series of
>> checkpoints on the new base, which gets ugly because of the need to
>> handle conflicts (which is ugly enough already in the existing
>> single-depth WC).
>
> Why does update with a checkpoint require rebasing? [...]

[tl;dr: Update fundamentally involves rebasing the working change.
Rebasing all the checkpoints as well is one approach but has drawbacks.
We examine other approaches.]

==========

Keep in mind that a revision -- or any versioned state -- can be thought
of in two ways: as a complete snapshot of the tree, or as representing a
change relative to the previous snapshot.

First let us be sure we understand that updating *is* rebasing. To
explain this, think about a plain WC with no checkpointing. The WC has a
base tree and a working tree. The work you do in a working copy is to
create a change based on the base. An 'update' is a request to update
the base and also to adjust the working version so that it represents
the 'same' change that it previously did, except the change is now
against the new base. Hence update is 'rebasing' your local change. The
adjustment of the old change (against the old base) to a new change
(against the new base) is accomplished by a merge.

Illustration: a plain old WC based on r20 is like a one-commit local
branch based on r20.

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                        ||
[                        ||
[ WC                   (BASE)-(WORK)

[(X) represents a tree snapshot. WORK means the working/on-disk state in
the WC, including what libsvn_wc calls 'actual'.]

Updating to r30 involves merging (r20:30) with (rBASE:WORK) to create
(WORK'), and setting the new base to r30.

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                                   ||
[                                   ||
[ WC                             (BASE')-(WORK')
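
As a rough illustration of "update is rebasing", here is a toy model in
Python. It is only a sketch and not how libsvn_wc does it: a tree is
assumed to be a plain dict of path to content, and the names merge3() and
update() are made up for this example. The merge works on whole files
and flags a conflict when both sides changed the same path differently.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> file content; a toy stand-in for a tree snapshot


def merge3(old_base: Tree, new_base: Tree, work: Tree) -> Tuple[Tree, List[str]]:
    """Merge (old_base:new_base) with (old_base:work), one whole file at a time."""
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(old_base) | set(new_base) | set(work)):
        b, t, m = old_base.get(path), new_base.get(path), work.get(path)
        if m == b:                 # no local change: take the incoming version
            result = t
        elif t == b or t == m:     # no incoming change, or both sides agree
            result = m
        else:                      # both sides changed the path differently
            conflicts.append(path)
            result = m             # keep the local version pending resolution
        if result is not None:     # None means the path is absent (deleted)
            merged[path] = result
    return merged, conflicts


def update(base: Tree, work: Tree, new_base: Tree) -> Tuple[Tree, Tree, List[str]]:
    """Return (BASE', WORK', conflicts): the local change rebased onto new_base."""
    new_work, conflicts = merge3(base, new_base, work)
    return new_base, new_work, conflicts


r20 = {"a.txt": "one", "b.txt": "two"}
r30 = {"a.txt": "one", "b.txt": "two (edited upstream)", "c.txt": "new"}
work = {"a.txt": "one (edited locally)", "b.txt": "two"}

base2, work2, conflicts = update(r20, work, r30)
print(work2)      # the local edit to a.txt, now on top of r30
print(conflicts)  # empty here: the two sides touched different files
```

The point is that WORK' is not just "WORK plus new files": it is the old
local change (BASE:WORK) re-expressed against BASE'.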

==========

Now, what is a checkpoint series? We are exploring a couple of different
definitions.

1. The saved-patches definition: a series of patches, each representing
the state of the WC at the time it was saved, enabling an earlier state
of the WC to be recovered. (The 'option 1' design implements the patches
part of this, and may later be extended to record the WC base.)

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                        ||
[                        ||
[ WC                   (BASE)
[                        \\\\_(P1)
[                         \\\_...
[                          \\_(Pn)
[                           \_(WORK)

2. The local-branch definition: a series of changes, rather like
revisions in a repository (and implemented as such in 'option 3'
design). The zero'th checkpoint "P0" is (a copy of) what was the
original WC base tree. Then there are successive snapshots P1, P2... Pn
and then a working change which is currently being edited. This behaves
like you would expect from a private, local branch in Subversion based
on the original base point from the original repository.

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                        ||
[                        ||
[ WC                    (P0)-(P1)-...-(Pn=BASE)
[                                          \
[                                           \_(WORK)

Each of these two definitions of checkpointing suggests its own
conceptual approaches to updating.
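
To make the difference between the two definitions concrete, here is a
small Python sketch. It is a toy model under my own assumptions (trees as
dicts of path to content, and the hypothetical classes SavedPatches and
LocalBranch); it does not describe the actual 'option 1' or 'option 3'
code. In the first class every checkpoint is a patch against the single
WC base; in the second, each checkpoint is a snapshot whose change is
relative to its predecessor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Tree = Dict[str, str]               # path -> content (toy tree snapshot)
Patch = Dict[str, Optional[str]]    # path -> new content, or None for a delete


def diff(old: Tree, new: Tree) -> Patch:
    """Describe how to turn `old` into `new`."""
    paths = sorted(set(old) | set(new))
    return {p: new.get(p) for p in paths if old.get(p) != new.get(p)}


def apply(tree: Tree, patch: Patch) -> Tree:
    """Apply a patch to a tree, returning a new tree."""
    out = dict(tree)
    for path, content in patch.items():
        if content is None:
            out.pop(path, None)
        else:
            out[path] = content
    return out


@dataclass
class SavedPatches:
    """Definition 1: every checkpoint is a patch against the one WC base."""
    base: Tree
    patches: List[Patch] = field(default_factory=list)

    def checkpoint(self, work: Tree) -> None:
        self.patches.append(diff(self.base, work))

    def restore(self, i: int) -> Tree:
        return apply(self.base, self.patches[i])


@dataclass
class LocalBranch:
    """Definition 2: snapshots P0..Pn; the last one acts as the WC base."""
    snapshots: List[Tree]           # snapshots[0] is P0, a copy of the original base

    def checkpoint(self, work: Tree) -> None:
        self.snapshots.append(dict(work))

    @property
    def base(self) -> Tree:
        return self.snapshots[-1]   # Pn = BASE

    def change(self, i: int) -> Patch:
        """The change that checkpoint Pi represents relative to P(i-1)."""
        return diff(self.snapshots[i - 1], self.snapshots[i])


r20 = {"a.txt": "one"}
w1 = {"a.txt": "one", "b.txt": "draft"}
w2 = {"a.txt": "ONE", "b.txt": "draft"}

sp = SavedPatches(base=r20)
lb = LocalBranch(snapshots=[dict(r20)])
for w in (w1, w2):
    sp.checkpoint(w)
    lb.checkpoint(w)

print(sp.patches[1])   # relative to BASE: both the new file and the edit
print(lb.change(2))    # relative to P1: only the edit to a.txt
```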

1. For saved-patches checkpointing, we might update the WC base and
working version, and start saving any new checkpoints relative to the
new base.

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                        ||         ||
[                        ||         ||
[ WC                   (BASE)    (BASE')
[                        \\\\_(P1)  \\_...  <= (Pn+1) will go here
[                         \\\_...    \
[                          \\_(Pn)    \
[                           \          \
[                            \_(WORK)   \_(WORK')

A consequence of different checkpoints having different bases is that
rolling back (or forward) to a checkpoint based on a different base
would require one of:

  * apply the patch using fuzzy matching
  * (if online) rebase the desired checkpoint at this time
  * (if online) update the WC base to the checkpoint's recorded base

An alternative would be that updating attempts to rebase every recorded
checkpoint patch, but see 2(a) below.
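
The second option in the list above (rebasing the desired checkpoint at
the time of rolling back) might look something like the following Python
sketch. Everything here is an assumption made for illustration: the
Checkpoint record, the roll_back() and merge3() helpers, and a 'repo'
dict standing in for fetching a base tree from the repository (which is
why that option needs to be online).

```python
from typing import Dict, List, NamedTuple, Tuple

Tree = Dict[str, str]  # path -> content (toy tree snapshot)


class Checkpoint(NamedTuple):
    base_rev: int   # revision the WC base had when the checkpoint was saved
    state: Tree     # WC state at that time (recorded base plus saved patch)


def merge3(old_base: Tree, new_base: Tree, work: Tree) -> Tuple[Tree, List[str]]:
    """Whole-file three-way merge; returns (merged tree, conflicted paths)."""
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(old_base) | set(new_base) | set(work)):
        b, t, m = old_base.get(path), new_base.get(path), work.get(path)
        if m == b:
            result = t
        elif t == b or t == m:
            result = m
        else:
            conflicts.append(path)
            result = m
        if result is not None:
            merged[path] = result
    return merged, conflicts


def roll_back(cp: Checkpoint, current_rev: int,
              repo: Dict[int, Tree]) -> Tuple[Tree, List[str]]:
    """Recover a checkpoint's change on top of the current WC base."""
    if cp.base_rev == current_rev:
        return dict(cp.state), []          # same base: nothing to rebase
    # Different base: rebase on demand, which needs the recorded base tree.
    return merge3(repo[cp.base_rev], repo[current_rev], cp.state)


repo = {20: {"a.txt": "one"}, 30: {"a.txt": "one", "c.txt": "new"}}
p1 = Checkpoint(base_rev=20, state={"a.txt": "one", "b.txt": "draft"})

state, conflicts = roll_back(p1, current_rev=30, repo=repo)
print(state)
print(conflicts)
```

Note that the conflicts, if any, surface at roll-back time rather than at
update time.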

2. For local-branch checkpointing, there are at least two ways we could
update:

(a) Update P0 and successively rebase all the changes represented by the
checkpoints: rewrite P1 (by merging) to be based on the new P0', rebase
P2 on the resulting P1', and so on.

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                                   ||
[                                   ||
[ WC                              (P0')-(P1')-...-(Pn'=BASE')
[                                                       \
[                                                        \_(WORK')

This rebasing of multiple changes, which we can accurately think of as
rebasing this local branch, is a difficult place to go, given the
suboptimal state of conflict handling in our existing single-change WC.

In the saved-patches definition of checkpointing, rebasing all the
checkpoints on update would have similar drawbacks.
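
A toy Python sketch of approach 2(a) shows where the difficulty comes
from. As before, this is only an illustration under my own assumptions
(trees as dicts, hypothetical helpers merge3() and rebase_series(), and
whole-file merging), not a proposal for the real implementation. Each
Pi' is produced by merging the change (P(i-1):Pi) onto P(i-1)', so a
conflict can turn up at any step in the chain.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> content (toy tree snapshot)


def merge3(old_base: Tree, new_base: Tree, work: Tree) -> Tuple[Tree, List[str]]:
    """Whole-file three-way merge; returns (merged tree, conflicted paths)."""
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(old_base) | set(new_base) | set(work)):
        b, t, m = old_base.get(path), new_base.get(path), work.get(path)
        if m == b:
            result = t
        elif t == b or t == m:
            result = m
        else:
            conflicts.append(path)
            result = m             # keep the checkpoint's version pending resolution
        if result is not None:
            merged[path] = result
    return merged, conflicts


def rebase_series(snapshots: List[Tree],
                  new_p0: Tree) -> Tuple[List[Tree], Dict[int, List[str]]]:
    """Rebase P0..Pn onto new_p0; return P0'..Pn' and conflicts per checkpoint."""
    rebased = [new_p0]
    conflicts: Dict[int, List[str]] = {}
    for i in range(1, len(snapshots)):
        # Pi' = P(i-1)' plus the change (P(i-1) : Pi)
        tree, c = merge3(snapshots[i - 1], rebased[i - 1], snapshots[i])
        rebased.append(tree)
        if c:
            conflicts[i] = c
    return rebased, conflicts


p0 = {"a.txt": "one"}
p1 = {"a.txt": "one", "b.txt": "draft"}
p2 = {"a.txt": "ONE", "b.txt": "draft"}
r30 = {"a.txt": "uno", "c.txt": "new"}   # upstream also edited a.txt

rebased, conflicts = rebase_series([p0, p1, p2], r30)
print(rebased[1])
print(conflicts)   # the a.txt conflict appears at checkpoint 2, not at the tip
```

In this example P1' rebases cleanly but P2' does not, so the user would be
asked to resolve a conflict in the middle of the series, and any later
checkpoints would then build on that resolution.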

(b) Keep the existing checkpoints based on the older base, and bring in
the new base to be used only for new checkpoints. For this illustration,
assume we had no outstanding working changes before running the update.
We merge the incoming changes with the *complete* checkpoint series and
record the result as (Pn').

[ repo -----(r10)------(r20)------(r30)------(r40)---
[                        ||         ||
[                        ||         ||
[    |                  (P0)      (P0')
[    |                   \              \
[ WC |                    \              \
[    |                     \              \
[    |                      (P1)-...-(Pn)--(Pn'=BASE)
[                                               \
[                                                \_(WORK')

The WC now contains two "base" snapshots (P0 and P0') copied from the
repository, and (n) checkpoint snapshots, and a merged checkpoint.
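
For comparison, approach 2(b) in the same toy Python model needs only one
merge per update. Again this is a sketch under my own assumptions (trees
as dicts, a hypothetical Series class and update_keep_history() helper,
whole-file merging, and no outstanding working changes at update time).
The old checkpoints are left untouched; the incoming change (P0:P0') is
merged with the complete series (P0:Pn) and recorded as Pn'.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> content (toy tree snapshot)


def merge3(old_base: Tree, new_base: Tree, work: Tree) -> Tuple[Tree, List[str]]:
    """Whole-file three-way merge; returns (merged tree, conflicted paths)."""
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(old_base) | set(new_base) | set(work)):
        b, t, m = old_base.get(path), new_base.get(path), work.get(path)
        if m == b:
            result = t
        elif t == b or t == m:
            result = m
        else:
            conflicts.append(path)
            result = m
        if result is not None:
            merged[path] = result
    return merged, conflicts


@dataclass
class Series:
    bases: List[Tree]                                       # P0, then P0' after an update
    checkpoints: List[Tree] = field(default_factory=list)   # P1..Pn, then Pn'


def update_keep_history(series: Series, new_p0: Tree) -> List[str]:
    """Merge the incoming change with the whole series; record the result as Pn'."""
    old_p0 = series.bases[-1]
    tip = series.checkpoints[-1] if series.checkpoints else old_p0
    merged, conflicts = merge3(old_p0, new_p0, tip)
    series.bases.append(new_p0)          # keep the old base; add the new one
    series.checkpoints.append(merged)    # Pn' becomes the new WC base
    return conflicts


p0 = {"a.txt": "one"}
p1 = {"a.txt": "one", "b.txt": "draft"}
p2 = {"a.txt": "ONE", "b.txt": "draft"}
r30 = {"a.txt": "one", "c.txt": "new"}

series = Series(bases=[p0], checkpoints=[p1, p2])
conflicts = update_keep_history(series, r30)
print(series.checkpoints[-1])                      # Pn'
print(len(series.bases), len(series.checkpoints))  # two bases, n + 1 snapshots
```

Any conflict is then confined to the single merged checkpoint, much as it
is in today's single-change WC.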

==========

At this point I am not saying one of these is better than another, just
trying to understand the options.

- Julian
Received on 2017-08-01 15:19:29 CEST
