Re: WC-NG: the trees BASE, WORKING and ACTUAL [was: svn commit: r33021 - branches/explore-wc/subversion/libsvn_wc]

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 12 Sep 2008 11:38:20 +0100

Hi Greg.

Here's a bunch more theoretical waffle from me on the subject! Enjoy :-)

STAND-ALONE TREES, OR TREES LINKED INTO A WC

One big distinction I now see is this:

When we define the meaning of one kind of Tree (say WORKING), I prefer
to define it as a stand-alone entity which can answer questions about
itself. However, I did say "WORKING gets its file content from ACTUAL",
which is contrary to that. The alternative design is that the concept of
"WORKING tree" has meaning only when it is embedded in a WC which links
it to a corresponding BASE tree and a corresponding ACTUAL tree. In that
case, it can answer questions that involve getting data from its
corresponding BASE or ACTUAL tree.

I then went on to suggest as an option that ACTUAL could present the
properties from WORKING. But it would be a bad idea to have each tree
depend on the other like this, because it would introduce a cyclic
dependency between those two trees. That doesn't sound too bad when you
first think about reading from an existing tree, but when you think
about preparing some modifications, or especially building a new tree
from scratch, it would get really hairy.

If we define the trees as stand-alone concepts that can exist with or
without being linked in to a WC, it becomes relatively easy to build a
new tree in memory, such as from the "dry run" result of a merge. All
the tree manipulation functions can be used, and we don't have to link
this dry-run tree into the WC in order to create and examine it. This
could remove a whole bunch of complexity that is currently in wc-1.0 to
handle dry runs of certain client-layer operations such as "merge" which
the WC would otherwise not need to know so much about.
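To make the stand-alone idea concrete, here is a minimal sketch in Python, the language of the pseudo-code later in this message. The `Node` and `Tree` classes, their method names, and the slash-separated path keys are all hypothetical, not a proposed WC-NG API. The only point is that such a tree can be built and queried, for instance as a merge dry-run result, without any WC, BASE or WORKING tree existing at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a delta-editor-style tree (hypothetical model)."""
    kind: str                                   # "file" or "dir"
    properties: Dict[str, str] = field(default_factory=dict)
    content: Optional[bytes] = None             # files only


class Tree:
    """A stand-alone tree: it answers questions about itself and needs
    no link to a WC or to any other tree."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {"": Node("dir")}   # "" is the root

    def _check_parent_is_dir(self, path: str) -> None:
        parent = path.rsplit("/", 1)[0] if "/" in path else ""
        if parent not in self.nodes or self.nodes[parent].kind != "dir":
            raise ValueError(f"parent of {path!r} is not a directory")

    def add_dir(self, path: str, properties=None) -> None:
        self._check_parent_is_dir(path)
        self.nodes[path] = Node("dir", dict(properties or {}))

    def add_file(self, path: str, content: bytes, properties=None) -> None:
        self._check_parent_is_dir(path)
        self.nodes[path] = Node("file", dict(properties or {}), content)

    def kind(self, path: str) -> Optional[str]:
        node = self.nodes.get(path)
        return node.kind if node else None

    def children(self, path: str) -> List[str]:
        # Names of the immediate children of the directory at PATH.
        prefix = path + "/" if path else ""
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])


# A "dry run" result built purely in memory; no WC is involved.
dry_run = Tree()
dry_run.add_dir("A")
dry_run.add_file("A/mu", b"merged text\n", {"svn:eol-style": "native"})
```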

On Thu, 2008-09-11 at 15:27 -0700, Greg Stein wrote:
> On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
> >...
> > Maybe you don't have the same definition of "a tree" as I do. I am
> > assuming we mean the sort of tree that is described by a Subversion
> > delta editor. A tree of nodes; each node is either a file or a dir; each
> > node has properties; each dir has 0 or more child nodes; each file has
> > content which is a blob of 0 or more bytes.
>
> Sure...
>
> > When you say, "Files/dirs present, but not in WORKING: unversioned
> > nodes", what about them? They are part of the ACTUAL tree? Yes, I say.
>
> ACTUAL, yes.
>
> > When you say, "Files/dirs in WORKING, but not present: missing nodes",
> > what about them? They are part of the ACTUAL tree? Yes, I say.
>
> No. Those nodes are *missing* from ACTUAL. They should be there since
> WORKING says they should be. Thus, they are missing.

I agree. Sorry, I made a careless copy-n-paste-o mistake here. So, we
agree that such files/dirs are NOT part of the ACTUAL tree.

> > And files/dirs that are in WORKING and present on disk as nodes of the
> > correct type? Yes, I say. How about you?
>
> In both WORKING and ACTUAL, yes.
>
> > And files/dirs that are on disk where WORKING says there's a node of the
> > other type? Yes, I say. How about you?
>
> The node is in both WORKING and ACTUAL, but there is now a problem. I
> don't know that we have a name for this kind of change. This isn't
> really unversioned or missing... something else.

"Obstructed" is a word that we use.

But, in terms of defining the meaning of the tree kind "ACTUAL", I don't
see a problem with saying that the ACTUAL tree contains a directory at
path "foo/bar", while the corresponding WORKING tree contains a file at
path "foo/bar" (or a directory, or a symlink, or nothing).

Describing a CHANGE of node kind means considering the relationship
between two kinds of tree rooted at the same path. For now, I am
concentrating on the definition of one kind of tree; we'll come to
expressing relationships between different kinds later.

> Also: note that we should be talking about symlinks, too. They are
> moving to a first-order node type in the new WC library.

OK. Good.
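If symlinks become a first-order node kind, anything that walks the disk to build an ACTUAL tree has to classify a node without following links, because `os.path.isfile()` and `os.path.isdir()` both follow symlinks. A minimal sketch using `os.lstat()`; the function name and the "special" bucket for FIFOs, sockets and devices are assumptions for illustration, and how a WC library should really treat such nodes is left open.

```python
import os
import stat
import tempfile
from typing import Optional


def disk_node_kind(path: str) -> Optional[str]:
    """Classify the on-disk node at PATH without following symlinks.

    Returns "symlink", "dir", "file", "special" (hypothetical bucket for
    anything else), or None if nothing is present at PATH."""
    try:
        mode = os.lstat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        return None
    if stat.S_ISLNK(mode):          # check links first: they are nodes too
        return "symlink"
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISREG(mode):
        return "file"
    return "special"


# Usage: a file, a symlink to it, and a name with nothing on disk.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.txt")
    with open(target, "w") as f:
        f.write("hello\n")
    os.symlink(target, os.path.join(root, "link"))
    root_kind = disk_node_kind(root)
    kinds = {name: disk_node_kind(os.path.join(root, name))
             for name in ("target.txt", "link", "absent")}
```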

> > And the properties of each node are?
>
> Whatever WORKING says about the properties. ACTUAL cannot represent them.

Ah... Two different levels of abstraction, perhaps.

Your first answer, "Whatever WORKING says", is the answer to "from a
high level point of view, what are the properties of the working node at
PATH?" Indeed, from this point of view, ACTUAL does not have the
answer[1], and so the desired answer needs to be fetched from WORKING
instead. The question is, which layer redirects and fetches the answer
from WORKING instead: the caller, or the ACTUAL tree?

Let design 1 be: the caller redirects its question. The caller has to
know that ACTUAL does not have the working properties. The model could
be that the ACTUAL tree says "You ask me? I tell you there are no
properties." The caller knows to ignore this answer and go elsewhere if
it really wants to find the WORKING properties.[2]

Let design 2 be: the ACTUAL tree redirects to the WORKING tree. The
model of the ACTUAL tree is that it knows the properties, even though
under the hood it has to go to the corresponding WORKING tree to find
them. This model is very different because the trees are not
independent. Whenever we ask a question about an ACTUAL tree there has
to be a corresponding WORKING tree linked to it, or provided by the
caller.
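The difference between the two designs shows up already in the constructors. In this minimal sketch (class and function names are hypothetical, and plain dicts stand in for the WORKING tree), a design 1 ACTUAL tree can be created on its own and always answers "no properties", leaving the caller to redirect; a design 2 ACTUAL tree cannot even be constructed without a WORKING tree to delegate to.

```python
from typing import Dict, Iterable

Props = Dict[str, str]
WorkingProps = Dict[str, Props]   # stand-in for a WORKING tree: path -> props


class Design1Actual:
    """Design 1: a stand-alone ACTUAL tree. Properties are not disk
    artifacts, so it always answers 'no properties'."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.paths = set(paths)

    def properties(self, path: str) -> Props:
        return {}


def caller_working_props(working: WorkingProps, path: str) -> Props:
    """Design 1: the caller knows ACTUAL has no working properties and
    redirects its question to WORKING itself."""
    return dict(working.get(path, {}))


class Design2Actual:
    """Design 2: ACTUAL presents the properties, but only by delegating
    to the WORKING tree it is linked to; it cannot exist without one."""

    def __init__(self, paths: Iterable[str], working: WorkingProps) -> None:
        self.paths = set(paths)
        self._working = working

    def properties(self, path: str) -> Props:
        return dict(self._working.get(path, {}))


working = {"A/foo": {"svn:mime-type": "text/plain"}}
actual1 = Design1Actual(["A/foo"])           # no WORKING needed
actual2 = Design2Actual(["A/foo"], working)  # WORKING is mandatory
```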

Let's implement the "svn add" subcommand, in pseudo-Python, assuming
design (1).

  import os

  # SvnTreeCreateEmpty(), generate_auto_props() and the tree and WC
  # methods used below are hypothetical WC-NG APIs.

  # Add the disk node at PATH to the tree TREE, recursively.
  # Make a full in-memory representation, including file contents.
  # (That's not a good example of how to implement for real.)
  # Give every node in the tree no properties.

  def build_actual_tree(tree, path):
    if os.path.isfile(path):
      with open(path, 'rb') as f:
        tree.add_file(path = path,
                      content = f.read(),
                      properties = {})
    elif os.path.isdir(path):
      tree.add_dir(path = path,
                   properties = {})
      # os.listdir() returns bare names, so join them onto PATH.
      for child_name in os.listdir(path):
        build_actual_tree(tree, os.path.join(path, child_name))

  class WC:
    # (Other working-copy methods, such as add_subtree(), omitted.)

    # Take an unversioned "actual" tree NEW_ACTUAL_SUBTREE, and
    # schedule it for addition in this working copy.
    # Assume NEW_ACTUAL_SUBTREE has no properties, and set the
    # "working" properties to ones calculated by the auto-props
    # mechanism.

    def add_unversioned_tree(self, new_actual_subtree):
      new_working_subtree = new_actual_subtree.deep_copy()
      for node in new_working_subtree:
        assert node.properties == {}
        node.properties = generate_auto_props(node)
      new_base_subtree = SvnTreeCreateEmpty()
      self.add_subtree(new_base_subtree,
                       new_working_subtree,
                       new_actual_subtree)

  # Make the unversioned disk tree at TARGET_PATH become versioned
  # in the working copy WC, which must already include TARGET_PATH's
  # parent dir as a versioned directory.

  def svn_client_add(wc, target_path):
    new_actual_subtree = SvnTreeCreateEmpty()
    build_actual_tree(new_actual_subtree, target_path)
    wc.add_unversioned_tree(new_actual_subtree)

The point I hope it demonstrates is that we can construct and manipulate
an ACTUAL tree model by itself, and only later link it to a WORKING tree
and a BASE tree within a WC.
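The same point as a runnable toy, assuming (hypothetically) that a tree is just a dict mapping each path to its kind and properties, that `WC` does nothing but hold three such dicts, and that `generate_auto_props()` applies one made-up rule in place of the real auto-props configuration. The ACTUAL tree is built first with no WC involved and is only linked in afterwards.

```python
import copy
from typing import Dict

# Hypothetical tree model: path -> {"kind": "file"|"dir", "props": {...}}.
Tree = Dict[str, dict]


def generate_auto_props(path: str, node: dict) -> Dict[str, str]:
    """Made-up stand-in for the auto-props mechanism."""
    if node["kind"] == "file" and path.endswith(".txt"):
        return {"svn:mime-type": "text/plain"}
    return {}


class WC:
    """Hypothetical WC that only links three stand-alone trees."""

    def __init__(self) -> None:
        self.base: Tree = {}
        self.working: Tree = {}
        self.actual: Tree = {}

    def add_subtree(self, base: Tree, working: Tree, actual: Tree) -> None:
        self.base.update(base)
        self.working.update(working)
        self.actual.update(actual)

    def add_unversioned_tree(self, new_actual: Tree) -> None:
        # WORKING starts as an independent copy of the ACTUAL tree...
        new_working = copy.deepcopy(new_actual)
        for path, node in new_working.items():
            assert node["props"] == {}
            # ...and gains the properties chosen by auto-props.
            node["props"] = generate_auto_props(path, node)
        # BASE stays empty: nothing here exists in the repository yet.
        self.add_subtree({}, new_working, new_actual)


# The ACTUAL tree is built with no WC in sight...
actual = {"doc": {"kind": "dir", "props": {}},
          "doc/readme.txt": {"kind": "file", "props": {}}}
# ...and only afterwards linked into one.
wc = WC()
wc.add_unversioned_tree(actual)
```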

I should repeat the experiment with design (2) and contrast them, but I
haven't time.

> Unversioned nodes (things in ACTUAL, but not WORKING) will (obviously)
> have no properties.

> >> >> BASE + Subversion-managed changes = WORKING.
> >> >> WORKING + non-Subversion-managed changes = ACTUAL.
> >>
> >> Yup. Note that WORKING *may* include text-mod flags. If somebody does
> >> an "svn edit", then a flag will get recorded saying "looks like this
> >> file was modified" (or is likely to have been). But WORKING is purely
> >> an admin thing. You have to look at ACTUAL to find *real* text mods.
> >
> > You're now talking about WORKING including "flags". This is not
> > impossible: I've wondered whether these "trees" need to be augmented by
> > bits of metadata like this. So, are you saying that the term "WORKING
> > tree" defines a set of state recorded in the implementation, rather
> > than defining an abstract tree concept?
>
> Not sure what you mean by "recorded in the implementation". The
> WORKING tree has a set of flags (and other state) that records its
> delta from BASE. Simple as that.

Right, in the implementation of the WC library with its metadata store.
But in the MODEL of the working tree, i.e. what the API user sees when
asking questions about it, the tree consists of only files and
directories and properties. Well, and some other metadata about it (its
relationship to the repository etc.), but it should not expose the flags
that record its delta from BASE. In other words, I would expect an API
like

svn_wc_get_property 'MIME type' of path 'A/foo' in WORKING tree ()

to respond with

"text/plain"

not with

"same as in BASE"

Basically, I am talking purely from a black-box perspective as a user of
the WC library, whereas I think you are talking about what goes on
inside the WC library.
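A minimal sketch of that black-box split, with hypothetical names throughout: internally the WORKING tree records "same as in BASE" as a sentinel, but every query resolves the sentinel, so a caller only ever sees concrete property values.

```python
from typing import Dict, Optional, Union

Props = Dict[str, str]


class _SameAsBase:
    """Internal marker: this node's properties are unchanged from BASE."""


SAME_AS_BASE = _SameAsBase()


class WorkingTree:
    """Hypothetical WORKING tree that stores only its delta from BASE,
    but answers every question with concrete values."""

    def __init__(self, base_props: Dict[str, Props]) -> None:
        self._base_props = base_props
        # Internal state only; never returned to callers.
        self._props: Dict[str, Union[_SameAsBase, Props]] = {
            path: SAME_AS_BASE for path in base_props}

    def get_properties(self, path: str) -> Props:
        recorded = self._props[path]
        if recorded is SAME_AS_BASE:
            return dict(self._base_props[path])
        return dict(recorded)

    def get_property(self, path: str, name: str) -> Optional[str]:
        return self.get_properties(path).get(name)

    def set_property(self, path: str, name: str, value: str) -> None:
        # Once changed, the node records its own full property set.
        props = self.get_properties(path)
        props[name] = value
        self._props[path] = props


working = WorkingTree({"A/foo": {"svn:mime-type": "text/plain"}})
mime_type = working.get_property("A/foo", "svn:mime-type")  # not "same as in BASE"
working.set_property("A/foo", "svn:eol-style", "native")
```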

[...]
> > Whether we need to expose three
> > trees, to be able to distinguish not only the pristine version but also
> > between the working version as told to Subversion, and the nodes on disk
> > as modified outside Subversion, I'm not 100% sure, but it seems
> > reasonable that we do need to distinguish these.
>
> Yes. The BASE is very distinct, and has separate APIs to operate on
> it. The WORKING/ACTUAL is a much more grey boundary, and the API
> doesn't try to expose them as two entirely separate trees.

I was trying to say we should expose three trees (BASE, WORKING, ACTUAL)
separately, but you're saying we should expose two (BASE,
WORKING/ACTUAL). You may well be right. In that case, we need an
out-of-band (out-of-tree) mechanism for describing the differences
between WORKING and ACTUAL.
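One way to picture such an out-of-band description: a function that takes two stand-alone trees and reports how they differ, leaving each tree's own definition untouched. This sketch (hypothetical names, with each tree reduced to a path-to-kind map) covers only the cases named earlier in this thread (unversioned, missing, and obstructed) and is not meant to be a complete list.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each tree maps a path to its node kind.
Tree = Dict[str, str]   # e.g. {"A": "dir", "A/mu": "file"}


def working_actual_differences(working: Tree,
                               actual: Tree) -> List[Tuple[str, str]]:
    """Describe, per path, how ACTUAL differs from WORKING."""
    report = []
    for path in sorted(set(working) | set(actual)):
        w_kind, a_kind = working.get(path), actual.get(path)
        if w_kind is None:
            report.append((path, "unversioned"))   # in ACTUAL only
        elif a_kind is None:
            report.append((path, "missing"))       # in WORKING only
        elif w_kind != a_kind:
            report.append((path, "obstructed"))    # in both, kinds differ
    return report


working = {"A": "dir", "A/B": "dir", "A/gone": "file", "A/mu": "file"}
actual = {"A": "dir", "A/B": "file", "A/mu": "file", "A/new": "file"}
diffs = working_actual_differences(working, actual)
```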

> Definitely seems that it would be a Good Thing to enumerate how these
> two trees can differ. It is a finite list. I'll update the doc with
> that.
>
> Cheers,
> -g

[1] assuming that the definitions of WORKING and ACTUAL are, as we have
mostly been assuming, fairly closely tied to what data is stored where
in the implementation. For example, we say that ACTUAL does not directly
"have" the properties because they are not operating-system artifacts.

[2] Or the model could be that the ACTUAL tree says "You ask me for
properties? Don't be daft. ERROR!"

- Julian

Received on 2008-09-12 12:38:39 CEST
