Re: WC-NG: the trees BASE, WORKING and ACTUAL [was: svn commit: r33021 - branches/explore-wc/subversion/libsvn_wc]

From: Greg Stein <gstein_at_gmail.com>
Date: Fri, 12 Sep 2008 09:36:41 -0700

Hey Julian,

Thanks for all the thinking about this, but I'm just not seeing it as
being all that complicated. Near the end of this note, you point out
more or less what I'm thinking:

* one BASE tree and its API
* one API to access the WORKING/ACTUAL "tree" in a blended form

The WORKING and ACTUAL trees are conceptually different, but are
generally used together. I've detailed the potential differences in
wc-ng-design.

I also tend to disagree with the notion of trying to work with these
trees independently. All three are tied to a specific path in the
local filesystem. Given PATH, you will have an associated BASE tree, a
WORKING tree, and at PATH on the disk, the ACTUAL tree. I don't see a
need to work with them independently because that will simply never
happen (nor need to, afaik).

And note that we generally shouldn't try to construct trees (and
especially not their contents!) in memory since that is unbounded.
Yes, I know we do, but we should avoid it whenever possible.
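
A minimal sketch of the bounded alternative, assuming a hypothetical
helper `iter_actual_nodes` (not part of libsvn_wc): instead of building
the ACTUAL tree in memory, hand out disk nodes one at a time and never
read file contents.

```python
import os
from typing import Iterator, Tuple


def iter_actual_nodes(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, kind) for each disk node under ROOT.

    Hypothetical helper for illustration. Nothing is accumulated except
    the stack of directories still to visit; file contents are never read.
    """
    stack = [root]
    while stack:
        dir_path = stack.pop()
        with os.scandir(dir_path) as entries:
            for entry in entries:
                rel = os.path.relpath(entry.path, root)
                if entry.is_symlink():
                    # Symlinks are reported as their own kind, not followed.
                    yield rel, "symlink"
                elif entry.is_dir(follow_symlinks=False):
                    yield rel, "dir"
                    stack.append(entry.path)
                else:
                    yield rel, "file"
```

A caller can compare each yielded node against the WC metadata as it
arrives, so memory use does not grow with the number of files.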

The design is currently along the lines of your design (2): the user of
the API never has to redirect a question from one tree to the other. You
use one of the two tree APIs based on what you're looking for.
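
A rough sketch of that two-API shape, under stated assumptions: the
class and method names (`BaseTree`, `WorkingActualView`, `status`) are
invented for illustration and are not the wc-ng API. Properties come
from WORKING, node presence comes from the disk, and the caller never
redirects between the two.

```python
import os
from typing import Dict, Optional


class BaseTree:
    """Stand-in for the BASE API: pristine node kinds keyed by relative path."""

    def __init__(self, kinds: Dict[str, str]) -> None:
        self._kinds = kinds

    def kind(self, relpath: str) -> Optional[str]:
        return self._kinds.get(relpath)


class WorkingActualView:
    """Stand-in for the blended WORKING/ACTUAL API (hypothetical names)."""

    def __init__(self, wc_root: str,
                 working_kinds: Dict[str, str],
                 working_props: Dict[str, Dict[str, str]]) -> None:
        self._wc_root = wc_root
        self._working_kinds = working_kinds
        self._working_props = working_props

    def actual_kind(self, relpath: str) -> Optional[str]:
        """Kind of the node on disk, or None if nothing is there."""
        full = os.path.join(self._wc_root, relpath)
        if os.path.islink(full):
            return "symlink"
        if os.path.isdir(full):
            return "dir"
        if os.path.isfile(full):
            return "file"
        return None

    def props(self, relpath: str) -> Dict[str, str]:
        """Whatever WORKING says; unversioned nodes have no properties."""
        return dict(self._working_props.get(relpath, {}))

    def status(self, relpath: str) -> str:
        """Classify how WORKING and ACTUAL differ at RELPATH."""
        working = self._working_kinds.get(relpath)
        actual = self.actual_kind(relpath)
        if working is None and actual is None:
            return "none"
        if working is None:
            return "unversioned"   # in ACTUAL, but not WORKING
        if actual is None:
            return "missing"       # in WORKING, but not on disk
        if working != actual:
            return "obstructed"    # both present, kinds disagree
        return "normal"
```

For example, with a file "iota" on disk that WORKING also records as a
file, `view.status("iota")` gives "normal", while a directory on disk
where WORKING expects a file gives "obstructed".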

Cheers,
-g

On Fri, Sep 12, 2008 at 3:38 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
> Hi Greg.
>
> Here's a bunch more theoretical waffle from me on the subject! Enjoy :-)
>
>
> STAND-ALONE TREES, OR TREES LINKED INTO A WC
>
> One big distinction I now see is this:
>
> When we define the meaning of one kind of Tree (say WORKING), I prefer
> to define it as a stand-alone entity which can answer questions about
> itself. However, I did say "WORKING gets its file content from ACTUAL",
> which is contrary to that. The alternative design is that the concept of
> "WORKING tree" has meaning only when it is embedded in a WC which links
> it to a corresponding BASE tree and a corresponding ACTUAL tree. In that
> case, it can answer questions that involve getting data from its
> corresponding BASE or ACTUAL tree.
>
> I then went on to suggest as an option that ACTUAL could present the
> properties from WORKING. But it would be a bad idea to have each tree
> depend on the other like this, because it would introduce a cyclic
> dependency between those two trees. That doesn't sound too bad when you
> first think about reading from an existing tree, but when you think
> about preparing some modifications, or especially building a new tree
> from scratch, it would get really hairy.
>
> If we define the trees as stand-alone concepts that can exist with or
> without being linked in to a WC, it becomes relatively easy to build a
> new tree in memory, such as from the "dry run" result of a merge. All
> the tree manipulation functions can be used, and we don't have to link
> this dry-run tree into the WC in order to create and examine it. This
> could remove a whole bunch of complexity that is currently in wc-1.0 to
> handle dry runs of certain client-layer operations such as "merge" which
> the WC would otherwise not need to know so much about.
>
>
> On Thu, 2008-09-11 at 15:27 -0700, Greg Stein wrote:
>> On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
>> >...
>> > Maybe you don't have the same definition of "a tree" as I do. I am
>> > assuming we mean the sort of tree that is described by a Subversion
>> > delta editor. A tree of nodes; each node is either a file or a dir; each
>> > node has properties; each dir has 0 or more child nodes; each file has
>> > content which is a blob of 0 or more bytes.
>>
>> Sure...
>>
>> > When you say, "Files/dirs present, but not in WORKING: unversioned
>> > nodes", what about them? They are part of the ACTUAL tree? Yes, I say.
>>
>> ACTUAL, yes.
>>
>> > When you say, "Files/dirs in WORKING, but not present: missing nodes",
>> > what about them? They are part of the ACTUAL tree? Yes, I say.
>>
>> No. Those nodes are *missing* from ACTUAL. They should be there since
>> WORKING says they should be. Thus, they are missing.
>
> I agree. Sorry, I made a careless copy-n-paste-o mistake here. So, we
> agree that such files/dirs are NOT part of the ACTUAL tree.
>
>> > And files/dirs that are in WORKING and present on disk as nodes of the
>> > correct type? Yes, I say. How about you?
>>
>> In both WORKING and ACTUAL, yes.
>>
>> > And files/dirs that are on disk where WORKING says there's a node of the
>> > other type? Yes, I say. How about you?
>>
>> The node is in both WORKING and ACTUAL, but there is now a problem. I
>> don't know that we have a name for this kind of change. This isn't
>> really unversioned or missing... something else.
>
> "Obstructed" is a word that we use.
>
> But, in terms of defining the meaning of the tree kind "ACTUAL", I don't
> see a problem with saying that the ACTUAL tree contains a directory at
> path "foo/bar", while the corresponding WORKING tree contains a file at
> path "foo/bar" (or a directory, or a symlink, or nothing).
>
> When we want to describe the CHANGE of node kind, that's when we're
> considering the relationship between two kinds of tree rooted at the
> same path. As far as I'm concerned, I am presently concentrating on the
> definition of one kind of tree. We'll come to expressing relationships
> between different kinds later.
>
>> Also: note that we should be talking about symlinks, too. They are
>> moving to a first-order node type in the new WC library.
>
> OK. Good.
>
>> > And the properties of each node are?
>>
>> Whatever WORKING says about the properties. ACTUAL cannot represent them.
>
> Ah... Two different levels of abstraction, perhaps.
>
> Your first answer, "Whatever WORKING says", is the answer to "from a
> high level point of view, what are the properties of the working node at
> PATH?" Indeed, from this point of view, ACTUAL does not have the
> answer[1], and so the desired answer needs to be fetched from WORKING
> instead. The question is, which layer redirects and fetches the answer
> from WORKING instead: the caller, or the ACTUAL tree?
>
> Let design 1 be: the caller redirects its question. The caller has to
> know that ACTUAL does not have the working properties. The model could
> be that the ACTUAL tree says "You ask me? I tell you there are no
> properties." The caller knows to ignore this answer and go elsewhere if
> it really wants to find the WORKING properties.[2]
>
> Let design 2 be: the ACTUAL tree redirects to the WORKING tree. The
> model of the ACTUAL tree is that it knows the properties, even though
> under the hood it has to go to the corresponding WORKING tree to find
> them. This model is very different because the trees are not
> independent. Whenever we ask a question about an ACTUAL tree there has
> to be a corresponding WORKING tree linked to it, or provided by the
> caller.
>
> Let's implement the "svn add" subcommand, in pseudo-Python, assuming
> design (1).
>
> # Add the disk node at PATH to the tree TREE, recursively.
> # Make a full in-memory representation, including file contents.
> # (That's not a good example of how to implement for real.)
> # Give every node in the tree no properties.
>
> def build_actual_tree(tree, path):
>     disk_node_kind = os.get_node_kind(path)
>     if disk_node_kind == file:
>         tree.add_file(path = path,
>                       content = os.readfile(path),
>                       properties = {})
>     elif disk_node_kind == dir:
>         tree.add_dir(path = path,
>                      properties = {})
>         for child_path in os.readdir(path):
>             build_actual_tree(tree, child_path)
>
> # Take an unversioned "actual" tree NEW_ACTUAL_SUBTREE, and
> # schedule it for addition in the working copy WC.
> # Assume NEW_ACTUAL_SUBTREE has no properties, and set the
> # "working" properties to ones calculated by the auto-props
> # mechanism.
>
> def wc.add_unversioned_tree(new_actual_subtree):
>     new_working_subtree = new_actual_subtree.deep_copy()
>     for node in new_working_subtree:
>         assert node.properties == {}
>         node.properties = generate_auto_props(node)
>     new_base_subtree = SvnTreeCreateEmpty()
>     wc.add_subtree(new_base_subtree,
>                    new_working_subtree,
>                    new_actual_subtree)
>
> # Make the unversioned disk tree at TARGET_PATH become versioned
> # in the working copy WC which must already include TARGET_PATH's
> # parent dir as a versioned directory.
>
> def svn_client_add(wc, target_path):
>     new_actual_subtree = SvnTreeCreateEmpty()
>     build_actual_tree(new_actual_subtree, target_path)
>     wc.add_unversioned_tree(new_actual_subtree)
>
>
> The point I hope it demonstrates is that we can construct and manipulate
> an ACTUAL tree model by itself, and only later link it to a WORKING tree
> and a BASE tree within a WC.
>
> I should repeat the experiment with design (2) and contrast them, but I
> haven't time.
>
>
>> Unversioned nodes (things in ACTUAL, but not WORKING) will (obviously)
>> have no properties.
>
>> >> >> BASE + Subversion-managed changes = WORKING.
>> >> >> WORKING + non-Subversion-managed changes = ACTUAL.
>> >>
>> >> Yup. Note that WORKING *may* include text-mod flags. If somebody does
>> >> an "svn edit", then a flag will get recorded saying "looks like this
>> >> file was modified" (or is likely to have been). But WORKING is purely
>> >> an admin thing. You have to look at ACTUAL to find *real* text mods.
>> >
>> > You're now talking about WORKING including "flags". This is not
>> > impossible: I've wondered whether these "trees" need to be augmented by
>> > bits of metadata like this. So, are you saying that the term "WORKING
>> > tree" defines a set of state recorded in the implementation, rather
>> > than defining an abstract tree concept?
>>
>> Not sure what you mean by "recorded in the implementation". The
>> WORKING tree has a set of flags (and other state) that records its
>> delta from BASE. Simple as that.
>
> Right, in the implementation of the WC library with its metadata store.
> But in the MODEL of the working tree, i.e. what the API user sees when
> asking questions about it, the tree consists of only files and
> directories and properties. Well, and some other metadata about it (its
> relationship to the repository etc.), but it should not expose the flags
> that record its delta from BASE. In other words, I would expect an API
> like
>
> svn_wc_get_property 'MIME type' of path 'A/foo' in WORKING tree ()
>
> to respond with
>
> "text/plain"
>
> not with
>
> "same as in BASE"
>
> Basically, I am talking purely from a black-box perspective as a user of
> the WC library, whereas I think you are talking about what goes on
> inside the WC library.
>
>
> [...]
>> > Whether we need to expose three
>> > trees, to be able to distinguish not only the pristine version but also
>> > between the working version as told to Subversion, and the nodes on disk
>> > as modified outside Subversion, I'm not 100% sure, but it seems
>> > reasonable that we do need to distinguish these.
>>
>> Yes. The BASE is very distinct, and has separate APIs to operate on
>> it. The WORKING/ACTUAL is a much more grey boundary, and the API
>> doesn't try to expose them as two entirely separate trees.
>
> I was trying to say we should expose three trees (BASE, WORKING, ACTUAL)
> separately, but you're saying we should expose two (BASE,
> WORKING/ACTUAL). You may well be right. In that case, we need an
> out-of-band (out-of-tree) mechanism for describing the differences
> between WORKING and ACTUAL.
>
>> Definitely seems that it would be a Good Thing to enumerate how these
>> two trees can differ. It is a finite list. I'll update the doc with
>> that.
>>
>> Cheers,
>> -g
>
> [1] assuming that the definitions of WORKING and ACTUAL are, as we have
> mostly been assuming, fairly closely tied to what data is stored where
> in the implementation. For example, saying that ACTUAL does not directly
> "have" the properties because they are not operating-system artifacts.
>
> [2] Or the model could be that the ACTUAL tree says "You ask me for
> properties? Don't be daft. ERROR!"
>
> - Julian
>
>
>

Received on 2008-09-12 18:36:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.
