unified tree structure

From: Sam TH <sam_at_uchicago.edu>
Date: 2001-04-27 00:35:57 CEST

On Thu, Apr 26, 2001 at 04:37:52PM -0500, Ben Collins-Sussman wrote:
> Sam TH <sam@uchicago.edu> writes:
>
> > Yeah, I understand what you're saying now, and it makes more sense.
> > But I still disagree. I don't think we should partition tests this
> > way. I think we should do as much as possible to make sure that every
> > representation of every change is consistent. Otherwise, the test
> > suite will miss things that it could catch, which I think would be a
> > serious problem.
>
> I just can't wrap my brain around your Grand Unified Tree Theory. Can
> you give examples of how it might work? Or problems it would solve?
>

Ok, here goes.

> > What I would eventually like to see is something like this:
> >
> > return compare_lots_of_trees(exp_tree, wc_tree, output_tree . . .)
> >
> > So that you just verify the consistency of everything. Obviously, for
> > updates, for example, you can't use the same wc_tree as for checkouts,
> > since the update doesn't touch the whole tree. But I think we should
> > try for as much consistency as possible.
>
> That's exactly why I'm confused. What does a "universal tree" mean in
> your world?
>
> Since examining disk contents is highest priority, let's assume that
> any given "tree" object represents all information about a working
> copy: its structure, file contents, and properties.
>
> Now, when I get subcommand output that looks like
>
> D /foo/bar
> A /foo/baz/bop
> UU /gleeb/blort
>
> You want to write python code that actually *understands* the status
> codes and modifies a pristine tree object? And then compare this tree
> to the actual working copy on disk?

What I suggested in a previous email was something like this:

We take a regular working copy. We modify the repository it's from
via some other working copy. Then we update the working copy. We now
have a number of representations of what happened.

1. The output of 'svn up'.
2. The expected changes.
3. The changes to the entries files.
4. The changes to the actual working copy.

I really think that it's important that we verify that all of these
are consistent. The way you're talking about testing this update is
to do basically two tests:

1. The output against the expected output.
2. The new working copy against the expected new working copy.

There are two major problems with this solution, as I see it.

A. We have to maintain two different expected output sets, which can
   certainly skew if we aren't careful. In fact, I think that it would
   be safe to assume that at some point, they will skew.

B. We are going to want to do tests where this model is simply
   impractical. There are a number of possible cases:
     i. really huge data sets
    ii. lots and lots (like hundreds) of changes. This would
        require hundreds of expected copies.

My suggestion would be as follows:

We build a tree from the working copy before. We run the update. We
build another wc tree. We look at the trees. We create a tree based
on the differences. We can do the same (at no extra cost) for the
entries files. Then we compare these trees to the tree from the
output.
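
A minimal sketch of the before/after comparison described above, in
Python. Nothing here comes from the actual test suite: the names
snapshot, diff_trees and ADMIN_DIR are hypothetical, the "tree" is
flattened to a dict of path to content hash, and the single-letter
codes (A, D, U) are borrowed from the quoted output example. It only
looks at file contents, not properties or entries files.

```python
import hashlib
import os

# Assumption: name of the administrative directory to skip.
ADMIN_DIR = ".svn"


def snapshot(root):
    """Map each file path (relative to root) to a hash of its contents."""
    tree = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune administrative directories so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d != ADMIN_DIR]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                tree["/" + rel] = hashlib.sha1(f.read()).hexdigest()
    return tree


def diff_trees(before, after):
    """Describe the differences between two snapshots as path -> code."""
    changes = {}
    for path in before.keys() - after.keys():
        changes[path] = "D"  # present before, gone after
    for path in after.keys() - before.keys():
        changes[path] = "A"  # new after the update
    for path in before.keys() & after.keys():
        if before[path] != after[path]:
            changes[path] = "U"  # contents changed
    return changes
```

Calling snapshot() before and after the update and passing both
results to diff_trees() gives a change set derived from the disk
alone, which can then be checked against the other representations.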

> This seems enormously complex to me. It sounds like you're basically
> talking about re-implementing libsvn_wc in python. Besides... how
> could you possibly modify the tree object to match what's on disk?
> No subcommand output actually describes the textual or property
> patches coming in.
>

Yes, this involves potentially replicating some subversion
functionality. But I think that's inevitable. In order to test that
we get the right output, we have two choices:

1. People can duplicate the functionality of subversion, and create
   lots of expected copies.

2. We can implement the functionality in python.

I think the latter will be less work, and more flexible. And it's a
job for a computer anyway.

> That's why I think the *best* we can do is match regular expressions
> against subcommand output. That's what I was doing before you came
> along. Your trees are great -- I can use them as (essentially fancy)
> regexp matchers on subcommand output, and can use slightly different
> trees to examine disk contents. There's already a whole lot of
> elegance and overlap going on. But I just can't see our use-cases
> of trees ever coming together. It doesn't make sense to me.

Ok, here's the major difference. You're looking at the output, and
seeing a bunch of strings, which can be dealt with handily via trees.
But when I look at the output, I see a tree, represented as a set of
strings. Subversion is all about operations on trees, and the output
is just a representation of that.

Fundamentally, the output is just one more representation of what's
going on in the tree. I think that we need to verify that
representation against all the others.
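
To illustrate reading the output as one more representation of the
tree, here is a rough Python sketch. It assumes each output line is a
status code, whitespace, then a path, as in the quoted example.
parse_output is a hypothetical helper, and compare_lots_of_trees
borrows only its name from the call proposed earlier in the thread;
the signature and behaviour here are guesses. Reconciling different
code vocabularies between representations (for example UU in the
output versus a plain content change on disk) is left out.

```python
def parse_output(lines):
    """Turn 'CODE /path' lines into a dict of path -> status code."""
    changes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace: code, then path.
        status, path = line.split(None, 1)
        changes[path] = status
    return changes


def compare_lots_of_trees(reference, **others):
    """Check every named representation against the reference.

    Returns a dict of name -> sorted list of paths that disagree;
    an empty dict means all representations are consistent.
    """
    mismatches = {}
    for name, tree in others.items():
        bad = {p for p in reference.keys() | tree.keys()
               if reference.get(p) != tree.get(p)}
        if bad:
            mismatches[name] = sorted(bad)
    return mismatches


expected = {"/foo/bar": "D", "/foo/baz/bop": "A", "/gleeb/blort": "UU"}
output = parse_output(["D /foo/bar", "A /foo/baz/bop", "UU /gleeb/blort"])
result = compare_lots_of_trees(expected, output=output)
```

With a single expected change set as the reference, the output, the
entries files and the disk contents can each be passed as one more
keyword argument instead of being maintained as separate expectations.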

Does that make more sense?

sam th --- sam_at_uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dynds.org/decss


Received on Sat Oct 21 14:36:29 2006
