Re: difference between working tree and actual tree

From: Stefan Sperling <stsp_at_elego.de>
Date: Wed, 22 Jul 2009 10:56:06 +0100

On Wed, Jul 22, 2009 at 04:34:18PM +0800, HuiHuang wrote:
> Hey,
>
> In notes/wc-ng-design the description of working tree and actual tree is as
> follows:
>
> * WORKING: The tree that represents the user's view of the WC with their
> local modifications (assuming the user told Subversion about these
> modifications with "svn add" etc. as required). In implementation, the
> WORKING tree has the structure and properties recorded in the WC, and
> the file content present on the local disk. (If a file cannot be
> accessed because the tree structure on the local disk is incompatible,
> this is an error, known as an "obstruction".)
>
> * ACTUAL: The tree on the local disk, ignoring Subversion
> administrative directories and other nodes that Subversion has
> knowingly put there such as conflict reject files, and regarding
> every node as having no Subversion properties.
>
> My understanding is that the actual tree is the same as the working tree except
> that the former does not include Subversion-specific data (such as Subversion adm
> data, conflict data, Subversion properties and so on). Is that right?

Yes.

The wc-1 working copy code has no clear distinction between "Subversion
thinks a file should be on disk" and "a file is on disk".
You have to actually look at the disk every time you want to know
about items on disk.

If you look at the code, you'll see a mixture of adm_access related
calls (for locking a directory), and functions reading from entries
(for getting meta data about items), and also svn_io_check_path()
(for checking items on disk). And all these functions occur together
in a single code path.

Take this long example from libsvn_wc/adm_ops.c, function erase_from_wc():

      /* First handle the versioned items, this is better (probably) than
         simply using svn_io_get_dirents2 for everything as it avoids the
         need to do svn_io_check_path on each versioned item */
      err = svn_wc_adm_retrieve(&dir_access, adm_access, path, pool);

      /* If there's no on-disk item, be sure to exit early and
         not to return an error */
      if (err)
        {
          svn_node_kind_t wc_kind;
          svn_error_t *err2 = svn_io_check_path(path, &wc_kind, pool);

          if (err2)
            {
              svn_error_clear(err);
              return err2;
            }

          if (wc_kind != svn_node_none)
            return err;

          svn_error_clear(err);
          return SVN_NO_ERROR;
        }

      SVN_ERR(svn_wc_entries_read(&ver, dir_access, FALSE, pool));
      iterpool = svn_pool_create(pool);
      for (hi = apr_hash_first(pool, ver); hi; hi = apr_hash_next(hi))
        {
          const void *key;
          void *val;
          const char *name;
          const char *down_path;

          apr_hash_this(hi, &key, NULL, &val);
          name = key;
          entry = val;

          if (!strcmp(name, SVN_WC_ENTRY_THIS_DIR))
            continue;

          svn_pool_clear(iterpool);
          down_path = svn_dirent_join(path, name, iterpool);
          SVN_ERR(erase_from_wc(down_path, adm_access, entry->kind,
                                cancel_func, cancel_baton, iterpool));
        }

This code tries to handle all the possible conditions in one go.
Is it versioned? Is it on disk? Is it missing in meta data even
though it should be there? (The last question is hidden in the
for loop -- if no entries are read, we'll do nothing!)

And the code is also attempting to answer the question
"What should we do if any of these conditions is true or false?"
Ultimately it's asking: Is it safe to remove this item from disk?
And it's easy to miss some condition when doing all these checks
in a single code path.

Having different kinds of trees, each representing one aspect of the
real working copy, is an attempt to make this more straightforward.
Essentially, what we do right now is use a complicated algorithm on
top of simple data. But we could use a more complex data structure
that provides a layer of abstraction, and then we can have a much
simpler algorithm.

You want to know if an item exists on disk? Ask the ACTUAL tree.
You want to know if an item is versioned (known to Subversion)?
Ask the WORKING tree. You want to know what an item looked like
before it was locally modified? Ask the BASE tree.
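
To make the "ask the right tree" idea concrete, here is a minimal sketch.
It is not the wc-ng API: tree_kind_t, node_t and tree_has() are
hypothetical names invented for illustration, and a flat array stands in
for whatever storage the real implementation uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not Subversion code: one flag per tree. */
typedef enum { TREE_BASE, TREE_WORKING, TREE_ACTUAL, TREE_COUNT } tree_kind_t;

typedef struct {
  const char *path;
  bool present[TREE_COUNT];   /* indexed by tree_kind_t */
} node_t;

/* Return the record for PATH, or NULL if no tree knows about it. */
static const node_t *
find_node(const node_t *nodes, size_t count, const char *path)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (strcmp(nodes[i].path, path) == 0)
      return &nodes[i];

  return NULL;
}

/* Answer one question about one tree: is PATH present in TREE? */
static bool
tree_has(const node_t *nodes, size_t count, tree_kind_t tree,
         const char *path)
{
  const node_t *node = find_node(nodes, count, path);

  return node != NULL && node->present[tree];
}
```

A caller then names the tree it cares about, for instance
tree_has(nodes, count, TREE_ACTUAL, "foo.c") for "is it on disk?", so
the question being asked is visible at the call site.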

You want to know if it is safe to remove an item? Check WORKING
(does Subversion think the item should be there?) and if the item
is in WORKING, check ACTUAL (is it really on disk?).
(And the adm access locking stuff can go away altogether because we're
now using an embedded database instead of plain files for meta data.)
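
The removal check described above could be sketched roughly like this.
The names (remove_action_t, decide_removal) are hypothetical, and what to
do in each outcome is my assumption for the sake of the example, not
something the design notes prescribe.

```c
#include <stdbool.h>

/* Hypothetical outcomes; the policy attached to each is an assumption. */
typedef enum {
  REMOVE_NOT_VERSIONED,  /* not in WORKING: Subversion does not expect it */
  REMOVE_ALREADY_GONE,   /* in WORKING, not in ACTUAL: nothing on disk */
  REMOVE_FROM_DISK       /* in WORKING and in ACTUAL: delete the item */
} remove_action_t;

/* Check WORKING first; consult ACTUAL only if the item is in WORKING. */
static remove_action_t
decide_removal(bool in_working, bool in_actual)
{
  if (!in_working)
    return REMOVE_NOT_VERSIONED;

  if (!in_actual)
    return REMOVE_ALREADY_GONE;

  return REMOVE_FROM_DISK;
}
```

Each branch names the tree it consulted, so a missing case (say, an item
in ACTUAL but not in WORKING) is easier to spot than in the combined
code path quoted earlier.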

The reader of the code can see which tree is being checked,
and will instantly understand more about what the code is doing.
And it will be easier to see errors, for example places where
we're not checking the correct kind of tree.

Stefan
Received on 2009-07-22 11:56:32 CEST
