Re: [WIP] 'svn patch' should remove empty dirs

From: Daniel Näslund <daniel_at_longitudo.com>
Date: Mon, 22 Feb 2010 19:38:13 +0100

On Mon, Feb 22, 2010 at 08:44:47AM +0100, Stefan Sperling wrote:
> On Sun, Feb 21, 2010 at 11:02:27PM +0100, Daniel Näslund wrote:

[..]

> > + unidiff_patch = [
> > + "Index: psi\n",
> > + "===================================================================\n",
> > + "--- A/D/H/psi\t(revision 0)\n",
> > + "+++ A/D/H/psi\t(revision 0)\n",
> > + "@@ -1 +0,0 @@\n",
> > + "-This is the file 'psi'.\n",
> > + "Index: omega\n",
> > + "===================================================================\n",
> > + "--- A/D/H/omega\t(revision 0)\n",
> > + "+++ A/D/H/omega\t(revision 0)\n",
> > + "@@ -1 +0,0 @@\n",
> > + "-This is the file 'omega'.\n",
> > + ]
> > +
> > + svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
> > +
> > + # We should be able to handle one path beeing missing.
> > + os.remove(os.path.join(wc_dir, 'A', 'D', 'H', 'chi'))
> > +
> > +
> > + expected_output = [
> > + 'D %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'chi'),
> > + 'D %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'psi'),
> > + 'D %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'omega'),
> > + ]
>
> I think we want a single 'D' notification for the entire directory.
> Notification for every single child is redundant.

Yeah, sorry. The test part is _very_ much a WIP. I just used it to test
for finding missing targets. It will of course be changed to check for
parent dirs to be removed.

[...]

> > +/* Helper for collect_deleted_targets(). Checks if PARENT_DIR_ABSPATH is
> > + * EMPTY when treating the targets in TARGETS_TO_BE_DELETED as already
> > + * deleted. */
> > +static svn_error_t *
> > +is_dir_empty(svn_boolean_t *empty, const char *parent_dir_abspath,
> > + svn_wc_context_t *wc_ctx,
> > + apr_array_header_t *targets_to_be_deleted,
> > + apr_pool_t *result_pool, apr_pool_t *scratch_pool)
> > +{
> > + apr_dir_t *dir;
> > + apr_finfo_t this_entry;
> > + apr_int32_t flags = APR_FINFO_NAME;
> > + apr_pool_t *iterpool;
> > + svn_error_t *err;
> > + apr_array_header_t *targets_found;
> > + const apr_array_header_t *missing_targets;
> > + const char *abspath;
> > + int i;
> > +
> > + SVN_DBG(("In is_dir_empty()\n"));
> > + targets_found = apr_array_make(scratch_pool, 0, sizeof(const char *));
> > +
> > + SVN_ERR(svn_io_dir_open(&dir, parent_dir_abspath, scratch_pool));
> > +
> > + iterpool = svn_pool_create(scratch_pool);
> > +
> > + /* First we collect the entries on-disk...*/
> > + while (1)
> > + {
> > + svn_pool_clear(iterpool);
> > + err = svn_io_dir_read(&this_entry, flags, dir, iterpool);
>
> Hmmm... have you seen svn_io_get_dir_filenames()?

Nope. This part was one of the things I refered to as boilerplate.
Thanks!

[...]

> > +
> > + /* .. Then we collect the nodes that the db knows of. We do that since a
> > + * missing node will not be found by simply checking the on-disk. We need
> > + * to compare what the db knows with what's actually there. If a node is
> > + * missing we add it to our targets_found. */
>
> We might want to check the DB first. The DB handle is already open,
> so may be cheaper to run the DB queries some of which can be answered
> without hitting the disk.
> And if the DB prevents us from deleting the directory, we don't bother
> checking the disk ourselves. Note that some DB query functions might
> do disk i/o, I'm not sure which and under what circumstances
> though.

A perfectly valid optimization. Will do that.

> > + SVN_ERR(svn_wc__node_get_children(&missing_targets, wc_ctx,
> > + parent_dir_abspath, FALSE, result_pool,
> > + scratch_pool));
> > +
> > + /* Check for missing targets. */
> > + for (i = 0; i < missing_targets->nelts; i++)
> > + {
> > + const char *target;
> > + svn_node_kind_t kind;
> > +
> > + target = APR_ARRAY_IDX(missing_targets, i, const char *);
> > +
> > + SVN_ERR(svn_io_check_path(target, &kind, scratch_pool));
>
> Using svn_io_* is not using the DB, really.
> What we want here is something like svn_wc__node_is_status_present() or
> maybe svn_wc__node_is_status_absent(). Not sure. The point is getting
> info from the DB, so we'll eventually be calling svn_wc__db_read_info().
> The DB can tell us if a node is known as missing, obstructed, or
> locally deleted. Maybe using just some svn_wc__node_* functions tells
> us all we need to know, even?

absent, excluded and not_present are the three states making up the
hidden concept of the entries code. They refer to conditions when the wc
has not gained access no a node for some reason. The entries code
suggests authz restrictions, setting svn_depth_excluded (1.5 clients
doesn't handle that) and deleting and commiting a node without updating
its parent. I've set the show_hidden parameter of
svn_wc__node_get_children() to FALSE and not included those. When
thinking about it, I should have included those. Even if those nodes
won't appear in our wc they may in other peoples!

We want the missing nodes, those that the db thinks is there but are ...
missing. The db doesn't know who they are. To find those we need to
check what's on-disk.

Greg talked me through the show_hiddden and missing concept. Hopefully I
got it right.

In the def of svn_wc__db_status_t:
[[[
/* This node was named by the server, but no information was provided. */
svn_wc__db_status_absent,

/* This node has been administratively excluded. */
svn_wc__db_status_excluded,

/* This node is not present in this revision. This typically happens
   when a node is deleted and committed without updating its parent.
   The parent revision indicates it should be present, but this node's
   revision states otherwise. */
svn_wc__db_status_not_present,
]]]

not_present, absent and (not mentioned above) excluded are terms
refering to
>
> > + SVN_DBG(("Testing if '%s' is missing\n", target));
> > +
> > + if (kind == svn_node_none)
> > + {
> > + SVN_DBG(("Found missing node '%s'\n", target));
> > + APR_ARRAY_PUSH(targets_found, const char *) = target;
> > + }
> > + }
> > +
> > + *empty = TRUE;
> > +
> > + /* Finally, we check to see if we're going to delete all entries in the
> > + * dir. */
> > + for (i = 0; i < targets_found->nelts; i++)
> > + {
>
> Can we integrate the strcmp() loop into the loop that iterates over the dirents,
> skipping dirents which don't match a deletion target?
> Then we could simply compare the number of elements of the targets found
> and targets deleted arrays here, instead of looping again.

Probably. In any case, there _is_ a way to write it better than what
I've done now.

[...]

> > +/* Helper for collect_deleted_targets(). Compare A and B and return an
> > + * integer greater than, equal to, or less than 0, according to whether A
> > + * has more subdirs, just as many or less subdir than B. */
> > +static int
> > +sort_compare_paths_by_depth(const svn_sort__item_t *a,
> > + const svn_sort__item_t *b)
> > +{
> > + const char *astr, *bstr;
> > + int n_a, n_b, i;
> > +
> > + astr = a->key;
> > + bstr = b->key;
> > +
> > + for (i = 0; i < a->klen; i++)
> > + if (astr[i] == '/')
> > + n_a++;
> > +
> > + for (i = 0; i < b->klen; i++)
> > + if (bstr[i] == '/')
> > + n_b++;
> > +
> > + if (n_a > n_b)
> > + return 1;
> > + else if (n_a == n_b)
> > + return 0;
> > + else
> > + return -1;
> > +}
> > +
> > +/* Determine which TARGETS are to be deleted. Return those targets in
> > + * DELETE_ALL_OF_THIS, allocated in RESULT_POOL. If all targets in a
> > + * directory is to be deleted, the dir itself is set to be deleted. The
> > + * remaining targets will be returned in TARGETS, allocated in RESULT_POOL.
> > + * If no target will be deleted, set DELETE_ALL_OF_THIS to NULL and return
> > + * the original TARGETS. Do temporary allocations in SCRATCH_POOL. */
> > +static svn_error_t *
> > +collect_deleted_targets(apr_array_header_t **delete_all_of_this,
>
> :)
> Maybe s/delete_all_of_this/condensed_deletion_target_list/ or something?

Maybe. I was thinking of concanating the two arrays and feed it to
install_patched_targets(). The only reason for using two arrays would be
if the deleted targets should be run before the other targets but I
don't see why we would want that.

[...]

> > +
> > + /* We sort the keys here for allowing the paths with the greatest depth to
> > + * be processed first. */
> > + sorted_keys = svn_sort__hash(targets_to_be_deleted,
> > + sort_compare_paths_by_depth,
> > + scratch_pool);
> > +
> > + /* The keys are sorted in ascending order, but we want them in descending
> > + * order. */
>
> You could also flip the logic of your compare function to sort entries
> in descending order.

Probably a good idea. It's better to use the for(i=0,...) idiom when
possible.

[...]

> > /* Install a patched TARGET into the working copy at ABS_WC_PATH.
> > * Use client context CTX to retrieve WC_CTX, and possibly doing
> > * notifications. If DRY_RUN is TRUE, don't modify the working copy.
> > @@ -1371,6 +1671,7 @@
> > const char *patch_eol_str;
> > apr_file_t *patch_file;
> > apr_array_header_t *targets;
> > + apr_array_header_t *delete_all_of_this;
> > int i;
> > apply_patches_baton_t *btn;
> >
> > @@ -1413,6 +1714,10 @@
> > }
> > while (patch);
> >
> > + SVN_ERR(collect_deleted_targets(&delete_all_of_this, &targets,
> > + btn->ctx->wc_ctx, scratch_pool,
> > + result_pool));
> > +
> > /* Install patched targets into the working copy. */
> > for (i = 0; i < targets->nelts; i++)
> > {
>
> Where is delete_all_of_this used?

Not there yet, it's the next step! :)

Thanks for the review. Appreciate it!
Daniel
Received on 2010-02-22 19:38:55 CET

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: Will upgrade from 1.6 to 1.7 benifit from dump/reload?"
Previous message: Julian Foad: "WC-NG presence 'incomplete'"
In reply to: Stefan Sperling: "Re: [WIP] 'svn patch' should remove empty dirs"
Next in thread: Neels J Hofmeyr: "Re: [WIP] 'svn patch' should remove empty dirs"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]