[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Teach ra_dav editor parser to filter unwanted data (for sparse directories)

From: Karl Fogel <kfogel_at_red-bean.com>
Date: 2007-05-06 04:43:15 CEST

"C. Michael Pilato" <cmpilato@collab.net> writes:
> Teach the ra_DAV editor parser to filter out unwanted stuff based on
> the requested depth of the operation. This is useful for
> compatibility with older depth-ignorant servers.
>
> * subversion/libsvn_ra_dav/fetch.c
> (report_baton_t): Add 'is_file' and 'depth' members.
> (okay_to_edit): New helper function.
> (start_element, end_element): Use okay_to_edit() to avoid making
> editor and RA callback calls for items outside the scope of the
> requested edit operation. Use the report baton's 'is_file' member
> to determine if we're handling a file instead of the 'file_baton'
> member, which now may be NULL simply because we're "in too deep"
> in the edit.

That's a great explanation... which I think might be better in the
code than in the log message :-).

> (add_node_props): Rework this function to reduce redundancy, and use
> the report baton's 'is_file' member to make the file/dir
> determination.
> (make_reporter): Populate report baton's 'depth' member.
>
>
> Index: subversion/libsvn_ra_dav/fetch.c
> ===================================================================
> --- subversion/libsvn_ra_dav/fetch.c (revision 24685)
> +++ subversion/libsvn_ra_dav/fetch.c (working copy)
> @@ -149,7 +149,12 @@
> #define DIR_DEPTH(rb) ((rb)->dirs->nelts)
> #define PUSH_BATON(rb,b) (APR_ARRAY_PUSH((rb)->dirs, void *) = (b))
>
> - /* These items are only valid inside add- and open-file tags! */
> + /* Are we currently handling a file? */
> + svn_boolean_t is_file;

Specifically, that explanation (from the log message) could go here,
in the comment above the new member.

> + /* These items are only valid inside add- and open-file tags, and
> + even then, only when handling files that aren't out-of-scope by
> + way of the requested 'depth' for the edit drive. */
> void *file_baton;
> apr_pool_t *file_pool;
> const char *result_checksum; /* hex md5 digest of result; may be null */
> @@ -201,6 +206,9 @@
> otherwise, it stays false (i.e., it's not a modern server). */
> svn_boolean_t receiving_all;
>
> + /* The requested depth of the operation. */
> + svn_depth_t depth;
> +
> } report_baton_t;
>
> static const svn_ra_dav__xml_elm_t report_elements[] =
> @@ -1991,6 +1999,39 @@
> di->pool = pool;
> }
>
> +/* Return TRUE iff we are allowed to open/add/modify a node of kind
> + KIND based on our current position in an editor drive as determined
> + by examining the report baton RB. */
> +static svn_boolean_t
> +okay_to_edit(report_baton_t *rb,
> + svn_node_kind_t kind)

Might want to say here that the usual motivation for this check has to
do with the requested depth for the edit.

> +{
> + /* The effective depth is based not just on the number of dir batons
> + on the stack, but also whether there was a target for the
> + operation. If there was a target, then the root of the edit has
> + a depth of -1 and the target has a depth of 0. If there was no
> + target, or then the root of the edit has a depth of 0. */

This uses the word "depth" in a way that's different from the
svn_depth_t sense. Might want to make that clear, or use
"levels_deep" instead, or something.

> + int effective_depth = rb->dirs->nelts - (*(rb->target) ? 1 : 0);
> + switch (rb->depth)
> + {
> + case svn_depth_empty:
> + return (kind == svn_node_dir && effective_depth <= 0);
> + case svn_depth_files:
> + return ((effective_depth <= 0)
> + || (kind == svn_node_file && effective_depth == 1));
> + case svn_depth_immediates:
> + return (effective_depth <= 1);
> + case svn_depth_infinity:
> + return TRUE;
> + default:
> + abort();
> + }
> + /* we should never get here */
> + return FALSE;
> +}

*nod* The conditional logic looks good to me.

> /* This implements the `ne_xml_startelm_cb' prototype. */
> static svn_error_t *
> start_element(int *elem, void *userdata, int parent, const char *nspace,
> @@ -2004,7 +2045,7 @@
> svn_stringbuf_t *cpath = NULL;
> svn_revnum_t crev = SVN_INVALID_REVNUM;
> dir_item_t *parent_dir;
> - void *new_dir_baton;
> + void *new_dir_baton = NULL;
> svn_stringbuf_t *pathbuf;
> apr_pool_t *subpool;
> const char *base_checksum = NULL;

This new initialization was mysterious to me when I first saw it, but
when I got to the changes later on, it made sense.

> @@ -2038,6 +2079,10 @@
> name = svn_xml_get_attr_value("name", atts);
> /* ### verify we got it. punt on error. */
>
> + /* Ignore editor drive portions outside our requested depth. */
> + if (! okay_to_edit(rb, svn_node_dir))
> + break;
> +
> parent_dir = &TOP_DIR(rb);
> pathbuf = svn_stringbuf_dup(parent_dir->pathbuf, parent_dir->pool);
> svn_path_add_component(pathbuf, name);
> @@ -2051,6 +2096,10 @@
> name = svn_xml_get_attr_value("name", atts);
> /* ### verify we got it. punt on error. */
>
> + /* Ignore editor drive portions outside our requested depth. */
> + if (okay_to_edit(rb, svn_node_file))
> + break;
> +
> parent_dir = &TOP_DIR(rb);
> pathbuf = svn_stringbuf_dup(parent_dir->pathbuf, parent_dir->pool);
> svn_path_add_component(pathbuf, name);
> @@ -2063,6 +2112,7 @@
> case ELEM_resource:
> att = svn_xml_get_attr_value("path", atts);
> /* ### verify we got it. punt on error. */
> +
> svn_stringbuf_set(rb->current_wcprop_path, att);
> rb->in_resource = TRUE;
> break;

Yeah, the existing formatting is inconsistent in this function.

> @@ -2090,9 +2140,6 @@
> subpool = svn_pool_create(rb->pool);
> SVN_ERR((*rb->editor->open_root)(rb->edit_baton, base,
> subpool, &new_dir_baton));
> -
> - /* push the new baton onto the directory baton stack */
> - push_dir(rb, new_dir_baton, pathbuf, subpool);
> }
> else
> {
> @@ -2104,17 +2151,23 @@
> subpool = svn_pool_create(parent_dir->pool);
>
> pathbuf = svn_stringbuf_dup(parent_dir->pathbuf, subpool);
> - svn_path_add_component(pathbuf, rb->namestr->data);
> + svn_path_add_component(pathbuf, name);
>
> - SVN_ERR((*rb->editor->open_directory)(pathbuf->data,
> - parent_dir->baton, base,
> - subpool,
> - &new_dir_baton));
> + /* Only really call the editor if we're within our requested
> + depth. */
> + if (okay_to_edit(rb, svn_node_dir))
> + SVN_ERR((*rb->editor->open_directory)(pathbuf->data,
> + parent_dir->baton, base,
> + subpool,
> + &new_dir_baton));
>
> - /* push the new baton onto the directory baton stack */
> - push_dir(rb, new_dir_baton, pathbuf, subpool);
> }
>
> + /* Push the new baton onto the directory baton stack. If this
> + directory was outside our requested depth, NEW_DIR_BATON will
> + be NULL. */
> + push_dir(rb, new_dir_baton, pathbuf, subpool);

Aah, now I get what was going on with that initialization. And that
code duplication has been there forever, nice to see it cleaned up.

> /* Property fetching is NOT implied in replacement. */
> TOP_DIR(rb).fetch_props = FALSE;
> break;
> @@ -2140,13 +2193,17 @@
>
> pathbuf = svn_stringbuf_dup(parent_dir->pathbuf, subpool);
> svn_path_add_component(pathbuf, rb->namestr->data);
> +
> + /* Only really call the editor if we're within our requested depth. */
> + if (okay_to_edit(rb, svn_node_dir))
> + SVN_ERR((*rb->editor->add_directory)(pathbuf->data, parent_dir->baton,
> + cpath ? cpath->data : NULL,
> + crev, subpool,
> + &new_dir_baton));
>
> - SVN_ERR((*rb->editor->add_directory)(pathbuf->data, parent_dir->baton,
> - cpath ? cpath->data : NULL,
> - crev, subpool,
> - &new_dir_baton));
> -
> - /* push the new baton onto the directory baton stack */
> + /* Push the new baton onto the directory baton stack. If this
> + directory was outside our requested depth, NEW_DIR_BATON will
> + be NULL. */
> push_dir(rb, new_dir_baton, pathbuf, subpool);
>
> /* Property fetching is implied in addition. This flag is only

*nod*, all makes sense.

> @@ -2154,15 +2211,16 @@
> a modern server. */
> TOP_DIR(rb).fetch_props = TRUE;
>
> - bc_url = svn_xml_get_attr_value("bc-url", atts);
> -
> /* In non-modern report responses, we're just told to fetch the
> props later. In that case, we can at least do a pre-emptive
> depth-1 propfind on the directory right now; this prevents
> individual propfinds on added-files later on, thus reducing
> the number of network turnarounds (though not by as much as
> - simply getting a modern report response!). */
> - if ((! rb->receiving_all) && bc_url)
> + simply getting a modern report response!).
> +
> + Only bother to do this if we really added the directory. */
> + bc_url = svn_xml_get_attr_value("bc-url", atts);
> + if (TOP_DIR(rb).baton && ((! rb->receiving_all) && bc_url))
> {
> apr_hash_t *bc_children;
> SVN_ERR(svn_ra_dav__get_props(&bc_children,
> @@ -2204,67 +2262,64 @@
> break;
>
> case ELEM_open_file:
> - att = svn_xml_get_attr_value("rev", atts);
> - /* ### verify we got it. punt on error. */
> - base = SVN_STR_TO_REV(att);
> -
> + case ELEM_add_file:
> name = svn_xml_get_attr_value("name", atts);
> /* ### verify we got it. punt on error. */
> svn_stringbuf_set(rb->namestr, name);
>
> - parent_dir = &TOP_DIR(rb);
> - rb->file_pool = svn_pool_create(parent_dir->pool);
> - rb->result_checksum = NULL;
> -
> /* Add this file's name into the directory's path buffer. It will be
> removed in end_element() */
> + parent_dir = &TOP_DIR(rb);
> svn_path_add_component(parent_dir->pathbuf, rb->namestr->data);
>
> - SVN_ERR((*rb->editor->open_file)(parent_dir->pathbuf->data,
> - parent_dir->baton, base,
> - rb->file_pool,
> - &rb->file_baton));
> + /* If it's okay to make this edit, do so. */
> + if (okay_to_edit(rb, svn_node_file))
> + {
> + rb->file_pool = svn_pool_create(parent_dir->pool);
> + if (elm->id == ELEM_open_file)
> + {
> + att = svn_xml_get_attr_value("rev", atts);
> + /* ### verify we got it. punt on error. */
> + base = SVN_STR_TO_REV(att);
>
> - /* Property fetching is NOT implied in replacement. */
> - rb->fetch_props = FALSE;
> + SVN_ERR((*rb->editor->open_file)(parent_dir->pathbuf->data,
> + parent_dir->baton, base,
> + rb->file_pool,
> + &rb->file_baton));
>
> - break;
> + /* Property fetching is NOT implied in replacement. */
> + rb->fetch_props = FALSE;
> + }
> + else
> + {
> + att = svn_xml_get_attr_value("copyfrom-path", atts);
> + if (att != NULL)
> + {
> + cpath = rb->cpathstr;
> + svn_stringbuf_set(cpath, att);
> +
> + att = svn_xml_get_attr_value("copyfrom-rev", atts);
> + /* ### verify we got it. punt on error. */
> + crev = SVN_STR_TO_REV(att);
> + }
>
> - case ELEM_add_file:
> - name = svn_xml_get_attr_value("name", atts);
> - /* ### verify we got it. punt on error. */
> - svn_stringbuf_set(rb->namestr, name);
> -
> - att = svn_xml_get_attr_value("copyfrom-path", atts);
> - if (att != NULL)
> - {
> - cpath = rb->cpathstr;
> - svn_stringbuf_set(cpath, att);
> -
> - att = svn_xml_get_attr_value("copyfrom-rev", atts);
> - /* ### verify we got it. punt on error. */
> - crev = SVN_STR_TO_REV(att);
> + SVN_ERR((*rb->editor->add_file)(parent_dir->pathbuf->data,
> + parent_dir->baton,
> + cpath ? cpath->data : NULL,
> + crev, rb->file_pool,
> + &rb->file_baton));
> +
> + /* Property fetching is implied in addition. This flag is only
> + for parsing old-style reports; it is ignored when talking to
> + a modern server. */
> + rb->fetch_props = TRUE;
> + }
> }
>
> - parent_dir = &TOP_DIR(rb);
> - rb->file_pool = svn_pool_create(parent_dir->pool);
> + /* Record file-specific state. */
> rb->result_checksum = NULL;
> + rb->is_file = TRUE;
>
> - /* Add this file's name into the directory's path buffer. It will be
> - removed in end_element() */
> - svn_path_add_component(parent_dir->pathbuf, rb->namestr->data);
> -
> - SVN_ERR((*rb->editor->add_file)(parent_dir->pathbuf->data,
> - parent_dir->baton,
> - cpath ? cpath->data : NULL,
> - crev, rb->file_pool,
> - &rb->file_baton));
> -
> - /* Property fetching is implied in addition. This flag is only
> - for parsing old-style reports; it is ignored when talking to
> - a modern server. */
> - rb->fetch_props = TRUE;
> -
> break;

I haven't reviewed this section as carefully as the rest, there was so
much code unduplication going on as well as adding in the new changes.
Not objecting, mind you, but it made the diff hard to follow :-). Did
you test it against old and new servers?

I should just apply the patch and look at the result, but for now I'll
move on.

> case ELEM_txdelta:
> @@ -2276,6 +2331,11 @@
> if (! rb->receiving_all)
> break;
>
> + /* No file baton? No file contents. (We leave rb->base64_decoder
> + NULL as a flag to other handlers of this tag.) */
> + if (! rb->file_baton)
> + break;
> +
> SVN_ERR((*rb->editor->apply_textdelta)(rb->file_baton,
> NULL, /* ### base_checksum */
> rb->file_pool,
> @@ -2301,7 +2361,8 @@
> else
> svn_stringbuf_setempty(rb->encoding);
> }
> -
> + /* (In case you're wordering, the actual work of setting the
> + property is done at tag closure time.) */
> break;

"wondering" :-)

> case ELEM_remove_prop:
> @@ -2309,8 +2370,12 @@
> /* ### verify we got it. punt on error. */
> svn_stringbuf_set(rb->namestr, name);
>
> - /* Removing a prop. */
> - if (rb->file_baton == NULL)
> + /* No baton? No propchanges. */
> + if (! (rb->is_file ? rb->file_baton : TOP_DIR(rb).baton))
> + break;
> +
> + /* Remove the property. */
> + if (! rb->is_file)
> SVN_ERR(rb->editor->change_dir_prop(TOP_DIR(rb).baton,
> rb->namestr->data,
> NULL, TOP_DIR(rb).pool));
> @@ -2320,6 +2385,10 @@
> break;
>
> case ELEM_fetch_props:
> + /* No baton? No propfetching. */
> + if (! (rb->is_file ? rb->file_baton : TOP_DIR(rb).baton))
> + break;
> +
> if (!rb->fetch_content)
> {
> /* If this is just a status check, the specifics of the
> @@ -2328,7 +2397,7 @@
> property mod. */
> svn_stringbuf_set(rb->namestr, SVN_PROP_PREFIX "BOGOSITY");
>
> - if (rb->file_baton == NULL)
> + if (! rb->is_file)
> SVN_ERR(rb->editor->change_dir_prop(TOP_DIR(rb).baton,
> rb->namestr->data,
> NULL, TOP_DIR(rb).pool));
> @@ -2340,7 +2409,7 @@
> else
> {
> /* Note that we need to fetch props for this... */
> - if (rb->file_baton == NULL)
> + if (! rb->is_file)
> TOP_DIR(rb).fetch_props = TRUE; /* ...directory. */
> else
> rb->fetch_props = TRUE; /* ...file. */
> @@ -2348,6 +2417,10 @@
> break;
>
> case ELEM_fetch_file:
> + /* No baton? No propfetching. */
> + if (! rb->file_baton)
> + break;
> +
> base_checksum = svn_xml_get_attr_value("base-checksum", atts);
> rb->result_checksum = NULL;

Looks good. I'm pretty sold on this whole idea of ignoring the extra
data at the RA layer level -- these changes are clearer, and in any
case, in a sense it's the RA layer's responsibility anyway. It's the
reliable transport mechanism, and should be the one to cope with old
servers sending too much data.

> @@ -2383,6 +2456,13 @@
>
> parent_dir = &TOP_DIR(rb);
>
> + /* Ignore editor drive portions outside our requested depth. */
> + /* ### TODO(sd): We lie here and say this is a file when we
> + actually have no idea what the kind is. Better to be
> + intentionally conservative than accidentally liberal. */
> + if (! okay_to_edit(rb, svn_node_file))
> + break;

On the other hand, maybe it's always okay to let delete_entry()
through? If the client has a depth of sufficient shallowness that the
item isn't present locally anyway, then deleting it is a no-op; and if
the item *is* present, then we certainly want to delete it, because by
definition the depth on the client side is deep enough for the item to
be present to be deleted.

I didn't say that very well, but do you know what I mean?

> /* Pool use is a little non-standard here. When lots of items in the
> same directory get deleted each one will trigger a call to
> editor->delete_entry, but we don't have a pool that readily fits
> @@ -2416,71 +2496,44 @@
> {
> svn_ra_dav__resource_t *rsrc = NULL;
> apr_hash_t *props = NULL;
> + const char *url;
>
> /* Do nothing if parsing a modern report, because the properties
> already come inline. */
> if (rb->receiving_all)
> return SVN_NO_ERROR;
>
> - /* Do nothing if we aren't fetching content. */
> + /* Do nothing if we aren't fetching content. */
> if (!rb->fetch_content)
> return SVN_NO_ERROR;
>
> - if (rb->file_baton)
> - {
> - if (! rb->fetch_props)
> - return SVN_NO_ERROR;
> + /* Do nothing if we aren't fetching props. */
> + if (! (rb->is_file ? rb->fetch_props : TOP_DIR(rb).fetch_props))
> + return SVN_NO_ERROR;
>
> - /* Check to see if your parent directory already has your props
> - stored, possibly from a depth-1 propfind. Otherwise just do
> - a propfind directly on the file url. */
> - if ( ! ((TOP_DIR(rb).children)
> - && (props = apr_hash_get(TOP_DIR(rb).children, rb->href->data,
> - APR_HASH_KEY_STRING))) )
> - {
> - SVN_ERR(svn_ra_dav__get_props_resource(&rsrc,
> - rb->ras,
> - rb->href->data,
> - NULL,
> - NULL,
> - pool));
> - props = rsrc->propset;
> - }
> + /* Do nothing if we have no baton. */
> + if (! (rb->is_file ? rb->file_baton : TOP_DIR(rb).baton))
> + return SVN_NO_ERROR;
>
> - SVN_ERR(add_props(props,
> - rb->editor->change_file_prop,
> - rb->file_baton,
> - pool));
> + /* Check to see if your parent directory already has your props
> + stored, possibly from a depth-1 propfind. Otherwise just do a
> + propfind directly on the URL. */
> + url = rb->is_file ? rb->href->data : TOP_DIR(rb).vsn_url;
> + if (! ((TOP_DIR(rb).children)
> + && (props = apr_hash_get(TOP_DIR(rb).children, url,
> + APR_HASH_KEY_STRING))))
> + {
> + SVN_ERR(svn_ra_dav__get_props_resource(&rsrc, rb->ras, url, NULL,
> + NULL, pool));
> + props = rsrc->propset;
> }
> +
> + if (rb->is_file)
> + return add_props(props, rb->editor->change_file_prop,
> + rb->file_baton, pool);
> else
> - {
> - if (! TOP_DIR(rb).fetch_props)
> - return SVN_NO_ERROR;
> -
> - /* Check to see if your props are already stored, possibly from
> - a depth-1 propfind. Otherwise just do a propfind directly on
> - the directory url. */
> - if ( ! ((TOP_DIR(rb).children)
> - && (props = apr_hash_get(TOP_DIR(rb).children,
> - TOP_DIR(rb).vsn_url,
> - APR_HASH_KEY_STRING))) )
> - {
> - SVN_ERR(svn_ra_dav__get_props_resource(&rsrc,
> - rb->ras,
> - TOP_DIR(rb).vsn_url,
> - NULL,
> - NULL,
> - pool));
> - props = rsrc->propset;
> - }
> -
> - SVN_ERR(add_props(props,
> - rb->editor->change_dir_prop,
> - TOP_DIR(rb).baton,
> - pool));
> - }
> -
> - return SVN_NO_ERROR;
> + return add_props(props, rb->editor->change_dir_prop,
> + TOP_DIR(rb).baton, pool);
> }

And if add_props() were documented, this would have been a bit easier
to review. Not your fault, of course -- just griping!

The rest of the changes looked pretty straightforward. They're
complex enough that I wouldn't swear I missed no bugs, but the overall
shape looks good.

Any idea how hard this would be for ra_svn? :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun May 6 04:43:42 2007

This is an archived mail posted to the Subversion Dev mailing list.