[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r14722 - trunk/subversion/libsvn_repos

From: <kfogel_at_collab.net>
Date: 2005-05-16 17:28:59 CEST

djh@tigris.org writes:
> --- trunk/subversion/libsvn_repos/log.c (original)
> +++ trunk/subversion/libsvn_repos/log.c Fri May 13 08:41:18 2005
> @@ -187,21 +187,223 @@
> return SVN_NO_ERROR;
> }
>
> +/* This is used by svn_repos_get_logs3 to get a revision
> + root for a path. */
> +static svn_error_t *
> +path_history_root (svn_fs_root_t **root,
> + svn_fs_t *fs,
> + const char* path,
> + svn_revnum_t rev,
> + svn_repos_authz_func_t authz_read_func,
> + void* authz_read_baton,
> + apr_pool_t* pool)
> +{
> + /* Get a revision root for REV. */
> + SVN_ERR (svn_fs_revision_root (root, fs, rev, pool));
> +
> + if (authz_read_func)
> + {
> + svn_boolean_t readable;
> + SVN_ERR (authz_read_func (&readable, *root, path,
> + authz_read_baton, pool));
> + if (! readable)
> + return svn_error_create (SVN_ERR_AUTHZ_UNREADABLE, NULL, NULL);
> + }
> + return SVN_NO_ERROR;
> +}

I don't want to seem ungrateful, but...

Static internal functions get complete doc strings too :-). Just
using all-caps for params, not full Doxygen markup, of course.

> +/* This is used by svn_repos_get_logs3 to keep track of multiple
> + path history information while working through history. */
> +struct path_info
> +{
> + svn_fs_root_t *root;
> + const char *path;
> + svn_fs_history_t *hist;
> + apr_pool_t *newpool;
> + apr_pool_t *oldpool;
> + svn_revnum_t history_rev;
> +};

Documentation would be good here too. For instance, how are newpool
and oldpool related? From looking at the code, I can see there's some
sort of swaparoo going on, so presumably there are two separate
operations with overlapping lifetimes going on. But as a reader, I'd
like to know the details.

> -/* Implements svn_repos_history_func_t interface. Accumulate history
> - revisions in the apr_array_header_t * which is the BATON. */
> +/* This helper to svn_repos_get_logs3 is used to get the path's
> + history. */
> static svn_error_t *
> -history_to_revs_array (void *baton,
> - const char *path,
> - svn_revnum_t revision,
> - apr_pool_t *pool)

Seeing this instance of history_to_revs_array() disappear made me
wonder if we're still using the svn_repos_history_func_t interface
anywhere else. Turns out the only place left is in svnlook/main.c.
And, wouldn't you know it, that code is still using the deprecated
svn_repos_history(), instead of svn_repos_history2(). That's
unrelated to your change, of course; I'll make a separate post about
it. Of course, svn_repos_history2() still takes the same callback
interface, so we won't be deprecating svn_repos_history_func_t itself.

(There's also another history_to_revs_array() in
tests/libsvn_repos/repos-test.c, and furthermore it gets passed to
another call to svn_repos_history() there!)

> +get_history (struct path_info *info,
> + svn_fs_t *fs,
> + svn_boolean_t cross_copies,
> + svn_repos_authz_func_t authz_read_func,
> + void *authz_read_baton,
> + svn_revnum_t start)

This needs a complete doc string too. (Yes, I saw the minimal doc
string above, embedded in the subtraction hunk of the diff.)

Among other things, without a doc string it's harder to review... :-).

> {
> - apr_array_header_t *revs_array = baton;
> - APR_ARRAY_PUSH (revs_array, svn_revnum_t) = revision;
> + apr_pool_t *temppool;
> +
> + SVN_ERR (svn_fs_history_prev (&info->hist, info->hist, cross_copies,
> + info->newpool));
> + if (! info->hist)
> + return SVN_NO_ERROR;
> +
> + /* Fetch the location information for this history step. */
> + SVN_ERR (svn_fs_history_location (&info->path, &info->history_rev,
> + info->hist, info->newpool));
> +
> + /* Is the history item readable? If not, done with path. */
> + if (authz_read_func)
> + {
> + svn_boolean_t readable;
> + svn_fs_root_t *history_root;
> + SVN_ERR (svn_fs_revision_root (&history_root, fs,
> + info->history_rev,
> + info->newpool));
> + SVN_ERR (authz_read_func (&readable, history_root,
> + info->path,
> + authz_read_baton,
> + info->newpool));
> + if (! readable)
> + info->hist = NULL;
> + }
> +
> + /* Now we can clear the old pool. */
> + temppool = info->oldpool;
> + info->oldpool = info->newpool;
> + svn_pool_clear (temppool);
> + info->newpool = temppool;
> +
> + /* If this history item predates our START revision then
> + don't fetch any more for this path. */
> + if (info->history_rev < start)
> + info->hist = NULL;
> return SVN_NO_ERROR;
> }

Wouldn't it be marginally better for the "< start" check to come
before the authz_read_func() call? Both potentially end the history
walk, but the rev check is much cheaper -- arbitrarily cheaper, I
guess, in that we don't really know what authz_read_func() does.

Hmm, I even wonder if it's formally correct to invoke the
authz_read_func() on a path in a history_rev < start. Thoughts?

> +/* This helper to svn_repos_get_logs3 is used to advance the path's
> + history to the next one *if* the revision it is at is equal to
> + or greater than CURRENT. The CHANGED flag is only touched if
> + this path has history in the CURRENT rev. */
> +static svn_error_t *
> +check_history (svn_boolean_t *changed,
> + struct path_info *info,
> + svn_fs_t *fs,
> + svn_revnum_t current,
> + svn_boolean_t cross_copies,
> + svn_repos_authz_func_t authz_read_func,
> + void *authz_read_baton,
> + svn_revnum_t start)

Closer to a complete doc string, but should mention every parameter,
say exactly under what circumstances authz_read_func is called with
authz_read_baton, etc. Static functions are interfaces too. Also,
say "*CHANGED" (for maximum accuracy), and say exactly what is done
with it.

> +{
> + /* If we're already done with histories for this path,
> + don't try to fetch any more. */
> + if (! info->hist)
> + return SVN_NO_ERROR;
> +
> + /* If the last rev we got for this path is less than CURRENT,
> + then just return and don't fetch history for this path.
> + The caller will get to this rev eventually or else reach
> + the limit. */
> + if (info->history_rev < current)
> + return SVN_NO_ERROR;
> +
> + /* If the last rev we got for this path is equal to CURRENT
> + then set *CHANGED to true and get the next history
> + rev where this path was changed. */
> + *changed = TRUE;
> + SVN_ERR (get_history (info, fs, cross_copies, authz_read_func,
> + authz_read_baton, start));
> + return SVN_NO_ERROR;
> +}

The (currently minimal) doc string for get_history() says:

   /* This helper to svn_repos_get_logs3 is used to get the path's
      history. */

But it's not just a helper to svn_repos_get_logs3() -- it's also
apparently a helper for check_history(). This is one reason why
documenting something in terms of its being a helper for something
else can get slippery.

Say, have I bludgeoned you enough yet? :-)

> +/* Helper to find the next interesting history revision. */
> +static svn_revnum_t
> +next_history_rev (apr_array_header_t *histories)
> +{
> + svn_revnum_t next_rev = -1;
> + int i;
> +
> + for (i = 0; i < histories->nelts; ++i)
> + {
> + struct path_info *info = APR_ARRAY_IDX (histories, i,
> + struct path_info *);
> + if (! info->hist)
> + continue;
> + if (info->history_rev > next_rev)
> + next_rev = info->history_rev;
> + }
> +
> + return next_rev;
> +}

Did you mean SVN_INVALID_REVNUM instead of -1, or did you truly want -1?

Blah blah docstring blah.

> +/* Helper function used by svn_repos_get_logs3 to send history
> + info to the caller's callback. */
> +static svn_error_t *
> +send_change_rev (svn_revnum_t rev,
> + svn_fs_t *fs,
> + svn_boolean_t discover_changed_paths,
> + svn_repos_authz_func_t authz_read_func,
> + void *authz_read_baton,
> + svn_log_message_receiver_t receiver,
> + void *receiver_baton,
> + apr_pool_t *pool)

Same about docstring. For example, by "the caller's callback" do you
just mean the receiver and receiver_baton being passed in here?

> +{
> + svn_string_t *author, *date, *message;
> + apr_hash_t *r_props, *changed_paths = NULL;
> +
> + SVN_ERR (svn_fs_revision_proplist (&r_props, fs, rev, pool));
> + author = apr_hash_get (r_props, SVN_PROP_REVISION_AUTHOR,
> + APR_HASH_KEY_STRING);
> + date = apr_hash_get (r_props, SVN_PROP_REVISION_DATE,
> + APR_HASH_KEY_STRING);
> + message = apr_hash_get (r_props, SVN_PROP_REVISION_LOG,
> + APR_HASH_KEY_STRING);
> +
> + /* Discover changed paths if the user requested them
> + or if we need to check that they are readable. */
> + if ((rev > 0)
> + && (authz_read_func || discover_changed_paths))
> + {
> + svn_fs_root_t *newroot;
> + svn_error_t *patherr;
> +
> + SVN_ERR (svn_fs_revision_root (&newroot, fs, rev, pool));
> + patherr = detect_changed (&changed_paths,
> + newroot, fs,
> + authz_read_func, authz_read_baton,
> + pool);
> +
> + if (patherr
> + && patherr->apr_err == SVN_ERR_AUTHZ_UNREADABLE)
> + {
> + /* All changed-paths are unreadable, so clear all fields. */
> + svn_error_clear (patherr);
> + changed_paths = NULL;
> + author = NULL;
> + date = NULL;
> + message = NULL;
> + }
> + else if (patherr
> + && patherr->apr_err == SVN_ERR_AUTHZ_PARTIALLY_READABLE)
> + {
> + /* At least one changed-path was unreadable, so omit the
> + log message. (The unreadable paths are already
> + missing from the hash.) */
> + svn_error_clear (patherr);
> + message = NULL;
> + }
> + else if (patherr)
> + return patherr;
> +
> + /* It may be the case that an authz func was passed in, but
> + the user still doesn't want to see any changed-paths. */
> + if (! discover_changed_paths)
> + changed_paths = NULL;
> + }
> +
> + SVN_ERR ((*receiver) (receiver_baton,
> + changed_paths,
> + rev,
> + author ? author->data : NULL,
> + date ? date->data : NULL,
> + message ? message->data : NULL,
> + pool));
> +
> + return SVN_NO_ERROR;
> +}

The code itself is very readable, nice.

I was going to say something about how maybe we shouldn't call
svn_fs_revision_proplist() until we know we're going to use at least
some of the results... But then I realized I'd be killed for
gratuitous cycle-counting.

> svn_error_t *
> svn_repos_get_logs3 (svn_repos_t *repos,
> @@ -217,11 +419,14 @@
> void *receiver_baton,
> apr_pool_t *pool)
> {
> - svn_revnum_t this_rev, head = SVN_INVALID_REVNUM;
> + svn_revnum_t head = SVN_INVALID_REVNUM;
> apr_pool_t *subpool = svn_pool_create (pool);
> + apr_pool_t *sendpool = svn_pool_create (pool);
> svn_fs_t *fs = repos->fs;
> apr_array_header_t *revs = NULL;
> - int count = 0;
> + svn_revnum_t hist_end = end;
> + svn_revnum_t hist_start = start;
> + int i;
>
> SVN_ERR (svn_fs_youngest_rev (&head, fs, pool));
>
> @@ -241,6 +446,13 @@
> (SVN_ERR_FS_NO_SUCH_REVISION, 0,
> _("No such revision %ld"), end);
>
> + /* Get an ordered copy of the start and end. */
> + if (start > end)
> + {
> + hist_start = end;
> + hist_end = start;
> + }
> +
> /* If paths were specified, then we only really care about revisions
> in which those paths were changed. So we ask the filesystem for
> all the revisions in which any of the paths was changed.
> @@ -258,170 +470,137 @@
> && (! svn_path_is_empty (APR_ARRAY_IDX (paths, 0, const char *))))
> || (paths->nelts > 1)))
> {
> - /* If there is only one path, we'll just get its sorted changed
> - revisions. Else, we'll be combining all our findings into a
> - hash (to remove duplicates) and then generating a sorted
> - array from that hash. */
> - if (paths->nelts == 1)
> - {
> - /* Get the changed revisions for this path. */
> - const char *this_path = APR_ARRAY_IDX (paths, 0, const char *);
> - revs = apr_array_make (pool, 64, sizeof (svn_revnum_t));
> - SVN_ERR (svn_repos_history2 (fs, this_path,
> - history_to_revs_array, revs,
> - authz_read_func, authz_read_baton,
> - start, end,
> - strict_node_history ? FALSE : TRUE,
> - pool));
> - }
> - else
> - {
> - int i;
> - apr_hash_t *all_revs = apr_hash_make (pool);
> - apr_hash_index_t *hi;
> + svn_revnum_t current;
> + apr_array_header_t *histories;
> + svn_boolean_t any_histories_left = TRUE;
> + int sent_count = 0;
> +
> + histories = apr_array_make (subpool, paths->nelts,
> + sizeof (struct path_info *));
> + /* Create a history object for each path so we can walk through
> + them all at the same time until we have all changes or LIMIT
> + is reached.
> +
> + There is some pool fun going on due to the fact that we have
> + to hold on to the old pool with the history before we can
> + get the next history. */

Aaaah. Something like this comment is what I was looking for in the
struct field documentation earlier.

> + for (i = 0; i < paths->nelts; i++)
> + {
> + const char *this_path = APR_ARRAY_IDX (paths, i, const char *);
> + svn_fs_root_t *root;
> + struct path_info *info = apr_palloc (subpool,
> + sizeof (struct path_info));
> +
> + SVN_ERR (path_history_root (&root, fs, this_path, hist_end,
> + authz_read_func, authz_read_baton,
> + subpool));
> + info->root = root;
> + info->path = this_path;
> + info->oldpool = svn_pool_create (subpool);
> + info->newpool = svn_pool_create (subpool);
> + SVN_ERR (svn_fs_node_history (&info->hist, info->root, info->path,
> + info->oldpool));
> + SVN_ERR (get_history (info, fs,
> + strict_node_history ? FALSE : TRUE,
> + authz_read_func, authz_read_baton,
> + hist_start));
> + *((struct path_info **) apr_array_push (histories)) = info;
> + }
>
> - /* And the search is on... */
> - for (i = 0; i < paths->nelts; i++)
> + /* Loop through all the revisions in the range and add any
> + where a path was changed to the array, or if they wanted
> + history in reverse order just send it to them right away. */
> + for (current = hist_end;
> + current >= hist_start && any_histories_left;
> + current = next_history_rev (histories))
> + {
> + svn_boolean_t changed = FALSE;
> + svn_pool_clear (sendpool);
> + any_histories_left = FALSE;
> + for (i = 0; i < histories->nelts; i++)

Maybe this seems nitpicky, but: I'd reverse the order of the
svn_pool_clear() and the assignment to any_histories_left, and perhaps
also put a blank line between them. I sort of expect state variable
initialization to happen in one block, and the pool-clearing to not be
part of that block (though to still happen before the for-loop, of
course).

Speaking of which, why does this function even *have* subpool? Unlike
sendpool, it's not being used as a true loop pool. I can see why it's
tempting to have subpool, given that we do a some allocation for each
path in the first for-loop in the function, but still: either the
caller is calling svn_repos_get_logs3() in a loop or it is not. If it
is, then it should be doing the usual loop pool routine anyway; if it
is not, then why should svn_repos_get_logs3() insist on using a
subpool?

Maybe there is a good reason for it, but in that case we should
probably document that reason in a comment by subpool.

(Btw, I realize this isn't part of your change; it's just something I
happened to notice thanks to your diff.)

> {
> - const char *this_path = APR_ARRAY_IDX (paths, i, const char *);
> - apr_array_header_t *changed_revs =
> - apr_array_make (pool, 64, sizeof (svn_revnum_t));
> - int j;
> -
> - /* Get the changed revisions for this path, and add them to
> - the hash (this will eliminate duplicates). */
> - SVN_ERR (svn_repos_history2 (fs, this_path,
> - history_to_revs_array, changed_revs,
> - authz_read_func, authz_read_baton,
> - start, end,
> - strict_node_history ? FALSE : TRUE,
> - pool));
> - for (j = 0; j < changed_revs->nelts; j++)
> - {
> - /* We're re-using the memory allocated for the array
> - here in order to avoid more allocations. */
> - svn_revnum_t *chrev =
> - (((svn_revnum_t *)(changed_revs)->elts) + j);
> - apr_hash_set (all_revs, chrev, sizeof (chrev), (void *)1);
> - }
> + struct path_info *info = APR_ARRAY_IDX (histories, i,
> + struct path_info *);
> +
> + /* Check history for this path in current rev. */
> + SVN_ERR (check_history (&changed, info, fs, current,
> + strict_node_history ? FALSE : TRUE,
> + authz_read_func, authz_read_baton,
> + hist_start));
> + if (info->hist != NULL)
> + any_histories_left = TRUE;
> }

The sense of the 'strict_node_history' boolean is reversed from the
'cross_copies' booleans of the helper functions. Might it not be good
to align the helpers, at least all the way until get_history, at which
point you only have to do the "X ? Y : Z" reversal once.

> - /* Now that we have a hash of all the revisions in which any of
> - our paths changed, we can convert that back into a sorted
> - array. */
> - revs = apr_array_make (pool, apr_hash_count (all_revs),
> - sizeof (svn_revnum_t));
> - for (hi = apr_hash_first (pool, all_revs);
> - hi;
> - hi = apr_hash_next (hi))
> + /* If any of the paths changed in this rev then add or send it. */
> + if (changed)
> {
> - const void *key;
> - svn_revnum_t revision;
> -
> - apr_hash_this (hi, &key, NULL, NULL);
> - revision = *((const svn_revnum_t *)key);
> - (*((svn_revnum_t *) apr_array_push (revs))) = revision;
> + /* If they wanted it in reverse order we can send it completely
> + streamily right now. */
> + if (start > end)
> + {
> + SVN_ERR (send_change_rev (current, fs,
> + discover_changed_paths,
> + authz_read_func, authz_read_baton,
> + receiver, receiver_baton,
> + sendpool));
> + if (limit && ++sent_count > limit)
> + break;
> + }
> + else
> + {
> + /* This array must be allocated in pool -- it will be used
> + in the processing loop later. */
> + if (! revs)
> + revs = apr_array_make (pool, 64, sizeof (svn_revnum_t));
> +
> + /* They wanted it in forward order, so we have to buffer up
> + a list of revs and process it later. */
> + *(svn_revnum_t*) apr_array_push (revs) = current;
> + }

Given the (current) existence of subpool, I don't see why we have to
allocate the revs array in pool. Why not just use subpool, and move
the svn_pool_destroy(subpool) call later?

Of course, in the long run I'm semi-tentatively advocating that
subpool go away, so depending on how that turns out, it might continue
to make sense to use 'pool' above. However, in that case, the comment
about it still wouldn't be necessary, because 'pool' would be the
obvious pool to use :-).

> }
> -
> - /* Now sort the array */
> - qsort ((revs)->elts, (revs)->nelts, (revs)->elt_size,
> - svn_sort_compare_revisions);
> }
> -
> - /* If no revisions were found for these entries, we have nothing
> - to show. Just return now before we break a sweat. */
> - if (! (revs && revs->nelts))
> - return SVN_NO_ERROR;
> }
> -
> - for (this_rev = start;
> - ((start >= end) ? (this_rev >= end) : (this_rev <= end));
> - ((start >= end) ? this_rev-- : this_rev++))
> + else
> {
> - svn_string_t *author, *date, *message;
> - apr_hash_t *r_props, *changed_paths = NULL;
> -
> - svn_pool_clear (subpool);
> -
> - /* If we have a list of revs for use, check to make sure this is
> - one of them. */
> - if (revs)
> + /* They want history for the root path, so every rev has a change. */
> + int count = hist_end - hist_start + 1;
> + if (limit && count > limit)
> + count = limit;
> + for (i = 0; i < count; ++i)
> {
> - int i, matched = 0;
> - for (i = 0; ((i < revs->nelts) && (! matched)); i++)
> - {
> - if (this_rev == ((svn_revnum_t *)(revs->elts))[i])
> - matched = 1;
> - }
> -
> - if (! matched)
> - continue;
> + svn_pool_clear (sendpool);
> + if (start > end)
> + SVN_ERR (send_change_rev (hist_end - i, fs,
> + discover_changed_paths,
> + authz_read_func, authz_read_baton,
> + receiver, receiver_baton, sendpool));
> + else
> + SVN_ERR (send_change_rev (hist_start + i, fs,
> + discover_changed_paths,
> + authz_read_func, authz_read_baton,
> + receiver, receiver_baton, sendpool));
> }
> -
> - SVN_ERR (svn_fs_revision_proplist (&r_props, fs, this_rev, pool));
> - author = apr_hash_get (r_props, SVN_PROP_REVISION_AUTHOR,
> - APR_HASH_KEY_STRING);
> - date = apr_hash_get (r_props, SVN_PROP_REVISION_DATE,
> - APR_HASH_KEY_STRING);
> - message = apr_hash_get (r_props, SVN_PROP_REVISION_LOG,
> - APR_HASH_KEY_STRING);
> -
> - /* Discover changed paths if the user requested them
> - or if we need to check that they are readable. */
> - if ((this_rev > 0)
> - && (authz_read_func || discover_changed_paths))
> - {
> - svn_fs_root_t *newroot;
> - svn_error_t *patherr;
> -
> - SVN_ERR (svn_fs_revision_root (&newroot, fs, this_rev, subpool));
> - patherr = detect_changed (&changed_paths,
> - newroot, fs,
> - authz_read_func, authz_read_baton,
> - subpool);
> -
> - if (patherr
> - && patherr->apr_err == SVN_ERR_AUTHZ_UNREADABLE)
> - {
> - /* All changed-paths are unreadable, so clear all fields. */
> - svn_error_clear (patherr);
> - changed_paths = NULL;
> - author = NULL;
> - date = NULL;
> - message = NULL;
> - }
> - else if (patherr
> - && patherr->apr_err == SVN_ERR_AUTHZ_PARTIALLY_READABLE)
> - {
> - /* At least one changed-path was unreadable, so omit the
> - log message. (The unreadable paths are already
> - missing from the hash.) */
> - svn_error_clear (patherr);
> - message = NULL;
> - }
> - else if (patherr)
> - return patherr;
> -
> - /* It may be the case that an authz func was passed in, but
> - the user still doesn't want to see any changed-paths. */
> - if (! discover_changed_paths)
> - changed_paths = NULL;
> - }
> -
> - SVN_ERR ((*receiver) (receiver_baton,
> - changed_paths,
> - this_rev,
> - author ? author->data : NULL,
> - date ? date->data : NULL,
> - message ? message->data : NULL,
> - subpool));
> -
> - if (++count == limit)
> - break;
> }
>
> svn_pool_destroy (subpool);
> + if (revs)
> + {
> + /* Work loop for processing the revisions we found since they wanted
> + history in forward order. */
> + for (i = 0; i < revs->nelts; ++i)
> + {
> + svn_pool_clear (sendpool);
> + SVN_ERR (send_change_rev (APR_ARRAY_IDX (revs, revs->nelts - i - 1,
> + svn_revnum_t),
> + fs, discover_changed_paths,
> + authz_read_func, authz_read_baton,
> + receiver, receiver_baton, sendpool));
> + if (limit && i >= limit)
> + break;
> + }
> + }

Why does the final work loop come so far after the only code that
could cause it to happen? That is to say: revs can only be non-NULL
if certain things happened at the end of the we-have-explicit-paths,
and is mutually exclusive with the 'else' (not-have-explicit-paths)
case, so why not just do the work loop inside the if-body?

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon May 16 18:05:41 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.