[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Was: Enhancement needed in svn status -u

From: Paul Burba <paulb_at_softlanding.com>
Date: 2006-10-11 22:49:58 CEST

Daniel Rall <dlr@collab.net> wrote on 10/06/2006 03:49:32 PM:

> On Mon, 02 Oct 2006, Paul Burba wrote:
> ....
> > > > > > The background on all of this can be found here:
> > > > > > http://svn.haxx.se/dev/archive-2005-08/0257.shtml
> ...
> > Recall that r16344 was the ultimate solution for providing out of date

> > info about a WC and that this originally arose due to problems with
the
> > Subclipse Synchronize view that Mark Phippard described in
> > http://svn.haxx.se/subdev/archive-2005-08/0020.shtml. Recently Mark
asked
> > me to look into a several problems/limitations that still adversely
affect
> > the Synchronize view, these are:
> >
> > 1) Unless a directory has a pending child
> > addition/deletion, or there is a prop change on the
> > directory itself, the out of date (ood_*) fields
> > in svn_wc_status2_t are not set, or are incorrectly
> > set.
> >
> > 2) The out of date (ood_*) fields for the root of WC
> > are *never* set.
> >
> > 3) OOD info never bubbles up beyond an out of date
> > path's immediate parent (if even that). E.g. if
> > WC/A/B/E/beta is out of date, A, and A/B don't
> > have any ood info, and as described in #1, A/B/E
> > might not have any either.
> >
> > 4) The OOD_last_cmt_rev field for deleted paths is
> > always SVN_INVALID_REVNUM.
> >
> > The attached patch attempts to address all of these issues:
> >
> > 1) ... When we call tweak_statushash() for a dir_baton in
> > close_directory() we are passing the db->parent_baton, but the ood
info
> > for dir_baton is in itself. So when tweak_statushash() sets the ood
info
> > on the parent_baton it just resets all ood fields to the parent's
existing
> > values. To fix this I added a second baton arg to tweak_statushash so
the
> > ood info is available.
>
> This is to let the OOD info bubble up?
>
> > 2) Following your and Lieven's example from r21685, I set the
> > eb->anchor_status->ood_* fields directly in close_directory().
>
> +1. This chunk of the patch can be committed at any time. This patch
> is so big that I might lean towards multiple commits, even though it
> slightly complicates the backport.

Glad to break it up appropriately when the time comes.

> > 3) This is closely related to #1: We only call tweak_statushash() in
> > close_directory() in the first place if a dir is added, deleted, or
has a
> > prop change. I changed this to also make the call if
> > dir_baton->ood_last_commit_rev is set to anything but the default
value of
> > SVN_INVALID_REVNUM.
>
> Okay.
>
> > 4) This was a bit trickier to solve. I saw no straightforward way to
find
> > out the revision an out of date path was deleted within libsvn_wc, so
I
> > added a new function to libsvn_repos: svn_repos_deleted_rev(). I then
use
> > this new function in reporter.c's delta_dirs() and update_entry() to
get
> > the rev a path was deleted and then pass this info as the revision
> > argument to the svn_delta_editor_t's delete_entry() callback.
>
> The algorithm looks functional, but I'd like to know more about the
> performance profile of this new code. See my comments further on down
> below in its implementation.
>
> > I'm not
> > totally clear from this comment how the revision arg is supposed to be

> > used, currently we just pass SVN_INVALID_REVNUM. The doc string for
> > delete_entry() says:
> >
> > /** Remove the directory entry named @a path, a child of the
directory
> > * represented by @a parent_baton. If @a revision is set, it is
used as
> > a
> > * sanity check to ensure that you are removing the revision of @a
path
> > * that you really think you are.
> > *
> > * All allocations should be performed in @a pool.
> > */
> >
> > But I'm not sure what a "sanity check" implies exactly. I'd like to
think
> > that what I'm doing (passing the rev the path was deleted) is
compatible
> > with the doc string, but it seems *too* easy that an essentially
unused
> > arg already exists and does just what I require :-)
>
> The "If @a revision is set, it is used as a sanity check" part of this
> gives me concern that returning an incorrect "deleted rev" might have
> side-effects. What type of sanity check is performed on the revnum,
> and by what code?

In IRC regarding the above:

[16:29] pburba: I'd need to look at every implementation of
svn_delta_editor_t's delete_entry() and see what it does with the revision
arg yes?
[16:32] dlr: Perhaps only the "client"-side implementations?
[16:38] pburba: I don't understand enough about that editors to answer
that...why only client side?
[16:38] dlr: I don't think that any FS editor consumes the libsvn_repos
module.
[16:39] dlr: Direct and indirect users of that module would be affected.
[16:39] dlr: e.g. libsvn_ra_local, mod_dav_svn, svnserve, libsvn_client,
libsvn_wc?
[16:46] pburba: You've lost me I'm afraid...
[16:47] dlr: All I'm saying is that you ought to be able to ignore
libsvn_fs, as far as usage of that "deleted rev" is concerned.

I realized I needed to understand production and consumption of tree
deltas better. I still have only a basic grasp of this, but here is how I
understand the problem:

If the following are all true...

  1) An svn operation causes a tree delta
     to be produced in libsvn_repos.

  2) libsvn_repos/reporter.c's delta_dirs()
     or update_entry() is used to construct
     the tree delta.

  3) The delta's producer calls the
     svn_delta_editor_t's delete_entry()
     callback from delta_dirs()
     or update_dirs().

  4) The delete_entry() implementation isn't
     just a wrapper around the actual
     implementation callback in the edit baton.
     e.g. commit_util.c(1483):delete_entry()

  5) The implementation isn't just the network
     layer consuming the edit operation and
     passing it to it's ultimate consumer.
     e.g. libsvn_ra_svn\editor.c(157):ra_svn_delete_entry()

...Then with my patch there is now the possibility of unintended
side-effects as the delta consumer's delete_entry() implementations are
now getting a revision argument other than SVN_INVALID_REVNUM. So if I
understand the risk correctly, there are 6 delete_entry() implmentations
that meet the above criteria and are potential risks:

libsvn_client\repos_diff.c(428):delete_entry(const char *path,
svn_revnum_t base_revision, void *parent_baton, apr_pool_t *pool)

libsvn_client\repos_diff_summarize.c(119):delete_entry(const char *path,
svn_revnum_t base_revision, void *parent_baton, apr_pool_t *pool)

libsvn_wc\diff.c(980):delete_entry(const char *path, svn_revnum_t
base_revision, void *parent_baton, apr_pool_t *pool)

libsvn_wc\update_editor.c(1037):delete_entry(const char *path,
svn_revnum_t revision, void *parent_baton, apr_pool_t *pool)

libsvn_delta\default_editor.c(75):delete_entry(const char *path,
svn_revnum_t revision, void *parent_baton, apr_pool_t *pool)

All six implmentations are all safe as none use the svn_revnum_t argument.

 
> > Assuming my changes to libsvn_repos are ok, there is still the problem
of
> > older servers not providing this information. My solution is to look
to
> > deleted path's parent folder and use it's out of date info. There is
some
> > chance that this info is actually correct, and even if it is wrong it
> > would still be valid in the sense that it is a revision higher than
the
> > revision the path was deleted.
>
> See my comment immediately above.
>
> > Re the Subclipse Synchronize use case this
> > is at least somewhat helpful. And since the present alternative is no

> > info at all (i.e. SVN_INVALID_REVNUM) this slightly flawed alternative

> > seems at least a partial improvement.
> ...
> > Did you create a test for this? I didn't see anything committed, but
if
> > you have any significant work in progress could I have a copy? Problem
#1
> > above highlights the need for a real test of this functionality.
>
> Looks like I did not create a test. Paul mentioned on IRC that he has
> one in progress.

I added a second status test to
subversion/bindings/java/javahl/src/org/tigris/subversion/javahl/tests/BasicTests.java,
it's included in the attached patch. It replicates what my temporary test
did, but actually *tests* the expected results rather than relying on a
manual check.
 
> ...
> > P.S. In testing this patch I used the following hack to make the debug

> > version of svn status show all the OOD info on the command line. It
also
> > adds a temporary status test to setup various out of date scenarios.
It
> > may prove useful to anyone who tries out this patch:
>
> This status test won't be permanent, then? Do we have any OOD info
> tests?

No, because it relies on the hack I made to the debug version of svn
status, so it isn't really usefil. The JavaHL test will replace it (see
above).
 
> > Index: subversion/include/svn_repos.h
> > ===================================================================
> > --- subversion/include/svn_repos.h (revision 21732)
> > +++ subversion/include/svn_repos.h (working copy)
> > @@ -835,6 +835,24 @@
> > apr_pool_t *pool);
> >
> >
> > +/**
> > + * Set @a *deleted to the revision @a path was most recently deleted
> > + * in @a fs, within the inclusive revision bounds set by @a start and
> > + * @end. If @a path does not exist in @a root within the @a start
and
>
> This doc string should be "@a end" rather than "@end".

Fixed.
 
> > + * @a end bounds, or is not deleted within those bounds, set @a
*deleted
> > + * to SVN_INVALID_REVNUM. Use @a pool for memory allocation.
> > + *
> > + * @since New in 1.5.
> > + */
> > +svn_error_t *
> > +svn_repos_deleted_rev(svn_fs_t *fs,
> > + const char *path,
> > + svn_revnum_t start,
> > + svn_revnum_t end,
> > + svn_revnum_t *deleted,
> ...
> > Index: subversion/libsvn_repos/rev_hunt.c
> > ===================================================================
> > --- subversion/libsvn_repos/rev_hunt.c (revision 21732)
> > +++ subversion/libsvn_repos/rev_hunt.c (working copy)
> ...
> > +svn_error_t *
> > +svn_repos_deleted_rev(svn_fs_t *fs,
> > + const char *path,
> > + svn_revnum_t start,
> > + svn_revnum_t end,
> > + svn_revnum_t *deleted,
> > + apr_pool_t *pool)
> > +{
> > + apr_pool_t *subpool = svn_pool_create(pool);
>
> We can delay initialization of this sub-pool until after our START and
> END have been checked.

Moved...
 
> > + svn_fs_root_t *root;
> > + svn_revnum_t curr_rev;
> > + *deleted = SVN_INVALID_REVNUM;
> > +
> > + /* Validate the revision range. */
> > + if (! SVN_IS_VALID_REVNUM(start))
> > + return svn_error_createf
> > + (SVN_ERR_FS_NO_SUCH_REVISION, 0,
> > + _("Invalid start revision %ld"), start);
> > + if (! SVN_IS_VALID_REVNUM(end))
> > + return svn_error_createf
> > + (SVN_ERR_FS_NO_SUCH_REVISION, 0,
> > + _("Invalid end revision %ld"), end);
>
> Init sub-pool here.

...to here.
 
> > + /* Ensure that the input is ordered. */
> > + if (start > end)
> > + {
> > + svn_revnum_t tmprev = start;
> > + start = end;
> > + end = tmprev;
> > + }
> > +
> > + curr_rev = end;
> > +
> > + while (curr_rev >= start)
> > + {
> > + svn_node_kind_t kind_p;
> > + svn_pool_clear(subpool);
> > +
> > + /* Get a revision root for curr_rev. */
> > + SVN_ERR(svn_fs_revision_root(&root, fs, curr_rev, subpool));
> > +
> > + SVN_ERR(svn_fs_check_path(&kind_p, root, path, subpool));
> > +
> > + if (kind_p == svn_node_none)
> > + {
> > + curr_rev--;
> > + }
> > + else
> > + {
> > + if (curr_rev != end)
> > + *deleted = curr_rev + 1;
> > + break;
> > + }
> > + }
>
> This "while" loop seems rather performance-intensive. On a repository
> with a long history (hundreds of thousands of revs+), how much time
> does this take?

Unfortunately it is very slow. I created a greek test repos with 53600
revisions. In r2 path /A/D/H/omega is deleted. The remaining revs are
simply text mods to the other files. I checked out r1 via file:// and ran
"time svn st -u -v" in Cygwin:

With all of my patch in place except the two calls to
svn_repos_deleted_rev() in reporter.c, the time to run the command was
less than a second:

  real 0m0.390s
  user 0m0.015s
  sys 0m0.015s

With the the calls to svn_repos_deleted_rev() active the command took well
over 5 minutes:

  real 5m44.236s
  user 0m0.015s
  sys 0m0.062s

This algorithm could be improved a bit to run much faster for less
contrived cases, using a binary search to narrow the rev range before
beginning the linear search for example. But ultimately everything I've
been able to think of is still O(n). Is it even worth trying to improve
the performance...

> If it does take a noticable amount of time, would we be able to speed
> things up by adding a svn_fs_deleted_rev() API which might be able to
> produce results in a more efficient manner?

...or should I take another route altogether. I'm glad to attempt this,
but I'm not even sure where to start. Has anyone thought about this
problem in the past? Even a vague idea how to get started would be
helpful.
 
> > +
> > + svn_pool_destroy(subpool);
> > + return SVN_NO_ERROR;
> > +}
> ...
> > Index: subversion/libsvn_wc/status.c
> > ===================================================================
> > --- subversion/libsvn_wc/status.c (revision 21732)
> > +++ subversion/libsvn_wc/status.c (working copy)
> > @@ -999,18 +999,42 @@
> > /* Look up the key PATH in BATON->STATII. IS_DIR_BATON indicates
whether
> > baton is a struct *dir_baton or struct *file_baton. If the
> value doesn't
> > yet exist, and the REPOS_TEXT_STATUS indicates that this is an
> > - addition, create a new status struct using the hash's pool. Merge
> > - REPOS_TEXT_STATUS and REPOS_PROP_STATUS into the status
structure's
> > + addition, create a new status struct using the hash's pool.
> > +
> > + If IS_DIR_BATON is true, OOD_BATON is a *dir_baton cotaining the
ood
> > + information we want to set in BATON. This is necessary because
this
> > + function tweaks the status of out of date directories (BATON
> == OOD_BATON)
> > + and out of date directories' parents (BATON ==
OOD_BATON->parent_baton).
> > + In the latter case OOD_BATON contains the ood info we want to
bubble up
> > + to ancestor directories so these accurately reflect the fact they
have
> > + and ood descendent.
> > +
> > + Merge REPOS_TEXT_STATUS and REPOS_PROP_STATUS into the status
> structure's
> > "network" fields.
> > +
> > + Iff IS_DIR_BATON is true, DELETED_REV is used as follows,
otherwise it
> > + is ignored:
> > +
> > + If REPOS_TEXT_STATUS is svn_wc_status_deleted then DELETED_REV
is
> > + optionally the revision path was deleted, in all other cases
it must
> > + be set to SVN_INVALID_REVNUM. If DELETED_REV is not
> > + SVN_INVALID_REVNUM and REPOS_TEXT_STATUS is
svn_wc_status_deleted,
> > + then use DELETED_REV to set PATH's ood_last_cmt_rev field in
BATON.
> > + If DELETED_REV is SVN_INVALID_REVNUM and REPOS_TEXT_STATUS is
> > + svn_wc_status_deleted, set PATH's ood_last_cmt_rev to it's
parent's
> > + ood_last_cmt_rev value - see comment below.
> > +
> > If a new struct was added, set the repos_lock to REPOS_LOCK. */
> > static svn_error_t *
> > tweak_statushash(void *baton,
> > + void *ood_baton,
>
> I can't quite put my finger on why, but I'd like a different name for
> OOD_BATON if we can come up with something more specific.

How about this_dir_baton?

~~~~~

Updated log and patch attached.

Paul B.

[[[
Further improvements to status information on working copy items which
are out of date compared to the repository.

Follow-up to r16344 (and its subsequent follow-ups: r16494, 16784, 16796,
16829, and 17844).

*
subversion/bindings/java/javahl/src/org/tigris/subversion/javahl/Status.java
  (getReposLastCmtKind): New accesor method.

* subversion/bindings/java/javahl/src/org/tigris/subversion/javahl/
  tests/BasicTests.java
  (testPathValidation): New test of SVNClient.status' out of date
  functionality

* subversion/bindings/java/javahl/src/org/tigris/subversion/javahl/tests/
  SVNTests.java
  (OneTest.checkStatus): Add overload which accepts a "checkServer" flag
  which indicates whether or not to check the WC's status against the
  repository.

* subversion/bindings/java/javahl/src/org/tigris/subversion/javahl/tests/
  WC.java
  (org.tigris.subversion.javahl.Revision, java.util.Date): Import.
  (setAllWorkingCopyRevision): New method which sets the revision number
  on all paths in the WC.
  (setItemReposLastCmtRevision, setItemReposLastCmtAuthor,
  setItemReposLastCmtDate, setItemReposKind): New setter methods for the
  four new Item.repos* fields.
  (setItemOODInfo): New method which wraps the four setItemRepos* methods.
  (check): Add overload which accepts a "checkServer" flag which indicates
  whether or not to check the expected vs. actual out-of-date statii
fields.
  (Item.reposLastCmtRevision, Item.reposLastCmtDate, Item.reposKind,
  Item.reposLastCmtAuthor): New fields representing the WC's out of date
  status.

* subversion/include/svn_repos.h
  (svn_repos_deleted_rev): New function to find the revision a path was
  most recently deleted within a give revision range.

* subversion/libsvn_repos/reporter.c
  (update_entry, delta_dirs): Use the new function svn_repos_deleted_rev()
  to determine the revision deleted paths were deleted and pass this to
  the delete_entry callback rather than defaulting to SVN_INVALID_REVNUM.

* subversion/libsvn_repos/rev_hunt.c
  (svn_repos_deleted_rev): New function definition.

* subversion/libsvn_wc/status.c
  (tweak_statushash): Add second baton argument which contains the out
  of date info for a dir baton when tweaking that baton's parent. Add
  another argument to identify the revision a path was deleted when
  handling deletes. When deleting paths: Construct the correct url for
  the path and record deleted path's deleted revision in the path's
  svn_wc_status2_t structure. For pre-1.5 servers, which don't provide
  the deleted revision, use the parent's last committed rev as a best
  guess.
  (delete_entry, close_file): Supply new args to tweak_statushash()
  calls.
  (close_directory): Tweak status for directories even when the only
  change is that they have and an out of date descendents. Supply new
  args to tweak_statushash() call.
]]]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Wed Oct 11 22:50:23 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.