[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1338350 - in /subversion/trunk/subversion: libsvn_subr/io.c libsvn_subr/skel.c libsvn_wc/wc_db.c

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 15 May 2012 11:04:10 +0200

On Mon, May 14, 2012 at 11:43:40PM -0400, Greg Stein wrote:
> On Mon, May 14, 2012 at 7:05 PM, Stefan Sperling <stsp_at_elego.de> wrote:
> > On Mon, May 14, 2012 at 05:42:01PM -0400, Greg Stein wrote:
> > Yes, it is more of UI level issue that does not belong into the
> > core libraries. It should be done by the UI.
> >
> > But the UI gets one path per callback invocation.
> > So it cannot control the order itself without buffering everything.
> > That is not acceptable either.
>
> Right. What are the callback drivers? I'm guessing the status walker.
> Throwing an O(n log n) key sort into that will not cause any
> appreciable overheads, relative to the I/O that is driving the overall
> operation.
>
> There is a *user* benefit for sorted keys, rather than just stable keys.

Yes. Sorting is the only way to guarantee consistent output.

> > I had a patch that used a sorted array of hash table keys (still in
> > libsvn_wc, not in 'svn') but Bert objected to that due to performance
> > concerns and suggested to use the newly added stable hash table instead.
>
> I'd like to see that patch. I really can't imagine that we're talking
> any systemic performance here.

Glad I kept a copy. See below.

> My concern is keeping the lower levels free to do what is best for
> them. The change in this commit is *very* tenuous to its intended
> purpose. That is going to break at some point. Basically cause it
> introduces some awful coupling between our subsystems.

Well, what it really does it revert to pre-APR-1.4.6 behaviour.
I.e. it changes the behaviour back to what it was independent
of the APR version used.

The intent is to restore a side-effect of this behaviour (stable output).
We want to keep this side-effect, so we can either keep relying on our
implicit assumption that a stable hash will produce ordered output,
or we can sort. I don't care either way. I just want sorted/stable output.

> IOW, I'll take a couple milliseconds to perform a sort over the
> long-term maintenance burden of coupled systems.

You'll have to argue this one with Bert, then :P

I already told him on IRC I believe sorting dirents in memory won't
make any noticable difference since we're not going to disk and we're
not accessing sqlite while sorting, and those expensive operations
are what 'status' does all the time. But he insisted.
 
> [ to beat this dead horse some more: not a single one of your changes
> mentioned WHY svn_hash__make() was used instead of apr_hash_make();
> somebody coming along will have absolutely no idea why that was done;
> so they might switch it back because there is zero apparent effect at
> doing so; at a minimum, a comment saying "we sort our skel proplists
> so that 'svn status' (yah, that thing 15 layers up) can have stable
> output." ]

The docstring of svn_hash__make() makes the difference quite clear.

> > The unordered output sucks. This commit was the result of a user
> > complaining about unordered output of 'svn status' after he upgraded
> > to ubuntu 12.04 which includes apr-1.4.6. Unordered output which used
> > to be ordered is perceived as a regression by our users. IMO we can't
> > just ignore this problem.
>
> Sure.
>
> > What would you suggest to do instead? I'm open to suggestions.
>
> Move the code further up. Adding a qsort() somewhere is not going to
> be an issue except for some of the most pathological directory sizes.
> And I will bet directories of that size have other problems relative
> to that qsort().

Here's my original patch (which won't apply cleanly to HEAD anymore, BTW).
Note that this still sorts stuff for the benefit of 'svn status' "15
layers up". I can't find a better place to perform sorting.

And I suppose one could argue that the committed change was less
intrusive while still having the desired side-effect :)

Index: subversion/libsvn_wc/status.c
===================================================================
--- subversion/libsvn_wc/status.c (revision 1338291)
+++ subversion/libsvn_wc/status.c (working copy)
@@ -41,6 +41,7 @@
 #include "svn_config.h"
 #include "svn_time.h"
 #include "svn_hash.h"
+#include "svn_sorts.h"
 
 #include "svn_private_config.h"
 
@@ -1298,15 +1299,16 @@ get_dir_status(const struct walk_status_baton *wb,
                void *cancel_baton,
                apr_pool_t *scratch_pool)
 {
- apr_hash_index_t *hi;
   const char *dir_repos_root_url;
   const char *dir_repos_relpath;
   const char *dir_repos_uuid;
   svn_boolean_t dir_has_props;
   apr_hash_t *dirents, *nodes, *conflicts, *all_children;
+ apr_array_header_t *sorted_children;
   apr_array_header_t *collected_ignore_patterns = NULL;
   apr_pool_t *iterpool, *subpool = svn_pool_create(scratch_pool);
   svn_error_t *err;
+ int i;
 
   if (cancel_func)
     SVN_ERR(cancel_func(cancel_baton));
@@ -1390,17 +1392,23 @@ get_dir_status(const struct walk_status_baton *wb,
   dir_has_props = (dir_info->had_props || dir_info->props_mod);
 
   /* Walk all the children of this directory. */
- for (hi = apr_hash_first(subpool, all_children); hi; hi = apr_hash_next(hi))
+ sorted_children = svn_sort__hash(all_children,
+ svn_sort_compare_items_as_paths,
+ subpool);
+ for (i = 0; i < sorted_children->nelts; i++)
     {
       const void *key;
       apr_ssize_t klen;
+ svn_sort__item_t item;
       const char *child_abspath;
       svn_io_dirent2_t *child_dirent;
       const struct svn_wc__db_info_t *child_info;
 
       svn_pool_clear(iterpool);
 
- apr_hash_this(hi, &key, &klen, NULL);
+ item = APR_ARRAY_IDX(sorted_children, i, svn_sort__item_t);
+ key = item.key;
+ klen = item.klen;
 
       child_abspath = svn_dirent_join(local_abspath, key, iterpool);
Received on 2012-05-15 11:04:47 CEST

This is an archived mail posted to the Subversion Dev mailing list.