[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

More libsvn_wc efficiency improvements.

From: Josh Pieper <jjp_at_pobox.com>
Date: 2004-05-24 03:03:48 CEST

Philip, could you look over this patch and see if anything is
obviously wrong with my entries caching modifications? Essentially
lock.c:prune_deleted was getting called for every single modification
made to the entries file. To generate a pruned list, it needs to
iterate over all the currently existing entries. It just wasn't very
noticable before because of the O(n^2) disk problems. While the patch
does cause a little extra work to be done in svn_wc__entry_modify, the
reduction in work in prune_deleted seems to negate the loss for small
directories, for big directories it's an obvious performance win.

With this patch and the current (r9870) trunk FSFS, I see no
non-linearities in checkouts when going up to at least 25,000 files in
a single directory. At that point my hard drive or filesystem started
to groan, so I stopped.

-Josh

=======================================

Improve the efficiency of libsvn_wc during updates and checkouts by
decreasing the number of times that an entries hash has its hidden
entries removed. Cache the entries list with and without hidden
entries separately and make svn_wc__entry_modify operates on both
cached lists at the same time. Prior to this, svn_wc__entry_modify
would cause a new pruned entries file to be regenerated after every
single modification. Since generating the pruned list required
iterating over the entirety of the entries list, this resulted in n^2
performance as the number of files in a directory increased.

* subversion/libsvn_wc/lock.c
  (prune_deleted): Do not attempt to take a shortcut and assign the
    full entries list to the pruned one. This was only a very small
    performance win before and now that we need to update the two
    caches separately it would be incorrect.

* subversion/libsvn_wc/entries.c
  (svn_wc__entry_modify): Get both the full and pruned entries lists,
    perform the requested modifications on both of them.

Index: subversion/libsvn_wc/entries.c
===================================================================
--- subversion/libsvn_wc/entries.c (revision 9870)
+++ subversion/libsvn_wc/entries.c (working copy)
@@ -1549,7 +1549,7 @@
                       svn_boolean_t do_sync,
                       apr_pool_t *pool)
 {
- apr_hash_t *entries;
+ apr_hash_t *entries, *entries_nohidden;
   svn_boolean_t entry_was_deleted_p = FALSE;
 
   /* ENTRY is rather necessary, and ENTRY->kind is required to be valid! */
@@ -1557,7 +1557,11 @@
 
   /* Load PATH's whole entries file. */
   SVN_ERR (svn_wc_entries_read (&entries, adm_access, TRUE, pool));
+ SVN_ERR (svn_wc_entries_read (&entries_nohidden, adm_access, FALSE, pool));
 
+ /* These two hashes cannot be the exact same ones. */
+ assert (entries != entries_nohidden);
+
   /* Ensure that NAME is valid. */
   if (name == NULL)
     name = SVN_WC_ENTRY_THIS_DIR;
@@ -1565,6 +1569,8 @@
   if (modify_flags & SVN_WC__ENTRY_MODIFY_SCHEDULE)
     {
       svn_wc_entry_t *entry_before, *entry_after;
+ apr_uint32_t orig_modify_flags = modify_flags;
+ svn_wc_schedule_t orig_schedule = entry->schedule;
 
       /* Keep a copy of the unmodified entry on hand. */
       entry_before = apr_hash_get (entries, name, APR_HASH_KEY_STRING);
@@ -1574,6 +1580,9 @@
       SVN_ERR (fold_scheduling (entries, name, &modify_flags,
                                 &entry->schedule, pool));
 
+ SVN_ERR (fold_scheduling (entries_nohidden, name, &orig_modify_flags,
+ &orig_schedule, pool));
+
       /* Special case: fold_state_changes() may have actually REMOVED
          the entry in question! If so, don't try to fold_entry, as
          this will just recreate the entry again. */
@@ -1588,14 +1597,16 @@
   /* If the entry wasn't just removed from the entries hash, fold the
      changes into the entry. */
   if (! entry_was_deleted_p)
- fold_entry (entries, name, modify_flags, entry,
- svn_wc_adm_access_pool (adm_access));
+ {
+ fold_entry (entries, name, modify_flags, entry,
+ svn_wc_adm_access_pool (adm_access));
+ fold_entry (entries_nohidden, name, modify_flags, entry,
+ svn_wc_adm_access_pool (adm_access));
+ }
 
   /* Sync changes to disk. */
   if (do_sync)
     SVN_ERR (svn_wc__entries_write (entries, adm_access, pool));
- else
- svn_wc__adm_access_set_entries (adm_access, FALSE, NULL);
 
   return SVN_NO_ERROR;
 }
Index: subversion/libsvn_wc/lock.c
===================================================================
--- subversion/libsvn_wc/lock.c (revision 9870)
+++ subversion/libsvn_wc/lock.c (working copy)
@@ -915,30 +915,6 @@
     {
       apr_hash_index_t *hi;
 
- /* I think it will be common for there to be no deleted entries, so
- it is worth checking for that case as we can optimise it. */
- for (hi = apr_hash_first (pool, adm_access->entries_hidden);
- hi;
- hi = apr_hash_next (hi))
- {
- void *val;
- const svn_wc_entry_t *entry;
- apr_hash_this (hi, NULL, NULL, &val);
- entry = val;
- if ((entry->deleted
- && (entry->schedule != svn_wc_schedule_add)
- && (entry->schedule != svn_wc_schedule_replace))
- || entry->absent)
- break;
- }
-
- if (! hi)
- {
- /* There are no deleted entries, so we can use the full hash */
- adm_access->entries = adm_access->entries_hidden;
- return;
- }
-
       /* Construct pruned hash without deleted entries */
       adm_access->entries = apr_hash_make (adm_access->pool);
       for (hi = apr_hash_first (pool, adm_access->entries_hidden);

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon May 24 03:04:53 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.