[PATCH] Fix O(n) runtime in cache lookup, part 1/2

From: Stefan Fuhrman <stefanfuhrmann_at_alice-dsl.de>
Date: Tue, 06 Apr 2010 19:56:04 +0200

Hi devs,

I further looked into the SVN core libs performance and
found a couple of things that can be improved. This is one
of them.

svn_cache_t lookup returns a copy of the value found.
For objects of variable size, this can be expensive. If you
checkout / export all N files of a specific directory, for
instance, every single path gets verified. The directory
information is cached for FSFS but the copy operation
will still be O(N).

As a result, the whole check takes O(N^2). The good
news is that we don't need the information for too long.
In fact, copying it takes longer than the usage afterwards.

This patch extends the cache interface by a "get and pin"
operation that returns a reference to the cached object
instead of a copy. As long as an entry is pinned the reference
will remain valid. The caller has to unpin it explicitly.

While the interface allows for multiple pinned objects to
exist, the current implementation supports at most one.

Because you have to release the pinned item and don't
want the cache to get too large, it will almost always be
the responsibility of the calling code to pin and unpin it
(and not to defer the latter it some other function).
So, having at most one pinned item may not be too much
of a limitation after all.

Finally, memcache does not support *_get_pinned but
it would hardly be meaningful in that case.

-- Stefan^2.

[[[
Allow for cache entries to be access directly, i.e. without
prior copying. Use that to fix FSFS offset lookup runtime
from O(N) to amortized O(1).

* subversion/include/private/svn_cache.h
  (svn_cache__create_memcache, svn_cache__get):
  update commentary.
  (svn_cache__get_pinned, svn_cache__unpin):
  declare new functions

* subversion/libsvn_fs_fs/fs_fs.c
(get_packed_offset): use new interface functions.

* subversion/libsvn_subr/cache-inprocess.c
  (inprocess_cache_t): add member for pinned entry
  (inprocess_cache_get): reimplement using the new
  interface functions to avoid code duplication
  (inprocess_cache_set, inprocess_cache_iter):
  add checked for pinned entries
  (inprocess_cache_get_pinned, inprocess_cache_unpin):
  implement
  (inprocess_cache_vtable): add new entries

* subversion/libsvn_subr/cache.c
(svn_cache__get_pinned, svn_cache__get_unpin):
implement

* subversion/libsvn_subr/cache.h
(svn_cache__vtable_t): extend vtable

patch by stefanfuhrmann < at > alice-dsl.de
]]]

Index: subversion/include/private/svn_cache.h
===================================================================
--- subversion/include/private/svn_cache.h (revision 930220)
+++ subversion/include/private/svn_cache.h (working copy)
@@ -144,7 +144,8 @@
  *
  * These caches are always thread safe.
  *
- * These caches do not support svn_cache__iter.
+ * These caches do not support svn_cache__iter, svn_cache__get_pinned
+ * and svn_cache__unpin.
  *
  * If Subversion was not built with apr_memcache support, always
  * raises SVN_ERR_NO_APR_MEMCACHE.
@@ -196,6 +197,10 @@
  * setting @a *found to TRUE iff it is in the cache and FALSE if it is
  * not found. The value is copied into @a pool using the copy
  * function provided to the cache's constructor.
+ *
+ * For large objects that will only be needed temporarily, consider
+ * using svn_cache__get_pinned, if available. It eliminates the need
+ * for copying convoluted data structures.
  */
svn_error_t *
svn_cache__get(void **value,
@@ -248,6 +253,50 @@
                 svn_iter_apr_hash_cb_t func,
                 void *baton,
                 apr_pool_t *pool);
+
+/**
+ * Returns a reference to a value indexed by @a key from @a cache into
+ * @a *value, setting @a *found to TRUE iff it is in the cache and FALSE
+ * if it is not found. If necessary, temporary objects will be allocated
+ * in @a pool.
+ *
+ * Unlike svn_cache__get, this function does not create a copy of that
+ * value but returns a reference to the cached value. The reference remains
+ * valid until svn_cache__unpin is called.
+ *
+ * Every value returned by this function must be released explicitly by
+ * calling svn_cache__unpin. It is suggested that the calling code evaluates
+ * the returned @a *value immediately and unpins it a.s.a.p.
+ *
+ * It is not legal to perform any other cache operations on @a cache while
+ * an entry is pinned (future implementations may be less restrictive).
+ * However, access from different threads will get serialized so that
+ * they don't need to check wheter some other thread pinned a value.
+ *
+ * svn_cache__get_pinned is not supported by all cache implementations;
+ * see the svn_cache__create_* function for details.
+ */
+svn_error_t *
+svn_cache__get_pinned(void **value,
+ svn_boolean_t *found,
+ const svn_cache__t *cache,
+ const void *key,
+ apr_pool_t *pool);
+
+/**
+ * Unpin a @a value reference from @a cache previously returned by
+ * svn_cache__get_pinned. The function will return an error if the
+ * reference has not been pinned before. If necessary, temporary
+ * objects will be allocated in @a pool.
+ *
+ * svn_cache__get_pinned is not supported by all cache implementations;
+ * see the svn_cache__create_* function for details.
+ */
+svn_error_t *
+svn_cache__unpin(const void *value,
+ const svn_cache__t *cache,
+ apr_pool_t *pool);
+
/** @} */

Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c (revision 930220)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -1822,15 +1822,19 @@
   apr_array_header_t *manifest;
   apr_pool_t *iterpool;

+ /* We can operate on the cached object directly since we only need it
+ * in this local context. Also, we just want to access a single element
+ * of that array. Copying the whole thing would be expensive.
+ */
   shard = rev / ffd->max_files_per_dir;
- SVN_ERR(svn_cache__get((void **) &manifest, &is_cached,
- ffd->packed_offset_cache, &shard, pool));
+ SVN_ERR(svn_cache__get_pinned((void **) &manifest, &is_cached,
+ ffd->packed_offset_cache, &shard, pool));

   if (is_cached)
     {
       *rev_offset = APR_ARRAY_IDX(manifest, rev % ffd->max_files_per_dir,
                                   apr_off_t);
- return SVN_NO_ERROR;
+ return svn_cache__unpin((void*)manifest, ffd->packed_offset_cache, pool);
     }

   /* Open the manifest file. */
Index: subversion/libsvn_subr/cache-inprocess.c
===================================================================
--- subversion/libsvn_subr/cache-inprocess.c (revision 930220)
+++ subversion/libsvn_subr/cache-inprocess.c (working copy)
@@ -66,6 +66,13 @@
    */
   apr_pool_t *cache_pool;

+ /* Reference to the item returned by inprocess_cache_get_pinned.
+ * Currently, we support at most one item to be pinned at any given
+ * time. While it is pinned, the mutex will be locked as will.
+ * If NULL, no item has been pinned.
+ */
+ struct cache_entry *pinned_entry;
+
#if APR_HAS_THREADS
   /* A lock for intra-process synchronization to the cache, or NULL if
    * the cache's creator doesn't feel the cache needs to be
@@ -205,31 +212,38 @@
   return err;
}

+/* Implement get in terms of get_pinned, copy an unpin.
+ * The overhead of doing so is marginal.
+ *
+ * This is also an example of how client code should use the
+ * pin / unpin functionality.
+ */
static svn_error_t *
+inprocess_cache_get_pinned(void **value_p,
+ svn_boolean_t *found,
+ void *cache_void,
+ const void *key,
+ apr_pool_t *pool);
+
+static svn_error_t *
+inprocess_cache_unpin(const void *value_p,
+ void *cache_void,
+ apr_pool_t *pool);
+
+static svn_error_t *
inprocess_cache_get(void **value_p,
                     svn_boolean_t *found,
                     void *cache_void,
                     const void *key,
                     apr_pool_t *pool)
{
- inprocess_cache_t *cache = cache_void;
- struct cache_entry *entry;
- svn_error_t *err;
+ void* temp;
+ SVN_ERR(inprocess_cache_get_pinned (&temp, found, cache_void, key, pool));
+ if (!*found)
+ return SVN_NO_ERROR;

- SVN_ERR(lock_cache(cache));
-
- entry = apr_hash_get(cache->hash, key, cache->klen);
- if (! entry)
- {
- *found = FALSE;
- return unlock_cache(cache, SVN_NO_ERROR);
- }
-
- move_page_to_front(cache, entry->page);
-
- *found = TRUE;
- err = duplicate_value(value_p, cache, entry->value, pool);
- return unlock_cache(cache, err);
+ SVN_ERR(duplicate_value(value_p, cache_void, temp, pool));
+ return inprocess_cache_unpin(temp, cache_void, pool);
}

/* Removes PAGE from the LRU list, removes all of its entries from
@@ -275,6 +289,14 @@

   SVN_ERR(lock_cache(cache));

+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+ * But the current thread may try to modify the cache before unpinning
+ * the pinned item.
+ */
+ if (cache->pinned_entry)
+ return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+ _("Cannot modify the cache while an object is pinned."));
+
   existing_entry = apr_hash_get(cache->hash, key, cache->klen);

   /* Is it already here, but we can do the one-item-per-page
@@ -403,16 +425,91 @@
   b.user_baton = user_baton;

   SVN_ERR(lock_cache(cache));
+
+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+ * But the current thread may try to access the cache before unpinning
+ * the pinned item.
+ * While this restriction is not strictly necessary here, we may want
+ * to enforce the pin-evaluate-unpin pattern regardless of the threading
+ * model, for instance. Otherwise, we may end up with Heisenbugs etc.
+ */
+ if (cache->pinned_entry)
+ return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+ _("Cannot read the cache while an object is still pinned."));
+
   return unlock_cache(cache,
                       svn_iter_apr_hash(completed, cache->hash, iter_cb, &b,
                                         pool));

}

+static svn_error_t *
+inprocess_cache_get_pinned(void **value_p,
+ svn_boolean_t *found,
+ void *cache_void,
+ const void *key,
+ apr_pool_t *pool)
+{
+ inprocess_cache_t *cache = cache_void;
+ struct cache_entry *entry;
+
+ SVN_ERR(lock_cache(cache));
+
+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+ * But the current thread may try to pin two entries or the same entry
+ * twice.
+ */
+ if (cache->pinned_entry)
+ return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+ _("Cannot pin more than one object."));
+
+ /* Lookup. */
+ entry = apr_hash_get(cache->hash, key, cache->klen);
+ if (! entry)
+ {
+ *found = FALSE;
+ return unlock_cache(cache, SVN_NO_ERROR);
+ }
+
+ /* LRU mechanism. */
+ move_page_to_front(cache, entry->page);
+
+ /* Mark / pin the entry found and return the reference */
+ *found = TRUE;
+ cache->pinned_entry = entry;
+ *value_p = entry->value;
+
+ /* Cache remains locked until *unpin has been called. */
+ return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+inprocess_cache_unpin(const void *value_p,
+ void *cache_void,
+ apr_pool_t *pool)
+{
+ inprocess_cache_t *cache = cache_void;
+
+ /* The value reference must match the one we returned
+ * (and we must be one still pinned at all).
+ */
+ if (!cache->pinned_entry || cache->pinned_entry->value != value_p)
+ return svn_error_create(SVN_ERR_INCORRECT_PARAMS, NULL,
+ _("Object has not been pinned"));
+
+ /* Remove the marker / unpin the entry */
+ cache->pinned_entry = NULL;
+
+ /* "re-open" the cache to other threads */
+ return unlock_cache(cache, SVN_NO_ERROR);
+}
+
static svn_cache__vtable_t inprocess_cache_vtable = {
   inprocess_cache_get,
   inprocess_cache_set,
- inprocess_cache_iter
+ inprocess_cache_iter,
+ inprocess_cache_get_pinned,
+ inprocess_cache_unpin
};

svn_error_t *
@@ -443,6 +540,8 @@
   /* The sentinel doesn't need a pool. (We're happy to crash if we
    * accidentally try to treat it like a real page.) */

+ cache->pinned_entry = NULL;
+
#if APR_HAS_THREADS
   if (thread_safe)
     {
Index: subversion/libsvn_subr/cache-memcache.c
===================================================================
--- subversion/libsvn_subr/cache-memcache.c (revision 930220)
+++ subversion/libsvn_subr/cache-memcache.c (working copy)
@@ -222,10 +222,32 @@
                           _("Can't iterate a memcached cache"));
}

+static svn_error_t *
+memcache_get_pinned(void **value_p,
+ svn_boolean_t *found,
+ void *cache_void,
+ const void *key,
+ apr_pool_t *pool)
+{
+ return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+ _("Can't pin a memcached object"));
+}
+
+static svn_error_t *
+memcache_unpin(const void *value_p,
+ void *cache_void,
+ apr_pool_t *pool)
+{
+ return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+ _("Can't unpin a memcached object"));
+}
+
static svn_cache__vtable_t memcache_vtable = {
   memcache_get,
   memcache_set,
- memcache_iter
+ memcache_iter,
+ memcache_get_pinned,
+ memcache_unpin,
};

svn_error_t *
Index: subversion/libsvn_subr/cache.c
===================================================================
--- subversion/libsvn_subr/cache.c (revision 930220)
+++ subversion/libsvn_subr/cache.c (working copy)
@@ -96,3 +96,34 @@
                                pool);
}

+svn_error_t *
+svn_cache__get_pinned(void **value_p,
+ svn_boolean_t *found,
+ const svn_cache__t *cache,
+ const void *key,
+ apr_pool_t *pool)
+{
+ /* In case any errors happen and are quelched, make sure we start
+ out with FOUND set to false. */
+ *found = FALSE;
+ return handle_error(cache,
+ (cache->vtable->get_pinned)(value_p,
+ found,
+ cache->cache_internal,
+ key,
+ pool),
+ pool);
+}
+
+svn_error_t *
+svn_cache__unpin(const void *value_p,
+ const svn_cache__t *cache,
+ apr_pool_t *pool)
+{
+ return handle_error(cache,
+ (cache->vtable->unpin)(value_p,
+ cache->cache_internal,
+ pool),
+ pool);
+}
+
Index: subversion/libsvn_subr/cache.h
===================================================================
--- subversion/libsvn_subr/cache.h (revision 930220)
+++ subversion/libsvn_subr/cache.h (working copy)
@@ -47,6 +47,16 @@
                        svn_iter_apr_hash_cb_t func,
                        void *baton,
                        apr_pool_t *pool);
+
+ svn_error_t *(*get_pinned)(void **value,
+ svn_boolean_t *found,
+ void *cache_implementation,
+ const void *key,
+ apr_pool_t *pool);
+
+ svn_error_t *(*unpin)(const void *value,
+ void *cache_implementation,
+ apr_pool_t *pool);
} svn_cache__vtable_t;

struct svn_cache__t {
Received on 2010-04-06 19:56:40 CEST

This message: [ Message body ]
Next message: Stefan Fuhrman: "[PATCH] Fix O(n) runtime in cache lookup, part 2/2"
Previous message: Paul Burba: "Re: svn commit: r931164 - /subversion/branches/svn-patch-improvements/"
Next in thread: Philip Martin: "Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2"
Reply: Philip Martin: "Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2"
Maybe reply: Stefan Fuhrmann: "Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]