[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Fri, 16 Apr 2010 11:41:44 +0100

Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de> writes:

> Philip Martin wrote:
>> Stefan Fuhrman <stefanfuhrmann_at_alice-dsl.de> writes:
>>> svn_cache_t lookup returns a copy of the value found.
>>> For objects of variable size, this can be expensive. If you
>>> checkout / export all N files of a specific directory, for
>>> instance, every single path gets verified. The directory
>>> information is cached for FSFS but the copy operation
>>> will still be O(N).
>>> As a result, the whole check takes O(N2). The good
>> That sounds bad. Do you have any measurements to show how much
>> improvement your patch gives?
> Well, choose your N ;) For low square mean numbers of files
> per folder as in TSVN and SVN, the gains were between 3
> and 4 percent of total "svn export" runtime. If you want me to,
> I could run a test with 10k files in a folder (probably a realistic
> upper limit).

Sounds promising. In this case I think server performance is the more
interesting. If you can be bothered then run a server, do a checkout
of a 1K or a 10k directory (probably many times) and look at CPU or
memory usage of the server. But if you are getting 3% on the client
then the server is probably getting a bigger gain.

>>> - SVN_ERR(svn_cache__get((void **) &manifest, &is_cached,
>>> - ffd->packed_offset_cache, &shard, pool));
>>> + SVN_ERR(svn_cache__get_pinned((void **) &manifest, &is_cached,
>>> + ffd->packed_offset_cache, &shard,
>>> pool));
>>> if (is_cached)
>>> {
>>> *rev_offset = APR_ARRAY_IDX(manifest, rev %
>>> ffd->max_files_per_dir,
>>> apr_off_t);
>> Adding
>> SVN_ERR(svn_foo(...));
>> would now be a bug because an error would cause the unpin to be
>> skipped. This is an interface that is easy to get wrong, how about a
>> calback interface similar to svn_wc__call_with_write_lock?
> The implementation is supposed not pin nor lock upon
> lookup failure. Perhaps, I should make that clear in the
> API documentation.

What I meant was that with code like:

    if (...)

it's easy to make a change in the future like:

    if (...)

and that is a bug. We have had bugs in the past like that: errors
that leak, working copy locks that leak, and now we have pins that
leak. If you write an interface like svn_wc__call_with_write_lock
then it's much harder to leak.

>> So pinning leaves the cache locked and failing to unpin is essentially
>> an unrecoverable error.
> Yes. Even if the locking would not be a problem. Pinning
> is equivalent to locking a resource (the entry in that case).
> Memory, for instance, will not be reclaimed until you destroy
> the whole cache. This is the very purpose of this functionality.
>> I think your claim that the interface allows more than one item to be
>> pinned is also suspect. How would the locking work? Could two
>> threads each pin an item?
> The current implementation is more simplistic than what
> the interface allows. Locking the cache is a by-product
> of that simplistic implementation.
>> Do you really need to leave the cache locked? Would it be enough to
>> arrange for the item not to be evicted from the LRU cache? Move the
>> pinned item from the LRU cache to a pinned cache, and move it back
>> when unpinned?
> An alternative implementation would associate every
> cache entry with a pin count. While the latter is >0,
> the respective entry will not be evicted from cache.
> The basic meaning of "pin" is: 'keep this reference
> valid as long as it is pinned'.
> The downside is that you may be tempted to pin many
> items or to pin them for a longer time, possibly forgetting
> to release these items in certain edge-cases.

An svn_wc__call_with_write_lock interface mostly solves the forgetting
to unpin problem.

> That is the reason why I chose to pin at most one item
> at any given time. Mistakes in cache usage would become
> obvious pretty quickly. On the downside, multi-threaded
> code needs additional coordination to make that "only 1"
> limit feasible. Thus, the lock.
> If your assessment is that "misusing" the lock is more
> risky than the risk of a temporary memory leak, I can
> change the implementation this weekend and post a
> new patch.

Leaving the cache locked feels wrong to me, but that really is an
off-the-cuff reaction.

BTW, there is at least one memory bug associated with the cache:
something to do with the way apr hash tables reuse memory.

Received on 2010-04-16 12:42:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.