The Subversion project encountered a usage pattern of APR hash tables that
caused many megabytes of memory to be occupied by a small hash table after a
sequence of many thousands of (insert, delete) pairs. They have subsequently
changed the usage pattern to avoid this.
The cause of the problem is that every time apr_hash_set(..., non-NULL) adds a
new entry, it allocates some new pool memory for that entry and links it into
one of its linked lists. When apr_hash_set(..., NULL) deletes an entry, it
unlinks it but never re-uses that memory.
So this is an inefficiency in APR's implementation of hash tables. It is not
exactly a "leak", since it is contained with a pool, but it is unbounded memory
use caused by repeated simple actions - not a good property for a container to
have.
It would be possible to fix this in APR if people thought it worthwhile, e.g.
by using a "free-list". (Maintain, in the hash table, a list of free entries,
initially empty. When deleting an entry, move it to the free-list. When a new
entry is wanted, take one from the free-list if available, else allocate it
from the pool.) This would make the memory used by a hash table proportional
to the maximum number of entries that it had ever held simultaneously, rather
than the number of insertions that had ever been performed.
In fact, using a free-list seems to be quite easy, with virtually no speed
penalty. A patch is attached, prepared in the manner of a Subversion patch.
(Please copy replies to dev@subversion.tigris.org, as I am not subscribed to
dev@apr.apache.org)
- Julian
Prevent unbounded memory use during repeated operations on a hash table.
The hash table was allocating new memory for each new insertion of an entry,
and not reclaiming it on deletions, so repetition of (insert, delete) caused
the memory use to keep growing. This fix causes the memory freed by a deletion
to be reused by a subsequent insertion, so that the memory used by the hash
table is proportional to the maximum number of entries that it has ever held
simultaneously, rather than the number of insertions that have been performed.
* apr/tables/apr_hash.c:
(apr_hash_t): Add new field "free", a free-list.
(apr_hash_make, apr_hash_copy, apr_hash_merge): Initialise the free-list.
(find_entry): Use an entry from the free-list if there is one.
(apr_hash_set): Return a deleted entry to the free-list.
Index: tables/apr_hash.c
===================================================================
RCS file: /home/cvspublic/apr/tables/apr_hash.c,v
retrieving revision 1.40
diff -u -3 -p -d -r1.40 apr_hash.c
--- tables/apr_hash.c 19 Apr 2004 11:44:37 -0000 1.40
+++ tables/apr_hash.c 2 Nov 2004 22:14:34 -0000
@@ -73,6 +73,7 @@ struct apr_hash_t {
apr_hash_index_t iterator; /* For apr_hash_first(NULL, ...) */
unsigned int count, max;
apr_hashfunc_t hash_func;
+ apr_hash_entry_t *free; /* List of recycled entries */
};
#define INITIAL_MAX 15 /* tunable == 2^n - 1 */
@@ -92,6 +93,7 @@ APR_DECLARE(apr_hash_t *) apr_hash_make(
apr_hash_t *ht;
ht = apr_palloc(pool, sizeof(apr_hash_t));
ht->pool = pool;
+ ht->free = NULL;
ht->count = 0;
ht->max = INITIAL_MAX;
ht->array = alloc_array(ht, ht->max);
@@ -264,7 +266,10 @@ static apr_hash_entry_t **find_entry(apr
return hep;
/* add a new entry for non-NULL values */
- he = apr_palloc(ht->pool, sizeof(*he));
+ if (he = ht->free)
+ ht->free = he->next;
+ else
+ he = apr_palloc(ht->pool, sizeof(*he));
he->next = NULL;
he->hash = hash;
he->key = key;
@@ -286,6 +291,7 @@ APR_DECLARE(apr_hash_t *) apr_hash_copy(
sizeof(*ht->array) * (orig->max + 1) +
sizeof(apr_hash_entry_t) * orig->count);
ht->pool = pool;
+ ht->free = NULL;
ht->count = orig->count;
ht->max = orig->max;
ht->hash_func = orig->hash_func;
@@ -333,7 +339,10 @@ APR_DECLARE(void) apr_hash_set(apr_hash_
if (*hep) {
if (!val) {
/* delete entry */
+ apr_hash_entry_t *old = *hep;
*hep = (*hep)->next;
+ old->next = ht->free;
+ ht->free = old;
--ht->count;
}
else {
@@ -396,6 +405,7 @@ APR_DECLARE(apr_hash_t *) apr_hash_merge
res = apr_palloc(p, sizeof(apr_hash_t));
res->pool = p;
+ res->free = NULL;
res->hash_func = base->hash_func;
res->count = base->count;
res->max = (overlay->max > base->max) ? overlay->max : base->max;
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Nov 3 00:22:39 2004