[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Bugs in 1.8.5 libsvn_subr which make 32-bit threaded svnserve sometimes hang and eat 100% cpu

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 13 Feb 2014 20:26:18 +0100

On Thu, Feb 13, 2014 at 12:49 PM, <vitalif_at_yourcmc.ru> wrote:

> Hi!
>
> I want to report two bugs in Subversion 1.8.5 libsvn_subr
> subversion/libsvn_subr/cache-membuffer.c, a more important one and a less
> important one. I've fixed them, and patches are attached - please apply
> them.
>

Hi Vitali,

Thank you very much for taking the time to debug the hangs
and to come up with a patch.

It also tells me that the caching code can still be understood.

> First bug (important):
>
> There is an incorrect integer addition to cache->data_used which makes
> 32-bit threaded svnserve sometimes hang and eat 100% CPU. When it hangs,
> one thread is doing an infinite loop in ensure_data_insertable() with write
> lock acquired, and others just wait for it to release the lock. So it looks
> like 100% usage of a single CPU core and non-working svn:// access.
>
> The problem is in line 1377: "cache->data_used += size - entry->size;" is
> incorrect because cache->data_used is always apr_uint64_t, and size and
> entry->size are size_t which is unsigned 32-bit integer type on 32-bit
> platforms. And size is always [almost always] less than entry->size at that
> line, so cache->data_used becomes > 0x100000000 at once which makes
> ensure_data_insertable() to loop infinitely because all entries in cache
> "become" less than 1/8 * average entry size.
>
> The patch is in the attachment 'fix-membuffer-u32-overflow'
>

Fixed in r1567985 using Peter's suggestion.

> Second bug (less important):
>
> There is also another bug in libsvn_subr membuffer cache implementation:
> cache->hit_count and entry->hit_count are incremented without proper mutual
> exclusion (only with read lock acquired), so some increments get lost in
> multithreaded svnserve; as it's unlikely two threads access the same entry
> at once, usually cache->hit_count (not entry->hit_count) increments are
> lost, which leads to cache->hit_count being less than sum(entry->hit_count).
>
> So cache->hit_count may overflow during subtraction of entry->hit_count
> from it and lead to removing of extra values from the cache in
> ensure_data_insertable(), because the retain threshold may be very big;
> also it may overflow again, become less than cache->entries_used and also
> lead to removing of extra items.
>
> Instead of killing the performance by mutual exclusion I think it's
> possible to just check for integer overflow.
>

This one is more subtle. My patch for it is in r1567996
with a follow-up in 1568009. It took me a while to see
that your proposal is actually a stable solution, i.e. not
likely to make the global counter value drift over time.

I also fixed a related issue with 32 bit hit counter overflows
(rare but may happen after some time for large directories
or items close to the root of the access path).

>
> This doesn't affect general svnserve usability, so severity is minor.
>

These problems may affect svnserve in threaded more
more than httpd but the latter should still exhibit the
infinite loop behaviour.

> The patch is in the attachment 'fix-membuffer-hit_count-overflow'
>
> Hope you'll apply these patches soon, or fix the bugs some another way :)
>

Well, the good news is that 1.7 is not affected (all access
is fully synchronized there) nor 1.9 / trunk. The latter uses
a more sophisticated scheme and will exit the critical loop
after a few iterations without affecting overall performance.

Thanks again for the report!

-- Stefan^2.
Received on 2014-02-13 20:26:54 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.