Svnserve scalability problems in multi-threaded mode (-T)

From: Krotil, Radek <radek.krotil_at_siemens.com>
Date: Sat, 18 Apr 2020 07:12:52 +0000

Hi Everyone.

Let me revisit an old topic that was discussed with the Subversion team in 2016. Now that our tool finally supports SVN 1.10, we have made a breakthrough on this issue after all these years, and I feel it is something the Subversion team should be aware of.

There are two topics I’d like to bring up here, as they are inter-connected:

1) Svnserve causing mutex lock contention in threaded mode

  * Discussed in http://subversion.1072662.n5.nabble.com/Better-choice-for-Linux-semaphore-than-spinlock-td204915.html#a204989
  * I believe this report originates from our enterprise customer, who saw this behavior under high concurrent usage. When svnserve on Linux (RHEL 7.6) is configured to run in threaded mode, we see the following pattern: all the CPU time is consumed by the concurrent usage. In the worst case we have ever seen, almost all of the CPU is consumed by system time, presumably related to the spinlock contention discussed in the thread above.
  * According to our developer's analysis, svnserve behaves very differently depending on whether there are enough threads in the svnserve pool. Quoting our dev: “Svnserve waits on socket read if there is enough threads in pool. But it behaves a bit differently if more than half of threads from pool are occupied by work. In that case, it immediately returns thread after each command/operation back to the pool which is again trying to get out of pool as there is a lot of work to do – and this is point of locking. It does also processing using round-robin, which will intentionally prolong connection operations I think, trying to reduce load back to normal state. Otherwise, if there are enough threads in pool (lets say by default under 128 threads), these active threads are not returned back to pool after each command and just processing next commands in command queue for given connection and it is fully concurrent without lock.”
  * Now that we have added support for SVN 1.10, which provides configuration options for tuning the number of threads, we were able to run more experiments based on Stefan Fuhrmann’s recommendation<http://subversion.1072662.n5.nabble.com/Deadlock-like-behaviour-of-svnserve-in-multi-threaded-mode-T-tp196421p196500.html>. When we apply the recommended tuning options --min-threads 64 --max-threads 1024, the situation improves significantly. See the figures below.

[Figure: svnserve in threaded mode, no thread tuning]

[Figure: svnserve in threaded mode, no thread tuning, worst case]

[Figure: svnserve in threaded mode, tuned with --min-threads 64 --max-threads 1024]

  * The conclusion here is that svnserve from version 1.10 on can be configured to support the necessary concurrency, but there is a lack of guidance and logging that would lead admins to the proper configuration. I bring it up here for your consideration, in case you want to act on this feedback.
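
For admins who want to try the tuning from topic 1, here is a minimal sketch of how the threaded-mode command line could be assembled. The helper name `svnserve_command` and the repository root `/srv/svn` are assumptions made for illustration; only the svnserve options themselves come from the discussion above.

```python
def svnserve_command(root, min_threads=64, max_threads=1024):
    """Build an svnserve argv for daemon + threaded mode with a tuned pool.

    `root` is a hypothetical repository root; the thread counts default to
    the values recommended in the thread quoted above.
    """
    if max_threads < 1 or not 0 <= min_threads <= max_threads:
        raise ValueError("need 0 <= min_threads <= max_threads and max_threads >= 1")
    return [
        "svnserve",
        "--daemon",                       # run as a standalone daemon (-d)
        "--threads",                      # threaded mode instead of fork (-T)
        "--min-threads", str(min_threads),
        "--max-threads", str(max_threads),
        "--root", root,                   # repository root (-r)
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a service definition.
    print(" ".join(svnserve_command("/srv/svn")))
```

The resulting list can be handed to `subprocess.run` or copied into an init script. The right thread counts depend on the workload, so the defaults here are a starting point rather than a recommendation of our own.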

2) Deadlock-like behaviour of svnserve in multi-threaded mode (-T)

  * The second problem is also related to threaded mode. We attacked it for the third time because it was a significant robustness problem: it stalled our application with hundreds of concurrent users and was therefore escalated by our enterprise customers.
  * Discussed in http://subversion.1072662.n5.nabble.com/Deadlock-like-behaviour-of-svnserve-in-multi-threaded-mode-T-tt196421.html#a196500 and also tracked as https://issues.apache.org/jira/browse/SVN-4626

  * The scenario that leads to this problem is the following: our tool, Polarion ALM, at times performs a re-indexing operation in which it pulls a lot of data from SVN over parallel connections. It is also used by hundreds of concurrent users, and at times that concurrent usage and the resulting connection creation lead to the same problem. The newly created connections to svnserve stall completely and, due to internal locking in Polarion, all communication with our backend stops until the connection times out minutes later.
  * This stalling occurs only when multiple SVN connections are opened at the same time and only when SVN is running in threaded mode. Threaded mode is the default on Windows and can be enabled by configuration on Linux.
  * We involved the svnkit team in the latest analysis, and Alex Kitaev was very helpful. Let me again quote Alex: “I started to reduce number of parallel threads and when issue was reproducible with even two threads I've realized that the problem might be related to socket.connect call rate. Somehow, frequent connection establishment led to failures - connection state was displayed as Established, but no data was read from it. So, the workaround I've found so far, is to make sure SVNRepository instances are created subsequently along with "testConnection" call on the instance, with minimal delay between "testConnection" calls. In my pure socket test a delay of 10ms resolved the problem, with SVNRepository just subsequent calls to repository.testConnection was enough, due to testConnection call overhead. I didn't find any reference to this or similar issue on the internet, but I suspect that it might be either Windows configuration option, or APR used by Subversion that might use current time for some sort of socket/connection id and then mixes sockets up.”
  * So this analysis points to a potential problem on the svn side that may cause this stalling. We were able to work around it by introducing “rate limiting” for new SVN connections, which prevents connections from being created concurrently. FYI, our app creates some hundreds of connections in enterprise environments.
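
To illustrate the kind of “rate limiting” meant in topic 2, here is a minimal sketch. It is not our actual implementation (which sits on top of svnkit in Java); the class name `ConnectionRateLimiter` and its parameters are made up for this example. The 10 ms spacing is taken from Alex's pure socket test, and the idea is simply that connections are created one at a time with a small gap between them.

```python
import threading
import time


class ConnectionRateLimiter:
    """Create connections one at a time, spaced by a minimum interval.

    Hypothetical helper: `connect` is any zero-argument callable that opens
    and returns a connection. The clock and sleep functions are injectable
    so the behaviour can be checked without real delays.
    """

    def __init__(self, min_interval_s=0.010, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()   # serializes connection creation
        self._last_open = None          # time the previous open finished

    def open(self, connect):
        with self._lock:
            if self._last_open is not None:
                wait = self._min_interval_s - (self._clock() - self._last_open)
                if wait > 0:
                    self._sleep(wait)
            try:
                return connect()
            finally:
                # Record the time even if connect() failed, so that retries
                # are spaced out as well.
                self._last_open = self._clock()
```

A caller would share one limiter across all worker threads and route every new connection through it, for example `limiter.open(lambda: socket.create_connection((host, 3690)))`, where `host` is a placeholder and 3690 is the standard svnserve port. Only connection creation is serialized; the connections are used concurrently afterwards.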

In conclusion, we are currently able to overcome both of these long-standing problems after adding support for SVN 1.10. We just wanted to share our findings so that the SVN team is aware of them. We understand our usage of SVN is a bit special, but I feel our findings may help make SVN a bit better and prevent problems for other users.

Thank you SVN team for your support!

Best regards,
Radek Krotil

Siemens Digital Industries Software
Polarion ALM Product Management

Siemens Industry Software, s.r.o.
Praha 4, Doudlebská 1699/5, PSČ 140 00
Company ID (IČ): 256 51 897
Registered in the Commercial Register maintained by the Municipal Court in Prague, Section C, File 58222

Received on 2020-04-18 09:13:39 CEST

This is an archived mail posted to the Subversion Users mailing list.
