Re: EDEADLK in svn_repos_fs_begin_txn_for_commit2

From: Blair Zajac <blair_at_orcaware.com>
Date: Wed, 26 Jan 2011 10:30:35 -0800

On 01/26/2011 02:56 AM, Stefan Sperling wrote:
> On Tue, Jan 25, 2011 at 11:00:31PM -0800, Blair Zajac wrote:
>> We're seeing deadlocks in our Subversion multithreaded server when
>> two distinct processes try to fcntl(F_SETLKW) on two fsfs
>> repositories' db/txn-current-lock, when the processes begin
>> transactions in reverse order.
>>
>> Process 1                         Process 2
>> ---------                         ---------
>> thread 1: begin txn in repos A    thread 1: begin txn in repos B
>> thread 2: begin txn in repos B    thread 2: begin txn in repos A
>>
>> During normal working hours, we get over 1 commit per second,
>> peaking at 6, which is why we're seeing this.
>>
>> Questions:
>>
>> Should a fix for this be put in libsvn_fs_fs or should I do this
>> in my application? I'm thinking putting this in libsvn_fs_fs is
>> an appropriate fix, even though other people probably won't see it.
>>
>> I'm also thinking the code should retry a maximum of 100 times,
>> starting with a 1 ms sleep and doubling the sleep after each failure
>> up to a maximum of 128 ms, similar to WIN32_RETRY_LOOP.
>>
>> Comments?
>
> If possible it should be fixed in libsvn_fs_fs.

I'm now thinking of putting the retry in svn_io_file_lock2() instead of
handling the deadlock in libsvn_fs_fs itself. It shouldn't hurt any other
use cases, and it would serve as general, defensive code.
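
A rough sketch of such a retry loop, for illustration only. It calls
fcntl() directly rather than going through APR, and the function and
constant names (lock_file_retry_deadlock, next_sleep_ms, LOCK_*) are
made up here; this is not the actual svn_io_file_lock2() code, just the
1 ms to 128 ms backoff with at most 100 attempts described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical constants, taken from the numbers proposed in this thread. */
#define LOCK_RETRY_MAX       100
#define LOCK_SLEEP_START_MS  1
#define LOCK_SLEEP_MAX_MS    128

/* Return the next sleep interval: double it, capped at LOCK_SLEEP_MAX_MS. */
static long next_sleep_ms(long ms)
{
  ms *= 2;
  return ms > LOCK_SLEEP_MAX_MS ? LOCK_SLEEP_MAX_MS : ms;
}

/* Take a blocking exclusive lock on the whole file behind FD.  If the
   kernel reports EDEADLK, sleep and try again with exponential backoff.
   Returns 0 on success, otherwise the errno value of the last failure. */
static int lock_file_retry_deadlock(int fd)
{
  long sleep_ms = LOCK_SLEEP_START_MS;
  int attempt;

  for (attempt = 0; attempt < LOCK_RETRY_MAX; attempt++)
    {
      struct flock fl = { 0 };
      struct timespec ts;

      fl.l_type = F_WRLCK;     /* exclusive lock */
      fl.l_whence = SEEK_SET;
      fl.l_start = 0;
      fl.l_len = 0;            /* 0 means "through end of file" */

      if (fcntl(fd, F_SETLKW, &fl) == 0)
        return 0;
      if (errno != EDEADLK)
        return errno;          /* only deadlock reports are retried */

      ts.tv_sec = sleep_ms / 1000;
      ts.tv_nsec = (sleep_ms % 1000) * 1000000L;
      nanosleep(&ts, NULL);
      sleep_ms = next_sleep_ms(sleep_ms);
    }

  return EDEADLK;              /* still deadlocked after all attempts */
}
```

With these numbers the sleep reaches 128 ms after seven failures, so a
caller that never gets the lock gives up after roughly 12 seconds of
sleeping in total.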

Blair
Received on 2011-01-26 19:31:14 CET
