On 01/26/2011 02:56 AM, Stefan Sperling wrote:
> On Tue, Jan 25, 2011 at 11:00:31PM -0800, Blair Zajac wrote:
>> We're seeing deadlocks in our Subversion multithreaded server when
>> two distinct processes try to fcntl(F_SETLKW) on two fsfs
>> repositories' db/txn-current-lock, when the processes begin
>> transactions in reverse order.
>>
>> Process 1                           Process 2
>> ---------                           ---------
>> thread 1: begin txn in repos A      thread 1: begin txn in repos B
>> thread 2: begin txn in repos B      thread 2: begin txn in repos A
>>
>> During normal working hours, we get over 1 commit per second,
>> peaking at 6, which is why we're seeing this.
>>
>> Questions:
>>
>> Should a fix for this be put in libsvn_fs_fs or should I do this
>> in my application? I'm thinking putting this in libsvn_fs_fs is
>> an appropriate fix, even though most other people probably won't
>> hit this deadlock.
>>
>> I'm also thinking the code should retry a maximum of 100 times,
>> starting with a 1 ms sleep and doubling the sleep after each
>> failure up to a maximum of 128 ms, similar to WIN32_RETRY_LOOP.
>>
>> Comments?
>
> If possible it should be fixed in libsvn_fs_fs.

I'm now thinking of putting the retry in svn_io_file_lock2() instead of
handling the deadlock in libsvn_fs_fs itself. It shouldn't hurt any other
use cases, and it would serve as general, defensive code.

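A minimal sketch of the retry described above, in plain POSIX C rather
than APR. It assumes the deadlock is reported as EDEADLK from
fcntl(F_SETLKW); lock_with_retry(), next_backoff_ms() and the LOCK_*
constants are hypothetical names, not Subversion or APR API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical limits, taken from the numbers proposed in this thread. */
#define LOCK_MAX_RETRIES  100
#define LOCK_MAX_SLEEP_MS 128

/* Double the delay, capped at LOCK_MAX_SLEEP_MS. */
long next_backoff_ms(long ms)
{
    long doubled = ms * 2;
    return doubled > LOCK_MAX_SLEEP_MS ? LOCK_MAX_SLEEP_MS : doubled;
}

/* Take an exclusive whole-file lock on FD, blocking as usual, but retry
 * with exponential backoff when the kernel reports a deadlock.
 * Returns 0 on success, or -1 with errno set by fcntl(). */
int lock_with_retry(int fd)
{
    struct flock fl = {0};
    long sleep_ms = 1;
    int retries = 0;

    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    for (;;) {
        struct timespec ts;

        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;

        /* Only a detected deadlock is retried; anything else is fatal. */
        if (errno != EDEADLK || retries >= LOCK_MAX_RETRIES)
            return -1;

        ts.tv_sec = 0;
        ts.tv_nsec = sleep_ms * 1000000L;   /* at most 128 ms, so < 1 s */
        nanosleep(&ts, NULL);

        sleep_ms = next_backoff_ms(sleep_ms);
        retries++;
    }
}
```

The sleep sequence is 1, 2, 4, ... 128 ms and then stays at 128 ms, so
100 retries add up to a worst case of roughly 12 seconds before giving up.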
Blair
Received on 2011-01-26 19:31:14 CET