
NFS and fsfs: slow commit performance

From: Blair Zajac <blair_at_orcaware.com>
Date: Mon, 17 Jan 2011 20:42:31 -0800

When two or more Linux boxes commit at a high rate into an fsfs repository on
NFS, one of the svn processes can get stuck in fcntl() for over 30 seconds:

open("repo/db/write-lock", O_RDWR) = 4
fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
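
For anyone who wants to time the same kind of blocking lock request outside of
svn, here is a minimal sketch. It assumes CPython on Linux, where
fcntl.lockf() without LOCK_NB should turn into a blocking F_SETLKW request like
the one in the trace. The helper names timed_write_lock() and release() are
made up for illustration and are not part of Subversion.

```python
import fcntl
import os
import time


def timed_write_lock(path):
    """Block until an exclusive whole-file lock on path is granted.

    Returns (fd, seconds_waited). The file must already exist.
    """
    fd = os.open(path, os.O_RDWR)
    start = time.time()
    # No LOCK_NB flag, so this call blocks until the lock is available.
    fcntl.lockf(fd, fcntl.LOCK_EX)
    return fd, time.time() - start


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Running timed_write_lock('repo/db/write-lock') in a loop on two NFS clients and
printing the second element of the result would show how long each request
waits.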

A sample Python script that easily reproduces the issue is at the end of this
message.

I've observed this with NetApp and Isilon NFS servers. I don't observe it with
a single Linux box running multiple svn processes. I'm guessing that in that
case the kernel decides locally which process wins the lock and only then goes
to the NFS server?

Even after the svn processes on one box are stopped, if the svn processes on the
other box are blocked in fcntl(), it can take over 30 seconds for the svn
process waiting on the lock to start.

I have a patch that replaces fsfs.c:get_lock_on_filesystem()'s implementation
with apr_file_open(APR_WRITE | APR_CREATE | APR_EXCL | APR_DELONCLOSE). If it
fails, it sleeps 1ms and doubles the sleep to a maximum of 25ms, until it
succeeds. I haven't seen it hang to the degree that fcntl() does.
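
The retry loop described above can be sketched in Python roughly as follows.
This is not the patch itself, which uses apr_file_open() in C. It is an
illustration that assumes os.open() with O_CREAT | O_EXCL stands in for
APR_CREATE | APR_EXCL and that an explicit unlink stands in for
APR_DELONCLOSE. The function names are hypothetical.

```python
import errno
import os
import time


def acquire_excl_lock(lock_path, first_sleep=0.001, max_sleep=0.025):
    """Create lock_path exclusively, retrying with doubling sleeps.

    Sleeps 1ms after the first failure and doubles the sleep up to 25ms.
    Returns the open file descriptor once the create succeeds.
    """
    sleep = first_sleep
    while True:
        try:
            return os.open(lock_path,
                           os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                           0o644)
        except OSError as e:
            # EEXIST means another process holds the lock; anything
            # else is a real error.
            if e.errno != errno.EEXIST:
                raise
        time.sleep(sleep)
        sleep = min(sleep * 2, max_sleep)


def release_excl_lock(fd, lock_path):
    """Remove the lock file so the next waiter can create it."""
    os.unlink(lock_path)
    os.close(fd)
```

One caveat with this sketch: if the holder dies before calling
release_excl_lock(), the lock file stays behind and every other process keeps
sleeping until someone removes it.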

Using APR_EXCL requires an NFSv3 server and, on Linux, a 2.6.6 or later kernel
(see http://nfs.sourceforge.net/#faq_d10).

Questions:

1) Is there a better algorithm than exponential sleeps when you have to poll
for a resource explicitly? I've noticed that when a slow and a fast Linux
client each try to do as many commits per second as possible, the fast one
locks out the slow one, so the slow one ends up sleeping a lot more. I'm
thinking of using a random sleep between 1 and 100ms, where 100ms is an average
commit time.

2) Would this be an appropriate patch to put into 1.7, if the locking strategy
can be configured in the fsfs.conf file?

3) I understand some of the large svn hosting providers host on NetApp. Don't
they see this issue? Or do they use a master/standby deployment, so it doesn't
matter?
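
To make the random-sleep idea from question 1 concrete, here is a rough Python
sketch. It assumes the same exclusive-create lock file as the patch and simply
swaps the doubling sleep for a uniform random sleep between 1 and 100ms. The
function name and the attempt counter are hypothetical, added only so the two
strategies can be compared. Whether this is actually fairer to the slow client
is the open question.

```python
import errno
import os
import random
import time


def acquire_with_random_sleep(lock_path, min_sleep=0.001, max_sleep=0.100):
    """Create lock_path exclusively, sleeping a random interval on failure.

    Returns (fd, attempts), where attempts counts the create calls made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            fd = os.open(lock_path,
                         os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                         0o644)
            return fd, attempts
        except OSError as e:
            # Only "file exists" means the lock is held by someone else.
            if e.errno != errno.EEXIST:
                raise
        # Each waiter picks an independent sleep, so a waiter that has
        # already failed many times is not penalized with longer sleeps.
        time.sleep(random.uniform(min_sleep, max_sleep))
```

Logging the attempts value per commit on the slow and the fast client would
show whether one of them is still being starved.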

Thanks,
Blair

#!/usr/bin/python -u

import os
import svn.core
import svn.fs
import svn.repos
import time

repo_name = 'repo'

if os.path.isdir(repo_name):
     repo = svn.repos.open(repo_name)
else:
     repo = svn.repos.create(repo_name,
                             None,
                             None,
                             None,
                             {svn.fs.CONFIG_FS_TYPE: svn.fs.TYPE_FSFS})

fs = svn.repos.fs(repo)
youngest = svn.fs.youngest_rev(fs)

path = '/%s' % (1000*1000*time.time())

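while True:
     t1 = time.time()
     # Start a transaction based on the last revision this process committed.
     txn = svn.repos.fs_begin_txn_for_commit2(repo, youngest, {})
     fs_root = svn.fs.txn_root(txn)
     # Create this process's directory on the first pass only.
     if svn.core.svn_node_none == svn.fs.check_path(fs_root, path):
         svn.fs.make_dir(fs_root, path)
     # Change a property so every commit has something to write.
     svn.fs.change_node_prop(fs_root,
                             path,
                             'foo',
                             '%s' % (1000*1000*time.time()))
     # The commit is where the repository write lock is taken.
     youngest = svn.repos.fs_commit_txn(repo, txn)
     t2 = time.time()
     # Print the elapsed seconds for this commit.
     print(t2 - t1)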
Received on 2011-01-18 05:43:14 CET

This is an archived mail posted to the Subversion Dev mailing list.