On Fri, Feb 10, 2012 at 04:00:02PM -0500, Nico Kadel-Garcia wrote:
> On Fri, Feb 10, 2012 at 1:21 PM, Bruce Lysik <blysik_at_yahoo.com> wrote:
> > Hi,
> > I'm considering deploying 3 front-ends, all mounting the same SAN volume for
> > repo. (The SAN handles flock() and fcntl() correctly.) These 3 FEs would be
> > load balanced by a Citrix Netscaler. (At least for http(s).)
> > Would there be any problems with this configuration?
> Potentially. Read operations, I wouldn't expect to be a big problem,
> but commit operations need to be atomic, and the software wasn't
> *written* to behave well with network mounted back end filesystems
> across multiple servers. So I wouldn't know, off hand, what phase
> delays between two front ends writing revisions at the same time might
> create for genuine adventures on the back end.
Subversion was designed to allow multiple concurrent server processes
to access the same repository.
And, generally, yes, there is a lot less risk of corruption if no
network i/o is involved when data is written to the repository
by a server process. After all, you're adding yet another complex
layer where something can go wrong.
But assuming locking via fcntl() works correctly, there shouldn't be
a problem with FSFS repositories ("svnadmin create --fs-type=fsfs",
which is default). FSFS was specifically designed for use with NFS.
This was stated in the release announcement of Subversion 1.1 which
was the first release to support FSFS.
Technical details are available at
SAN storage usually appears as a local disk, so things should work fine.
I know of setups that run virtualised servers which access repositories
on SAN storage without issues.
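
The assumption that fcntl() locking works correctly can be probed before
trusting a mount with a repository. A minimal sketch in Python follows; the
function name and the probe file name are made up for illustration. It only
tests two processes on one host. To say anything about several front-ends
sharing a volume, the holder and the contender would have to run on
different machines.

```python
import fcntl
import os


def second_process_is_refused(directory):
    """Hold an exclusive fcntl() lock on a probe file, then check that a
    child process cannot obtain the same lock.

    Returns True if the child was refused, which is the correct behaviour.
    """
    probe = os.path.join(directory, "fcntl-probe.tmp")  # hypothetical name
    try:
        with open(probe, "w") as holder:
            # The parent takes the lock and keeps it while the child runs.
            fcntl.lockf(holder, fcntl.LOCK_EX)
            pid = os.fork()
            if pid == 0:
                # Child: fcntl() locks are per process, so it holds none yet.
                status = 2  # unexpected failure, e.g. probe cannot be opened
                try:
                    with open(probe, "r+") as contender:
                        try:
                            fcntl.lockf(contender,
                                        fcntl.LOCK_EX | fcntl.LOCK_NB)
                            status = 1  # lock granted twice: broken locking
                        except OSError:
                            status = 0  # refused, as it should be
                finally:
                    os._exit(status)
            _, wait_status = os.waitpid(pid, 0)
    finally:
        if os.path.exists(probe):
            os.remove(probe)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Calling second_process_is_refused() on a directory of the mounted volume
should return True. A False result would suggest that the mount grants the
same exclusive lock twice, or that the probe file could not be opened.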
It would be far from the truth to claim that problems have never been
seen on network filesystems, though. For example, I know people who,
after putting FSFS repositories on a CIFS share, ended up with a corrupt
rep-cache.db. This is an SQLite database added to FSFS in Subversion 1.6.
SQLite requires the same locking primitives that Subversion requires
(see http://sqlite.org/faq.html#q5). This problem happened even with
just a single server instance writing to the repository. However, the
most likely cause is flawed or misconfigured file locking support in
the CIFS implementation. I could not examine the failure in detail.
But it involved a huge commit that took a long time, with possibly
concurrent commits. The rep-cache.db file is opened by every commit
operation so a file locking race is quite likely to hit here (not
discounting other possible races, such as two commits writing to the
same revision file at the same time; they're just less likely).
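
Whether a rep-cache.db has been damaged in this way can be checked with
SQLite's own integrity check. A minimal sketch in Python follows; the
function name is made up for illustration, and the file is assumed to be
db/rep-cache.db inside the repository. The database is opened read-only so
that the check itself writes nothing. It is probably best run while no
commit is in progress.

```python
import pathlib
import sqlite3


def rep_cache_is_intact(db_path):
    """Return True if SQLite's integrity check reports no problems."""
    # mode=ro opens the file read-only and fails if it does not exist,
    # instead of silently creating an empty database.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = None
    try:
        conn = sqlite3.connect(uri, uri=True)
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.Error:
        # Covers a missing file, a file that is not a database at all,
        # and damage severe enough that the check cannot run.
        return False
    finally:
        if conn is not None:
            conn.close()
    # A healthy database yields exactly one row containing "ok".
    return rows == [("ok",)]
```

For a repository at the hypothetical path /srv/svn/repo this would be
called as rep_cache_is_intact("/srv/svn/repo/db/rep-cache.db").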
The rep cache could be disabled in fsfs.conf without harm to recover
from this problem. Disabling this cache only increases the odds that
future revisions store redundant content but has no effect on correctness.
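
For reference, the setting meant here lives in db/fsfs.conf inside the
repository. A minimal fragment, assuming the stock file layout that
Subversion 1.6 and later write on repository creation:

```ini
[rep-sharing]
# Stop using rep-cache.db for new commits.
enable-rep-sharing = false
```

With this set, new commits should no longer consult or update rep-cache.db.
Revisions that are already stored are not affected.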
AFAIK repositories were moved off the share and the problem has not
Received on 2012-02-10 23:01:50 CET