Hi,
We started a production Subversion server a couple of months ago. We are
now running Apache 2.0.52 + Subversion 1.1.3 + Berkeley DB 4.2.52 on Tru64
Unix 5.1B.
We quite frequently experience repository database corruption on all of our
repositories (7, with very different sizes). In previous versions of
Subversion (before 1.1.2 I would say), we were generally able to fix these
corruptions with svnadmin recover. We are now experiencing more and more
corruptions that can't be fixed (svnadmin recover fails), where the only
solution is a repository restore from backup.
The first corruptions we experienced generally occurred during commit,
especially on large repositories. When we looked at possible causes for
these corruptions, we found that one reason was that we were running 2 Apache
servers on 2 different nodes in a cluster configuration (cluster file
system, no NFS involved). We shut down one of the servers and this more or
less solved the corruption during commits. This remains strange, as the
cluster file system has pure local file system semantics and we never
experienced such problems with other databases or other Berkeley DB usage.
We have now experienced corruptions not related to any repository write. We
have log files showing a successful repository access through HTTP GET
followed by a GET failure due to database corruption, without any repository
modification in between and without any Apache problem/restart. We
suspected that these corruptions were related to an Apache restart during a
transaction, but we now have evidence that corruption can occur at any time
without any repository modification. We have Apache log files and corrupted
repository copies.
Generally svnadmin recover fails on these corruptions. Sometimes we were
able to fix corruptions by recover + verify as documented in a note. We
also have a directory that we restored from backup and needed to repair
before it was accessible again. In this case we had to use recover +
verify. Running verify + recover instead definitely corrupts the repository.
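To make the ordering explicit, the recover + verify sequence can be sketched
as a small Python wrapper around svnadmin. This is only an illustration: the
function names and the repository path are hypothetical, and the sketch
assumes svnadmin is on the PATH and that nothing else (Apache included) is
accessing the repository while it runs.

```python
import subprocess
from typing import Callable, List, Sequence


def repair_commands(repo_path: str) -> List[List[str]]:
    """Return the svnadmin invocations in the order that worked for us.

    recover comes first, verify second; the reverse order is the one
    that made things worse.
    """
    return [
        ["svnadmin", "recover", repo_path],
        ["svnadmin", "verify", repo_path],
    ]


def run_repair(
    repo_path: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Run recover, then verify; stop at the first failing step.

    `run` is injectable (hypothetical parameter) so the sequence can be
    checked without touching a real repository.
    """
    for argv in repair_commands(repo_path):
        if run(argv) != 0:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical path, shown only to list the commands without running them.
    for argv in repair_commands("/path/to/repository"):
        print(" ".join(argv))
```

Calling run_repair() on a copy of the corrupted repository rather than on
the live one keeps the original available for further investigation.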
Could you please let us know if this is a known problem (I saw a couple of
issue entries related to similar problems, but it is unclear whether they
are really the same) and if there is any workaround? Is FSFS an alternative
to consider?
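If FSFS is an option, the migration would presumably be a dump/load into a
freshly created FSFS repository. The sketch below shows that sequence under
stated assumptions: the function names and paths are hypothetical, svnadmin
is assumed to be on the PATH, and the dump step may well fail on a
repository that is already corrupted.

```python
import subprocess
from typing import List


def fsfs_migration_commands(old_repo: str, new_repo: str) -> List[List[str]]:
    """Return the three svnadmin invocations: create, dump, load."""
    return [
        ["svnadmin", "create", "--fs-type", "fsfs", new_repo],
        ["svnadmin", "dump", old_repo],
        ["svnadmin", "load", new_repo],
    ]


def migrate_to_fsfs(old_repo: str, new_repo: str) -> bool:
    """Create the FSFS repository, then pipe `dump` into `load`."""
    create, dump, load = fsfs_migration_commands(old_repo, new_repo)
    subprocess.check_call(create)
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.Popen(load, stdin=dumper.stdout)
    # Close our copy of the pipe so the dump stops if the load exits early.
    dumper.stdout.close()
    load_rc = loader.wait()
    dump_rc = dumper.wait()
    return dump_rc == 0 and load_rc == 0


if __name__ == "__main__":
    # Hypothetical paths, shown only to list the commands without running them.
    for argv in fsfs_migration_commands("/path/to/bdb-repo", "/path/to/fsfs-repo"):
        print(" ".join(argv))
```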
Thanks in advance for any help. Let us know what material we could provide
to help in troubleshooting, if this seems necessary.
Best regards,
Michel
*************************************************************
* Michel Jouvin Email : jouvin@lal.in2p3.fr *
* LAL / CNRS Tel : +33 1 64468932 *
* B.P. 34 Fax : +33 1 69079404 *
* 91898 Orsay Cedex *
* France *
*************************************************************
Received on Fri Feb 18 16:06:25 2005