Frequent database corruption

From: Michel Jouvin <jouvin_at_lal.in2p3.fr>
Date: 2005-02-19 10:32:49 CET

Hi,

I am forwarding to the dev@ list my original message to users@, following
Karl's answer.

As I answered Karl, I am ready to do some testing if it can help to
analyse and solve this issue. This is not very urgent for me if switching
to FSFS (as suggested by Karl) is an acceptable workaround (in terms of
performance and reliability). I'll do it next week.

I also have some material (HTTP logs, DB backups, recover output...)
available, as mentioned, and can send it if useful.

I also understand that 1.2 is not far off and that it brings some changes
in BDB integration. If it sounds more productive, I can wait for 1.2 before
starting further investigations.

Best regards,

Michel

---------- Forwarded Message ----------
Date: Friday, 18 February 2005 11:08 -0600
From: kfogel@collab.net
To: Michel Jouvin <jouvin@lal.in2p3.fr>
Subject: Re: Frequent database corruption

Michel Jouvin <jouvin@lal.in2p3.fr> writes:
> We started a production Subversion server a couple of months ago. We are
> now running Apache 2.0.52 + Subversion 1.1.3 + Db 4.2.52 on Tru64 Unix
> 5.1B.
>
> We quite frequently experience repository database corruption on all of
> our repositories (7, with very different sizes). In previous versions of
> Subversion (before 1.1.2 I would say), we were generally able to fix these
> corruptions with svnadmin recover. We are now experiencing more and more
> corruptions that can't be fixed (svnadmin recover fails), where the only
> solution is a repository restore from backup.
>
> The first corruptions we experienced generally occurred during commit,
> especially on large repositories. When we looked at possible causes for
> these corruptions, we found that one reason was that we were running 2
> Apache servers on 2 different nodes in a cluster configuration (cluster
> file system, no NFS involved). We shut down one of the servers and it more
> or less solved the corruption during commits. This remains strange, as the
> cluster file system has pure local file system semantics and we never
> experienced such problems with other databases or other Db usage.
>
> Now we are experiencing corruptions not related to any repository write.
> We have log files showing a successful repository access through HTTP GET
> followed by a GET failure due to database corruption, without any
> repository modification in between and without any Apache
> problem/restart. We suspected that these corruptions were related to an
> Apache restart during a transaction, but we now have evidence that
> corruption can occur at any time without any repository modification. We
> have Apache log files and corrupted repository copies.
>
> Generally svnadmin recover fails on these corruptions. Sometimes we were
> able to fix corruptions by recover + verify as documented in a note. We
> also have a directory that we restored from backup and needed to repair
> before it was accessible again. In this case we had to use recover +
> verify. And verify + recover definitely corrupts the repository.
>
> Could you please let us know if this is a known problem (I saw a couple
> of issue entries related to similar problems, but it is unclear whether
> they are really the same) and if there is any workaround? Is FSFS an
> alternative to consider?
>
> Thanks in advance for any help. Let us know what material we could
> provide to help in troubleshooting, if this seems necessary.

Hmmm. I don't know why you're having these problems, but they are not
unfamiliar to us. Maybe it has something to do with being on a 64-bit
system, though that's just a wild guess; I have nothing to back it up
with.

Yes, I suggest using FSFS at least for now. We're working on
improving Subversion's usage of Berkeley DB (the problems are with how
we use it, not with BDB itself). Very few people have problems as
severe as you are experiencing, and these problems have been hard for
us to reproduce reliably. You sound like you can reproduce them
pretty reliably, though, so if you want to resend your description to
the dev@subversion.tigris.org list, there might (can't promise) be a
developer interested in using you as a reproduction environment, if
you're willing. (I wish I could, but my personal stack is full right
now.)

Sorry for the troubles. I hope the situation improves for you,
-Karl

P.S. By the way, we try not to say "corruption" if data has not been
      corrupted. The issue is that your data is not accessible, but
      it has not been corrupted, from your description.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

---------- End Forwarded Message ----------

     *************************************************************
     * Michel Jouvin Email : jouvin@lal.in2p3.fr *
     * LAL / CNRS Tel : +33 1 64468932 *
     * B.P. 34 Fax : +33 1 69079404 *
     * 91898 Orsay Cedex *
     * France *
     *************************************************************

Received on Sat Feb 19 10:33:58 2005

This is an archived mail posted to the Subversion Dev mailing list.