Hi Karl,
Thanks for your answer. I'll send my report to dev@. In the meantime, I'll
switch our main repositories to FSFS as a temporary workaround, but I am
willing to participate in the analysis by running any tests that could be
useful. I can wait if FSFS solves the problem. Maybe it is better to wait for
1.2.x and BDB 4.4 anyway, as I saw there were some changes planned in the way
Subversion interacts with BDB.
About using the word 'corruption', I understand your remark. I used this
word because, in fact, it was not possible to get the database back online.
We had to use backups...
Michel
--On Friday, 18 February 2005 11:08 -0600 kfogel@collab.net wrote:
> Michel Jouvin <jouvin@lal.in2p3.fr> writes:
>> We started a production Subversion server a couple of months ago. We are
>> now running Apache 2.0.52 + Subversion 1.1.3 + Db 4.2.52 on Tru64 Unix
>> 5.1B.
>>
>> We quite frequently experience repository database corruption on all of
>> our repositories (7, with very different sizes). In previous versions of
>> Subversion (before 1.1.2 I would say), we were generally able to fix
>> these corruptions with svnadmin recover. We are now experiencing more
>> and more corruptions that can't be fixed (svnadmin recover fails), where
>> the only solution is a repository restore from backup.
>>
>> The first corruptions we experienced generally occurred during commit,
>> especially on large repositories. When we looked at possible causes for
>> these corruptions, we found that one reason was we were running 2 Apache
>> servers on 2 different nodes in a cluster configuration (cluster file
system, no NFS involved). We shut down one of the servers and it more or
less solved the corruption during commits. This remains strange, as the
cluster file system has pure local file system semantics and we never
experienced such problems with other databases or other Db usage.
>>
>> Now we experienced corruptions not related to any repository write. We
>> have log files showing successful repository access through HTTP GET
>> followed by a GET failure due to database corruption without any
>> repository modification in between and without any Apache
problem/restart. We suspected that these corruptions were related to
Apache restarts during a transaction, but we now have evidence that
>> corruption can occur at any time without any repository modification. We
>> have Apache log files and corrupted repository copies.
>>
>> Generally svnadmin recover fails on these corruptions. Sometimes we were
>> able to fix corruptions by recover + verify as documented in a note. We
>> also have a directory that we restored from backup and needed to repair
>> before having it accessible again. In this case we had to use recover +
verify. And verify + recover definitely corrupts the repository.
>>
Could you please let us know if this is a known problem (I saw a couple
of issue entries related to similar problems, but it is unclear whether
they are really the same) and if there is any workaround? Is FSFS an
alternative to consider?
>>
Thanks in advance for any help. Let us know what materials we could
provide to help in troubleshooting, if this seems necessary.
>
> Hmmm. I don't know why you're having these problems, but they are not
> unfamiliar to us. Maybe it has something to do with being on a 64-bit
system, though that's just a wild guess; I have nothing to back it up
with.
>
> Yes, I suggest using FSFS at least for now. We're working on
> improving Subversion's usage of Berkeley DB (the problems are with how
> we use it, not with BDB itself). Very few people have problems as
> severe as you are experiencing, and these problems have been hard for
> us to reproduce reliably. You sound like you can reproduce them
> pretty reliably, though, so if you want to resend your description to
> the dev@subversion.tigris.org list, there might (can't promise) be a
> developer interested in using you as a reproduction environment, if
> you're willing. (I wish I could, but my personal stack is full right
> now.)
>
> Sorry for the troubles. I hope the situation improves for you,
> -Karl
>
> P.S. By the way, we try not to say "corruption" if data has not been
> corrupted. The issue is that your data is not accessible, but
> it has not been corrupted, from your description.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
>
*************************************************************
* Michel Jouvin Email : jouvin@lal.in2p3.fr *
* LAL / CNRS Tel : +33 1 64468932 *
* B.P. 34 Fax : +33 1 69079404 *
* 91898 Orsay Cedex *
* France *
*************************************************************
Received on Sat Feb 19 10:23:52 2005