RE: <help>Database is Wedged for the 6th time. Diagnostics or Prozac, anyone?

From: Peter Kahn <pkahn_at_connected.com>
Date: 2005-03-07 17:40:54 CET

My reason for suspecting svnserve is basically that it was one of the
two in the room at the time when the body was found. I agree that
svnserve could be innocent, but it was in the wrong place at the wrong
time...

Here is my quandary... I cannot find any evidence pointing to the cause
and this leaves me bereft of good remedies.
- My logs show nothing happening at the time of the problem and there is
no abnormal behavior listed.
- I lack access to or understanding of analysis tools that could shed
light on the problem.

What I do have is a log of changes to the server. I know that prior to
going to 1.1.1 and svnserve we had one crash and that was due to a
restart of the web server in the midst of a transaction. After 1.1.1 we
have had many wedgings. This leaves me with the following possible
causes:

1. overall compilation/build problem with how I built svn (see
description of build process below).
    there were no errors in the build logs
    make check reported no failed tests

2. a problem with an interaction between apache and svn.
    apache is run via the svn user
    svnserve is run via the svn user
    apache has read only access
    svnserve is read/write
    apache is at 2.0.52

3. a general issue with BerkeleyDB
    there were no errors in the build logs
    I am running 4.2.52.NC

4. a general issue with svnserve

Looking at these, I can see the following ways for me to proceed:

1. obtain smoking gun for problem through analysis
    a. set logging level to verbose (but there doesn't seem to be a
    method to do this)
    issue: lack of tools, or lack of understanding on my part regarding
    tools/availability

2. manipulate variables in a coarse fashion to see if the weekly wedging
goes away
    a. switch from svnserve to HTTP WebDAV access
    b. switch from BerkeleyDB to FSFS

Option 1 is the most desirable, but it really hasn't been getting me
anywhere. Repeated emails to the community have yielded some good
information and helped me to track down all the other potential
culprits. For the last two crashes, the response hasn't really helped
me address the problem. I haven't tried tracking the crashes to see if
they follow a regular pattern. (I have seen DB crash issues before in
other products that happened every 49.7 days or so, where the crash was
linked to a number-of-seconds value stored in a variable that overflowed
and wrapped.) Perhaps there is a regular interval between wedgings
related to # of seconds, # of check-ins, or amount of data from
revision X to revision Y.
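
As an aside on that 49.7-day figure: it is what an unsigned 32-bit counter gives when it ticks in milliseconds rather than seconds, since 2^32 ms is roughly 49.7 days. A rough sketch of the arithmetic is below. The wrap_days helper is invented here purely for illustration and is not part of Subversion or Berkeley DB.

```shell
# Days until an unsigned counter of the given bit width wraps around,
# assuming it advances ticks_per_second times every second.
wrap_days() {
    bits=$1
    ticks_per_second=$2
    awk -v b="$bits" -v t="$ticks_per_second" \
        'BEGIN { printf "%.1f\n", (2 ^ b) / t / 86400 }'
}

wrap_days 32 1000   # 32-bit millisecond counter
wrap_days 32 1      # 32-bit second counter, for comparison
```

If the wedgings were spaced at some similarly regular interval, the same kind of calculation might hint at which counter is involved.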

Since Option 1 is not bearing fruit, I am turning to option 2. Both 2a
and 2b seem pretty painful. I'm not sure which will antagonize my users
more. Perhaps I will take a copy of my environment, put it on a test
machine, and poke it with many little check-ins on a constant basis
until it falls over. Perhaps that will yield the information.
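
A minimal sketch of what such a poke-it-until-it-falls-over loop might look like is below. REPO_URL and WORK_DIR are placeholders, not values from my setup, and the stress_commits function is hypothetical. It should only ever be pointed at a copy of the repository.

```shell
# Placeholders: point these at a COPY of the repository on a test machine.
REPO_URL=${REPO_URL:-svn://testhost/repos/trunk}   # hypothetical URL
WORK_DIR=${WORK_DIR:-/tmp/svn-stress-wc}           # hypothetical scratch path
SVN=${SVN:-svn}

# Commit one small change after another until a commit fails or
# the requested number of commits has been made.
stress_commits() {
    max=$1
    "$SVN" checkout -q "$REPO_URL" "$WORK_DIR" || return 1
    touch "$WORK_DIR/stress.txt"
    # Ignore the result: the file may already be versioned from an earlier run.
    "$SVN" add -q "$WORK_DIR/stress.txt" 2>/dev/null
    i=1
    while [ "$i" -le "$max" ]; do
        echo "commit $i at $(date)" >> "$WORK_DIR/stress.txt"
        if ! "$SVN" commit -q -m "stress commit $i" "$WORK_DIR/stress.txt"; then
            echo "commit $i failed at $(date)" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max commits without failure"
}
```

Running something like "stress_commits 10000" would then stop at the first failed commit and print when it happened, which could be lined up against the server logs.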

If you have any better ideas or a better plan for investigation or
mitigation, please let me know. Thanks for the help.

------------------------------------------------------------------------
---- Process for Building Svn on Linux
------------------------------------------------------------------------
Required Tars
    All of these can be found in svn:/root/downloads
    svn (note its installation doc, as it will have the new version
    requirements)
    httpd 2.x (2.0.49 or greater as of 1.1.2)
    Neon (0.24.7 as of 1.1.2)
    Berkeley DB (4.2.52 as of 1.1.2)

Unpack & Build Berkeley DB
    gunzip < db-4.2.52.NC.tar.gz| tar xf -
    read readme/install docs to verify this process if not 4.2.52
    cd to {unpack directory}/build_unix
    execute the following: ../dist/configure >> dbConfig.log 2>&1
    check log for errors
    make: make all >> dbMakeAll.log 2>&1
    check log for errors
    install: make install >> dbMakeAll.log 2>&1
    check log for errors

Unpack & Build Apache (if necessary)
    gunzip < httpd-2.0.52.tar.gz| tar xf -
    read readme/install docs to verify this process if not 2.0.52
    cd to {unpack directory}
    ./configure --enable-dav --enable-so --enable-maintainer-mode >> httpdConfig.log 2>&1
    check log for errors
    make: make all >> httpdMakeAll.log 2>&1
    check log for errors
    install: make install >> httpdMakeAll.log 2>&1
    check log for errors

Unpack & Build SVN
    gunzip < svn.1.1.2.tar.gz| tar xf -
    verify all version from INSTALL document
    cd to {unpack directory}
    unpack neon ls /
    ./configure --enable-dav --enable-so --enable-maintainer-mode >> httpdConfig.log 2>&1
    check log for errors
    make: make all >> httpdMakeAll.log 2>&1
    check log for errors
    install: make install >> httpdMakeAll.log 2>&1
    check log for errors
    make: make check >> makeCheck.log 2>&1
    check log for errors

-----Original Message-----
From: Ben Collins-Sussman [mailto:sussman@collab.net]
Sent: Friday, March 04, 2005 6:02 PM
To: Peter Kahn
Cc: users@subversion.tigris.org
Subject: Re: <help>Database is Wedged for the 6th time. Diagnostics or
Prozac, anyone?

On Mar 4, 2005, at 4:39 PM, Peter Kahn wrote:

> Below was my email from the 5th crash. I'm now up to the 6th. The
> difference this time is that I didn't have websvn running.
>
> I am coming to the conclusion that the svnserve executable is not
> ready for production. Switching my users back from svn to http is
> going to hurt, but at this point I can see no other alternative.
>

I don't see any evidence that this problem is related to svnserve. Why
do you think that?

> Does anyone have any suggestions on what I can do to make my
> environment stable?

Your errors are coming from BDB running out of some sort of resources:

> svn: Berkeley DB error while checkpointing after Berkeley DB transaction
> for filesystem /home/svn/repos/db:
> DB_INCOMPLETE: Cache flush was unable to complete

I wonder if the db cache is too small. We've changed the default BDB
cache size in svn 1.2. This is from the DB_CONFIG that 'svnadmin
create' now generates:

# The default cache size in BDB is only 256k. As explained in
# http://svn.haxx.se/dev/archive-2004-12/0369.shtml, this is too
# small for most applications. Bump this number if "db_stat -m"
# shows too many cache misses.
set_cachesize 0 1048576 1

So what I would do is:

1. increase the cache size, then run 'svnadmin recover'

2. stop deltifying in the post-commit hook. The deltification happens
anyway. All you're doing is creating extra BDB traffic for no reason.
This may cause you to hit cache limits even faster.

If these things make no difference, then switch to FSFS.
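
A rough sketch of step 1 follows, assuming the repository lives at /home/svn/repos (the path in the error message above) and that every svnserve and Apache process touching it has been stopped first. The three fields of set_cachesize are gigabytes, additional bytes, and number of cache regions. The cachesize_line and resize_cache helpers are hypothetical, and the 4 MB used in the usage note is an arbitrary example, not a recommendation.

```shell
REPO=${REPO:-/home/svn/repos}   # path taken from the error message in this thread

# Print a DB_CONFIG line for a cache of the given size in megabytes.
# Only meant for sizes below 1024 MB; anything larger belongs in the
# first (gigabytes) field instead.
cachesize_line() {
    echo "set_cachesize 0 $(($1 * 1024 * 1024)) 1"
}

# Swap in a new set_cachesize line (keeping a backup), run recovery,
# then print the start of the cache statistics.
resize_cache() {
    mb=$1
    cfg="$REPO/db/DB_CONFIG"
    cp "$cfg" "$cfg.bak" || return 1
    grep -v '^set_cachesize' "$cfg.bak" > "$cfg"
    cachesize_line "$mb" >> "$cfg"
    svnadmin recover "$REPO" || return 1
    db_stat -m -h "$REPO/db" | head -n 20
}
```

Calling "resize_cache 4" would write "set_cachesize 0 4194304 1" into DB_CONFIG. The db_stat output can then be checked again after a few days of normal use to see whether the cache misses have dropped.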

Received on Mon Mar 7 17:43:44 2005
