repository corruption, many times

From: Rick Lee-Morlang <rick.lee-morlang_at_justleapin.com>
Date: 2007-12-19 21:52:48 CET

Hi all,

(This is a repost, because we just had the same problem, *again*. I've
already tried to post this, but it never made it through the moderation
queue. Sorry if it ends up being duplicated.)

I'm stumped and frustrated.

I'll admit from the outset that we're using Subversion in a non-optimal way.
In part of our repo, we have a group of artists committing binary files,
some of which get rather large. We know this won't scale in the long term,
but there's a difference between SVN being intolerably slow (which it's not)
and corrupting data (which it is).

Our initial setup was on a Fedora 7 machine running svnserve 1.4.3 with an
FSFS backend, integrated with Trac via the standard pre and post-commit hook
scripts. We experienced a few instances of corruption with this setup,
always during large binary commits, and almost always from the same
particular workstation. However, in each of these instances we were able to
pin the blame on either running out of disk space or having mod_dav_svn and
svnserve running against the repository at the same time.

To resolve the corruption we restored from a known good backup and then did
a svnadmin dump | svnadmin load for the revs between the known good and the
point of corruption. At that point we committed a change against a
placeholder file so the rev numbers still matched up with Trac, and went on
about our business.
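
A recovery of this kind could be sketched in Python roughly as follows. This
is only an illustration of the dump-and-load step described above, not the
commands actually run: the function names and repository paths are
hypothetical, and it assumes the revisions to replay are still readable in
the damaged repository. The only svnadmin features used are `dump` with `-r`
and `--incremental`, and `load`.

```python
import subprocess


def build_replay_commands(damaged_repo, restored_repo, first_rev, last_rev):
    """Return the (dump, load) argument lists for replaying a revision range.

    Assumption: first_rev is the revision right after the known good backup
    and last_rev is the last revision before the corruption.
    """
    dump_cmd = [
        "svnadmin", "dump", damaged_repo,
        "-r", f"{first_rev}:{last_rev}",
        "--incremental",  # dump each revision as a diff against its predecessor
    ]
    load_cmd = ["svnadmin", "load", restored_repo]
    return dump_cmd, load_cmd


def replay_revisions(damaged_repo, restored_repo, first_rev, last_rev):
    """Pipe `svnadmin dump` into `svnadmin load`; True if both succeeded."""
    dump_cmd, load_cmd = build_replay_commands(
        damaged_repo, restored_repo, first_rev, last_rev
    )
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd, stdin=dump.stdout)
    dump.stdout.close()  # let the dump process see a broken pipe if load exits
    load_rc = load.wait()
    dump_rc = dump.wait()
    return dump_rc == 0 and load_rc == 0
```

Running `svnadmin verify` on the restored repository afterwards is a
reasonable sanity check before putting it back into service.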

Since then, we've moved SVN to a CentOS 5 box running svnserve 1.4.2, where
disk space is no longer an issue. All seemed fine for a month and a half,
and then in the last week we've had three corruption instances, two
occurring today. The second happened within 15 minutes of bringing the repo
back up after recovery.

This time I noticed two new details. I feel like I'm grasping at straws,
but:

1) All three of the recent, unexplained corruptions were:
* large-ish commits consisting mostly of binaries (i.e. at least tens of MB)
* made within a very short time (likely 5 to 15 seconds) of another commit
(small in some cases, also large in one)

This makes me wonder about how svnserve is handling file locking, which led
me to:

2) For some reason we're running svnserve with --threads.

After this I tried to reproduce the problem by creating a new repo and
writing a script that repeatedly wrote 10MB of random data to a file and
committed it while another script wrote five tiny files of random data and
committed those. No corruption.
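
A reproduction attempt along these lines could be sketched in Python as
shown here. This is not the original script: the function names, file names,
iteration count and the 64-byte size of the tiny files are all assumptions,
and it presumes two separate working copies of the same test repository in
which the files have already been added with `svn add`.

```python
import os
import subprocess
import threading


def write_random_file(path, size_bytes):
    """Overwrite path with size_bytes of random data."""
    with open(path, "wb") as fh:
        fh.write(os.urandom(size_bytes))


def commit_loop(working_copy, file_names, size_bytes, iterations,
                run=subprocess.run):
    """Rewrite every file with random data, then commit; repeat."""
    for i in range(iterations):
        for name in file_names:
            write_random_file(os.path.join(working_copy, name), size_bytes)
        # check=True raises if svn exits with a non-zero status
        run(["svn", "commit", "-m", f"stress commit {i}", working_copy],
            check=True)


def stress(large_wc, small_wc, iterations=50):
    """Run a large-file committer and a small-file committer concurrently."""
    large = threading.Thread(
        target=commit_loop,
        args=(large_wc, ["big.bin"], 10 * 1024 * 1024, iterations),
    )
    small = threading.Thread(
        target=commit_loop,
        args=(small_wc, [f"small{n}.bin" for n in range(5)], 64, iterations),
    )
    large.start()
    small.start()
    large.join()
    small.join()
```

The two loops run in separate threads so that their commits can overlap in
time, which is the condition the corrupting commits had in common. Passing a
different `run` callable makes it possible to exercise the loop without a
repository.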

The only thing I can think of at this point, aside from an outright bug, is
that the Trac pre/post-commit scripts are causing corruption when they call
back to check the repository, but as far as I know those should be read-only
operations and shouldn't cause any harm.

Any ideas? Should I bump this on to the dev list?

Meanwhile, I'm off to recover the repo. Again.

thanks,
Rick
Received on Wed Dec 19 21:53:16 2007
