On Tue, May 08, 2007 at 10:56:29PM -0700, Karl Fogel wrote:
> No, it should be read-only. So the error is in the same place each
> time?
The error tends to move. Here’s the history as far as I can
remember it:
1. I tried to do a checkout of a branch and got a checksum error, so
I knew something was up.
2. I remove my working copy and check out the branch again; everything
just works. (Even stranger, but whatever.)
3. I try to verify my repository and it fails on some late revision,
somewhere in the 1800’s range.
4. I try to verify it again. Now it fails on revision 903.
5. I restore r903 from backup and run verify. It runs to completion.
6. Because I’m feeling paranoid, I immediately run verify again, and it
fails at r903. I run md5sum on the original file and the backup,
and the checksums differ: the file has indeed changed.
7. I restore 903 again, run verify and it fails on revision 1135. It
continues to fail on 1135, until I restore it from backup.
*write yesterday’s e-mail*
8. I run verify early this morning, and it fails on revision 1853:
…
* Verified revision 1851.
* Verified revision 1852.
svnadmin: Checksum mismatch while reading representation:
expected: 735eb982e05ea6ef1b5fa413e2077cd6
actual: 5354cb5fde767b97d4c3cb30f09f325a
9. I run it again to see if I get the same error and I get:
…
* Verified revision 916.
* Verified revision 917.
* Verified revision 918.
* Verified revision 919.
* Verified revision 920.
svnadmin: Invalid diff stream: [tgt] insn 14624 starts beyond the target view position
10. I run it again as I send this e-mail and I get:
* Verified revision 920.
* Verified revision 921.
svnadmin: Checksum mismatch while reading representation:
expected: c1f1f2f8655bbb901b145cd0c956f443
actual: a2cb22551d495109f0ddeafe11abbebc
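One way to tell on-disk corruption apart from flaky reads is to hash the same revision file several times in a row, much like the md5sum comparison in step 6, and see whether the digest stays the same. The following is only a minimal Python sketch of that check and is not part of the steps above; the function names md5_of_file and distinct_digests are made up for illustration, and the file paths are taken from the command line. Repeated reads may be served from the page cache, so a stable result does not rule out a hardware problem.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, passes=5):
    """Hash the same file `passes` times and return the set of digests seen."""
    return {md5_of_file(path) for _ in range(passes)}


if __name__ == "__main__":
    # Each argument is a path to a revision file, e.g. svn/db/revs/903.
    for rev_path in sys.argv[1:]:
        seen = distinct_digests(rev_path)
        status = "stable" if len(seen) == 1 else "UNSTABLE"
        print(f"{rev_path}: {status} {sorted(seen)}")
```

More than one digest for a file that nothing is writing to would suggest the data is being altered somewhere between the disk and the reader, rather than sitting corrupted on disk.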
The one common thing about all of these revisions is that they are
relatively huge check-ins:
% ls -lah svn/db/revs/{903,1135,1853,921}
-rw-rw-r-- 1 jmp jmp 281M Apr 17 20:43 svn/db/revs/1135
-rw-rw-r-- 1 jmp svn 41M Mar 23 10:15 svn/db/revs/1853
-rw-rw-r-- 1 jmp jmp 190M Apr 17 20:40 svn/db/revs/903
-rw-r--r-- 1 jmp svn 283M Dec 3 13:32 svn/db/revs/921
-rw-rw-r-- 1 jmp svn 417M Dec 3 11:36 svn/db/revs/922
(Someone’s keeping a lot of binaries in one branch of our repo.)
I don’t know whether the large revisions are themselves the problem,
or whether, since they make up the vast majority of the repository,
they are simply the ones statistically most likely to be hit if my
errors are truly random.
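To put a rough shape on the "truly random" case: if each megabyte read has some small independent chance of being corrupted, the chance that a file is hit at least once grows with its size. The Python sketch below uses the sizes from the listing above; the per-megabyte error rate and the size of a "typical small revision" are pure assumptions picked for illustration, not measurements.

```python
def hit_probability(size_mb, per_mb_error_rate):
    """Chance of at least one error in a file, assuming independent per-MB errors."""
    return 1.0 - (1.0 - per_mb_error_rate) ** size_mb


# Sizes in MB, taken from the ls output above.
sizes_mb = {903: 190, 921: 283, 922: 417, 1135: 281, 1853: 41}

ASSUMED_RATE = 1e-4          # hypothetical error probability per MB read
ASSUMED_SMALL_REV_MB = 0.01  # hypothetical size of an ordinary small revision

if __name__ == "__main__":
    for rev, size in sorted(sizes_mb.items()):
        print(f"r{rev}: {size} MB -> {hit_probability(size, ASSUMED_RATE):.4%}")
    small = hit_probability(ASSUMED_SMALL_REV_MB, ASSUMED_RATE)
    print(f"small revision: {ASSUMED_SMALL_REV_MB} MB -> {small:.6%}")
```

Under this model the big revisions would dominate the failures even if nothing about them is special, which is why the pattern alone does not settle the question.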
> This is very, very bizarre. Can you say more about your setup,
> about the RAID, about anything that might possibly be related?
Here’s the long story.
The svn repository is on an LVM volume group on top of a Linux
software RAID array. It had been running on one machine for almost
three years. On Friday, that machine started failing by randomly
rebooting, so I removed the OS drive + 5 RAID disks and moved them all
to a new machine, a Dell Pentium 4 I had sitting in a corner of my
equipment area just for this purpose.
The new machine, on which I’m having all of the problems, is a Pentium
4/1.8GHz with 512 MB of RAM. The OS is Debian Etch-ish (it runs testing
but hasn’t been fully upgraded to Etch; the last apt update was about
a week before Etch came out). uname -a says:
Linux library 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux
The kernel hasn’t reported any read errors on the RAID disks and the
partitions on the RAID usually fsck just fine.
In the past we’ve had problems with checksum errors, but those were
due to Apache check-ins gone wrong, mainly because someone was
uploading those same large binaries over a cable modem and Apache
didn’t handle the timeout gracefully.
—Justin
Received on Wed May 9 13:02:02 2007