
Re: Linux Kernel Summit

From: Greg Stein <gstein_at_lyra.org>
Date: 2001-04-03 23:49:09 CEST

Guys... we already said that we're going to be adding MD5 hashes and having
a "scan/verify" utility. In 1.0.

We definitely appreciate the input and insight, but (at this point) I
believe you're preaching to the choir :-)
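
For anyone following along, the "scan/verify" idea mentioned above might look roughly like the sketch below. This is only an illustration, not Subversion's actual design or code: it records an MD5 per file and later rescans to flag anything missing or changed. The function names (md5_of, record, verify) and the in-memory manifest are assumptions made up for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(root: Path) -> Dict[str, str]:
    """Hypothetical 'record' step: map each relative path under root to its MD5."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Hypothetical 'scan/verify' step: list paths that are missing or altered."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel)
    return bad
```

Run nightly, a non-empty result from verify() would surface a truncated or damaged file within a day rather than months later. Note that a checksum like this only detects corruption; repairing it still needs a good backup.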

Cheers,
-g

On Tue, Apr 03, 2001 at 01:55:43PM -0700, Peter Vogel wrote:
> I agree with Jason. I too have managed large, long-lived
> repositories and have had to fight file corruption issues
> found long after the corruption occurred. My solution
> at the time was a nightly run of "cvs log" to detect
> corruption as soon as it occurred, but it was never
> completely satisfactory to me... CVS was notorious
> for corrupting files when the disk filled up due to
> activities elsewhere (i.e. a process run amok on
> the same NetApp volume as the CVS tree). I'd love to
> see SVN protect against network and disk corruption.
>
> That said, I'd be willing to settle for those features
> in 2.0 :-) One of these days I'll catch up on sleep
> enough to contribute something other than wisdom from
> the school of hard knocks, but not this week :-(
>
> -Peter
> --
> Peter A. Vogel
> Manager, Configuration Management
> Arsin Corporation
> 4800 Great America Parkway Suite 425, Santa Clara, CA 95054
>
>
>
> > -----Original Message-----
> > From: Jason Molenda [mailto:jason-svn@molenda.com]
> > Sent: Tuesday, April 03, 2001 1:47 PM
> > To: jimb@cygnus.com
> > Cc: dev@subversion.tigris.org
> > Subject: Re: Linux Kernel Summit
> >
> >
> > Jim Blandy wrote:
> >
> > > I don't think Subversion should be involved in:
> > > 1) checking that bits were transmitted across the network without
> > > corruption, or
> > > 2) checking that bits were stored on disk without corruption.
> >
> >
> > I've worked with three large (multi-gigabyte), long-lived (5-10
> > years of history) cvs repositories, and each of them had file
> > corruption problems. Some problems were due to OS/filesystem
> > problems (most notably NFS). Some problems seem to have been RCS
> > bugs (file truncation - probably a disk filled and RCS didn't do
> > the right thing). Some seem to have been cvs bugs (I'm not even
> > sure how those files got in that state). Every large repository
> > I've worked with has accumulated a few dozen of these.
> >
> > It doesn't do me any good if I have backups -- in every case, I didn't
> > find the corruption until many months after the fact. The corrupted
> > revisions were often some of the oldest, so the only time we'd find it
> > is if a user ran a command like "cvs log" (which digs through all the
> > revisions) on the corrupt file.
> >
> > My desire for a checksum is obvious: I want to (a) know if a file
> > is corrupt, and (b) find the corruption reasonably close
> > to the time when it happens. Backups, RAIDs, etc., are all nice,
> > but they don't do me any good if a repository is silently corrupted
> > over time. With CVS, I'm reduced to stupid things like running rcs
> > over every file in my repository to check file integrity.
> >
> > Assuming that the OS will never corrupt a file is not good enough;
> > some OS somewhere at some time is going to corrupt one. And if
> > subversion doesn't have a way to detect that corruption, people
> > are going to blame it on svn. Or the problem could happen at a
> > higher level - maybe the svn filesystem or database could corrupt
> > part of a file's revisions. As you offer plug-in-able databases,
> > you'll be opening yourself up to even more points of failure.
> >
> > The sine qua non of a revision control system is that your sources
> > are never corrupted. If there is disagreement about the utility
> > of these sorts of integrity checks, make them an optional setting.
> > I wager that many administrators will enable these checks even if
> > they incur a non-trivial performance penalty in the process. CPU
> > is cheap; my time doing detective work trying to track down corrupt
> > revision history is expensive.
> >
> > (Of course you don't see me implementing anything, so I expect my
> > opinion to be given the appropriate weighting. But I thought I'd
> > throw in my two cents)
> >
> > Jason
> >

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:27 2006

This is an archived mail posted to the Subversion Dev mailing list.