RE: Linux Kernel Summit

From: Peter Vogel <pvogel_at_arsin.com>
Date: 2001-04-03 22:55:43 CEST

I agree with Jason. I too have managed large, long-lived
repositories and have had to fight file corruption issues
found long after the corruption occurred. My solution
at the time was a nightly run of "cvs log" to detect
corruption as soon as it occurred, but it was never
completely satisfactory to me... CVS was notorious
for corrupting files when the disk filled up due to
activities elsewhere (e.g. a process run amok on
the same NetApp volume as the CVS tree). I'd love to
see SVN protect against network and disk corruption.
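To illustrate the kind of check being asked for, here is a minimal sketch of a nightly integrity scan based on per-file checksums. It is not how CVS or Subversion works; the function names (file_digest, build_manifest, verify) and the choice of SHA-256 are assumptions made only for this example. It also assumes the manifest is refreshed after every legitimate commit, since a changed digest only means "different from what was last recorded".

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Run from cron, verify() would report a truncated or vanished file the night it happens rather than months later, which is the property the "cvs log" run was approximating.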

That said, I'd be willing to settle for those features
in 2.0 :-) One of these days I'll catch up on sleep
enough to contribute something other than wisdom from
the school of hard knocks, but not this week :-(


Peter A. Vogel
Manager, Configuration Management
Arsin Corporation
4800 Great America Parkway Suite 425, Santa Clara, CA 95054
> -----Original Message-----
> From: Jason Molenda [mailto:jason-svn@molenda.com]
> Sent: Tuesday, April 03, 2001 1:47 PM
> To: jimb@cygnus.com
> Cc: dev@subversion.tigris.org
> Subject: Re: Linux Kernel Summit
>
> Jim Blandy wrote:
> > I don't think Subversion should be involved in:
> >     1) checking that bits were transmitted across the network without
> >        corruption, or
> >     2) checking that bits were stored on disk without corruption.
>
> I've worked with three large (multi-gigabyte), long-lived (5-10
> years of history) cvs repositories, and each of them had file
> corruption problems.  Some problems were due to OS/filesystem
> problems (most notably NFS).  Some problems seem to have been RCS
> bugs (file truncation - probably a disk filled and RCS didn't do
> the right thing).  Some seem to have been cvs bugs (I'm not even
> sure how those files got in that state).  Every large repository
> I've worked with has accumulated a few dozen of these.
>
> It doesn't do me any good if I have backups -- in every case, I didn't
> find the corruption until many months after the fact.  The corrupted
> revisions were often some of the oldest, so the only time we'd find it
> is if a user ran a command like "cvs log" (which digs through all the
> revisions) on the corrupt file.
>
> My desire for a checksum is obvious:  I want to (a) know if a file
> is corrupt, and (b) find the corruption reasonably close
> to the time when it happens.  Backups, RAIDs, etc., are all nice,
> but they don't do me any good if a repository is silently corrupted
> over time.  With CVS, I'm reduced to stupid things like running rcs
> over every file in my repository to check file integrity.
>
> Assuming that the OS will never corrupt a file is not good enough;
> some OS somewhere at some time is going to corrupt one.  And if
> subversion doesn't have a way to detect that corruption, people
> are going to blame it on svn.  Or the problem could happen at a
> higher level - maybe the svn filesystem or database could corrupt
> part of a file's revisions.  As you offer plug-in-able databases,
> you'll be opening yourself up to even more points of failure.
>
> The sine qua non of a revision control system is that your sources
> are never corrupted.  If there is disagreement about the utility
> of these sorts of integrity checks, make them an optional setting.
> I wager that many administrators will enable these checks even if
> they incur a non-trivial performance penalty in the process.  CPU
> is cheap; my time doing detective work trying to track down corrupt
> revision history is expensive.
>
> (Of course you don't see me implementing anything, so I expect my
> opinion to be given the appropriate weighting.  But I thought I'd
> throw in my two cents)
>
> Jason
Received on Sat Oct 21 14:36:27 2006

This is an archived mail posted to the Subversion Dev mailing list.