
Re: How to check integrity of database?

From: Markus Karg <markus.karg_at_quipsy.de>
Date: 2005-10-07 16:25:38 CEST

Also to you (as I just wrote to Daniel): it was unclear to me whether the team was concerned about corruption or not. Some comments *sounded* like laughter. Maybe I misinterpreted them; sorry.
Regarding your tip about backups: sure, I'm doing daily backups. But I need to know when a repository is broken, and properties *belong* to the repository. That's why this whole discussion started. If not svnadmin verify, what tool tells me that some part of the repository (not only the data files) is corrupt? It's not acceptable to wait for a user to tell the admin 'hey, I *think* the repository is broken'; there must be some way to run a nightly check for that.
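The kind of nightly check asked for here can be sketched in Python. This is a minimal sketch that assumes `svnadmin` is on the PATH and exits with a non-zero status when `svnadmin verify` hits an error; `verify_repositories` and `main` are hypothetical names. As this thread points out, such a check covers the file-revision content, not the revision properties.

```python
import subprocess
import sys


def verify_repositories(repo_paths, run=subprocess.run):
    """Run `svnadmin verify` on each repository; return (path, message) for failures.

    `run` is injectable (hypothetical convenience) so the logic can be
    exercised without a real svnadmin binary.
    """
    failed = []
    for path in repo_paths:
        # --quiet suppresses per-revision progress output; errors still go to stderr.
        result = run(["svnadmin", "verify", "--quiet", path],
                     capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((path, result.stderr.strip()))
    return failed


def main(argv):
    """Verify every repository path in argv; return a process exit code."""
    problems = verify_repositories(argv)
    for path, message in problems:
        print(f"verify FAILED for {path}: {message}", file=sys.stderr)
    return 1 if problems else 0
```

A cron job could invoke this as `sys.exit(main(sys.argv[1:]))` with the repository paths as arguments, so a non-zero exit status lets the scheduler alert the admin before a user notices anything.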
Thanks for SVN, it's great.
I just want to make it greater.
With kind regards
Markus KARG, Staatl. gepr. Inf. (state-certified computer scientist)
R & D


From: Matt England [mailto:mengland@mengland.net]
Sent: Fri 07.10.2005 16:22
To: Markus Karg
Cc: kfogel@collab.net; Leon Zandman; Daniel Berlin; users@subversion.tigris.org
Subject: Re: How to check integrity of database?

(I took out the multiple "AW:" and "Re:"'s in the subject line...I hope that doesn't screw up the threading-trackers...)

At 10/7/2005 02:27 AM, Markus Karg wrote:

        This whole discussion proves that I am not the only one who is concerned about corrupted repositories (including properties), so I don't understand why the SVN development team is more or less laughing at this task. :-((

For what it's worth, I don't see the Subversion development team (and I'm not part of this team) doing this. I see them taking data-protection seriously.

And to be clear, what I'm reading is that all file-revision content *is* being checksummed, so the content of any file you store in Subversion has an extra level of data integrity. In my book this goes a long way toward protecting the digital assets of my business, and for what it's worth, I'm rather picky about data integrity.

Alas, as some have mentioned before this note, I believe it's not a good idea to rely only on Subversion checksums. Namely, I highly recommend making multiple backups of any filesystem/database over time, and keeping these backup sets for a period of time in case an administrator has to revert to them for any problem, data integrity or otherwise.
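Keeping backup sets "over a period of time" usually means some retention rule. Here is a minimal sketch under an assumed policy: keep the newest seven daily sets plus the newest set from each of the last four ISO weeks. `backups_to_keep` and both numbers are illustrative only, not something the thread prescribes.

```python
def backups_to_keep(backup_dates, daily=7, weekly=4):
    """Return the set of backup dates to retain under a hypothetical policy.

    Keeps the `daily` newest backup sets, plus the newest backup from each
    of the `weekly` most recent ISO weeks (weeks may overlap the dailies).
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])
    weeks_seen = []
    for d in newest_first:
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week number)
        if week not in weeks_seen:
            weeks_seen.append(week)
            if len(weeks_seen) > weekly:
                break
            keep.add(d)  # first date seen in a week is that week's newest backup
    return keep
```

Older sets that fall outside the returned set would be candidates for deletion. Keeping weekly snapshots as well means that corruption which goes unnoticed for more than a week can still be rolled back.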

To address another note in the thread:

At 10/7/2005 01:41 AM, Fabian Cenedese wrote:

        This information can be set at any time, and any number of times, to new
        values with svnadmin, be it by a user or a virus. As they are unversioned
        you won't ever find a trace that they changed. Anyone can store any data
        if he has access to the machine. As the checksum would need to
        be updated on every change, you will never find an error.

Do any of these points lessen the need to checksum the revision properties?

         The only
        thing you could detect with that checksum is a hardware error.
        And if there's something wrong with the disc it would surely also
        affect the real data files.

As someone who has spent more than 10 years developing, integrating, and servicing (not to mention selling and marketing) enterprise-level storage systems (hardware, software, storage networking), I find this analysis flawed. Hardware failures come in numerous flavors, and they are often firmware/software oriented. For example, a RAID controller's firmware corrupts its memory map and starts inserting random byte changes in only a *few* places; I personally isolated this very problem in what is arguably the most widely deployed SAN RAID controller family in the world.
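This is why a stored checksum helps even when the disk "looks fine". A digest recorded at write time and recomputed later flags a change to even one bit. The minimal Python sketch below uses MD5 because the thread says that is what Subversion applies to file-revision content. `md5_hex`, `is_intact`, and the sample data are hypothetical illustrations, not Subversion internals.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def is_intact(data: bytes, stored_digest: str) -> bool:
    """Recompute the checksum and compare it with the one recorded at write time."""
    return md5_hex(data) == stored_digest


# Data as originally written, with its digest recorded alongside it.
original = b"svn:log = Fixed the build\n" * 1000
digest = md5_hex(original)

# Simulate faulty firmware silently flipping a single bit in one byte.
corrupted = bytearray(original)
corrupted[12345] ^= 0x01
```

`is_intact(original, digest)` holds, while `is_intact(bytes(corrupted), digest)` does not. The faulty byte is caught even though the rest of the 26,000 bytes are untouched and no "disk error" was ever reported.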

Further, I can see software/application-based systems messing up the rev props, but maybe that point is debatable.

Here's what I recommend this community (both as consumers and developers) do moving forward on this topic:

1) Accept that MD5 checksums on the file-revision content are all that Subversion provides right now. This protection covers the content of every file stored in Subversion. I suspect most would argue that this is the most important part of Subversion's integrity self-check and that rev-prop information is much less critical.

2) Put checksumming the rev props (and any other data/metadata content) on the to-do list, both to settle this issue for everyone and to back up the claim that everything is checksummed. It does have value (albeit not as much as protecting the file-revision content). It seems like it may not be hard to do, given that a general MD5 mechanism is already in place. And it would answer all of the naysayers who object to the lack of protection for the rev props.
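Point 2 could look roughly like the following minimal sketch. It is not Subversion's FS API: `revprops_digest` and `RevpropStore` are hypothetical. Each revision's properties get an MD5 over a canonical encoding (sorted names, length-prefixed fields). Legitimate edits, such as a changed `svn:log`, refresh the digest, which addresses Fabian's objection. The checksum targets silent, out-of-band corruption, not deliberate edits by someone with access to the machine.

```python
import hashlib
from typing import Dict, List


def revprops_digest(props: Dict[str, str]) -> str:
    """MD5 over a canonical encoding of one revision's properties.

    Names are sorted and every name/value is length-prefixed, so the digest
    does not depend on dict order and adjacent fields cannot run together.
    """
    h = hashlib.md5()
    for name in sorted(props):
        for field in (name, props[name]):
            encoded = field.encode("utf-8")
            h.update(len(encoded).to_bytes(8, "big"))
            h.update(encoded)
    return h.hexdigest()


class RevpropStore:
    """Hypothetical store keeping a digest alongside each revision's props."""

    def __init__(self) -> None:
        self._props: Dict[int, Dict[str, str]] = {}
        self._digests: Dict[int, str] = {}

    def set_props(self, rev: int, props: Dict[str, str]) -> None:
        # A legitimate change goes through here and refreshes the digest.
        self._props[rev] = dict(props)
        self._digests[rev] = revprops_digest(props)

    def verify(self) -> List[int]:
        # Revisions whose stored props no longer match their recorded digest.
        return [rev for rev, props in sorted(self._props.items())
                if revprops_digest(props) != self._digests[rev]]
```

A nightly job would call `verify()` and report any revision numbers it returns. An empty list means every revision's properties still match what was last written through `set_props`.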

Does this make sense?

Received on Fri Oct 7 16:32:13 2005

This is an archived mail posted to the Subversion Users mailing list.
