
Re: [PATCH] Extra integrity test for svnadmin dump/verify (was Re: svnadmin dump/verify not working as expected?)

From: John Szakmeister <john_at_szakmeister.net>
Date: 2005-11-22 16:18:36 CET

David James wrote:
> On 11/22/05, John Szakmeister <john@szakmeister.net> wrote:
>> On Tuesday 22 November 2005 04:54, Malcolm Rowe wrote:
>>> On Sun, Nov 20, 2005 at 03:10:33PM -0500, John Szakmeister wrote:
>>>> a consequence though. In this case, it means that if the length of the
>>>> svndiff was small for a file, you can no longer dump that revision
>>>> in your repository (it would error out on both svnadmin verify and
>>>> svnadmin dump).
>>> If someone comes to svn-dev with a repository that 'works', but fails
>>> verification due to this problem, how do we repair it, given that 'dump'
>>> no longer works?
>> The same way we fix other issues: edit the backend data structures. Some of us
>> have done this on the users@ list a number of times. :-)
>>
>>> I like the idea of flagging potential problems as soon as possible, but
>>> not the idea of blocking 'svnadmin dump' when we don't really need to.
>>> Is there any way we can convert this hard error into a warning? (Yes,
>>> I realise that might be difficult, given that it's code in libsvn_repos,
>>> not svnadmin itself).
>> I'm not a fan of it either, but we already have a precedent for it. Change a
>> byte in the plain text representation of a file, and you'll run into the same
>> issue. *shrug*
> Can we add a "--no-verify" flag or similar so that users can output
> broken dumps for debugging purposes?

Possibly. The problem is that if you really want everything (not just
to skip the extra check this patch is adding), we'd have to change a number
of things. For instance, right now we get a stream pointer to the file
contents when doing the dump. It's actually the stream implementation
that's doing the MD5 checksum. We'd have to expose another API, or
provide a flag to the get file contents function in order to choose an
alternate stream implementation that doesn't include the checksumming.
I think there are several other similar issues too.
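
To make the stream point concrete, here is a rough sketch of the idea in
Python rather than C. It is only an illustration under assumptions: the names
ChecksummedReader, get_file_contents and the verify flag are hypothetical and
are not part of any Subversion API. The reader does the MD5 comparison itself
when the stream runs out, and the flag selects the plain stream instead.

```python
import hashlib
import io


class ChecksumMismatch(Exception):
    """Raised when the bytes read do not match the recorded MD5."""


class ChecksummedReader:
    """Hypothetical wrapper: hashes data as it is read, verifies at EOF."""

    def __init__(self, raw, expected_md5):
        self._raw = raw
        self._expected = expected_md5
        self._md5 = hashlib.md5()

    def read(self, size):
        # Only positive chunk sizes are supported in this sketch.
        if size <= 0:
            raise ValueError("size must be positive")
        chunk = self._raw.read(size)
        if chunk:
            self._md5.update(chunk)
        else:
            # An empty read means end of stream: compare digests now.
            if self._md5.hexdigest() != self._expected:
                raise ChecksumMismatch("MD5 mismatch at end of stream")
        return chunk


def get_file_contents(raw, expected_md5, verify=True):
    """Return a checksumming stream, or the raw stream if verify is False."""
    if verify:
        return ChecksummedReader(raw, expected_md5)
    return raw


def copy_all(stream, chunk_size=4):
    """Drain a stream the way a dump loop might, chunk by chunk."""
    out = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(out)
        out += chunk
```

With verify=False the caller gets whatever bytes are stored, corrupt or not,
which is roughly what a "--no-verify" style dump would need at this layer.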

I'm also not sure that being able to produce a broken dump file is more
helpful than editing the backend directly. At least by editing the
backend directly you stand a chance at recovering everything, and you
get a glimpse of the real corruption. With the dump, we're just going
to dump what we think is there, and anything ancillary to that is lost.

I've actually been working on a tool to help repair/fix broken FSFS
repositories. Perhaps I should just include this extra check there, and
leave well enough alone. :-) To be honest, this is protecting against a
man-made mistake. In the wild, the FSFS corruptions that I have seen
have had broken svndiffs, or the text rep was pointing to the wrong
location, or a mix of both. I was just surprised to see that when I
truncated the file, the MD5 was no longer being compared. I didn't
mean to open a can of worms. :-)
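
For anyone wondering how a truncation can slip past a checksum at all, here
is a small Python sketch of one way it could happen. This is not the FSFS
code and not the patch; read_rep, check_short_read and the exception name are
made up for illustration. The assumption is a reader that only compares the
MD5 once it has consumed the expected number of bytes, so a short read never
reaches the comparison unless a separate length check is added.

```python
import hashlib
import io


class CorruptRepresentation(Exception):
    """Raised when the stored data fails an integrity check."""


def read_rep(raw, expected_len, expected_md5, check_short_read=False):
    """Read up to expected_len bytes; verify MD5 only on a full-length read."""
    md5 = hashlib.md5()
    out = bytearray()
    while len(out) < expected_len:
        chunk = raw.read(min(4096, expected_len - len(out)))
        if not chunk:
            break  # the data ended before the expected length
        md5.update(chunk)
        out += chunk
    if len(out) == expected_len:
        # The digest comparison happens only on this path.
        if md5.hexdigest() != expected_md5:
            raise CorruptRepresentation("MD5 mismatch")
    elif check_short_read:
        # The extra integrity test: a short read is itself an error.
        raise CorruptRepresentation(
            "expected %d bytes, got %d" % (expected_len, len(out)))
    return bytes(out)
```

Without check_short_read, a truncated input is returned as if nothing were
wrong; with it, the same input is reported as corrupt.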

-John

Received on Tue Nov 22 16:29:43 2005

This is an archived mail posted to the Subversion Dev mailing list.