RE: Almost repetitive repository corruption

From: Igor <i3v_at_mail.ru>
Date: Mon, 30 Dec 2013 03:48:21 +0400

> -----Original Message-----
> From: Stefan Kueng [mailto:tortoisesvn_at_gmail.com]
> Sent: 27 December, 2013 22:18
> To: users
> Cc: i3v_at_mail.ru
> Subject: Re: Almost repetitive repository corruption
>
> On 26.12.2013 23:28, Igor wrote:
> > Hi all,
> >
> > I’ve just run into a weird bug which damaged my svn repository. I
> > still don’t understand what exactly was wrong, so, I don’t know how to
> > describe it in a clear and simple manner, sorry… I’ll just try to
> > describe all the symptoms I’ve experienced. I’ll use real file names,
> > since I wasn’t able to reproduce this bug on synthetic test repository.
> >
> > *SETUP*
> > Most simple single-user, single-PC setup. Local repository.
> > First svn version: “Subversion command-line client, version 1.8.5.”.
> > Windows 7 x64
> > Antivirus: Kaspersky Endpoint Security 10
> >
> > *THE STORY*
> > The story began, when I ran into some sort of error message, while
> > trying to commit r3349.
> > After a bit of struggling, I’ve realized, that my repository got
> > broken after previous commit (r3348). Nasty thing is that previous
> > commit finished without any error message.
> >
> > *SYMPTOMS*
> > **svnadmin verify**
> > Output ends like this:
> > <….>
> > * Verified revision 3346.
> > * Verified revision 3347.
> > svnadmin: E160004:
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > svnadmin: E160004: Found malformed header '' in revision file
> >
> > **svn checkout**
> > When I try to checkout a new working copy, I receive similar
> > message:
> > <…>
> > W:\testCO\Binar\Matlab\deploy
> > W:\testCO\Binar\Matlab\deploy\x64
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64.prj
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64\distrib
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > Found malformed header '' in revision file
> >
> > **svn Repository Browser**
> > When I navigate to
> > file:///V:/R_Matlab/Binar/trunk/Binar/Matlab/deploy/x64/Binar_x64
> > in tortoise svn repository browser, I see the same error message:
> >
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > Found malformed header '' in revision file
> >
> > Here’s a screenshot: http://sdrv.ms/1fJVuwa
> >
> > *ZEROS IN DATA FILE*
> > Luckily, I have a full backup (r3337). I’ve manually repeated all my
> > commits up to r3347 and verified that at this state repository is OK.
> >
> > Next, I’ve tried to reproduce the bug:
> >
> > 1. Firstly (“try1”), I’ve repeated same Matlab commit script
> > (Matlab simply calls svn, just like from cmd). And… «success»
> > - same bug again!
> >
> > 2. Secondly (“try3”), I’ve managed to reproduce the bug using
> > only windows cmd commands.
> >
> > 3. Thirdly (“try4” and “try5(0)”), I wrote a bat-script to
> > reproduce the same actions.
> >
> > I’ve compared
> > R_Matlab\db\revs\3\3348
> > file for different “tries”: (initial bug is designated as “try0”) and
> > discovered a single interesting thing:
> > each “3348” file has a long sequence of zero-bytes:
> >
> > • try0: 0x2201B0A to 0x2201FFF
> >
> > • try1: 0x2201000 to 0x2201FFF
> > o try0_vs_try1_p1: http://sdrv.ms/Ju7nev
> > o try0_vs_try1_p2: http://sdrv.ms/Ju7tmu
> > o try0_vs_try1_p3: http://sdrv.ms/Ju7AOI
> >
> > • try3: 0x2201B11 to 0x2201FFF
> > o try0_vs_try3_p1: http://sdrv.ms/Ju7G9g
> > o try0_vs_try3_p2: http://sdrv.ms/Ju7HKd
> >
> > • try4: 0x2201000 to 0x2201FFF
> > o try0_vs_try4_p1: http://sdrv.ms/Ju7OFE
> > o try0_vs_try4_p2: http://sdrv.ms/Ju86MJ
> > o try0_vs_try4_p3: http://sdrv.ms/Ju89ID
> >
> > • try5(0): 0x2201000 to 0x2201FFF (just like try4).
> > o try0_vs_try5(0)_p1: http://sdrv.ms/1daKwjG
> > o try0_vs_try5(0)_p2: http://sdrv.ms/1daKxUx
> > o try0_vs_try5(0)_p3: http://sdrv.ms/Ju8iM5
> >
> >
> > Moreover, try4 and try5 have only one single difference, two zero-
> > bytes, starting from 0x21F9FFE (in case of “try5(0)”):
> > http://sdrv.ms/19jmBdm
> >
> > *BUG DISAPPEARED*
> > That’s all I have. 5 broken repositories. After that bug DISAPPEARED.
> > Just like a UFO :) . I’ve launched the SAME script, with the SAME
> > input data 10 more times (“try5(1)”,”try5(2)”…) – nothing – svn
> > correctly commits r3348, resulting repository is valid:
> >
> > • svnadmin verify is OK
> >
> > • I’m able to see contents of
> > “R_Matlab/Binar/trunk/Binar/Matlab/deploy/x64/Binar_x64”
> > in tortoise svn repository browser
> >
> > • svn checkout is OK.
> >
> > When I compare “revs\3348” for “try4” vs “try5(1)” the ONLY difference
> > is those long sequence of zero-bytes mentioned before:
> >
> > • try4_vs_try5(1)_p1: http://sdrv.ms/1edmEdV
> >
> > • try4_vs_try5(1)_p2: http://sdrv.ms/Ju8YkC
> >
> > *REPRODUCTION SCRIPT*
> > The bat script, that resulted in error is quite straightforward. It
> > simply copies several files. It might be not a good idea to copy
> > modified file without committing it first, but still it should not
> > result in error… The bat file (used in try4) is here:
> > http://sdrv.ms/19ld4FN Another thing to mention is that size of files
> > in 3348 commit is about
> > 250 Mbytes….
> > To my shame, my repository is both large (~30GB) and containing
> > confidential data, so, I’m unable to share it :( .
> >
> > All files mentioned above are in this folder: http://sdrv.ms/1jMN250
> >
> > *LOOKING FOR SIMILAR CASES*
> > Mainly, I’ve just googled “svn: Corrupt node-revision”. It looks like
> > this error message is quite common, but no one tried to understand
> > its source. Though, there’s a “what was that?” question in [1] (see
> > link below).
> > Moreover, it looks like no one experienced “repetitive” behavior… In
> > some cases, issue was resolved by restoring revision files from
> > backup[1], or using svn dump/load [3,4]. In one report [2],
> > julian.foad <at> wandisco.com was using John Szakmeister's
> > 'fsfsverify.py' to analyze corruption. Though, it looks like in his
> > case, corruption type was quite different. In one post [4], VinnyJames
> > said: “we've seen this happen during heavy load”.
> >
> > 1. http://www.wandisco.com/svnforum/threads/38519-Commit-errors-Revision-files-corrupted
> >
> > 2. http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123110
> >
> > 3. http://stackoverflow.com/questions/5543285/how-do-i-fix-a-repository-with-one-broken-revision
> >
> >
> > 4. http://dev-notes-to-self.blogspot.com/2009/01/fixing-corrupt-subversion-repository.html?showComment=1280529811361#c6899551059356251422
> >
> > *QUESTIONS*
> > So….
> > 1. What was that? Any ideas? May it happen again?
> > 2. Any other interesting diagnostic info I can get from these
> > repositories?
> > 3. Should I re-post this to subversion mailing list also? Or is it,
> > most probably, dependent on tortoise somehow?
> > Say, due to some caching?
>
> First: you really did your research first before coming here to the mailing list.
> So thank you! That doesn't happen very often.
>
> Now to your questions:
>
> May it happen again? I hope not, but I'm guessing there's a good chance that
> it will happen again. A few things you might consider:
> * your repository is located at file:///V:/R_Matlab/...
> Since V: usually indicates a network share or some other external
> storage, have a good look at this FAQ:
> http://tortoisesvn.net/faq.html#repoonshare
> * you're also using Kaspersky Endpoint Security. I won't repeat my
> opinion about those kinds of "tools" here, but you should add
> exceptions for all your working copy paths, your repository path
> and of course for all TSVN and svn processes.
> * the fact that the revision in the repository contains null byte
> sequences instead of the correct data indicates that the data either
> got 'sanitized' by a security tool (see last point) or that the data
> didn't get written at all but on the next (successful) write those
> missing bytes got filled with zeros. But that would mean that your
> harddrive is at the end of its lifetime and should get replaced
> as soon as possible.
> * check the harddrive for errors. Most harddrives have a health status
> that can be read by various tools, some even log certain problems.
> At least (if your harddrive does not provide such detailed info), run
> the checkdisk tool that comes with Windows
>
> But, for more help you should post this to the Subversion users mailing
> list: it's not really an issue with TSVN since you could also reproduce
> everything with the svn command line tool.
>
> Stefan
>
> --
> ___
> oo // \\ "De Chelonian Mobile"
> (_,\/ \_/ \ TortoiseSVN
> \ \_/_\_/> The coolest interface to (Sub)version control
> /_/ \_\ http://tortoisesvn.net
===========================================================
[Igor Varfolomeev]

> * your repository is located at file:///V:/R_Matlab/...
> Since V: usually indicates a network share or some other external
> storage, have a good look at this FAQ:
> http://tortoisesvn.net/faq.html#repoonshare

Actually, the "V" drive is just a local folder ("G:\blah-blah-blah"), mounted
with the "Visual Subst" tool in this case
( http://www.ntwind.com/software/utilities/visual-subst.html ).
AFAIK, it's equivalent to the Windows cmd "subst" command. So, nope, it's not
related to the network in any way...

> * you're also using Kaspersky Endpoint Security. I won't repeat my
> opinion about those kinds of "tools" here, but you should add
> exceptions for all your working copy paths, your repository path
> and of course for all TSVN and svn processes.

Well, there's always some chance of interference from the antivirus, yep...
Even though I've set it to warn me about everything it blocks (which,
somehow, is not the default setting), I'm not perfectly sure it is not the
source of the issue. But the bug disappeared before I got to the
point of checking this possibility...
Neither my repository nor my working copy was in the "trusted" zone
of KEPS. I guess you're right and I should add them to "trusted".

> * the fact that the revision in the repository contains null byte
> sequences instead of the correct data indicates that the data either
> got 'sanitized' by a security tool (see last point) or ....

I've never seen KEPS behave like this when it intentionally
blocks something:
1) There should be a popup
2) It won't allow anything "after 5th try"...
3) There should be corresponding records in "Reports"
4) There's an option to "Disinfect" (even to "auto-disinfect"), but
I've disabled it, it never was allowed... (It's junk, actually :) )

I agree that there might be some interference, but I doubt there was an
intentional blocking or "disinfecting" or something else...
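
For anyone who wants to check their own revision files for the same
symptom, here is a minimal sketch (Python, not part of svn) that lists long
runs of zero bytes in a file. The function name and the 16-byte threshold
are arbitrary choices for illustration. A run of zeros is not proof of
corruption by itself, since legitimate binary data can contain zeros, so
treat the output as a hint only.

```python
import re


def find_zero_runs(data: bytes, min_len: int = 16):
    """Return (first, last) inclusive offsets of every run of 0x00 bytes
    that is at least min_len bytes long."""
    # Build a bytes regex such as rb"\x00{16,}" for the requested threshold.
    pattern = re.compile(rb"\x00{" + str(min_len).encode("ascii") + rb",}")
    return [(m.start(), m.end() - 1) for m in pattern.finditer(data)]


def report(path: str, min_len: int = 16) -> None:
    """Print the zero-byte runs found in the file at the given path."""
    with open(path, "rb") as fh:
        data = fh.read()  # reads the whole file into memory
    for first, last in find_zero_runs(data, min_len):
        print(f"0x{first:X} to 0x{last:X} ({last - first + 1} bytes)")
```

Usage would be something like report(r"R_Matlab\db\revs\3\3348"), run
against a copy of the repository rather than the live one.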

> ... or that the data
> didn't get written at all but on the next (successful) write those
> missing bytes got filled with zeros. But that would mean that your
> harddrive is at the end of its lifetime and should get replaced
> as soon as possible.

I doubt it was some sort of hardware error, because:
1) I've tried it on two different physical discs....
2) Same file, nearly the same byte offset... Doesn't look like hardware
error...
3) S.M.A.R.T. is OK for all disks:

http://www.hddstatus.com/hdrepshowreport.php?ReportCode=7693018&ReportVerification=894022CE

http://www.hddstatus.com/hdrepshowreport.php?ReportCode=7693017&ReportVerification=8CCE2E60

http://www.hddstatus.com/hdrepshowreport.php?ReportCode=7693016&ReportVerification=DEA41523
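
On point 2, the arithmetic behind "nearly the same byte offset" can be
sketched like this (Python). The offsets are copied from the original report
quoted above. The 4 KiB block size is an assumption chosen because it fits
the numbers, not something confirmed anywhere in this thread. The sketch
only shows where the zero runs sit relative to 4 KiB boundaries. It says
nothing about which layer (svn, OS cache, antivirus, disk) lost the data.

```python
PAGE = 0x1000  # 4 KiB; assumed block size, not confirmed by the thread

# (first, last) inclusive offsets of the zero-byte runs in each "3348" file,
# as listed in the original report.
runs = {
    "try0": (0x2201B0A, 0x2201FFF),
    "try1": (0x2201000, 0x2201FFF),
    "try3": (0x2201B11, 0x2201FFF),
    "try4": (0x2201000, 0x2201FFF),
    "try5(0)": (0x2201000, 0x2201FFF),
}


def describe(first: int, last: int, page: int = PAGE) -> dict:
    """Summarise how a zero run sits relative to page-sized blocks."""
    return {
        "length": last - first + 1,
        "starts_on_boundary": first % page == 0,
        "ends_on_boundary": (last + 1) % page == 0,
        "block_index": first // page,
        "single_block": first // page == last // page,
    }


for name, (first, last) in runs.items():
    print(name, describe(first, last))
```

Every run ends at 0x2201FFF, the last byte of the 4 KiB block that starts
at 0x2201000, and three of the five runs cover that block exactly.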

> * check the harddrive for errors. Most harddrives have a health status
> that can be read by various tools, some even log certain problems.
> At least (if your harddrive does not provide such detailed info), run
> the checkdisk tool that comes with Windows

In addition to SMART, I'll run chkdsk tonight and report back if
anything interesting turns up.

> But, for more help you should post this to the Subversion users mailing
> list: it's not really an issue with TSVN since you could also reproduce
> everything with the svn command line tool.

I've just re-posted my initial message there:
http://thread.gmane.org/gmane.comp.version-control.subversion.user/115734

Though, to be honest, I don't expect to identify the source of the issue. I have been running
this setup for 3 years, and this is the first time something like this has occurred... Maybe
it's just a really rare issue... Maybe not even in svn.

*MY INITIAL GUESSES*
My initial guess was that there might be something wrong with some sort of
parallelism. A missing semaphore or something. Some asynchronous call
that is fast enough a-l-m-o-s-t every time... Of course, it looks like svn doesn't
have massive parallelism :) , but when I compare the logs of "equal" checkouts, the
order is different (why?), so I guess there is "something". Another thing is that
this commit had large files... And this is where that a-l-m-o-s-t might break.

The second thing that might "heal itself" (and disappear) is some sort of caching.
Though I know nothing about either caching or parallelism in svn...

------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=4061&dsMessageId=3070904

Received on 2013-12-30 00:48:34 CET
