Re: fsverify.py unable to fix invalid svndiff header

From: Daniel Shahaf <danielsh_at_apache.org>
Date: Sat, 14 May 2011 21:10:29 +0300

On Sat, 14 May 2011 19:54 +0200, "Steinar Bang" <sb_at_dod.no> wrote:
> >>>>> Stefan Sperling <stsp_at_elego.de>:
>
> > The script probably took a wrong guess.
>
> > Hopefully this is the known corruption problem with a duplicate block of
> > data in the revision file.
>
> > Can you check if the original revision file (i.e. not modified by
> > fsfsverify.py) somewhere contains a data block which contains data
> > that matches the data around byte offset 1916?
>
> "offset 1916", is that "byte number 1916 in the 683 rev file"?
> Is that 1916 decimal, or hexadecimal? I'm assuming decimal for now.
>

Yes.

> > Usually the spot where the corruption appears (offset 1916 in your case)
> > contains an incomplete representation, but the representation data in the
> > duplicated block is good.
>
> > One way of locating the duplicate block is to open the file in a
> > hex editor and search the entire file for hex strings that occur
> > around or after 1916.
>
> Ok, opening the file in emacs hexl mode:
> `M-x hexl-find-file /tmp/svnrepo/svn/db/revs/0/683 RET'
>
> > Try to locate boundaries of representations, which look as follows:
>
> > https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure
> > A representation begins with a line containing either "PLAIN\n" or
> > "DELTA\n" or "DELTA <rev> <offset> <length>\n", where <rev>, <offset>,
> > and <length> give the location of the delta base of the representation
> > and the amount of data it contains (not counting the header or
> > trailer). If no base location is given for a delta, the base is the
> > empty stream. After the initial line comes raw svndiff data, followed
> > by a cosmetic trailer "ENDREP\n".
>
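
The boundaries described in the quoted structure notes can also be listed with a short script instead of by eye. This is only a rough sketch: the name find_markers is made up for illustration, and it assumes that a header or trailer always starts at the beginning of a line. Raw svndiff data may contain a newline followed by "DELTA" or "ENDREP" by chance, so treat the hits as candidates.

```python
import re
from typing import List, Tuple

# Header lines "PLAIN", "DELTA", "DELTA <rev> <offset> <length>" and the
# trailer "ENDREP", each terminated by a newline, as in the structure notes.
# Assumption: markers only count when they start a line.
MARKER = re.compile(rb"^(PLAIN|DELTA(?: \d+ \d+ \d+)?|ENDREP)\n", re.MULTILINE)


def find_markers(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, marker text) for every candidate boundary."""
    return [(m.start(), m.group(1).decode("ascii")) for m in MARKER.finditer(data)]


# Hypothetical usage on a copy of the revision file:
#   with open("/tmp/svnrepo/svn/db/revs/0/683", "rb") as f:
#       for offset, marker in find_markers(f.read()):
#           print(offset, hex(offset), marker)
```

Printing both decimal and hex offsets makes it easier to match the hits against a hexl-mode or xxd view.
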
> `M-j 1916 RET', takes me here:
> 00000770: 4e44 5245 500a 4445 4c54 410a 5356 4e01 NDREP.DELTA.SVN.
> 00000780: 0000 8c0c 823d 8431 823b 8140 4c00 8844 .....=.1.;.@L..D
> 00000790: 4ca8 404e 0048 8100 bf45 820d 9945 820d L.@N.H...E...E..
> ...
>
> The cursor is positioned on the "5" starting "5356" in the first line,
> and on the "S" of "SVN".
>
> Does that make sense?
>
> So I should search for "ENDREP.DELTA.SVN"? The 683 revfile contains 15
> instances of that string, but I have no idea which ones are relevant
> or not.
>

No, you should be looking for the sequence of bytes starting at offset 1916. So, the bytes are:

53564e0100008c0c823d

(for example, 'xxd -s 1916 -l 10 -ps' will tell you that)

And please don't try searching for the text representation on the right hand side of the hex view! (You've been here long enough that I assume you know that, but you said 'search for "ENDREP.DELTA.SVN"', which is incorrect in so many ways...)
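
If the hex editor at hand cannot search for raw bytes, the same search is easy to script. The following is a minimal sketch, not a tested recovery tool: find_duplicates is a name invented here, and the length of 10 bytes simply mirrors the xxd example above. A longer needle gives fewer accidental matches.

```python
from typing import List


def find_duplicates(data: bytes, offset: int, length: int = 10) -> List[int]:
    """Return every offset at which the bytes data[offset:offset+length] occur.

    The result always includes `offset` itself; any other entry is a
    candidate for the duplicated block.
    """
    needle = data[offset:offset + length]
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        # Advance by one byte so overlapping occurrences are found too.
        pos = data.find(needle, pos + 1)
    return hits


# Hypothetical usage on a copy of the revision file:
#   with open("/tmp/svnrepo/svn/db/revs/0/683", "rb") as f:
#       data = f.read()
#   for hit in find_duplicates(data, 1916):
#       print(hit, hex(hit))
```

Because this works on the raw bytes, it does not depend on how a hex view happens to align or render them.
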

> > So if you find a duplicated block of data you should be able to fix this
> > problem by copying representation data from the duplicate block to the
> > corrupted location.
>
> So what I'm looking for isn't exactly "ENDREP.DELTA.SVN", but what
> follows this text...?
>
> I tried searching for "x^.Rmo.@", but the one at the cursor is the only
> occurrence in the file. At least it is the only one aligned so that the
> search will find it. It doesn't look like hexl-mode can search for
> a byte sequence. Maybe I should get myself a proper hex editor?
>
> > DO NOT change any byte offsets in the file while doing this. If you
> > cannot squeeze the data in because it would overlap with subsequent
> > data you're out of luck but I've never seen this happen. Usually
> > there is enough room to fit the data, but you might have to add
> > padding. Any dummy byte will do, I usually use 0x42.
>
> The meaning of life, the universe and everything? I thought that was 42
> decimal...? :-)
>
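
The "do not change any byte offsets" rule can be enforced mechanically when the copy is scripted. This is a sketch under assumptions: patch_in_place and its parameters are hypothetical names, and placing the padding after the copied data is a guess, since the message above does not say where the padding should go.

```python
PAD = 0x42  # dummy filler byte mentioned in the quoted advice


def patch_in_place(data: bytes, offset: int, room: int, replacement: bytes) -> bytes:
    """Overwrite data[offset:offset+room] with `replacement`, padded to `room` bytes.

    The returned buffer has the same length as `data`, so every byte
    outside the patched region keeps its original offset.
    """
    if offset < 0 or offset + room > len(data):
        raise ValueError("patch region lies outside the file")
    if len(replacement) > room:
        raise ValueError("replacement would overlap subsequent data")
    # Assumption: padding is appended after the replacement data.
    padded = replacement + bytes([PAD]) * (room - len(replacement))
    patched = data[:offset] + padded + data[offset + room:]
    assert len(patched) == len(data)
    return patched
```

Work on a copy of the revision file and keep the untouched original, so a wrong guess can be undone.
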
> > Another possibility is that offset 2247 is wrong. In this case the
> > expected svndiff data is probably located elsewhere and the offset
> > in the representation header should be adjusted.
>
> Right... that's the first error that fsfsverify.py tries to fix?
>
> `M-j 2247 RET' takes the cursor over the "7" in "5978" in the first
> line:
> 000008c0: 8550 8b57 8585 5978 5e1d 526d 6fda 400c .P.W..Yx^.Rmo.@.
> 000008d0: fe1c ff0a 17f8 00d5 12fa a27d 4145 2a90 ...........}AE*.
> 000008e0: 8466 0da4 2361 6353 2414 ee0e b871 89a3 .f..#acS$....q..
>
> That wasn't as easily recognizable as the 1916 one, though. Not as
> recognizable as a boundary, at least.
>
> What are the things I should look for a duplicate of? The bytes
> following the troublesome position? And how many?
>
> > This is of course not an easy task and it is unfortunate that people
> > keep running into this problem. The source of the problem is not yet
> > known :( If you have any further questions just ask. If you cannot
> > get it fixed at all but can share the revision file privately I will
> > have a go at it.
>
> I think I need help with this one. I'll send you the revision file
> privately.
>
> Thanks!
>
>
> - Steinar
>
>

Good luck,

Daniel
Received on 2011-05-14 20:10:57 CEST

This is an archived mail posted to the Subversion Users mailing list.
