[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Error: Found malformed header in revision file

From: NextGen$ <nextgens_at_emu.freenetproject.org>
Date: 2007-02-28 13:53:01 CET

* Malcolm Rowe <malcolm-svn-dev@farside.org.uk> [2007-02-28 06:03:49]:

> Hi nextgens,
>
> Couple of questions for you below.
>
> On Sat, Feb 24, 2007 at 10:31:25PM +0000, NextGen$ wrote:
> > Today I have experienced a weird problem with subversion... I googled
> > a bit for it and found
> > http://svn.haxx.se/users/archive-2004-12/0453.shtml ... So I joined
> > the irc channel where I have been told to mail this list.
> >
> > Here is what has occured : On the client side I have tried to commit
> > svnadmin verify /path/to/repo says:
> > [...]
> > * Verified revision 11911.
> > svnadmin: Found malformed header in revision file
> >
>
> So I took a look at this problem on IRC right after nextgens posted this
> message.
>
> Interestingly, it's not the 'normal' FSFS corruption problem
> ('underlapping' repeated writes while writing a delta rep), though it
> does involve data duplication. The problem is at the first noderev:
>
> DELTA...
> ENDREP
> DELTA...
> ENDREP
> id: 7x2.0.r11912/988
> type: file
> pred: 7x2.0.r11909/178
> count: 55
> text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
> cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
> copyroot: 0 /
>
> id: 7x2.0.r11912/988
> type: file
> pred: 7x2.0.r11909/178
> count: 55
> text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
> cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
> copyroot: 0 /
>
> PLAIN...
>
> As you can see, the noderev's been duplicated. The effect of this is to
> break all the subsequently recorded offsets for the following noderevs
> and the trailer line. Removing one of those duplicates with dd fixed
> the problem.
>
> So here's the question - how on earth did that happen? We only ever
> write these out to the revision file once, in fs_fs.c:write_final_rev(),
> and we get the offset to use just before we write each one.
>
> Okay, so one explanation of why the file offset is wrong might be
> because, for buffered files, APR doesn't call lseek() to get the current
> file offset, because it can't. It uses it's own concept of the
> location, so if that gets out of sync, we'd see what we have here.
>
> How could that have happened? I'm not entirely sure. The sequence of
> events here is:
>
> 1. We get the offset for the noderev, which has the side effect of
> flushing the APR file buffer to disk. We know this offset is
> correct because the offset for the noderev is correct.
> 2. We write out the noderev using a series of apr_file_write()'s.
> This puts all the data into the APR file buffer (it's smaller than 4K,
> the APR buffer size).
> 3. We get the offset for the directory rep (PLAIN), which flushes the
> noderev to disk. We know this offset is incorrect because of the
> offset reported in the text part of the directory noderev (not shown).
> [Hmm, actually I guess we don't know that for sure, but I'm going
> with that theory for the moment.]
>
> I'm pretty sure we're not trying to write out the noderev twice (i.e.
> calling write_final_rev() twice for the same file entry), since we
> unconditionally update the offset in the node id, and so the two
> noderevs above would have different offsets.
>
> So I'm thinking that somewhere in #3, the APR file buffer is written
> twice. I can think of only a few ways to achieve that effect, and none
> of them seem particularly likely:
>
> a. There's a journalling filesystem in use, and it's decided to replay
> the write of the noderev. I don't think this is how journalling
> filesystems actually work, but I might be wrong.
> [nextgens: What OS and filesystem are you using for the repository?]
>

I am using ext3 on GNU/Linux 2.6.18.2-grsec... I have attached to the
mail results from "tune2fs -l".

> b. There's a bug in APR's apr_file_flush() that's causing the same
> buffer to be flushed to disk more than once.
>
> c. There's a bug in APR's apr_file_seek()/setptr() that's causing the same
> buffer to be flushed to disk more than once.
> [nextgens: Can you tell us what version of APR you're using? (as
> precisely as you can)]

I get "1.2.7" using "apr-config --version". I am using the
debian/testing package version 1.2.7-8.2.

>
> d. Something Else.
>
> For b., I can't really see where a bug would be, but I wouldn't mind
> someone else taking a look once nextgens confirms which version of APR to
> look at. I assume that nowadays we can rely on write(2) not returning
> EINTR after a partial write - anyone know if this isn't the case?
>
> For c., I again can't see anything obvious, but the implementation is
> non-trivial. Now I know that APR 0.9.7 fixed a problem whereby errors
> encountered while flushing during seek weren't reported, but that's why
> we raised the minimum APR requirement to 0.9.7... and in any case, I
> can't see how that could have this effect.
>
> Regards,
> Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Feb 28 13:53:26 2007

This is an archived mail posted to the Subversion Dev mailing list.