Re: fs_fs core dumps in checksum code

From: Blair Zajac <blair_at_orcaware.com>
Date: Thu, 12 Apr 2012 11:54:50 -0700

On 08/04/2010 09:32 AM, Blair Zajac wrote:
> On 08/04/2010 05:38 AM, Julian Foad wrote:
>> Again due to in-lining, I presume, fs_fs.c:2859 calls svn_checksum_dup()
>> but the debugger shows, in this case, only its subroutine
>> svn_checksum__from_digest().
>>
>>> #0 0x00002ac15f8bd28c in svn_checksum__from_digest (
>>> digest=0x646c6f663a706e76 <Address 0x646c6f663a706e76 out of bounds>,
>>> kind=1949987428, result_pool=<value optimized out>)
>>> at subversion/libsvn_subr/checksum.c:77
>>
>> Here, digest is ASCII "dlof:pnv" and kind is hex 743a7264, which is ASCII
>> "t:rd"...
>>
>>> 77 memcpy((unsigned char *)checksum->digest, digest, DIGESTSIZE(kind));
>>> #1 0x00002ac15f057f87 in read_representation (contents_p=0x2aaab92d7c50,
>>> fs=0x2aab80d53590, rep=0x2aabbd57e748, pool=0x2aab1c15a0e8)
>>> at subversion/libsvn_fs_fs/fs_fs.c:2859
>>
>> ... meaning that 'rep->md5_checksum' points to readable memory but not
>> to a valid checksum object.
>>
>> Thus, in both cases, it looks like that pointer was either uninitialized
>> or subsequently corrupted.
>>
>> I searched in 1.6.5 code for creation of a representation_t structure
>> and found nowhere that leaves it uninitialized. At least one place
>> allocates the structure with apr_pcalloc (zero-initialized) and doesn't
>> subsequently fill in ->md5_checksum, but that's not what you're seeing.
>> I searched for creation of node_revision_t structures and found no
>> places where the data_rep member is uninitialized.
>>
>> Thus I can only suggest looking for memory corruption.
>
> Hi Julian,
>
> Thanks for taking your time to look at this, I appreciate it.
>
> I haven't followed the code path in detail, but my gut agrees with you
> on memory corruption.
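
The ASCII reading of the two bogus arguments in frame #0 of the quoted backtrace can be checked with a short script. This is only a sketch: ascii_bytes is a hypothetical helper, not part of Subversion, and the "little" variants assume the server is a little-endian machine (which the backtrace does not state), in which case they show the bytes in the order they sit in memory.

```python
def ascii_bytes(value, width, byteorder="big"):
    """Render an integer's bytes as ASCII; non-printable bytes become '.'."""
    raw = value.to_bytes(width, byteorder)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


digest_ptr = 0x646C6F663A706E76  # 'digest' argument from frame #0
kind = 1949987428                # 'kind' argument from frame #0

print(hex(kind))                             # 0x743a7264
print(ascii_bytes(digest_ptr, 8))            # dlof:pnv (most significant byte first)
print(ascii_bytes(digest_ptr, 8, "little"))  # vnp:fold (assumed in-memory order)
print(ascii_bytes(kind, 4))                  # t:rd
print(ascii_bytes(kind, 4, "little"))        # dr:t
```

Both values decode entirely to printable characters, which is consistent with the pointer referring to text data rather than to a checksum object.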

Hi Julian,

Resurrecting this thread [1] from 1.5 years ago ;)

Since we discussed this, we moved the Subversion server to a new box and
from RAID to FusionIO storage and we're still getting the core dumps
with the same stack trace, so I don't think it's memory corruption.

Yesterday, we got two core dumps within 30 minutes of each other.

Would looking at the txn files in progress tell us anything?

bash-3.2# ls -l transactions/5653610-3f4jm.txn/
total 16
-rw-r--r-- 1 tomcat games 0 Apr 11 18:40 changes
-rw-r--r-- 1 tomcat games 4 Apr 11 18:41 next-ids
-rw-r--r-- 1 tomcat games 156 Apr 11 18:41 node.0.0
-rw-r--r-- 1 tomcat games 2035 Apr 11 18:41 node.0.0.children
-rw-r--r-- 1 tomcat games 2366 Apr 11 18:41 props

bash-3.2# ls -l txn-protorevs/5653610-3f4jm.rev*
-rw-r--r-- 1 tomcat games 0 Apr 11 18:40 txn-protorevs/5653610-3f4jm.rev
-rw-r--r-- 1 tomcat games 0 Apr 11 18:40 txn-protorevs/5653610-3f4jm.rev-lock

Is it odd to have empty files, such as changes? Could that be a hint?
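
To compare this against other in-progress transactions, the zero-length files could be listed with something like the following. It is a rough sketch: empty_files is a hypothetical helper, and the script assumes it is run from the same directory as the ls commands above, so that transactions and txn-protorevs resolve as relative paths. It only reports which files are empty; it says nothing about whether that is normal for a transaction at that stage.

```python
import os


def empty_files(root):
    """Return the sorted paths of zero-length regular files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Directory names taken from the listings above; a missing directory
    # simply produces no output.
    for sub in ("transactions", "txn-protorevs"):
        for path in empty_files(sub):
            print(path)
```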

Blair

[1] http://mail-archives.apache.org/mod_mbox/subversion-dev/201008.mbox/%3C4C59960C.90409@orcaware.com%3E
Received on 2012-04-12 20:55:24 CEST

This is an archived mail posted to the Subversion Dev mailing list.