
Re: fs_fs core dumps in checksum code

From: Blair Zajac <blair_at_orcaware.com>
Date: Fri, 13 Apr 2012 13:08:12 -0700

On 04/13/2012 12:45 AM, Julian Foad wrote:
> Blair Zajac wrote:
>
>> Since we discussed this, we moved the Subversion server to a new box and
>> from RAID to FusionIO storage, and we're still getting the core dumps
>> with the same stack trace, so I don't think it's memory corruption.
>
> I meant I suspect corruption of this process's state by any mechanism, which could be buffer overflows, bad multi-threading, and so on. You wrote before, "I'll run our backend servers on the dev cluster in valgrind and see if we pick up anything there." Were you able to try that? Or just load the core dump files into GDB and see what you can see?

We didn't run valgrind; the box is in production and it would be too
slow. Maybe I can set up a dev process on the production server to test.

To get useful results from valgrind, do I need to recompile APR with
specific flags to enable pool debugging?

>> Yesterday, we got two core dumps within 30 minutes of each other.
>>
>> Would looking at the txn files in progress tell us anything?
> [...]
>> Having the empty files, such as changes, is that odd? Could that be a hint?
>
> No, that's not interesting, that's just the result of crashing out at the point where it did -- in the middle of doing a commit.

Is the 'changes' file written during the commit itself, rather than
while the transaction is being built? If so, then an empty changes file
is odd and is probably only possible through the RPCS API we wrote that
wraps svn_fs.h and svn_repos.h. In that case, could there be a bug when
committing empty transactions in a multithreaded environment?

Blair
Received on 2012-04-13 22:08:49 CEST

This is an archived mail posted to the Subversion Dev mailing list.