On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann <
stefanfuhrmann_at_alice-dsl.de> wrote:
> Johan Corveleyn wrote:
>
>> On Sun, Aug 29, 2010 at 12:32 PM, <stefan2_at_apache.org> wrote:
>> > Author: stefan2
>> > Date: Sun Aug 29 10:32:08 2010
>> > New Revision: 990537
>> >
>> > URL: http://svn.apache.org/viewvc?rev=990537&view=rev
>>
>> > Log:
>> > Looking for the cause of Johan Corveleyn's crash (see
>> > http://svn.haxx.se/dev/archive-2010-08/0652.shtml), it
>> > seems that wrong / corrupted data contains backward
>> > pointers, i.e. negative offsets. That cannot happen if
>> > everything works as intended.
>>
>> I've just retried my test after this change (actually with
>> performance-branch_at_990579, so updated just 10 minutes ago). Now I get
>> the assertion error, after running log or blame on that particular
>> file:
>>
>> [[[
>> $ svnserve -d -r c:/research/svn/experiment/repos
>> Assertion failed: *ptr > buffer, file
>> ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 282
>>
>> This application has requested the Runtime to terminate it in an unusual
>> way.
>> Please contact the application's support team for more information.
>> ]]]
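
For readers unfamiliar with why a "backward pointer" trips an assertion like the one above, here is a minimal sketch of offset-based serialization. It is not the code in svn_temp_serializer.c; the names `serialized_record_t`, `resolve_offset`, `serialize_record` and `deserialize_name` are hypothetical and chosen only for illustration. The assumption is that pointers inside a serialized buffer are stored as offsets from the buffer start, so a valid offset is always positive and smaller than the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical serialized form: the name is stored in the same buffer,
   and the record only remembers where it starts. */
typedef struct serialized_record_t
{
  int id;
  ptrdiff_t name_offset;  /* offset of the name from the buffer start */
} serialized_record_t;

/* Turn OFFSET back into a pointer into BUFFER.  Returns NULL for a
   zero or negative ("backward") offset or one past the end of the data. */
static const char *
resolve_offset(const char *buffer, size_t size, ptrdiff_t offset)
{
  if (offset <= 0 || (size_t)offset >= size)
    return NULL;
  return buffer + offset;
}

/* Write the record at offset 0 and the name right behind it.
   Returns the number of bytes used, or 0 if BUFFER is too small. */
static size_t
serialize_record(char *buffer, size_t size, int id, const char *name)
{
  serialized_record_t rec;
  size_t name_len = strlen(name) + 1;

  if (size < sizeof(rec) + name_len)
    return 0;

  memset(&rec, 0, sizeof(rec));
  rec.id = id;
  rec.name_offset = (ptrdiff_t)sizeof(rec);
  memcpy(buffer, &rec, sizeof(rec));
  memcpy(buffer + sizeof(rec), name, name_len);
  return sizeof(rec) + name_len;
}

/* Return the name stored in BUFFER, or NULL if the offset is corrupt. */
static const char *
deserialize_name(const char *buffer, size_t size)
{
  serialized_record_t rec;

  if (size < sizeof(rec))
    return NULL;
  memcpy(&rec, buffer, sizeof(rec));
  return resolve_offset(buffer, size, rec.name_offset);
}
```

With this layout, intact data can never produce an offset that points at or before the buffer start, so a check of that kind only fires when the cached bytes were damaged or were written by a faulty serializer. The sketch does not verify that the string is terminated inside the buffer; a real implementation would need further checks.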
>>
> That is what I expected, looking at the call stacks you posted.
> My preliminary analysis goes as follows:
>
> * The error seems to be limited to relatively rare occasions.
> That sufficiently excludes alignment issues and plainly wrong
> parameters / function calls.
>
> * It might be a (still rare) 32-bit-only issue.
>
> * There seems to be no miscast of types, i.e. the DAG node
> being read and causing the PF is actually a DAG node. Even
> if conflicting keys were used, the structure could still be read
> from the cache and would lead to some logic failure elsewhere.
>
> What else could it be? Most of the following are rather
>
> * concurrency issue
> * data corruption within the cache itself
> * some strange serialization issue that needs very specific data
> and / or 32 bit pointers to show up
>
>
>> Is there any way I can find more information about this failure, so I
>> can help you diagnose the problem?
>>
> In fact there is. Just some questions:
>
> * You are the only one accessing the server and you use
> a single client process?
>
Yes. All on the same machine actually (my laptop). Accessing the server with
svn://localhost.
> * Does it happen if you log / blame the file for the first time
> and no other requests have been made to the server before?
>
Yes
> * Does a command line "svn log" produce some output
> before the crash? If so, is there something unusual happening
> around these revisions (branch replacement or so)?
>
Yes. Running "svn log svn://localhost/trunk/some/path/bigfile.xml" yields 969
of the 2279 log entries, from r95849 (last change to this file) down to
r42100. Then it suddenly stops.
I've checked r42100 with "log -v", and it only mentions text modification of
bigfile.xml. Same goes for the previous and next revisions in which
bigfile.xml was affected (r42104 and r42042).
>
> Also, please verify that the crash gets triggered if the server is started
> with the following extra parameters:
>
> * -c0 -M0 -F0
>
No crash
> * -c0 -M0
>
No crash
> * -c0 -M1500 -F0
>
Crash (actually I did it with -M1000, because -M1500 would give me an "Out of
memory" immediately).
> * -c0 -M1500
Crash (with -M1000 that is)
>
>
>
>> Just to be clear: the very same repos does not have this problem when
>> accessed by a trunk svnserve.
>>
> I thought so ;) To narrow down the nature of the problem,
> I added some checks that should be able to discern plain
> data corruption from (de-)serialization issues. Please apply
> either the patch or replace the original files with the versions
> in the .zip file.
>
> A debug build should then, hopefully, trigger a different
> and more specific assertion.
>
>
Ok, will try that now.
--
Johan
Received on 2010-08-30 21:33:20 CEST