Re: Diff suffix scanning invalid read

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Mon, 25 Mar 2013 23:33:28 +0100

On Mon, Mar 25, 2013 at 11:09 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Mon, Mar 25, 2013 at 9:31 PM, Philip Martin
> <philip.martin_at_wandisco.com> wrote:
>> Philip Martin <philip.martin_at_wandisco.com> writes:
>>
>>> $ valgrind -q .libs/lt-diff-diff3-test 15
>>> ==6097== Invalid read of size 1
>>> ==6097== at 0x503FD83: find_identical_suffix (diff_file.c:586)
>>> ==6097== by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097== by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097== by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097== by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097== by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097== by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097== by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097== by 0x4E35686: main (svn_test_main.c:551)
>>> ==6097== Address 0x138585af is 1 bytes before a block of size 131,072 alloc'd
>>> ==6097== at 0x4C28BED: malloc (vg_replace_malloc.c:263)
>>> ==6097== by 0x572DDDB: pool_alloc (apr_pools.c:1463)
>>> ==6097== by 0x572DF57: apr_palloc_debug (apr_pools.c:1504)
>>> ==6097== by 0x503FACA: find_identical_suffix (diff_file.c:558)
>>> ==6097== by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097== by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097== by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097== by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097== by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097== by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097== by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097== by 0x4E35686: main (svn_test_main.c:551)
>>
>> On entry to find_identical_suffix file[0] is
>>
>> (gdb) p file[0]
>> $1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>> size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>...,
>> curp = 0x129cf48 " @@@\n\300\312&\001", endp = 0x129cf50 "\300\312&\001",
>> normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = -1,
>> suffix_offset_in_chunk = 0}
>>
>> The file data ends at the end of the chunk. The code does
>>
>> if (file_for_suffix[i].chunk == file[i].chunk)
>> {
>> /* Prefix ended in last chunk, so we can reuse the prefix buffer */
>> file_for_suffix[i].buffer = file[i].buffer;
>> }
>> else
>> {
>> /* There is at least more than 1 chunk,
>> so allocate full chunk size buffer */
>> file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE);
>> SVN_ERR(read_chunk(file_for_suffix[i].file, file_for_suffix[i].path,
>> file_for_suffix[i].buffer, length[i],
>> chunk_to_offset(file_for_suffix[i].chunk),
>> pool));
>> }
>> file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i];
>> file_for_suffix[i].curp = file_for_suffix[i].endp - 1;
>>
>> and allocates a new chunk so that file_for_suffix[0] is
>>
>> (gdb) p file_for_suffix[0]
>> $3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>> size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>...,
>> curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>...,
>> normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0,
>> suffix_offset_in_chunk = 0}
>>
>> file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and
>> so dereferencing it is not valid.
>
> You mean file_for_suffix[0].curp, which is 1 byte before
> file_for_suffix[0].buffer. Bummer, it's not supposed to do that.
>
>> How are the chunks supposed to work? When the file data ends at the end
>> of a chunk is there supposed to be another valid chunk with no data?
>
> It's been a while, so I'm not sure. I *think* the intention is that
> find_identical_suffix is not executed in this case, because in
> datasources_open, find_identical_suffix is only executed if
> (!reached_one_eof). And find_identical_prefix should have set
> reached_one_eof in this case. I think ... But apparently it doesn't
> ...

Hmmm, I should have looked closer. The case of file.size==131072
(which is CHUNK_SIZE, which is 1<<CHUNK_SHIFT, which is 17), or any
chunk_size multiple, can happen with or without find_identical_prefix
eating it all. The fact is that offset_to_chunk is defined as:

#define offset_to_chunk(offset) ((offset) >> CHUNK_SHIFT)

which means that offset_to_chunk(131072) is 1. And it just so happens
that in multiple places in diff_file.c the "last chunk" is determined
as offset_to_chunk(file.size).

I think this ("last chunk being offset_to_chunk(file.size)") predates
my involvement of working on the prefix/suffix scanning. So I think
it's safe to say that: yes, it seems that there is supposed to be
another chunk with no data. Perhaps someone else with longer
involvement with the origins of the code of diff_file.c can confirm
that this is indeed the intention.

If that's indeed the case (there can be a last chunk with no data), I
suppose it's best to fix find_identical_suffix to check for that.

Sorry to not be any more help ... a bit overworked lately.

-- 
Johan

Received on 2013-03-25 23:34:19 CET

This message: [ Message body ]
Next message: Joe Swatosh: "Re: [PATCH] #include limts.h for ULONG_MAX"
Previous message: Daniel Shahaf: "Re: one question on the source code of svnserve"
In reply to: Johan Corveleyn: "Re: Diff suffix scanning invalid read"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]