On Mar 7, 2018, at 6:18 PM, Philip Martin <philip_at_codematters.co.uk> wrote:
>
> [Moving this to the dev_at_s.a.o list.]
>
> Well done! It looks like you have identified a serious bug here. The
> function svn_fs_fs__get_contents_from_file() that was recently added to
> 1.9 for the SHA1 collision detection so the code is new and it is also
> different from that on trunk.
>
> Your proposed fix looks like it might be correct, although rb->rep is
> just rep so it could be a bit simpler. One other consequence of this
> bug is that the checksum calculated in rep_read_contents is not
> finalized.
>
> Myria <myriachan_at_gmail.com> writes:
>
>> During rep_write_contents_close, there is a call to get_shared_rep.
>> get_shared_rep calls svn_fs_fs__get_contents_from_file, which has the
>> code pasted below.
>>
>>
>> /* Build the representation list (delta chain). */
>> if (rh->type == svn_fs_fs__rep_plain)
>> {
>> rb->rs_list = apr_array_make(pool, 0, sizeof(rep_state_t *));
>> rb->src_state = rs;
>> }
>> else if (rh->type == svn_fs_fs__rep_self_delta)
>> {
>> rb->rs_list = apr_array_make(pool, 1, sizeof(rep_state_t *));
>> APR_ARRAY_PUSH(rb->rs_list, rep_state_t *) = rs;
>> rb->src_state = NULL;
>> }
>> else
>> {
>> representation_t next_rep = { 0 };
>>
>> /* skip "SVNx" diff marker */
>> rs->current = 4;
>>
>> /* REP's base rep is inside a proper revision.
>> * It can be reconstructed in the usual way. */
>> next_rep.revision = rh->base_revision;
>> next_rep.item_index = rh->base_item_index;
>> next_rep.size = rh->base_length;
>> svn_fs_fs__id_txn_reset(&next_rep.txn_id);
>>
>> SVN_ERR(build_rep_list(&rb->rs_list, &rb->base_window,
>> &rb->src_state, &rb->len, rb->fs, &next_rep,
>> rb->filehandle_pool));
>>
>>
>> The bug is occurring because build_rep_list is being called with
>> first_rep->expanded_size set to zero. Well, the reason it's zero is
>> because first_rep is the second to last parameter to build_rep_list,
>> and the above code initialized expanded_size to zero:
>>
>> representation_t next_rep = { 0 };
>>
>> Does the code just need this? I don't know this call >.<
>>
>> next_rep.expanded_size = rb->rep.expanded_size;
>>
>> Melissa
>>
>>> On Wed, Mar 7, 2018 at 9:02 AM, Nathan Hartman <hartman.nathan_at_gmail.com> wrote:
>>>> On Mar 5, 2018, at 10:54 PM, Myria <myriachan_at_gmail.com> wrote:
>>>>
>>>> Final email for the night >.<
>>>>
>>>> What's clobbering the expanded_size is this in build_rep_list:
>>>>
>>>> /* The value as stored in the data struct.
>>>> 0 is either for unknown length or actually zero length. */
>>>> *expanded_size = first_rep->expanded_size;
>>>>
>>>> first_rep->expanded_size here is zero for the last call to this
>>>> function before the error. In every other case before the error, the
>>>> two values are equal.
>>>>
>>>> Then this code executes:
>>>>
>>>> if (*expanded_size == 0)
>>>> if (rep_header->type == svn_fs_fs__rep_plain || first_rep->size != 4)
>>>> *expanded_size = first_rep->size;
>>>>
>>>> first_rep->size is 16384, and this is why rb->len becomes 16384,
>>>> leading to the error.
>>>>
>>>> I don't know what all this code is doing, but that's the proximate
>>>> cause of the failure.
>>>>
>>>> Melissa
>>>
>>> Has it been possible to determine what is setting expanded_size to 0
>>> before that last call? I wonder if there is specific logic that
>>> decides (perhaps incorrectly?) to do that?
>>>
>>> Alternatively is it being clobbered by some out-of-bounds access,
>>> use-after-free, or another such issue?
>>>
>>> Is it possible in your debugger setup to determine the address of
>>> that variable and set a breakpoint that triggers when that memory is
>>> written? (It may be called a watchpoint?)
>>>
>>> Which leads me to another thought: if you can set such a breakpoint
>>> / watchpoint and it does not trigger, then this expanded_size might
>>> not be the same instance in that final call. Perhaps a shallow copy
>>> of an enclosing structure is made which leaves out the known size
>>> and sets it to 0 for some reason, and that final call is given that
>>> (incomplete) copy.
>>>
>>> Caveat: I am not familiar with the codebase but these are my
>>> thoughts based on adventures in other code bases.
>>>
>>
>
> --
> Philip
Congrats for finding that!
It looks like this falls under the "shallow copy ... which leaves out the known size" possibility I surmised about earlier, in which case it is zero because the enclosing structure was previously memset()ed to 0 and this field was not assigned.
That makes me wonder why this has not triggered more frequently for users? Is there some obscure set of circumstances that triggers this code path in this particular way? If so, can a test be added to the test suite to prevent this sort of thing from slipping in again?
In what way is the code different on trunk?
Received on 2018-03-08 01:17:00 CET