Ivan Zhakov wrote:
> Bert Huijben wrote:
>>> URL: http://svn.apache.org/viewvc?rev=1707196&view=rev
>>> Log:
>>> Slightly optimize svn_stringbuf_from_stream() to avoid allocating twice
>>> more memory and unnecessary memcpy() when LEN_HINT is equal to final
>>> stringbuf
>>> length.
>>>
>>> * subversion/libsvn_subr/stream.c
>>> (svn_stringbuf_from_stream): Always preallocate LEN_HINT +
>>> MIN_READ_SIZE
>>> bytes to be able perform final read without stringbuf reallocation.
>>
>> Can you explain why hint + MIN_READ_SIZE instead of something like
>> MAX(len_hint+1, MIN_READ_SIZE)
>>
>> I don't know what MIN_READ_SIZE is from just this patch, but it could easily be
>> something like 16 Kbyte, while len_hint could be something like 16 for a file like
>> 'format' that was just statted to obtain the actual size
>>
>> 16 Kbyte + 16 bytes for a 16 bytes file looks a bit large... And allocating 16 + 32
>> bytes is far more efficient than allocating that huge chunk at once.
>>
> MIN_READ_SIZE is 64 bytes and it's locally defined in this function.
> We cannot use MAX(len_hint+1, MIN_READ_SIZE) because main loop
> reallocates buffer if don't have MIN_READ_SIZE remaining. The current
> code assumes that MIN_READ_SIZE is small value and I decided to leave
> this code intact. It could be improved of course, but I wanted to make
> minimum change first.
I don't think that was the right fix. One deficiency with the code is
that it doesn't break out of the loop when end-of-stream is detected,
which can be reported by a 'short' read. Your change doesn't solve
that problem in general, although it does solve it for the case where
the supplied length hint is exactly right.
We also have the function
svn_string_from_stream()
which performs a virtually identical task. These two functions have
completely different implementations. svn_string_from_stream() does
break out of the loop as soon as it detects end-of-stream reported as
a short read. They also use completely different read sizes
(MIN_READ_SIZE = 64 bytes vs. SVN__STREAM_CHUNK_SIZE = 16384).
That is silly: there is no good reason for that, and it is a waste of
effort maintaining them separately, and discussing separate
optimizations for each.
We should make those two functions share the same "best" implementation.
- Julian
Received on 2015-10-07 13:53:46 CEST