On 7 October 2015 at 14:53, Julian Foad <julianfoad_at_gmail.com> wrote:
> Ivan Zhakov wrote:
>> Bert Huijben wrote:
>>>> URL: http://svn.apache.org/viewvc?rev=1707196&view=rev
>>>> Log:
>>>> Slightly optimize svn_stringbuf_from_stream() to avoid allocating twice
>>>> more memory and unnecessary memcpy() when LEN_HINT is equal to final
>>>> stringbuf
>>>> length.
>>>>
>>>> * subversion/libsvn_subr/stream.c
>>>> (svn_stringbuf_from_stream): Always preallocate LEN_HINT +
>>>> MIN_READ_SIZE
>>>> bytes to be able perform final read without stringbuf reallocation.
>>>
>>> Can you explain why hint + MIN_READ_SIZE instead of something like
>>> MAX(len_hint+1, MIN_READ_SIZE)
>>>
>>> I don't know what MIN_READ_SIZE is from just this patch, but it could easily be
>>> something like 16 Kbyte, while len_hint could be something like 16 for a file like
>>> 'format' that was just statted to obtain the actual size
>>>
>>> 16 Kbyte + 16 bytes for a 16 bytes file looks a bit large... And allocating 16 + 32
>>> bytes is far more efficient than allocating that huge chunk at once.
>>>
>> MIN_READ_SIZE is 64 bytes and it's locally defined in this function.
>> We cannot use MAX(len_hint+1, MIN_READ_SIZE) because main loop
>> reallocates buffer if don't have MIN_READ_SIZE remaining. The current
>> code assumes that MIN_READ_SIZE is small value and I decided to leave
>> this code intact. It could be improved of course, but I wanted to make
>> minimum change first.
>
> I don't think that was the right fix. One deficiency with the code is
> that it doesn't break out of the loop when end-of-stream is detected,
> which can be reported by a 'short' read.
It's a different problem and I my plan was to fix in another commit. I
don't think that problem in existing code should be reason to do not
fix one of them.
> Your change doesn't solve
> that problem in general, although it does solve it for the case where
> the supplied length hint is exactly right.
>
> We also have the function
>
> svn_string_from_stream()
>
> which performs a virtually identical task. These two functions have
> completely different implementations. svn_string_from_stream() does
> break out of the loop as soon as it detects end-of-stream reported as
> a short read. They also use completely different read sizes
> (MIN_READ_SIZE = 64 bytes vs. SVN__STREAM_CHUNK_SIZE = 16384).
>
Did you notice that current svn_stringbuf_from_stream() implementation
increase read size twice on every loop iteration?
> That is silly: there is no good reason for that, and it is a waste of
> effort maintaining them separately, and discussing separate
> optimizations for each.
>
> We should make those two functions share the same "best" implementation.
>
I agree that it would be nice to have the same "best" implementation
for both functions. But svn_string_from_stream() function less
efficient it terms of memory operations, while reading stream using
bigger chunks at the beginning. Another difference that
svn_string_from_stream() closes source stream, while
svn_stringbuf_from_stream() don't.
Disclaimer: I didn't wrote svn_string_from_stream() nor
svn_stringbuf_from_stream() and I'm not aware why these functions have
different implementations. I've just spotted one particular problem
and fixed it.
--
Ivan Zhakov
Received on 2015-10-07 14:21:40 CEST