I voiced some concerns on IRC about the patch
http://svn.apache.org/r1666965 nominated for backport to 1.9.x.
More details below.
Bert Huijben wrote:
> Julian Foad wrote:
>> Marc Strapetz wrote:
>>> On 16.03.2015 01:50, Bert Huijben wrote:
>>>> Our server reports use an apr feature that buffers +- 8 KByte data before
>>>> sending out the first data.
>>>> In this specific JavaHL case you ask for just the revision numbers. [...]
>>>> I think every revision would (encoded in our Xml protocol) cost about 70
>>>> bytes, so there would fit at least 100 revisions in that buffer.
>> On my testing against the ASF EU mirror, the buffer size seems to be
>> 48 KB, the response is compressed, and around 3000 log entries (with
>> -q) or 10000 log entries (with --xml --with-no-revprops) are buffered
>> at a time.
The scenario tested above shows a huge latency (10 or 20 seconds in my
case), and it looks like this patch would reduce it dramatically (to
~0.5 seconds in my case). That's wonderful.
However, it hasn't been sufficiently analyzed and tested yet,
especially not for backport to 1.9, so I have voted against the 1.9
backport proposal for the time being.
>> I used this tracing command:
>> strace -tt -e trace=read svn log -q
>> http://svn.eu.apache.org/repos/asf/subversion/trunk --limit=10000
>> 2>&1 > /dev/null | grep -v "EAGAIN"
> The first level of buffering (The 8 KByte I quoted, or more precisely
> APR_BUCKET_BUFF_SIZE) is before the compression output filter even
> receives the first byte.
> It is quite possible further layers have their own buffering limits,
Yes, and with no forced flushes it is the biggest buffer (or the total
size of all the buffers) that determines the overall latency before
the user sees the first result.
> but if they follow the filter rules the flush should still get
> through. The apr brigades add a special flush frame to trigger this
> behavior, and the documentation explicitly says that filters should
> take care to handle this properly.
I agree, it should work. Just needs to be tested.
The patch flushes the buffer after the first (1, 2, 4, 8, ..., 2048)
log entries. That's fine for a fast network and a compute-bound
server. The testing above was against a 1.8 server; we don't yet have
any test data against a 1.9 server. The 1.9 server is going to be, I
understand, *much* faster at computing logs. When the computation is
fast enough that thousands of log entries are available within a
fraction of a second, we get a burst of extra flushes/packets at the
start of every log operation, but the user experience is no better.
Might the additional cost of those flushes/packets be significant?
Probably not, we imagine, but I have no concrete evidence of this.
Flushing after the 1st, 2nd, 4th, 8th, ... log entry is a crude
approximation of the desired semantics, which are more like "don't
delay the first result by more than a small fraction of a second, and
don't delay the next few results much more than that either". In
other words, the user's requirement is really about time delays. I
wonder if we could implement something closer to what the user
actually wants.
Received on 2015-03-16 16:36:45 CET