Here's this morning's failure. For testing, I've set http-timeout=67. I am
trying to see if the zero window is a coincidence or not.
The debug output:
Reading 8192 bytes of response body.
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6180 bytes.
XML: Parsing 6180 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
response body: connection was closed by server (http://foundry51.qnx.com)
sess: Destroying session.
sess: Destroying session.
The network capture:
391174 498.610439000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
391175 498.610495000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [ACK] Seq=1945767 Ack=480574851 Win=1280 Len=0
391176 498.842179000 10.222.3.88 10.14.11.50 HTTP 1334
[TCP Window Full] Continuation or non-HTTP traffic
391177 499.042264000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391178 499.297972000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391179 499.298065000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391180 499.796762000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391181 499.796811000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391182 500.778349000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391183 500.778405000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391184 502.728191000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391185 502.728260000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391186 506.614031000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391187 506.614093000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391188 514.371638000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391189 514.371694000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391190 529.872211000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391191 *REF* 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391192 30.988601000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391193 30.988645000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391194 71.282655000 10.14.11.50 10.222.3.88 TCP 54 [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=1171456
Len=0
391195 71.296598000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
which results in:
392598 71.906863000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
392599 71.906925000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [ACK] Seq=1945767 Ack=482310892 Win=987904 Len=0
392600 72.043276000 10.14.11.50 10.222.3.88 TCP 54 [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=482310892 Win=1188096
Len=0
392601 72.375325000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [FIN, ACK] Seq=1945767 Ack=482310892 Win=1188096 Len=0
392602 72.388690000 10.222.3.88 10.14.11.50 TCP 60 http >
37480 [ACK] Seq=482310892 Ack=1945768 Win=908288 Len=0
There are two interesting things:
The long gap after the *REF* packet is unusual, it's the first time I've
seen that.
The FIN packet is 72 seconds later, not 67 like I set the timeout. OTOH,
it's close, and, given the long gap no packets occur at FIN - 67 seconds.
I'm going to keep trying random timeout values.
...Stu
On Tue, Jan 14, 2014 at 9:27 AM, Stuart MacDonald
<stuartm.coding_at_gmail.com>wrote:
> I don't have access to the server log, but you can see from the wireshark
> capture that the *client* closed the connection (look for the RST packet),
> not the server. The error of "socket closed" is consistent with that.
>
> I suspect that there's an issue handling the "http-timeout" timer; it gets
> triggered during a TCP zero window situation, and then not canceled
> properly, and then it fires, closing the socket.
>
> That would also fit the data of the bug being in 1.6 (reported by someone
> else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
> svn's handling of the underlying HTTP interaction layer, and not in that
> layer or the server or somewhere else.
>
> I could be wrong. When I look at the evidence so far, that's what I see.
>
> ...Stu
>
>
> On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin <philip.martin_at_wandisco.com
> > wrote:
>
>> stuartm.coding_at_gmail.com writes:
>>
>> > I have another failure, with some debug info, and some updates.
>> >
>> > We're switching to a different system for this particular repository,
>> so I
>> > may or may not be able to continue debug efforts. :-(
>> >
>> > Since there's no debug info available for the svn 1.8 client, I haven't
>> > been running that.
>> >
>> > I went a full week without a failure, which I suspect is due to the
>> > enormous amount of debugging that neon-debug-mask=255 produces, so this
>> > week I dialed it back to 127 and got a failure on the first try.
>> >
>> > Here's the latest failure. First up, the debug info:
>> > Got 8192 bytes.
>> > XML: Parsing 8192 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Got 6600 bytes.
>> > XML: Parsing 6600 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Aborted request (-3): Could not read response body
>>
>> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
>> defined in ne_socket.h:
>>
>> #define NE_SOCK_CLOSED (-3)
>> /* Connection was reset (e.g. server crashed) */
>>
>> > sess: Closing connection.
>> > sess: Connection closed.
>> > Request ends, status 200 class 2xx, error line:
>> > Could not read response body: connection was closed by server
>> > Running destroy hooks.
>> > Request ends.
>> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
>> > response body: connection was closed by server (http://<server>)
>> > sess: Destroying session.
>> > sess: Destroying session.
>>
>> What does the server error log show?
>>
>> --
>> Philip Martin | Subversion Committer
>> WANdisco // *Non-Stop Data*
>>
>
>
Received on 2014-01-14 16:43:29 CET