On Mon, Dec 23, 2013 at 3:51 PM, Lieven Govaerts <lgo_at_apache.org> wrote:
> to be clear, you've tested:
> - svn 1.7.13 with ra_neon, svn 1.7.14 with ra_neon and 1.8.5 with ra_serf.
> - to an svn 1.7.7 server
> - over http, with nginx 1.0.5. as intermediate proxy
> - running 'svn up'
> - on an Ubuntu guest (version ______) using Virtualbox (version
> __________) on host OS ________.
>
> Can you fill in the blanks?
>
- on an Ubuntu guest (version 12.10) using Virtualbox (version 4.2.12
r84980) on host OS Windows 7 64-bit Enterprise SP1
For VirtualBox the guest extensions are installed; the adapter is set in
NAT mode.
The error "The server sent a truncated HTTP response body." you're
> seeing from ra_serf, means that ra_serf is busy reading a response
> from the server but all data isn't arriving.
>
The network trace shows the data arriving:
svn 1.8.5:
496958 625.444757000 10.222.3.88 10.14.10.85 HTTP 1434
Continuation or non-HTTP traffic
496959 625.444836000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
496960 625.444878000 10.14.10.85 10.222.3.88 TCP 54 [TCP
Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496961 626.111331000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496962 626.124890000 10.222.3.88 10.14.10.85 TCP 60 http
> 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
svn 1.7.14:
444658 437.941946000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
444659 437.941995000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660 439.774982000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661 440.020170000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662 440.033580000 10.222.3.88 10.14.10.32 TCP 60 http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
One thing to note: a while ago my IT dept had me set "http-timeout=180" to
see if that fixed the problem. The 1.7.14 network capture is with that set
from last Fri. The 1.8.5 is without it set from this morning. I found that
setting it did reduce the frequency of occurrence, but not resolve it
entirely; I went from failing about 3 out of 5 mornings to 1 out of 5. I've
removed it (restored to using the default values) in order to have more
frequent failures for quicker testing.
We have seen this caused by proxies tampering with the response data.
> In this case I suppose the connection is shut down unexpectedly first,
> which ra_serf interprets as 'the connection closed, but I didn't get
> all data for this response yet" => error.
>
I would doubt that the proxy is tampering with the data, otherwise svn
would fail all the time, yes?
Note that the client is sending the [FIN, ACK], not the server.
> You have tested two versions of subversion, each with their own http
> client implementation (serf and neon). Both libraries have been
> implemented completely independently by different people, without
> sharing code. I find it difficult to believe that both libraries have
> the same bug, but ok, it's not impossible.
>
Agreed, it seems unlikely, but not impossible. I'd look first at the svn
layer that uses the libraries.
I took some time last Fri to see what was 180 seconds prior to the FIN
packet in the trace (this is the 1.7.14 failure):
289223 259.418521000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
289224 259.418583000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
289225 259.649415000 10.222.3.88 10.14.10.32 HTTP 1334
[TCP Window Full] Continuation or non-HTTP traffic
289226 259.855605000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
289227 259.868018000 10.222.3.88 10.14.10.32 HTTP 1334
[TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
289228 *REF* 10.14.10.32 10.222.3.88 TCP 66 [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
SLE=358465616 SRE=358466896
289229 0.383858000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
289230 0.398005000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
which yields
444658 178.073895000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
444659 178.073944000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660 179.906931000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661 180.152119000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662 180.165529000 10.222.3.88 10.14.10.32 TCP 60 http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
If the time ref is set at (wireshark-)packet 289227 then the FIN packet
would be at 179.something elapsed. There are a lot of TCP zero windows in
the trace though, that one isn't special AFAICT. There's plenty before and
after it.
If someone can let me know what the default for http-timeout is for
1.8.5/ra_serf, I'll do the same check in this morning's failure and see
what's there.
In case of serf I can add that if serf would actively drop the
> connection due to a timeout, you'd see a different error message. In
> fact, I've searched for error E120106 and it's only raised when a
> clean EOF is returned from the network layer, whereas in case of error
> (TCP RST) the nw layer returns a ECONNABORTED or ECONNRESET status
> code.
>
Right, but that would be if the server drops the connection? The client is
dropping the connection.
> So, my first guess would be a problem with either the guest OS, or
> with the Virtualbox network code (the latter I think is more likely).
>
The packets are being delivered, so I'd expect not, but that's an area I
didn't consider. On that note: the wireshark capture is from the Win7 host,
not the guest. I'm assuming the captures would be the same in both cases,
but that may not be true.
> 1. If you like and are allowed to, you can send me the wireshark .pcap
> file privately (replace username/passwd/basic/digest authn headers
> with X, but keep the same # of chars!).
>
I've been assuming I'm not allowed to; I'd like to leave this to a last
resort.
> Also, the debug logging of both http client libraries would be nice:
> - For svn 1.7.x put neon-debug-mask=255 in your servers file.
>
Done.
> - For 1.8.5 this won't work anymore, you'll have to compile serf from
> source to enable debugging (flags CONN_VERBOSE, SOCK_VERBOSE,
> SOCK_MSG_VERBOSE in serf_private.h).
>
I tried compiling from source last week, but it was pretty painful. I'll
try again over the holidays.
> 2. I'd also test this same operation on a physical machine, with the
> same OS and/or with a different OS.
>
The rest of my team are not running a VM and nobody else sees this problem.
Also, they've syncing the entire repository, I'm only syncing a few (large)
branches to save disk space, so their operations are heavier than mine.
Keep in mind: the post that I linked to in my original post mentions that
the problem seems to be due to the slowness of the VM while svn is
processing the incoming data. I'm expecting this to be a necessary
contributing factor, although not a sufficient factor. IOW I wouldn't
expect this problem to show up on the host OS.
Hm. Given that I'm the only person out of a team of about 10 distributed
coast to coast across Canada, doesn't that rule out the proxy?
I have TortoiseSVN installed, I'll see if I can setup something on the host
OS.
> 3. Regardless of this issue, the upgrade of nginx is important for svn
> 1.8.5, as older versions of nginx have problems with serf's use of
> HTTP 1.1 features. There's a fallback mechanism now, but at the cost
> of some reduced performance.
I'll mention this to our IT.
...Stu
Received on 2013-12-23 22:48:15 CET