Mark Phippard <markphip_at_gmail.com> wrote on 06/06/2012 10:18:52 AM:
> > On Tue, Jun 5, 2012 at 8:28 AM, Mark Phippard <markphip_at_gmail.com> wrote:
> >> Keep in mind that this is not about server load; they are looking at
> >> this from a bandwidth perspective. So a cache in front of the server
> >> would not help them at all. Eclipse.org has a 10MB Internet link that
> >> is almost always saturated.
> >>
> >> Also, if it was not clear, Subversion is not involved here. This is a
> >> plain Apache server.
> >
> > IMO, having a 10Mbps link (if that is indeed the case) is probably the
> > root of the problem...that's ridiculously underprovisioned for a
> > public site. Any type of update checks for a heavily used product no
> > matter what the underlying protocol is would saturate that once they
> > get enough users. An easy thing for them to have done (for example)
> > is to shove the check on their mirrors or a CDN (hi Fastly!) or
> > something similar.
>
> It is possible I mis-reported the number. Eclipse already uses
> mirrors extensively and the product has built-in support to use the
> mirrors for the checks. I believe the main issue for the Eclipse
> servers is in servicing all of the development builds and
> infrastructure, as these do not use the mirrors. That said, bandwidth
> is always an issue as you can see from all of the blog posts from the
> webmaster:
>
> http://eclipsewebmaster.blogspot.com/search?q=bandwidth
>
> > This goes back to having a simple protocol which is the antithesis of
> > the REPORT call in ra_neon. Having a simple straightforward protocol
> > allows you to easily drop in caches - REPORT won't let you do that as
> > the responses are too specific to the user to permit any type of
> > caching which will limit your options when you do hit load.
>
> They are already serving virtually the entire site out of a memory
> cache:
>
> http://eclipsewebmaster.blogspot.com/search?q=cache
>
> This issue was purely about using the available bandwidth better. The
> thing I found interesting is that when you look at these simple small
> requests down at the packet level and add them all up (when there are
> millions of these every day), the amount of available bandwidth they
> consume is significant. We have done good work with Serf and HTTPv2
> to eliminate a lot of the smaller requests we used to do. Hopefully this
> is motivation for removing more of them, such as the ones Mike Pilato
> mentioned. I always thought of all those HEAD and PROPFIND requests
> from the point of view of a client on a high latency connection, and
> how it made things slower than it needed to be. I just thought it was
> interesting to see how these small requests can add up on the backend
> to consume a significant amount of the available packets that can be
> sent/received in any one day. It is just food for thought.
Externals are another area that needs optimization. Each external
opens a new separate connection to the server, with all the
associated authentication handshakes. I've unfortunately seen
projects with tens of thousands of externals that take 20+ hours
to do an update over a 200ms+ connection. I've been able
to reduce that time to around 15 minutes by only contacting
the server if the svn:external revision doesn't match the
current commit revision in the working copy and by updating
the remaining svn:externals in parallel.
So far, my work is just a prototype ruby script that
launches the command line and processes the xml output,
so most of the time is spent launching processes against
the local working copy. I'm thinking the revision
check should be easy to add to the C code, but I haven't
investigated it yet. Instead of parallel updates, I
think the normal client could also use connection
pooling to reduce the need for authentication handshakes,
but that is a more intrusive change...
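[Editor's note: the approach described above could be sketched roughly as follows. This is not Kevin's actual script; the external-definition format assumed here is the pinned "-rN URL path" style, and the helper names and thread-pool size are illustrative assumptions.]

```ruby
# Sketch of the prototype idea: parse pinned svn:externals definitions,
# skip any whose pinned revision already matches what the working copy
# has recorded, and update the remaining ones on a small thread pool
# (shelling out to the command-line client, as the prototype does).

Ext = Struct.new(:path, :url, :pinned_rev)

# Parse "-r1234 ^/libs/foo foo" style definitions, one per line.
def parse_externals(prop_value)
  prop_value.each_line.filter_map do |line|
    next unless line =~ /-r(\d+)\s+(\S+)\s+(\S+)/
    Ext.new(Regexp.last_match(3), Regexp.last_match(2),
            Regexp.last_match(1).to_i)
  end
end

# Only externals whose pinned revision differs from the revision the
# working copy already holds need to contact the server at all.
def stale_externals(externals, wc_revs)
  externals.reject { |e| wc_revs[e.path] == e.pinned_rev }
end

# Update the stale externals in parallel; pool size is an assumption.
def update_in_parallel(externals, pool_size: 8)
  queue = Queue.new
  externals.each { |e| queue << e }
  Array.new(pool_size) do
    Thread.new do
      while (e = (queue.pop(true) rescue nil))
        system("svn", "update", "-r", e.pinned_rev.to_s, e.path)
      end
    end
  end.each(&:join)
end
```

The revision check is what collapses tens of thousands of round trips down to the handful of externals that actually changed; the thread pool then amortizes the per-connection authentication cost across the remainder.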
Kevin R.
(And yes, I would not recommend a project use 10,000
externals...)
Received on 2012-06-06 17:56:39 CEST