Re: Limiting access to replay in 1.4

From: Brian Behlendorf <brian_at_collab.net>
Date: 2006-04-12 21:21:17 CEST

My overall sense is that if mirroring of large repositories is a broadly
desirable option, then it needs to use a mechanism that doesn't place all
the responsibility for bootstrapping new mirrors (or personal mirrors)
entirely on the central server. Something like a BitTorrent-esque
protocol where mirrors can help bootstrap new mirrors would be ideal -
there's probably even ways to do it without major coding, such as putting
up large but compressed dumpfiles and using existing BT tools to
distribute them. Until then, I think installations should be able to
limit total resources consumed by people doing mirroring - which could be
0 in Justin's case. But even without those, there might be reasonable
ways to limit:

On Wed, 12 Apr 2006, Molle Bestefich wrote:
> I'm going to get picky with the terms, bear with me..
>
> If setup correctly (fx. per-user-per-host), rate limiting _does_
> prevent DoS, just not DDoS.

You have to carefully define the rate of _what_ is being limited.
There's little direct correlation between # of connections, # of
transactions, # of bytes, I/O load, or CPU time; a user can cause trouble
on any one of those independently of the others, and it can even be
unintentional. There are now commercial Subversion clients that insist on
doing repository scans on a periodic basis just to let people know when
there's something new - and people are setting that to once a minute.
It's the RSS polling problem writ large, and actually poses more of a
problem for CollabNet's customers than people doing too much mirroring.

I noticed that a similar commercial system implements rate limiting by
trying to predict how many objects are going to be read or modified by a
given operation, and then rejecting requested operations if that list is
larger than some configurable amount (and the end-user isn't in a
permissions class that allows them to avoid that check). Such a request
would be rejected, even though an end-user can accomplish the same thing
by performing the same update or commit operation with two separate
commits. I wonder if it's even possible in SVN to calculate that ahead of
time; you don't want operations to fail halfway through. But it can block
that accidental big usage.
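
A rough sketch of that kind of up-front check, in Python. Everything here
is hypothetical: the Request type, the objects_touched estimate, and the
admit() function are made up for illustration and are not Subversion APIs.
It assumes the server can produce some estimate of the operation's size
before doing any real work, which is exactly the open question above.

```python
from dataclasses import dataclass


class OperationTooLarge(Exception):
    """Raised before any work starts, so nothing fails halfway through."""


@dataclass(frozen=True)
class Request:
    user: str
    objects_touched: int  # hypothetical estimate, computed ahead of time


def admit(request, max_objects, exempt_users=frozenset()):
    """Return True if the request may proceed, else raise OperationTooLarge.

    Users in exempt_users stand in for the "permissions class" that is
    allowed to skip the size check.
    """
    if request.user in exempt_users:
        return True
    if request.objects_touched > max_objects:
        raise OperationTooLarge(
            f"{request.user}: {request.objects_touched} objects "
            f"exceeds limit of {max_objects}"
        )
    return True
```

Note that the check is per operation, so splitting the work into two
smaller operations passes it; it only catches the accidental case.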

For non-accidental usage, the problem starts to look like a standard
quality of service problem - you want the "critical" operations to have
precedence over operations that have more tolerance for delay. By
definition, operations like mirroring have more tolerance for delay than
operations like commit, and updating a small working copy should take
precedence over large. It's why, for example, we have the concept of
"MinSpareServers" in Apache httpd - you always want, when possible, a pool
of servers available to immediately handle the fast requests even if other
threads are consumed handling more difficult requests.
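
To make the idea concrete, here is a minimal sketch of reserving capacity
for quick requests. It is only an analogy to what MinSpareServers is for,
not how httpd implements it; the SlotPool class and its parameters are
invented for this example.

```python
class SlotPool:
    """Fixed pool of worker slots with some held back for quick requests."""

    def __init__(self, total, reserved_for_quick):
        if not 0 <= reserved_for_quick <= total:
            raise ValueError("reserved_for_quick must be between 0 and total")
        self.total = total
        self.reserved = reserved_for_quick
        self.quick_in_use = 0
        self.slow_in_use = 0

    def try_acquire(self, expensive):
        """Claim a slot; expensive requests may never use the reserved ones."""
        if self.quick_in_use + self.slow_in_use >= self.total:
            return False
        if expensive:
            if self.slow_in_use >= self.total - self.reserved:
                return False
            self.slow_in_use += 1
        else:
            self.quick_in_use += 1
        return True

    def release(self, expensive):
        """Give back a slot claimed earlier with the same expensive flag."""
        if expensive:
            self.slow_in_use -= 1
        else:
            self.quick_in_use -= 1
```

With SlotPool(4, 2), at most two mirroring-style requests run at once, and
two slots stay available for commits and small updates.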

Can we identify "expensive" operations as those consuming over some
configurable amount of wall-clock time (counting HTTP persistent
connection), and then lower their priority against operations that have
not yet exceeded that limit? Go further and limit it based on a
combination of IP address (to separately identify the anonymous users) and
login name (to separately identify those named users all coming from the
same company). Something persistent so that a quick cancel and restart
doesn't allow someone to work around the limits. That identification
should time out so that someone isn't penalized indefinitely. And while
it may be romantic to allow the long-termers to duke it out for fixed
resources, we can't have too many there, just as eventually you might run
out of MinSpareServers headroom if you hit MaxClients in httpd.
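
A minimal sketch of the bookkeeping this would need, assuming the server
can report the wall-clock seconds each request took. The CostTracker class
and its method names are hypothetical. The caller passes the current time
explicitly, which keeps the example easy to test, and login is None for
anonymous users so they are identified by IP address alone.

```python
class CostTracker:
    """Remember wall-clock time spent per (IP address, login name) pair."""

    def __init__(self, threshold_seconds, forget_after_seconds):
        self.threshold = threshold_seconds
        self.forget_after = forget_after_seconds
        self._records = {}  # (ip, login) -> (total_seconds, last_seen)

    def record(self, ip, login, seconds, now):
        """Add the time one request consumed to the client's running total."""
        key = (ip, login)
        total, last_seen = self._records.get(key, (0.0, now))
        if now - last_seen > self.forget_after:
            total = 0.0  # the old penalty has timed out; start over
        self._records[key] = (total + seconds, now)

    def is_expensive(self, ip, login, now):
        """True if this client should run at lowered priority right now."""
        key = (ip, login)
        entry = self._records.get(key)
        if entry is None:
            return False
        total, last_seen = entry
        if now - last_seen > self.forget_after:
            del self._records[key]
            return False
        return total > self.threshold
```

Because the total is kept per client rather than per connection, cancelling
and reconnecting does not reset it. An in-memory dictionary would not
survive a server restart, so a real version would need shared or on-disk
storage.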

That would still reward people who update more selectively or are making
commits, and still allow those with larger updates or big checkouts to
eventually get what they want.

Conceptually it's a mirror of "make the easy things easy and the hard
things possible": "make the quick actions quick and the long-term
operations eventually finish".

         Brian

Received on Wed Apr 12 21:22:25 2006

This is an archived mail posted to the Subversion Dev mailing list.