stefan2_at_apache.org wrote on Thu, Apr 10, 2014 at 18:08:46 -0000:
> +++ subversion/branches/thunder/notes/thundering-herd.txt Thu Apr 10 18:08:45 2014
> @@ -0,0 +1,50 @@
> +The Problem
> +In certain situations, such as the announcement of a milestone being
> +reached, the server gets hit by a large number of client requests for
> +the same data. Since updates or checkouts may take minutes, more of
> +the same requests come in before the first ones complete. The server
> +then gets progressively slowed down.
Does this have to be solved in svn?
The solution sounds like you are reinventing the OS' disk cache.
Wouldn't it be better to make a syscall to hint the cache that "<these>
files are going to be a hot path for the next few minutes", so the OS
can then consider the size of the files v. the size of the CPU caches
and other processes' needs and make its own decisions?
I'm not against having this branch, I just don't immediately see why
this is the best solution to the described problem.
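To make the suggestion concrete, here is a rough sketch of such a hint,
assuming a POSIX system that provides posix_fadvise(). The helper name
hint_hot_file() is made up for illustration and is not part of svn; the
kernel is also free to ignore the advice.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not an svn API): advise the kernel that the whole
   file at PATH will be needed soon, so it may start reading it into the
   page cache.  Returns 0 on success, otherwise an errno-style code. */
int hint_hot_file(const char *path)
{
  int fd = open(path, O_RDONLY);
  int err;

  if (fd < 0)
    return errno;

  /* Offset 0 with length 0 means "from the start through end of file".
     posix_fadvise() returns the error number directly; it does not set
     errno. */
  err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
  close(fd);
  return err;
}
```

This only covers the raw file contents. It would not save the CPU cost
of reconstructing data from deltas, which the notes below also mention.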
P.S. "Bypass the kernel's memory management strategy" even ties in to the
recent OpenSSL bug: http://article.gmane.org/gmane.os.openbsd.misc/211963
(tl;dr: OpenSSL used its own malloc(), but using the OS malloc() would
have prevented CVE-2014-0160 from resulting in memory disclosure)
> +The best we may achieve is that the first request gets to read the
> +data from disk and all others are being fed from the cache, basically
> +saturating the network as necessary.
> +However, there is a catch. While the first request reads missing data
> +from disk, all others get served quickly from cache and catch up until
> +they miss the *same data* as the first request. Disk access suddenly
> +acts like a synchronization barrier. More importantly, reading and
> +reconstructing data from disk is CPU-intensive and blocks resources
> +such as file handles. Both limit scalability.
> +The Solution
> +We introduce a central registry that keeps track of FS layer operations.
> +Whenever a cache lookup misses, we turn to that registry and tell it
> +that we intend to read / reconstruct the missing data. Once we are
> +done, we inform the registry again.
> +If the registry already contains an entry for the same node, it delays
> +the request until the first one is completed and then tells the caller
> +that it has not been the first. The caller, in turn, may then retry
> +the cache lookup and will usually find the required data. If it misses
> +again, we continue with reading / reconstructing the data ourselves.
> +There are a few caveats that we need to address:
> +* An initial reader may get aborted due to an error, get stuck, or be
> + otherwise delayed. The registry uses a configurable timeout for that case.
> +* Streamy processing can make the "end of access" notification
> + unreliable, i.e. it might not get sent in all cases. Again, the
> + timeout will prevent the worst effects.
> +* A reader may request access to the same data more than once at the
> + same time. The second request must not get delayed in that case.
> + Example: When a file gets committed, the deltification base as well
> + as the base for the incoming delta stream may be the same reps,
> + being read as streams at the same time.
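For reference while reading the above, this is how I understand the
registry would behave, as a minimal sketch using POSIX threads. All
names here (registry_t, registry_begin(), registry_end(), the fixed
table size) are my own assumptions and not taken from the branch. The
sketch is thread-based and in-process only, and it does not evict stale
entries after a timeout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define MAX_ENTRIES 64   /* arbitrary capacity for this sketch */
#define KEY_LEN 64       /* arbitrary key size for this sketch */

typedef struct {
  char key[KEY_LEN];     /* identifies the data being read */
  pthread_t owner;       /* thread that registered first */
  int count;             /* open accesses by the owner; 0 = slot free */
} entry_t;

typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t done;   /* signalled whenever an entry is removed */
  int timeout_sec;       /* the "configurable timeout" */
  entry_t entries[MAX_ENTRIES];
} registry_t;

void registry_init(registry_t *reg, int timeout_sec)
{
  memset(reg, 0, sizeof(*reg));
  pthread_mutex_init(&reg->lock, NULL);
  pthread_cond_init(&reg->done, NULL);
  reg->timeout_sec = timeout_sec;
}

/* Caller must hold reg->lock. */
static entry_t *find(registry_t *reg, const char *key)
{
  int i;
  for (i = 0; i < MAX_ENTRIES; i++)
    if (reg->entries[i].count > 0
        && strncmp(reg->entries[i].key, key, KEY_LEN - 1) == 0)
      return &reg->entries[i];
  return NULL;
}

/* Returns 1 if the caller should read the data itself right away, 0 if it
   was delayed behind another reader and should retry the cache first. */
int registry_begin(registry_t *reg, const char *key)
{
  struct timespec deadline;
  entry_t *e;
  int i, first = 1;

  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += reg->timeout_sec;

  pthread_mutex_lock(&reg->lock);
  e = find(reg, key);
  if (e && pthread_equal(e->owner, pthread_self())) {
    e->count++;                     /* same reader again: never delay */
  } else if (e) {
    first = 0;                      /* somebody else is already reading */
    while (find(reg, key))
      if (pthread_cond_timedwait(&reg->done, &reg->lock, &deadline))
        break;                      /* timed out (or error): stop waiting */
  } else {
    for (i = 0; i < MAX_ENTRIES; i++)
      if (reg->entries[i].count == 0) {
        strncpy(reg->entries[i].key, key, KEY_LEN - 1);
        reg->entries[i].key[KEY_LEN - 1] = '\0';
        reg->entries[i].owner = pthread_self();
        reg->entries[i].count = 1;
        break;
      }                             /* table full: proceed untracked */
  }
  pthread_mutex_unlock(&reg->lock);
  return first;
}

/* The "end of access" notification. */
void registry_end(registry_t *reg, const char *key)
{
  entry_t *e;
  pthread_mutex_lock(&reg->lock);
  e = find(reg, key);
  if (e && pthread_equal(e->owner, pthread_self()) && --e->count == 0)
    pthread_cond_broadcast(&reg->done);   /* wake delayed readers */
  pthread_mutex_unlock(&reg->lock);
}
```

A caller would do a cache lookup, call registry_begin() on a miss, retry
the lookup if that returned 0, and only then read from disk, calling
registry_end() afterwards when it had registered. If the server runs as
multiple processes, the registry would have to live in shared memory,
which this sketch does not attempt.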
Received on 2014-04-11 10:25:19 CEST