Optimizing properties for checkout/update with ra_serf
From: Justin Erenkrantz <justin_at_erenkrantz.com>
Date: Fri, 8 Jun 2012 10:28:31 +0200
To help kick off the hackathon discussions next week in Berlin, I'd
As Philip has pointed out, there is still a gap between ra_serf and
Let me recap what it is that both RA layers do right now at a high level:
ra_neon
---
ra_neon issues a REPORT call to the server indicating what local revision it has in the body of the request, and it also sets the "send_all" flag to true. Then, in the response, mod_dav_svn will do a number of things:

 - The server will produce an XML document listing what changes need to happen locally on the client. (In this way, the response is specific to the client's version indicated in the request body. No amount of HTTP caching can therefore help with the REPORT request.)
 - Inside the XML response, two things of note happen due to send_all being set in the REPORT request:
   1. For file content, the server will inline the contents of the file via svndiff - this is regardless of whether it is fulltext or deltas - the behavior is the same.
   2. For every property on the affiliated file entry, the server sends remove-prop or set-prop with the associated key/value.

Since the response is embedded inside of an XML document, we must base64-encode the resulting svndiff and potentially the property values (if they are not XML-safe). Roughly speaking, base64 will add about 33% space overhead (4 output bytes for every 3 input bytes). If you are willing to do double-compression by using mod_deflate, you'll mitigate some of that space overhead at the cost of CPU by running zlib again over the base64-encoded svndiff (which uses zlib anyway).

ra_serf
---
ra_serf issues a REPORT call to the server indicating what local revision it has in the body of the request - but, unlike ra_neon, it does not set the send_all flag. So, in response to the request, mod_dav_svn does the following:

 - Just like its response to ra_neon, the server will produce an XML document listing what changes need to happen locally on the client.
 - However, it does not inline the content or the property values. This is left for ra_serf to handle separately.
 - ra_serf will then parse the REPORT response - which is substantially smaller than what ra_neon has to parse.
It then opens up to 4 HTTP connections to the server and does the following for each file:

 - Issue a GET with the local version number in the request headers, along with an indication that it would prefer svndiff responses. mod_dav_svn can then send back an svndiff version if it chooses to...or it will send a plaintext version. (N.B. This request is easily cacheable by stock HTTP edge caches and proxies.)
 - Issue a PROPFIND for the file.

Optimizing ra_serf: content and properties
---
So, ra_serf will issue 2 HTTP requests for each file in checkout/update. In practice, the PROPFIND requests/responses are very small. If your httpd's logging infrastructure isn't tuned (i.e., logging to a slow disk and/or not configured properly), you may notice a slowdown in synthetic checks due to the increased number of responses. There is still additional traffic here from those extra HTTP requests...and that's where we can further optimize things.

A thread a little while back indicated that ra_serf won't compress things by default whereas ra_neon does. This is due to the fact that ra_serf doesn't do the compression inline - it relies upon mod_deflate (a standard module in httpd) to do it, while ra_neon always does the compression as well as the base64 encoding. So, if you don't do any tuning whatsoever, ra_neon will always send smaller responses than ra_serf. But, when compression is enabled, ra_serf currently sends about 1.2x the data compared to ra_neon. We can do better...

Recent optimizations in ra_serf (but *not* ra_neon) should attempt to not GET a file that the local client already has in its pristine database. (Think about what happens on a checkout with branches/tags/etc.; I've found this to be a pretty common occurrence, at least in my workflow!) It would be much tougher (if not impossible) for this optimization to occur in ra_neon, as the server side does all of the logic and doesn't know what the client may or may not have.
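The pristine-store shortcut described above can be sketched roughly as follows. This is only an illustration, not ra_serf's actual code: the function name plan_fetches, the (path, checksum) shape of the REPORT entries, and the idea of keying the pristine store by SHA-1 hex digest are all assumptions made for the example.

```python
import hashlib


def plan_fetches(report_entries, pristine_checksums):
    """Split REPORT entries into files that need a GET and files that do not.

    report_entries: iterable of (path, sha1_hex) pairs (hypothetical shape).
    pristine_checksums: set of SHA-1 hex digests already present in the
        local pristine store (assumed to be keyed by checksum).
    """
    to_get, to_skip = [], []
    for path, sha1_hex in report_entries:
        if sha1_hex in pristine_checksums:
            # Content is already held locally, so no body needs to be fetched.
            to_skip.append(path)
        else:
            to_get.append(path)
    return to_get, to_skip


# Usage: a branch copy of an unchanged file shares its checksum with trunk.
known = hashlib.sha1(b"hello\n").hexdigest()
fresh = hashlib.sha1(b"brand new content\n").hexdigest()
entries = [
    ("trunk/a.txt", known),
    ("branches/b1/a.txt", known),
    ("trunk/new.txt", fresh),
]
to_get, to_skip = plan_fetches(entries, {known})
print("GET:", to_get)
print("skip:", to_skip)
```

The point of the sketch is that the decision is made entirely on the client, which is why it fits ra_serf's request-per-file model and not a single server-driven response.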
In actuality, due to the PROPFIND requirement, ra_serf still issues a HEAD request - but we don't need the actual content, which saves a huge amount of bandwidth in the update case if the pristine already exists locally. This is a huge win for ra_serf and I don't think we'll be able to do much better - we *have* to get the content somehow. (As we eventually move to a global pristine store with libsvn_wc, it'll get even better!) And, these GET/HEADs are easily cacheable, so distributing load on the server side should be fairly straightforward by dropping in a dumb HTTP cache. By skipping the base64 overhead, spreading the transfers across multiple connections (to take better advantage of beefier servers), and using pipelining, I think overall we're doing about as well as we could here.

Regarding the 2 HTTP requests, the key bit here is that there is no way for us to get the properties and the content in one call. However, there is a mechanism to optimize the PROPFIND that Greg has suggested that does not require any server-side changes: have ra_serf issue a PROPFIND call on each directory in the WC with a Depth: 1 header. mod_dav_svn will then send back, in one HTTP response, all of the properties for all of the directory's children. This reduces the number of HTTP requests from roughly 2 per file down to 1 per file plus 1 per directory.

On the client side...and why I didn't implement it this way back in 2006...the problem is that PROPFIND/Depth: 1 introduces some complexity due to the editor API. Given the way that our editor drives work with libsvn_wc (including the Ev2 rewrite), we will almost certainly need to keep the per-directory properties around until we are ready to process the file contents. In the worst case, I think it is possible that we could have *all* of the properties for *all* of the files before we even start to process the files themselves.
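To put rough numbers on the "2 per file" versus "1 per file plus 1 per directory" arithmetic above, here is a small sketch. The function name count_requests is made up for the example, and it simplifies by counting only directories that directly contain a fetched file, whereas the proposal as stated would issue a PROPFIND on each directory in the WC.

```python
import posixpath


def count_requests(file_paths):
    """Return (per_file_total, depth_one_total) HTTP request counts.

    per_file_total: one GET plus one PROPFIND for every file.
    depth_one_total: one GET per file plus one PROPFIND (Depth: 1) per
        directory that directly contains at least one of the files.
    """
    n_files = len(file_paths)
    parent_dirs = {posixpath.dirname(p) for p in file_paths}
    per_file_total = 2 * n_files
    depth_one_total = n_files + len(parent_dirs)
    return per_file_total, depth_one_total


# Usage: five files spread over two directories.
files = [
    "trunk/a.c",
    "trunk/b.c",
    "trunk/lib/x.c",
    "trunk/lib/y.c",
    "trunk/lib/z.c",
]
per_file, depth_one = count_requests(files)
print(per_file, depth_one)  # 10 requests versus 7
```

The saving grows with the number of files per directory; for a tree with one file per directory the two schemes issue the same number of requests.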
We might be able to play tricks by spooling properties to disk or delaying the PROPFIND until we start fetching the files (if even needed)...but, ugh. I do also wonder whether we should only do the Depth: 1 when we add a directory - or whether we could aggregate it when multiple files (but not all!) are updated in the tree.

Anyway, that's where my head is at. I just don't have a clear picture of what the ra_serf side will look like with PROPFIND/Depth: 1 yet.

I hope this helps frame the conversation a bit. For those of you who will be in Berlin, see you soon - and for those of you not in Berlin, we'll be sure to write up whatever we discuss and throw it on the list.

-- justin

Received on 2012-06-08 10:29:09 CEST
This is an archived mail posted to the Subversion Dev mailing list.