
Kudos to cmpilato

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2003-10-03 18:56:23 CEST

Mike Pilato has made some fantastic progress on the libsvn_fs code.
I'm sending this explanation to the dev list so everyone can
appreciate the branch->trunk merges you're about to see from him.

Here's my synopsis of what has transpired in the last six weeks or so:

* The 0.28 fs schema change had the main effect of making it practical
  to report implicit "copy" events when walking back through a file's
  history. In olden days, 'svn log /branch/mybranch/foo.c' would skip
  over the event of creating /branch/mybranch. In the new libsvn_fs,
  we see that event in foo.c's history, even though foo.c was only
  implicitly copied.

* This new history-reporting opened the doors to solving other
  problems: 1. creating HTTP-cacheable version-resource-urls (VR's)
  during checkouts and updates, 2. ViewCVS displaying svn copy events.

     - At the moment, mod_dav_svn generates VR's that are not useful
       to HTTP caching proxies, because they're non-unique. As soon
       as libsvn_fs grew this new accurate history-reporting feature,
       cmpilato taught mod_dav_svn to generate unique ("stable") VR's
       by simply backing up one step in a file's history -- either to
       the most recent file-change, or the most recent copy event.

     - Unfortunately, this change ended up killing us. During a
       checkout, mod_dav_svn is running the history-code on every
       single file, to generate a stable VR. On August 29, we
       discovered that this not only tremendously slowed down
       checkouts, but that three simultaneous checkouts brought
       BerkeleyDB to its knees -- thousands and thousands of locks
       were being created. The brute-force history-searching
       algorithm was hitting the database waaaaaay too hard.

     - We immediately reverted the 'stable VR' mod_dav_svn change, and
       issue #1499 was born. Cmpilato, kfogel, and sander went to
       work, trying to figure out how to stop using (or minimize)
       BerkeleyDB txns for read-only operations.

* In the last couple of days, however, Mike had an amazing flash of
  insight. He figured out a way to *toss* the brute-force
  history-searching algorithm altogether. He's now able to detect
  copies by searching up through (already in-memory) parent dag nodes
  and examining CopyIDs. I'll let Mike explain the algorithm if he
  wants to; it's extremely clever. The new algorithm almost never
  needs to hit the database at all.
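Mike's actual algorithm isn't spelled out in this mail, so what follows is only a toy Python model of the idea as described: copies are found by walking up the already-loaded parent dag nodes and checking their copy IDs, not by searching history in the database. It is not libsvn_fs code. The names `DagNode`, `CopyRecord`, and `youngest_copy_on_path` are hypothetical. The rule "a node is a copy root only if it is the destination of its own copy ID" is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CopyRecord:
    """Hypothetical entry in a 'copies' table, keyed by copy ID."""
    copy_id: str
    dst_node_id: str  # the node-revision that the copy created
    src_path: str     # where the copy came from
    revision: int     # revision in which the copy was committed


@dataclass(frozen=True)
class DagNode:
    """Hypothetical in-memory dag node; copy ID "0" means never copied."""
    node_id: str
    name: str
    copy_id: str


def youngest_copy_on_path(parent_path: List[DagNode],
                          copies: Dict[str, CopyRecord]) -> Optional[CopyRecord]:
    """Return the youngest copy that created any node on parent_path.

    parent_path is ordered root-first and is already in memory, so this walk
    fetches no nodes; the only lookups are in the copies mapping.
    """
    youngest: Optional[CopyRecord] = None
    # Walk from the file itself up toward the root.
    for node in reversed(parent_path):
        record = copies.get(node.copy_id)
        # Assumed rule: a node is a copy root only if it is the destination of
        # its own copy ID; otherwise it merely inherited or kept that ID.
        if record is not None and record.dst_node_id == node.node_id:
            if youngest is None or record.revision > youngest.revision:
                youngest = record
    return youngest


if __name__ == "__main__":
    # /trunk was copied to /branch/mybranch in r8; foo.c itself was never
    # touched, so it still carries its original copy ID "0".
    copies = {"1": CopyRecord("1", "mybranch@8", "/trunk", 8)}
    path = [
        DagNode("root@8", "", "0"),
        DagNode("branch@8", "branch", "0"),
        DagNode("mybranch@8", "mybranch", "1"),
        DagNode("foo.c@3", "foo.c", "0"),
    ]
    rec = youngest_copy_on_path(path, copies)
    print(rec.revision, rec.src_path)  # prints: 8 /trunk
```

In this model, the implicit copy of foo.c is found by looking only at its in-memory ancestors. That matches the 'svn log /branch/mybranch/foo.c' case above. The sketch takes the youngest copy root rather than the deepest one, because an outer directory may have been copied after an inner one.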

The results of this breakthrough are:

   - We can now safely re-enable the 'stable VR' feature in
     mod_dav_svn, with no noticeable performance hit.

   - 'svn log' is much faster at walking back through history and
     detecting copies.

   - ViewCVS will likewise be much faster at generating histories.
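The 'stable VR' rule can be sketched as follows: back up one step in a file's history, either to its most recent change or to the most recent copy event. This is a hypothetical Python illustration, not mod_dav_svn code. The function names are invented. The `!svn/ver/REV/PATH` layout is an assumption modeled on mod_dav_svn's special URLs. The naive variant assumes that the uncacheable VRs embedded the checkout revision. Copies matter here because a file that was only implicitly copied may not exist under its new path at its last-change revision.

```python
from typing import Optional


def stable_vr_revision(last_changed_rev: int,
                       youngest_copy_rev: Optional[int]) -> int:
    """Back up one step in history: to the file's last change or the
    youngest copy on its path, whichever happened later."""
    if youngest_copy_rev is None:
        return last_changed_rev
    # The path may not exist before the copy, so the copy revision wins
    # whenever it is younger than the last content change.
    return max(last_changed_rev, youngest_copy_rev)


def stable_vr_url(repos_root: str, path: str, last_changed_rev: int,
                  youngest_copy_rev: Optional[int]) -> str:
    """Hypothetical VR URL that depends only on the file's own history."""
    rev = stable_vr_revision(last_changed_rev, youngest_copy_rev)
    return f"{repos_root}/!svn/ver/{rev}{path}"


def naive_vr_url(repos_root: str, path: str, checkout_rev: int) -> str:
    """Hypothetical non-unique VR: changes with every checkout revision."""
    return f"{repos_root}/!svn/ver/{checkout_rev}{path}"


if __name__ == "__main__":
    p = "/branch/mybranch/foo.c"
    # foo.c last changed in r3 on trunk; the branch was created by a copy in r8.
    print(naive_vr_url("/repos", p, 15))     # prints: /repos/!svn/ver/15/branch/mybranch/foo.c
    print(naive_vr_url("/repos", p, 30))     # prints: /repos/!svn/ver/30/branch/mybranch/foo.c
    print(stable_vr_url("/repos", p, 3, 8))  # prints: /repos/!svn/ver/8/branch/mybranch/foo.c
```

The stable URL depends only on the file's own history. Checkouts at r15 and r30 would therefore request the same URL, and a caching proxy could reuse it.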

In addition to all this Goodness, cmpilato has also finally got the fs
dag-node-caching all fixed up. He'll be merging that feature to trunk
as well, which provides a big libsvn_fs speedup overall.

Kudos to Mike!

Received on Fri Oct 3 18:58:22 2003

This is an archived mail posted to the Subversion Dev mailing list.
