Mike Pilato has made some fantastic progress on the libsvn_fs code.
I'm sending this explanation to the dev list so everyone can
appreciate the branch->trunk merges you're about to see from him.
Here's my synopsis of what has transpired in the last six weeks or so:
* The 0.28 fs schema change had the main effect of making it practical
to report implicit "copy" events when walking back through a file's
history. In olden days, 'svn log /branch/mybranch/foo.c' would skip
over the event of creating /branch/mybranch. In the new libsvn_fs,
we see that event in foo.c's history, even though foo.c was only
implicitly copied.
* This new history-reporting opened the doors to solving other
problems: 1. creating HTTP-cacheable version-resource-urls (VR's)
during checkouts and updates, 2. ViewCVS displaying svn copy events.
- At the moment, mod_dav_svn generates VR's that are not useful
to HTTP caching proxies, because they're non-unique. As soon
as libsvn_fs grew this new accurate history-reporting feature,
cmpilato taught mod_dav_svn to generate unique ("stable") VR's
by simply backing up one step in a file's history -- either to
the most recent file-change, or the most recent copy event.
- Unfortunately, this change ended up killing us. During a
checkout, mod_dav_svn is running the history-code on every
single file, to generate a stable VR. On August 29, we
discovered that this not only tremendously slowed down
checkouts, but that three simultaneous checkouts brought
BerkeleyDB to its knees -- thousands and thousands of locks
were being created. The brute-force history-searching
algorithm was hitting the database waaaaaay too hard.
- We immediately reverted the 'stable VR' mod_dav_svn change, and
issue #1499 was born. Cmpilato, kfogel, and sander went to
work, trying to figure out how to stop using (or minimize)
BerkeleyDB txns for read-only operations.
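To picture the "back up one step" idea behind stable VR's, here is a toy sketch in Python. It is not the mod_dav_svn code: the function names, the sorted list of "interesting" revisions, and the URL layout are all made up for illustration. The only point is that every revision between two history events maps to the same URL, which is what makes the URL cacheable.

```python
from bisect import bisect_right


def last_interesting_revision(history, revision):
    """Return the newest revision <= `revision` found in `history`.

    `history` is a hypothetical, ascending list of revisions in which
    the path was either changed or (explicitly or implicitly) copied.
    """
    idx = bisect_right(history, revision)
    if idx == 0:
        raise ValueError("path has no history at or before r%d" % revision)
    return history[idx - 1]


def stable_vr(path, history, revision):
    """Build a version-resource URL keyed on the last interesting revision.

    The "/ver/<rev><path>" layout is an assumption for this sketch, not
    the real mod_dav_svn URL scheme.
    """
    return "/ver/%d%s" % (last_interesting_revision(history, revision), path)


# Hypothetical history: the file changed or was copied in r3, r7 and r12.
foo_history = [3, 7, 12]
```

With this made-up history, a checkout of r8, r10, or r11 asks for the same URL, so a caching proxy only has to fetch it once. The catch described above is the cost of computing that history for every single file.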
* In the last couple of days, however, Mike had an amazing flash of
insight. He figured out a way to *toss* the brute-force
history-searching algorithm altogether. He's now able to detect
copies by searching up through (already in-memory) parent dag nodes
and examining CopyIDs. I'll let Mike explain the algorithm if he
wants to; it's extremely clever. The new algorithm almost never
needs to hit the database at all.
The results of this breakthrough are:
- We can now safely re-enable the 'stable VR' feature in
mod_dav_svn, with no noticeable performance hit.
- 'svn log' is much faster at walking back through history and
detecting copies.
- ViewCVS will likewise be much faster at generating histories.
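For a rough feel of why walking up parent nodes is so cheap, here is a toy model in Python. To be clear, this is NOT Mike's algorithm, which isn't described here. The Node class, the "0" copy ID for never-copied nodes, and the assumption that everything under a copied directory carries that directory's copy ID are all simplifications invented for this sketch, and may not match how libsvn_fs really assigns CopyIDs.

```python
class Node:
    """Hypothetical in-memory stand-in for a dag node."""

    def __init__(self, name, copy_id, parent=None):
        self.name = name
        self.copy_id = copy_id
        self.parent = parent


def nearest_copy_root(node):
    """Return the topmost ancestor sharing `node`'s copy ID.

    Returns None when the node carries the (assumed) "0" copy ID,
    meaning it was never part of a copy in this toy model.
    Only parent pointers are followed; nothing is looked up elsewhere.
    """
    if node.copy_id == "0":
        return None
    current = node
    while current.parent is not None and current.parent.copy_id == node.copy_id:
        current = current.parent
    return current


# Toy tree: /branch/mybranch was created by a copy with copy ID "5",
# so foo.c underneath it was only implicitly copied.
root = Node("/", "0")
branch = Node("branch", "0", root)
mybranch = Node("mybranch", "5", branch)
foo = Node("foo.c", "5", mybranch)
trunk_file = Node("bar.c", "0", Node("trunk", "0", root))
```

In this model, nearest_copy_root(foo) lands on the mybranch node after a single step up the parent chain, and trunk_file reports no copy at all. The work is proportional to the depth of the path, rather than to the length of the file's history.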
In addition to all this Goodness, cmpilato has also finally got the fs
dag-node-caching all fixed up. He'll be merging that feature to trunk
as well, which provides a big libsvn_fs speedup overall.
Kudos to Mike!
Received on Fri Oct 3 18:58:22 2003