As many of you already know, Google recently announced that they had (again)
written a new filesystem backend for Subversion, built atop their BigTable
database system. As it turned out, Jon Trowbridge -- primary author of that
new backend -- was visiting the Google NYC office last week at the same time
that those of us at the Subversion Vision Conference were freeloading a
conference room there. We were very fortunate to have been able to steal
some of Jon's time and solicit feedback from him regarding his experiences
in writing this new backend.
What follows are notes that I took based on what Jon shared with us. (He's
seen a skeletal version of this list and okayed my sharing of it.) Keep in
mind that some of this feedback stems from the nature of what Jon was trying
to do -- it's not every day that someone feels compelled to write a new FS
backend for Subversion designed to simultaneously treat thousands of computers
as if they were one. Still, he offers some very valuable general
observations that we'd be wise to note.
- Writing a new FS is *hard*. (Jon estimated that it cost him about two
man-years to make it happen.)
- There's not enough test coverage of the FS API. "You test for the bugs
you know, but not that the APIs actually work". Jon said it quickly
became clear that once Subversion's client became usable in the very
early days, and we started writing the Python tests which wrap the
client, we abandoned the idea of writing smaller, C-based unit tests.
The net effect was that when something
went wrong that wasn't caught by the fs-tests, the whole stack of
Subversion's layers was suspect. It was not uncommon to spend at
least a day or two fixing any one bug.
- Subversion's API parameters and purposes are well documented, but
error codes are not. As a result, it's not clear when catch-and-react
versus catch-and-raise error handling semantics are in order.
- Fake abstraction just gets in the way. For example, node-rev-ids
purport to be abstract but aren't -- that triplet of information is
actually critical to Subversion's inner workings -- and the abstraction
just causes problems.
- The 'changes' table is obviously a bolt-on. (Yep, he's totally right.)
- Subversion assumes that I/O is essentially free. But in a cloud-based
FS, that's not true.
- The FS doesn't support batch operations. This hurts backends for which
the cost of asking for anything is generally higher than the cost of
answering the question for a set of many things. (Think "cost of RPC"
versus "cost of multi-row database query".)
- The FS API supports features that aren't actually used, such as the
ability to, in a single transaction, add a file, delete it, add a file
at the same path, delete it, etc. This not only adds unnecessary
cognitive cost, but in practice it's a cache-killer.
- Path-based APIs are highly costly because of DAG walking. ID-based
APIs would be more performant.
- You really should optimize for your particular storage layer.
- "WebDAV sucks. Period."
- Don't underestimate the extent to which SVNKit (or, in general, any
non-core-Subversion-based client) is used, and the accompanying nuances.
Total lack of mod_dav_svn tests caused suffering here.
- Google's Mercurial implementation was much easier to write, primarily
because Mercurial is "just a vastly simpler system". (There's no
directory versioning, no mixed-rev working copies, etc.)
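To make the batch-operations point above a little more concrete, here is a
minimal Python sketch of "cost of RPC" versus "cost of multi-row database
query". It is only an illustration: RemoteStore, get, get_many and the
round_trips counter are all hypothetical names, not part of Subversion's FS
API or of anything Jon described, and the "remote" store is just an
in-memory dict that counts how many times it is asked a question.

```python
# Hypothetical sketch: none of these names come from Subversion's FS API.


class RemoteStore:
    """Toy key-value store that counts simulated round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.round_trips = 0

    def get(self, key):
        # One simulated round trip per key requested.
        self.round_trips += 1
        return self._rows[key]

    def get_many(self, keys):
        # One simulated round trip for the whole set of keys.
        self.round_trips += 1
        return {key: self._rows[key] for key in keys}


def fetch_one_by_one(store, keys):
    """Ask for each item separately, the way a non-batching API must."""
    return {key: store.get(key) for key in keys}


def fetch_batched(store, keys):
    """Ask for all items in a single request."""
    return store.get_many(keys)


rows = {f"node-{i}": f"props-{i}" for i in range(100)}
keys = list(rows)

slow = RemoteStore(rows)
fast = RemoteStore(rows)

# Both strategies return the same data; only the number of requests differs.
assert fetch_one_by_one(slow, keys) == fetch_batched(fast, keys)
print(slow.round_trips, fast.round_trips)  # prints "100 1"
```

When each question is a network hop rather than a local disk read, that
difference in request count is what dominates the cost.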
There you have it. Jon's a pleasant guy, so it didn't hurt too badly as he
kicked the snot out of us. For me, the biggest lessons to be learned here
touch on over-engineering and under-testing. I hope others benefit from
Jon's truly unique perspective here.
--
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2010-03-31 21:21:11 CEST