On Mon, Aug 17, 2009 at 5:24 PM, Blair Zajac <blair_at_orcaware.com> wrote:
> Ben Collins-Sussman wrote:
>> Hi folks,
>> As you may have noticed, Google Code's Subversion service is still
>> pretty slow... still much slower than a stock Subversion/Apache server
>> running on a single box. It turns out to be tricky to work with
>> bigtable: you get massive scalability, but in return you have to
>> convert all of BDB's disk i/o calls into network RPCs. On a single
>> box, the disk i/o calls get faster over time as the OS eventually ends
>> up swapping the underlying filesystem into memory. But network RPCs
>> are slow and stay slow. :-/
> Are there any writeups on the specifics of the svn_fs.h to BigTable mapping?
> How are the paths and node-ids mapped to BigTable's key and columns?
The original port that fitz and I did was fairly brain-dead: we
simply forked libsvn_fs_base, and replaced BDB calls with Bigtable
calls. Instead of BDB managing 10 hashtables on disk, we now had
Bigtable managing the same 10 "columns" in a single Bigtable.
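To make the "fork and swap the calls" approach concrete, here is a minimal sketch of what that naive mapping might look like. This is purely illustrative and not the actual Google Code schema: the class name, row-key layout, and table names are assumptions (libsvn_fs_base's real BDB table set is what the port mirrored), and a plain dict stands in for the Bigtable client.

```python
# Hypothetical sketch (NOT the real Google Code schema): each former BDB
# hashtable becomes one column in a single Bigtable, and the row key
# combines the table name with the original BDB key. A dict stands in
# for the Bigtable client here.

class NaiveBigtableFS:
    # Illustrative stand-ins for the ~10 BDB tables in libsvn_fs_base.
    TABLES = ("nodes", "revisions", "transactions", "changes", "copies",
              "strings", "representations", "uuids", "locks", "lock-tokens")

    def __init__(self):
        self._rows = {}  # row_key -> {column: value}; fake Bigtable

    def put(self, table, key, value):
        # One Bigtable write RPC per former BDB put.
        row = self._rows.setdefault(f"{table}/{key}", {})
        row[table] = value

    def get(self, table, key):
        # One Bigtable read RPC per former BDB get -- each of these was
        # a cheap in-memory lookup under BDB, but a network round trip here.
        row = self._rows.get(f"{table}/{key}", {})
        return row.get(table)

fs = NaiveBigtableFS()
fs.put("revisions", "r1", b"txn-id: 1a")
print(fs.get("revisions", "r1"))  # b'txn-id: 1a'
```

Because the svn_fs code was written against a one-get-per-lookup interface, even a single checkout fans out into thousands of these per-key round trips, which is exactly where the slowdown comes from.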
It certainly worked, but it was heinously slow. Our whole BDB backend
assumes that reads & writes are essentially free. Sure enough,
any reasonable OS will eventually page the BDB files directly into
memory and then access *is* essentially free. However, by converting
these BDB calls to Bigtable network RPCs, we experienced a 10x
slowdown. And nothing ever makes the network RPCs faster over time.
We eventually got the system up to a slow-but-usable speed through the
judicious use of gigantic LRU caches. That's what you see today.
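The caching fix can be sketched in a few lines. This is a toy model, not the production code: `bigtable_read`, the row key, and the cache size are all made up, and `functools.lru_cache` stands in for whatever cache layer the real system uses. The point is just that only the first read of a hot row pays the RPC cost.

```python
from functools import lru_cache

# Simulated Bigtable read: every call models one slow network RPC.
RPC_CALLS = 0

def bigtable_read(row_key):
    global RPC_CALLS
    RPC_CALLS += 1
    return f"value-for-{row_key}"

# A gigantic LRU cache in front of the RPC layer: repeated reads of hot
# rows are served from local memory instead of going over the network.
@lru_cache(maxsize=1_000_000)
def cached_read(row_key):
    return bigtable_read(row_key)

for _ in range(1000):
    cached_read("nodes/r42")  # only the first call hits the network
print(RPC_CALLS)  # 1
```

This recovers much of what the OS page cache gave BDB for free on a single box, at the price of keeping very large caches resident.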
Jon's project, however, is building a completely new implementation --
one with a Bigtable schema designed from scratch to make as
few Bigtable RPCs as possible. I'm not sure it's safe for me to spill
all the details of that schema to the public just yet; I may need to
get an official nod from someone first.
Received on 2009-08-19 13:38:22 CEST