Ben Collins-Sussman wrote:
> Hi folks,
>
> As you may have noticed, Google Code's Subversion service is still
> pretty slow... still much slower than a stock Subversion/Apache server
> running on a single box. It turns out to be tricky to work with
> bigtable: you get massive scalability, but in return you have to
> convert all of BDB's disk i/o calls into network RPCs. On a single
> box, the disk i/o calls get faster over time as the OS eventually ends
> up swapping the underlying filesystem into memory. But network RPCs
> are slow and stay slow. :-/
Ben,
Are there any writeups on the specifics of the svn_fs.h-to-BigTable mapping? How
are paths and node-ids mapped to BigTable's keys and columns?
I'm curious because I'm finishing a distributed, versioned asset management
system that uses svn as the backend database. The system supports one global
namespace distributed across facilities (e.g. in Los Angeles and Bristol,
England) and has to support writes everywhere, even under network partitioning,
for the facility that owns a portion of the namespace.
Reading up on Cassandra, Hypertable, and other distributed databases, I wonder
whether, if I were starting this project now instead of two years ago, I would
have chosen Hadoop with Hypertable (roughly analogous to GFS and BigTable,
respectively) and provided an svn_fs.h-like API on top of them for the asset
management system to use.
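To make the question concrete, here is a purely illustrative sketch of one way
svn_fs.h-style node data *could* be laid out on a BigTable-like (row key,
column) store. The key scheme, column names, and the in-memory table class are
all my own assumptions, not how Google Code actually maps it:

```python
class FakeBigTable:
    """In-memory stand-in for a BigTable-style sorted (row, column) store."""
    def __init__(self):
        self.rows = {}  # row_key -> {column: value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        return self.rows.get(row_key, {}).get(column)


def node_row_key(node_id, revision):
    # One row per node revision; zero-padding keeps rows sorted by revision.
    return "node:%s@%010d" % (node_id, revision)


def path_row_key(path, revision):
    # Hypothetical secondary index from (path, revision) to node-id, so a
    # path lookup is a single row read instead of a directory walk.
    return "path:%s@%010d" % (path, revision)


table = FakeBigTable()

# Commit r1: store the node's text and props, plus the path-index entry.
table.put(node_row_key("0.0.r1", 1), "text", b"hello\n")
table.put(node_row_key("0.0.r1", 1), "props", {"svn:mime-type": "text/plain"})
table.put(path_row_key("/trunk/hello.txt", 1), "node-id", "0.0.r1")

# Read back through the path index.
node_id = table.get(path_row_key("/trunk/hello.txt", 1), "node-id")
text = table.get(node_row_key(node_id, 1), "text")
```

In a sketch like this, every get/put becomes a network RPC on a real BigTable,
which is exactly the cost Ben describes above.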
Comments welcome.
Regards,
Blair
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2384554
Received on 2009-08-18 00:25:13 CEST