On 19.10.2010 15:10, Daniel Shahaf wrote:
> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>> Personally, I see [FSv2] as a broad swath of API changes to align our
>> needs with the underlying storage. Trowbridge noted that our current
>> API makes it *really* difficult to implement an effective backend. I'd
>> also like to see a backend that allows for parallel PUTs during the
>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>> storage with layers on top, allowing for various types of high-scaling
>> mechanisms.
> At the retreat, stefan2 also had some thoughts about this...
>
Without going into too much detail, the main issues are:
* Missing 3-layer abstraction: there is no distinction between
the logical data model and its external representation. That
makes it hard to optimize the data arrangement on disk (order
of node deltas etc.) or to cache index (position) information
in some local context.
* Implementation of a "streamy" server API (good) as a
fine-grained iteration over some node tree (bad). In a redesigned
3-layer FS backend, I would like to see set-oriented requests
("get list of nodes in that folder / subtree / whatever", "fetch
data for that list of nodes") that can be transformed in each
layer into a similar request (or a limited number of requests)
on the respective lower layer. As a result, data on disk could
be arranged so that many high-level requests translate into a
small number of disk read requests asking for large chunks
of data. That abstraction of "access planning" will benefit
DBs and networked file I/O the most.
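
To make the set-oriented idea above concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the class
and function names (LogicalModel, Representation, Storage, coalesce) are
hypothetical and not part of any Subversion API, and the "disk" is just an
in-memory byte string. Each layer turns one set-oriented request into one
request on the layer below, and the lowest layer merges touching byte
ranges so that several node fetches become a few large reads.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (offset, length); hypothetical on-disk position


def coalesce(ranges: List[Range]) -> List[Tuple[int, int, List[Range]]]:
    """Merge touching or overlapping ranges into (start, end, members)."""
    merged: List[Tuple[int, int, List[Range]]] = []
    for off, length in sorted(set(ranges)):
        if merged and off <= merged[-1][1]:
            start, end, members = merged[-1]
            merged[-1] = (start, max(end, off + length),
                          members + [(off, length)])
        else:
            merged.append((off, off + length, [(off, length)]))
    return merged


class Storage:
    """Lowest layer: answers a *set* of byte-range reads."""

    def __init__(self, blob: bytes) -> None:
        self.blob = blob
        self.read_calls = 0  # number of physical read requests issued

    def read_ranges(self, ranges: List[Range]) -> Dict[Range, bytes]:
        result: Dict[Range, bytes] = {}
        for start, end, members in coalesce(ranges):
            self.read_calls += 1          # one large read per merged chunk
            chunk = self.blob[start:end]
            for off, length in members:   # slice each node out of the chunk
                rel = off - start
                result[(off, length)] = chunk[rel:rel + length]
        return result


class Representation:
    """Middle layer: maps node ids to positions (the cached index)."""

    def __init__(self, storage: Storage, index: Dict[str, Range]) -> None:
        self.storage = storage
        self.index = index

    def fetch(self, node_ids: List[str]) -> Dict[str, bytes]:
        ranges = [self.index[n] for n in node_ids]
        data = self.storage.read_ranges(ranges)  # one request downwards
        return {n: data[self.index[n]] for n in node_ids}


class LogicalModel:
    """Top layer: the logical tree, with set-oriented requests only."""

    def __init__(self, rep: Representation, tree: Dict[str, List[str]]) -> None:
        self.rep = rep
        self.tree = tree

    def list_nodes(self, folder: str) -> List[str]:
        return list(self.tree[folder])

    def fetch_data(self, node_ids: List[str]) -> Dict[str, bytes]:
        return self.rep.fetch(node_ids)


# Usage: three nodes, two of them adjacent on "disk".
storage = Storage(b"aaaabbbb....cccc")
rep = Representation(storage, {"a": (0, 4), "b": (4, 4), "c": (12, 4)})
fs = LogicalModel(rep, {"/trunk": ["a", "b", "c"]})
nodes = fs.list_nodes("/trunk")
data = fs.fetch_data(nodes)
print(data, storage.read_calls)
```

In this toy layout, nodes "a" and "b" sit next to each other, so the three
node fetches are served by two reads instead of three. A fine-grained
per-node iteration would give the lower layers no chance to plan access
this way.
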
If someone is working on a design, I would like to review it.
I've got "some" experience with that kind of data processing ...
-- Stefan^2.
Received on 2010-10-24 22:15:41 CEST