On 7 July 2014 20:44, Julian Foad <julianfoad_at_btopenworld.com> wrote:
> I should probably let Stefan answer this, but...
>
> C. Michael Pilato wrote:
>>> On 07.07.2014 17:07, C. Michael Pilato wrote:
>>>> On 07/07/2014 10:58 AM, Ivan Zhakov wrote:
>>>>> My technical opinion that FSFS7/log addressing is slower by design,
>>>>> because it's doing more (read index, then read data instead of just
>>>>> read data) and only caching makes them comparable on performance to
>>>>> FSFS6 repositories.
>
> Ivan, it sounds like you've missed the important part of the design.
> It's designed to do LESS work in total, not more, because the cost of using
> the index is outweighed by the savings that it enables. As I understand it,
> a large saving is gained by re-ordering the data on disk during packing;
> that's why packing is essentially a requirement.
It seems to be designed to do less work in total only if there is enough cache,
at the expense of other users who don't have enough cache.
As I understand it, constructing the final content of a file in FSFS7 works
like this (Stefan2 may correct me if I am wrong):
1. Open rev10 file
2. Seek to end
3. Read rev10 file trailer
4. Seek to l2p index position
5. Read page of data from l2p index, decode it from 7bit encoding
6. Store index page content in cache.
7. Lookup index to find absolute offset of text delta
8. Open rev9 file
9. Seek to end
10. Read rev9 file trailer
11. Seek to l2p index position
12. Read page of data from l2p index, decode it from 7bit encoding
13. Store index page content in cache.
14. Seek to delta offset in rev10 file
15. Seek to delta offset in rev9 file
Then read both files, combining the deltas to produce the final content.
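
As an aside on step 5, decoding a 7-bit encoding can look roughly like the
sketch below. It assumes a generic scheme with 7 payload bits per byte, the
low-order group first and the high bit as a continuation flag. The function
names are made up for illustration, and the actual byte order used by the
FSFS7 index may differ.

```python
def encode_7bit(value):
    """Encode a non-negative integer, 7 payload bits per byte, low bits first.

    Assumed layout for illustration only, not taken from the FSFS7 sources.
    """
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)                      # last byte has the high bit clear
    return bytes(out)


def decode_7bit(data, pos=0):
    """Decode one integer starting at `pos`; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


encoded = encode_7bit(300)
value, next_pos = decode_7bit(encoded)
```

Every offset in an index page has to go through a loop like this before it
can be used, which is the extra decoding work mentioned in steps 5 and 12.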
Steps [2-7] and [9-13] could potentially be avoided if there is enough cache,
but for other use cases they are just a waste of resources.
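
The cold-cache versus warm-cache difference can be sketched with a toy model.
This is only an illustration: the index is a plain dict, "I/O" is a list of
step names, and ToyRevFile and fsfs7_locate are hypothetical names, not
anything from the Subversion sources.

```python
class ToyRevFile:
    """Stand-in for a rev file; `index` maps item number -> absolute offset."""

    def __init__(self, name, index):
        self.name = name
        self.index = index


def fsfs7_locate(rev_file, item, cache, ops):
    """Return the offset of `item`, appending each simulated step to `ops`."""
    ops.append("open " + rev_file.name)
    page = cache.get(rev_file.name)
    if page is None:
        # Cache miss: the index page has to be found and decoded first.
        ops.append("seek to end")
        ops.append("read trailer")
        ops.append("seek to l2p index")
        ops.append("read and decode l2p page")
        page = dict(rev_file.index)
        cache[rev_file.name] = page
    offset = page[item]
    ops.append("seek to delta offset %d" % offset)
    return offset


cache = {}
cold, warm = [], []
rev10 = ToyRevFile("rev10", {3: 4096})
fsfs7_locate(rev10, 3, cache, cold)   # first access, nothing cached
fsfs7_locate(rev10, 3, cache, warm)   # second access, index page cached
print(len(cold), len(warm))           # prints "6 2"
```

In this model the warm path is as short as a direct seek, while the cold
path pays for four extra steps per rev file.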
While for an FSFS6 repository it looks like:
1. Open rev10 file
2. Seek to delta offset in rev10 file
3. Open rev9 file
4. Seek to delta offset in rev9 file
Then read both files, combining the deltas to produce the final content.
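
With physical addressing this amounts to one seek and one read per file,
because the caller already knows the offsets. A minimal sketch, with
in-memory buffers standing in for the rev files and made-up offsets and
payloads (read_at is a hypothetical helper, not a Subversion function):

```python
import io


def read_at(f, offset, length):
    """Seek straight to a known offset and read `length` bytes."""
    f.seek(offset)
    return f.read(length)


# Toy "rev files"; the dots are unrelated data around the deltas.
rev9 = io.BytesIO(b"..DELTA9......")
rev10 = io.BytesIO(b"....DELTA10....")

# The offsets are known in advance, so no index lookup is needed.
parts = [read_at(rev9, 2, 6), read_at(rev10, 4, 7)]
```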
Regarding re-ordering during packing: it sounds good in theory, but I
doubt it will work in real-world scenarios. There are several reasons:
1. There is no explicit continuous read: the code still reads data in
random order and assumes that the OS or the "block read" prefetch (an FSFS7
feature) will deliver the data faster because it is reading around the same
area.
2. It doesn't help if the text deltas are spread across different pack files.
3. SANs, SSDs, hybrid disks and virtual disks are very popular now, and the
assumption that they have spinning-disk characteristics is wrong IMHO.
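
To show what the re-ordering relies on, here is a small sketch that counts
how many fixed-size blocks a set of reads touches. The 64 KiB block size and
the offsets are assumptions picked for the example, and blocks_touched is a
hypothetical helper.

```python
def blocks_touched(offsets, length, block_size=65536):
    """Count the distinct blocks covered by reads of `length` bytes each."""
    blocks = set()
    for off in offsets:
        first = off // block_size
        last = (off + length - 1) // block_size
        blocks.update(range(first, last + 1))
    return len(blocks)


clustered = [0, 1000, 2000, 3000]                # deltas packed close together
scattered = [0, 1000000, 2000000, 3000000]       # deltas far apart
print(blocks_touched(clustered, 500), blocks_touched(scattered, 500))  # prints "1 4"
```

The saving only appears when the reads really land in the same block. Deltas
in different pack files never share a block, and how much a saved block read
is worth depends on the storage underneath.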
---
Ivan Zhakov
Received on 2014-07-07 22:12:11 CEST