We talked about this in Berlin, but since then I have kept
pondering the question of where FSFS improvements should go, and
I changed my mind more than once. Now I think I have a consistent
and workable answer.
Historically, the fsfs-format7 branch was destined to 'fix all
that is "wrong"' with FSFS-f6. As I went on analyzing and addressing
issues, more things kept coming up and I found solutions that
work reasonably well. Some are already implemented; others won't
be for a while.
All that led to a point where FSFS backward compatibility is much
more of a burden and stability risk than a tangible benefit.
The relevant code is still there - despite ripping stuff from the FSFS
and FSX backends. However, there never was a point in time at which an
FSFS-f7 with "just the right amount of improvement" - as outlined
below - existed.
Thus, the following strategy:
* Get the fsfs-f6 compatible refactorings and improvements to /trunk.
  That requires no format bump, and the current state of FSFS on
  the fsfs-format7 branch will be the blueprint.
* After that, create a branch for what fsfs-f7 can feasibly be.
  Most of that code can be found right at the point where I
  forked FSFS and FSX. Manually applying changes will be required
  for review anyway:
  - based on fsfs-f6
  - add support for logical addressing
  - data alignment and block read
  - pack() reorders data on disk
  TODO but not very complicated:
  - support of mixed addressing repositories, i.e. allow upgrade to f7
  - review existing tools (e.g. fsfs-stats) to handle logical addressing
  - nice to have: bump short-term cache hit rates to > 99.9%
  Maybe backported from FSX in a future release:
  - prefetch daemon
  Gains already demonstrated: 3x speedup in c/o, 10x speedup in
  merge-info evaluation, ~100x speedup in log (YMMV)
* FSX is a third backend (alongside FSFS and BDB) that will keep
  its EXPERIMENTAL state for at least 2 releases. That implies
  that there will be no direct upgrade paths between those releases.
  Users may use it in read-only mirrors they already have for
  analyzing large repositories. This is where FSX should excel
  from the start.
  There is a fair chance that FSX will be the first implementation
  of FS2, such that the long-term upgrade path would be from
  [FSFS,BDB] -> [FSX]. The key is to "always" have a fully
  functional implementation instead of starting from the ground up.
  Features include (more will be added when we start designing FS2):
  - logical addressing, block read, pack() reordering
  - replace fixed-window txdelta with variable-window txdelta2
  - replace txdelta windows with star-delta containers
  - replace the reps / noderev / changes items with much lower
    overhead containers. This is the point where backward compat
    begins to hurt very much.
  - staged packing (more and more revs per pack file)
  - enhanced change list info that allows for "loggy" ops to
    be run without touching tons of directory reps
  - fully checksummed
  - prefetch daemon
  - in-memory transactions (may require FS2)
  Some design goals:
  - ~50% reduction in repo size. On par with or better than git.
  - space to represent a merged node: ~100 bytes typ.
  - commit speed >1000 revs/s
  - verification in O(repo size), ~1GB/s
  - log, log -g, merge and c/o at > 100MB/s from cold start
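
To illustrate why "logical addressing" and "pack() reorders data on
disk" go together in the lists above, here is a minimal Python sketch.
It assumes that logical addressing means items are referenced by a
stable item number, and that an index translates that number into a
physical offset. All names in it (PackFile, read, repack) are
hypothetical and not taken from the FSFS or FSX code; the real on-disk
format is far more involved.

```python
# Toy model only: hypothetical names, not the Subversion implementation.

class PackFile:
    """Pack file whose items are referenced by a stable item number."""

    def __init__(self, items):
        # items: dict mapping item number -> bytes payload
        self.data = b""
        self.index = {}  # item number -> (physical offset, length)
        self._write(items, sorted(items))

    def _write(self, items, order):
        # Lay the payloads out in the given physical order and rebuild
        # the logical-to-physical index to match.
        data = bytearray()
        index = {}
        for number in order:
            payload = items[number]
            index[number] = (len(data), len(payload))
            data += payload
        self.data = bytes(data)
        self.index = index

    def read(self, number):
        # Callers only ever know the item number, never the offset.
        offset, length = self.index[number]
        return self.data[offset:offset + length]

    def repack(self, order):
        # Reorder the data on disk; item numbers stay valid because
        # only the index changes.
        if sorted(order) != sorted(self.index):
            raise ValueError("order must list every item number exactly once")
        items = {number: self.read(number) for number in self.index}
        self._write(items, order)


pack = PackFile({0: b"changes", 1: b"noderev", 2: b"rep"})
pack.repack([2, 1, 0])   # e.g. move the rep to the front of the file
print(pack.read(2))      # still found under the same item number
```

With physical addressing, every reference to an item would have to be
rewritten whenever its offset moves. In this sketch only the index is
rebuilt, which is what makes reordering during pack cheap to reason
about.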
I hope that makes sense to most of you.
-- Stefan^2.
Received on 2013-07-02 15:52:44 CEST