[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1044833 - /subversion/trunk/notes/fsfs-improvements.txt

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Mon, 13 Dec 2010 09:13:42 -0600

On Sun, Dec 12, 2010 at 9:23 AM, <stefan2_at_apache.org> wrote:
> Author: stefan2
> Date: Sun Dec 12 15:23:18 2010
> New Revision: 1044833
>
> URL: http://svn.apache.org/viewvc?rev=1044833&view=rev
> Log:
> Get some ideas for FSFS format improvements written down.
>
> * notes/fsfs-improvements.txt: new file
>
> Added:
>    subversion/trunk/notes/fsfs-improvements.txt   (with props)
>
> Added: subversion/trunk/notes/fsfs-improvements.txt
> URL: http://svn.apache.org/viewvc/subversion/trunk/notes/fsfs-improvements.txt?rev=1044833&view=auto
> ==============================================================================
> --- subversion/trunk/notes/fsfs-improvements.txt (added)
> +++ subversion/trunk/notes/fsfs-improvements.txt Sun Dec 12 15:23:18 2010
> @@ -0,0 +1,224 @@
> +Introduction
> +------------
> +
> +The FSFS external data format can be changed to allow for
> +significantly reduced I/O overhead without changing the
> +fundamental ideas behind FSFS.
> +
> +The whole idea is to rearrange packed revision info in a
> +new FSFS format "6".  A two-way conversion between format
> +"5" and "6" should be possible as well as supporting both
> +formats for different repositories on the same server.

Generally, I like these ideas.

As I understand it, they would be implemented as part of the packing
process, or the pack files would be rewritten as part of the repos
format upgrade. This could get a bit tricky, since FSFS depends on
static file offsets for referencing data within revision files. Not
impossible, just tricky.

> +
> +
> +Revision Order
> +--------------
> +
> +FSFS format "5" packs revisions by putting revision 0 at
> +the beginning of the file and concatenating the others in
> +ascending order.  This intuitive design is counter-productive
> +because we always trace data HEAD -> r0 and scanning a file
> +backwards is expensive.
> +
> ++-------+
> +| rev 0 |
> ++-------+
> +| rev 1 |
> ++-------+
> +
> +:  ...  :
> +
> ++-------+
> +| rev N |
> ++-------+
> +<EOF>
> +
> +A counter-intuitive but more efficient reversed order (de-
> +scending revision order in a packed file) results in always
> +reading forward through a file.
> +
> ++-------+
> +| rev N |
> ++-------+
> +
> +:  ...  :
> +
> ++-------+
> +| rev 1 |
> ++-------+
> +| rev 0 |
> ++-------+
> +<EOF>
> +
> +
> +Don't Interleave Revision and Content Data
> +------------------------------------------
> +
> +Format "5" keeps the each revision as a single block.  When
> +re-constructing a node content from the diffs,  we jump
> +through a number of distant revision blocks.  For each node.
> +
> ++--------------+
> +| rev N Reps   |
> ++--------------+
> +| rev N Header |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Reps   |
> ++--------------+
> +| rev 1 Header |
> ++--------------+
> +| rev 0 Reps   |
> ++--------------+
> +| rev 0 Header |
> ++--------------+
> +<EOF>
> +
> +Instead,  delta-info should be removed from the revision
> +blocks and put at the end of the file.  This speeds up
> +header-only operations like "log" without impairing node
> +content lookup.  And it lays the foundation for further
> +improvements.
> +
> ++--------------+
> +| rev N Header |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Header |
> ++--------------+
> +| rev 0 Header |
> ++--------------+
> +| rev N Reps   |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Reps   |
> ++--------------+
> +| rev 0 Reps   |
> ++--------------+
> +<EOF>
> +
> +The only downside is that putting headers first is somewhat
> +more computationally expensive since offsets needs to be
> +calculated in advance.  This is made even more difficult by
> +using a variable length encoding for those offsets.

Does the statement about computation cost apply to the initial placement,
or subsequent accesses? (I would guess the former.)

> +
> +
> +Group the Representations by Delta Tree
> +---------------------------------------
> +
> +All diffs that are based on the same node within the packed
> +revision file are put in one place.  Re-constructing a node
> +would then involve reading the revision headers (relatively
> +close together) and the content deltas after that (again
> +with high locality).
> +
> ++------------------+
> +| rev N Header     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev 1 Header     |
> ++------------------+
> +| rev 0 Header     |
> ++------------------+
> +| node tree A Reps |
> ++------------------+
> +| node tree B Reps |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| node tree Z Reps |
> ++------------------+
> +<EOF>
> +
> +We can optimize that even further by exploiting the skip-
> +delta information: the "walks" through the delta tree for a
> +given node will merge bit by bit into a main "trunk". The
> +nodes on these paths can be re-ordered such that very few
> +seek() operations are required on average to reconstruct
> +the node content in this file.
> +
> +Example of 8 changes in revs r0 .. r7 (for simplicity),
> +looking at the delta-info only ("->" means "stored as diff
> +against"):
> +
> +       r0
> +       r1->r0
> +       r2->r0
> +       r3->r2
> +       r4->r0
> +       r5->r4
> +       r6->r4
> +       r7->r6
> +
> +Default ordering r7,r6,r5,r4,r3,r2,r1,r0 requires
> +3/3/2/2/2/2/1/1 seeks.  For 2^N changes (N integer > 0),
> +we need (N+1)/2 seeks on average.
> +
> +A path-optimized ordering r1,r3,r2,r5,r7,r6,r4,r0
> +requires 2/2/2/2/1/1/1/1 seeks.  It's (N+3)/4 on average.
> +That optimized ordering can easily be constructed by
> +
> +       * select the highest rev not covered, yet
> +       * prepend its diff path to the existing result
> +         (until we meet a revision at already is in
> +         in the result)
> +       * repeat until all revs are covered
> +
> +
> +Move the Manifest Info into the Packed File
> +-------------------------------------------
> +
> +Once we mastered the offset pre-calculation issue (see
> +above),  we may as well put the manifest info at the be-
> +ginning of the file.  This will benefit "log" in particular
> +as only one consecutive chunk of data needs to be read
> +from only one file.

I don't completely understand the benefits we get from this, since the
manifest is read one and then cached. I would think that the accesses
into the pack file would dominate I/O, and the placement of the
manifest wouldn't matter much (though I could certainly be wrong).

> +
> ++------------------+
> +| rev 0 Offset     |
> ++------------------+
> +| rev 1 Offset     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev N Offset     |
> ++------------------+
> +| rev N Header     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev 1 Header     |
> ++------------------+
> +| rev 0 Header     |
> ++------------------+
> +| node tree A Reps |
> ++------------------+
> +| node tree B Reps |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| node tree Z Reps |
> ++------------------+
> +<EOF>
> +
> +
>
> Propchange: subversion/trunk/notes/fsfs-improvements.txt
> ------------------------------------------------------------------------------
>    svn:eol-style = native
>
>
>

Overall, it looks like a good plan. Thanks!

-Hyrum
Received on 2010-12-13 16:14:24 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.