Re: FSFS successor ID design draft

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Sun, 28 Aug 2011 15:46:03 +0200

On 26.08.2011 13:14, Stefan Sperling wrote:
> Below is an initial draft of a design for successor-IDs in FSFS.
> It is a bit long, but I hope it covers all relevant points in
> sufficient detail.
>
> Please share your thoughts on this. I have most likely missed a few things.
Sure. Given that I implemented a bit more than that
in TSVN, I have some things to say about what works
and what doesn't.
>
>
> FSFS successor IDs
> ==================
>
> See http://svn.apache.org/repos/asf/subversion/branches/fs-successor-ids/BRANCH-README
> for what this is all about.
But the assumptions in that file are actually not valid.
There are at least following use-cases for successor IDs:

* "has there been a change for path P since rev R?"
-> a simple svn info + opt. svn ls will answer that

* "has there been a change for the subtree of path P
since rev R?"
-> a simple svn info will answer that

* "where have changes been made to / after (P,R);
search in copies, too?"
-> This can actually be answered directly using
the proposed successor IDs.

* "Where did path P at rev N to M directly or indirectly
  copied to, e.g. which releases contain a certain faulty
  code segment; optionally list changes to targets?"
-> needs to scan parent paths for copies, too
   (reversed "log", revision graph)

* "Where has a given list of patches been applied
to (directly or via merge); and where not?"
-> similarly, reversed log -g

Operation to implement
======================

The core "reverse log" operation would return
data similar to what "log". For that, we need to
follow all copies, including parent copies:

rN: M /trunk/p/f
rN+1: A /tags/t from /trunk_at_N
rN+2: A /branches/b from /tags/t1_at_N+1
rN+3: M /branches/p/f

You want all these revisions to be reported
(possibly filtering the result in a later stage).
One problem here is that we can't use the successor
ID relation for noderevs here because it will report
*every* revision as we need to follow /trunk, /tags
and /branches.

It turns out that we can produce even humongous
reverse logs (50+ k nodes, thousands of branches
and tags) within a second by simply performing
a full history scan.

A example of how the whole process can be
implemented efficiently, can be found here:

https://tortoisesvn.googlecode.com/svn/trunk/src/TortoiseProc/RevisionGraph/FullGraphBuilder.cpp

Also, the change info returned by log should be
extended by an enum {parent, this, subtree}. This
would indicate whether that change corresponds
to a parent of the original path (e.g. when creating
a branch), to the original path itself or to some
sub-tree.

>
> What follows are ideas about implementing successor ID support for FSFS.
>
> Useful background information:
> Node-revisions: subversion/libsvn_fs_base/notes/fs-history
> subversion/libsvn_fs_base/notes/structure
> fsfs design: subversion/libsvn_fs_fs/structure
>
>
> The "cache" and its properties
> ==============================
>
> Storage of successor IDs is essentially a cache of the result of the
> inversion of the predecessor relation between all node-revisions.
> This inversion is a *very* expensive operation (follow every node
> that ever existed in the repository from its most recent revision
> back to the revision it was created in).
Not true. At least the "*very* expensive" part.
Log -v takes about 1min for AFS over ra_local.
Building any of the index data described below
can be done in < 10s.
>
>
> In the following, "cache" refers to the store for successor IDs.
>
> After a general introduction, a proposed cache design is presented.
> Note that the cache implementation will be hidden behind the API of
> libsvn_fs_fs and can be improved or changed in every new minor
> release of Subversion.
>
> Required properties of successor ID cache for FSFS:
> - FSFS-compatible:
> - cache must use some plain file format
> - writers do not block readers (i.e. sqlite is out of the question)
> - cache must not rely on modifying existing revision files
> - cache can be regenerated on demand
> - do not corrupt the cache in case of an unexpected crash
> - try to keep number of directory entries low
> - conserve inodes by design (preferred), or via packing support
Node IDs may not be the most convenient level to operate
on. They imply a certain redundancy (any change requires
new node IDs for all parent paths).
>
> Nice-to-have properties:
> - the repository can be live during cache generation
I propose a modified version of TSVN's log cache
data structure. The adaptations:

* split into multiple files to reduce modification overhead
* remove rev-prop-dependent information (log comment,
author, timestamp)
* add the reverse copy info
* simplify the data structures

>
>
> Storage size requirements
> =========================
>
> Ballpark figures about node-revs from #svn-dev:
> * cmpilato does some math. There are over 350,000 node revisions in my
> ancient copy of the svn.collab.net Subversion repos.
> <hwright> there are over 20M of them in the ASF repo
<snip>

As of r1162184, the AFS repo contains
* 9002894 changes on
* 4272329 different paths constructed from
* 791804 path elements (different file and folder names)
(over 20M words of commentary etc.)

Total cache size on disk: 79MB
Estimated size after adaptations: 60 .. 70M
(due to significant duplication)

> Implementation proposal
> =======================

<snip>

Data model
==========

revisions -> { revision } // array, indexed by revnum
revision -> ( common_base_pathID, // pathID of the longest common
                                       base path of all changes
                common_change_mask, // indicates what kind of changes
                                       is contained in this revision
                changes_count,
                { change } )
change -> ( change_kind, pathID [, fromRev, from_pathID] )
pathID -> ( parent_pathID, elementID )
                                    // pathID 0 is for "/"
                                    // parent_pathID < pathID
elementID -> string // 0-terminated UTF8

fwdCopies -> { fwdCopy } // ordered by source_rev
fwdCopy -> { source_pathID, source_rev, target_pathID, target_rev }
// source rev is going to be implicit

The pathID <-> path and elementID <-> element string
relations are bijective.

External format
===============

This is not a strict append-only format as individual
files will get replaced / rewritten during commit.
Because it will never remove records describing previous
revisions, reading while writing is safe assuming that
you limit your query to committed revisions.

All data files are being sharded to support even huge
repositories and unusual usage patterns. A query will
read, extract the fragments into an LRU cache and, in
most cases, keep them until the query is finished.

(I) Overall structure

/logcache/ids next free IDs
/logcache/elem2s/* elementID -> string; 4k per shard
/logcache/s2elem/* string -> elementID; 256 fragments
/logcache/id2path/* pathID -> (parentID, elementID); 16k per shard
/logcache/path2id/* (parentID, elementID) -> pathID; 256 fragments
/logcache/revision/* revisions; 4k revs per shard
/logcache/fwdcopy/* fwdCopies; 4k revs per shard

(II) Path elements

* elementID -> string
   a simple concatenation of zero-terminated UTF8 strings
* string -> elementID
   - shard selected by lower 4 bits of the first two chars
     in string (padding with 0 if necessary)
   - entry counter, followed by a sequence of 0-terminated
     UTF-8 strings sorted lexicographically, followed by the
     list of corresponding elementIDs

(III) Path IDs

* pathID -> (parentID, elementID)
   list of parentIDs as differences to the previous value,
   followed by the list of elementIDs (non-differential)
* (parentID, elementID) -> pathID
   - shard selected by (parentID % 255) ^ (elementID % 255)
   - entry counter, followed by a sequence of differential parentIDs,
     followed by the list of differential elementIDs,
     followed by the list of corresponding pathIDs,
     sorted lexicographically by (parentID, pathID)

(IV) Revisions
*each shard contains:
   list of common base pathID,
list of aggregated changes flags,
list of numbers of changes per respective revision,
   list of change kinds,
   list of pathIDs,
   list of from_revs (only where indicated in change kind)
   list of from_pathIDs (only where indicated in change kind)

(V) FwdCopies
*each shard contains (indexed by source rev):
list of numbers of forward copies per respective revision,
list of source_pathIDs,
list of target_pathIDs,
list of target_revs (as delta to the implicit source_rev),

(VI) Storage details

* unsigned integers are variable-length encoded (see svndiff.c)
* signed integers are variable-length encoded after mapping
them onto unsigned ints: sint x -> 2 abs(x) + sign(x)
* data has been arranged such that efficient data compression
is possible (details not in this post)

-- Stefan^2.
Received on 2011-08-28 15:46:42 CEST

This message: [ Message body ]
Next message: Johan Corveleyn: "Failing test svnsync_tests.py 28: copy and reencode non-UTF-8 svn:* props"
Previous message: Johan Corveleyn: "Re: FSFS successor ID design draft"
In reply to: Stefan Sperling: "FSFS successor ID design draft"
Next in thread: Stefan Sperling: "server-side log cache (was: Re: FSFS successor ID design draft)"
Reply: Stefan Sperling: "server-side log cache (was: Re: FSFS successor ID design draft)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]