Re: 'svnadmin upgrade' Re: FSFS successor ID design draft

From: Mark Phippard <markphip_at_gmail.com>
Date: Tue, 6 Sep 2011 11:50:40 -0400

If it makes any sense, you could look at what we did in 1.5 for merge tracking.

Merge tracking required a new bit of information -- the node origin
index. Dump/Load automatically populated this index. svnadmin
upgrade bumped the format but did not populate the index. Instead the
index was lazily built as needed and you suffered the performance
penalty of doing it this way. To offset this, we provided a separate
tool svn-populate-node-origins-index which a user can run of they want
to avoid the dump/load but still have predictable performance. The
tool runs a lot faster than a dump/load and basically lets you
manually pre-populate the index so that users do not get the
inconsistent performance they would get if it was being built on the
fly at unpredictable moments.

Mark

On Tue, Sep 6, 2011 at 11:42 AM, Daniel Shahaf <danielsh_at_elego.de> wrote:
> r1165700 mentions that we need to decide what to do with 'svnadmin
> upgrade'.
>
> 1. We could punt and require a dump/load. All format >=6 FSFS instances
> always have a full successors store.
>
> 2. We could store the progeny shard size and the successors data shard
> size in the 'format' file. If those values are absent (or zero),
> then the successor lookup operation returns SVN_ERR_UNSUPPORTED_FEATURE.
>
> We can also reconstruct the successors cache without a dump/load, via
> a tools/ binary that operates as follows:
>
> for (i = 0; i < CONSTANT; i++)
> check 'current' and build the successors data up to that revision
> with write lock:
> check 'current' and build the successors data up to that revision
> updates 'format' with the shard sizes
> (release write lock)
>
> Requiring a dump/load is a barrier for _any_ upgrade from pre-1.8 to
> post-1.8, so I prefer (2) to (1). 'svnadmin upgrade' might want to warn
> about this.
>
> Stefan Sperling wrote on Thu, Sep 01, 2011 at 00:06:40 +0200:
>> On Tue, Aug 30, 2011 at 12:43:29PM +0200, Stefan Sperling wrote:
>> > I'll try to tweak my proposal such that successor ID updates become
>> > transactional and happen as part of every commit.
>>
>> Here's a first shot at this. Comments welcome.
>>
>> To support transactional semantics, any new data written to the FSFS
>> filesystem must be tied to the newly created revision.
>> This way, updating 'current' as the final step in each commit makes the
>> new data appear. Failing to update 'current' will cause any data added
>> by failing commits to be overwritten by the next successfull commit.
>>
>> The following design tries to keep this principle valid for sucessor-ids.
>> There is one compromise though which hopefully won't hurt too much.
>>
>> Thanks again to danielsh for providing some initial feedback via IRC.
>>
>>
>> Implementation proposal
>> =======================
>>
>>
>> - On-disk File Format -
>>
>> All files related to the map are stored in the repos/db/successors
>> directory ("successors directory").
>>
>> The successor map contains 3 kinds of files:
>>
>> 1) Node-revision map files.
>> These files map node-revision IDs to lists of revisions.
>> This map answers the question:
>> "In which revisions were successors of a node-revision created?"
>>
>> 2) Created-revision map file.
>> These files map revision numbers of file offsets in successor data files.
>> This map answers the question:
>> "Where in the successor data are those successors which were created
>> in revision N?"
>>
>> 3) Successor data files.
>> These files store raw successor data.
>>
>> The different kinds of files refer to each other as shown below:
>>
>> node_rev_id map revision map successor data
>> +-------+ +--------+ +-----------+
>> | ... | | ... | | ... |
>> +------>rN---------+ | offset | +----->successor |
>> | | ... | | | offset | | | successor |
>> | | ... | +----->offset----+ | END |
>> node_rev_id --+ | rQ | | ... | | successor |
>> | | ... | | offset | | ... |
>> +------>rX---------+ | ... | | ... |
>> | ... | | | ... | | END |
>> | ... | +----->offset---------->successor |
>> | rZ | | ... | | END |
>> +-------+ +--------+ +-----------+
>>
>> Each type of file is described below in detail.
>>
>>
>> -- Node-revision-ID map files --
>>
>> The purpose of node-revision-id map files is to facilitate lookup
>> of successors of a given node-revision.
>>
>> The files are named after the number of the revision the node-revision
>> was created in, modulo some fixed to-be-determined value (i.e. there
>> won't be one file per node revision, but one file for every 10, 100,
>> or so, node-revisions).
>>
>> The files are stored within the "node-revs" sub-directory:
>>
>> db/successors/node-revs/1
>> db/successors/node-revs/2
>> db/successors/node-revs/3
>> ...
>> db/successors/node-revs/N
>> db/successors/node-revs/M
>> ...
>>
>> Each node-revision map file stores a mapping of the form
>> node_rev_id => [revnum, revnum, ...]
>> The revision numbers identify those revisions in which one or more
>> successors of a given node-revision were added.
>>
>> In the first iteration of this design, the mapping is stored as lines
>> of ASCII text. Each line contains an unparsed node_revision_id, a space
>> separator (ASCII 0x20), and a revision number in ASCII decimal representation.
>> (The design may be refined later to use a more efficient storage model.)
>>
>>
>> -- Revision map files --
>>
>> The purpose of the revision map file is to facilitate lookup
>> of successor data created in a given revision.
>>
>> The files are named after the numbers of revisions they are responsible for,
>> modulo some fixed to-be-determined value (i.e. there won't be one file
>> per revision, but one file for every 10, 100, or so, revisions; each file
>> is responsible for some range of revisions).
>>
>> The files are stored within the "revs" sub-directory:
>>
>> db/successors/revs/1
>> db/successors/revs/2
>> db/successors/revs/3
>> ...
>> db/successors/revs/N
>> db/successors/revs/M
>> ...
>>
>>
>> Each file consists of an array of 64bit big-endian integers.
>> The N'th integer (starting from N=1) in the file specifies the offset
>> of successor data (in the corresponding successor data file) which was
>> added in the N'th revision within the revision range the map file is
>> responsible for.
>>
>>
>> -- Successor-data files --
>>
>> These files use the same naming scheme as the revision map files (i.e.
>> each successor data file is responsible for successors created within
>> a specific range of revisions).
>>
>> The files are stored within the "successor-ids" sub-directory:
>>
>> db/successors/successor-ids/1
>> db/successors/successor-ids/2
>> db/successors/successor-ids/3
>> ...
>> db/successors/successor-ids/N
>> db/successors/successor-ids/M
>> ...
>>
>>
>> Each file consists of lines each containing two unparsed node_rev_id
>> ASCII strings, separated by ASCII 0x20. The first node-revision ID is a
>> predecessor, and the second one is one of its successors.
>> The special string "END" on a single line signifies that the following
>> successor data belongs to the next revision.
>> (The design may be refined later to use a more efficient storage model.)
>>
>>
>> - Procedure for Readers -
>>
>> A reader first reads the 'current' file and remembers the revision number.
>>
>> It locates the node-revision map file corresponding to the given node_rev_id,
>> based on the node_rev_id's revision modulo a known fixed value.
>>
>> It opens this file and reads it to obtain a list of revisions which created
>> successors of the given node_rev_id. It ignores lines listing node_rev_ids
>> not matching the given node_rev_id. It also ignores revisions greater than
>> 'current'.
>>
>> Next, the reader opens the corresponding revision map files,
>> based on revision numbers in the list, each modulo a known fixed value.
>> For each revision N in the revision list, it seeks to the revision's byte
>> offset within a revision map file and reads 4 bytes to obtain the offset
>> corresponding to revision N within a successor data file.
>>
>> The reader then opens the corresponding successor data files,
>> based on revision numbers in the list, each modulo a known fixed value.
>> For each revision offset, it seeks to this offset and reads lines until it
>> finds the line "END". It loops through all lines and appends those
>> successor IDs to the set of successors which list a predecessor matching
>> the given node_rev_id.
>>
>> In exceptional cases (see below) it is possible that a given revision
>> does not actually contain any successors of the given node_rev_id.
>>
>>
>> - Procedure for Writers -
>>
>> Assume that the writer has access to a mapping from predecessor
>> node-revision ID to one or more successor node-revision IDs.
>> (The exact mechanism managing this mapping still needs to be determined.
>> But it can be assumed to be available as all predecessors and successors
>> are known within the context of a commit.)
>>
>> When committing a finished transaction (in fs_fs.c:commit_body()),
>> the writer obtains a write-lock on all files within the successors
>> directory it needs to modify.
>>
>> Let rN be the new revision the writer is committing.
>>
>> The writer creates an empty temporary file.
>>
>> If rN falls within the revision range of an existing successor data
>> file, the writer looks up the offset of revision N-1 in the corresponding
>> revision map file. It copies content from the corresponding successor data
>> file to a temporary file up to this offset, and copies any data that follows
>> until a line containing "END" has been copied. (Note that this step discards
>> data left behind by any previous attempt at committing rN).
>>
>> The writer appends any new successor data to the temporary file (note
>> that there may be no successor data in case the revision is empty).
>> It then adds a terminating "END".
>>
>> The writer creates another empty temporary file.
>>
>> If rN falls within the revision range of an existing revision map file,
>> it copies ((N-2)*4) bytes of content from the revision map file into the
>> temporary file. (Note that this step discards data left behind by any
>> previous attempt at committing rN).
>>
>> The writer appends, to the temporary file, the offset of the new data
>> it wrote into the successor data file.
>>
>> Likewise, the writer creates empty temporary files for each node-revision
>> map files is needs to modify. It copies all content from any such
>> node-revision map files which already exist, and appends a line to each
>> file containing the node_revision ID and the new revision number.
>>
>> After moving the proto-revprop file into place, and before updating
>> 'current', the writer moves temporary files into place in the successors
>> directory, in the following order:
>>
>> 1) the new successor data file
>> 2) the new revision map file
>> 3) each new node_rev_id map file
>>
>> If the writer fails to update all files in the successors directory,
>> it will also fail to update 'current'. In this case, readers will ignore
>> the new successor data until another commit succeeds in creating rN.
>> Once a commit of rN has succeeded, readers following node-rev-id map
>> entries updated by the failing commit might end up with an empty
>> successors list for rN. Such erroneous entries will not be cleaned
>> up automatically, but can be eliminated by re-creating the repository
>> (e.g. via a dump/load cycle). However, the actual successor data stored
>> for committed revisions is always correct, and the likelihood of incorrect
>> node-revision ID map entries to occur is small.
>

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/

Received on 2011-09-06 17:51:11 CEST

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"
Previous message: Philip Martin: "Re: svn commit: r1164796 - /subversion/branches/1.7.x/STATUS"
In reply to: Daniel Shahaf: "'svnadmin upgrade' Re: FSFS successor ID design draft"
Next in thread: Daniel Shahaf: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"
Reply: Daniel Shahaf: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]