Re: 'svnadmin upgrade' Re: FSFS successor ID design draft

From: Daniel Shahaf <danielsh_at_elego.de>
Date: Tue, 6 Sep 2011 18:59:54 +0300

That's the tool I had in mind, but to "lazily populate the cache" is
the tricky bit.

Mark Phippard wrote on Tue, Sep 06, 2011 at 11:50:40 -0400:
> If it makes any sense, you could look at what we did in 1.5 for merge tracking.
>
> Merge tracking required a new bit of information -- the node origin
> index. Dump/Load automatically populated this index. svnadmin
> upgrade bumped the format but did not populate the index. Instead the
> index was lazily built as needed and you suffered the performance
> penalty of doing it this way. To offset this, we provided a separate
> tool svn-populate-node-origins-index which a user can run of they want
> to avoid the dump/load but still have predictable performance. The
> tool runs a lot faster than a dump/load and basically lets you
> manually pre-populate the index so that users do not get the
> inconsistent performance they would get if it was being built on the
> fly at unpredictable moments.
>
> Mark
>
>
>
> On Tue, Sep 6, 2011 at 11:42 AM, Daniel Shahaf <danielsh_at_elego.de> wrote:
> > r1165700 mentions that we need to decide what to do with 'svnadmin
> > upgrade'.
> >
> > 1. We could punt and require a dump/load. All format >=6 FSFS instances
> > always have a full successors store.
> >
> > 2. We could store the progeny shard size and the successors data shard
> > size in the 'format' file. If those values are absent (or zero),
> > then the successor lookup operation returns SVN_ERR_UNSUPPORTED_FEATURE.
> >
> > We can also reconstruct the successors cache without a dump/load, via
> > a tools/ binary that operates as follows:
> >
> > for (i = 0; i < CONSTANT; i++)
> > check 'current' and build the successors data up to that revision
> > with write lock:
> > check 'current' and build the successors data up to that revision
> > updates 'format' with the shard sizes
> > (release write lock)
> >
> > Requiring a dump/load is a barrier for _any_ upgrade from pre-1.8 to
> > post-1.8, so I prefer (2) to (1). 'svnadmin upgrade' might want to warn
> > about this.
> >
> > Stefan Sperling wrote on Thu, Sep 01, 2011 at 00:06:40 +0200:
> >> On Tue, Aug 30, 2011 at 12:43:29PM +0200, Stefan Sperling wrote:
> >> > I'll try to tweak my proposal such that successor ID updates become
> >> > transactional and happen as part of every commit.
> >>
> >> Here's a first shot at this. Comments welcome.
> >>
> >> To support transactional semantics, any new data written to the FSFS
> >> filesystem must be tied to the newly created revision.
> >> This way, updating 'current' as the final step in each commit makes the
> >> new data appear. Failing to update 'current' will cause any data added
> >> by failing commits to be overwritten by the next successfull commit.
> >>
> >> The following design tries to keep this principle valid for sucessor-ids.
> >> There is one compromise though which hopefully won't hurt too much.
> >>
> >> Thanks again to danielsh for providing some initial feedback via IRC.
> >>
> >>
> >> Implementation proposal
> >> =======================
> >>
> >>
> >> - On-disk File Format -
> >>
> >> All files related to the map are stored in the repos/db/successors
> >> directory ("successors directory").
> >>
> >> The successor map contains 3 kinds of files:
> >>
> >> 1) Node-revision map files.
> >> These files map node-revision IDs to lists of revisions.
> >> This map answers the question:
> >> "In which revisions were successors of a node-revision created?"
> >>
> >> 2) Created-revision map file.
> >> These files map revision numbers of file offsets in successor data files.
> >> This map answers the question:
> >> "Where in the successor data are those successors which were created
> >> in revision N?"
> >>
> >> 3) Successor data files.
> >> These files store raw successor data.
> >>
> >> The different kinds of files refer to each other as shown below:
> >>
> >> node_rev_id map revision map successor data
> >> +-------+ +--------+ +-----------+
> >> | ... | | ... | | ... |
> >> +------>rN---------+ | offset | +----->successor |
> >> | | ... | | | offset | | | successor |
> >> | | ... | +----->offset----+ | END |
> >> node_rev_id --+ | rQ | | ... | | successor |
> >> | | ... | | offset | | ... |
> >> +------>rX---------+ | ... | | ... |
> >> | ... | | | ... | | END |
> >> | ... | +----->offset---------->successor |
> >> | rZ | | ... | | END |
> >> +-------+ +--------+ +-----------+
> >>
> >> Each type of file is described below in detail.
> >>
> >>
> >> -- Node-revision-ID map files --
> >>
> >> The purpose of node-revision-id map files is to facilitate lookup
> >> of successors of a given node-revision.
> >>
> >> The files are named after the number of the revision the node-revision
> >> was created in, modulo some fixed to-be-determined value (i.e. there
> >> won't be one file per node revision, but one file for every 10, 100,
> >> or so, node-revisions).
> >>
> >> The files are stored within the "node-revs" sub-directory:
> >>
> >> db/successors/node-revs/1
> >> db/successors/node-revs/2
> >> db/successors/node-revs/3
> >> ...
> >> db/successors/node-revs/N
> >> db/successors/node-revs/M
> >> ...
> >>
> >> Each node-revision map file stores a mapping of the form
> >> node_rev_id => [revnum, revnum, ...]
> >> The revision numbers identify those revisions in which one or more
> >> successors of a given node-revision were added.
> >>
> >> In the first iteration of this design, the mapping is stored as lines
> >> of ASCII text. Each line contains an unparsed node_revision_id, a space
> >> separator (ASCII 0x20), and a revision number in ASCII decimal representation.
> >> (The design may be refined later to use a more efficient storage model.)
> >>
> >>
> >> -- Revision map files --
> >>
> >> The purpose of the revision map file is to facilitate lookup
> >> of successor data created in a given revision.
> >>
> >> The files are named after the numbers of revisions they are responsible for,
> >> modulo some fixed to-be-determined value (i.e. there won't be one file
> >> per revision, but one file for every 10, 100, or so, revisions; each file
> >> is responsible for some range of revisions).
> >>
> >> The files are stored within the "revs" sub-directory:
> >>
> >> db/successors/revs/1
> >> db/successors/revs/2
> >> db/successors/revs/3
> >> ...
> >> db/successors/revs/N
> >> db/successors/revs/M
> >> ...
> >>
> >>
> >> Each file consists of an array of 64bit big-endian integers.
> >> The N'th integer (starting from N=1) in the file specifies the offset
> >> of successor data (in the corresponding successor data file) which was
> >> added in the N'th revision within the revision range the map file is
> >> responsible for.
> >>
> >>
> >> -- Successor-data files --
> >>
> >> These files use the same naming scheme as the revision map files (i.e.
> >> each successor data file is responsible for successors created within
> >> a specific range of revisions).
> >>
> >> The files are stored within the "successor-ids" sub-directory:
> >>
> >> db/successors/successor-ids/1
> >> db/successors/successor-ids/2
> >> db/successors/successor-ids/3
> >> ...
> >> db/successors/successor-ids/N
> >> db/successors/successor-ids/M
> >> ...
> >>
> >>
> >> Each file consists of lines each containing two unparsed node_rev_id
> >> ASCII strings, separated by ASCII 0x20. The first node-revision ID is a
> >> predecessor, and the second one is one of its successors.
> >> The special string "END" on a single line signifies that the following
> >> successor data belongs to the next revision.
> >> (The design may be refined later to use a more efficient storage model.)
> >>
> >>
> >> - Procedure for Readers -
> >>
> >> A reader first reads the 'current' file and remembers the revision number.
> >>
> >> It locates the node-revision map file corresponding to the given node_rev_id,
> >> based on the node_rev_id's revision modulo a known fixed value.
> >>
> >> It opens this file and reads it to obtain a list of revisions which created
> >> successors of the given node_rev_id. It ignores lines listing node_rev_ids
> >> not matching the given node_rev_id. It also ignores revisions greater than
> >> 'current'.
> >>
> >> Next, the reader opens the corresponding revision map files,
> >> based on revision numbers in the list, each modulo a known fixed value.
> >> For each revision N in the revision list, it seeks to the revision's byte
> >> offset within a revision map file and reads 4 bytes to obtain the offset
> >> corresponding to revision N within a successor data file.
> >>
> >> The reader then opens the corresponding successor data files,
> >> based on revision numbers in the list, each modulo a known fixed value.
> >> For each revision offset, it seeks to this offset and reads lines until it
> >> finds the line "END". It loops through all lines and appends those
> >> successor IDs to the set of successors which list a predecessor matching
> >> the given node_rev_id.
> >>
> >> In exceptional cases (see below) it is possible that a given revision
> >> does not actually contain any successors of the given node_rev_id.
> >>
> >>
> >> - Procedure for Writers -
> >>
> >> Assume that the writer has access to a mapping from predecessor
> >> node-revision ID to one or more successor node-revision IDs.
> >> (The exact mechanism managing this mapping still needs to be determined.
> >> But it can be assumed to be available as all predecessors and successors
> >> are known within the context of a commit.)
> >>
> >> When committing a finished transaction (in fs_fs.c:commit_body()),
> >> the writer obtains a write-lock on all files within the successors
> >> directory it needs to modify.
> >>
> >> Let rN be the new revision the writer is committing.
> >>
> >> The writer creates an empty temporary file.
> >>
> >> If rN falls within the revision range of an existing successor data
> >> file, the writer looks up the offset of revision N-1 in the corresponding
> >> revision map file. It copies content from the corresponding successor data
> >> file to a temporary file up to this offset, and copies any data that follows
> >> until a line containing "END" has been copied. (Note that this step discards
> >> data left behind by any previous attempt at committing rN).
> >>
> >> The writer appends any new successor data to the temporary file (note
> >> that there may be no successor data in case the revision is empty).
> >> It then adds a terminating "END".
> >>
> >> The writer creates another empty temporary file.
> >>
> >> If rN falls within the revision range of an existing revision map file,
> >> it copies ((N-2)*4) bytes of content from the revision map file into the
> >> temporary file. (Note that this step discards data left behind by any
> >> previous attempt at committing rN).
> >>
> >> The writer appends, to the temporary file, the offset of the new data
> >> it wrote into the successor data file.
> >>
> >> Likewise, the writer creates empty temporary files for each node-revision
> >> map files is needs to modify. It copies all content from any such
> >> node-revision map files which already exist, and appends a line to each
> >> file containing the node_revision ID and the new revision number.
> >>
> >> After moving the proto-revprop file into place, and before updating
> >> 'current', the writer moves temporary files into place in the successors
> >> directory, in the following order:
> >>
> >> 1) the new successor data file
> >> 2) the new revision map file
> >> 3) each new node_rev_id map file
> >>
> >> If the writer fails to update all files in the successors directory,
> >> it will also fail to update 'current'. In this case, readers will ignore
> >> the new successor data until another commit succeeds in creating rN.
> >> Once a commit of rN has succeeded, readers following node-rev-id map
> >> entries updated by the failing commit might end up with an empty
> >> successors list for rN. Such erroneous entries will not be cleaned
> >> up automatically, but can be eliminated by re-creating the repository
> >> (e.g. via a dump/load cycle). However, the actual successor data stored
> >> for committed revisions is always correct, and the likelihood of incorrect
> >> node-revision ID map entries to occur is small.
> >
>
>
>
> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/
Received on 2011-09-06 18:00:53 CEST

This message: [ Message body ]
Next message: Stefan Sperling: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"
Previous message: Mark Phippard: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"
In reply to: Mark Phippard: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"
Next in thread: Stefan Sperling: "Re: 'svnadmin upgrade' Re: FSFS successor ID design draft"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]