
Re: Pristine store - spec

From: Hyrum K Wright <hyrum_at_hyrumwright.org>
Date: Wed, 16 Feb 2011 13:50:35 +0000

No comments on the content, but reading the ensuing email thread, it
may be useful to put the document in notes/wc-ng/pristines, and add
questions / comments / corrections there. It would allow folks down
the road to see all the critiques in one location, rather than reading
N mails.
</bike-shed>

-Hyrum

On Tue, Feb 15, 2011 at 3:06 PM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> Would anyone be able to review this spec please?  I'm trying to get
> straight what locking / access control rules need to be.
>
> /*
>  * THE PRISTINE STORE
>  * ==================
>  *
>  * === Introduction ===
>  *
>  * The Pristine Store is the part of the Working Copy metadata that holds
>  * a local copy of the full text of the base version of each WC file.
>  *
>  * Texts in the Pristine Store are addressed only by their SHA-1 checksum.
>  * The Pristine Store does not track which text relates to which repository
>  * and revision and path.  The Pristine Store does not hold pristine copies
>  * of directories, nor of properties.
>  *
>  * The Pristine Store data is held in
>  *  * the 'PRISTINE' table in the SQLite Database (SDB), and
>  *  * the files in the 'pristine' directory.
>  *
>  * This specification uses SDB transactions to ensure the consistency of
>  * writes and reads.
>  *
>  * ==== Invariants ====
>  *
>  * The operating procedures below maintain the following invariants.
>  * These invariants apply at all times except within the SDB txns defined
>  * below.
>  *
>  * * Each row in the PRISTINE table has an associated pristine text file
>  *   that is not open for writing and is available for reading and whose
>  *   content matches the columns 'size', 'checksum', 'md5_checksum'.
>  *
>  * ==== Operating Procedures ====
>  *
>  * The steps should be carried out in the order specified.  (See rationale.)
>  *
>  * * To add a pristine, do the following inside an SDB txn:
>  *      * Add the table row, and set the refcount as desired.  If a row
>  *        already exists, add the desired refcount to its refcount, and
>  *        preferably verify the old row matches the new metadata.
>  *      * Create the file. Creation should be fs-atomic, e.g. by moving a
>  *        new file into place, so as never to orphan a partial file.  If a
>  *        file already exists, preferably leave it rather than replace it,
>  *        and optionally verify it matches the new metadata (e.g. length).
>  *
>  * * To remove a pristine, do the following inside an SDB txn:
>  *      * First, check refcount == 0, and abort if not.
>  *      * Delete the table row.
>  *      * Delete the file or move it away. (If not present, log a
>  *        consistency error but, in a release build, return success.)
>  *
>  * * To query a pristine's existence or SDB metadata, the reader must:
>  *      * Ensure no pristine-remove txn is in progress while querying it.
>  *
>  * * To read a pristine text, the reader must:
>  *      * Ensure no pristine-remove txn is in progress while querying and
>  *        opening it.
>  *      * Ensure the pristine text remains in the store continuously from
>  *        opening it for the duration of the read. (Perhaps by ensuring
>  *        refcount remains >= 1 and/or by cooperating with the clean-up
>  *        code.)
>  *
>  * ==== Rationale ====
>  *
>  * * Adding a pristine:
>  *      * We can't add the file *before* the SDB txn takes out a lock,
>  *        because that would leave a gap in which another process could
>  *        see this file as an orphan and delete it.
>  *      * Within the txn, the table row could be added after creating the
>  *        file; it makes no difference as it will not become externally
>  *        visible until commit.  But then we would have to take out a lock
>  *        explicitly before adding the file.  Adding the row takes out a
>  *        lock implicitly, so doing it first avoids an extra step.
>  *      * Leaving an existing file in place is less likely to interfere with
>  *        processes that are currently reading from the file.  Replacing it
>  *        might also be acceptable, but that would need further
>  *        investigation.
>  *
>  * * Removing a pristine:
>  *      * We can't remove the file *after* the SDB txn that updates the
>  *        table, because that would leave a gap in which another process
>  *        might re-add this same pristine file and then we would delete it.
>  *      * Within the txn, the table row could be removed after removing the
>  *        file, but see the rationale for adding a pristine.
>  *      * In a typical use case for removing a pristine text, the caller
>  *        would check the refcount before starting this txn, but
>  *        nevertheless it may have changed and so must be checked again
>  *        inside the txn.
>  *
>  * * In the add and remove txns, we need to acquire an SDB 'RESERVED'
>  *   lock before adding or removing the file.  This can be done by starting
>  *   the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
>  *   as the table row update).  ### Would a 'SHARED' lock be sufficient,
>  *   and if so would it be noticeably better?
>  *
>  * ==== Notes ====
>  *
>  * * This procedure can leave orphaned pristine files (files without a
>  *   corresponding SDB row) if Subversion crashes.  The Pristine Store
>  *   will still operate correctly.  It should be easy to teach "svn cleanup"
>  *   to safely delete these.  ### Do we need to define the clean-up
>  *   procedure here?
>  *
>  * * This specification is conceptually simple, but requires completing disk
>  *   operations within SDB transactions, which may make it too inefficient
>  *   in practice.  An alternative specification could use the Work Queue to
>  *   enable more efficient processing of multiple transactions.
>  *
>  *
>  * REFERENCE COUNTING
>  * ==================
>  *
>  * The Pristine Store spec above defines how texts are added to and removed
>  * from the store.  This spec defines how the addition and removal of
>  * pristine text references within the WC DB are co-ordinated with the
>  * addition and removal of the pristine texts themselves.
>  *
>  * One requirement is to allow a pristine text to be stored some
>  * time before the reference to it is written into the NODES table.  The
>  * 'commit' code path, for example, needs to store a file's new pristine
>  * text somewhere (and the pristine store is an obvious option) and then,
>  * when the commit succeeds, update the WC to reference it.
>  *
>  * Store-then-reference could be achieved by:
>  *
>  *   (a) Store text outside Pristine Store.  When commit succeeds, add it
>  *       to the Pristine Store and reference it in the WC; if commit
>  *       fails, remove the temporary text.
>  *   (b) Store text in Pristine Store with initial ref count = 0.  When
>  *       commit succeeds, add the reference and update the ref count; if
>  *       commit fails, optionally try to purge this pristine text.
>  *   (c) Store text in Pristine Store with initial ref count = 1.  When
>  *       commit succeeds, add the reference; if commit fails, decrement
>  *       the ref count and optionally try to purge it.
>  *
>  * Method (a) would require, in effect, implementing an ad-hoc temporary
>  * Pristine Store, which seems needless duplication of effort.  It would
>  * also require changing the way the commit code path passes information
>  * around, which might be no bad thing in the long term, but the result
>  * would not appear to have any advantage over method (b).
>  *
>  * Method (b) plays well with automatically maintaining the ref counts
>  * equal to the number of in-SDB references, at the granularity of SDB
>  * txns.  It requires an interlock between adding/deleting references and
>  * purging unreferenced pristines - e.g. guard each of these operations by
>  * a WC lock.
>  *   * Add a pristine & reference it => any WC lock
>  *     (To prevent purging it while adding.)
>  *   * Unreference a pristine => no lock needed.
>  *   * Unreference a pristine & purge-if-0 => Same as doing these separately.
>  *   * Purge any/all refcount==0 pristines => an exclusive WC lock.
>  *     (To prevent adding a ref while purging.)
>  *   * If a WC lock remains after a crash, then purge refcount==0 pristines.
>  *
>  * Method (c):
>  *   * ### Not sure about this one - haven't thought it through in detail...
>  *   * Add a pristine & reference in separate steps => any WC lock (?)
>  *   * Remove a reference requires ... (nothing more?)
>  *   * Find & purge unreferenced pristines requires an exclusive WC lock.
>  *   * Ref counts are sometimes too high while a WC lock is held, so
>  *     uncertain after a crash if WC locks remain, so need to be re-counted
>  *     during clean-up.
>  *
>  * We choose method (b).
>  *
>  *
>  * === Invariants in a Valid WC DB State ===
>  *
>  * * No pristine text, even if refcount == 0, will be deleted from the store
>  *   as long as any process holds any WC lock in this WC.
>  *
>  * The following conditions are always true outside of a SQL txn:
>  *
>  *   * The 'checksum' column in each NODES table row is either NULL or
>  *     references a primary key in the 'PRISTINE' table.
>  *
>  *   * The 'refcount' column in each PRISTINE table row is equal to the
>  *     number of NODES table rows whose 'checksum' column references this
>  *     pristine row.
>  *
>  * The following conditions are always true
>  *     outside of a SQL txn,
>  *     when the Work Queue is empty:
>  *     (### ?) when no WC locks are held by any process:
>  *
>  *   * The 'refcount' column in a PRISTINE table row equals the number of
>  *     NODES table rows whose 'checksum' column references that pristine row.
>  *     It may be zero.
>  *
>  * ==== Operating Procedures ====
>  *
>  * The steps should be carried out in the order specified.
>  *
>  * * To add a pristine text reference to the WC, obtain the text and its
>  *   checksum, and then do this while holding a WC lock:
>  *     * Add the pristine text to the Pristine Store, setting the desired
>  *       refcount >= 1.
>  *     * Add the reference(s) in the NODES table.
>  *
>  * * To remove a pristine text reference from the WC, do this while holding
>  *   a WC lock:
>  *     * Remove the reference(s) in the NODES table.
>  *     * Decrement the pristine text's 'refcount' column.
>  *
>  * * To purge an unreferenced pristine text, do this with an *exclusive*
>  *   WC lock:
>  *     * Check refcount == 0; skip if not.
>  *     * Remove it from the pristine store.
>  */
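
The add and remove procedures in the Pristine Store part of the spec could be sketched roughly as below. This is only an illustration in Python with the stdlib sqlite3 module, not the libsvn_wc code: the file name 'wc.db', the flat 'pristine' directory layout, the simplified table schema and the function names open_store(), pristine_path(), add_pristine() and remove_pristine() are all assumptions made for the example. It uses 'BEGIN IMMEDIATE' to take the RESERVED lock before touching the file, as the rationale suggests.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_store(wc_dir):
    """Open (or create) the SDB and the 'pristine' directory (assumed layout)."""
    os.makedirs(os.path.join(wc_dir, "pristine"), exist_ok=True)
    # isolation_level=None: the module issues no implicit BEGIN, so the
    # txn boundaries are exactly the ones written out below.
    db = sqlite3.connect(os.path.join(wc_dir, "wc.db"), isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY, md5_checksum TEXT NOT NULL,"
        " size INTEGER NOT NULL, refcount INTEGER NOT NULL)"
    )
    return db


def pristine_path(wc_dir, sha1):
    return os.path.join(wc_dir, "pristine", sha1)


def add_pristine(db, wc_dir, data, refcount=0):
    """Add a pristine text: table row first, then the file, in one SDB txn."""
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    db.execute("BEGIN IMMEDIATE")  # acquires the RESERVED lock up front
    try:
        row = db.execute(
            "SELECT size, md5_checksum FROM pristine WHERE checksum = ?",
            (sha1,),
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO pristine VALUES (?, ?, ?, ?)",
                (sha1, md5, len(data), refcount),
            )
        else:
            # Row already exists: verify metadata, then add to its refcount.
            if row != (len(data), md5):
                raise ValueError("existing row does not match new metadata")
            db.execute(
                "UPDATE pristine SET refcount = refcount + ? WHERE checksum = ?",
                (refcount, sha1),
            )
        target = pristine_path(wc_dir, sha1)
        if not os.path.exists(target):
            # Write a temp file in the same directory, then rename it into
            # place, so a partial file never appears under the final name.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target))
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, target)
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")
    return sha1


def remove_pristine(db, wc_dir, sha1):
    """Remove a pristine text if its refcount is 0; return True if removed."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT refcount FROM pristine WHERE checksum = ?", (sha1,)
        ).fetchone()
        # Re-check the refcount inside the txn; it may have changed.
        removable = row is not None and row[0] == 0
        if removable:
            db.execute("DELETE FROM pristine WHERE checksum = ?", (sha1,))
            try:
                os.remove(pristine_path(wc_dir, sha1))
            except FileNotFoundError:
                pass  # consistency error; the spec still returns success
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")
    return removable
```

If the process dies between the rename and the COMMIT, the row is rolled back and the file is left as an orphan, which is the case the Notes section expects "svn cleanup" to handle.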
>
>
> - Julian
>
>
>
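
For the reference-counting part (method (b)), a minimal sketch of the three operating procedures might look like the following. It is Python with the stdlib sqlite3 module and an in-memory database, and it only models the SDB side: the pristine files are left out, the two-column schema is a simplification, and the names txn(), add_reference(), remove_reference(), purge_unreferenced() and refcounts_consistent() are hypothetical. The WC lock is not implemented; the 'exclusive_wc_lock' argument is just a stand-in for "the caller holds an exclusive WC lock". Each procedure is folded into a single SDB txn so that the refcount invariant holds outside of txns.

```python
import sqlite3
from contextlib import contextmanager


def open_db():
    """Create a simplified, in-memory stand-in for the WC DB (assumed schema)."""
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE pristine (checksum TEXT PRIMARY KEY,"
        " refcount INTEGER NOT NULL)"
    )
    db.execute(
        "CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY,"
        " checksum TEXT REFERENCES pristine (checksum))"
    )
    return db


@contextmanager
def txn(db):
    """Run the body inside one SDB txn holding the RESERVED lock."""
    db.execute("BEGIN IMMEDIATE")
    try:
        yield
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")


def add_reference(db, relpath, checksum):
    """Store the pristine row if needed, reference it, bump the refcount."""
    with txn(db):
        db.execute("INSERT OR IGNORE INTO pristine VALUES (?, 0)", (checksum,))
        db.execute("INSERT INTO nodes VALUES (?, ?)", (relpath, checksum))
        db.execute(
            "UPDATE pristine SET refcount = refcount + 1 WHERE checksum = ?",
            (checksum,),
        )


def remove_reference(db, relpath):
    """Drop a node's reference and decrement the pristine's refcount."""
    with txn(db):
        row = db.execute(
            "SELECT checksum FROM nodes WHERE local_relpath = ?", (relpath,)
        ).fetchone()
        if row is None or row[0] is None:
            return  # nothing referenced; the txn commits unchanged
        db.execute(
            "UPDATE nodes SET checksum = NULL WHERE local_relpath = ?",
            (relpath,),
        )
        db.execute(
            "UPDATE pristine SET refcount = refcount - 1 WHERE checksum = ?",
            (row[0],),
        )


def purge_unreferenced(db, exclusive_wc_lock):
    """Delete all refcount == 0 rows; return the purged checksums."""
    if not exclusive_wc_lock:
        raise RuntimeError("purging requires an exclusive WC lock")
    with txn(db):
        purged = [
            r[0]
            for r in db.execute(
                "SELECT checksum FROM pristine WHERE refcount = 0"
                " ORDER BY checksum"
            )
        ]
        db.execute("DELETE FROM pristine WHERE refcount = 0")
    return purged


def refcounts_consistent(db):
    """Check that each refcount equals the number of referencing NODES rows."""
    rows = db.execute(
        "SELECT p.refcount, COUNT(n.checksum) FROM pristine p"
        " LEFT JOIN nodes n ON n.checksum = p.checksum"
        " GROUP BY p.checksum, p.refcount"
    ).fetchall()
    return all(refcount == count for refcount, count in rows)
```

A check like refcounts_consistent() is the sort of thing a clean-up pass could run to detect drift, though whether the real code should re-count or trust the column is left open by the spec.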
Received on 2011-02-16 14:51:17 CET

This is an archived mail posted to the Subversion Dev mailing list.