Would anyone be able to review this spec, please? I'm trying to get
straight what the locking / access control rules need to be.
/*
* THE PRISTINE STORE
* ==================
*
* === Introduction ===
*
* The Pristine Store is the part of the Working Copy metadata that holds
* a local copy of the full text of the base version of each WC file.
*
* Texts in the Pristine Store are addressed only by their SHA-1 checksum.
* The Pristine Store does not track which text relates to which repository
* and revision and path. The Pristine Store does not hold pristine copies
* of directories, nor of properties.
*
* The Pristine Store data is held in
* * the 'PRISTINE' table in the SQLite database (SDB), and
* * the files in the 'pristine' directory.
*
* This specification uses SDB transactions to ensure the consistency of
* writes and reads.
*
* ==== Invariants ====
*
* The operating procedures below maintain the following invariants.
* These invariants apply at all times except within the SDB txns defined
* below.
*
* * Each row in the PRISTINE table has an associated pristine text file
* that is not open for writing and is available for reading and whose
* content matches the columns 'size', 'checksum', 'md5_checksum'.
*
* ==== Operating Procedures ====
*
* The steps should be carried out in the order specified. (See rationale.)
*
* * To add a pristine, do the following inside an SDB txn:
* * Add the table row, and set the refcount as desired. If a row
* already exists, add the desired refcount to its refcount, and
* preferably verify the old row matches the new metadata.
* * Create the file. Creation should be fs-atomic, e.g. by moving a
* new file into place, so as never to orphan a partial file. If a
* file already exists, preferably leave it rather than replace it,
* and optionally verify it matches the new metadata (e.g. length).
*
* * To remove a pristine, do the following inside an SDB txn:
* * First, check refcount == 0, and abort if not.
* * Delete the table row.
* * Delete the file or move it away. (If not present, log a
* consistency error but, in a release build, return success.)
*
* * To query a pristine's existence or SDB metadata, the reader must:
* * Ensure no pristine-remove txn is in progress while querying it.
*
* * To read a pristine text, the reader must:
* * Ensure no pristine-remove txn is in progress while querying and
* opening it.
* * Ensure the pristine text remains in the store continuously from
* opening it for the duration of the read. (Perhaps by ensuring
* refcount remains >= 1 and/or by cooperating with the clean-up
* code.)
*
* ==== Rationale ====
*
* * Adding a pristine:
* * We can't add the file *before* the SDB txn takes out a lock,
* because that would leave a gap in which another process could
* see this file as an orphan and delete it.
* * Within the txn, the table row could be added after creating the
* file; it makes no difference as it will not become externally
* visible until commit. But then we would have to take out a lock
* explicitly before adding the file. Adding the row takes out a
* lock implicitly, so doing it first avoids an extra step.
* * Leaving an existing file in place is less likely to interfere with
* processes that are currently reading from the file. Replacing it
* might also be acceptable, but that would need further
* investigation.
*
* * Removing a pristine:
* * We can't remove the file *after* the SDB txn that updates the
* table, because that would leave a gap in which another process
* might re-add this same pristine file and then we would delete it.
* * Within the txn, the table row could be removed after removing the
* file, but see the rationale for adding a pristine.
* * In a typical use case for removing a pristine text, the caller
* would check the refcount before starting this txn, but
* nevertheless it may have changed and so must be checked again
* inside the txn.
*
* * In the add and remove txns, we need to acquire an SDB 'RESERVED'
* lock before adding or removing the file. This can be done by starting
* the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
* as the table row update). ### Would a 'SHARED' lock be sufficient,
* and if so would it be noticeably better?
*
* ==== Notes ====
*
* * This procedure can leave orphaned pristine files (files without a
* corresponding SDB row) if Subversion crashes. The Pristine Store
* will still operate correctly. It should be easy to teach "svn cleanup"
* to safely delete these. ### Do we need to define the clean-up
* procedure here?
*
* * This specification is conceptually simple, but requires completing disk
* operations within SDB transactions, which may make it too inefficient
* in practice. An alternative specification could use the Work Queue to
* enable more efficient processing of multiple transactions.
*
*
* REFERENCE COUNTING
* ==================
*
* The Pristine Store spec above defines how texts are added to and removed
* from the store. This spec defines how the addition and removal of
* pristine text references within the WC DB are co-ordinated with the
* addition and removal of the pristine texts themselves.
*
* One requirement is to allow a pristine text to be stored some
* time before the reference to it is written into the NODES table. The
* 'commit' code path, for example, needs to store a file's new pristine
* text somewhere (and the pristine store is an obvious option) and then,
* when the commit succeeds, update the WC to reference it.
*
* Store-then-reference could be achieved by:
*
* (a) Store text outside Pristine Store. When commit succeeds, add it
* to the Pristine Store and reference it in the WC; if commit
* fails, remove the temporary text.
* (b) Store text in Pristine Store with initial ref count = 0. When
* commit succeeds, add the reference and update the ref count; if
* commit fails, optionally try to purge this pristine text.
* (c) Store text in Pristine Store with initial ref count = 1. When
* commit succeeds, add the reference; if commit fails, decrement
* the ref count and optionally try to purge it.
*
* Method (a) would require, in effect, implementing an ad-hoc temporary
* Pristine Store, which seems needless duplication of effort. It would
* also require changing the way the commit code path passes information
* around, which might be no bad thing in the long term, but the result
* would not appear to have any advantage over method (b).
*
* Method (b) plays well with automatically maintaining the ref counts
* equal to the number of in-SDB references, at the granularity of SDB
* txns. It requires an interlock between adding/deleting references and
* purging unreferenced pristines - e.g. guard each of these operations by
* a WC lock.
* * Add a pristine & reference it => any WC lock
* (To prevent purging it while adding.)
* * Unreference a pristine => no lock needed.
* * Unreference a pristine & purge-if-0 => Same as doing these separately.
* * Purge any/all refcount==0 pristines => an exclusive WC lock.
* (To prevent adding a ref while purging.)
* * If a WC lock remains after a crash, then purge refcount==0 pristines.
*
* Method (c):
* * ### Not sure about this one - haven't thought it through in detail...
* * Add a pristine & reference in separate steps => any WC lock (?)
* * Remove a reference requires ... (nothing more?)
* * Find & purge unreferenced pristines requires an exclusive WC lock.
* * Ref counts are sometimes too high while a WC lock is held, so they
* are uncertain after a crash if WC locks remain, and need to be
* re-counted during clean-up.
*
* We choose method (b).
*
*
* === Invariants in a Valid WC DB State ===
*
* * No pristine text, even if refcount == 0, will be deleted from the store
* as long as any process holds any WC lock in this WC.
*
* The following conditions are always true outside of a SQL txn:
*
* * The 'checksum' column in each NODES table row is either NULL or
* references a primary key in the 'PRISTINE' table.
*
* * The 'refcount' column in each PRISTINE table row is equal to the
* number of NODES table rows whose 'checksum' column references this
* pristine row.
*
* The following conditions are always true
* outside of a SQL txn,
* when the Work Queue is empty:
* (### ?) when no WC locks are held by any process:
*
* * The 'refcount' column in a PRISTINE table row equals the number of
* NODES table rows whose 'checksum' column references that pristine row.
* It may be zero.
*
* ==== Operating Procedures ====
*
* The steps should be carried out in the order specified.
*
* * To add a pristine text reference to the WC, obtain the text and its
* checksum, and then do this while holding a WC lock:
* * Add the pristine text to the Pristine Store, setting the desired
* refcount >= 1.
* * Add the reference(s) in the NODES table.
*
* * To remove a pristine text reference from the WC, do this while holding
* a WC lock:
* * Remove the reference(s) in the NODES table.
* * Decrement the pristine text's 'refcount' column.
*
* * To purge an unreferenced pristine text, do this with an *exclusive*
* WC lock:
* * Check refcount == 0; skip if not.
* * Remove it from the pristine store.
*/
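
To make the Pristine Store add/remove procedures above easier to review, here is a rough sketch of them in Python using the standard sqlite3 module. It is only an illustration, not the proposed implementation: the table layout, the file layout (one file per SHA-1 under 'pristine/', temp files created in the store root), and the names open_store, text_path, add_pristine and remove_pristine are all assumptions made for the example. It takes the RESERVED lock with 'BEGIN IMMEDIATE' before touching the file, as the rationale suggests.

```python
import hashlib
import os
import sqlite3
import tempfile

# Hypothetical schema: only the columns the spec mentions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pristine (
    checksum     TEXT PRIMARY KEY,
    md5_checksum TEXT NOT NULL,
    size         INTEGER NOT NULL,
    refcount     INTEGER NOT NULL
)
"""


def open_store(root):
    """Open (or create) a toy store rooted at 'root'. Layout is an assumption."""
    os.makedirs(os.path.join(root, "pristine"), exist_ok=True)
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    db = sqlite3.connect(os.path.join(root, "wc.db"), isolation_level=None)
    db.execute(SCHEMA)
    return db


def text_path(root, sha1):
    return os.path.join(root, "pristine", sha1)


def add_pristine(db, root, data, refcount):
    """Add a pristine text: row first, then the file, all inside one txn."""
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    db.execute("BEGIN IMMEDIATE")  # acquire the RESERVED lock up front
    try:
        row = db.execute(
            "SELECT size, md5_checksum FROM pristine WHERE checksum = ?",
            (sha1,)).fetchone()
        if row is None:
            db.execute("INSERT INTO pristine VALUES (?, ?, ?, ?)",
                       (sha1, md5, len(data), refcount))
        else:
            # Row exists: verify the old metadata, then add to its refcount.
            if row != (len(data), md5):
                raise ValueError("existing row does not match new metadata")
            db.execute(
                "UPDATE pristine SET refcount = refcount + ? "
                "WHERE checksum = ?", (refcount, sha1))
        path = text_path(root, sha1)
        if not os.path.exists(path):
            # Write elsewhere, then move into place, so that a partial
            # file never appears under its final name.
            fd, tmp = tempfile.mkstemp(dir=root)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        db.execute("COMMIT")
    except BaseException:
        db.execute("ROLLBACK")
        raise
    return sha1


def remove_pristine(db, root, sha1):
    """Remove a pristine text if its refcount is 0. Returns True if removed."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT refcount FROM pristine WHERE checksum = ?",
            (sha1,)).fetchone()
        # Re-check the refcount inside the txn; it may have changed.
        removable = row is not None and row[0] == 0
        if removable:
            db.execute("DELETE FROM pristine WHERE checksum = ?", (sha1,))
            try:
                os.remove(text_path(root, sha1))
            except FileNotFoundError:
                pass  # the spec says: log a consistency error, still succeed
        db.execute("COMMIT")
    except BaseException:
        db.execute("ROLLBACK")
        raise
    return removable
```

Note that a crash between the file move and the COMMIT leaves an orphaned file with no row, which is the case the Notes section expects "svn cleanup" to handle. The sketch does not model WC locks or readers at all.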
- Julian
Received on 2011-02-15 16:07:27 CET