Re: Pristine store - spec

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Wed, 16 Feb 2011 14:06:13 +0200

Julian Foad wrote on Tue, Feb 15, 2011 at 15:06:43 +0000:
> Would anyone be able to review this spec please? I'm trying to get
> straight what locking / access control rules need to be.
>

Sure, review below. Having finished the review, though, I wonder if
I amn't missing some details of the big picture while studying all the
low-level details. (If I did, please point them out.)

> /*
> * THE PRISTINE STORE
> * ==================
> *
> * === Introduction ===
> *
> * The Pristine Store is the part of the Working Copy metadata that holds
> * a local copy of the full text of the base version of each WC file.

Currently, yes. But in general it's just a cache of the text at
non-directory (file,revision) tuples, right?

i.e., there's no reason it has to be a subset or superset of the BASE tree.

> *
> * Texts in the Pristine Store are addressed only by their SHA-1 checksum.
> * The Pristine Store does not track which text relates to which repository
> * and revision and path. The Pristine Store does not hold pristine copies
> * of directories, nor of properties.
> *
> * The Pristine Store data is held in
> * * the 'PRISTINE' table in the SQLite Data Base (SDB), and
> * * the files in the 'pristine' directory.
> *
> * This specification uses SDB transactions to ensure the consistency of
> * writes and reads.
> *

Nit:
This doesn't say how this spec ensures the disk is consistent with the DB.

> * ==== Invariants ====
> *
> * The operating procedures below maintain the following invariants.
> * These invariants apply at all times except within the SDB txns defined
> * below.
> *
> * * Each row in the PRISTINE table has an associated pristine text file
> * that is not open for writing and is available for reading and whose
> * content matches the columns 'size', 'checksum', 'md5_checksum'.
> *

* Once written, a pristine text file never changes.

> * ==== Operating Procedures ====
> *
> * The steps should be carried out in the order specified. (See rationale.)
> *
> * * To add a pristine, do the following inside an SDB txn:
> * * Add the table row, and set the refcount as desired. If a row
> * already exists, add the desired refcount to its refcount, and
> * preferably verify the old row matches the new metadata.
> * * Create the file. Creation should be fs-atomic, e.g. by moving a
> * new file into place, so as never to orphan a partial file. If a
> * file already exists, preferably leave it rather than replace it,
> * and optionally verify it matches the new metadata (e.g. length).

What happens if you crash (kill -9) between these two steps?

(Reading ahead: I assume that in that case you'll leave a stray wc lock
or wq entry behind?)

(OT: how about numbering or naming the steps and invariants, so it's
easier to refer to them?)

> *
> * * To remove a pristine, do the following inside an SDB txn:
> * * First, check refcount == 0, and abort if not.
> * * Delete the table row.
> * * Delete the file or move it away. (If not present, log a
> * consistency error but, in a release build, return success.)
> *
> * * To query a pristine's existence or SDB metadata, the reader must:
> * * Ensure no pristine-remove txn is in progress while querying it.
> *

Why isn't it sufficient to "SELECT (refcount > 0) WHERE checksum = ?1
FROM pristine", without any ensuring?

Yes, an 'rm' txn could be parallel --- but won't that be indistinguishable
from the case than an 'rm' tnx begun instantaneously after the above
query finished?

(Reading my "Reading ahead" comment above: okay, if there are some
higher-level locks, then maybe a parallel 'rm' txn isn't possible)

> * * To read a pristine text, the reader must:
> * * Ensure no pristine-remove txn is in progress while querying and
> * opening it.

And what if the txn started after you opened it and returned the file
handle (stream) to the caller?

Perhaps a wrapper stream that makes sure the correct file was read (eg,
by comparing the length of the data read from the stream to the stored
length, and returning an error if the stream ended prematurely)?

> * * Ensure the pristine text remains in the store continuously from
> * opening it for the duration of the read. (Perhaps by ensuring
> * refcount remains >= 1 and/or by cooperating with the clean-up
> * code.)

I find this vague: how does one perform the "ensure" steps?

> *
> * ==== Rationale ====
> *
> * * Adding a pristine:
> * * We can't add the file *before* the SDB txn takes out a lock,
> * because that would leave a gap in which another process could
> * see this file as an orphan and delete it.
> * * Within the txn, the table row could be added after creating the
> * file; it makes no difference as it will not become externally
> * visible until commit. But then we would have to take out a lock
> * explicitly before adding the file. Adding the row takes out a
> * lock implicitly, so doing it first avoids an extra step.

It's not an "extra" step if it's done anyway (implicitly) :)

> * * Leaving an existing file in place is less likely to interfere with
> * processes that are currently reading from the file. Replacing it
> * might also be acceptable, but that would need further
> * investigation.
> *

> * * Removing a pristine:
> * * We can't remove the file *after* the SDB txn that updates the
> * table, because that would leave a gap in which another process
> * might re-add this same pristine file and then we would delete it.
> * * Within the txn, the table row could be removed after creating the
> * file, but see the rationale for adding a pristine.
> * * In a typical use case for removing a pristine text, the caller
> * would check the refcount before starting this txn, but
> * nevertheless it may have changed and so must be checked again
> * inside the txn.
> *

> * * In the add and remove txns, we need to acquire an SDB 'RESERVED'
> * lock before adding or removing the file. This can be done by starting
> * the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
> * as the table row update). ### Would a 'SHARED' lock be sufficient,
> * and if so would it be noticably better?
> *

Does that mean "We need an SDB lock that blocks both readers and writers"
in non-sqlite-specific terms?

> * ==== Notes ====
> *
> * * This procedure can leave orphaned pristine files (files without a
> * corresponding SDB row) if Subvsersion crashes. The Pristine Store
> * will still operate correctly. It should be easy to teach "svn cleanup"
> * to safely delete these. ### Do we need to define the clean-up
> * procedure here?
> *

Yes please. (It bypasses some of the DB-keyed logic so I think it'll be
good to spell it out)

> * * This specification is conceptually simple, but requires completing disk
> * operations within SDB transactions, which may make it too inefficient
> * in practice. An alternative specification could use the Work Queue to
> * enable more efficient processing of multiple transactions.
> *

I won't call it simple unless you dereference that "Ensure" I mentioned
earlier :).

For example, would returning a pristine stream to a caller would require
some long-living state on the DB handle, to avoid it being prematurely
deleted?

> * REFERENCE COUNTING
> * ==================
> *
> * The Pristine Store spec above defines how texts are added and removed
> * from the store. This spec defines how the addition and removal of
> * pristine text references within the WC DB are co-ordinated with the
> * addition and removal of the pristine texts themselves.
> *
> * One requirement is to allow a pristine text to be stored some
> * time before the reference to it is written into the NODES table. The
> * 'commit' code path, for example, needs to store a file's new pristine
> * text somewhere (and the pristine store is an obvious option) and then,
> * when the commit succeeds, update the WC to reference it.
> *

To clarify: I believe you mean "Needs" as it's currently designed,
rather than "Needs" due to the inherent semantics of a 'commit'
operation. (Extreme example: 'commit' of rN could leave the wc at
rN-1.)

> * Store-then-reference could be achieved by:
> *
> * (a) Store text outside Pristine Store. When commit succeeds, add it
> * to the Pristine Store and reference it in the WC; if commit
> * fails, remove the temporary text.
> * (b) Store text in Pristine Store with initial ref count = 0. When
> * commit succeeds, add the reference and update the ref count; if
> * commit fails, optionally try to purge this pristine text.
> * (c) Store text in Pristine Store with initial ref count = 1. When
> * commit succeeds, add the reference; if commit fails, decrement
> * the ref count and optionally try to purge it.
> *
> * Method (a) would require, in effect, implementing an ad-hoc temporary
> * Pristine Store, which seems needless duplication of effort. It would
> * also require changing the way the commit code path passes information
> * around, which might be no bad thing in the long term, but the result
> * would not appear to have any advantage over method (b).
> *
> * Method (b) plays well with automatically maintaining the ref counts
> * equal to the number of in-SDB references, at the granularity of SDB
> * txns. It requires an interlock between adding/deleting references and
> * purging unreferenced pristines - e.g. guard each of these operations by
> * a WC lock.
> * * Add a pristine & reference it => any WC lock
> * (To prevent purging it while adding.)
> * * Unreference a pristine => no lock needed.
> * * Unreference a pristine & purge-if-0 => Same as doing these separately.
> * * Purge any/all refcount==0 pristines => an exclusive WC lock.
> * (To prevent adding a ref while purging.)
> * * If a WC lock remains after a crash, then purge refcount==0 pristines.
> *

I don't understand this paragraph. What does it propose to do in order
to avoid purging pristines that have refcount=0 while a commit is in
progress? Does it propose that 'commit' take a lock for the duration
from adding the pristine to incrementing its refcount?

> * Method (c):
> * * ### Not sure about this one - haven't thought it through in detail...
> * * Add a pristine & reference in separate steps => any WC lock (?)
> * * Remove a reference requires ... (nothing more?)
> * * Find & purge unreferenced pristines requires an exclusive WC lock.
> * * Ref counts are sometimes too high while a WC lock is held, so
> * uncertain after a crash if WC locks remain, so need to be re-counted
> * during clean-up.
> *

I think (c) has some good points. There's a variant, (c'), where
instead of overloading the refcount field you use a separate
refcounts_by_commitsinprogress flag; in that scheme, 'find & purge'
doesn't need a lock, it just needs to respect that flag.

> * We choose method (b).
> *
> *
> * === Invariants in a Valid WC DB State ===
> *
> * * No pristine text, even if refcount == 0, will be deleted from the store
> * as long as any process holds any WC lock in this WC.
> *

Surely you mean "any *other* process" :-)

> * The following conditions are always true outside of a SQL txn:
> *
> * * The 'checksum' column in each NODES table row is either NULL or
> * references a primary key in the 'pristine' table.
> *
> * * The 'refcount' column in each PRISTINE table row is equal to the
> * number of NODES table rows whose 'checksum' column references this
> * pristine row.
> *

Will we have wc-checks.sql triggers enforce these?

> * The following conditions are always true
> * outside of a SQL txn,
> * when the Work Queue is empty:
> * (### ?) when no WC locks are held by any process:
> *
> * * The 'refcount' column in a PRISTINE table row equals the number of
> * NODES table rows whose 'checksum' column references that pristine row.
> * It may be zero.
> *

IOW, the refcount may be zero if the work queue is non empty.

But a few pages above you said that this specification doesn't use the
work queue to manage the pristine files. Why is the work queue involved
here then? Does this spec use the work queue for the disk-tree files
but not for the .svn/pristine/ files?

> * ==== Operating Procedures ====
> *
> * The steps should be carried out in the order specified.
> *
> * * To add a pristine text reference to the WC, obtain the text and its
> * checksum, and then do this while holding a WC lock:
> * * Add the pristine text to the Pristine Store, setting the desired
> * refcount >= 1.
> * * Add the reference(s) in the NODES table.
> *

So the first bullet will take an SDB lock, release it, and then the
reference will be added while the WC lock is held (and within a separate
SDB txn/lock)?

> * * To remove a pristine text reference from the WC, do this while holding
> * a WC lock:
> * * Remove the reference(s) in the NODES table.
> * * Decrement the pristine text's 'refcount' column.
> *
> * * To purge an unreferenced pristine text, do this with an *exclusive*
> * WC lock:
> * * Check refcount == 0; skip if not.
> * * Remove it from the pristine store.

Why is 'purge an unreferenced pristine text' part of the operating
procedures for incrementing/decrementing refcounts? It doesn't belong
here, does it?

(i.e., there is a 'manage the pristine store' layer --- first part of
your email --- and a 'manage the references' layer, here)

> */
>
>
> - Julian
>
>
Received on 2011-02-16 13:11:29 CET

This message: [ Message body ]
Next message: Stefan Sperling: "Re: contribute for subversion project"
Previous message: Philip Martin: "Re: #3641 vetoed"
In reply to: Julian Foad: "Pristine store - spec"
Next in thread: Branko Čibej: "Re: Pristine store - spec"
Reply: Branko Čibej: "Re: Pristine store - spec"
Reply: Julian Foad: "Re: Pristine store - spec"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]