Hi Daniel.  Thanks for this review.
On Wed, 2011-02-16, Daniel Shahaf wrote:
> Julian Foad wrote on Tue, Feb 15, 2011 at 15:06:43 +0000:
> > Would anyone be able to review this spec please?  I'm trying to get
> > straight what locking / access control rules need to be.
> > 
> 
> Sure, review below.  Having finished the review, though, I wonder if
> I amn't missing some details of the big picture while studying all the
> low-level details.  (If I did, please point them out.)
> 
> > /*
> >  * THE PRISTINE STORE
> >  * ==================
> >  *
> >  * === Introduction ===
> >  *
> >  * The Pristine Store is the part of the Working Copy metadata that holds
> >  * a local copy of the full text of the base version of each WC file.
> 
> Currently, yes.  But in general it's just a cache of the text at
> non-directory (file,revision) tuples, right?
> i.e., there's no reason it has to be a subset or superset of the BASE tree.
Correct.  I'll re-write the intro to say it's inherently just a blob
store and then describe roughly what libsvn_wc uses it for.
> >  *
> >  * Texts in the Pristine Store are addressed only by their SHA-1 checksum.
> >  * The Pristine Store does not track which text relates to which repository
> >  * and revision and path.  The Pristine Store does not hold pristine copies
> >  * of directories, nor of properties.
> >  *
> >  * The Pristine Store data is held in
> >  *  * the 'PRISTINE' table in the SQLite Data Base (SDB), and
> >  *  * the files in the 'pristine' directory.
> >  *
> >  * This specification uses SDB transactions to ensure the consistency of
> >  * writes and reads.
> >  *
> 
> Nit:
> This doesn't say how this spec ensures the disk is consistent with the DB.
Do you mean this intro should be more specific and say sthg like
"This spec defines how the store operates so as to ensure
   * consistency between disk and DB;
   * atomicity of, and arbitration between, add and delete and read
operations."
> >  * ==== Invariants ====
> >  *
> >  * The operating procedures below maintain the following invariants.
> >  * These invariants apply at all times except within the SDB txns defined
> >  * below.
> >  *
> >  * * Each row in the PRISTINE table has an associated pristine text file
> >  *   that is not open for writing and is available for reading and whose
> >  *   content matches the columns 'size', 'checksum', 'md5_checksum'.
> >  *
> 
>      * Once written, a pristine text file never changes.
Right.  That's implied by the fact that its SHA-1 checksum is assumed to
be unique, but it's clearer if I say it explicitly.
> >  * ==== Operating Procedures ====
> >  *
> >  * The steps should be carried out in the order specified.  (See rationale.)
> >  *
> >  * * To add a pristine, do the following inside an SDB txn:
> >  * 	* Add the table row, and set the refcount as desired.  If a row
> >  *        already exists, add the desired refcount to its refcount, and
> >  *        preferably verify the old row matches the new metadata.
> >  * 	* Create the file. Creation should be fs-atomic, e.g. by moving a
> >  *        new file into place, so as never to orphan a partial file.  If a
> >  *        file already exists, preferably leave it rather than replace it,
> >  *        and optionally verify it matches the new metadata (e.g. length).
> 
> What happens if you crash (kill -9) between these two steps?
> 
> (Reading ahead: I assume that in that case you'll leave a stray wc lock
> or wq entry behind?)
Not at this level of the spec.  Here, all that happens is that the SDB
txn prepares to commit a change to the table, but if we crash that txn
would be abandoned and that change would never happen.
On the other hand, if we crash after those two steps but before
committing (finalizing) the SDB txn, then the file would exist on disk
as an "orphan", which is harmless.
Moving up a layer to the "Reference Counting" layer of the spec, then
yes a WC lock is taken out before running this txn, so yes that WC lock
would be left behind if we crash anywhere in this sequence (between the
steps or before or after them).
> (OT: how about numbering or naming the steps and invariants, so it's
> easier to refer to them?)
Good idea, now that it's (presumably) becoming more stable.
> >  *
> >  * * To remove a pristine, do the following inside an SDB txn:
> >  * 	* First, check refcount == 0, and abort if not.
> >  * 	* Delete the table row.
> >  * 	* Delete the file or move it away. (If not present, log a
> >  *        consistency error but, in a release build, return success.)
> >  *
> >  * * To query a pristine's existence or SDB metadata, the reader must:
> >  * 	* Ensure no pristine-remove txn is in progress while querying it.
> >  *
> 
> Why isn't it sufficient to "SELECT (refcount > 0) WHERE checksum = ?1
> FROM pristine", without any ensuring?
> 
> Yes, an 'rm' txn could be parallel --- but won't that be indistinguishable
> from the case than an 'rm' tnx begun instantaneously after the above
> query finished?
Hmm... I think you are right.
> (Reading my "Reading ahead" comment above: okay, if there are some
> higher-level locks, then maybe a parallel 'rm' txn isn't possible)
> 
> >  * * To read a pristine text, the reader must:
> >  * 	* Ensure no pristine-remove txn is in progress while querying and
> >  *        opening it.
> 
> And what if the txn started after you opened it and returned the file
> handle (stream) to the caller?
That's addressed by the next bullet point below.
> Perhaps a wrapper stream that makes sure the correct file was read (eg,
> by comparing the length of the data read from the stream to the stored
> length, and returning an error if the stream ended prematurely)?
I think that is not necessary - see Brane's reply about "constraints"
and deleting a file that still has open file handles.
> >  * 	* Ensure the pristine text remains in the store continuously from
> >  *        opening it for the duration of the read. (Perhaps by ensuring
> >  *        refcount remains >= 1 and/or by cooperating with the clean-up
> >  *        code.)
> 
> I find this vague: how does one perform the "ensure" steps?
Well, I'm not quite sure yet.  TODO.
> >  *
> >  * ==== Rationale ====
> >  *
> >  * * Adding a pristine:
> >  * 	* We can't add the file *before* the SDB txn takes out a lock,
> >  *        because that would leave a gap in which another process could
> >  *        see this file as an orphan and delete it.
> >  * 	* Within the txn, the table row could be added after creating the
> >  *        file; it makes no difference as it will not become externally
> >  *        visible until commit.  But then we would have to take out a lock
> >  *        explicitly before adding the file.  Adding the row takes out a
> >  *        lock implicitly, so doing it first avoids an extra step.
> 
> It's not an "extra" step if it's done anyway (implicitly) :)
> 
> >  * 	* Leaving an existing file in place is less likely to interfere with
> >  *        processes that are currently reading from the file.  Replacing it
> >  *        might also be acceptable, but that would need further
> >  *        investigation.
> >  *
> 
> +1
> 
> >  * * Removing a pristine:
> >  * 	* We can't remove the file *after* the SDB txn that updates the
> >  *        table, because that would leave a gap in which another process
> >  *        might re-add this same pristine file and then we would delete it.
> >  * 	* Within the txn, the table row could be removed after creating the
> >  *        file, but see the rationale for adding a pristine.
> >  * 	* In a typical use case for removing a pristine text, the caller
> >  *        would check the refcount before starting this txn, but
> >  *        nevertheless it may have changed and so must be checked again
> >  *        inside the txn.
> >  *
> 
> +1
> 
> >  * * In the add and remove txns, we need to acquire an SDB 'RESERVED'
> >  *   lock before adding or removing the file.  This can be done by starting
> >  *   the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
> >  *   as the table row update).  ### Would a 'SHARED' lock be sufficient,
> >  *   and if so would it be noticably better?
> >  *
> 
> Does that mean "We need an SDB lock that blocks both readers and writers"
> in non-sqlite-specific terms?
Yes.
> >  * ==== Notes ====
> >  *
> >  * * This procedure can leave orphaned pristine files (files without a
> >  *   corresponding SDB row) if Subvsersion crashes.  The Pristine Store
> >  *   will still operate correctly.  It should be easy to teach "svn cleanup"
> >  *   to safely delete these.  ### Do we need to define the clean-up
> >  *   procedure here?
> >  *
> 
> Yes please.  (It bypasses some of the DB-keyed logic so I think it'll be
> good to spell it out)
Okay, will do.
> >  * * This specification is conceptually simple, but requires completing disk
> >  *   operations within SDB transactions, which may make it too inefficient
> >  *   in practice.  An alternative specification could use the Work Queue to
> >  *   enable more efficient processing of multiple transactions.
> >  *
> 
> I won't call it simple unless you dereference that "Ensure" I mentioned
> earlier :).
> 
> For example, would returning a pristine stream to a caller would require
> some long-living state on the DB handle, to avoid it being prematurely
> deleted?
See above.
> >  * REFERENCE COUNTING
> >  * ==================
> >  *
> >  * The Pristine Store spec above defines how texts are added and removed
> >  * from the store.  This spec defines how the addition and removal of
> >  * pristine text references within the WC DB are co-ordinated with the
> >  * addition and removal of the pristine texts themselves.
> >  *
> >  * One requirement is to allow a pristine text to be stored some
> >  * time before the reference to it is written into the NODES table.  The
> >  * 'commit' code path, for example, needs to store a file's new pristine
> >  * text somewhere (and the pristine store is an obvious option) and then,
> >  * when the commit succeeds, update the WC to reference it.
> >  *
> 
> To clarify: I believe you mean "Needs" as it's currently designed,
> rather than "Needs" due to the inherent semantics of a 'commit'
> operation.  (Extreme example: 'commit' of rN could leave the wc at
> rN-1.)
Yes.
> >  * Store-then-reference could be achieved by:
> >  *
> >  *   (a) Store text outside Pristine Store.  When commit succeeds, add it
> >  *       to the Pristine Store and reference it in the WC; if commit
> >  *       fails, remove the temporary text.
> >  *   (b) Store text in Pristine Store with initial ref count = 0.  When
> >  *       commit succeeds, add the reference and update the ref count; if
> >  *       commit fails, optionally try to purge this pristine text.
> >  *   (c) Store text in Pristine Store with initial ref count = 1.  When
> >  *       commit succeeds, add the reference; if commit fails, decrement
> >  *       the ref count and optionally try to purge it.
> >  *
> >  * Method (a) would require, in effect, implementing an ad-hoc temporary
> >  * Pristine Store, which seems needless duplication of effort.  It would
> >  * also require changing the way the commit code path passes information
> >  * around, which might be no bad thing in the long term, but the result
> >  * would not appear to have any advantage over method (b).
> >  *
> >  * Method (b) plays well with automatically maintaining the ref counts
> >  * equal to the number of in-SDB references, at the granularity of SDB
> >  * txns.  It requires an interlock between adding/deleting references and
> >  * purging unreferenced pristines - e.g. guard each of these operations by
> >  * a WC lock.
> >  *   * Add a pristine & reference it => any WC lock
> >  *     (To prevent purging it while adding.)
> >  *   * Unreference a pristine => no lock needed.
> >  *   * Unreference a pristine & purge-if-0 => Same as doing these separately.
> >  *   * Purge any/all refcount==0 pristines => an exclusive WC lock.
> >  *     (To prevent adding a ref while purging.)
> >  *   * If a WC lock remains after a crash, then purge refcount==0 pristines.
> >  *
> 
> I don't understand this paragraph.  What does it propose to do in order
> to avoid purging pristines that have refcount=0 while a commit is in
> progress?  Does it propose that 'commit' take a lock for the duration
> from adding the pristine to incrementing its refcount?
Yes: in the case of the way "commit" is currently implemented, where
"add a pristine" comes first and then "reference it" comes later, the
commit operation should hold a WC lock during that whole time.
> >  * Method (c):
> >  *   * ### Not sure about this one - haven't thought it through in detail...
> >  *   * Add a pristine & reference in separate steps => any WC lock (?)
> >  *   * Remove a reference requires ... (nothing more?)
> >  *   * Find & purge unreferenced pristines requires an exclusive WC lock.
> >  *   * Ref counts are sometimes too high while a WC lock is held, so
> >  *     uncertain after a crash if WC locks remain, so need to be re-counted
> >  *     during clean-up.
> >  *
> 
> I think (c) has some good points.  There's a variant, (c'), where
> instead of overloading the refcount field you use a separate
> refcounts_by_commitsinprogress flag; in that scheme, 'find & purge'
> doesn't need a lock, it just needs to respect that flag.
Food for thought, but I find that working carefully through these
scenarios is quite time consuming.  A large part of my aim here is to
try to find a good scheme without having to first define all the
possible schemes in detail.  Your comments are helping me to do that.
> >  * We choose method (b).
> >  *
> >  *
> >  * === Invariants in a Valid WC DB State ===
> >  *
> >  * * No pristine text, even if refcount == 0, will be deleted from the store
> >  *   as long as any process holds any WC lock in this WC.
> >  *
> 
> Surely you mean "any *other* process" :-)
Yes - something like that, though that's still not quite right.
> >  * The following conditions are always true outside of a SQL txn:
> >  *
> >  *   * The 'checksum' column in each NODES table row is either NULL or
> >  *     references a primary key in the 'pristine' table.
> >  *
> >  *   * The 'refcount' column in each PRISTINE table row is equal to the
> >  *     number of NODES table rows whose 'checksum' column references this
> >  *     pristine row.
> >  *
> 
> Will we have wc-checks.sql triggers enforce these?
The first one is enforced by constraints defined in wc-metadata.sql.
The second one is currently assisted by triggers defined in
wc-metadata.sql but not strictly enforced.  I would like to enforce it
but haven't looked at doing so yet.
> >  * The following conditions are always true
> >  *     outside of a SQL txn,
> >  *     when the Work Queue is empty:
> >  *     (### ?) when no WC locks are held by any process:
> >  *
> >  *   * The 'refcount' column in a PRISTINE table row equals the number of
> >  *     NODES table rows whose 'checksum' column references that pristine row.
> >  *     It may be zero.
> >  *
> 
> IOW, the refcount may be zero if the work queue is non empty.
No, I meant the number of refs does not necessarily match the refcount
column if the work queue is non empty.  When WQ is empty, refcount and
number of refs always match each other and may (both together) be zero
or more than zero.
> But a few pages above you said that this specification doesn't use the
> work queue to manage the pristine files.  Why is the work queue involved
> here then?  Does this spec use the work queue for the disk-tree files
> but not for the .svn/pristine/ files?
I think I've messed up this part.  I'll take another look.  I'm not sure
whether the Work Queue needs to be involved at all.
> >  * ==== Operating Procedures ====
> >  *
> >  * The steps should be carried out in the order specified.
> >  *
> >  * * To add a pristine text reference to the WC, obtain the text and its
> >  *   checksum, and then do this while holding a WC lock:
> >  *     * Add the pristine text to the Pristine Store, setting the desired
> >  *       refcount >= 1.
> >  *     * Add the reference(s) in the NODES table.
> >  *
> 
> So the first bullet will take an SDB lock, release it, and then the
> reference will be added while the WC lock is held (and within a separate
> SDB txn/lock)?
Yes.  That's to support cases like the current "commit" implementation.
In certain other cases, these could be collapsed into a single txn.
> >  * * To remove a pristine text reference from the WC, do this while holding
> >  *   a WC lock:
> >  *     * Remove the reference(s) in the NODES table.
> >  *     * Decrement the pristine text's 'refcount' column.
> >  *
> >  * * To purge an unreferenced pristine text, do this with an *exclusive*
> >  *   WC lock:
> >  *     * Check refcount == 0; skip if not.
> >  *     * Remove it from the pristine store.
> 
> Why is 'purge an unreferenced pristine text' part of the operating
> procedures for incrementing/decrementing refcounts?  It doesn't belong
> here, does it?
> 
> (i.e., there is a 'manage the pristine store' layer --- first part of
> your email --- and a 'manage the references' layer, here)
Well, it's because in the lower "manage the pristine store" layer,
purging depends only on refcount==0 (and no simultaneous add or remove).
At this higher "manage the references" layer, purging must be avoided
when the WC is in the middle of a WC-lock operation.  In other words,
this upper layer's "purge" is a wrapper that adds an extra level of
locking around the lower layer's "purge".  It's not safe to call the
lower-layer API directly.
Thanks for all this.  I'll do an update (and check it in like Hyrum
said).
- Julian
Received on 2011-02-16 20:28:03 CET