Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sat, 23 May 2020 01:45:39 +0000

Johan Corveleyn wrote on Fri, 22 May 2020 10:02 +0200:
> On Fri, May 22, 2020 at 12:16 AM Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> >
> > Johan Corveleyn wrote on Thu, 21 May 2020 17:53 +0200:
> > > Trying to summarize this thread a bit. I apologize in advance if I
> > > forgot something, or have misrepresented any of the points that were
> > > raised (feel free to correct / add).
> > >
> >
> > I have additions, below.
> >
> > > Denis summed up the following problems that might happen while
> > > 'verify' locks the repcache.db:
> > >
> > > > 1. a post-commit error "database is locked"
> > > > 2. new representations will not be added in the rep-cache.db
> > > > 3. deduplication does not work for new data committed at this time
> > > > 4. commits work with delays.
> > >
> > > We have also established that the new tool build-repcache is not
> > > suitable for post-factum fixing of 3). It does not reprocess already
> > > committed revisions.
> >
> > Note, however, that (3) is simply an observation that Denis made about
> > the semantics of the incumbent code. We do not know of a user who needs
> > rep-cache.db entries to be added so soon after the commit that running
> > Â«build-repcacheÂ» after Â«verifyÂ» finishes would be too late (or impractical).
> >
> > > We are currently considering two approaches to address these issues:
> > >
> >
> > Hold on. Those issues are what happens *whenever* the post-commmit FS
> > processing INSERTs are starved. They happen when a Â«verifyÂ» starves
> > INSERTs, but they are known to have at least one other case: when two
> > commits happen in quick succession, the INSERTs that happen during
> > post-commit FS processing of the first commit can starve the second
> > commit's INSERTs. (Don't we have a jira issue for this variant? I
> > thought we did, but my searches came up dry.)
> >
> > Which is to say, the presentation of the matter as "four issues to be
> > fixed" isn't accurate. There are _eight_ issues to consider: (1) to (4)
> > when they are caused by a Â«verifyÂ» starving INSERTs, and (1) to (4) when
> > they are caused by an Â«INSERTÂ» starving INSERTs.
> >
> > Thus, your analysis of the pros and cons is incomplete: it glosses over
> > the case of an Â«INSERTÂ» starving other INSERTs.
>
> Good point. So those "4 problems" are all consequences of "rep-cache
> insertion starvation". One possible cause of this might be 'verify',
> but there are other known causes, such as 'locked by another commit'.
> I'm not sure how likely that last form of starvation is in practice,
> though (but if you've seen a jira issue of it, hmmm, maybe).
>

It would happen on /repos/asf every time certain PMCs commit a new
release's javadocs. (I'm not sure whether it still does; those PMCs
may have switched to git since I last looked.)

> > Furthermore, over the course of the thread several other ideas have been
> > floated, intended to address various subsets of the eight issues. The
> > "Perform INSERTs asynchronously" idea, for example (raised by me, but
> > inspired by a remark of Brane's), can basically fix all eight issues by
> > itself. There was also the idea to stop marshalling post-commit FS
> > processing errors to the client, which would fix both variants of (1)
> > but none of the other six. I think there were other ideas as well, but
> > I've run out of time for this thread for tonight, sorry.
>
> Ack.
> So IIUC these are addressing the consequences of the starvation (which
> might help regardless of the cause), rather than taking away the
> locking-causes.
>

Some and some.

- The OP is preventative for the Â«verifyÂ» cause and a no-op for other
causes.

- Sharding rep-cache is preventative: it effectively makes the critical
  section shorter in duration, for all existing use-cases.

- Performing INSERTs asynchronously (and polylithically, i.e., in
  NÂ O(1)-sized transactions) is preventative for the cases of INSERTs
  starving either other INSERTs or readers, and a no-op for the case of
  readers starving INSERTs.

  (Speaking of which, here's another thought: add an option flag to
  build-repcache that makes it wait indefinitely in case it's locked out
  of rep-cache.db by a concurrent process. Denis, WDYT?)

- Not marshalling post-commit FS processing errors to the client is
  purely a failure mode handling thing.

- â‹®

> I think both 'angles' are quite useful. We might even want to combine
> ... mitigating the consequences, as well as avoiding some of the
> causes for the starvation (because even with consequence-mitigation
> things might not turn out ideal, or only be mitigated at the cost of
> X, so fixing some known causes can still be valuable).
>

Yeah, for example, the "don't marshal" thing probably makes sense
regardless of anything else. There's a reason GitHub doesn't page its
users when one of the disks in its RAID arrays needs replacementâ€¦

Cheers,

Daniel
Received on 2020-05-23 03:45:58 CEST

This message: [ Message body ]
Next message: Greg Stein: "Re: Lists history (was: Re: commit failed due to "backwards-sliding source views" - what do I do?)"
Previous message: Johan Corveleyn: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
In reply to: Johan Corveleyn: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]