Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sun, 10 May 2020 18:25:44 +0000

Denis Kovalchuk wrote on Sat, 09 May 2020 23:37 +0300:
> > What I'm trying to do here is to get a _problem statement_ to which your
> > patch is a solution, so we can consider other solutions.
>
> Partially repeating the first letter from this thread, the problem is the
> following: if a commit occurs during the rep-cache.db verification, this can
> lead to a post-commit error "database is locked" and new representations will
> not be added in the rep-cache.db. So a verification of a hot repository leads
> to user visible errors. Furthermore, deduplication does not work for new data
> committed at this time. And commits work with delays.

Okay. Let me quote this again with annotations added:

> if a commit occurs during the rep-cache.db verification, this can lead to:
> 1. a post-commit error "database is locked"
> 2. new representations will not be added in the rep-cache.db
> 3. deduplication does not work for new data committed at this time
> 4. commits work with delays.

As I said, you accurately describe the observed behaviour. However,
given the misunderstanding upthread, I would still like to ask you to
make it unambiguously clear which of those four items are requisites of
your use-case.

Would your use-case be adequately addressed if #1, #2, and #3 were fixed
(whether by committing your patch or otherwise), but #4 were not?

Would your use-case be adequately addressed if #1, #2, and #4 were fixed
(whether by committing your patch or otherwise), but #3 were not?

Would your use-case be adequately addressed if #1, #3, and #4 were fixed
(whether by committing your patch or otherwise), but #2 were not?

Would your use-case be adequately addressed if #2, #3, and #4 were fixed
(whether by committing your patch or otherwise), but #1 were not?

Thanks in advance for clarifying this.

My overall understanding from your previous emails was that only #2 is
a requisite.

Once this is settled, we can begin to consider other solutions which
will be acceptable to all involved parties.

> So running verify for a hot repository is problematic.

To be more precise, verifying a live repository can cause concurrent
processes to incur delays and report errors due to being unable to
read/write rep-cache.db, which in turn can necessitate build-repcache
runs.

> in reality repositories can always be hot and so it is not clear when
> to verify them at all.

The admin could deploy a workaround (e.g., verify a copy of the
repository, or split the repository into smaller repositories, etc.).

Alternatively, the admin might bite the bullet and verify the repository
hot anyway, incurring issues #1 through #4 above. That is, they might
assess that course of action as preferable to the risk not detecting
a corruption.

I agree that it would be nice to allow the admin to verify their
repository without incurring issues #1 through #4. However, I don't
agree that the patch in the OP is the best way to address those issues.

> Speaking about correctness, I think the implementation in the patch is correct
> and as far as I understand there are no concerns about this.

I have no concerns about how faithfully the patch implements the
approach you had chosen. I have concerns about the soundness of that
approach.

> Regarding the problem with verification depending on the guarantees
> provided by SQLite: If we cannot rely on SQLite guarantees, then I
> think we should not rely on the guarantee that the table does not
> change when it is read using one statement (which the current
> implementation relies on). In other words we should either rely on all
> SQLite guarantees or not rely on any.

I don't see things quite the same way as you.

For starters, while I'm not familiar with SQLite's implementation,
I expect that an implementation that locks out writers entirely will be
less susceptible to bugs than one that permits concurrent writes. In
other words, the assumption that «verify» in trunk makes is presumably
less likely to be violated than the assumption that the patch would have
it make.

In general, the extents to which «verify» and «commit» should rely on SQLite
are not _necessarily_ the same. They might _happen_ to be the same by
chance, but the fact that «commit» makes some assumption doesn't _imply_
that «verify» should make the same assumption.

However, it's certainly fair to question how much each use-case should
rely on SQLite's guarantees. We have many options, such as:

- Rely on SQLite guarantees freely in both «verify» and «commit».
(that's what the patch does)

- Rely on SQLite guarantees to a lesser extent in «verify» than
in «commit». (that's what trunk does)

- Rely on SQLite guarantees in «verify» and/or «commit» to a lesser
  extent than trunk does. (Unclear what this means exactly. Sharding
  rep-cache.db and using our own out-of-band locking? Replacing
  rep-cache.db with some non-SQLite storage? Something else?)

- …

> All alternative approaches where an administrator takes additional steps do not
> solve the problem in my opinion. Because it turns out that it’s better not to
> verify the hot repository at all, because additional steps will be required.

First of all, let's be clear: by "additional steps" you refer to running
«build-repcache» after «verify».

In most use-cases, an administrator who had chosen not to verify a repository
_because that would have required them to run build-repcache afterwards_
would have been fired.

> Speaking about other approaches, such as sharding rep-cache.db: It may be a
> good approach, but I think there may and probably will be other associated
> problems that are currently unknown.

You have just committed FUD.

Explain what problems those might be.

> I think the new proposed approach is an improvement of the current one, which
> also solves the problem. But if there are still significant concerns about that
> approach, I do not have anything else to offer at the moment. I may try to
> return to this topic if I have any additional thoughts.

You are welcome to return at any time. However, whether your return
will be fruitful will depend on what approach you take. If you continue
to refrain from openly considering solutions other than the one you
proposed, you will continue to experience dev@ as a brick wall too high
to throw patches over. On the other hand, if you participate in the
discussion not with the intent to sell us on a particular, predetermined
solution but with the intent to jointly improve a collaborated-upon
product, you'll find we're a welcoming bunch.

Daniel
Received on 2020-05-10 20:26:04 CEST

This message: [ Message body ]
Next message: Johan Corveleyn: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
Previous message: Daniel Shahaf: "Re: [PATCH] fix unable to load *.pyd files with Python 3.8.x on Windows"
In reply to: Denis Kovalchuk: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
Next in thread: Johan Corveleyn: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
Reply: Johan Corveleyn: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]