> > if a commit occurs during the rep-cache.db verification, this can lead to:
> > 1. a post-commit error "database is locked"
> > 2. new representations will not be added in the rep-cache.db
> > 3. deduplication does not work for new data committed at this time
> > 4. commits work with delays.
>
> As I said, you accurately describe the observed behaviour. However,
> given the misunderstanding upthread, I would still like to ask you to
> make it unambiguously clear which of those four items are requisites of
> your use-case.
I'm not sure I correctly understand the question about the requisites of the
use case. The use case I am talking about is running verify on a hot
repository, and the points listed above are the negative consequences that
occur in this case. I think the best way forward is to fix all of these
consequences, if possible. The proposed patch does exactly that.
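For illustration, items 1 and 2 can be reproduced outside Subversion with a
minimal Python sketch. It assumes SQLite's default rollback-journal mode and a
writer with no busy timeout. The rep_cache table layout and the function name
are made up for the example and are not the real rep-cache.db schema or any
Subversion API. One connection stands in for a long-running verify that keeps
a read transaction open, and a second one stands in for the post-commit insert:

```python
import sqlite3


def commit_during_locked_read(db_path):
    """Return the error text a writer gets while a reader holds its lock.

    Hypothetical stand-in for "commit during verify"; not Subversion code.
    """
    # isolation_level=None: the module issues no implicit BEGIN/COMMIT.
    setup = sqlite3.connect(db_path, isolation_level=None)
    setup.execute(
        "CREATE TABLE rep_cache (hash TEXT PRIMARY KEY, revision INTEGER)")
    setup.execute("INSERT INTO rep_cache VALUES ('aaaa', 1)")
    setup.close()

    reader = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0: fail immediately instead of retrying while the file is busy.
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        # "verify": the open read transaction keeps a SHARED lock on the file.
        reader.execute("BEGIN")
        reader.execute("SELECT hash FROM rep_cache").fetchall()
        try:
            # "post-commit": the writer cannot get the EXCLUSIVE lock it
            # needs to commit while the reader's transaction is still open.
            writer.execute("INSERT INTO rep_cache VALUES ('bbbb', 2)")
            return None
        except sqlite3.OperationalError as exc:
            return str(exc)
    finally:
        reader.execute("ROLLBACK")
        reader.close()
        writer.close()
```

With a non-zero busy timeout the writer would instead keep retrying for that
long before giving up, which roughly corresponds to item 4.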
> I look to hearing Denis's concerns with the sharding approach.
It seems to me that this approach has at least the following potential
problems:
- There may be considerable difficulties in supporting multiple databases.
For example, it may be necessary to open not one, but up to all existing
databases during a commit, which may affect commit performance. In addition,
if we ever need atomic operations on the entire rep-cache, we will have to
use the ATTACH DATABASE statement [1] with the master journal [2], which I
think is not used anywhere now and is not supported by all journaling modes.
- I assume that reading and verification of all entries in one shard will be
performed while holding a SQLite lock, because otherwise we are back to
variations of the proposed patch and the sharding approach is not necessary
at all. But if the main part of the verification (for example, reading the
revision content) is performed while holding the lock, the problem may still
occur in some cases, because this part can potentially take a long time (for
example, if the repositories are located on a network share). So the problem
will not be completely fixed.
- If I'm not mistaken, this approach requires a format bump, so it does not
fix the problem for existing repositories. An upgrade would also have to
split the existing data into shards in some form, which means that a fast
in-place upgrade probably will not fix the problem either.
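To make the first point more concrete, here is a minimal Python sketch of what
an atomic update across two shards could look like with ATTACH DATABASE. The
shard files, the shard1 alias, the table layout and the function name are
hypothetical. As far as I understand the SQLite documentation, the cross-file
commit is only atomic across a crash when the journal mode is not WAL (and the
main database is not :memory:), which is the limitation mentioned above:

```python
import sqlite3


def insert_across_shards(main_path, shard_path, main_hashes, shard_hashes):
    """Insert into two database files inside one transaction.

    Hypothetical illustration of ATTACH DATABASE; not Subversion code.
    """
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(main_path, isolation_level=None)
    try:
        # ATTACH must run outside a transaction.
        conn.execute("ATTACH DATABASE ? AS shard1", (shard_path,))
        for schema in ("main", "shard1"):
            conn.execute(
                "CREATE TABLE IF NOT EXISTS %s.rep_cache "
                "(hash TEXT PRIMARY KEY)" % schema)

        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.executemany("INSERT INTO main.rep_cache VALUES (?)",
                             [(h,) for h in main_hashes])
            conn.executemany("INSERT INTO shard1.rep_cache VALUES (?)",
                             [(h,) for h in shard_hashes])
            # One COMMIT covers both files.
            conn.execute("COMMIT")
        except sqlite3.Error:
            # A failure in either file undoes the changes in both.
            conn.execute("ROLLBACK")
            raise
    finally:
        conn.close()
```

Note that every shard touched by the operation has to be attached and locked
for the duration of the transaction, which is the performance concern above.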
> In this case, the tradeoff would seem to be among:
>
> - ship 1.14's «verify» and require «build-repcache» to be run afterwards;
>
> - ship the «verify» in the OP, about whose correctness we are less
> certain, but which doesn't require running «build-repcache» afterwards;
Speaking of tradeoffs, I would like to note that these options are not
equivalent in terms of visible behavior, because requiring build-repcache to
be run afterwards does not fix 1), 3) and 4).
[1] https://www.sqlite.org/lang_attach.html
[2] https://www.sqlite.org/tempfiles.html#master_journal_files
Regards,
Denis Kovalchuk
Received on 2020-05-13 17:32:40 CEST