Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification

From: Denis Kovalchuk <denis.kovalchuk_at_visualsvn.com>
Date: Tue, 5 May 2020 14:35:17 +0300

> Hang on. In your previous reply you wrote that commits that "complete
> with a post-commit error" was a problem the patch was designed to solve,
> but now you say that's not an immediate concern. Which is it? We can't
> design a solution if it's not clear what set of problems the solution is
> supposed to solve.

I don't think there is a contradiction somewhere. What I was trying to say is:
- The problem is that rep-cache verification leads to errors for other writes
to the rep-cache. For example, this happens during commit post-processing.
- One visible consequence of this problem is that commits can hang.
- Another visible consequence is that commits complete with post-commit errors.
- Another visible consequence is that new data will not be deduplicated.
- I think the root cause of problem is that entire rep-cache verification runs
with holding a SQLite lock for a long time.
- I don't think that there are issues with the existing way to report
post-commit errors to clients. Also, I don't think that changing it would be
relevant to the problem.

> > For example, 'svnadmin verify' can be checking rep-cache for a long time and
> > 'svnadmin build-repcache' will keep completing with an error in this case. All
> > commits during this time will work without deduplication for new data.
>
> I take it that in your use-case it's important that a commit's rep-cache rows
> should be added as soon after the commit as possible, then?

I think it is important that entries don't get lost and do not make
deduplication worse. The rep-cache verification can take a long time for
real-world reportistories, for example an hour or more. And all commits during
this time will work without deduplication for new data and I think that
currently there is no way to fix it. In case of regular verifications, this can
be a significant issue.

> Yes, and it's up to the repository administrator to provide such
> guarantees when scheduling Â«verifyÂ» and Â«build-repcacheÂ» operations on
> their repository.

I think that in general it may be difficult to have such guarantees. For
example, verification can be performed with the API or by third-party tools.

And based on the above it would mean that it is necessary to also prohibit
commits during verification. The limitation seems too strong and so I think it
would be better to fix the problem with holding a SQLite lock during entire
verification.

> For example, how about sharding rep-cache.db to multiple separate
> databases, according to the most significant nibbles of the hash value?
> This way, instead of having one big database, we'd have 2**(4N) smaller
> databases.

If I am not mistaken, the sizes of such shards will still be O(N), but not
O(1), as in the patch. Therefore, this problem can still occur at some
rep-cache sizes.

> I understand this argument. However, at the end of the day, the patch
> makes Â«verifyÂ» read a database as it's being written to, which is more
> likely to run into bugs (either in our code or in SQLite's), so I'm not
> a fan of it.
>
> Maybe I'm being overly careful.

I think that concurrent reading and writing to rep-cache.db is already possible
in the case of concurrent commits. It is a bit unclear to me why the (optional)
verification has to be different from the commit in this aspect. Especially
considering that the current approach leads to the problem.

> P.S. How about using creating a temporary copy of rep-cache.db, and then
> have Â«verifyÂ» iterate that at its leisure and delete it when done with it?

I think that compared to the patch this approach has a problem that every
verify will lead to a lot of write operations. For example, the size of
rep-cache.db in the ASF repository is 1.1 GB.

Just in case, I will be happy to work on any specific things in the patch, if
there are some things you think can be improved.

Regards,
Denis Kovalchuk
Received on 2020-05-05 13:35:34 CEST

This message: [ Message body ]
Next message: Julian Foad: "Email moderators"
Previous message: Yasuhito FUTATSUKI: "Re: [PATCH] Fix differences of SWIG_PY_OPTS between Unix and Windows"
In reply to: Daniel Shahaf: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
Next in thread: Daniel Shahaf: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"
Reply: Daniel Shahaf: "Re: [PATCH] Fix "database is locked" error when committing during rep-cache.db verification"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]