On Wed, May 27, 2015 at 1:11 PM, Julian Foad <julianfoad_at_gmail.com> wrote:
> Stefan Fuhrmann wrote:
> > Julian Foad wrote:
> >> @@ -2262,14 +2264,20 @@ get_shared_rep(representation_t **old_re
> >>
> >> /* A simple guard against general rep-cache induced corruption. */
> >> if ((*old_rep)->expanded_size != rep->expanded_size)
> >> {
> [...]
> >> + ### [JAF] Why should we assume the cache is corrupt rather
> than the
> >> + new rep is corrupt? Is this really the logic we
> want?
> >
> > Hm. We don't say "the cache is corrupt" but log a warning with
> > a "FS corruption" error code and tell the user that there is an
> > inconsistency / mismatch.
>
> I meant that we proceed with storing the new data into the repository
> (ignoring the cache), which assumes the new data is correct and the
> cache is wrong.
>
>> + Should we do something more forceful -- flag the
> cache as
> >> + unusable, perhaps -- rather than just logging a
> warning?
> >
> > I don't have a particularly strong opinion on that one. I guess
> > it is very unlikely that the mismatch has "legitimately" been
> > caused by e.g. a SHA1 collision. Maybe, we should reset /
> > clear the rep-cache.db at that point and say so in the warning.
>
> I don't know what would be best. It's probably "good enough" based on
> our empirical experience. But maybe we should think in terms of what
> guarantees we should expect from the cache. It doesn't have to return
> a result, but if it does then the result *must* be correct. If one
> result is demonstrably wrong (as in this code block) then maybe we
> should work on the assumption that the whole cache is bad, rather than
> run the risk that some other entries are wrong but happen to have a
> matching expanded-size.
>
Alright. I gave it a bit more thought now.
Whenever we encounter this mismatch, something pretty
bad likely happened to the repo - such as a failed restore
attempt. In turn, we can expect those situations to be
very rare - which means we can afford some disruption
for the user.
I suggest that we do 3 things:
* log the warning - for future reference, for being picked
up by monitoring tools etc.
* clear the rep-cache.db
* fail the current commit
That way, we can be quite sure that only valid data gets
committed. Alternatively, we could block any commit
(inventing some new repo state) until the admin resolves
the situation manually. Not sure which one I would prefer.
On top of that, we should handle the other rep-cache.db
consistency checks (e.g. head vs. rev of latest entry)
the same way.
-- Stefan^2.
Received on 2015-05-27 18:10:52 CEST