On Wed, May 27, 2015 at 6:35 PM, Julian Foad <julianfoad_at_gmail.com> wrote:
> Stefan Fuhrmann wrote:
>> Alright. I gave it a bit more thought now.
>> Whenever we encounter this mismatch, something pretty
>> bad likely happened to the repo - such as a failed restore
>> attempt. In turn, we can expect those situations to be
>> very rare - which means we can afford some disruption
>> for the user.
>> I suggest that we do 3 things:
>> * log the warning - for future reference, for being picked
>> up by monitoring tools etc.
> We already do that.
Oh, absolutely. I just didn't mention it.
>> * clear the rep-cache.db
> Clearing the cache and continuing operation may make subsequent
> commits much larger than they should be, and there is no easy way to
> undo that if it happens.
Rep-sharing typically reduces the repo size by anywhere from 25%
(e.g. Apache) to 60% (WordPress, inexperienced users using plain
ADD for tags).
Assuming that most rep-sharing is relatively local, i.e. over the
span of a "few" revisions, e.g. due to catch-up merges between
branches, most of the inefficiency will only be temporary.
In short: no major impact.
> Attempting to clear the rep cache might itself fail in some way,
> depending on what kind of corruption has happened to it. It would also
> destroy the evidence of what went wrong.
That is a good point. Two good points, actually.
>> * fail the current commit
>> That way, we can be quite sure that only valid data gets
> Failing the current commit will ensure that no potentially bad (but
> undiagnosed) response from the rep cache has already been used in an
> earlier part of the transaction. I suppose that's what you're thinking
> of. That makes sense to me.
Yes, that, and the rep-cache is also being used to validate the
incoming data - even if it is very unlikely that we mess up the
server-side SHA1 calculation of the fulltext stream.
>> Alternatively, we could block any commit
>> (inventing some new repo state) until the admin resolves
>> the situation manually. Not sure which one I would prefer.
> I suggest this is the best option, unless we specifically design and
> the administrator specifically chooses an option to have higher
> availability at the expense of disk space, fault diagnosis, and so on.
We could add a "continue-upon-failure" option to the
[rep-cache] section in fsfs.conf. Default would be "false".
If set to true, commits would not be held off by rep-cache
failures but the rep-cache would be disabled. If set to
false, the repo goes into a r/o state.
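To make the proposed semantics concrete, here is a rough sketch in
Python rather than actual FSFS code. The "continue-upon-failure"
option and its [rep-cache] section are only the proposal from this
mail; the RepoState names and the function are made up for
illustration.

```python
import configparser
from enum import Enum


class RepoState(Enum):
    # Hypothetical states, named for this sketch only.
    REP_CACHE_DISABLED = "rep-cache-disabled"  # commits go on, no rep-sharing
    READ_ONLY = "read-only"                    # commits blocked until admin acts


def state_after_rep_cache_failure(conf_text: str) -> RepoState:
    """Pick the repo state once a rep-cache failure has been detected."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Proposed option; a missing section or option means "false".
    keep_going = parser.getboolean(
        "rep-cache", "continue-upon-failure", fallback=False
    )
    return RepoState.REP_CACHE_DISABLED if keep_going else RepoState.READ_ONLY
```

With the default being "false", an untouched fsfs.conf would send
the repo into the r/o state, so trading diagnosability for
availability stays an explicit admin decision.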
>> On top of that, we should handle the other rep-cache.db
>> consistency checks (e.g. head vs. rev of latest entry)
>> the same way.
> That makes sense.
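For the head vs. latest-entry check, something along these lines
is what I have in mind. It is a Python sketch, not the actual C
code, and it assumes rep-cache.db is an SQLite file with a
"rep_cache" table that has a "revision" column; the function name
is invented for illustration.

```python
import sqlite3


def rep_cache_is_ahead_of_head(db: sqlite3.Connection, youngest: int) -> bool:
    """Return True if the cache references a revision newer than HEAD."""
    # Assumed layout: table "rep_cache" with an integer "revision" column.
    (latest,) = db.execute("SELECT MAX(revision) FROM rep_cache").fetchone()
    # MAX() over an empty table yields NULL (None): nothing to contradict HEAD.
    return latest is not None and latest > youngest
```

A True result would then take the same path as the mismatch
discussed above: log it, fail the commit, and apply whatever
policy is configured.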
> I suggest all of this should be treated as a possible future
> enhancement, not anything urgent.
I agree. In particular because it will require a format bump
for putting the "r/o" or "corruption" indicator somewhere.
Received on 2015-05-29 09:26:58 CEST