Evgeny Kotkov wrote on Thu, May 21, 2015 at 18:23:22 +0300:
> SVN_ERR(svn_repos_verify_fs2(repos, 1, 5, NULL, NULL, NULL, NULL, pool));
>
> ...would return two different errors depending on the binaries being used,
> assuming that one of the revisions in the [r1:r5] range is corrupted:
>
> (With 1.8.13) E160004: Checksum mismatch in item at offset 0 of
> length 59 bytes in file path/asf/db/revs/0/2
>
> (With 1.9.0-rc1) E165011: Repository 'path/asf' failed to verify
...
> Thoughts?
I think the incumbent code doesn't use SVN_ERR_REPOS_CORRUPT (E165011)
correctly.
The keep-going implementation simply uses SVN_ERR_REPOS_CORRUPT to wrap
whatever error the FS layer reported. That's wrong, for two reasons:
- The SVN_ERR_REPOS_CORRUPT code should be used to indicate corruption
of repos-layer on-disk data (such as $REPOS_DIR/db not existing or
$REPOS_DIR/format being a directory). The code does not look for such
conditions. In fact, since the repos layer got around to calling into
the FS layer, the repos layer itself is almost certainly integral.
- The repos layer has no business second-guessing the FS layer's choice
of error code. For example, a permission error on a rev file could
happen and does not indicate corruption, or a user might have pressed
^C during svn_fs_verify()'s execution. The incumbent code assumes
that any non-SVN_ERR_CANCELLED code indicates a corruption; that
assumption is wrong.
So, the keep-going code should stop using SVN_ERR_REPOS_CORRUPT to
indiscriminately wrap svn_fs_verify()'s error code. That should make
the E160004 (SVN_ERR_FS_CORRUPT) error visible, addressing Evgeny's
issue. (I am almost sure that's exactly what Evgeny's patch does.)
Furthermore, I think the "if (found_corruption)" block at the end of
svn_repos_verify_fs3() should also avoid using SVN_ERR_REPOS_CORRUPT,
since that would be the wrong code to use in some circumstances (for
example, if all revision files had permission errors). The code should
either say what it _does_ know for a fact — "N out of M revisions failed
to verify" — or start inspecting the FS errors to determine whether they
indicate corruption or transient errors (which isn't always possible,
since the repos layer does not know the context the error was raised in).
In fact, since the aforementioned two uses of SVN_ERR_REPOS_CORRUPT are
the only two uses of that code, I conclude that I am of the opinion that
that error code should be deleted entirely from the 1.9 codebase. We
can always revive it in 1.10 if needed.
Cheers,
Daniel
Received on 2015-05-23 13:07:31 CEST