[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: FSFS rep-cache validation

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 23 Jan 2014 21:55:08 +0100

On Wed, Jan 22, 2014 at 10:19 PM, Philip Martin <philip_at_codematters.co.uk>wrote:

> On IRC today we were discussing the validation code in
> svn_fs_fs__set_rep_reference. When inserting a row into the rep-cache
> the insert can fail because the row already exists. The usual way this
> would happen is that two parallel commits add content with the
> same checksum at different locations, e.g.:
> svn import some_file file://`pwd`/repo/f1
> svn import some_file file://`pwd`/repo/f2
> That's harmless and nothing goes wrong, one of the commits populates the
> rep-cache.
> Another problem is that the only caller of svn_fs_fs__set_rep_reference
> always passes reject_dup=FALSE so there is code that is never
> exercised. Perhaps we should remove it?
> A more serious problem is if somebody edits a repository and ends up
> with a rep-cache that has references to revisions that do not exist.
> This could happen if somebody removes revision files and edits 'current'
> but doesn't run 'recover'. Or if somebody uses an old backup with a
> newer rep-cache. We have attempted to catch this by having
> svn_fs_fs__get_rep_reference return an error if the rep-cache refers to
> future revisions. However the code isn't working:
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
> $ svnadmin verify repo
> * Verifying metadata at revision 0 ...
> * Verifying repository metadata ...
> svnadmin: E165011: No representation found at offset 276 for item 4 in
> revision 1
> svnadmin: E165011: Repository 'repo' failed to verify
> Despite the verify error the new r1 is correct represented, it doesn't
> use the erroneous row from the rep-cache, and "svn cat" will work.
> However the commit also means that the erroneous row no longer refers to
> a revision in the future and so a subsequent commit may use the row and
> be corrupt.
> The problem is that svn_fs_fs__get_rep_reference calls
> svn_fs_fs__ensure_revision_exists which returns the error
> SVN_ERR_FS_NO_SUCH_REVISION while the caller,
> transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT.
> We could change the error returned by svn_fs_fs__get_rep_reference or
> the error expected by get_shared_rep. Either of those would cause the
> second commit above to fail. However that doesn't really solve the
> problem. When the erroneous row is present we really need to stop any
> commit that would make the erroneous row refer to a revision even if
> that commit itself doesn't use the row:
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svn -mm mkdir file://`pwd`/repo/X
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
> Perhaps commit should always check for the highest revision in the
> rep-cache and fail if it is greater than HEAD? This might be considered
> overkill to help people who have introduced corruption by editing the
> repository. However when such an erroneous row is present the failure
> mode is nasty: a commit succeeds but is corrupt and content is lost.
> The difficult bit here is that we do not have an SQLite index on
> rep_cache.revision.

But we do write the database strictly in revision order.
So, if we could simply read the latest record once upon
opening the DB. If that refers to a future revision, read
"current" and compare. If the DB it still "in the future",
abort txn, i.e. prevent any future commit until rep-cache.db
gets deleted by the admin.

-- Stefan^2.
Received on 2014-01-23 21:55:40 CET

This is an archived mail posted to the Subversion Dev mailing list.