Re: FSFS7 repository corruption during commit with background upgrade

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Sun, 28 Sep 2014 20:28:19 +0200

On Fri, Sep 26, 2014 at 4:38 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:

> I was looking how fsfs upgrade code works and found particular fsfs7
> log-addressing repository corruption bug:
> 0. Repository has 999 revisions.
> 1. Client begins committing some data to fsfs v6 repository through Apache
> 2. Apache web server opens svn_fs_t, reads format file. At this point
> repository has format 6 and uses physical (classic) addressing.
> 3. Client changes txn content
> 4. Before committing change, admin upgrades this repository to fsfs7. FSFS
> upgrade code marks that log-addressing will be available from next
> shard,
> i.e. from revision 1000.
> 5. Apache web server starting committing txn: obtaining write-lock and
> writes
> protorev for r1000. Since svn_fs_t instance was cached for connection so
> it didn't know that revision 1000 should be log addressing and writes
> physical addressing revision without any error (!)
>

Commit succeeded, but repository is unreadable.
>

That is the case.

> I'm attaching patch with test reproducing this issue. The commit may
> fail in maintainer mode because txn will be verified before commit.
>

Thanks for the test case. I committed it as part of r1627949.

While the latter also fixes the upgrade scenario above, I think
we should put a section into the release notes telling users that
upgrading hot repositories is a bad idea.

I've been going through the code looking at how an outdated
svn_fs_t instance would interact with an upgraded repo and it
seems that it usually either does not care (e.g. rep sharing) or
errors out (global IDs, packed shards). But I don't see a way
to guarantee that this is always the case.

For repos / FS API level users, things are more tricky still.
We got lucky that svn_fs_pack does not take a svn_fs_t*.
Otherwise, we would see incomplete packs (revprops not
packed in f6+) or even a corruption (f7 pack lock not acquired
and multiple processes appending to the same pack file).

So, we might be safe for now but I don't see a way to be sure.

-- Stefan^2.
Received on 2014-09-28 20:28:47 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]