Re: [PATCH V1] Fix the hotcopy messing up the destination db/current for old repositories

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Wed, 18 Jun 2014 12:27:13 +0200

On Tue, Jun 17, 2014 at 6:31 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Hi,
>
> Here is something I was working on at the hackathon. This is the problem
> I reported in http://svn.haxx.se/dev/archive-2014-02/0187.shtml
> In brief, the problem lies in the way we use the recovery code when
> hotcopying
> these old repositories. We grab the YOUNGEST revision from the source, do
> the
> hotcopy and then use the recovery procedure in order to determine the
> node-id
> and the copy-id to stamp in the destination db/current file. The problem
> is
> that proper recovery requires a *full* scan of the FS, which we do not do.
> As a consequence, the db/current in the destination is messed up and that
> is
> shown by the XFail'ing fsfs_hotcopy_old_with_propchanges() test.
>

Yes, the problem is real: HEAD might not touch a path
that belongs to the latest copy / copied sub-tree. Hence,
we won't catch the highest copy ID by simply looking
at the youngest revision.

Anyway, this "use recover to obtain db/current contents" approach seems
> broken
> from the very start. I would like to fix this problem by switching to
> something more obvious. We could atomically read the (YOUNGEST, NODE-ID,
> COPY-ID) tuple from the destination and write it to the destination at the
> very moment the hotcopy finished working. Here is a series of three
> patches,
> which I am willing to commit one-by-one. Please note that *I did not yet
> write
> the log messages* for this patch series (probably they are going to include
> parts of what I've written below). I am working on them at this very
> moment.
> And sorry for a lot of text :)
>
> In brief, I have the following plan:
>
> - Extend the existing fsfs_hotcopy_old_with_propchanges() test. We now
> know
> that the problem is not actually in the property changes, but in the
> node-
> and copy-id changes. We can extend the test (which would still be an
> @XFail on trunk) in order to test *more*.
>

This is the first patch of the series.
>

extend-the-existing-xfail-test v1 looks good to commit.

> - Fix the problem itself. The idea of the fix is pretty simple -- we stop
> using the recover()-related functions when doing the hotcopy for
> repositories
> which have the old FS format. Previously we did use the recovery
> mechanism
> in order to recover the next-node-id and the next-copy-id for the hotcopy
> target, but did it *wrong*:
>
> * The thing is that proper recovery actually needs a full rev-by-rev scan
> of the filesystem, otherwise it might actually miss some of the next-
> or
> copy-ids. That's exactly what we encountered here, because now on
> trunk
> we run the recovery *only* for the youngest revision of the destination
> and hence are messing the node ids in the target 'current' file.
>
> * We could probably switch to a full recovery scan for this case (that
> would be simpler in the patch terms), but that actually sucks. You do
> not want to run a full recovery scan in order for the backup (!) to
> work.
> So, I've fixed this by atomically reading the 'db/current' contents of
> the
> source and updating it as-is in the destination. The diff might look a
> bit scary, but this is just a consequence of me extracting the revision
> copying logic into two separate functions (for old and new FS formats
> accordingly).
>

So, in effect, we are actually doing *less* work than before.
I like.

> The new logic is pretty simple -- if we encounted an FS with a new
> format,
> just stick to the old code. If we see the FS format 1 or 2, we use
> the new
> approach, which is much simpler. We do not have to deal with the
> packs and
> shards and stuff like that and simply copy the revisions and revprops (
> sticking to the old approach which copies only files which differ in
> terms
> of filestamps). Then we atomically update the 'db/current' in the
> destination (we've atomically read the source 'db/current' contents
> just
> a bit earlier).
>
> This is the second patch of the series.
>

That patch in itself is o.k.. However, you should update
get_next_revision_ids() in transaction.c to use the new
utility function (or entirely replace it).

> - Cleanup a bit. Now, because the hotcopy code does not need the recovery
> code (which seems logical), we can remove the corresponding private API,
> namely svn_fs_fs__find_max_ids(). As a consequence, the id recovery code
> now is "private" in the subversion/libsvn_fs_fs/recovery.c file and that
> makes sense from the design point of view.
>
> This is the third patch of the series.
>

Removing the internal API is fine; however you might have
kept the code as a static function without the svn_fs_fs__
name prefix. There is no need to manually inline it into the
caller - but that's just me giving an opinion. I'd be fine with
committing that patch as is.

> In case this might be interesting to you, here is a checklist I've run
> against
> the trunk with all three patches applied:
>
> - Run svnadmin_tests.py with Debug / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-packing / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-sharding / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=3 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=6 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=8 / Windows
> - Run svnadmin_tests.py with Release / Windows
> - Run svnadmin_tests.py with Release and --fs-type=bdb / Windows
> - Run svnadmin_tests.py with Release and --fs-type=fsx / Windows
> - Run svnadmin_tests.py with Release and --parallel / Windows
> - Run svnadmin_tests.py with Release and --http-dir / Windows
> - Run all tests / Ubuntu
> - Run hotcopy for the new format serf repository, verify it, check
> db/current
> - Run hotcopy for the new format googletest repository, verify it,
> check db/current
> - Run hotcopy for the old format serf repository, verify it, check
> db/current
> - Run hotcopy for the old format googletest repository, verify it,
> check db/current
> - Run hotcopy for the PACKED new format serf repository, verify it,
> check db/current
> - Run hotcopy for the PACKED new format googletest repository, verify it,
> check db/current
> - Run hotcopy for the PACKED old format serf repository, verify it,
> check db/current
> - Run hotcopy for the PACKED old format googletest repository, verify it,
> check db/current
> - Somehow test the incremental hotcopy (say, abort the hotcopy for a
> sharded repository in the middle of it...)
> - Do the previous test with packed shards
> - Test that the test with changes still fails on trunk (itself)
> - Patch the test in order to run it with a normal repository
> version (> 3) / Windows
>
> What do you think?
>

Wow my total response is shorter than the mere list of tests
you ran ;) Good job.

Cheers!

-- Stefan^2.
Received on 2014-06-18 12:27:49 CEST

This message: [ Message body ]
Next message: Evgeny Kotkov: "Re: [PATCH V1] Fix the hotcopy messing up the destination db/current for old repositories"
Previous message: Julian Foad: "Re: --strict vs --no-newline"
In reply to: Evgeny Kotkov: "[PATCH V1] Fix the hotcopy messing up the destination db/current for old repositories"
Next in thread: Evgeny Kotkov: "Re: [PATCH V1] Fix the hotcopy messing up the destination db/current for old repositories"
Reply: Evgeny Kotkov: "Re: [PATCH V1] Fix the hotcopy messing up the destination db/current for old repositories"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]