[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC/PATCH] svnadmin: recover/hotcopy erroring out for old FSFS repositories

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 23 Jan 2014 17:37:17 +0100

On Thu, Jan 23, 2014 at 11:00 AM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com
> wrote:

> Hi,
>
> I've noticed that svnadmin recover and hotcopy commands are broken for old
> repositories around trunk_at_1560210. The problem can be reproduced by
> creating
> a new --compatible-version=1.3 / 1.2 / 1.1 FSFS repository, doing something
> simple with it (say, running svn mkdir or svn checkout / add file /
> commit) and
> attempting to recover or hotcopy it:
> [[[
> (svnadmin-trunk recover REPOS)
> svnadmin: E200002: Serialized hash missing terminator
>
> (svnadmin-trunk hotcopy REPOS REPOS_BACKUP)
> svnadmin: E160006: No such revision 1
> ]]]
>
> Alternatively, this can be reproduced by running svnadmin_tests.py with,
> for
> example, --server-minor-version=3. This problem exists in both 1.8.x and
> trunk. In 1.8.x, however, the errors for both hotcopy and recover are
> visually
> identical:
> [[[
> (svnadmin-1.8.x recover REPOS)
> svnadmin: E200002: Serialized hash missing terminator
>
> (svnadmin-1.8.x hotcopy REPOS REPOS_BACKUP)
> svnadmin: E200002: Serialized hash missing terminator
> ]]]
>
> Overall, this looks like a regression since 1.7.x, where hotcopy works just
> fine. The 1.7.x recover also fails for these repositories with an
> "svnadmin: E000002: Can't open file 'X/db/revs/2': No such file or
> directory"
> error, but this is a well-known issue 4213 [1], which should be fixed by
> r1367683 [2].
>
> Being unable to recover and hotcopy old repositories with the most recent
> svnadmin version seems like an important issue, so I think it's worth a
> couple
> of tests. [See the attached patch #1]
>
> I did a quick investigation and spotted a couple of odd moments in the
> related code:
>
> 1) First of all, there is a reason for the different error messages in
> trunk
> and in the 1.8.x branch.
>
> This is a regression introduced in r1503742 [3], because now calling the
> svn_fs_fs__find_max_ids() routine during hotcopy creates a cyclic
> dependency. Hotcopy needs to update the db/current file, so it uses
> this
> routine to get the NEXT_NODE_ID / NEXT_COPY_ID for old repositories.
> However, this currently behaves the following way:
>
> hotcopy_update_current()
> ...
> (wants to update the db/current file, but doesn't know the next ids)
>
> svn_fs_fs__find_max_ids()
> svn_fs_fs__rev_get_root()
> svn_fs_fs__ensure_revision_exists()
> ...
> (reads the most recent db/current file in order to check the
> presence
> of this revision. it is reported missing, because we're right
> in the
> middle of updating the db/current file and cannot do that yet,
> because we still do not know the next ids)
>
> So, basically, we're calling a routine that assumes the db/current file
> is
> up-to-date in order to get the information on how to update the outdated
> db/current file. This ends with an "E160006: No such revision 1" error.
>
> 2) The reason for the "hash missing terminator" error is a bit trickier.
>
> The NEXT_NODE_ID / NEXT_COPY_ID recovery code recursively crawls
> through the dir representations in the db, however, it assumes that the
> EXPANDED_SIZE in the rep description always is non-zero. According to
> the docstring in subversion/libsvn_fs_fs/fs.h, this assumption is
> incorrect:
> [[[
> /* The size of the representation in bytes as seen in the revision
> file. */
> svn_filesize_t size;
>
> /* The size of the fulltext of the representation. If this is 0,
> * the fulltext size is equal to representation size in the rev file,
> */
> svn_filesize_t expanded_size;
> ]]]
>
> I did a small experiment with different svn (1.7 / 1.8 / trunk) versions
> writing dir reps to repositories with different --compatible-versions...
> [[[
> # text: REVISION OFFSET SIZE EXPANDED_SIZE
> svn 1.7, --compatible-version=1.3 repository (db/revs/1)
> text: 1 57 28 28
> svn 1.8, --compatible-version=1.3 repository (db/revs/1)
> text: 1 57 28 0
> ]]]
>
> ...and it looks like svn 1.7.x did have the EXPANDED_SIZE equal to
> SIZE, but
> this behavior changed with 1.8.x. In 1.8.x, EXPANDED_SIZE is set to 0
> for
> these representations, so, attemping to read them ends with an
> "E200002: Serialized hash missing terminator" error.
>
> See the attached 'rep-description-dumps' file.
>
> NOTE: This code is a part of the recover_find_max_ids() routine and,
> worth
> mentioning, has some history. There is r1404675 [4], which added the
> "pick
> the right one from EXPANDED_SIZE and SIZE" logic, however, it was
> reverted
> in r1404708 [5]. Unfortunately, I do not yet understand the reasoning
> behind this revert, so, I guess, I'll just have to continue
> investigating.
>
> 3) There is an "uniquifier" concept for the representations that makes the
> rep
> caching possible. Each uniquifier requires a new NODE_ID, so, for
> example,
> a transaction which requires 5 new NODE_IDs ends up with taking 10 of
> them.
> This assumption is not shared by the recovery procedure, which recovers
> the
> NEXT_NODE_ID / NEXT_COPY_ID for old repositories by finding the maximum
> NODE_ID / COPY_ID and bumping them by one.
>
> For example, for the old format Greek tree repository with db/current
> containing "1 14 1" the recovery procedure will replace the db/current
> contents with "1 13 1". '13' in this example is the recovered
> NEW_NODE_ID
> (but it also is the NODE_ID reserved for the uniquifier of the last
> "real"
> NODE_ID). As a consequence, hotcopying this repository (which utilizes
> the
> recover_find_max_ids() routine) will end up with *mismatching
> db/current*
> files in the hotcopy source and destination.
>
> It looks like making these uniquifiers for old repositories is
> unnecessary
> in the first place (they are only required for rep caching, which is
> only
> available for the newer >= SVN_FS_FS__MIN_REP_SHARING_FORMAT
> repositories).
>
> See set_uniquifier() in subversion/libsvn_fs_fs/transaction.c
>
> 4) The hotcopy_update_current() does something quite odd for old
> repositories.
>
> It grabs the MAX_NODE_ID / MAX_COPY_ID from svn_fs_fs__find_max_ids(),
> claims them to be the NEXT_NODE_ID / NEXT_COPY_ID and writes them into
> the
> db/current file. The maximum values found by the
> svn_fs_fs__find_max_ids()
> routine should be bumped by one (the same way it is done in
> recover_body(),
> subversion/libsvn_fs_fs/recovery.c).
>
> Not doing so will result in the hotcopy destination having incorrect
> NEXT_NODE_ID and NEXT_COPY_ID in the db/current file (pointing to
> already taken ids).
>
> If this might help to understand what is going on, I've made a
> quick-and-dirty
> patch for 1.8.x branch, which explicitly points the problems and rushes
> toward
> the green test suite. Doing this for 1.8.x allows to postpone the problem
> 1)
> and focus on 2-4), because, as far as I suspect, fixing 1) might get a bit
> messy. [See the attached patch #2]
>
> An appropriate fix for this issue undoubtedly requires a lot more work.
> I think I might be able to come up with a patch for the trunk in the nearby
> future (in case someone does not fix it earlier).
>
> [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4213
> [2] http://svn.apache.org/viewvc?view=revision&revision=r1367683
> [3] http://svn.apache.org/viewvc?view=revision&revision=r1503742
> [4] http://svn.apache.org/viewvc?view=revision&revision=r1404675
> [5] http://svn.apache.org/viewvc?view=revision&revision=r1404708
>
>
> Thanks and regards,
> Evgeny Kotkov
>

Hi Evgeny,

Thanks for the patches. I've been digging into the code
to figure out how and why the size==0 came about. But
I'll not be able to finish that bit before Saturday.

-- Stefan^2.
Received on 2014-01-23 17:37:50 CET

This is an archived mail posted to the Subversion Dev mailing list.