I'm trying to dump a very large (300GB) svn repo so we can split it up
into manageable sizes. However, I'm running into problems with
revision references. I don't know if these are bugs in svn or if our
repo somehow is corrupted, but here's an outline of the problems. Any
help would be appreciated. Explaining the issues in full will take
some space, so bear with me.
1. After filtering, there are Node-copyfrom-revs that refer to
filtered-out revisions.
In the dumpfile, rev X of path A has Node-copyfrom-path B and
Node-copyfrom-rev Y. However, rev Y didn't touch path B, so it's
filtered out, and svnadmin load dies with "svnadmin: E160006: No such
revision Z". (Note that it reports the nonexistent revision as Z,
which is in fact *not* Y. It's the nearest existing rev < Y.)
This is not a problem with the filter; the original dump of the repo
contains the same references. Of course, rev Y exists there, so there's
no problem loading it. If this is a bug, it presumably originated when
rev X was committed.
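To find which copy sources are affected before loading, a small scan of the dump text can list Node-copyfrom-rev values that name a revision absent from the stream. Below is a minimal, hypothetical C helper (count_dangling_copyfrom and is_present are illustrative names, not part of svn). It assumes Revision-number headers appear in increasing order and that header lines never occur inside node content; a real tool would guarantee the latter by skipping Content-length bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Binary search over revision numbers collected so far. Revision-number
 * headers are assumed to appear in increasing order, so revs is sorted. */
static int
is_present(const long *revs, size_t n, long rev)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (revs[mid] == rev)
        return 1;
      if (revs[mid] < rev)
        lo = mid + 1;
      else
        hi = mid;
    }
  return 0;
}

/* Hypothetical helper: count Node-copyfrom-rev headers that name a revision
 * which has no Revision-number header earlier in the dump text. A copy
 * source is always an earlier revision, so one forward pass suffices.
 * Line-based sketch: header lines inside node content would be misread.
 * Returns the count, or -1 if memory runs out. */
int
count_dangling_copyfrom(const char *dump)
{
  static const char rev_hdr[] = "Revision-number: ";
  static const char copy_hdr[] = "Node-copyfrom-rev: ";
  long *revs = NULL;
  size_t n = 0, cap = 0;
  int dangling = 0;
  const char *line = dump;

  assert(dump != NULL);
  while (*line)
    {
      if (strncmp(line, rev_hdr, sizeof rev_hdr - 1) == 0)
        {
          if (n == cap)
            {
              /* Grow the revision list geometrically. */
              size_t new_cap = cap ? cap * 2 : 64;
              long *grown = realloc(revs, new_cap * sizeof *revs);
              if (grown == NULL)
                {
                  free(revs);
                  return -1;
                }
              revs = grown;
              cap = new_cap;
            }
          revs[n++] = strtol(line + sizeof rev_hdr - 1, NULL, 10);
        }
      else if (strncmp(line, copy_hdr, sizeof copy_hdr - 1) == 0)
        {
          long from = strtol(line + sizeof copy_hdr - 1, NULL, 10);
          if (!is_present(revs, n, from))
            dangling++;
        }
      /* Advance to the start of the next line, if any. */
      line = strchr(line, '\n');
      if (line == NULL)
        break;
      line++;
    }
  free(revs);
  return dangling;
}
```

For example, a stream containing revisions 1, 4 and 7, where revision 7 copies from revision 5, yields one dangling reference.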
This appears to be the same problem reported previously in
http://svn.haxx.se/users/archive-2013-08/0560.shtml, but I saw no
resolution to that post.
I have attempted to bypass this problem by editing the dump stream so
that Node-copyfrom-rev entries pointing to nonexistent revisions
instead point to the nearest earlier existing revision. This seems to
fix the problem, but then I run into the next one:
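The remapping step of that workaround amounts to a floor lookup over the sorted list of revisions that survived filtering. A minimal C sketch, assuming that list has already been collected (nearest_existing_at_or_below is a hypothetical name, not an svn function):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the manual workaround: given the sorted
 * revision numbers that survived filtering, return the largest one that is
 * <= rev, or -1 if every surviving revision is newer than rev. */
long
nearest_existing_at_or_below(const long *revs, size_t n, long rev)
{
  size_t lo = 0, hi = n;

  assert(revs != NULL || n == 0);
  /* Find the first index whose revision is greater than rev. */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (revs[mid] <= rev)
        lo = mid + 1;
      else
        hi = mid;
    }
  /* Everything before lo is <= rev; the last of those is the answer. */
  return lo > 0 ? revs[lo - 1] : -1;
}
```

With surviving revisions 1, 4, 7 and 12, a reference to revision 5 becomes 4, while a reference to an existing revision such as 7 is left unchanged.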
2. svnadmin load crashes with a failed IS_VALID_FORWARD_RANGE assertion
The next problem occurs when svnadmin load encounters a mergeinfo
property. (The revision numbers in the mergeinfo properties also refer
to nonexistent revisions, just like the Node-copyfrom-rev entries
above, so they have also been remapped to the nearest existing revs.)
The mergeinfo encountered is "A:X-Y". When this is parsed in
renumber_mergeinfo_revs, it turns into (X-1)-Y, because
parse_rangelist subtracts one from firstrev. I don't know where this
-1 comes from, but when renumber_mergeinfo_revs calls
get_revision_mapping on X-1 (which does not exist in the filtered
stream) it gets a bogus rev. (It maps to the rev immediately preceding
X in the dump stream.) Y, however, maps correctly to the number in the
new loaded repo. This makes the start of the range greater than its
end, so the IS_VALID_FORWARD_RANGE assertion fires. If I manually call
get_revision_mapping(X), I get the correct rev for X in the loaded
repo, so the problem is caused by that -1.
I also see that if renumber_mergeinfo_revs gets an invalid rev from
the mapping (which, according to the comment, happens for the revision
immediately preceding the oldest revision in the stream), it uses the
loaded rev corresponding to the oldest revision in the load stream,
minus 1.
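For context on the -1: the comment on svn_merge_range_t in svn_types.h says that for a forward range, start is exclusive and end is inclusive, so mergeinfo "X-Y" (revisions X through Y) is stored as the half-open range (X-1, Y]. The following simplified C sketch illustrates that conversion for a single entry. struct merge_range and parse_one_range are hypothetical stand-ins, not the svn parser, and they ignore the "*" non-inheritable marker and comma-separated lists.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for svn_merge_range_t: for a forward range, start is
 * exclusive and end is inclusive. */
struct merge_range
{
  long start;
  long end;
};

/* Hypothetical parser for a single "X-Y" or "X" rangelist entry.
 * Returns 0 on success, -1 on malformed or reversed input. */
int
parse_one_range(const char *text, struct merge_range *out)
{
  char *endp;
  long first, last;

  assert(text != NULL && out != NULL);
  first = strtol(text, &endp, 10);
  if (endp == text || first < 1)
    return -1;
  if (*endp == '-')
    {
      const char *second = endp + 1;
      last = strtol(second, &endp, 10);
      if (endp == second || last < first)
        return -1;
    }
  else
    last = first;
  if (*endp != '\0')
    return -1;
  /* "X-Y" covers revisions X..Y inclusive, stored as the half-open range
     (X-1, Y]; this is where the -1 comes from. */
  out->start = first - 1;
  out->end = last;
  return 0;
}
```

So "10-20" parses to start 9 and end 20, and a single revision "7" parses to start 6 and end 7.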
If this is because the start rev in the merge_range is actually one
less than the first revision of the mergeinfo range, it seems that
rather than calling
    rev_from_map = get_revision_mapping(pb->rev_map, range->start);
it needs to do
    rev_from_map = get_revision_mapping(pb->rev_map, range->start + 1) - 1;
because while the two are equivalent when the revs are consecutive,
they differ when revs have been filtered out of the stream. If I make
this change, the load appears to proceed OK.
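The difference between the two lookups can be reproduced with a toy revision map that has gaps. In this C sketch, struct rev_map, map_rev and the two map_start_* functions are hypothetical stand-ins for pb->rev_map and get_revision_mapping, and INVALID_REV stands in for SVN_INVALID_REVNUM (which is -1). The toy lookup of a filtered-out revision simply fails rather than reproducing the exact bogus value described above; the point is only that mapping X and subtracting 1 gives a different answer from mapping X-1 once revisions are missing.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_REV (-1L) /* stand-in for SVN_INVALID_REVNUM, which is -1 */

/* Toy replacement for the rev_map hash: parallel arrays mapping revision
 * numbers from the dump stream to revision numbers in the new repository. */
struct rev_map
{
  const long *old_revs;
  const long *new_revs;
  size_t n;
};

/* Hypothetical analogue of get_revision_mapping: INVALID_REV when the
 * stream revision was never loaded (for example, because it was filtered). */
long
map_rev(const struct rev_map *map, long old_rev)
{
  size_t i;

  assert(map != NULL);
  for (i = 0; i < map->n; i++)
    if (map->old_revs[i] == old_rev)
      return map->new_revs[i];
  return INVALID_REV;
}

/* Current behaviour: map the exclusive start X-1 directly. */
long
map_start_direct(const struct rev_map *map, long start)
{
  return map_rev(map, start);
}

/* Proposed behaviour: map the first included revision X, then step back
 * one. The -1 is applied only after the validity check. */
long
map_start_via_first_rev(const struct rev_map *map, long start)
{
  long mapped = map_rev(map, start + 1);
  return mapped == INVALID_REV ? INVALID_REV : mapped - 1;
}
```

With stream revisions 1, 5 and 8 loaded as 1, 2 and 3, mergeinfo 5-8 parses to (4, 8]; looking up 4 fails, while looking up 5 and subtracting 1 gives (1, 3], i.e. loaded revisions 2 and 3. Because the sketch subtracts only after the check, an unmapped revision comes back as INVALID_REV rather than -2.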
Please CC me when replying, as I'm not subscribed to this list.
Received on 2015-01-14 23:48:59 CET