Mergeinfo containing r0 makes svnsync and svnadmin dump fail
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 3 Mar 2014 16:24:44 +0000 (GMT)
A customer found that 'svnsync' errored out when trying to sync a revision in which the source repository introduced some mergeinfo starting with r0, similar to this example:
$ svn propget svn:mergeinfo ^/foo@1234567
The svnsync error message was:
$ svnsync sync ...
We believe this mergeinfo entered the repository through a 1.6 server. It was committed in mid-2012. 1.7.x servers reject such commits, as do the later 1.6.x servers [1]. Probably 1.6 clients were in use too, although it may have been committed from a non-Subversion client such as git-svn.
Anyhow, the situation is that we have at least one Subversion repository containing data that the 1.7 server tools reject as invalid. 'svnsync' errors out. Even 'svnadmin dump' errors out at this revision if you specify any non-zero start revision, because it parses the mergeinfo to see if it points to a revision earlier than the start rev. Like this:
$ svnadmin dump --incremental -r5 repo
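To find the affected properties ahead of time, a property value can be screened for ranges that start at r0. What follows is only a rough Python sketch, not Subversion's own parser. It assumes the textual form of one "path:rangelist" entry per line, with comma-separated N or N-M elements and an optional trailing '*'. The function name and the sample value in the usage note are made up for illustration.

```python
def ranges_starting_at_r0(mergeinfo_text):
    """Return (path, element) pairs whose revision range starts at 0.

    Simplified reading of the svn:mergeinfo text form (an assumption):
    one "path:rangelist" per line, elements separated by commas.
    """
    hits = []
    for line in mergeinfo_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Paths may themselves contain ':', so split on the last one.
        path, sep, rangelist = line.rpartition(":")
        if not sep:
            continue  # not "path:ranges"; out of scope for this check
        for element in rangelist.split(","):
            element = element.strip()
            # Drop the non-inheritable marker, then take the range start.
            start = element.rstrip("*").split("-", 1)[0]
            if start.isdigit() and int(start) == 0:
                hits.append((path, element))
    return hits
```

For a hypothetical value such as "/bar:0-100,111", this reports the element "0-100" on path "/bar" and leaves "111" alone.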
What is our migration path for this data?
We can figure out a short term work-around, perhaps using the unsupported "SVNSYNC_UNSUPPORTED_STRIP_MERGEINFO" environment variable to bypass the mergeinfo change for each revision that adds or changes such mergeinfo, if there aren't too many of them and if they aren't present on active branches. (We can write a pre-commit hook to make svnsync stop after just one revision, since it doesn't have a --revision option.)
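The pre-commit hook idea could look roughly like the Python sketch below, installed on the mirror repository. It is an illustration under assumptions, not a tested recipe: the limit file name "sync-limit" and the helper commit_allowed() are hypothetical, and the only external command used is 'svnlook youngest'. The hook rejects any commit that would create a revision beyond the number stored in the limit file, so a sync run stops once it reaches that revision.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

# Hypothetical file inside the mirror repository directory holding the
# highest revision number that svnsync is currently allowed to create.
LIMIT_FILE = "sync-limit"


def commit_allowed(youngest, limit):
    """The pending commit would become revision youngest + 1."""
    return youngest + 1 <= limit


def main(argv):
    repos = argv[1]  # argv[2] is the transaction name, unused here
    # Hooks usually run with an empty environment, so svnlook may need
    # an absolute path in a real installation.
    out = subprocess.check_output(["svnlook", "youngest", repos])
    youngest = int(out.decode().strip())
    with open(os.path.join(repos, LIMIT_FILE)) as f:
        limit = int(f.read().strip())
    if commit_allowed(youngest, limit):
        return 0
    sys.stderr.write("sync limit r%d reached; commit rejected\n" % limit)
    return 1


# Subversion invokes pre-commit with two arguments: REPOS-PATH TXN-NAME.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Raising the number in the limit file by one before each run would then let exactly one revision through per 'svnsync sync' invocation, with or without the strip-mergeinfo variable set for that run.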
But for a proper fix?
In the past we decided that some known ways in which mergeinfo can be malformed should be silently corrected or tolerated.
* leading slash is required
* path-revs pointing to non-existent node-revs
* revision zero
* other parse errors
This all makes me a bit uneasy. We seem to have a number of data transformations going on at quite a low level, and I'm not sure what the canonical position is. I would like us to have a definition of what constitutes "the same mergeinfo" in a repository before and after dump/load, and a way of testing that.
Philip pointed out that it's a good idea for 'dump' to dump whatever is present, and not error out and not try to correct or normalize it. If any correction or normalization is to be done, 'load' is a better place to do it. That minimizes the risk of a damaged archive due to bugs, if you archive the dump file.
Clearly there are some things we should do:
* Make 'dump' not error out, but rather ignore the broken mergeinfo for the purposes of the warning that it refers to older revisions.
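As a rough illustration of that behaviour, here is a Python sketch of an "older revisions" check that skips anything it cannot interpret instead of failing. It is not how 'svnadmin dump' is implemented. The function name is hypothetical, and the simplified "path:rangelist" reading of the property text is an assumption.

```python
def refers_to_older_revs(mergeinfo_text, start_rev):
    """True if any well-formed range starts before start_rev.

    Elements that do not parse, or that start at r0, are ignored for
    the purpose of this warning rather than treated as errors.
    """
    for line in mergeinfo_text.splitlines():
        # Split on the last ':' because paths may contain colons.
        _path, sep, rangelist = line.strip().rpartition(":")
        if not sep:
            continue  # unparseable line: ignore it
        for element in rangelist.split(","):
            first = element.strip().rstrip("*").split("-", 1)[0]
            if not first.isdigit() or int(first) == 0:
                continue  # broken element: ignore it, do not fail
            if int(first) < start_rev:
                return True
    return False
```

With a start revision of 5, a value like "/bar:3-10" would trigger the warning, while "/bar:0-100" would simply be passed through without one.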
Other changes?
* Make 'svnsync sync' strip out the revision 0 from that mergeinfo? Or make it strip out the whole mergeinfo property if it fails to parse? Or just that line of the property value? (If we do something like that, I'd like us to do it everywhere we ignore bad mergeinfo, not just here.)
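To make the three options concrete, here is a small Python sketch that applies each of them to a property value. It is only an illustration under assumptions: the policy names and the function repair() are hypothetical, the parsing is a simplified "path:rangelist" reading rather than Subversion's parser, and treating "0-N" as "1-N" is just one possible meaning of "strip out the revision 0".

```python
def _without_r0(element):
    """Return the element with r0 removed, or None if nothing is left."""
    body = element.rstrip("*")
    suffix = element[len(body):]  # keeps a trailing '*', if any
    start, dash, end = body.partition("-")
    if not (start.isdigit() and int(start) == 0):
        return element  # does not start at r0: leave untouched
    if dash and end.isdigit() and int(end) >= 1:
        return ("1" if int(end) == 1 else "1-" + end) + suffix
    return None  # bare "0" or "0-0": nothing remains


def _is_bad(line):
    _path, _sep, ranges = line.rpartition(":")
    return any(_without_r0(e.strip()) != e.strip() for e in ranges.split(","))


def repair(mergeinfo_text, policy):
    lines = [l.strip() for l in mergeinfo_text.splitlines() if l.strip()]
    if policy == "drop-property":
        # Discard the whole value if any line contains r0.
        return "" if any(_is_bad(l) for l in lines) else "\n".join(lines)
    if policy == "drop-line":
        # Discard only the offending lines.
        return "\n".join(l for l in lines if not _is_bad(l))
    if policy == "strip-r0":
        # Rewrite the offending ranges and keep everything else.
        out = []
        for l in lines:
            path, sep, ranges = l.rpartition(":")
            fixed = (_without_r0(e.strip()) for e in ranges.split(","))
            kept = [f for f in fixed if f is not None]
            if kept:
                out.append(path + sep + ",".join(kept))
        return "\n".join(out)
    raise ValueError("unknown policy: " + policy)
```

Whichever policy is chosen, keeping it in one shared routine like this would make it easier to apply the same treatment everywhere bad mergeinfo is ignored.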
Thoughts?
- Julian
[1] Servers >= 1.6.18 for HTTP, >= 1.6.17 for the other protocols, reject unparseable mergeinfo -- see issue <http://subversion.tigris.org/issues/show_bug.cgi?id=3895>.
Received on 2014-03-03 17:25:21 CET
This is an archived mail posted to the Subversion Dev mailing list.