Deduplicating dump/load
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 6 Jan 2015 12:50:40 +0000
The dump and load code contains lots of duplication between svnadmin, svndumpfilter and svnrdump. We know that code duplication leads to inconsistency and bugs. See, for example, issue #4476 "Mergeinfo containing r0 makes svnsync and dump and load fail". We should not have allowed this duplication to happen, but we did, and if we are to avoid further decay we will eventually need to refactor the code to remove it.
Our dump/load modules are currently organized like this:
-----------------------------------------------------------------------|
svnadmin load
  [ svn_repos_load_fs5() ]
  dumpstream -> [parser] => [ get_fs_build_parser5() ] -> repos-API

svnrdump load
  [ svn_rdump__load_dumpstream() ]
  dumpstream -> [parser] => [ rdump-loader-vtable ] -> RA-API

svndumpfilter
  [ do_filter() ]
  dumpstream -> [parser] => [ filtering_vtable ] -> dumpstream

svnadmin dump
  [ svn_repos_dump_fs3() ]
  repos-API -> [ svn_repos_replay2 ] => [ dump_editor1 ] -> dumpstream

svnrdump dump
  [ replay_revisions() ]
  RA-API -> [ svn_ra_replay_range ] => [ dump_editor2 ] -> dumpstream
-----------------------------------------------------------------------|
We already use three instances of the same "parser" (svn_repos_parse_dumpstream3) -- Hooray! -- but a lot of duplication still remains.
First, we need to separate the filtering logic from the input/output conversion. We can do it like this (the double arrow "=>" represents the dumpstream API "svn_repos_parse_fns3_t"):
-----------------------------------------------------------------------|
svnadmin load
  dumpstream -> [parser] => [filter1] => [repos-writer] -> repos-API

svnrdump load
  dumpstream -> [parser] => [filter2] => [RA-writer] -> RA-API

svndumpfilter
  dumpstream -> [parser] => [filter3] => [stream-writer] -> dumpstream

svnadmin dump
  repos-API -> [repos-reader] => [filter4] => [stream-writer] -> dumpstream

svnrdump dump
  RA-API -> [RA-reader] => [filter5] => [stream-writer] -> dumpstream
-----------------------------------------------------------------------|
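To make the layering concrete, here is a minimal sketch in Python (chosen only for brevity; the real code is C). Every class and method name below is hypothetical: the three callbacks are a much reduced stand-in for the "=>" interface and do not mirror the actual members of svn_repos_parse_fns3_t. The point is only that a filter both implements the interface and forwards to another implementation of it, so any input module can be paired with any output module.

```python
class StreamWriter:
    """Hypothetical output module: records events as dumpstream-like header lines."""

    def __init__(self):
        self.lines = []

    def new_revision(self, rev, revprops):
        self.lines.append(f"Revision-number: {rev}")

    def new_node(self, path, action):
        self.lines.append(f"Node-path: {path}")
        self.lines.append(f"Node-action: {action}")

    def close_revision(self, rev):
        pass


class ParentDirFilter:
    """Hypothetical generic filter: prefixes each node path, then forwards downstream."""

    def __init__(self, downstream, parent_dir):
        self.downstream = downstream
        self.parent_dir = parent_dir.strip("/")

    def new_revision(self, rev, revprops):
        self.downstream.new_revision(rev, revprops)

    def new_node(self, path, action):
        self.downstream.new_node(f"{self.parent_dir}/{path}", action)

    def close_revision(self, rev):
        self.downstream.close_revision(rev)


def feed(events, receiver):
    """Stand-in for an input module (parser or reader) driving the chain."""
    for rev, revprops, nodes in events:
        receiver.new_revision(rev, revprops)
        for path, action in nodes:
            receiver.new_node(path, action)
        receiver.close_revision(rev)


# input => [filter] => [stream-writer]
writer = StreamWriter()
feed([(1, {}, [("trunk/a.txt", "add")])], ParentDirFilter(writer, "/project"))
```

Swapping StreamWriter for a repos-writer or RA-writer with the same three methods would leave the filter untouched, which is the property the diagram above is after.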
Before we can deduplicate the filtering, we need to analyze what operations each tool supports. Most or all of these are optional. Here is my initial tally.
svnadmin load:
* rev range
* renumber revs (and adjust copyfrom+mergeinfo)
* parent dir (and adjust copyfrom+mergeinfo)
* set/keep repository UUID
* run/bypass pre- and post-commit hooks
* run/bypass prop validation
* keep/ignore original commit date stamps
svnrdump load:
* renumber revs (and adjust copyfrom+mergeinfo)
* parent dir (and adjust copyfrom+mergeinfo)
* skip specified revprops
svndumpfilter:
* include/exclude paths
* drop empty revs
* renumber revs
* skip missing merge sources
* rewrite/preserve revprops in empty revs
svnadmin dump:
* rev range
* incremental/full first revision
* deltas
svnrdump dump:
* rev range
* incremental/full first revision
The next step is to decide which operations are generic filtering and which are specific to an input or output module.
Generic filtering:
* rev range (but more efficient if implemented in an input module)
* renumber revs
* parent dir
* skip specified revprops
* include/exclude paths
* drop empty revs
* skip missing merge sources?
* rewrite/preserve revprops in empty revs
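As an illustration of how two of these generic operations might compose, here is a small Python sketch. All names are hypothetical and the callback interface is a simplified stand-in, not the real parse-functions vtable. Note also that this simplified "drop empty revs" drops every revision that reaches it without nodes, which is an assumption and may not match the exact behaviour of any existing tool.

```python
class Collector:
    """Hypothetical sink that records the events it receives."""

    def __init__(self):
        self.events = []

    def new_revision(self, rev, revprops):
        self.events.append(("rev", rev))

    def new_node(self, path, action):
        self.events.append(("node", path, action))

    def close_revision(self, rev):
        self.events.append(("close", rev))


class ExcludePaths:
    """Drops nodes at or below any of the given path prefixes."""

    def __init__(self, downstream, prefixes):
        self.downstream = downstream
        self.prefixes = [p.strip("/") for p in prefixes]

    def _excluded(self, path):
        return any(path == p or path.startswith(p + "/") for p in self.prefixes)

    def new_revision(self, rev, revprops):
        self.downstream.new_revision(rev, revprops)

    def new_node(self, path, action):
        if not self._excluded(path):
            self.downstream.new_node(path, action)

    def close_revision(self, rev):
        self.downstream.close_revision(rev)


class DropEmptyRevs:
    """Holds each revision back until its first node arrives; drops it otherwise."""

    def __init__(self, downstream):
        self.downstream = downstream
        self.pending = None

    def new_revision(self, rev, revprops):
        self.pending = (rev, revprops)

    def new_node(self, path, action):
        if self.pending is not None:
            self.downstream.new_revision(*self.pending)
            self.pending = None
        self.downstream.new_node(path, action)

    def close_revision(self, rev):
        if self.pending is None:  # the revision was emitted, so close it
            self.downstream.close_revision(rev)
        self.pending = None


# parser => [exclude paths] => [drop empty revs] => sink
sink = Collector()
chain = ExcludePaths(DropEmptyRevs(sink), ["branches"])
for rev, nodes in [(1, [("trunk/a", "add")]), (2, [("branches/b", "add")])]:
    chain.new_revision(rev, {})
    for path, action in nodes:
        chain.new_node(path, action)
    chain.close_revision(rev)
```

The order of composition matters here: with DropEmptyRevs placed upstream of ExcludePaths, revision 2 would still be emitted, because it is not yet empty when DropEmptyRevs sees it.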
Input/output module functionality:
* rev range
    - all readers
* set/keep repository UUID
    - repos-writer
* run/bypass hooks
    - repos-writer
* run/bypass prop validation
    - repos-writer
* keep/ignore date stamps
    - repos-writer
* incremental/full first revision
    - repos/RA readers
* deltas
    - repos-reader?
So far, I have a working patch (attached) for "svnadmin load", splitting the filtering from the repos-writer. I am not sure whether to commit it as is; it's not perfect in a couple of details but it's pretty close. I intend to continue.
The biggest difficulty will likely be in deduplicating things like renumbering revisions, where the details of the functionality already differ slightly among the three tools and are neither well planned nor well documented. Most of the other parts I expect to be relatively straightforward.
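To show the kind of detail that has to be pinned down, here is a hedged Python sketch of a shared revision-renumbering helper. The class name and both method names are hypothetical. The policy for a copyfrom revision that was dropped (fall back to the nearest earlier revision that was kept) is an assumption chosen for illustration, not a description of what any of the three tools does today. Mergeinfo adjustment is left out.

```python
class RevisionMap:
    """Hypothetical helper: assigns new revision numbers and remaps copyfrom revs."""

    def __init__(self, first_new_rev):
        self.next_rev = first_new_rev
        self.old_to_new = {}

    def map_revision(self, old_rev):
        """Record and return the new number for a revision that is being kept."""
        new_rev = self.next_rev
        self.old_to_new[old_rev] = new_rev
        self.next_rev += 1
        return new_rev

    def map_copyfrom(self, old_rev):
        """Map a copyfrom revision; assumed policy: nearest kept revision <= old_rev."""
        candidates = [r for r in self.old_to_new if r <= old_rev]
        if not candidates:
            raise KeyError(f"no kept revision at or before r{old_rev}")
        return self.old_to_new[max(candidates)]


# Old revisions 1 and 3 are kept, old revision 2 is dropped;
# the target repository's next revision is 10.
revmap = RevisionMap(first_new_rev=10)
new_r1 = revmap.map_revision(1)
new_r3 = revmap.map_revision(3)
copyfrom_for_dropped_r2 = revmap.map_copyfrom(2)
```

Each tool currently has to answer the "what if the source revision is missing" question somewhere; putting that decision in one place is what would make any differences between the tools visible.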
Any thoughts or encouragement?
- Julian
This is an archived mail posted to the Subversion Dev mailing list.