
Deduplicating dump/load

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 6 Jan 2015 12:50:40 +0000

The dump and load code contains lots of duplication between svnadmin, svndumpfilter and svnrdump. We know that code duplication leads to inconsistency and bugs. See, for example, issue #4476 "Mergeinfo containing r0 makes svnsync and dump and load fail".

We should not have allowed this duplication to happen, but we did, and eventually if we are to avoid further decay we need to refactor it to remove the duplication.

Our dump/load modules are currently organized like this:

-----------------------------------------------------------------------|
svnadmin load
                [        svn_repos_load_fs5()         ]
  dumpstream -> [parser] => [ get_fs_build_parser5()  ] -> repos-API

svnrdump load
                [    svn_rdump__load_dumpstream()     ]
  dumpstream -> [parser] => [   rdump-loader-vtable   ] -> RA-API

svndumpfilter
                [             do_filter()             ]
  dumpstream -> [parser] => [    filtering_vtable     ] -> dumpstream

svnadmin dump
                [        svn_repos_dump_fs3()         ]
  repos-API  -> [svn_repos_replay2  ] => [dump_editor1] -> dumpstream

svnrdump dump
                [         replay_revisions()          ]
  RA-API     -> [svn_ra_replay_range] => [dump_editor2] -> dumpstream
-----------------------------------------------------------------------|

We already use three instances of the same "parser" (svn_repos_parse_dumpstream3) -- Hooray! -- but a lot of duplication still remains.

First, we need to separate the filtering logic from the input/output conversion. We can do it like this (the double arrow "=>" represents the dumpstream API "svn_repos_parse_fns3_t"):

-----------------------------------------------------------------------|
svnadmin load
  dumpstream -> [parser] => [filter1] => [repos-writer ] -> repos-API

svnrdump load
  dumpstream -> [parser] => [filter2] => [RA-writer    ] -> RA-API

svndumpfilter
  dumpstream -> [parser] => [filter3] => [stream-writer] -> dumpstream

svnadmin dump
  repos-API  -> [repos-] => [filter4] => [stream-writer] -> dumpstream
                [reader]

svnrdump dump
  RA-API     -> [RA-   ] => [filter5] => [stream-writer] -> dumpstream
                [reader]
-----------------------------------------------------------------------|

Before we can deduplicate the filtering, we need to analyze what operations each tool supports. Most or all of these are optional. Here is my initial tally.

svnadmin load:
  * rev range
  * renumber revs (and adjust copyfrom+mergeinfo)
  * parent dir (and adjust copyfrom+mergeinfo)
  * set/keep repository UUID
  * run/bypass pre- and post-commit hooks
  * run/bypass prop validation
  * keep/ignore original commit date stamps

svnrdump load:
  * renumber revs (and adjust copyfrom+mergeinfo)
  * parent dir (and adjust copyfrom+mergeinfo)
  * skip specified revprops

svndumpfilter:
  * include/exclude paths
  * drop empty revs
  * renumber revs
  * skip missing merge sources
  * rewrite/preserve revprops in empty revs

svnadmin dump:
  * rev range
  * incremental/full first revision
  * deltas

svnrdump dump:
  * rev range
  * incremental/full first revision

Then decide which operations are generic filtering and which are specific to an input or output module.

Generic filtering:
  * rev range (but more efficient if implemented in an input module)
  * renumber revs
  * parent dir
  * skip specified revprops
  * include/exclude paths
  * drop empty revs
  * skip missing merge sources?
  * rewrite/preserve revprops in empty revs

Input/output module functionality:
  * rev range
    - all readers
  * set/keep repository UUID
    - repos-writer
  * run/bypass hooks
    - repos-writer
  * run/bypass prop validation
    - repos-writer
  * keep/ignore date stamps
    - repos-writer
  * incremental/full first revision
    - repos/RA readers
  * deltas
    - repos-reader?

So far, I have a working patch (attached) for "svnadmin load", splitting the filtering from the repos-writer. I am not sure whether to commit it as is; it's not perfect in a couple of details but it's pretty close.

I intend to continue. The biggest difficulty will likely be in deduplicating things like renumbering revisions, where the details of the functionality are already slightly different among the three tools and not very well planned nor documented. Most of the other parts I expect to be relatively straightforward.

Any thoughts or encouragement?

- Julian
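
[Illustrative sketch of the proposed "reader => filter => writer" chain. This is not the attached patch and does not use the real svn_repos_parse_fns3_t. The receiver_t vtable, its two callbacks, the batons and run_chain() are all hypothetical, much-simplified stand-ins, chosen only to show how one generic filter (renumber revs, include paths) could sit between any input module and any output module that share the same callback interface.]

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a dumpstream vtable: a receiver of events.
   The real svn_repos_parse_fns3_t has more callbacks and other types. */
typedef struct receiver_t {
  void (*new_revision)(void *baton, long rev);
  void (*new_node)(void *baton, const char *path);
  void *baton;
} receiver_t;

/* Output module ("stream-writer" stand-in): appends events to a buffer. */
typedef struct writer_baton_t {
  char out[256];
} writer_baton_t;

static void writer_new_revision(void *baton, long rev)
{
  writer_baton_t *wb = baton;
  size_t len = strlen(wb->out);
  snprintf(wb->out + len, sizeof(wb->out) - len, "r%ld;", rev);
}

static void writer_new_node(void *baton, const char *path)
{
  writer_baton_t *wb = baton;
  size_t len = strlen(wb->out);
  snprintf(wb->out + len, sizeof(wb->out) - len, "%s;", path);
}

/* Generic filter: renumbers revisions by a fixed offset and forwards
   only nodes whose path starts with include_prefix (NULL = keep all). */
typedef struct filter_baton_t {
  const receiver_t *next;      /* the next receiver in the chain */
  long rev_offset;
  const char *include_prefix;
} filter_baton_t;

static void filter_new_revision(void *baton, long rev)
{
  filter_baton_t *fb = baton;
  fb->next->new_revision(fb->next->baton, rev + fb->rev_offset);
}

static void filter_new_node(void *baton, const char *path)
{
  filter_baton_t *fb = baton;
  if (fb->include_prefix == NULL
      || strncmp(path, fb->include_prefix, strlen(fb->include_prefix)) == 0)
    fb->next->new_node(fb->next->baton, path);
}

/* Input module stand-in: drives a receiver with a fixed event sequence,
   as a parser or a repos/RA reader would. */
static void emit_sample_events(const receiver_t *r)
{
  r->new_revision(r->baton, 1);
  r->new_node(r->baton, "trunk/a.c");
  r->new_node(r->baton, "branches/b.c");
  r->new_revision(r->baton, 2);
  r->new_node(r->baton, "trunk/c.c");
}

/* Assemble reader => filter => writer and copy the output to RESULT. */
void run_chain(long rev_offset, const char *include_prefix,
               char *result, size_t result_size)
{
  writer_baton_t wb = { "" };
  receiver_t writer = { writer_new_revision, writer_new_node, &wb };
  filter_baton_t fb = { &writer, rev_offset, include_prefix };
  receiver_t filter = { filter_new_revision, filter_new_node, &fb };

  emit_sample_events(&filter);
  snprintf(result, result_size, "%s", wb.out);
}
```

[In this sketch the filter knows nothing about where events come from or where they go, so swapping the writer or the reader leaves it unchanged. The sketch ignores copyfrom and mergeinfo adjustment, which the post identifies as part of renumbering and as the likely hard part.]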

Received on 2015-01-06 13:54:47 CET

This is an archived mail posted to the Subversion Dev mailing list.
