Some of you will recall that I did a lot of work early this year
on the dump stream documentation as part of an effort to enable
reposurgeon to read Subversion dump files directly, translating
Subversion histories into a DVCS-style commit DAG that can then
be exported into git, hg, or bzr.
I am pleased to be able to announce that this effort has been
(somewhat belatedly) successful. The project had stalled for six
months, but production-quality Subversion support in reposurgeon has
now been verified by successful lift to DVCS of a large, old
Subversion repository with lots of ugly metadata corner cases in it
(the repo of the Network UPS Tools project).
Not only does it work, it even works *fast*. The dumpfile analyzer
cranks through more than a thousand commits per minute on vanilla
desktop hardware. Subversion contributor Greg Hudson deserves credit
for this, as he contributed a performance patch implementing
copy-on-write filemaps that tremendously sped up processing.
More than that, a slight extension of Greg's idea enabled me to
abolish some code that I had suspected (correctly, it turns out) of
harboring the subtle bug that had stalled the project for half a
year, and in the process to eliminate another O(n**2) lookup that
Greg hadn't originally been targeting.
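
To illustrate the copy-on-write idea, here is a minimal Python sketch. It is not reposurgeon's actual code: the class name CowFileMap and its methods are hypothetical, and a real filemap would hold richer per-path data. The point is that taking a per-revision snapshot of the path map costs nothing until one side actually writes.

```python
class CowFileMap:
    """Path -> value map whose snapshots are cheap until written to."""

    def __init__(self, store=None):
        # A map built from an existing store starts out sharing it.
        self._store = {} if store is None else store
        self._shared = store is not None

    def snapshot(self):
        # No copying here: both maps point at the same dict and each
        # one makes a private copy only when it first writes.
        self._shared = True
        return CowFileMap(self._store)

    def _own(self):
        # Called before every mutation; copies the dict if it is shared.
        if self._shared:
            self._store = dict(self._store)
            self._shared = False

    def __getitem__(self, path):
        return self._store[path]

    def __contains__(self, path):
        return path in self._store

    def __setitem__(self, path, value):
        self._own()
        self._store[path] = value

    def __delitem__(self, path):
        self._own()
        del self._store[path]


# Hypothetical usage: one filemap per revision.
r1 = CowFileMap()
r1["trunk/README"] = "blob-1"
r2 = r1.snapshot()             # no copy made yet
r2["trunk/README"] = "blob-2"  # r2 copies the dict here; r1 is untouched
```

A revision that changes nothing in the map never pays for a copy, which is where the savings come from in a history where most commits touch only a few paths.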
It may be of interest that the bug involved incorrect translation of
two successive copies in opposite directions across a pair of
branches. Another case that gave me trouble was a branch delete
followed by a copy to the same branch name. A third was a directory
copy followed by a file change in one of the copied files *before*
commit.
There are also various mixtures of file system copies with Subversion
copy and commit operations that a tool like this needs to detect and
patch so the history looks as though proper Subversion operations
were used throughout; otherwise the commit DAG will be missing some
ancestry links that semantically ought to be there.
For example, if you (a) create a branch directory, (b) use file system
copy to populate it from another branch, and (c) commit, the DAG
builder needs to detect this and treat step (b) as though it had been
done with Subversion directory and/or file copies. Fortunately,
this is a relatively simple exercise in hash matching.
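
The hash-matching step might be sketched like this in Python. This is an illustration under assumptions, not the code reposurgeon uses: the function names and the dict-of-bytes inputs are hypothetical, and SHA-1 is an arbitrary choice of digest. It pairs each newly added file with an already-existing file that has identical content.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Digest used to compare file contents (SHA-1 chosen arbitrarily)."""
    return hashlib.sha1(data).hexdigest()


def infer_copy_sources(existing, added):
    """Guess which added files are really copies of existing ones.

    existing: dict of path -> bytes for files visible before the commit
    added:    dict of path -> bytes for files the commit adds
    Returns a dict of added path -> source path with identical content.
    """
    by_hash = {}
    for path in sorted(existing):
        # Keep the first path (in sorted order) seen for each digest.
        by_hash.setdefault(content_hash(existing[path]), path)
    matches = {}
    for path, data in added.items():
        source = by_hash.get(content_hash(data))
        if source is not None:
            matches[path] = source
    return matches


# Hypothetical example: a branch populated with a file system copy.
existing = {
    "trunk/a.c": b"int main(){}\n",
    "trunk/b.h": b"#pragma once\n",
}
added = {
    "branches/stable/a.c": b"int main(){}\n",
    "branches/stable/NEWS": b"new\n",
}
copies = infer_copy_sources(existing, added)
```

Here `copies` maps branches/stable/a.c back to trunk/a.c, while NEWS has no match and stays an ordinary add. A real DAG builder would still have to decide which branch to treat as the ancestor when several sources match.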
I'm still polishing; one thing that needs more work is interpretation
of mergeinfo properties. The cherry-picking model Subversion uses
doesn't match the way git/hg/bzr want to do things. Simple mergeinfos
translate well, but there are complex cases that yield perverse-looking
merge links.
Still, all that wrestling with strange corner cases paid off -
reposurgeon is now better at translating Subversion repos to DVCS
histories than anything else out there. It even handles cross-branch
mixed commits without breaking stride.
But it doesn't try to do everything. One of the philosophical premises
behind reposurgeon was that repository translation is more like
literary translation than people who write repository-conversion tools
normally understand. That is, low-level mechanical translations don't
work very well - they need to be cleaned up by a human who understands
the ontological mismatches between VCSes and the idioms of both source
and target VCS.
A very simple example of this requirement is: what should be done with
Subversion revision references like r456 in commit comments? Not just
"what should they be translated into?" but "how can we even
*recognize* them reliably?" Humans come up with lots of variant ways
to write these even within the same repo, and mechanical translators
have trouble spotting them all.
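
A short Python sketch shows why recognition is hard. The pattern below is an assumption made for illustration, not what reposurgeon does; it catches a few common spellings and will both miss other variants and occasionally match things that are not revision references.

```python
import re

# Matches "r456", "rev 456", "Rev. 456", "revision #456" and similar.
REVREF = re.compile(r"\b(?:r|rev(?:ision)?\.?\s*#?)(\d+)\b", re.IGNORECASE)


def find_revision_refs(comment):
    """Return the revision numbers this pattern can spot in a comment."""
    return [int(num) for num in REVREF.findall(comment)]


refs = find_revision_refs(
    "Fix regression from r456; see also revision 789 and Rev. 12"
)
```

Spellings like "commit 456", "[456]", or "the change from last Tuesday" slip straight past it, which is why a human pass over the candidates is still needed.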
reposurgeon was built with the goal of amplifying human judgment
(making it as easy as possible for a human to improve on reposurgeon's
basic mechanical translation) rather than trying to eliminate human
judgment. This choice now seems well vindicated.
--
Eric S. Raymond
The only purpose for which power can be rightfully exercised over any
member of a civilized community, against his will, is to prevent harm
to others. His own good, either physical or moral, is not a sufficient
warrant. -- John Stuart Mill, "On Liberty", 1859
Received on 2012-11-19 19:47:00 CET