On Tue, 2010-11-09, Eric Raymond wrote:
> Some months back I contributed svncutter to Subversion. This was a tool
> for doing surgery on dumpfiles intended to remove artifacts associated with
> conversions from older VCSes.
>
> My interest in tools for repository surgery has continued, and I recently
> spotted an opportunity in the increasing use of git-fast-import streams
> as a history-interchange format. I have written what I believe is the first
> *native* application for fast-import streams, a repository editor I
> call reposurgeon.
Very cool. I wonder how practical it will be for doing various
"obliterate" tasks on large repositories. ("Obliterate" can mean quite
a range of different things, including removing files, or setting their
content to empty, in certain revs or rev ranges.) There is still demand
for on-line obliteration (performed while the untouched parts of the
repository remain accessible), but that is very difficult to achieve
and I'm stalled on it. I hope this option for off-line editing may be
able to take some of the pressure off. Peter S just reminded me that
some obliterate tasks involve only a few recent revisions, so we wonder
whether it is practical to dump, edit and re-load only the last few
revisions, assuming we can make Subversion forget the last N
revisions.
> You can read the announcement here: http://esr.ibiblio.org/?p=2718
>
> Project resource page with tarballs: http://www.catb.org/~esr/reposurgeon/
>
> Freshmeat page: http://freshmeat.net/projects/reposurgeon
>
> HTML manual: http://www.catb.org/~esr/reposurgeon/reposurgeon.html
>
> Perhaps the most interesting thing about reposurgeon is that, by
> design, it knows almost nothing about any individual VCS. All it
> counts on is the ability to get a fast-import dump from a repo and
> then the ability to create a repo from the dump after the contents of
> the import stream have been modified.
>
> If you hadn't heard about this before, it's because the project is in
> alpha and only two weeks old. Nevertheless, it is already good enough
> for production use on git repositories. Operations supported include
> editing of commit and tag metadata, deletion of commits, expunges of
> file history, coalescing single-file commit cliques with identical
> comments, and topological cut. The code is backed by an extensive
> regression-test suite and fully documented.
>
> I also have working support for bzr and hg, though the practical utility
> of same is presently limited by unstable and poorly-supported export/import
> tools. I'm working with a bzr dev to address this problem; better solutions
> should be forthcoming within weeks, if not days.
>
> Which brings me to my feature request. Please add native support for
> fast-export and fast-import to svndump. This would be a good idea
> in general, but my specific reason for wanting it is to enable
> reposurgeon to edit Subversion repositories.
>
> The export side is, of course, almost trivial. Proof of concept under
> MIT license is here: <http://c133.org/code/svn-fast-export.c>. It
> needs a bit of extension work around tags and branches; I won't
> belabor the obvious (and easily solvable) issues with those. There are
> two more substantive ones:
>
> 1) Whatever merge-tracking hair you represent internally should be dumped
> as 'merge' commit properties.
>
> 2) User commit properties (i.e. those not in the svn: namespace)
> should be exported using the bzr properties extension, which
> reposurgeon handles now and which seems likely to make it into git core at
> some point. Syntax:
>
> property <space> NAME <space> VALUE-LENGTH <space> VALUE LF
>
> or, if the value is empty:
>
> property <space> NAME LF
>
> NAME and VALUE are utf8-encoded. The properties for each commit are sorted
> by the property name.
Ah, so the format doesn't support arbitrary 'binary' property values? I
guess we can seek a way to work around that.
> Also note that an import stream actually containing commit-property declarations
> should have a line reading "feature commit-properties" before the first commit.
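For anyone wanting to try the property syntax quoted above, here is a
minimal sketch in Python of how an exporter might emit it. The function
name format_commit_properties is hypothetical, and the sketch assumes
that VALUE-LENGTH counts the bytes of the UTF-8 encoded value (the
quoted text does not say whether it means bytes or characters) and that
property names contain no spaces.

```python
def format_commit_properties(props):
    """Serialize a dict of str -> str as 'property' lines, sorted by name.

    Assumption: VALUE-LENGTH is the byte length of the UTF-8 encoded value.
    """
    lines = []
    for name in sorted(props):
        name_bytes = name.encode("utf-8")
        value_bytes = props[name].encode("utf-8")
        if value_bytes:
            # property <space> NAME <space> VALUE-LENGTH <space> VALUE LF
            length = str(len(value_bytes)).encode("ascii")
            lines.append(b"property " + name_bytes + b" " + length
                         + b" " + value_bytes + b"\n")
        else:
            # Empty value: property <space> NAME LF
            lines.append(b"property " + name_bytes + b"\n")
    return b"".join(lines)


def stream_header():
    """The feature line that should precede the first commit."""
    return b"feature commit-properties\n"
```

Because the value is length-prefixed, a reader never has to scan VALUE
for a terminator; whether that is enough to carry arbitrary binary
values depends on how strictly importers enforce the utf8 requirement.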
>
> The import side is less trivial, but given that you've already got internal
> representations for merge-tracking it shouldn't be too difficult either.
>
> I'd offer to do this, but I'm deliberately staying away from writing
> export/import code myself, other than the implementations inside
> reposurgeon. It will be better, long-term, if my reposurgeon
> assumptions don't leak into other implementations; they ought to be
> engineered from the fast-import stream documentation. See the
> definitive web page at:
>
> <http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html>.
>
> Finally, I will note that I think this feature could be significant
> for Subversion's competitive posture. Because exporters are easy while
> importers are more difficult, supporting import streams only with
> exporters and only through sketchy third-party tools tends to
> encourage migration to git while discouraging migration away from it.
>
> Other VCSes, with bzr taking point, are positioning themselves as
> destinations rather than places to leave by mainlining importers. As
> a friend of Subversion, I strongly recommend that it should do
> likewise.
Thanks.
- Julian
Received on 2010-11-09 19:53:48 CET