Re: [RFC] Altering copyfrom information in repository

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Sun, 27 Nov 2011 23:16:30 +0100

On Wed, Nov 23, 2011 at 12:25 PM, Talden <talden_at_gmail.com> wrote:
> On Wed, Nov 23, 2011 at 3:04 AM, Stefan Sperling <stsp_at_elego.de> wrote:
>> On Tue, Nov 22, 2011 at 01:32:02PM +0100, Johan Corveleyn wrote:
>>>
>>> Having a way to do this with svnsync and svndumptool would already be
>>> very useful. It would at least give some assurance to svn admins that
>>> these things are 'repairable'. Being able to fix a live repository
>>> would of course be even better :-).
>>
>> Editing dump files has always been the approach to fixing up mistakes in
>> history. So I don't think we need a change in the FS backends. Instead
>> we need better dump file editing capabilities. svndumptool might go
>> part of the way. But ideally this would be built into svndumpfilter or
>> a new official tool that edits dump files (rather than just filtering
>> nodes from a dumpfile).
>
> The approach I've taken a few times is to clone the repo up to but
> excluding the faulty revision, do an incremental dump of revisions
> (full-text, not deltas) following the faulty revision.
>
> Apply the patch of the faulty revision to a WC at the clone's tip (but
> with corrected gestures), commit, load remaining revisions into the
> clone. Swap the master with the clone at an opportune time when the
> clone can get caught up and we know that all users are no longer using
> WCs at the problem revision.
>
> This seems a safe path. Step 1, making the clone of the bulk of
> history is pretty slow the official way though (via dump/load).

Interesting approach, which results in a "complete fix". But it seems
like a huge amount of work (and a long time) to fix a single mistake
in history. Just like you, I'd like to have some way of doing this
that's faster and more reliable.

> I'd like to know if there's a reliable way to hot-copy a repo and then
> roll it back to a specific revision (trashing the newer revisions) - I
> haven't looked into how safe that is since repcache and sharding came
> into being but that's the only way this approach could be used on our
> repository given that making a clone to a specific revision using
> dump/load will take a day or so to get anywhere close to our repo tip.
>
> Things I've had to fix
> - lost history (the example of this thread)
> - Squishing revisions (and padding with empties). People forgetting
> to commit a whole tree in merges seems to be the main culprit here or
> people using silly tools that commit every save/rename/add separately.
> - Stripping content - big binaries that are removed from HEAD in short
> order - 0-value history. Really a space-is-precious special case of
> the previous motivation.
>
> In rare-cases I've done some dump-file editing to reshape the tree
> (truly rewriting history). I'm always wary of that - you have to be
> very sure that the loss in real-history is worth the cleanup.
>
> I'd like to do this a lot more often than I do, but the tools are poor
> to achieve it. Sad when you consider that this is one of the
> strengths of the centralised model - that you can fix history - in a
> DVCS once it's out the gate you're pretty much done for unless you
> force everyone else to reclone from your rebase-point and forget any
> history they had intermingled in the abandoned timeline. Shining here
> would make Subversion even more attractive to the corporate space.

+1 to that.

If only we could come up with better tools to fix history (in various ways).

Thinking in the direction of "direct manipulation of the FS": maybe
it's best to postpone any efforts until FS-NG (whenever that comes
around), which seems to be the consensus for 'obliterate' as well.
Maybe it's something to take into account when somebody starts working
on FS-NG...

Thinking about dumpfile manipulation: better dumpfile editing tools
would be nice, because they can fix things regardless of the type of
backend. But it would be even nicer if fixing a single revision (via
dumpfile) didn't require reloading the entire repository.

<wild idea>
What if we could 'svnadmin (re)load' a single revision $REV in a
repository, which would then automatically fix up everything coming
after $REV:

  0. Take backup
  1. Dump $REV
  2. Fix $REV.dumpfile with some dumptool
  3. Take repo offline
  4. Reload $REV (fixes up everything after $REV)
  5. Bring repo back online

For the part of "... automatically fix up everything coming after $REV":

- naive approach: simply dump+load internally in the repository
("reload") everything from $REV+1 until HEAD.

- better approaches may be possible, depending on the change that
was done in $REV, and depending on the type of backend.

Of course this reloading step will be more costly if $REV is far
before HEAD, but that's normal I guess. If you are able to fix
problems not too late after they happened, the reloading cost will be
reasonable.
</wild idea>
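
To make the naive variant concrete, here is a minimal sketch that only
prints a plan; it does not run anything. Since no in-place 'reload'
exists, it emulates the idea by rebuilding into a fresh repository with
existing svnadmin subcommands (hotcopy, create, dump, load), which is
closer to Talden's clone-and-swap than to a true in-place fix. The
function name plan_reload, its parameters and all paths are hypothetical,
and it assumes $REV was already dumped with --incremental and fixed by
hand into fixed_dump.

```python
import shlex


def plan_reload(repo, backup, new_repo, rev, head, fixed_dump):
    """Return shell command lines that emulate 'reload $REV' via dump/load.

    Hypothetical helper: it only builds strings, nothing is executed.
    'head' is the youngest revision of 'repo' (e.g. from 'svnlook youngest').
    """
    if not 1 <= rev <= head:
        raise ValueError("rev must satisfy 1 <= rev <= head")
    q = shlex.quote
    cmds = [
        # Step 0: take a backup.
        f"svnadmin hotcopy {q(repo)} {q(backup)}",
        # Fresh repository that will receive the corrected history.
        f"svnadmin create {q(new_repo)}",
        # Everything before the faulty revision, unchanged.
        f"svnadmin dump {q(repo)} -r 0:{rev - 1} | svnadmin load {q(new_repo)}",
        # The hand-edited dump of $REV (steps 1 and 2 happened beforehand).
        f"svnadmin load {q(new_repo)} < {q(fixed_dump)}",
    ]
    if rev < head:
        # Naive fix-up: replay $REV+1..HEAD on top of the corrected $REV.
        cmds.append(
            f"svnadmin dump {q(repo)} -r {rev + 1}:{head} --incremental"
            f" | svnadmin load {q(new_repo)}"
        )
    return cmds


if __name__ == "__main__":
    for line in plan_reload("/srv/svn/repo", "/srv/svn/backup",
                            "/srv/svn/new", 5, 8, "r5-fixed.dump"):
        print(line)
```

Taking the repo offline and swapping the new repository in (steps 3 and
5) are left out on purpose, since they depend on the local setup. The
last command is where the cost grows with the distance between $REV and
HEAD.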

Thoughts?

-- 
Johan

Received on 2011-11-27 23:17:23 CET
