Re: [RFC] Altering copyfrom information in repository

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 7 Dec 2011 12:40:19 +0000 (GMT)

Hi Johan. See below...

On 28 November 2011, Johan Corveleyn wrote:
> On Mon, Nov 28, 2011 at 7:32 AM, Daniel Shahaf wrote:
>> On Sunday, November 27, 2011 11:16 PM, "Johan Corveleyn" wrote:
>>> <wild idea>
>>> What if we could 'svnadmin (re)load' a single revision $REV in a
>>> repository, which would then automatically fix up everything coming
>>> after $REV:
>>>
>>> 0. Take backup
>>> 1. Dump $REV
>>> 2. Fix $REV.dumpfile with some dumptool
>>> 3. Take repo offline
>>> 4. Reload $REV (fixes up everything after $REV)
>>> 5. Bring repo back online
>>>
>>> For the part of "... automatically fix up everything coming after
>>> $REV":
>>>
>>> - naive approach: simply dump+load internally in the repository
>>> ("reload") everything from $REV+1 until HEAD.
>>>
>>> - better approaches may be possible, depending on the change that
>>> was done in $REV, and depending on the type of backend.
>>>
>>> Of course this reloading step will be more costly if $REV is far
>>> before HEAD, but that's normal I guess. If you are able to fix
>>> problems not too late after they happened, the reloading cost will be
>>> reasonable.
>>> </wild idea>
>>>
>>> Thoughts?
>>>
>>> --
>>> Johan
>>>
>> You're asking how to implement a generic rewrite of a historical
>> revision, but aren't addressing the question of what to do with
>> younger-than-the-rename revisions that do not apply (in the libsvn_delta,
>> libsvn_diff, or tree-delta sense) to the modified history.
>
> I'm not sure I understand. If all those younger-than-the-rename
> revisions are "reloaded", there wouldn't be a problem, right? Ok,
> maybe some of them don't need to be touched in any way, because they
> do not apply to the modified history, but that can be seen as an
> optimization, right?
>
> It's actually a bit similar to your suggestion of 'svnsync
> --up-to-revision', which you made elsethread. But with dump/load, and
> wrapped into a convenient tool for an svn administrator.
>
>> If you're serious about solving this problem I strongly suggest that you
>> talk to Julian. I think he went up and down this path so much that he
>> can tell the squirrels' furs' colors from hearing.
>
> Right. Julian, what do you think about all this?
>
> Is "making it easier to dump+load a single revision" an option to make
> it possible to "fix history" (of a single revision)? Or is it a dead
> end?

That could certainly be helpful in implementing one part of any such history-editing feature. I see two difficult areas. Let's say you change rX.

From a high-level point of view, what result do you want when a subsequent revision rY (where Y > X) touches a file or directory that would have existed in rX but no longer does because of the change made to rX? It's not difficult to specify some reasonable options here (things like: adjust rY to leave the final state of rY just as it was, which may involve recreating any nodes that were obliterated from rX; or delete the node; or bail out). It's just a matter of choosing, so in a sense this isn't a difficulty, just a design choice.
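To make the three options above concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions: a revision is a list of `(action, path[, content])` tuples, the tree is a flat dict from path to content, and node IDs, properties and copyfrom are ignored. The names `Policy`, `HistoryConflict` and `replay` are hypothetical and are not Subversion APIs.

```python
from enum import Enum


class Policy(Enum):
    RECREATE = "recreate"  # adjust rY so that its end state is unchanged
    DROP = "drop"          # discard rY's change; the node stays gone
    BAIL = "bail"          # refuse to rewrite history


class HistoryConflict(Exception):
    """A later revision touches a node that the rewritten rX no longer creates."""


def replay(history, policy):
    """Replay revisions r0..rN in order on an empty tree.

    Returns (final_tree, adjusted_history). Only "add", "modify" and
    "delete" actions are modelled; history[r] is revision r.
    """
    tree = {}
    adjusted = []
    for rev, changes in enumerate(history):
        kept = []
        for change in changes:
            action, path = change[0], change[1]
            if action != "add" and path not in tree:
                # rY refers to a node that no longer exists after rewriting rX.
                if policy is Policy.BAIL:
                    raise HistoryConflict(f"r{rev}: {action} of missing {path}")
                if policy is Policy.DROP or action == "delete":
                    # DROP discards the change; under RECREATE, deleting a
                    # missing node is simply a no-op.
                    continue
                # RECREATE: turn the modify into an add with the same content.
                change = ("add", path, change[2])
            if change[0] == "delete":
                del tree[path]
            else:
                tree[path] = change[2]
            kept.append(change)
        adjusted.append(kept)
    return tree, adjusted


# r0 is empty, r1 adds /a, r2 (= rX) adds /b, r3 (= rY) modifies /b.
original = [[], [("add", "/a", "1")], [("add", "/b", "x")], [("modify", "/b", "y")]]
# Rewrite rX so that it no longer adds /b.
rewritten = original[:2] + [[]] + original[3:]

for policy in (Policy.RECREATE, Policy.DROP):
    print(policy.value, replay(rewritten, policy)[0])
```

Here `RECREATE` corresponds to the first option (rY's final state is preserved by recreating `/b`), `DROP` to deleting the node, and `BAIL` to bailing out.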

From an implementation POV, as soon as you replace rX with a new rX, the subsequent revisions in the repository become invalid unless the change you made to rX was very simple. Any deltas based on rX, any copy-from pointers, node IDs, and so on, may become invalid. So you can't in general replace rX inside the repository: if you did, then r(X+1) up to HEAD would immediately become more or less unreadable, i.e. broken. One solution is to copy the whole repo up to r(X-1) and then load the new revisions into that copy of the repository. But if you really want to do this inside the repository, which is what I was trying to do, then in order to fix up all the revisions rX+1:HEAD you need to do one of two things. Either keep track in memory of what you are updating and rewriting, which gets quite complex; or fork the history inside the repository (leave the old rX in place and write a new chain of revisions rX', rX+1', rX+2', ... while reading from the original chain rX, rX+1, ...), and then make the new (rX' ...) chain active and delete the old chain.

The benefit of 'forking' the chain of revisions is that the repository filesystem code can still read the old revisions on request, so you could, for example, convert them into dump file format. Conversely, keeping track in memory of what you are updating and rewriting, and traversing rX:HEAD fixing things up as you go, must necessarily be done at a very low level, because those revisions are already 'broken' by the time you come to fix them up and so cannot be read through the normal APIs.
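A toy illustration of why replacing rX in place breaks later revisions, and why forking avoids it. Purely for illustration, it assumes each revision stores a single file's text as a copy/insert delta against the previous revision, computed with Python's `difflib`; real FSFS and BDB representations (svndiff, skip-deltas, node-revisions) are quite different. `make_delta`, `apply_delta`, `read`, `build_chain` and `fork_rewrite` are hypothetical helpers.

```python
import difflib


def make_delta(base, target):
    """Encode target as ("copy", i, j) / ("insert", text) instructions against base."""
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, base, target).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": nothing from base is carried over
    return ops


def apply_delta(base, ops):
    """Rebuild the target text from base plus the delta instructions."""
    return "".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)


def read(chain, rev):
    """chain[0] is a full text; chain[r] is a delta against revision r-1."""
    text = chain[0]
    for ops in chain[1:rev + 1]:
        text = apply_delta(text, ops)
    return text


def build_chain(texts):
    """Store the first text in full and every later one as a delta."""
    return [texts[0]] + [make_delta(a, b) for a, b in zip(texts, texts[1:])]


def fork_rewrite(old_chain, x, new_text):
    """Read every revision from the untouched old chain, swap in the new rX,
    and write a fresh chain. Later revisions keep their full text but get
    new deltas; the caller then makes the new chain active."""
    texts = [read(old_chain, r) for r in range(len(old_chain))]
    texts[x] = new_text
    return build_chain(texts)


texts = ["hello world", "hello brave world", "hello brave new world"]
chain = build_chain(texts)

# In-place replacement of r1: r2's delta still points into the old r1 text.
broken = list(chain)
broken[1] = make_delta(texts[0], "HELLO")
print(read(broken, 2))  # garbage: not "hello brave new world"

# Forking: read the old chain, write a new one, then switch over.
forked = fork_rewrite(chain, 1, "HELLO")
print(read(forked, 2))  # hello brave new world
```

Note that `fork_rewrite` can only do its job because `chain` is still intact and readable while the new chain is written.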

That's the stuff I tried to get my head around before.

If we choose to support only some very limited transformations within rX, then the 'traverse rX+1:HEAD, fixing them up as we go' approach could perhaps be simple enough to be feasible. But it's still low-level code and thus specific to each FS back-end, with the complication that FSFS is more in demand, but BDB makes this sort of thing much easier.
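As a sketch of one such limited transformation: suppose the only change allowed to rX is renaming a path it creates, from `old` to `new`. Later revisions can then be fixed up in a single traversal by rewriting their paths, plus any copyfrom paths that point at rX or later. This works on a hypothetical logical change list (`Change`, `remap` and `fix_up_after_rename` are made up for illustration), whereas a real implementation would have to operate on back-end structures. It also assumes no later revision re-creates `old`.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    action: str                          # "add", "modify" or "delete"
    path: str
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None


def remap(path: str, old: str, new: str) -> str:
    """Rename `old` (and anything below it) to `new`."""
    if path == old or path.startswith(old + "/"):
        return new + path[len(old):]
    return path


def fix_up_after_rename(history: List[List[Change]], x: int,
                        old: str, new: str) -> List[List[Change]]:
    """Traverse r(X+1)..HEAD, rewriting references to `old`.

    history[r] is the change list of revision r; history[x] is assumed to
    have been rewritten already so that it creates `new` instead of `old`.
    """
    fixed = [list(rev) for rev in history[:x + 1]]
    for rev in history[x + 1:]:
        out = []
        for c in rev:
            cf_path = c.copyfrom_path
            # Copies from revisions before rX refer to history the rename never touched.
            if cf_path is not None and c.copyfrom_rev is not None and c.copyfrom_rev >= x:
                cf_path = remap(cf_path, old, new)
            out.append(replace(c, path=remap(c.path, old, new), copyfrom_path=cf_path))
        fixed.append(out)
    return fixed


history = [
    [],                                                           # r0
    [Change("add", "/new")],                                      # r1 = rX, rewritten from "/old"
    [Change("modify", "/old/a.c")],                               # r2
    [Change("add", "/b", copyfrom_path="/old", copyfrom_rev=2)],  # r3
]
for rev, changes in enumerate(fix_up_after_rename(history, 1, "/old", "/new")):
    print(rev, changes)
```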

Now I'm thinking the 'fork history inside the repo' or 'clone the repo' approaches are better, even though they require more disk space and/or more time, because being higher level gives several advantages. If we adapt your idea of making it easier to 'replace' a revision, and instead make it easier to import and export a revision, then that would certainly be a useful part of such a solution.
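For the 'clone the repo' approach, the existing `svnadmin dump` and `svnadmin load` commands already get most of the way, along the lines of the dump/fix/reload steps discussed above. The Python sketch below only builds the shell pipelines. It assumes the corrected rX has already been produced by editing the output of `svnadmin dump -r X --incremental`, that revision numbers in the new repository line up with the old ones, and that nothing in rX+1:HEAD refers to a node the fix removed; otherwise the final load fails, which is exactly the problem described above. `clone_plan` and `run` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List


def clone_plan(src: str, dst: str, x: int, head: int, fixed_dump: str) -> List[str]:
    """Shell pipelines that copy r0..r(X-1), load a fixed rX, then replay the rest."""
    if not 1 <= x <= head:
        raise ValueError("need 1 <= X <= HEAD")
    q = shlex.quote
    plan = [
        f"svnadmin create {q(dst)}",
        # r0..r(X-1) are copied unchanged.
        f"svnadmin dump -q -r 0:{x - 1} {q(src)} | svnadmin load -q {q(dst)}",
        # The hand-corrected rX.
        f"svnadmin load -q {q(dst)} < {q(fixed_dump)}",
    ]
    if x < head:
        # r(X+1)..HEAD are replayed on top of the corrected rX.
        plan.append(f"svnadmin dump -q --incremental -r {x + 1}:{head} {q(src)}"
                    f" | svnadmin load -q {q(dst)}")
    return plan


def run(plan: List[str]) -> None:
    """Run each step in order, stopping at the first failure.

    With a plain /bin/sh pipeline only the exit status of the last
    command (svnadmin load) is checked.
    """
    for cmd in plan:
        subprocess.run(cmd, shell=True, check=True)


for step in clone_plan("/srv/svn/repo", "/srv/svn/repo-fixed", 1234, 5678, "r1234-fixed.dump"):
    print(step)
```

Building the plan separately from running it makes it easy to review the exact commands before touching a production repository.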

- Julian

> As I said earlier in this thread, I'm staying away from direct
> manipulation of the FS (leaving that for FS-NG), but thinking about
> making it as easy and fast as possible to fix things through dumpfile
> manipulation (and absorbing a fixed dumpfile in an existing
> repository).
>
> Concerning the question "if I'm serious about solving this
> problem":
> well, at this point it's really only an academic exercise, trying to
> find a way which works in theory. I'm not sure if I'm serious about
> implementing this myself (at my current rate (of free time x speed)
> that might take me several years, and I'm not sure that's worth it for
> my personal situation). So I guess at this point I'm just throwing
> around some ideas, hoping that some good will come out of it, and that
> someone will one day write the code to do it :-).
>
> --
> Johan
>
Received on 2011-12-07 13:40:56 CET
