Hi Jack,
I would say that you correctly spotted both the problem (that
the complexity of consistently modifying the history of the repository
is magnified by the wide variety of use-cases) and my proposed
solution (to factor some of that complexity out of what could
be thought of as the "core" obliterate functionality, so that it can be
"dealt with later").
The question is then, is my proposed solution feasible? Needless to
say, I think it is. See specific comments below.
On Mar 2, 2009, at 8:17 PM, Jack Repenning wrote:
> I seem to see a problem here, or perhaps I only fail to see the
> solution. Let me spin a user story and see where it takes us.
>
> Suppose we're dealing with the "security" form of the problem: some
> information has been introduced into the repository that ought not to
> have been, and we need to ensure that it disappears, as thoroughly as
> possible. Suppose, further, that this sensitive information was
> introduced in the form of comment text in a source-code file. The
> error was introduced as a change at the/bad/path@BADREV. Changes to
> the/bad/path have also been made in (BADREV+1) and so on. Feel free to
> assume any ugly thing you like such as copies, post-BADREV, to other
> paths.
>
> In such a situation, it's not just the/bad/path@BADREV that must be
> expunged, but in fact all the later revisions based on it (unless,
> indeed, we can positively determine that someone edited that text out
> again at some later date).
Yes, absolutely. And all kinds of usability issues arise, not only
with copies, but with merges, too. Should we purge the copies but leave
the merges, or vice versa? As you say, ugly.
> So either the OBLITERATION SET includes the/bad/path@BADREV and also
> all derived paths and revs (in which case, we need to automate finding
> them all, 'cause depending on the peoples for this won't fly), or
> alternatively some files@REVS not in the OBLITERATION SET need to have
> check-outs which differ depending on whether they come from the
> "original" or "modified" repository.
>
> Which did you have in mind?
The former.
And yes, my idea is to automate finding them all. It's just that I
think that "finding", or "constructing the correct obliteration set"
is going to seem much more manageable if we are absolutely clear on
what happens after the set has been defined, and don't have to worry
about that as well.
Writing code that messes with the repository data while leaving it in
a well-defined and consistent state is a challenging enough task,
even if the functionality is 100% defined.
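To make the "constructing the correct obliteration set" step concrete, here is a rough Python sketch of the closure computation. It is only an illustration under assumptions: the (path, revision) node model, the `edges` mapping and the name `obliteration_set` are all hypothetical and not part of any existing Subversion code.

```python
from collections import deque

# Hypothetical model: a node is a (path, revision) pair, and `edges` maps a
# node to the nodes derived from it (next revision, copy, or merge target).
def obliteration_set(seeds, edges):
    """Return the seeds plus every node reachable from them via `edges`."""
    result = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in result:
                result.add(child)
                queue.append(child)
    return result

# Made-up history: the bad change at r13, later edits, and one copy.
edges = {
    ("/bad/path", 13): [("/bad/path", 14)],
    ("/bad/path", 14): [("/bad/path", 15), ("/copy/of/path", 15)],
}
closure = obliteration_set([("/bad/path", 13)], edges)
print(sorted(closure))
```

The point of the split is that the core only ever sees the finished set; which kinds of edges go into `edges` (descendants only, copies, merges) is decided entirely outside it.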
> But conversely, if we're dealing with the disc-space form of the
> problem, then we exactly do not want these later paths@REVS affected.
Exactly.
> We want to remove the space no longer in use, but the space that makes
> up some ancient delta which is still in use we should not remove, but
> rather keep. A checkout of path@HEAD should show the same result,
> including lines that "svn blame" would show us were added at r1, even
> though we've removed (what we can of) revs 1-10000.
I absolutely agree that (core) obliterating ^/@1:10000 should have *no*
effect on the bytes returned by a checkout of HEAD, in a repository
that was up to revision 10001 before obliteration.
I have to think about svn blame. Are you saying that "svn blame"
should continue to return the same output as before the obliteration?
That does not seem right to me. I would say that after the above
obliteration the repository would look like it had 10000 empty
commits, and one huge commit at the end. Everything would look as if
the author of the last commit had added everything. After all, blame
is just a function of the revision in which a line was added to
the repository and of the revision properties.
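A toy model may help show what I mean. The sketch below is a naive line-based blame written in Python with difflib; it is not how "svn blame" is implemented, and the three-revision history merely stands in for the 10001-revision case. Obliteration is modelled, as an assumption, by replacing the obliterated revisions with empty content.

```python
from difflib import SequenceMatcher

def blame(texts):
    """texts[i] holds the lines at revision i+1.

    Returns (revision, line) pairs for the last revision, where revision
    is the one in which that line first appeared.
    """
    annotated = []  # (revision, line) pairs for the previous revision
    for rev, lines in enumerate(texts, start=1):
        old = [line for _, line in annotated]
        matcher = SequenceMatcher(None, old, lines, autojunk=False)
        new_annotated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their original revision.
                new_annotated.extend(annotated[i1:i2])
            elif tag in ("insert", "replace"):
                # New or rewritten lines are attributed to this revision.
                new_annotated.extend((rev, line) for line in lines[j1:j2])
        annotated = new_annotated
    return annotated

history = [["a"], ["a", "b"], ["a", "b", "c"]]
obliterated = [[], [], ["a", "b", "c"]]  # r1 and r2 emptied, r3 untouched

print(blame(history))
print(blame(obliterated))
```

The content of the last revision is identical in both cases; only the attribution changes, with every line now blamed on the final commit.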
> So it seems like one form of obliterate most definitely _does_ want
> some sort of closure used based on the indicated problem point, while
> the other form most definitely does _not_ want that closure applied.
Agreed, so after the first implementation of obliterate, which might
have the syntax:
svn obliterate ^/bad/path/very/bad/path@13:666
We might add switches to the command of the form:
svn obliterate --include-descendants ^/path@100
svn obliterate --include-descendants --include-copies-from ^/path@100
svn obliterate --include-descendants --include-merges-from ^/path@100
And of course, if we want to find the ancestors instead:
svn obliterate --include-ancestors ^/path@100
svn obliterate --include-ancestors --include-copies-to ^/path@100
svn obliterate --include-ancestors --include-merges-to ^/path@100
It would also be very reasonable to interpret
svn obliterate ^/bad/path
as a shorthand for
svn obliterate ^/bad/path@0:HEAD
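As a sketch of how that shorthand could be expanded, assuming targets are written as ^/path@START:END, something like the following Python would do. The function name and the decision to keep revisions as strings (so that HEAD passes through untouched) are my own assumptions; nothing here exists in Subversion today.

```python
def parse_target(target):
    """Split '^/path[@REV[:REV]]' into (path, start, end).

    Revisions are kept as strings so that keywords like HEAD survive.
    """
    if not target.startswith("^/"):
        raise ValueError("expected a repository-relative target starting with '^/'")
    path, sep, revs = target[1:].rpartition("@")
    if not sep:
        # No '@' at all: treat the target as covering the whole history.
        return target[1:], "0", "HEAD"
    start, colon, end = revs.partition(":")
    # A single revision means a one-revision range.
    return path, start, (end if colon else start)

print(parse_target("^/bad/path"))
print(parse_target("^/path@100"))
print(parse_target("^/@1:10000"))
```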
But the list does not stop here. What about the following use-case,
which may seem silly, but is actually quite reasonable in some
workflows:
svn obliterate --find-me-all-psd-files-older-than-three-months-
that-have-modifications-occurring-less-than-one-week-apart-
and-obliterate-the-next-to-last-commit-in-the-series-
then-repeat-until-there-is-at-least-one-week-
between-deltas ^/my/really/big/photoshop/projects
(Of course, the above syntax is silly in any case).
And as Brane noted, obliterating key links in the revision tree
may be undesirable (even if the result is well-defined), so
we might imagine:
svn obliterate --exclude-copies-from ^/old/and/big
And so on ...
I think all of these use-cases, and more, can be implemented on
top of an "obliteration-set" driven core functionality. Some of
them can eventually (or immediately) find their way into the
utility that subversion users see, others will only be available
in perl scripts operating on log files (but note that all of them
could be implemented through "svn log", "perl" and
"core obliteration".)
Furthermore, if agreement is reached, these use-cases will find
their way into obliterate-functional-spec.txt as "add-on"
features, of different priority.
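For the log-driven scripts mentioned above, here is a small sketch of the first step: pulling copy relationships out of the XML that "svn log --xml -v" produces. I have written it in Python rather than perl purely for illustration. The sample log is hand-written and trimmed to the elements the code reads (real output also carries author, date and message), and `copy_edges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of "svn log --xml -v" output.
SAMPLE_LOG = """<log>
<logentry revision="15">
<paths>
<path action="A" copyfrom-path="/bad/path" copyfrom-rev="14">/copy/of/path</path>
<path action="M">/bad/path</path>
</paths>
</logentry>
</log>
"""

def copy_edges(log_xml):
    """Return ((source path, source rev), (copied path, rev)) pairs."""
    edges = []
    for entry in ET.fromstring(log_xml).iter("logentry"):
        rev = int(entry.get("revision"))
        for path in entry.iter("path"):
            src = path.get("copyfrom-path")
            if src is not None:
                src_rev = int(path.get("copyfrom-rev"))
                edges.append(((src, src_rev), (path.text, rev)))
    return edges

print(copy_edges(SAMPLE_LOG))
```

Edges collected this way are the kind of input a script would need before deciding which copies belong in the obliteration set.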
Best,
Magnus
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1269031
Received on 2009-03-05 11:44:26 CET