
Re: Delete from Repository

From: Michael Eager <eager_at_eagercon.com>
Date: 2006-09-06 18:17:10 CEST

Gale, David wrote:
> Michael Eager wrote:
>> Gale, David wrote:
>>> Les Mikesell wrote:
>>>> On Tue, 2006-09-05 at 11:50 -0700, Karl Fogel wrote:
>>>>> Les Mikesell <lesmikesell@gmail.com> writes:
>>>>>> I'd expect it to have exactly the same results as a
>>>>>> dump/filter/load sequence back into the same location, in which
>>>>>> case the feature is already defined and acceptable. We just need
>>>>>> an implementation that is faster, doesn't need the intermediate
>>>>>> copies, and doesn't break checked-out workspaces any more than
>>>>>> necessary.
>>>>> Sure, any of the proposed behaviors in
>>>>> http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
>>>>> could be implemented via a dump/filter/load sequence, but that
>>>>> doesn't specify which exact one you had in mind. Can you describe
>>>>> it in terms of results, rather than of implementation?
>>>> I still think in terms of the way CVS works so I'd want the effect
>>>> of removing the file,v file from a CVS repository filesystem. That
>>>> is, all versions completely gone at once. Perhaps this could be
>>>> combined with a directory-level dump/filter/restore operation if you
>>>> wanted to put back some subset of versions in its place.
>>> I can see a need for obliterating an entire revision (user puts
>>> sensitive data into a file & commits; need to get rid of the
>>> sensitive info without getting rid of the file itself). This is
>>> option d) in Karl Fogel's comment (link quoted above).
>>> I can see a need for obliterating an entire file, complete with all
>>> files that were copied from it (security guy points out that the
>>> "encrypted" passwords that have been stored since rev 1 are simply
>>> ROT-13'd). I believe this maps to Karl's option c, though maybe b.
>>> This also seems to be what Les is desiring.
>>> I'm not sure what the other two options Karl listed are; could
>>> someone give plausible examples of each?
>>> These two choices would seem to indicate that "svn obliterate" should
>>> have multiple forms, to do different things:
>>> 1) "svn obliterate -r <rev>" should make svn act as if the changes in
>>> revision <rev> never happened; checking out revision <rev> should
>>> give the files as they appeared in revision <rev>-1, and any
>>> subsequent revisions which are stored as deltas off of revision
>>> <rev> would have to have their deltas recalculated. "svn up" on an
>>> existing working copy should roll back the changes in the revision,
>>> potentially causing a conflict.
>>> 2) "svn obliterate <file>" should remove the specified file, across
>>> all revisions, following copies & renames. Since svn doesn't
>>> currently have true renames, this may be tricky. Could potentially
>>> invalidate all deltas, though marking the file as "obliterated" and
>>> delaying recalculating the deltas until some convenient time (such
>>> as a dump/load cycle) would help. "svn up" on an existing working
>>> copy should remove the file if there haven't been any changes to
>>> it; otherwise, it should become unversioned.
>>> 3) "svn obliterate -r <rev> <filename>" should (I think) remove
>>> changes to the specified file in the specified revision, while
>>> leaving all other changes in that revision intact. Revisions which
>>> are deltas off of the specific revision would have to be
>>> recalculated. "svn up" would act like case 1), above.
>>> 3a) An alternative interpretation of the command in case 3) would be
>>> to totally remove the file from the specified revision/revision
>>> range, which could be tricky, but I think the interpretation in
>>> case 3) would be the more common one.
>> Cases 1 & 3 seem difficult to perform reliably. What if a subsequent
>> delta depended on the changes made in the revision to be deleted? It
>> might be impossible to recalculate the diffs.
>> Case 2 includes issues about renames and file copies, which may,
>> as noted, be problematic.
>> There appear to be multiple problems masquerading under the same
>> name. The problems for which these three cases are proposed
>> solutions are really not the same as the problem I'd like to see
>> addressed. My problem is not undoing a revision.
>> My problem is simpler: files are checked in which should not have
>> been. I want to delete these unwanted files from the repository.
>> This may be a case of the perfect being the enemy of the good.
>> I make no pretense that my problem or its solution would
>> solve any of the more complex possible problems which one might also
>> want to address. But addressing these problems isn't necessary
>> in order to address the problem I want to solve.
> As Ryan noted in his response, your request is Case 2, just in its
> simplest form.
> For a good design, we need to identify as many use cases and their edge
> cases as possible, so that we can design a system that should be able to
> support all of them (even if it doesn't support them all when initially
> released). Such a list of desired behavior would also be a very good
> first step towards being able to offer a bounty--there are more people
> who are capable of implementing a solid design than are capable of
> creating a solid design (and the former would be cheaper, anyhow).
> It'd be kind of pointless to do all the work to implement "svn
> obliterate" for *just* your situation, which, as I noted, is the
> simplest form of a potentially much more complex requirement, and which
> also has a current workaround (dump-filter-load). If we're going to
> modify the subversion engine, let's try to make sure we do it right.
> I think that this would be a very powerful feature, and one which should
> be added--but the first step needs to be gathering *all* of the
> requirements, not just the simplest use cases.

As I said, the perfect may be the enemy of the good. This looks
to me like what Fred Brooks called "gold-plating" in "The Mythical
Man-Month".

As others have mentioned, dump/filter/load is inconvenient: it
requires the repo to be off-line, svndumpfilter is awkward to use,
and the process is time-consuming. I don't see it as a viable
workaround.
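
For reference, the dump/filter/load cycle mentioned above can be sketched
roughly as follows. This is only an illustration: the repository paths and
the excluded file name are hypothetical, and the helper function is not part
of Subversion. It merely assembles the usual svnadmin and svndumpfilter
command lines as strings and does not run them.

```python
import shlex


def dump_filter_load_commands(old_repo, new_repo, excluded_paths):
    """Build the shell commands for a dump/filter/load cycle.

    old_repo, new_repo and excluded_paths are hypothetical example values
    supplied by the caller; nothing is executed here.
    """
    if not excluded_paths:
        raise ValueError("at least one path to exclude is required")
    # Quote every argument so paths containing spaces survive the shell.
    exclude = " ".join(shlex.quote(p) for p in excluded_paths)
    old_q = shlex.quote(old_repo)
    new_q = shlex.quote(new_repo)
    return [
        # The filtered history is loaded into a fresh, empty repository.
        f"svnadmin create {new_q}",
        # Dump everything, drop the unwanted paths, load what remains.
        f"svnadmin dump {old_q} | svndumpfilter exclude {exclude} | svnadmin load {new_q}",
    ]


if __name__ == "__main__":
    for cmd in dump_filter_load_commands(
        "/srv/svn/repo", "/srv/svn/repo-filtered", ["trunk/secrets.txt"]
    ):
        print(cmd)
```

The old repository has to be kept read-only while this runs, and the new
one swapped into place afterwards, which is where the off-line time comes from.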

This feature has been requested for five years, as I understand
it. There has been substantial collection of requirements over this
time. So far, no one has stepped forward to design the feature,
either in the smaller version I need, or in the more expansive
form you would like to see. Perhaps that might happen if you come
up with a bounty, but it appears that you want a complex design
completed before finding the bounty. Sounds like this could take
more years.

I can implement the solution I need, if I find the time. It doesn't
sound complicated, although some suggestions about where to look
would be appreciated.

If I do this, I'll send in the patch.
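
To describe the simple case in terms of results rather than implementation,
here is a toy model. It assumes each revision is a full snapshot (a dict
mapping path to content), which is a simplification and not how Subversion
stores data; the function name and the sample paths are made up for
illustration. It does not follow copies or renames.

```python
def obliterate_path(history, path):
    """Return a new history in which `path` (and anything under it) never existed.

    `history` is a list of snapshots, one dict of path -> content per
    revision. This is a toy model, not Subversion's storage format.
    """
    prefix = path.rstrip("/") + "/"
    return [
        {p: c for p, c in snapshot.items()
         if p != path and not p.startswith(prefix)}
        for snapshot in history
    ]


# Hypothetical history: a stray file was committed in the second revision.
history = [
    {"trunk/main.c": "v1"},
    {"trunk/main.c": "v1", "trunk/core.dump": "junk"},
    {"trunk/main.c": "v2", "trunk/core.dump": "junk"},
]
cleaned = obliterate_path(history, "trunk/core.dump")
```

Revision numbers are unchanged and every other file is untouched; only the
unwanted path disappears from every revision.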

Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
Received on Wed Sep 6 18:38:48 2006

This is an archived mail posted to the Subversion Users mailing list.