[PROPOSAL] how to implement 'svn obliterate'

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Wed, 16 Apr 2008 16:23:36 -0400

This isn't 1.5-related, but I wanted to post it before I forgot.

Here's a cheap plan for implementing 'svn obliterate'. I'm interested
in implementing it, but there are more pressing things on my plate right
now, and there's no reason it has to be me. If someone wants to run
with this, go for it! I will link to this mail from issue #516.

The Plan:
=========

Use svn_repos_replay2() to send changes through an "obliterate editor"
(defined below) to create a new repository that doesn't have whatever is
being obliterated. As necessary, run repeated "catch up" passes until
HEAD is the same in both repositories. Then lock the old repository and
replace its db/ subdir with the one from the new repository. Finally,
remove the remainder of the new (temporary) repository.
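
To make the "catch up" idea concrete, here is a toy sketch. It is not
Subversion code: toy_repos_t, toy_replay_fn_t, toy_catch_up() and
toy_count_replay() are all made-up names, and a repository is reduced to
nothing but its HEAD number. In a real implementation the callback would
presumably wrap a replay of one revision through the obliterate editor.

```c
/* Toy stand-in for a repository: all we track is its HEAD revision. */
typedef struct toy_repos_t {
  long head;
} toy_repos_t;

/* Hypothetical callback: replay revision REV of SRC, filtered through
   the obliterate editor, into DST.  Returns 0 on success. */
typedef int (*toy_replay_fn_t)(const toy_repos_t *src, toy_repos_t *dst,
                               long rev, void *baton);

/* One "catch up" pass: replay every revision that SRC has and DST
   lacks.  Returns the number of revisions replayed, or -1 if REPLAY
   reported an error. */
static long
toy_catch_up(const toy_repos_t *src, toy_repos_t *dst,
             toy_replay_fn_t replay, void *baton)
{
  long replayed = 0;

  while (dst->head < src->head)
    {
      long rev = dst->head + 1;

      if (replay(src, dst, rev, baton) != 0)
        return -1;
      dst->head = rev;
      replayed++;
    }
  return replayed;
}

/* Example callback that only counts how many revisions it was asked
   to replay; BATON must point to a long. */
static int
toy_count_replay(const toy_repos_t *src, toy_repos_t *dst,
                 long rev, void *baton)
{
  (void)src;
  (void)dst;
  (void)rev;
  ++*(long *)baton;
  return 0;
}
```

The caller would run toy_catch_up() repeatedly while commits keep
arriving, then take the lock on the old repository and run it one last
time, so that a return value of 0 under the lock means it is safe to swap
in the new db/.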

An "obliterate editor" is an editor created and massaged like so:

   /* Set @a *editor and @a *edit_baton to an editor that can obliterate
    * history from @a repos. Allocate the editor in @a pool.
    *
    * Pass @a *edit_baton to svn_repos_obliterate_path() and
    * svn_repos_obliterate_revision() as many times as necessary to
    * specify the obliterations you want, before you use @a *editor.
    *
    * @note @a *editor->open_root() creates a new temporary repository
    * (with the same UUID as @a repos) into which the filtered data is
    * replayed. When done, @a *editor->close_edit() locks @a repos,
    * splices the relevant parts of the temporary repository into
    * @a repos, and removes the temporary repository.
    *
    * @a *editor->abort_edit() just removes the temporary repository.
    */
   svn_error_t *
   svn_repos_get_obliterate_editor(const svn_delta_editor_t **editor,
                                   void **edit_baton,
                                   svn_repos_t *repos,
                                   apr_pool_t *pool);

   /* In @a edit_baton (received from svn_repos_get_obliterate_editor())
    * specify that @a path and all its copies are to be obliterated.
    *
    * If @a rev is SVN_INVALID_REVNUM, then obliterate every occurrence
    * of @a path in the repository, no matter what its contents or
    * provenance, as though it had never been committed, and likewise
    * obliterate every copywise descendant of @a path.
    *
    * If @a rev is not SVN_INVALID_REVNUM, then obliterate @a path as it
    * appears in @a rev: that is, find the revision in which @a path in
    * @a rev was first committed and make that change not have happened,
    * so that the next change to @a path is whenever it was next
    * committed to after that. Then obliterate all copywise descendants
    * of @a path as it appears in @a rev, except for those that can be
    * re-assigned a copy-history from an unobliterated node revision (in
    * which case, do so).
    *
    * ### TODO: That last requirement is kind of complicated, and there
    * ### may be other reasonable ways to behave too. What can I say:
    * ### this kind of question is precisely why we haven't implemented
    * ### obliterate yet. For those who thought the obstacle was
    * ### difficulty of implementation, rather than the difficulty of
    * ### determining the right behaviors: now do you see what I meant? :-)
    *
    * If @a obliterate_identicals is true, obliterate every version of
    * every path in the repository that has contents identical to @a path
    * (in @a rev if @a rev is not SVN_INVALID_REVNUM).
    *
    * Use @a pool for temporary allocation only.
    *
    * ### TODO: Should we offer a 'keep_copies' flag? I don't see a
    * ### compelling use case for it, though.
    */
   svn_error_t *
   svn_repos_obliterate_path(void *edit_baton,
                             const char *path,
                             svn_revnum_t rev,
                             svn_boolean_t obliterate_identicals,
                             apr_pool_t *pool);

   /* In @a edit_baton (received from svn_repos_get_obliterate_editor())
    * specify that @a rev is to be treated like it never happened.
    * That is, for each path P changed in @a rev, have the same effect
    * as calling:
    *
    * svn_repos_obliterate_path(@a edit_baton, P, @a rev, FALSE, @a pool)
    */
   svn_error_t *
   svn_repos_obliterate_revision(void *edit_baton,
                                 svn_revnum_t rev,
                                 apr_pool_t *pool);
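
To show how the two registration calls relate, here is a toy model of
the edit baton. Everything prefixed toy_ is invented for illustration and
none of it is Subversion API. It makes three loud simplifications: REV is
treated as the revision in which the change itself was committed (no
search for the last-changed revision), copies are ignored, and
obliterate_identicals is recorded but never acted on. The list of changed
paths is handed in by the caller rather than looked up in the filesystem.

```c
#include <stddef.h>
#include <string.h>

#define TOY_INVALID_REVNUM (-1L)   /* plays the role of SVN_INVALID_REVNUM */
#define TOY_MAX_TARGETS 16

/* One registered obliteration request. */
typedef struct toy_target_t {
  const char *path;
  long rev;                        /* TOY_INVALID_REVNUM means "every revision" */
  int obliterate_identicals;       /* recorded only; this toy never uses it */
} toy_target_t;

/* Toy edit baton: just the list of requests made so far. */
typedef struct toy_edit_baton_t {
  toy_target_t targets[TOY_MAX_TARGETS];
  size_t count;
} toy_edit_baton_t;

static void
toy_edit_baton_init(toy_edit_baton_t *eb)
{
  eb->count = 0;
}

/* Register PATH at REV for obliteration.  Returns 0, or -1 if full. */
static int
toy_obliterate_path(toy_edit_baton_t *eb, const char *path, long rev,
                    int obliterate_identicals)
{
  if (eb->count >= TOY_MAX_TARGETS)
    return -1;
  eb->targets[eb->count].path = path;
  eb->targets[eb->count].rev = rev;
  eb->targets[eb->count].obliterate_identicals = obliterate_identicals;
  eb->count++;
  return 0;
}

/* Register every path in CHANGED_PATHS (the paths changed in REV),
   exactly as if toy_obliterate_path() had been called on each one
   with obliterate_identicals set to false. */
static int
toy_obliterate_revision(toy_edit_baton_t *eb,
                        const char *const *changed_paths, size_t n_paths,
                        long rev)
{
  size_t i;

  for (i = 0; i < n_paths; i++)
    if (toy_obliterate_path(eb, changed_paths[i], rev, 0) != 0)
      return -1;
  return 0;
}

/* Return 1 if the editor should drop the change to PATH in REV. */
static int
toy_is_obliterated(const toy_edit_baton_t *eb, const char *path, long rev)
{
  size_t i;

  for (i = 0; i < eb->count; i++)
    if (strcmp(eb->targets[i].path, path) == 0
        && (eb->targets[i].rev == TOY_INVALID_REVNUM
            || eb->targets[i].rev == rev))
      return 1;
  return 0;
}
```

The editor callbacks would consult something like toy_is_obliterated()
for each path in each replayed revision and simply not forward the
changes that match.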

Advantages:
===========

* The repository remains accessible while obliterate runs, since it just
  locks the repository for a constant amount of time at the end, like
  commit. On the other hand, admins are certainly free to make the
  repository inaccessible during the obliteration if they want to. We
  should probably offer a flag ('svnadmin obliterate --lockout') to make
  that easy to do.

* Obliterate is server-side only, you have to be an admin to run it.

Disadvantages:
==============

* The cost of an obliterate is proportional to the total number of paths
  in the repository, not to the number of things being obliterated. Oh
  well. Since it doesn't have to block access, I'm not sure how
  important this is. Also, we can detect certain shortcut cases and do
  them more quickly (for example, deleting HEAD is just a matter of
  removing and tweaking some files/directories; deleting a revision
  older than HEAD when nothing touched in that revision changed between
  then and HEAD is subject to similar shortcuts).

* Obliterate is server-side only, you have to be an admin to run it.
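
The second shortcut case mentioned above (an older revision whose paths
were not touched again before HEAD) could be detected along these lines.
This is a toy sketch with made-up names (toy_rev_t, toy_paths_related(),
toy_can_shortcut()); it assumes each revision is described by a plain
list of changed paths, treats a change to an ancestor or descendant
directory as "touching" the path, and ignores copies entirely, which a
real check could not do.

```c
#include <stddef.h>
#include <string.h>

/* The paths changed in one revision. */
typedef struct toy_rev_t {
  const char *const *paths;
  size_t n_paths;
} toy_rev_t;

/* Return 1 if A and B are the same path, or one is an ancestor
   directory of the other; otherwise return 0. */
static int
toy_paths_related(const char *a, const char *b)
{
  size_t la = strlen(a), lb = strlen(b);
  size_t n = la < lb ? la : lb;

  if (strncmp(a, b, n) != 0)
    return 0;
  if (la == lb)
    return 1;
  /* The shorter path is the root (empty or ending in '/'). */
  if (n == 0 || a[n - 1] == '/')
    return 1;
  /* Otherwise the longer path must continue with a '/' separator. */
  return (la > lb ? a[n] : b[n]) == '/';
}

/* REVS is indexed by revision number (REVS[0] is unused).  Return 1
   if no path changed in TARGET is touched again in any revision
   after TARGET up to and including HEAD, else 0. */
static int
toy_can_shortcut(const toy_rev_t *revs, long head, long target)
{
  long r;
  size_t i, j;

  if (target < 1 || target > head)
    return 0;
  for (r = target + 1; r <= head; r++)
    for (i = 0; i < revs[target].n_paths; i++)
      for (j = 0; j < revs[r].n_paths; j++)
        if (toy_paths_related(revs[target].paths[i], revs[r].paths[j]))
          return 0;
  return 1;
}
```

Note that TARGET equal to HEAD trivially passes the check, which matches
the "deleting HEAD" case.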

The I-Know-What-You're-Thinking Department:
===========================================

Yes, we'll need to rev svn_repos_replay2() to make svn_repos_replay3(),
which has the following changes:

- There's a new flag 'fulltext_only' that tells the driver to assume
the consumer has no access to the prior content of files.

- The 'send_deltas' flag is changed to 'skelta_only' and its sense is
reversed.

The point of the first change is to allow us to leave out one revision
of a file and still be able to receive subsequent revisions of that
file. The point of the second change is to rename the 'send_deltas'
flag, so the first change won't be confusing.
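
Here is a toy illustration of why the first change matters. The delta
format below ("keep the first N bytes of the base, then append some
text") is invented for this example and has nothing to do with svndiff;
toy_delta_t and toy_apply_delta() are hypothetical names. The point is
only that a delta computed against an obliterated revision cannot be
applied by a consumer that never received that revision.

```c
#include <stddef.h>
#include <string.h>

/* Toy delta: keep the first KEEP bytes of the base, then add APPEND. */
typedef struct toy_delta_t {
  size_t keep;
  const char *append;
} toy_delta_t;

/* Apply DELTA to BASE, writing a NUL-terminated result into OUT, which
   has room for OUT_SIZE bytes and must not overlap BASE.  Returns 0 on
   success, or -1 if BASE is shorter than the delta expects or OUT is
   too small. */
static int
toy_apply_delta(const char *base, const toy_delta_t *delta,
                char *out, size_t out_size)
{
  size_t add = strlen(delta->append);

  if (delta->keep > strlen(base))
    return -1;                     /* the delta was made against another base */
  if (delta->keep + add + 1 > out_size)
    return -1;
  memcpy(out, base, delta->keep);
  memcpy(out + delta->keep, delta->append, add + 1);
  return 0;
}
```

Suppose r1 is "alpha\n", r2 appends "password=hunter2\n", and r3 is
expressed as { 15, "REDACTED\n" } against r2. If r2 is obliterated, the
consumer only has r1 (6 bytes), so applying r3's delta fails; sending r3
as fulltext sidesteps the problem.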

General Discussion:
===================

In svn_repos_obliterate_path() and svn_repos_obliterate_revision(), you
can see that there are many, many possible ways they *could* behave, and
there are arguments for and against all of them. I picked the above
behaviors somewhat at random. Figuring out how those APIs should work
has been the central problem of designing 'svn obliterate' all along,
and is the real reason it has not been implemented yet. I hope that by
presenting the problem in API form, I've at least clarified the
questions somewhat.

This proposal doesn't really discuss the command-line interface. I
think we should decide on the programmatic API and let that suggest the
natural user interface. (Of course, both should ultimately be derived
from use cases.) Here I'm just trying to give an overview of *how* to
implement whatever we decide, not launch the bikeshed-painting party
that will inevitably come.

As long promised, we completely punt on the working copy side. Admins
are on their own there. (In fact, sending obliteration commands down to
the client would be rather undesirable in some use cases anyway.)

-Karl

Received on 2008-04-16 22:23:47 CEST
