Re: Delete for real from a repository

From: Nico Kadel-Garcia <nkadel_at_gmail.com>
Date: Fri, 18 Sep 2009 08:34:27 -0400

On Fri, Sep 18, 2009 at 7:44 AM, Bolstridge, Andrew
<andy.bolstridge_at_intergraph.com> wrote:
> Obliterate:
>
> I see 4 reasons for obliterating data in a repository (and this applies to
> every SCM out there).
>
> 1. You’ve accidentally checked-in the wrong file, or the one that had
> “my boss is a stupid m******** making me work on this s***” that you forgot
> to remove before checkin. Or a password, or confidential data. You’d want to
> take that data out of the repository forever. This doesn’t have to be a true
> obliterate, you can overwrite the data with zeros and the job would still be
> acceptable, or the server could be marked with a ‘no way dude’ flag that
> absolutely prevented anyone from reading that revision.

Or typos: "svn copy branches tags" instead of "svn copy
branches/mybranch-1.0.1 tags" can happen, and causes havoc in the tags
filetree.

>
> 2. You accidentally checked in a 1Gb file and now you want to remove
> it to save disk space in your backups. This is relatively common in SVN as
> there are no filetype checks by default (you can set your client
> global-ignores, but if a new team member arrives and doesn’t read the
> corporate documentation because he thinks he knows what he’s doing, then you
> can easily get into this problem). In this case, you want to delete the file
> almost as soon as it enters the repository, and you’re not concerned about
> history or any other side effects.

And nightly binary build trees. These might benefit from being
mirrored to another repository better suited to holding binaries, but
the difficulty of transferring history to another repository makes
this prohibitively awkward.
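
One way to get the missing filetype check is a server-side pre-commit
hook. The following is only a rough Python sketch of the filtering step,
assuming the hook feeds it the text printed by "svnlook changed". The
function name blocked_paths, the pattern list, and the sample paths are
made up for illustration and are not part of Subversion.

```python
import fnmatch
import posixpath

# Hypothetical block list; adjust to whatever local policy says.
BLOCKED_PATTERNS = ["*.iso", "*.dmg", "*.obj", "*.exe"]


def blocked_paths(changed_output, patterns=BLOCKED_PATTERNS):
    """Return newly added paths whose file name matches a blocked pattern.

    `changed_output` is assumed to look like `svnlook changed` output:
    a short status field, whitespace, then the path.
    """
    hits = []
    for line in changed_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        status, path = parts
        if not status.startswith("A"):
            continue  # only look at newly added paths
        # Compare the lower-cased file name so "*.iso" also catches ".ISO".
        name = posixpath.basename(path.rstrip("/")).lower()
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            hits.append(path)
    return hits


# Made-up sample input for illustration.
sample = (
    "A   trunk/images/install-dvd.ISO\n"
    "U   trunk/src/main.c\n"
    "A   trunk/docs/readme.txt\n"
)
print(blocked_paths(sample))
```

In a real hook the text would come from running svnlook against the
pending transaction, and the hook would exit non-zero when the list is
not empty so that the commit is rejected. This only checks names, not
sizes.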

>
> 3. You no longer need a file or project in the repository, you can
> delete it, but you will absolutely never ever ever need it again, and so it
> can be permanently removed to free up disk space in the backups.

See above about nightly binary build trees. Also, for old repositories
that used to have things arranged differently, "svn update -r[old
revision]" can cause fascinating chaos in such cases.

> 4. You need to archive off ancient revisions to keep your live
> repository working smoothly. This really only applies to corporate orgs
> that have ever-increasing repositories full of data. One day you will want
> to archive all revisions that were checked in over a year ago. These may be
> kept as an archive repo in case someone wants to view them, but you want to
> keep the recent revisions only to make the repo work faster, and your
> current backups backup quicker.

And log commands, and diff commands.

> Now some of these are not really necessary with SVN, #4 for instance, as
> I’ve not found SVN to slow down much even though my repo is 12Gb in size. I
> know we used to archive off VSS dbs, but that’s a whole different bucket of
> manure.

Sounds like you've sensibly kept individual projects in individual
repositories. I've.... had less success encouraging my clients to do
so. They've often preferred to keep all projects in a central
repository, with individual subdirectories for individual projects.
Any one of them can balloon suddenly either with new development work
or check-in policies, especially of binaries, and accidental commits
of DVD images have been a serious issue.

> #3 and #4 – because svnsync is great at backups, and the backups are
> incremental, neither issue is much of a problem. Disk space might be, but
> even though I complain at the cost of some enterprise hardware (and the
> hoops you have to jump through to be allowed to buy one, those accountants
> have to earn their worth somehow too :) ), even several gig of data is still
> not an issue nowadays.

True, but svnsync doesn't bring over the server's configuration files.
That has to occur out-of-band. That's what hotcopy is for, but it gets
*SLOW* as accidental commits accumulate, and those are very difficult to
flush. And 50 Gig of high-performance 15,000 RPM hard drive is still
expensive, in accumulated server requirements and especially in
off-line backup resources.

> #2 and #1 however.. these are valid reasons for having a svn obliterate
> command. In a perfect world, we’d have something that could archive
> individual or groups of revisions or directories, based on date or revnum.
> Unfortunately, SVN was designed without this feature in mind, and so we’re
> stuck with the problem until the significant investigation and rework gets
> done. It won’t happen until someone figures out how to rewrite a revision’s
> data (as they are all stored as deltas, you need the previous version to
> extract a revision, so you’d need to update your delta based on the previous
> revision-but-one). Also, because some files are cheap copies of others, you
> cannot just delete a file without finding which files were created from it
> and rewriting that file’s initial delta too.
>
> However, obliterating a newly added file/revision should be easy, and I
> think, cater for most cases where someone wants to obliterate a file for
> privacy reasons. Trouble is, can you guarantee that no-one has made a copy
> of that file in the time between checkin and obliterate?

Yeah. Under CVS, the predecessor to Subversion, you had the concept of
the 'Attic', and could eventually simply delete the directory from the
server. That followed RCS, where it was file-based information, not a
sophisticated database. Procedurally, it does make sense to allow a
flush only by an admin, to be done with caution and the understanding
that "this may cause problems".
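
To make the delta problem in the quoted text concrete, here is a toy
Python model, not Subversion's actual storage format. It assumes a
simple chain where each revision is stored as a forward delta against
the one before it. All names (make_delta, apply_delta, rebuild,
obliterate) and the sample file contents are invented for illustration.
It shows that removing revision n means rebuilding the delta for
revision n+1 against revision n-1.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` relative to `old` as copy/insert instructions."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: those old lines are just not copied
    return ops


def apply_delta(old, delta):
    """Rebuild the newer text from the older text plus a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def rebuild(base, deltas, upto):
    """Reconstruct revision `upto` (0 is the base) by replaying deltas."""
    text = base
    for delta in deltas[:upto]:
        text = apply_delta(text, delta)
    return text


def obliterate(base, deltas, n):
    """Drop revision n (n >= 1) and re-base revision n+1 on revision n-1."""
    if n == len(deltas):
        return deltas[:-1]  # newest revision: nothing depends on it
    before = rebuild(base, deltas, n - 1)
    after = rebuild(base, deltas, n + 1)
    # Later deltas stay valid because revision n+1 itself is unchanged.
    return deltas[:n - 1] + [make_delta(before, after)] + deltas[n + 1:]


# Made-up history: revision 1 accidentally contains a password.
r0 = ["line1"]
r1 = ["line1", "password=hunter2"]
r2 = ["line1", "line2"]
r3 = ["line1", "line2", "line3"]
deltas = [make_delta(r0, r1), make_delta(r1, r2), make_delta(r2, r3)]

new_deltas = obliterate(r0, deltas, 1)
print(rebuild(r0, new_deltas, 1))  # what used to be revision 2
print(rebuild(r0, new_deltas, 2))  # what used to be revision 3
```

The newest-revision case is the cheap one, which matches the point that
obliterating a freshly added revision should be easy. The sketch
ignores cheap copies entirely, and a real backend would have more to
rewrite than this.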

> Incidentally I store binaries in my repo, and the binary compression
> algorithm can work wonderfully (a 2mb resource binary can be turned into a
> 100k delta), so I’m not too fussed about that, especially as rebuilding
> versions is truly horrible to contemplate a year down the line and you
> haven’t quite got the correct build dependencies anymore. This is why people
> do have a valid desire to check in binaries.

Oh, yeah, especially because build environments change despite our best efforts.

> So, perhaps it isn’t so difficult to include a form of obliterate that is
> only for admins to run, that might corrupt parts of your repo, but will
> remove a complete file or HEAD revision from the repo completely. A full
> archival mechanism can wait.

It would also seem reasonable to survey the repository for 'copy'
operations on files and directories, and inform the admin of exactly
what files or directories would be affected. In fact, if svnadmin
could report that information, it might be very helpful indeed.
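
Until svnadmin reports that itself, something close can be pulled out
of the verbose XML log. This is a hedged Python sketch, assuming input
in the shape "svn log -v --xml" produces, where copied paths carry
copyfrom-path and copyfrom-rev attributes. The function name
copies_from and the sample log are invented for illustration.

```python
import xml.etree.ElementTree as ET


def copies_from(log_xml, target):
    """List copies that would carry `target` along with them.

    Returns (revision, new_path, copyfrom_path, copyfrom_rev) tuples for
    copies of the target itself, of anything below it, or of a parent
    directory that contains it.
    """
    target = target.rstrip("/")
    found = []
    for entry in ET.fromstring(log_xml).iter("logentry"):
        rev = int(entry.get("revision"))
        for path in entry.iter("path"):
            src = path.get("copyfrom-path")
            if src is None:
                continue  # plain add, modify or delete, not a copy
            src_norm = src.rstrip("/")
            if (src_norm == target
                    or src_norm.startswith(target + "/")
                    or target.startswith(src_norm + "/")):
                copy_rev = int(path.get("copyfrom-rev"))
                found.append((rev, path.text, src, copy_rev))
    return found


# Made-up sample log for illustration.
SAMPLE_LOG = """<log>
<logentry revision="10">
<paths>
<path action="A">/trunk/secret.txt</path>
</paths>
</logentry>
<logentry revision="12">
<paths>
<path action="A" copyfrom-path="/trunk" copyfrom-rev="11">/tags/rel-1.0</path>
</paths>
</logentry>
<logentry revision="15">
<paths>
<path action="A" copyfrom-path="/trunk/secret.txt" copyfrom-rev="14">/branches/dev/secret-copy.txt</path>
<path action="A" copyfrom-path="/trunk/other.txt" copyfrom-rev="14">/branches/dev/other.txt</path>
</paths>
</logentry>
</log>"""

for hit in copies_from(SAMPLE_LOG, "/trunk/secret.txt"):
    print(hit)
```

This does not check whether the target actually existed at the
copyfrom revision, and it does not follow copies of copies, so treat
the result as a list of candidates for the admin to inspect rather than
a complete answer.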

> Does anyone have links for archiving old revisions from a subversion repo.
> Google isn’t too hot with the word ‘archive’.

Heh. Yeah, keywords don't always work so well.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2396411

To unsubscribe from this discussion, e-mail: [users-unsubscribe_at_subversion.tigris.org].
Received on 2009-09-18 14:35:26 CEST
