
RE: Delete *for real* from a repository

From: Bolstridge, Andrew <andy.bolstridge_at_intergraph.com>
Date: Fri, 18 Sep 2009 12:44:49 +0100



I see 4 reasons for obliterating data in a repository (and this applies
to every SCM out there).


1. You've accidentally checked in the wrong file, or the one that
had "my boss is a stupid m******** making me work on this s***" that you
forgot to remove before checkin. Or a password, or confidential data.
You'd want to take that data out of the repository forever. This doesn't
have to be a true obliterate: you could overwrite the data with zeros and
the job would still be acceptable, or the server could be marked with a
'no way dude' flag that absolutely prevented anyone from reading that
data.

2. You accidentally checked in a 1GB file and now you want to
remove it to save disk space in your backups. This is relatively common
in SVN as there are no file-type checks by default (you can set
global-ignores in your client config, but if a new team member arrives
and doesn't read the corporate documentation because he thinks he knows
what he's doing, then you can easily run into this problem). In this case, you want to
delete the file almost as soon as it enters the repository, and you're
not concerned about history or any other side effects.

3. You no longer need a file or project in the repository. An ordinary
delete would do, but you will absolutely never ever ever need it again,
so it might as well be permanently removed to free up disk space in the backups.

4. You need to archive off ancient revisions to keep your live
repository working smoothly. This really only applies to corporate orgs
that have ever-increasing repositories full of data. One day you will
want to archive all revisions that were checked in over a year ago.
These may be kept as an archive repo in case someone wants to view them,
but you want to keep only the recent revisions to make the repo work
faster, and your current backups complete quicker.
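Reason #2 exists largely because SVN applies no file-type checks by default, and client-side global-ignores only help on machines where someone has actually configured them. One server-side safeguard is a pre-commit hook that refuses such commits before they ever reach the repository. The following is a minimal sketch, assuming the `svnlook changed -t` and `svnlook filesize -t` subcommands (the latter needs Subversion 1.7 or later); the 50 MB limit, the `BLOCKED_EXTENSIONS` list and the helper names are illustrative assumptions, not an established policy.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: reject oversized files and unwanted file types.

Subversion runs pre-commit with two arguments, REPOS and TXN; a non-zero
exit status aborts the commit and stderr is shown to the committer.
"""
import subprocess
import sys

# Example policy values: both are assumptions, tune them for your team.
MAX_BYTES = 50 * 1024 * 1024
BLOCKED_EXTENSIONS = (".iso", ".zip", ".obj", ".pdb")


def parse_changed(output):
    """Return added or modified file paths from `svnlook changed` output.

    Each line holds four status columns followed by the path. Directories
    end in '/', and deletions ('D') and property-only changes ('_U') carry
    no new content, so all three are skipped.
    """
    paths = []
    for line in output.splitlines():
        status, path = line[:4], line[4:]
        if status[:1] in ("A", "U") and path and not path.endswith("/"):
            paths.append(path)
    return paths


def violations(paths, size_of):
    """Yield one message per broken rule; `size_of(path)` returns bytes.

    Passing `size_of` in keeps the rules testable without a repository.
    """
    for path in paths:
        if path.lower().endswith(BLOCKED_EXTENSIONS):
            yield f"{path}: file type not allowed"
        elif size_of(path) > MAX_BYTES:
            yield f"{path}: larger than {MAX_BYTES} bytes"


def svnlook(*args):
    """Run svnlook and return its stdout (raises if svnlook fails)."""
    result = subprocess.run(["svnlook", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def main(repos, txn):
    changed = svnlook("changed", "-t", txn, repos)
    # `svnlook filesize` needs Subversion 1.7 or later.
    problems = list(violations(
        parse_changed(changed),
        lambda p: int(svnlook("filesize", "-t", txn, repos, p))))
    for message in problems:
        print(message, file=sys.stderr)
    return 1 if problems else 0


# Only act as a hook when Subversion passes exactly REPOS and TXN.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Saved as `hooks/pre-commit` inside the repository and made executable, a script like this would run on every commit. Subversion runs hooks with an empty environment, so `svnlook` may need to be called by its full path.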


Now some of these are not really necessary with SVN. Take #4: I've not
found SVN to slow down much, even though my repo is 12GB in
size. I know we used to archive off VSS dbs, but that's a whole
different bucket of manure.


#3 and #4: because svnsync is great at backups, and the backups are
incremental, neither issue is much of a problem. Disk space might be,
but even though I complain about the cost of some enterprise hardware (and
the hoops you have to jump through to be allowed to buy it; those
accountants have to earn their keep somehow too :) ), several gigabytes of
data is still not an issue nowadays.


#1 and #2, however, are valid reasons for having an svn obliterate
command. In a perfect world, we'd have something that could archive
individual revisions, groups of revisions or directories, based on date
or revision number. Unfortunately, SVN was designed without this feature
in mind, so we're stuck with the problem until the significant
investigation and rework gets done. That won't happen until someone
figures out how to rewrite a revision's data: revisions are all stored
as deltas, so you need the previous version to reconstruct a revision,
and removing one means re-expressing the next revision's delta against
the revision before the removed one. Also, because some files are cheap
copies of others, you cannot just delete a file without finding which
files were created from it and rewriting their initial deltas too.
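The delta dependency can be illustrated with a toy model. This is a minimal sketch and deliberately not Subversion's real storage format (Subversion's back ends use skip-deltas, so a revision's delta base is not always its immediate predecessor). It assumes the simplest scheme: revision 0 is stored in full and every later revision as a line-based copy/insert delta against the one before it, built here with Python's `difflib`. The `DeltaChain` class and its methods are hypothetical names.

```python
import difflib


def make_delta(base, target):
    """Encode `target` as copy/insert operations against `base` (lists of lines)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse base[i1:i2]
        elif j2 > j1:                             # 'replace' or 'insert'
            ops.append(("insert", target[j1:j2]))
        # 'delete' needs no op: those base lines are simply not copied
    return ops


def apply_delta(base, ops):
    """Rebuild the target text from `base` and a list of operations."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaChain:
    """Revision 0 is stored in full; each later revision as a delta on its predecessor."""

    def __init__(self, first):
        self.full0 = list(first)
        self.deltas = []      # deltas[k] rebuilds revision k+1 from revision k

    def commit(self, text):
        head = self.get(len(self.deltas))
        self.deltas.append(make_delta(head, list(text)))

    def get(self, rev):
        text = self.full0
        for ops in self.deltas[:rev]:             # replay every delta up to `rev`
            text = apply_delta(text, ops)
        return list(text)

    def obliterate(self, rev):
        """Remove revision `rev` (>= 1); later revisions shift down by one."""
        prev = self.get(rev - 1)
        if rev < len(self.deltas):
            successor = self.get(rev + 1)
            # The successor's delta was expressed against `rev`; rebase it onto `rev - 1`.
            self.deltas[rev] = make_delta(prev, successor)
        del self.deltas[rev - 1]
```

The rebasing step in `obliterate` is the crux: the successor's old delta copies line ranges by their position in the removed revision, so deleting the removed revision's delta alone would leave the successor replaying those ranges against the wrong base and rebuilding the wrong text. In this toy model only one neighbour needs re-deltifying; in a real repository every representation built on the removed content would need the same treatment.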


However, obliterating a newly added file or revision should be easy and
would, I think, cater for most cases where someone wants to obliterate a
file for privacy reasons. Trouble is, can you guarantee that no one has
made a copy of that file in the time between checkin and obliterate?
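Answering that question means tracing copies, including copies of copies and copies of a parent directory that happened to contain the file. The following is a minimal sketch under stated assumptions: the copy history is taken to be already available as a list of records (for instance gathered from `svn log -v`, which reports the source path and revision of each copy), and the `Copy` record and `derived_paths` function are hypothetical. It errs on the side of caution by ignoring later deletions, so it may report paths that no longer contain the file.

```python
from collections import deque
from typing import NamedTuple


class Copy(NamedTuple):
    """One copy event: `dest` was created in revision `rev` from `src`@`src_rev`."""
    rev: int
    dest: str
    src: str
    src_rev: int


def is_within(path, root):
    """True if `path` is `root` itself or lies beneath it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def derived_paths(copies, path, added_rev):
    """List (path, rev) pairs that may hold a copy of `path` added in `added_rev`.

    Follows copies transitively, including directory copies that contained it.
    """
    found = []
    seen = set()
    queue = deque([(path, added_rev)])
    while queue:
        root, since = queue.popleft()
        for c in copies:
            # A copy only carries the content if taken at or after it appeared.
            if c.src_rev < since or (c, root) in seen:
                continue
            if is_within(c.src, root):        # copied the item (or part of it)
                new_path = c.dest
            elif is_within(root, c.src):      # copied a directory containing it
                new_path = c.dest.rstrip("/") + root[len(c.src.rstrip("/")):]
            else:
                continue
            seen.add((c, root))
            found.append((new_path, c.rev))
            queue.append((new_path, c.rev))   # copies of the copy count too
    return found
```

For example, if `/trunk` was branched in r5 from r4 and the branch's copy of the file was later tagged, both derived paths are reported, while a branch taken from r2 (before the file existed) is not.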




Incidentally, I store binaries in my repo, and the binary delta
compression can work wonderfully (a 2MB resource binary can be turned
into a 100KB delta), so I'm not too fussed about that, especially as
rebuilding old versions is truly horrible to contemplate a year down the
line, when you no longer have quite the right build dependencies.
This is why people have a valid desire to check in binaries.



So perhaps it isn't so difficult to include a form of obliterate that
only admins can run, one that might corrupt parts of your repo but will
remove a complete file or the HEAD revision from the repo entirely. A
full archival mechanism can wait.


Does anyone have links for archiving old revisions from a Subversion
repo? Google isn't too hot with the word 'archive'.




Received on 2009-09-18 13:45:43 CEST

This is an archived mail posted to the Subversion Users mailing list.
