With all this talk of obliterate, I thought I'd just send along a
script I hacked up to obliterate files from an fsfs repository a while
back. What this does is overwrite the data in the repository with 0s,
making it inaccessible to anyone. If you try to check out the file
later, you'll get a 0-length file. It doesn't actually remove any
files, so a working copy that already has the file won't know there is
anything to update. Thus, after running this script, I also svn rm'd
the files normally to remove them from people's working copies on the
next svn update.
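
To illustrate the overwrite-with-0s idea in isolation, here is a
minimal sketch. It is not the script itself and knows nothing about
the fsfs on-disk format: the function name, the offset and the length
are all hypothetical, and finding the right byte range inside a
revision file is exactly the part the real script has to work out. It
also makes no attempt to make the file check out as 0-length.

```python
import os


def zero_range(path, offset, length):
    """Overwrite `length` bytes of `path` with 0s, starting at `offset`.

    Hypothetical helper for illustration only. The file size is left
    unchanged; the range must lie entirely inside the existing file.
    """
    size = os.path.getsize(path)
    if offset < 0 or length < 0 or offset + length > size:
        raise ValueError("range falls outside the file")
    with open(path, "r+b") as f:  # r+b: update in place, no truncation
        f.seek(offset)
        f.write(b"\0" * length)
```

Because the bytes are replaced in place, every offset after the zeroed
range stays where it was.
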
It follows both the delta source and file copyfrom chains, clearing
the data of any file which was derived from the obliterated file or
its data.
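
Following those chains amounts to a transitive closure over
"derived from" links. The sketch below shows only that idea, assuming
the links have already been extracted as (derived, source) pairs. The
function name and the node labels are made up for illustration, and
the real script reads this information from the repository rather
than from a list.

```python
from collections import defaultdict, deque


def collect_derived(start_nodes, links):
    """Return start_nodes plus everything derived from them.

    `links` is an iterable of (derived, source) pairs, where a pair
    stands for either a delta-source link or a copyfrom link
    (hypothetical representation, not the fsfs format).
    """
    # Index the links by source so we can walk forward to derivations.
    dependents = defaultdict(set)
    for derived, source in links:
        dependents[source].add(derived)

    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:  # guard against visiting twice
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical labels: a file, a copy of it, and a later change to the copy.
links = [
    ("copy@20", "secret@10"),
    ("copy@25", "copy@20"),
    ("other@30", "unrelated@5"),
]
to_clear = collect_derived(["secret@10"], links)
```

Here to_clear ends up holding secret@10, copy@20 and copy@25, while
the unrelated pair is left alone.
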
The goal was to remove some sensitive data from the repository
completely, even for those who have access to the repository directly.
That data had been branched/copied/modified over the course of a year
or so, so there were many commits on many paths in the repository that
needed to be removed. That's why this just takes some starting paths
and tracks down all the derivations.
I ran this on an svn 1.3 repository. It may or may not work on others.
It may not work at all. No warranty. Don't blame me if it blows up
your repository. In fact, I recommend not ever running it on your
repository. All I know is that it worked once, and I may have even
broken/changed the script since then. I hope to never have to use it
again. :)
But, I hope this helps show at least one use case which should be
handled by a "proper" solution. I figure having actually written code
to handle this use case shows its importance better than just
saying it.
Note: this is derived from the fsfsverify.py script, which was a great
starting point.
James
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1239944
Received on 2009-02-27 19:09:43 CET