On Aug 21, 2006, at 22:20, Les Mikesell wrote:
> On Mon, 2006-08-21 at 17:21 +0200, Ryan Schmidt wrote:
>
>> The second is what has often been called "svnadmin obliterate". There
>> is this feature request:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=516
>>
>> I understand that this is extremely difficult, not just to implement,
>> since Subversion is specifically designed to retain all history, so
>> making it suddenly forget some of that is counter to its very nature,
>> but also difficult to define what exactly is meant. Are we
>> obliterating a single item in the repository at a single revision? Or
>> all revisions of this item throughout time?
>
> Assume that there is something in the repository that could make you
> or your company the target of a very expensive lawsuit if anyone is
> ever able to retrieve it and read it. In the US at least that doesn't
> take very much, so it is likely to be a common problem.
>
>> What if it moved or was
>> renamed? What if it was copied? Are the copies removed too? Even if
>> they've subsequently changed substantially or completely? It's a
>> complicated issue, which is why it's not done yet. In the mean time,
>> you have the "svnadmin dump" / "svndumpfilter" / "svnadmin load"
>> alternative, which is deliberately cumbersome because as mentioned
>> Subversion's goal is to retain all history.
>
> I agree that it should be an admin-only operation but making it
> difficult is just short-sighted. Everyone is going to accumulate
> junk (or worse) eventually.

Well, maybe "deliberately" was the wrong word. I certainly don't
think the Subversion team intends to make anything difficult for
anybody. Rather, as explained briefly above and in the issue,
Subversion, like any good revision control system, is specifically
designed to keep everything, so making it forget specific things is
hard to do. There are many open Subversion issues and feature
requests and only a limited number of Subversion developers, who
have to prioritize which tasks to work on now and which to leave for
later. So far, those priorities have apparently meant that this
feature has not yet been implemented.

I don't think this is unreasonable either. The developers have to
start somewhere to eventually get to software that works. Starting
from the position that a version control system should keep all
data, and implementing that, seems reasonable. That some people need
to be able to permanently remove data from the repository also seems
plausible to me. The dump / filter / load cycle makes that possible
today, even if it's a bit cumbersome. Committers also need to
understand the permanence of what they commit, so that mistakes
requiring a dump / filter / load cycle happen less often.
Hopefully one day an admin will be able to obliterate parts of the
repository more easily. It's just not there yet.
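
For anyone who hasn't done the dump / filter / load cycle before, here
is a rough sketch of it, scripted in Python. The repository paths, the
excluded path and the function names are made up for illustration;
only "svnadmin dump", "svndumpfilter exclude" and "svnadmin load" are
the real commands. It assumes the new repository has already been
created with "svnadmin create". Note that svndumpfilter matches by
path, so any copies of the item elsewhere in the repository would have
to be listed as well, and it has not been tested against a real
repository.

```python
import subprocess
from typing import List


def build_cycle_commands(old_repo: str, new_repo: str,
                         doomed_paths: List[str]) -> List[List[str]]:
    """Return the dump, filter and load commands, in pipeline order."""
    if not doomed_paths:
        raise ValueError("need at least one path to exclude")
    return [
        # Write the full history of the old repository to stdout.
        ["svnadmin", "dump", old_repo],
        # Drop every node whose path starts with one of these prefixes.
        ["svndumpfilter", "exclude", *doomed_paths],
        # Read the filtered stream into the (already created) new repository.
        ["svnadmin", "load", new_repo],
    ]


def run_cycle(old_repo: str, new_repo: str, doomed_paths: List[str]) -> None:
    """Run dump | filter | load as one pipeline; raise if any stage fails."""
    dump_cmd, filter_cmd, load_cmd = build_cycle_commands(
        old_repo, new_repo, doomed_paths)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    filt = subprocess.Popen(filter_cmd, stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd, stdin=filt.stdout)
    # Close our copies of the pipe ends so the children own them.
    dump.stdout.close()
    filt.stdout.close()
    stages = (("load", load), ("filter", filt), ("dump", dump))
    for name, proc in stages:
        proc.wait()
    for name, proc in stages:
        if proc.returncode != 0:
            raise RuntimeError(f"{name} stage exited with {proc.returncode}")


if __name__ == "__main__":
    # Hypothetical paths: print the commands instead of running them.
    for cmd in build_cycle_commands("/srv/svn/old", "/srv/svn/new",
                                    ["trunk/secret.txt"]):
        print(" ".join(cmd))
```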

By using multiple repositories instead of a single one, separated by
criteria that make sense to you (by project, by department, etc.),
you can reduce the impact of taking a repository offline for a
dump / filter / load cycle, and shorten the cycle itself, since
there will be less data to dump, filter and load.
Received on Tue Aug 22 03:19:22 2006