In re destruction...
There are really two issues to consider here: removal after deletion and
cleaning up from uncompleted transactions. Depending on your design, it may
be possible to address both through garbage collection. This won't be as
expensive as the usual disk GC case, since deletion is rare. If (during GC)
you build a database containing pairs of the form (container, entity), this
is sufficient to let you efficiently delete the liveness data without
re-traversing the objects. Since existing versions do not change, in
successive GC passes you only need to explicitly traverse newer versions.
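
To make the pair database concrete, here is a rough sketch of one way it could look. Everything in it is my own assumption for illustration: the `LivenessIndex` class, the `entities_of` callback, and the use of plain reference counts are hypothetical, not a description of any existing design. It only shows that (container, entity) pairs are enough to drop liveness data without re-walking objects, and that a later pass can skip versions it has already seen.

```python
class LivenessIndex:
    """Hypothetical store of (container, entity) pairs built during GC."""

    def __init__(self):
        self.by_container = {}  # container -> set of entities it holds
        self.refcount = {}      # entity -> number of containers holding it

    def gc_pass(self, versions, entities_of):
        """Record pairs for versions not yet indexed; return how many were walked."""
        walked = 0
        for version in versions:
            if version in self.by_container:
                continue  # existing versions do not change, so skip them
            entities = set(entities_of(version))
            self.by_container[version] = entities
            for entity in entities:
                self.refcount[entity] = self.refcount.get(entity, 0) + 1
            walked += 1
        return walked

    def delete_container(self, container):
        """Drop a container's pairs; return entities that nothing else holds."""
        dead = []
        for entity in self.by_container.pop(container, set()):
            self.refcount[entity] -= 1
            if self.refcount[entity] == 0:
                del self.refcount[entity]
                dead.append(entity)
        return sorted(dead)
```

Deleting a container touches only its own pairs, so the cost is proportional to what that container held, not to the size of the store.
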
Deletion of branch mistakes raises a philosophical question. Right or wrong,
these are part of the revision history. Is it a better design to delete them,
or to provide some simple means to take a previous version of a line of
development and pull it back to the top? I.e., an operation that says:
On branch foo, make the version whose
sequence number is 10 be the top now.
For secure system development, the traceability requirements prohibit
deleting any part of the history.
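
As a sketch of the "pull it back to the top" operation, assuming a branch is an append-only list of versions whose position gives the sequence number (the `Branch` class and `make_top` name are hypothetical, invented here for illustration): the old version is re-committed as a new top, so the mistaken versions stay in the history and nothing is deleted.

```python
class Branch:
    """Hypothetical append-only line of development."""

    def __init__(self, name):
        self.name = name
        self.versions = []  # sequence number n lives at index n - 1

    def commit(self, content):
        """Append a new version and return its sequence number."""
        self.versions.append(content)
        return len(self.versions)

    def top(self):
        """Return the content of the newest version."""
        return self.versions[-1]

    def make_top(self, seq):
        """Make version `seq` the top by re-committing it; history is kept."""
        if not 1 <= seq <= len(self.versions):
            raise ValueError(f"branch {self.name} has no version {seq}")
        return self.commit(self.versions[seq - 1])
```

With this shape, "on branch foo, make version 10 the top" is just `foo.make_top(10)`, and the intervening versions remain available for traceability.
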
Disk space issues are a tough nut. My take on this is that there is a
philosophical difference between deleting something and moving it to
tertiary store. Archiving can be thought of similarly. On the other hand,
the cost of disk is dropping fast enough that adding more disk may be the
right answer. I just priced a half-terabyte RAID system at under $10k (dual
channel), and there is another (single channel) going at around $7.5k.
One issue you did not mention that is really messy: deleting in response to
a court order. Suppose you unintentionally violate someone's copyright and
the court orders you to destroy all the copies... This may be a case where
being *unable* to comply is your only possible defense...
Received on Sat Oct 21 14:36:07 2006