I'm implementing ref-counting for pristine texts, with the aim of
deleting any text whose ref-count reaches zero. The current situation
without this work is that many pristine texts are not deleted when they
become unreferenced, and they accumulate in the pristine store until the
user runs "svn cleanup". I think that is not good enough even for an
initial release.

Having discarded other approaches, my plan now is to:

  * Maintain the ref count at the granularity of SQL statements: each
    time we add or remove a reference in the DB (which is usually in the
    NODES table), update the corresponding ref count in the PRISTINE
    table. This could be changed later to once per SQLite transaction
    if we find cases where that would make a significant saving.

  * Act on the ref count (deleting texts whose count is zero) at some
    higher level: when closing a "wcroot" object, or after running the
    Work Queue, or just before returning from a libsvn_wc public API.
    For the time being I'm doing it when closing a "wcroot" object.
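
The second step of the plan, acting on the count at a higher level, might look roughly like the following. This is only a sketch in Python with the standard sqlite3 module, not the libsvn_wc code: the two-column "pristine" table, the flat one-file-per-checksum directory and the function name cleanup_unreferenced() are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def cleanup_unreferenced(db, pristine_dir):
    """Delete every pristine text whose recorded refcount is zero.

    Hypothetical helper: assumes one file per checksum in pristine_dir.
    """
    removed = []
    rows = db.execute(
        "SELECT checksum FROM pristine WHERE refcount = 0").fetchall()
    for (checksum,) in rows:
        path = os.path.join(pristine_dir, checksum)
        if os.path.exists(path):
            os.remove(path)  # remove the text file itself
        db.execute("DELETE FROM pristine WHERE checksum = ?", (checksum,))
        removed.append(checksum)
    db.commit()
    return removed


# Demo: one referenced text ("aaa") and one unreferenced text ("bbb").
pristine_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pristine "
           "(checksum TEXT PRIMARY KEY, refcount INTEGER NOT NULL)")
for checksum, refcount in [("aaa", 1), ("bbb", 0)]:
    db.execute("INSERT INTO pristine VALUES (?, ?)", (checksum, refcount))
    with open(os.path.join(pristine_dir, checksum), "w") as f:
        f.write("text")

removed = cleanup_unreferenced(db, pristine_dir)
remaining = sorted(os.listdir(pristine_dir))
```

Running the sweep only at such a point keeps the per-statement work down to a counter update, and defers the file removal to a moment when no operation is in progress.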

The best way I have found to maintain the ref counts is using SQL
triggers to update the count when a reference is added or deleted.

  CREATE TRIGGER nodes_insert_trigger
  AFTER INSERT ON nodes
  BEGIN
    UPDATE pristine SET refcount = refcount + 1
      WHERE checksum = NEW.checksum;
  END;

  CREATE TRIGGER nodes_delete_trigger
  AFTER DELETE ON nodes
  BEGIN
    UPDATE pristine SET refcount = refcount - 1
      WHERE checksum = OLD.checksum;
  END;

  CREATE TRIGGER nodes_update_checksum_trigger
  AFTER UPDATE OF checksum ON nodes
  BEGIN
    UPDATE pristine SET refcount = refcount + 1
      WHERE checksum = NEW.checksum;
    UPDATE pristine SET refcount = refcount - 1
      WHERE checksum = OLD.checksum;
  END;

(A detail: This correctly handles NULL in the checksum column: such a
value won't match any row in the 'pristine' table.)
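
To see the three triggers at work, including the NULL case, here is a small self-contained sketch using Python's sqlite3 module and an in-memory database. The two tables are cut down to the bare minimum and are an assumption for illustration (the real NODES table has more columns and a compound key); only the trigger bodies follow the ones above.

```python
import sqlite3

# Simplified, hypothetical schema plus the three triggers described above.
SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                       refcount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT);

CREATE TRIGGER nodes_insert_trigger AFTER INSERT ON nodes
BEGIN
  UPDATE pristine SET refcount = refcount + 1
    WHERE checksum = NEW.checksum;
END;

CREATE TRIGGER nodes_delete_trigger AFTER DELETE ON nodes
BEGIN
  UPDATE pristine SET refcount = refcount - 1
    WHERE checksum = OLD.checksum;
END;

CREATE TRIGGER nodes_update_checksum_trigger
AFTER UPDATE OF checksum ON nodes
BEGIN
  UPDATE pristine SET refcount = refcount + 1
    WHERE checksum = NEW.checksum;
  UPDATE pristine SET refcount = refcount - 1
    WHERE checksum = OLD.checksum;
END;
"""


def refcounts(db):
    """Return {checksum: refcount} for every row in 'pristine'."""
    return dict(db.execute("SELECT checksum, refcount FROM pristine"))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO pristine (checksum) VALUES (?)",
               [("x",), ("y",)])

# Two files share text 'x'; a directory has a NULL checksum.
db.executemany("INSERT INTO nodes VALUES (?, ?)",
               [("A/f", "x"), ("A/g", "x"), ("A/dir", None)])
after_insert = refcounts(db)

# Changing a checksum moves one reference from 'x' to 'y'.
db.execute("UPDATE nodes SET checksum = 'y' WHERE local_relpath = 'A/g'")
after_update = refcounts(db)

# Deleting rows drops the reference; the NULL row touches nothing.
db.execute("DELETE FROM nodes WHERE local_relpath IN ('A/f', 'A/dir')")
after_delete = refcounts(db)
```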

The only case that SQLite doesn't handle automatically is the
replacement part of "INSERT OR REPLACE INTO ...": by default (that is,
unless the "recursive_triggers" pragma is turned on) it doesn't fire
the "delete" trigger for the row being replaced. To overcome this
limitation, we must instead use an explicit "DELETE FROM ..." statement
followed by an unconditional "INSERT INTO ..." statement. I'm puzzled
about why some of our INSERT statements should ever need to overwrite
an existing row, but they currently do, and I am marking such places
with '###' comments.
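
The difference between the two forms can be reproduced in isolation. The sketch below (Python sqlite3, in-memory database, simplified hypothetical two-column tables) overwrites one NODES row three ways: with "INSERT OR REPLACE" under SQLite's default trigger settings, with an explicit DELETE followed by INSERT, and with "INSERT OR REPLACE" after turning on "PRAGMA recursive_triggers", which SQLite documents as the condition under which REPLACE fires delete triggers. The last variant is shown only for comparison; it is not something the patches do.

```python
import sqlite3

# Simplified, hypothetical schema with the insert and delete triggers.
SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                       refcount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT);

CREATE TRIGGER nodes_insert_trigger AFTER INSERT ON nodes
BEGIN
  UPDATE pristine SET refcount = refcount + 1
    WHERE checksum = NEW.checksum;
END;

CREATE TRIGGER nodes_delete_trigger AFTER DELETE ON nodes
BEGIN
  UPDATE pristine SET refcount = refcount - 1
    WHERE checksum = OLD.checksum;
END;
"""


def overwrite_row(recursive_triggers, use_replace):
    """Overwrite node 'A/f' (old -> new) and return the final refcounts."""
    db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    db.execute("PRAGMA recursive_triggers = "
               + ("ON" if recursive_triggers else "OFF"))
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO pristine (checksum) VALUES (?)",
                   [("old",), ("new",)])
    db.execute("INSERT INTO nodes VALUES ('A/f', 'old')")
    if use_replace:
        db.execute("INSERT OR REPLACE INTO nodes VALUES ('A/f', 'new')")
    else:
        db.execute("DELETE FROM nodes WHERE local_relpath = 'A/f'")
        db.execute("INSERT INTO nodes VALUES ('A/f', 'new')")
    return dict(db.execute("SELECT checksum, refcount FROM pristine"))


# Default settings + REPLACE: 'old' keeps a stale count of 1.
stale = overwrite_row(recursive_triggers=False, use_replace=True)
# Explicit DELETE then INSERT: both counts are right.
fixed = overwrite_row(recursive_triggers=False, use_replace=False)
# REPLACE with recursive triggers enabled: the delete trigger fires.
with_pragma = overwrite_row(recursive_triggers=True, use_replace=True)
```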

In order to check whether this is all working, I have inserted in
close_wcroot() a call to svn_wc__db_pristine_cleanup_wcroot() and made
the latter function compare (in debug mode only) the number of actual
references with the recorded refcount of each pristine, and complain
about any difference. With this check in place, a few tests in the test
suite fail, mostly copy/revert/upgrade tests, so I can see where I have
a few more cases to complete. I expect all of these to be cases where I
need to put an explicit "DELETE" statement instead of "INSERT OR
REPLACE".

This checking code prints a report such as the following when there is a
mismatch:

  DBG: wc_db.c:2692: unused/miscounted pristines in '/home/julianfoad/src/subversion-n-gamma/subversion/tests/cmdline/svn-test-work/working_copies/copy_tests-30'
  DBG: wc_db.c:2697: 56388a031dffbf9df7c32e1f299b1d5d7ef60881: refcount 2 != actual_refs 1
  DBG: wc_db.c:2717: pristine ref: 56388a031dffbf9df7c32e1f299b1d5d7ef60881, op_depth 0, 'A/D/G/rho'

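
The comparison behind such a report can be expressed as a single query. The following is a sketch of the idea only, in Python with sqlite3, and is not the code in the patch: it assumes simplified hypothetical tables, counts references from the "nodes" table alone (references held elsewhere would need to be added in), and the name find_miscounted() is invented here.

```python
import sqlite3


def find_miscounted(db):
    """Return (checksum, refcount, actual_refs) where the two differ."""
    return db.execute("""
        SELECT p.checksum, p.refcount, COUNT(n.checksum) AS actual_refs
          FROM pristine AS p
          LEFT JOIN nodes AS n ON n.checksum = p.checksum
         GROUP BY p.checksum, p.refcount
        HAVING p.refcount != COUNT(n.checksum)
         ORDER BY p.checksum
    """).fetchall()


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                       refcount INTEGER NOT NULL);
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT);
""")
# 'c1' is recorded with 2 references but only one node points at it;
# 'c2' is counted correctly; 'c3' is unreferenced and recorded as such.
db.executemany("INSERT INTO pristine VALUES (?, ?)",
               [("c1", 2), ("c2", 1), ("c3", 0)])
db.executemany("INSERT INTO nodes VALUES (?, ?)",
               [("A/D/G/rho", "c1"), ("iota", "c2"), ("A", None)])

mismatches = find_miscounted(db)
```

COUNT(n.checksum) counts only the joined rows that actually matched, so a pristine with no references at all yields an actual count of zero rather than one.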
Attached are two patches, including log messages. The one called
"pristine-refcount-triggers-1.patch" implements the trigger approach, as
described above. The one called "pristine-refcount-manual-1.patch" is
an earlier patch, in which I started adding explicit SQL queries to
increment and decrement the ref count whenever we update the NODES
table. Both of them include the same checking/debugging/cleanup code.
Any thoughts on the direction of the whole thing, or the details?
- Julian
Received on 2011-01-07 15:51:26 CET