I'm looking for something to dig back into wc-ng. People should grab
this rather than wait for me, but I'll see what I can do in the next
On Tue, Jan 18, 2011 at 10:58, Julian Foad <julian.foad_at_wandisco.com> wrote:
> Can anyone spare a bit of time to take this forward? I'll tell you what
> I'm planning to do next. Maybe someone can get to this before I can.
> The first thing I want to do is add a DB index on the 'refcount' column
> of the 'pristine' table. I'm convinced that will be necessary for
> adequate performance. To do so involves:
> * Adding the 'CREATE INDEX' SQL statement. That's simple - just
> copy/modify/paste one line in wc-metadata.sql.
> * Adding a WC format bump step that adds the index to existing dev
> WCs. Also quite simple.
> * I'd be happier if I ran a timing comparison, before and after, with
> searching for unreferenced pristines (those whose refcount is 0). That
> would give me confidence that adding the index has actually happened and
> that it has the desired effect. It would need a test with a large
> number of pristines, a scattered selection of them having refcount=0,
> and then time the running of svn_wc__db_pristine_cleanup(). With an
> index, this should find them all very quickly and the deletion time
> should dominate the whole cleanup operation, whereas without an index
> the time taken to scan the whole table should dominate when the table is
> very large.
> On Thu, 2011-01-13, Branko Čibej wrote:
>> On 13.01.2011 20:01, Julian Foad wrote:
>> > I have committed the ref counting for pristine texts (r1058523) and have
>> > had a bit more insight into when to perform the deletion.
>> > Deleting unreferenced texts on closing a "wcroot" is too late - too
>> > large a granularity - for the way the WC code is currently structured
>> > because this doesn't happen until a (long-lived) pool is cleaned up, as
>> > Bert pointed out.
>> > Deleting unreferenced texts after running the Work Queue is too soon -
>> > too fine a granularity - for the way "commit" is currently structured.
>> > A simple test of this approach showed that by the time the post-commit
>> > processing tries to install a reference to a pristine text whose
>> > checksum was noted earlier, in some cases that pristine row has already
>> > been purged.
>> This would indicate that the reference counting happens too soon ... in
>> other words, that a pristine can be dereferenced whilst some part of the
>> code (or database) still refers to it. That breaks database consistency
>> -- what happens if the user aborts a commit and then calls 'svn
>> cleanup', for example?
> If what I said about 'commit' is correct, then yes, that's bad and we
> should look for a better way. But I haven't tested it properly; I
> noticed that the commit failed saying that the DB failed a 'REFERENCES'
> clause, and what I said here is a hypothesis about how that happened.
> Investingating this is the second thing I would want to do next.
>> > At the moment I think the practical solution is to insert calls to the
>> > pristine cleanup at the end of each main libsvn_wc public API that may
>> > have freed up one or more pristines.
>> Mmm. Too many points of failure, don't you think? Of course, it's no
>> "worse" than the cancellation handlers, however, and I suppose missing a
>> chance to delete is better than having one too many.
> I'm thinking this is nevertheless the best option. The consequence of
> missing some places is not too bad at all. If we miss places where we
> should clean up, then all it means is that some operations will leak
> unused pristines, but they won't be leaked permanently, only until the
> user runs one of the operations that do have the cleanup call in them.
> That's a consequence of calling a global "clean up all unreferenced
> pristines" function rather than trying to keep track of which pristines
> have been freed up by *this* operation.
>> > Wherever we do it, if it's at all frequent, the search for unreferenced
>> > pristines needs to be efficient. At present I use "SELECT ... FROM
>> > PRISTINE WHERE refcount = 0". I believe that can most easily be made
>> > efficient by adding an index on the 'refcount' column, so that's what I
>> > intend to do.
>> That's the only thing you can do, unless you want to enhance the
>> triggers to populate a queue of to-be-deleted pristines.
> I agree that adding an index on the table is a better solution than
> trying to maintain a separate queue of the pristines that are waiting to
> be cleaned up. The index makes it efficient for a simple query
> ('refcount = 0') to provide the same information as the separate queue
> would provide.
> - Julian
Received on 2011-01-27 04:33:40 CET