On Wed, 2011-02-16, I (Julian Foad) wrote:
> > > This specification is conceptually simple, but requires completing disk
> > > operations within SDB transactions, which may make it too inefficient
> > > in practice. An alternative specification could use the Work Queue to
> > > enable more efficient processing of multiple transactions.
> >
> > Note that my initial design for the pristine inserted a row which
> > effectively said "we know about this pristine, but it hasn't been
> > written yet". The file would be put into place, then the row would get
> > tweaked to say "it is now there". That avoids the disk I/O within a
> > sqlite txn.
>
> That would indeed avoid disk I/O within any txn during "add". The
> "remove" operation would go through the same states in reverse, I
> assume. But this seems to imply an extra complexity. How would the
> whole thing work - locking for "add", "remove" and read operations;
> crash recovery? Maybe it's straightforward but I'll have to write it
> out in detail if you think it's worth it.
>
> I would suggest first we need to assess whether a file move within a txn
> is in fact a problem.

I'm thinking one thing we've learned about performance is that we need to
avoid doing a transaction per node. Doing so severely limits the speed of
multi-file WC operations (I've heard numbers like 50 to 100 txns per
second being quoted).

So I think we need to support installing multiple pristine files per
txn.
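
To make the batching idea concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not Subversion's actual code: the `pristine` table layout, the flat pristine directory and the function names are all assumptions made up for illustration. It only shows several file moves and row inserts sharing one txn instead of paying for one txn per node.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so the transaction boundaries are exactly the ones written below.
    db = sqlite3.connect(path, isolation_level=None)
    # Hypothetical schema, for illustration only.
    db.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY, size INTEGER NOT NULL)"
    )
    return db


def install_pristines(db, pristine_dir, pending):
    """Install every (checksum, temp_path) pair in `pending` in ONE txn."""
    db.execute("BEGIN IMMEDIATE")
    try:
        for checksum, temp_path in pending:
            final_path = os.path.join(pristine_dir, checksum)
            os.replace(temp_path, final_path)  # file move inside the txn
            db.execute(
                "INSERT OR IGNORE INTO pristine (checksum, size)"
                " VALUES (?, ?)",
                (checksum, os.path.getsize(final_path)),
            )
        db.execute("COMMIT")
    except Exception:
        # Rolls back the rows only; files already moved stay on disk.
        db.execute("ROLLBACK")
        raise


def demo():
    with tempfile.TemporaryDirectory() as wc:
        pristine_dir = os.path.join(wc, "pristine")
        os.mkdir(pristine_dir)
        pending = []
        for i in range(3):
            temp_path = os.path.join(wc, "tmp-%d" % i)
            with open(temp_path, "wb") as f:
                f.write(b"text %d" % i)
            pending.append(("checksum-%d" % i, temp_path))
        db = open_db(os.path.join(wc, "wc.db"))
        install_pristines(db, pristine_dir, pending)
        rows = db.execute("SELECT count(*) FROM pristine").fetchone()[0]
        files = sorted(os.listdir(pristine_dir))
        db.close()
        return rows, files
```

Calling `demo()` installs three pristine texts with a single BEGIN/COMMIT pair. Note that a rollback undoes only the rows, not the file moves, so a real implementation would still need to decide what to do with files that have no row.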

If we do that, then whether we do the file move(s) within the txn, or
use two txns with the file move(s) coming in between them, probably isn't
important in terms of speed. What about the semantic difference between
them? The two-phase installation may have the advantage that a client
can tell whether a given pristine text is already being downloaded from
the repo and thus avoid duplicating the download. That would certainly
be nice to have. However, I don't feel it is worth the extra effort at
this time: it can be a future enhancement.

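For comparison, the two-phase variant could look roughly like the sketch below, again in Python with sqlite3. The `state` column, its 'pending' and 'present' values and all function names are hypothetical, chosen only to illustrate the "known but not yet written" row described in the quoted text. The file move sits between two short txns, and a second client can see that an install is already in progress.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    db = sqlite3.connect(path)
    # Hypothetical schema: `state` is 'pending' or 'present'.
    db.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    db.commit()
    return db


def begin_install(db, checksum):
    """Txn 1: claim the pristine. False means a row already exists."""
    with db:
        cur = db.execute(
            "INSERT OR IGNORE INTO pristine (checksum, state)"
            " VALUES (?, 'pending')",
            (checksum,),
        )
        return cur.rowcount == 1


def finish_install(db, pristine_dir, checksum, temp_path):
    # The file move happens between the two txns, outside any SQLite txn.
    os.replace(temp_path, os.path.join(pristine_dir, checksum))
    with db:  # Txn 2: mark the text as being in place.
        db.execute(
            "UPDATE pristine SET state = 'present' WHERE checksum = ?",
            (checksum,),
        )


def pristine_state(db, checksum):
    row = db.execute(
        "SELECT state FROM pristine WHERE checksum = ?", (checksum,)
    ).fetchone()
    return row[0] if row else None


def demo():
    with tempfile.TemporaryDirectory() as wc:
        pristine_dir = os.path.join(wc, "pristine")
        os.mkdir(pristine_dir)
        temp_path = os.path.join(wc, "tmp-0")
        with open(temp_path, "wb") as f:
            f.write(b"pristine text")
        db = open_db(os.path.join(wc, "wc.db"))
        seen = [pristine_state(db, "abc")]
        first_claim = begin_install(db, "abc")
        seen.append(pristine_state(db, "abc"))
        second_claim = begin_install(db, "abc")  # a would-be duplicate
        finish_install(db, pristine_dir, "abc", temp_path)
        seen.append(pristine_state(db, "abc"))
        db.close()
        return seen, first_claim, second_claim
```

In this sketch a crash between the two txns leaves a 'pending' row with no owner, which is exactly the sort of locking and crash-recovery question that would have to be written out in detail before adopting this design.
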
- Julian
Received on 2011-02-18 11:23:13 CET