On Thu, 2010-02-18, Neels J Hofmeyr wrote:
> Great, moving forward fast on pristine design questions!
Hi Neels.
Did you start working the new knowledge into a document? Lots of stuff
was said in this thread and it would be useful to see where we are at.
I have a couple of comments.
THE PRISTINE-WRITE API
I was thinking about the "write" API and how an API designed around a
stream is surely better than one designed around "tell me the path where
I can put a file".
The objection to "give me a writable stream and I'll write to it" was
that the stream close handler wouldn't know whether the stream was
closed after a successful write or because of an error. We can rectify
this by adding a second step: require the caller to call a special
"commit" API to close the stream on success, and any other way of
closing the stream would abandon the new content.

    // Caller passes the checksum if it knows it, or passes NULL and
    // the store in that case will calculate the checksum.
    stream = pristine_start_new_file(expected_checksum);

    [caller writes to stream]

    // Now the store commits the new text, verifies the checksum
    // (if it was given an expected one) and returns it.
    new_checksum = pristine_close_new_file(stream);

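To make the two-step idea concrete, here is a minimal sketch in Python rather than in the real C API. Everything in it is an assumption made for illustration: the class name PristineWriter, the use of SHA-1, and the layout of one file per checksum in a single directory. The constructor plays the role of pristine_start_new_file and commit() plays the role of pristine_close_new_file; closing the writer any other way abandons the content.

```python
import hashlib
import os
import tempfile


class PristineWriter:
    """Hypothetical writable stream that installs its content only on commit()."""

    def __init__(self, store_dir, expected_checksum=None):
        self._store_dir = store_dir
        self._expected = expected_checksum  # None: the store computes it
        self._hash = hashlib.sha1()         # SHA-1 is an assumption here
        # Keep the temporary file inside the store directory so the
        # final rename stays on one filesystem.
        fd, self._tmp_path = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
        self._file = os.fdopen(fd, "wb")

    def write(self, data):
        self._hash.update(data)
        self._file.write(data)

    def commit(self):
        """Close on success: verify, move into place, return the checksum."""
        self._file.close()
        checksum = self._hash.hexdigest()
        if self._expected is not None and checksum != self._expected:
            os.remove(self._tmp_path)
            raise ValueError("checksum mismatch")
        os.replace(self._tmp_path, os.path.join(self._store_dir, checksum))
        return checksum

    def close(self):
        """Any close that is not a commit abandons the new content."""
        if not self._file.closed:
            self._file.close()
            os.remove(self._tmp_path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # no-op if commit() already ran
```

A caller would write inside a "with" block and call commit() as the last statement; if an exception escapes before that, the temporary file is removed and nothing reaches the store.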
Now let's examine the ways in which a caller might want to give new
content to the store:
1. Caller asks for a writable stream and pushes content to that, then
calls a "commit" function.
2. Caller has a readable stream and wants the store to pull from that.
3. Caller has a (non-temporary) file and wants the store to read from
that file.
4. Caller has to create a temporary file for reasons beyond its
control (output of an external tool perhaps) and wants the store to take
the entire file by an atomic move if possible. This is the case where it
would be more efficient if it knew where to put the file in the first
place.
The caller can easily implement 2 and 3 in terms of an API that provides
1, so that just leaves 1 and 4 that are worthwhile as an API.
I feel that (1) is by far the more important one to have, and (4) is a
specialist optimisation.
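As a rough illustration of why (2) and (3) need no API of their own, the sketch below builds both on top of a store that offers only (1). It is written in Python for brevity; MemoryStore, start_new_file, install_from_stream and install_from_file are hypothetical names, the store is a throwaway in-memory stand-in, and SHA-1 is an assumed checksum kind.

```python
import hashlib
import io


class _Writer:
    """Writable stream handed out by the stand-in store (option 1)."""

    def __init__(self, store):
        self._store = store
        self._buf = bytearray()

    def write(self, data):
        self._buf += data

    def commit(self):
        # Only a commit makes the text visible in the store.
        checksum = hashlib.sha1(self._buf).hexdigest()
        self._store.texts[checksum] = bytes(self._buf)
        return checksum


class MemoryStore:
    """Hypothetical in-memory store that provides only option 1."""

    def __init__(self):
        self.texts = {}  # checksum -> content

    def start_new_file(self):
        return _Writer(self)


def install_from_stream(store, readable, chunk_size=65536):
    """Option 2: pull from a readable stream, using only option 1."""
    writer = store.start_new_file()
    while True:
        chunk = readable.read(chunk_size)
        if not chunk:
            break
        writer.write(chunk)
    # If read() raised above, commit() is never reached and the
    # partial content is simply dropped.
    return writer.commit()


def install_from_file(store, path):
    """Option 3: read a non-temporary file, via option 2."""
    with open(path, "rb") as f:
        return install_from_stream(store, f)
```

Case (4) is the only one that cannot be expressed this way without copying the data, which is why it remains a separate optimisation.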
VERIFYING CHECKSUMS
I didn't read everything you were discussing, but I got worried when I
heard about giving the caller options to request, per call, whether
checksums are verified. That sounds like too much complexity. I'm
sure we should start with a global compile-time verification enable
switch, and if we really find we need more fine-grained control then we
should consider how to provide it then. It might not need an API flag:
for example we might decide it should automatically verify on the first
read and once in every hundred reads, or all sorts of internal
possibilities like that.
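To show what such an internal policy could look like without any per-call flag, here is a small Python sketch. It is only one of the possibilities mentioned above, and all names in it are hypothetical: VERIFY_CHECKSUMS stands in for the compile-time switch, VERIFY_EVERY for the "once in every hundred reads" interval, and the store is a plain dictionary keyed by an assumed SHA-1 checksum.

```python
import hashlib

VERIFY_CHECKSUMS = True  # stand-in for a global compile-time switch
VERIFY_EVERY = 100       # re-verify once in every hundred reads


class VerifyingReader:
    """Hypothetical reader that decides internally when to verify."""

    def __init__(self, texts):
        self._texts = texts      # checksum -> content
        self._read_counts = {}   # checksum -> reads seen so far

    def _should_verify(self, checksum):
        if not VERIFY_CHECKSUMS:
            return False
        count = self._read_counts.get(checksum, 0)
        self._read_counts[checksum] = count + 1
        # True on the 1st read, then on the 101st, 201st, and so on.
        return count % VERIFY_EVERY == 0

    def read(self, checksum):
        data = self._texts[checksum]
        if self._should_verify(checksum):
            if hashlib.sha1(data).hexdigest() != checksum:
                raise ValueError("pristine text is corrupt: " + checksum)
        return data
```

The caller only ever calls read(checksum); changing the policy later means changing _should_verify, not the API.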
> The one thing left now is:
> > Can someone explain a motivation for even creating a database row before
> > the pristine file is moved into place in the pristine store? I currently
> > don't see why it can't be way simpler.
[...]
I would just write it down the way you think it should be in the main
flow of your document, and mention outstanding questions like this in
notes.
"Simultaneous or multi-threaded clients" would be my first reaction to
that particular question.
- Julian
Received on 2010-03-01 19:15:15 CET