Re: pristine store design

From: Neels J Hofmeyr <neels_at_elego.de>
Date: Wed, 03 Mar 2010 14:48:32 +0100

Julian Foad wrote:
> On Wed, 2010-03-03, Neels J Hofmeyr wrote:
>> Julian Foad wrote:
>>> On Thu, 2010-02-18, Neels J Hofmeyr wrote:
>>>> Great, moving forward fast on pristine design questions!
>>> Hi Neels.
>>>
>>> Did you start working the new knowledge into a document? Lots of stuff
>>> was said in this thread and it would be useful to see where we are at.
>> thanks for poking :)
>
> No problem. I'm glad you're working on this.
>
> Please go ahead with this pristine store. Delete the "_write" interface.

looking forward to that one, nice and simple :)

> Implement the path-based ones; at least they're already declared. It's
> really important to get it moving.

Just aside, I once started looking for places where the pristine is used
properly, but there's more to do before we can roll out the pristine:
foremost getting rid of write-metadata-via-entry_t.

>
> I want us to be able to move on to bigger things like considering
> whether to build a "pristine whole nodes" store on top of the "pristine
> texts" store, because I think we should be able to reference a whole
> node (especially a tree-conflict old/theirs/mine node) with a single
> handle, whether that handle be a checksum or a "repos:rev:path" triple.

You mean "whole subtrees"?? I don't understand.

>
> I continue to argue below why I think a stream-based interface is better
> than path-based, but the choice of "write" method is not a critical
> decision, as long as it stays within libsvn_wc. I'm not keen to spend
> more time on that, now that I've looked and as far as I can tell it
> doesn't need to be exposed outside libsvn_wc.

Note that the entire pristine API we are discussing is private to svn_wc,
like svn_wc__db_read_info() et al..

[...]
>> pseudocode:
>> [[[
>> pristine_get_tempdir(&tempdir)
>>
>> /* Make tempfile */
>> svn_stream_open_unique(&tempstream, &tempfile_path, tempdir)
>>
>> /* Write and checksum */
>> tempstream = svn_stream_checksummed2(tempstream, &write_checksum)
>> write to and close tempstream
>>
>> pristine_install(tempfile_path, write_checksum)
>> ]]]
>>
>> When there is only a write-stream available, some options get lost, while
>> _get_tempdir() and _install() offer the whole palette of possibilities. For
>> convenience, we can offer a wrapper for the first three instructions above.
>> But additionally, we can offer the possibility to determine a file location
>> *on the same file system device* as the pristine store. This is useful when
>> we can provide some external tool or specialized function X with a
>> destination path in advance. We create a tempfile in pristine_temp_dir() and
>> pass that location to X. X writes or OS-copies or <whatever>s to the final
>> device, possibly saving one copy-across-fs-devices operation per pristine.
>
> Sure. That's my "case (4)" of the cases I enumerated. But I think it's
For the records, I did see that :)

> a less important case. See my investigation of "actual uses" below,
> near the end of this email.
>
>> Also, X may already have a checksum tee in its implementation, obsoleting
>> the need to calculate again. In that case it's nice that pristine_install()
>> doesn't care where the checksum came from.
>
> Sure, but, for the cases where X writes to a stream, a stream-based API
> (slightly different from how I suggested) could do that too.

Yes, an optional parameter would do.

>
>> (Also, if we are going to
>> validate the checksum on "every" read, there is no need to be paranoid about
>> correctness of the checksum when writing -- we anyway have to "expect"
>> corruptions to occur after any successful write.)
>>
>> Note that your proposed usage is a subset of the usage provided by
>> _get_temp_dir() and _install(). Your way does KISS some API complexity
>> goodbye when thinking in human terms, but the OS will find the
>> _get_temp_dir() and _install() simpler in some cases :P
>
> Yes, but also the stream way allows the pristine store to grow new
> features transparently - see "disadvantages" below.

(also see below)

>
>> The simpler usage of above pseudocode would be:
>>
>> pseudocode:
>> [[[
>> /* convenience function provided by API or whomever. */
>> svn_error_t *
>> pristine_get_write_stream(stream_t **write_stream,
>> svn_checksum_t *write_checksum,
>> const char **temp_path)
>> {
>> pristine_get_tempdir(&tempdir)
>>
>> /* Make tempfile */
>> svn_stream_open_unique(&tempstream, &tempfile_path, tempdir)
>>
>> /* Write checksum to the checksum location provided by caller */
>> *write_stream = svn_stream_checksummed2(tempstream, write_checksum)
>> }
>>
>> ...
>>
>> /* calling code */
>> {
>> pristine_get_write_stream(&stream, &write_checksum, &stream_temp_path)
>> write to and close stream
>> if (had priori checksum and does not match write_checksum):
>> bail
>> pristine_install(stream_temp_path, write_checksum)
>> }
>> ]]]
>>
>> This is similar to what you propose,
>
> Yes, and that looks neater, but still has all the problems caused by
> exposing the path.

For the records, we'd never expose the final pristine-file path, not even
within libsvn_wc. This whole thing about having a write location is all
about the temp file that will finally be moved into place.

So at no point will any function that does not have _pristine_ in its name
be able to find the final pristine file location. Only the tempdir is known,
and only inside libsvn_wc.

The final _install() call would then have the option to just 'mv' the file
into its final location or do whatever with it. We don't lose feasibility of
any features by doing this.

>
>> when considering that your
>> implementation would also need a baton to pass the stream temp path along
>> (right?).
>
> The API wouldn't need an extra baton parameter. It would return a stream
> whose internal "baton" (user data) stores any metadata it needs.

That's cool, wasn't aware of it.

>
>> So IMHO it boils down to: Do you think it is better to opaquify
>> such baton for the sacrifice of not being able to get a write destination on
>> the target device in advance?
>
> Yes, I think that is the better API for general use. (Note that the
> stream would always write to a file that is on the destination device.
> The only thing you can't get with this API is a path to that file (or
> directory).)

Hm, this argument actually revolves around: will we have cases where a
tempfile already exists, and where we have no control on redirecting streams
into the pristine store? When committing new or modified files, we need to
copy&checksum, can't checksum in-place; When receiving from the repos, we
can direct the stream straight into the pristine store, and validate proper
reception via that.

I can't come up with a use case of interest. If there aren't (going to be)
any, I actually agree with a stream-only write API. Reversing my own
argument, we can still split such stream-only function into _get_temp_dir()
and _install() later on, making the original stream-only function a
convenience function. So I am actually convinced before even looking at the
disadvantages. Thanks :)

>
>
> Now let's look at some disadvantages of "return a directory path".
>
> If the caller creates the file, the caller chooses the file's name. We
> have to trust the caller to use the right kind of creation functions to
> give a safely unique name.
>
> If the caller creates the file, the pristine store can't keep track of
> what's in its temporary directory. Does it matter? Maybe not, yet.
>
> If the caller creates the file, the caller chooses the file's
> permissions. The pristine store loses the ability to control privacy.
> Does it matter? Maybe not, yet.

I never thought of that!

>
> If the caller writes to the file, the pristine store doesn't have a
> streamy way to compress the data if it wants to - it would have to read
> it again. Does it matter? Maybe we never will, or maybe we'll want to
> change the API anyway when we support compressed texts so it wouldn't
> matter.
>
> If the path is exposed just within libsvn_wc, these issues stay under
> our control, so that's OK. If the path is exposed in a public API, to
> libsvn_client and beyond, that's when I worry.

(No paths would ever leak out of libsvn_wc. The pristine-tempdir path would
be known inside libsvn_wc. I'd say we can trust libsvn_wc to use a proper
function for creating a unique file. But see my being convinced above.)

>
>
> The other thing we can do is look at the actual uses and see what would
> be best for them.
>
> * svn_wc_add_repos_file4() is a public API that takes (as an input
> parameter) the new pristine contents as a stream and writes it to the
> pristine store, among other things.
>
> * update_editor.c:apply_textdelta() gets a readable stream of the
> existing text base file, and a writable stream to a new (temporary) text
> base file, and passes the two streams to svn_txdelta_apply(). Presumably
> some other function then moves the temp base file to the real base path.
>
> It looks like it would be slightly easier and more "natural" for both of
> those functions to get a writable pristine stream than to create an
> intermediate file itself and then wrap it in a stream. Not a big deal,
> perhaps.

Agreed. Not a big deal, but why have high granularity when no-one uses it.

[...]
>>> VERIFYING CHECKSUMS
> [...]
>
> [DBs and concurrency ...]
> [...]
>
> I'm not replying to those parts in this email.

heh, me and stsp are also coming up with new directions by the hour.

We were discussing that having a separate pristine database basically sucks.
We need to ensure atomicity of mv-into-place with write-db-row.

This would be more straightforward if we had no DB, but instead an
information header in each pristine file itself. Either that header would
have fixed size, or (rather) we'd write the file offset of where the actual
content starts in the first few bytes.

The only thing that would remain a problem is wanting to record a pristine's
mtime. We can't write the mtime into the file itself, and hope that we are
still within the same mtime grain, not altering the real mtime of the file
*again* (think usec mtime resolution). But we agree that even storing the
mtime of a pristine is bogus. We should checksum every read anyway.

We currently have the pressing need to store *only* the md5-checksum
associated with a SHA1. I can't see why we need size or mtime when we
checksum anyway (I'm hereby taking back my own suggestion to add mtime).

That brought us into the realm of partial reads. We can't think of a case
where partial reads are even used. But if they are, we can easily introduce
a super-pristine header thing that references other pristines as chunks of
which it is made up, so that no pristine ever grows larger than, say, 10 Mb.
We've got it figured out already, and that would be easy to leave aside for
now and add later when the need arises. (Smaller pristines are still stored
the same way, but as soon as a checksum is not found, we can also look up
the set of super-headers. Writing would split up the large pristine into say
10 Mb chunks, write each one in itself, which may even coincide with
existing pristine chunks, and as a final step write the super-header into a
separate location. Easy as pie.)

In a nutshell, I'd like to push a filesystem-only pristine store! No DB
required. This removes all concurrency problems of syncing the DB and the
file system, because there is no DB anymore.
(except that on windows, concurrent moves to the same location may cause one
of the callers to fail. We'd need to cover this even with a DB in place.)
_pristine_read() would open the pristine file, seek to just after the
header, and then deliver the stream to the caller.

As said, we can store any and all metadata in a header in the pristine file
itself, except the mtime. I think we should go for it, mtime being a sad
excuse of a sanity check anyway, IMHO. (We *can* store the size, given that
we use a header of pre-determinable size, but same as mtime, I wouldn't even
want to store the size anyway.)

Given that we checksum all read operations, we only want to store the MD5
sum! The initial header could simply be always fixed 32 bytes of MD5
checksum. And even this is just a speed optimization. Given that we checksum
all reads, all reads can *calculate* the MD5 sum on-the-fly. Pushing for
simplicity here.

If we want more values in the future, like compression, we introduce a skel.

Any minus-ones?

~Neels

application/pgp-signature attachment: OpenPGP digital signature

Received on 2010-03-03 14:49:40 CET

This message: [ Message body ]
Next message: 'Stefan Sperling': "Re: pristine store design"
Previous message: Julian Foad: "RE: pristine store design"
In reply to: Julian Foad: "Re: pristine store design"
Next in thread: Julian Foad: "Re: pristine store design"
Reply: Julian Foad: "Re: pristine store design"
Reply: Greg Stein: "Re: pristine store design"
Reply: Greg Stein: "Re: pristine store design"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]