Re: pristine store design

From: Neels J Hofmeyr <neels_at_elego.de>
Date: Thu, 18 Feb 2010 00:32:58 +0100

Greg Stein wrote:
> On Wed, Feb 17, 2010 at 08:09, Bert Huijben <bert_at_qqmail.nl> wrote:
>> ...
>>> [ ] It is cheap to tee to a checksumming stream when reading a
>>> stored pristine. The I/O is much slower anyway.
>> When we are reading the complete stream: Yes.
>> I would skip verification if we only do a partial read (if possible).
>
> That's my belief, but testing can/should confirm. I imagine an SSD
> "disk" and a good OS buffer cache might actually be pretty damned
> fast, which would then lead to a checksum being a "noticable"
> percentage of reading the stream.
>
> Of course, we're usually doing *something* with that stream as we read
> it, and that invariably is to write it somewhere else (temp file,
> network, etc).
>
>>> [ ] It is expensive to *always* want to tee to a checksumming stream,
>>> because that defies speed gain when wanting to read just a subsection
>>> of a pristine, and defies using OS copy without de-/translation to put
>>> a pristine's copy into the WC.
>> See previous answer
>
> I doubt we read portions all that often. My naive approach would have
> pristine_read() return a checksumming stream. *IF* the stream is read
> to completion, then it verifies the result. If the stream is closed
> early (due to partial-read), then we just don't validate.
>
> That said... if "always checksumming" proves to be a bit expensive
> overall, then we may want to be strategic about *when* to do a
> checksum. That would imply a flag to pristine_read() asking it to do a
> checksum as we read. (and for known partial-reads, we just set the
> flag to FALSE).

+1 -- I was hoping it'd make sense to go that way.
- "always" internally checksum streams returned by pristine_read()
(until proven that performance is affected significantly)
- Allow a boolean flag to switch off this checksumming
- _read() knows the size of the pristine from the pristine file. It
will skip checksum comparison on premature close.

>
>>> [ ] We don't care about optimizing for reading subsections of a pristine.
>>> We "always" read the whole pristine anyway.
>> Not sure.
>> We certainly don't read the entire file if we are only trying to check if a
>> file has changed or not. (1st byte in a 2GB file has changed?)
>
> Right. I think this kind of goes to "we may want to be strategic about
> when to do a checksum verification"
>
>>> [ ] If the checksum is already known and the data exists in the local
>>> file system, there is no need to verify the checksum.
>
> I'm assuming you mean copying from a source in the local file system,
> and writing that into the pristine store.
>
> Yes, if we have a checksum for that file, then we can just use it
> since we (invariably) *just* computed it.

Are you talking specifically about a tempfile? Because below, we all seem to
agree that it is dangerous to compute a checksum from an Actual working copy
file and to only copy it later. ?

>
>> That depends. If we are going to read it anyway...
>> In most cases I agree. The filesystem also has verifications on bit errors.
>> But you also have sed -i <.....>
>
> Not all filesystems do. That is why we record checksums for repository
> contents. That "one bit error every five years" needs to be
> detectable.

^ Further support for rather doing checksums with every read, where certain
situations in the code get "special permission" to switch off checksumming.

>
>>> [ ] If the checksum is already known and the data exists in the local
>>> file system, we still have to verify the checksum again via a tempfile
>>> unless that local file already *is* a tempfile (and is thus reasonably
>>> protected from accidental modification before we finish our copy).
>> Agree. Any process can change files in the WC at any time. Even if we
>> checksummed and then moved we have a chance of failure.
>
> Yup. Note that we pretty much *always* copy a source file into a temp
> area. During that copy, we can checksum and (de)translate the thing.
> These kinds of file copies occur all over the WC.
>
>> ...
>>> [ ] We are totally fine with just checking size (and maybe mtime) of
>>> pristines locally, no need to *always* do checksumming when reading.
>> +1.
>> Just check where it doesn't impact performance. (E.g. incoming updates that
>> read the whole file anyway)
>
> I think a flag to pristine_read() may be the way to go.

And I'd prefer to term it from the other end, as I stated above, i.e. to say
"do checksums unless we know in advance that it makes no sense, and unless
we know that it hits performance really hard".

>
>>> Pristine check.
>>> Do we need a "fast" check? Note that it doesn't need to say how it's
>>> implemented; just say whether you think we need a presence check that
>>> tries
>>> to be as fast as possible by assuming the pristine store is consistent.
>>> Greg said there is a use case for that lurking, so I suggest keeping it in
>>> the design doc but marking it for post-1.7. Yes?
>
> Sure. I think the question that comes us is, "am I missing this
> pristine?" Would have to dig more to remember where/why that comes
> up.
>
>>> Write.
>>> Julian said: there possibly should be only _write(), a simple API that
>> takes
>>> care of the rest internally.
>>> Greg said: _write() has to go away, it has no way of telling whether the
>>> write stream was interrupted by error or ended successfully.
>>> Greg also said: It is fine to write directly to a pristine file location.
>>> Which are/should be true?
>
> Write does have to go away. No getting around that, and I won't
> support further changes to the stream API to "fix" that issue. We've

> already made the streams more complicated than I think they should be
> (the recent: reset, mark, seek, and line functions). They used to just
> be read/write/close. Grr.
>
>>> [ ] We will validate pristines when reading anyway, so we can neglect
>>> being paranoid about writing to pristine files without losing safety.
>> We should not write faulty files. If somebody opens the stream he assumes
>> the file is ok and on some OSs he keeps a lock on the file while he as
>> reading so nobody can repair it. (And on other OSs the file should never
>> change while it is open)
>
> The schema is designed to *allow* for faulty files. The faulty file
> will NOT be used because SIZE==null in the database.
>
> If the file is successfully written, *then* we store a value into
> SIZE, indicating the file is present and correct.
>
> Of course, if you're overwriting a pristine file, then you better
> store SIZE==null before you begin writing.

stsp and I have been discussing this a bit. I can't really see why
indicating a null size in the PRISTINE table is necessary. Our discussion
ended here:

If we always write (copy) a pristine to a tempfile in the pristine tempdir
first and then move it into the final pristine file location, there either
*is* a complete pristine file, or there *isn't* one at all. (assuming atomic
'mv' within the same filesystem device. *can that be assumed??*)

It's not difficult to prevent one client from wanting to install the same
pristine twice within the same code path.

To prevent two concurrently running clients from installing the same
pristine twice, a file lock seems more appropriate (with one wc.db per WC,
benefit of even that is debatable, but with one wc.db per user that might be
good).

Either way, a pristine can only have one specific content -- it does not
really matter that much if one client overwrites it just after the other
created it.

Can someone explain any motivation for even creating a database row before
the pristine file is moved into place in the pristine store? I currently
don't see why it can't be way simpler.

(And, if it is necessary, how does a client deal with encountering a
pristine table row that has a SIZE==null? Idle a bit and try to wait for the
hypothetical other client process to finish? Discard and start fetching the
pristine, possibly again? A file lock has the benefit of disappearing if its
client process dies unexpectedly.)

>
>>> [ ] It is ok to open a write stream to the final location of the
>>> pristine file for a given checksum, no need to store in a
>>> pristine-tempfile first.
>> I think we should say no here, to guarantee a stable behavior. If the file
>> is in the pristine store it should be OK. We can't trust the initial write
>> to always provide the full file and a valid hash of this file.
>> We should calculate ourselves if it is possible that we receive an
>> incomplete file.
>
> The schema allows for this, so it can/should be just fine.

This kinda contradicts the impression I get from you saying "_write() has to
go" and "we always copy to temp before/while calculating the checksum".
(s.b. ...)

>
>> And calculating ourselves defines that we don't know the final location.
>
> Usually, we assemble a file in the temp area that is detranslated. At
> that time, we can compute the checksum. And then we install the file
> as a pristine.
>
> We might have a file incoming from the repository with a known
> checksum. We can spool that directly into the pristine store.

(...) But when we receive a pristine over the network, we should surely
verify its checksum before writing it to the final pristine file location?

I tend to favor doing all partial work in the pristine tempdir and OS-moving
into place once the file is complete.

>
>> ...
>>> [ ] However, we do not provide an API function for that. The only way
>>> to create a pristine (for now) will be via _tempdir() and _install().
>> +1
>
> I see no reason to limit it this way.

(Does this confuse me in the same way I said above, with _write()-must-go
vs. allow-direct-writing? Or is this another confusion? I'd love to limit it
this way, noting that a file can be cp/mv'd to the _tempdir().)

>
>>> [ ] But we will likely see a need at some point, upon which such API
>>> will be added. We will change _write() so that it can tell the
>>> difference between successful and unsuccessful stream closure.
>
> -1

Nice, was hoping that. Thanks.

>
>> ...
>>> [ ] Callers can also pass any other file locations to _install(), to
>>> facilitate OS-copying/moving into the pristine store without de-/
>>> translation (when the checksum is known beforehand).
>> At which API level do we trust the callers?
>
> This would be a wc_db API, and (thus) trustable within WC.
>
> The APIs for pristine management provided as the WC public API... none
> beyond "read", I think.

I see agreement.

>
>>> [ ] ^ ... copying ...
>>> [ ] ^ ... moving ...
>> There are not that many operations where this is actually used. Maybe after
>> a commit? The pristine store is mostly read-only and no apis should deliver
>> the file locations; just streams.
>
> Absolutely never a path. That killed us in wc-1.

...overwhelming agreement.

When I said 'copying' / 'moving' I was trying to poke you to say something
like "_install() always does a move, never a copy, and it always assumes to
get a (disposable) tempfile in the pristine tempdir". Does that sound right?

>
>>> Separate pristine store?
>> ...
>> So: design needed
>
> Yah. Post-1.7, I say. For now, the pristine store and wc.db are tied
> on a 1:1 basis, at the root of each working copy.

Agreed.

>
>> ...
>>> SHA1/MD5 compat.
>>> [ ] When talking to an old server, we use only MD5 checksums to reference
>>> the pristine in the pristine store, nevermind hash collisions.
>
> If we don't have a SHA1 handy, then I don't mind this. But I think we
> should store stuff using *only* MD5 or *only* SHA1. Mixing them could
> end up with duplicates. Not sure if that is a problem, but it feels
> like it could be.

I was hoping to get a -1 on this. To me it feels preferable to make the
pristine store *always* use SHA1 for the hash index. I'd be totally against
having a mixed hash index. Given that streaming to create a checksum when
receiving from the repos should be cheap, let's only use SHA1 indexes?

>
>> ...
>>> [ ] When talking to an old server, we use SHA1 to index the pristine
>>> store and also store the MD5 checksum for server comm. We create the
>>> SHA1 checksum locally after having verified the MD5 checksum.
>> +1. This is the original WC-NG plan
>
> I wouldn't go *that* far. There was definitely some hand-waving going
> on. The MD5 in the editor API is a severely complicating item. I had
> hoped to fix that particular problem with Ev2.

...so you disagree. What about md5 collisions? Are they unlikely enough?
Also, with only SHA1 in use, we don't need to maintain code that figures out
which kind of checksum is used in a pristine store.

What say you?

>
>>> [ ] When talking to a 1.7 server, we will receive and use only SHA1
>>> checksums and don't store any MD5 checksums.
>> There is currently no delta editor upgrade for 1.7, so we still have to send
>> MD5 to our 1.7 server as it is now. And to wrap old editors with new editors
>> (and reversed) we probably have to send both hashes whenever possible.
>> (The repository filesystem already stores SHA1 hashes since 1.6 or ?)
>
> I think we need to keep MD5 checksums around, in addition to the SHA1
> for compat with the Editor v1 APIs. Sigh.

Group sigh.

~Neels

application/pgp-signature attachment: OpenPGP digital signature

Received on 2010-02-18 00:33:44 CET

This message: [ Message body ]
Next message: Greg Stein: "Re: pristine store design"
Previous message: Greg Stein: "Re: '@BASE keyword' vs. 'BASE database-tree' vs 'BASE conceptual-tree'"
In reply to: Greg Stein: "Re: pristine store design"
Next in thread: Greg Stein: "Re: pristine store design"
Reply: Greg Stein: "Re: pristine store design"
Reply: Neels J Hofmeyr: "Re: pristine store design"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]