Re: Data corruption bug in WC, apparently due to race condition?
From: Karl Fogel <kfogel_at_red-bean.com>
Date: Thu, 27 Jul 2017 15:27:52 -0500
Philip Martin <philip_at_codematters.co.uk> writes:
My hat is off to you for tracking this down, Philip! Thanks. Comments/questions at the end, after your transcript.
>In first terminal:
I'm not familiar these days with the current architecture of libsvn_client and libsvn_wc, but just in principle, there should be an easier way to do this than the above, without re-comparing full-texts (or, equivalently, re-calculating the hash).
When the client sends the file for commit, have it remember the timestamp, file size, and hash of the working file (as of the exact version that was used for the commit -- and if the file is being streamily appended to during the commit, or something like that, well, then just remember the relevant values for what was sent in the commit). Then during commit finalization, just store that remembered metadata, *not* metadata derived from the possibly-now-changed working file.
In other words, why isn't the commit process just taking a data-and-metadata "snapshot" of each working file, and using that snapshot for both the commit and the post-commit bookkeeping on the client side?
If the client were to do that, then if a working file gets modified during the commit, the file would naturally show up as modified afterwards without any special checks like your step (3) above. (I guess yet another way to say it is: your steps 1-4 are fine, but they should all happen as part of the commit, and all be done by the time the post-commit stage arrives.)
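A minimal Python sketch of the snapshot idea, to make it concrete. It assumes a hypothetical `send` callback standing in for however libsvn_client actually streams file contents to the server. `Snapshot`, `send_with_snapshot` and `is_modified` are illustrative names, not libsvn_wc APIs, and recording size, mtime and a SHA-1 is an assumption about what the metadata would be. The point is that the size and hash come from the very bytes that were sent, so a later write to the working file is never folded into the post-commit bookkeeping.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of what was actually sent for one file."""
    size: int
    mtime_ns: int
    sha1: str


def send_with_snapshot(path: str, send: Callable[[bytes], None],
                       chunk_size: int = 64 * 1024) -> Snapshot:
    # Stat *before* reading: a write that races with the read then leaves a
    # stale recorded mtime, so a later status check re-examines the file
    # (given timestamp resolution fine enough to tell the two times apart).
    mtime_ns = os.stat(path).st_mtime_ns
    h = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            send(chunk)       # the bytes that go into the commit...
            h.update(chunk)   # ...are exactly the bytes we hash
            size += len(chunk)
    return Snapshot(size=size, mtime_ns=mtime_ns, sha1=h.hexdigest())


def is_modified(path: str, snap: Snapshot) -> bool:
    """Post-commit status check against the stored snapshot."""
    st = os.stat(path)
    if st.st_size != snap.size:
        return True
    if st.st_mtime_ns == snap.mtime_ns:
        return False  # fast path: same size and timestamp
    # Timestamp moved but size did not: compare content hashes.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest() != snap.sha1


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        with open(path, "wb") as f:
            f.write(b"original\n")
        sent = []
        snap = send_with_snapshot(path, sent.append)
        # Another process appends while the commit is still finalizing.
        with open(path, "ab") as f:
            f.write(b"appended during commit\n")
        print(is_modified(path, snap))  # True: size no longer matches
```

Because finalization stores `snap` rather than re-stat'ing and re-hashing the working file, the appended text simply shows up as a local modification afterwards, with no separate check needed.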
(Also, I thought this was how we were always doing things! But my memory is fuzzy, and/or things might have changed.)
Am I missing some subtlety about this?
In any case, a true full-text comparison should rarely be necessary. First, we can look at the file size from the directory entry and see if it's as expected; in most cases it will differ if the contents differ, so that's the first "early out". Then we could look at the first 1024 (or whatever) bytes; many files, if they have changed, will show some change near the beginning, so that's a second "early out". I guess a third early-out would be to do an lseek into the middle and just see if the byte there is as expected? :-) But yes, eventually, a full-text comparison, i.e., a hash recalculation, may be necessary.
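A rough Python sketch of that cascade of early outs. It assumes the recorded size and SHA-1 of the pristine text are available and that the pristine copy itself can be read for the prefix and middle-byte checks; `text_differs`, `PREFIX_LEN` and the use of SHA-1 are assumptions for illustration, not how libsvn_wc is known to do it. Note that the cheap checks can only prove a difference: if none of them fires, the full hash is still required.

```python
import hashlib
import os

PREFIX_LEN = 1024  # "the first 1024 (or whatever) bytes"; value is arbitrary


def _sha1_of(path: str) -> str:
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def text_differs(working: str, pristine: str,
                 recorded_size: int, recorded_sha1: str) -> bool:
    """Return True if `working` differs from the pristine text."""
    size = os.stat(working).st_size
    # Early out 1: size differs from what was recorded (a stat, no reads).
    if size != recorded_size:
        return True
    with open(working, "rb") as w, open(pristine, "rb") as p:
        # Early out 2: the first PREFIX_LEN bytes differ.
        if w.read(PREFIX_LEN) != p.read(PREFIX_LEN):
            return True
        # Early out 3: seek to the middle and compare a single byte.
        if size > 0:
            w.seek(size // 2)
            p.seek(size // 2)
            if w.read(1) != p.read(1):
                return True
    # No early out fired: recompute the working file's hash.
    return _sha1_of(working) != recorded_sha1
```

Ordering the checks from cheapest to most expensive means most modified files are caught with a stat or a single small read, and only files that look identical on the cheap checks pay for a full read.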
This is an archived mail posted to the Subversion Dev mailing list.