On Sun, Dec 04, 2005 at 04:23:30PM +0000, Malcolm Rowe wrote:
> Now the pool cleanup for the first open apr_file_t object occurs.
Oh, by the way, it occurred to me today that the pool cleanup can actually
run at any time, not just before the commit (which is good in one sense,
since I was wondering exactly why the cleanup would run then).
The reason is that the apr_file_t object will keep a reference to the open
file descriptor, and that'll still work after we rename the proto-rev
file to the final rev file. So the proto-rev file might be 'fine'
(or at least, useable - even if it has a bit of junk at the start),
but when we clean up the pool, we overwrite the real rev file.
I've now managed - I think - to reproduce the problem we're seeing using
my own 'client' (which is a client of libsvn_fs, but not a 'client' in
the normal sense). All it takes is for us to get a transient disk-write
error when writing the delta rep, and the caller to _not_ immediately
abort the transaction (which according to the documentation for managing
transactions in svn_fs.h, is fine).
(I can't prove that any of our clients do that, but I can't prove they
don't, either). Anyway, at this point, we're probably okay until the
pool cleanup runs.
[The basic problem is that if we open the file again and seek to the
end, we won't take into account anything that's still buffered up in
the first file handle. Actually, I suspect that it might be possible to
kill the transaction in such a way that it's _not_ possible to continue
succesfully, though I'm not quite sure how.]
Note that APR versions prior to 0.9.7 didn't report write errors on
buffered files at all, which I have a feeling may actually be the root
cause of this problem (though I'm not sure exactly how). I didn't think
that it was possible for a write to be the source of the problems, but
now I'm not so sure (especially as it was a write() to the proto-rev
file that I made break in my 'reproduction' client).
The fact that the pool cleanup can run at any time actually puts us in
a worse position, in some senses. I was thinking that we could possibly
run a consistency check on the rev file as part of the commit, but this
won't actually help, since the file doesn't actually get corrupted until
after it's been moved into place.
I think we're going to have to work out how to ensure that a failed write
in libsvn_fs_fs results in the file being definitively closed _before_ we
get out of libsvn_fs_fs, since otherwise we'll always have the potential
for 'after-the-fact' destruction of the rev file via pool cleanup.
(My reproduction code is _very_ hacked up - I'm not sure whether it's
worth looking at, btw).
To unsubscribe, e-mail: firstname.lastname@example.org
For additional commands, e-mail: email@example.com
Received on Thu Dec 8 19:28:21 2005