At 01:49 AM 31/10/2005 -0500, Greg Hudson wrote:
>I believe that for regular files on a local filesystem, write() does all
>or nothing. I don't have an authoritative source, since I didn't manage
>to pull up either the SUS or POSIX standards on the web easily.
This is what I want to be sure about. :-)
I've tracked down the SUS standard and located its entry on write(). Here
are the pertinent bits:
    If a write() requests that more bytes be written than there is room for
    (for example, [XSI] the process' file size limit or the physical end of
    a medium), only as many bytes as there is room for shall be written. For
    example, suppose there is space for 20 bytes more in a file before
    reaching a limit. A write of 512 bytes will return 20. The next write of
    a non-zero number of bytes would give a failure return (except as noted
    below).

    If write() is interrupted by a signal after it successfully writes some
    data, it shall return the number of bytes written.
So, basically, there are ways in which a write() can succeed only
partially. The file size limit case is probably not so important, as
subsequent attempts from other threads/processes will hit the same lack of
quota or free space and receive an error with no data output at all. More
troubling is that a signal which hits the thread doing the write() can, if
I'm reading correctly, split the operation, so that only part of the buffer
gets written.
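To make the partial-write case concrete, here is a minimal sketch of the usual defensive loop, written in Python because os.write() is a thin wrapper around write(). The helper name write_all is my own invention for illustration, not anything in Subversion or APR. Note that the loop only guarantees that all the bytes eventually get written; if another process appends between two iterations, the message still ends up split in the file.

```python
import os

def write_all(fd, data):
    """Call os.write() repeatedly until every byte of data is written.

    Hypothetical helper: os.write() may return fewer bytes than requested
    (a short write), so the remainder is retried until nothing is left.
    Returns the total number of bytes written.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns the number of bytes it actually wrote.
        written = os.write(fd, view[total:])
        total += written
    return total
```

A caller would use write_all(fd, b"some log line\n") where it would otherwise call os.write() directly. Genuine errors such as a full disk still surface as OSError from os.write().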
The chances of this actually happening for write()s of the size needed for
logging seem pretty slim to me, but they certainly cannot be said to be
zero without investigating the implementation. =/
Anyway, I suppose other important people live with those odds. Apache isn't
known for producing broken log files even though it typically has dozens of
forked processes all potentially vying to log.
>If a filesystem gives up on returning an error from write() on disk
>full, then pretty much every application is sunk when it comes to
>graceful filled disk recovery. Subversion won't be unusual in that
Hehe, true. The buggy filesystems listed in the Linux man page are probably
experimental versions of the Minix filesystem driver used before the ext
filesystem was first put together, or something :-)
>At any rate, locking or fsyncing for each log message would be a
>performance killer, so even if there are edge cases for simply opening
>for append and writing, I think we should do it anyway.
Okay. This is the simplest path to implement anyway :-) If people start
coming to us saying "1.4 produces broken log files", then we can start
investigating locking or possibly some other solution. Until such time,
we'll do without it :-)
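For reference, the "open for append and write" approach looks roughly like the sketch below. It is in Python only to keep it short, and the names open_log and log_message are hypothetical, not the actual Subversion code. The idea is to build the complete line in memory and hand it to a single write() call on a descriptor opened with O_APPEND, with no locking and no fsync.

```python
import os

def open_log(path):
    # O_APPEND asks the kernel to position every write at the current end
    # of the file, so independent writers do not overwrite each other.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

def log_message(fd, message):
    """Write one log line with a single os.write() call.

    Returns True if the whole line went out in one call, False on a
    short write (the edge case discussed above).
    """
    line = (message.rstrip("\n") + "\n").encode("utf-8")
    written = os.write(fd, line)
    return written == len(line)
```

Each process would call open_log() once and then log_message() per event. The False return is the only place where a split line could be noticed.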
Received on Mon Oct 31 10:46:23 2005