I felt at peace with Posix after many years of experience.
Then I read this message from Linus Torvalds, circa 2002:
int ret;
do {
    ret = close(fd);
} while (ret == -1 && errno != EBADF);
NO.
The above is
(a) not portable
(b) not current practice
The "not portable" part comes from the fact that (as somebody pointed
out), a threaded environment in which the kernel does close the FD
on errors, the FD may have been validly re-used (by the kernel) for
some other thread, and closing the FD a second time is a BUG.
Not only is looping until EBADF unportable, but any loop is, due to a race condition that I probably would have noticed if I hadn't "made peace" by taking such things for granted.
However, in the GCC C++ standard library implementation, basic_file_stdio.cc, we have
do
    __err = fclose(_M_cfile);
while (__err && errno == EINTR);
The primary target for this library is Linux, but it seems not to be heeding Linus.
As far as I've come to understand, EINTR happens only after a system call blocks, which implies that the kernel received the request to free the descriptor before commencing whatever work got interrupted. So there's no need to loop. Indeed, SA_RESTART does not apply to close (which would amount to exactly such a loop by default), precisely because doing so is unsafe.
This is a standard library bug then, right? On every file ever closed by a C++ application.
EDIT: To avoid causing too much alarm before some guru comes along with an answer, I should note that close only seems to be allowed to block under specific circumstances, perhaps none of which ever apply to regular files. I'm not clear on all the details, but you should not see EINTR from close without opting into something via fcntl or setsockopt. Nevertheless, the possibility makes generic library code more dangerous.
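For concreteness, here is one such opt-in as a hedged sketch (the function name is illustrative): enabling SO_LINGER with a nonzero timeout on a socket asks close() to block until any queued data has been transmitted or the timeout expires, which is exactly the kind of blocking during which a signal could surface as EINTR.
#include <sys/socket.h>

/* Sketch: opt a socket into a potentially blocking close().
 * With l_onoff set and a nonzero l_linger, close() may block while
 * unsent data drains, and a signal arriving then can surface as EINTR. */
static int enable_lingering_close(int sockfd)
{
    struct linger lin = { .l_onoff = 1, .l_linger = 30 /* seconds */ };

    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lin, sizeof lin);
}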
With respect to POSIX, R..'s answer to a related question is very clear and concise: close() is a non-restartable special case, and no loop should be used.
This was surprising to me, so I decided to describe my findings, followed by my conclusions and chosen solution at the end.
This is not really an answer. Consider this more like the opinion of a fellow programmer, including the reasoning behind that opinion.
POSIX.1-2001 and POSIX.1-2008 describe three possible errno values that may occur: EBADF, EINTR, and EIO. The descriptor state after EINTR and EIO is "unspecified", which means it may or may not have been closed. EBADF indicates fd is not a valid descriptor. In other words, POSIX.1 clearly recommends using
if (close(fd) == -1) {
    /* An error occurred, see 'errno'. */
}
without any retry looping to close file descriptors.
(Even the Austin Group defect #529 that R.. mentioned does not help with recovering from close() errors: it leaves it unspecified whether any I/O is possible after an EINTR error, even if the descriptor itself is left open.)
For Linux, the close() syscall is defined in fs/open.c, with __close_fd() in fs/file.c managing the descriptor table locking, and filp_close() back in fs/open.c taking care of the details.
In summary, the descriptor entry is removed from the table unconditionally first, followed by filesystem-specific flushing (f_op->flush()), followed by notification (dnotify/fsnotify hook), and finally by removing any record or file locks. (Most local filesystems like ext2, ext3, ext4, xfs, bfs, tmpfs, and so on, do not have ->flush(), so given a valid descriptor, close() cannot fail. Only ecryptfs, exofs, fuse, cifs, and nfs have ->flush() handlers in Linux-3.13.6, as far as I can tell.)
This does mean that in Linux, if a write error occurs in the filesystem-specific ->flush() handler during close(), there is no way to retry; the file descriptor is always closed, just like Torvalds said.
The FreeBSD close() man page describes the exact same behaviour.
Neither the OpenBSD nor the Mac OS X close() man pages describe whether the descriptor is closed in case of errors, but I believe they share the FreeBSD behaviour.
It seems clear to me that no loop is necessary or required to close a file descriptor safely. However, close() may still return an error.
errno == EBADF indicates the file descriptor was already closed. If my code encounters this unexpectedly, to me it indicates there is a significant fault in the code logic, and the process should gracefully exit; I'd rather my processes die than produce garbage.
Any other errno values indicate an error in finalizing the file state. In Linux, it is definitely an error related to flushing any remaining data to the actual storage. In particular, I can imagine ENOMEM in case there is no room to buffer the data, EIO if the data could not be sent or written to the actual device or media, EPIPE if the connection to the storage was lost, ENOSPC if the storage is already full with no reservation for the unflushed data, and so on. If the file is a log file, I'd have the process report the failure and exit gracefully. If the file contents are still available in memory, I would remove (unlink) the entire file, and retry. Otherwise I'd report the failure to the user.
(Remember that in Linux and FreeBSD, you do not "leak" file descriptors in the error case; they are guaranteed to be closed even if an error occurs. I am assuming all other operating systems I might use behave the same way.)
The helper function I'll use from now on will be something like
#include <unistd.h>
#include <errno.h>

/**
 * closefd - close file descriptor and return error (errno) code
 *
 * #descriptor: file descriptor to close
 *
 * Actual errno will stay unmodified.
 */
static int closefd(const int descriptor)
{
    int saved_errno, result;

    if (descriptor == -1)
        return EBADF;

    saved_errno = errno;

    result = close(descriptor);
    if (result == -1)
        result = errno;

    errno = saved_errno;
    return result;
}
I know the above is safe on Linux and FreeBSD, and I assume it is safe on all other POSIX-y systems. If I encounter one that is not, I can simply replace the above with a custom version, wrapping it in a suitable #ifdef for that OS. The reason this maintains errno unchanged is just a quirk of my coding style; it makes short-circuiting error paths shorter (less repeated code).
If I am closing a file that contains important user information, I will do a fsync() or fdatasync() on it prior to closing. This ensures the data hits the storage, but also causes a delay compared to normal operation; therefore I won't do it for ordinary data files.
Unless I will be unlink()ing the closed file, I will check closefd() return value, and act accordingly. If I can easily retry, I will, but at most once or twice. For log files and generated/streamed files, I only warn the user.
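As a rough usage sketch of this policy for an important output file (closefd() is the helper above; the function name, the unlink-on-failure choice, and the error-reporting convention are illustrative):
#include <errno.h>
#include <unistd.h>

/* Sketch: flush to storage, close exactly once, and discard the partial
 * file rather than keep garbage if anything went wrong. */
static int finish_output(const char *path, int fd)
{
    int err = 0;

    if (fdatasync(fd) == -1)
        err = errno;

    /* The descriptor is released exactly once; closefd() returns an
     * errno-style code, or 0 on success. */
    int close_err = closefd(fd);
    if (!err)
        err = close_err;

    if (err) {
        unlink(path);    /* remove the partial file */
        errno = err;
        return -1;       /* caller reports the failure to the user */
    }
    return 0;
}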
I want to remind anyone reading this far that we cannot make anything completely reliable; it is just not possible. What we can do, and in my opinion should do, is to detect when an error occurs, as reliably as we can. If we can easily and with negligible resource use retry, we should. In all cases, we should make sure the notification (about the error) is propagated to the actual human user. Let the human worry about whether some other action, possibly complex, needs to be done before the operation is retried. After all, a lot of tools are used only as a part of a larger task, and the best course of action usually depends on that larger task.
Related
On my system (Ubuntu Linux, glibc), the man page for the close call lists several error values it can return. It also says
Not checking the return value of close() is a common but nevertheless serious programming error.
and at the same time
Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed.
So I am not allowed to ignore the return value nor to retry the call.
Given that, how shall I handle the close() call failure?
If the error happened when I was writing something to the file, I am probably supposed to try to write the information somewhere else to avoid data loss.
If I was only reading the file, can I just log the failure and continue the program pretending nothing happened? Are there any caveats, leak of file descriptors or whatever else?
In practice, close should never be retried on error, and the fd you passed to close is always invalid (closed) after close returns, regardless of whether an error occurred. In some cases, an error may indicate that data was lost (certain NFS setups) or unusual hardware conditions for devices (e.g. tape could not be rewound), so you may want to be cautious to avoid data loss, but you should never attempt to close the fd again.
In theory, POSIX was unclear in the past as to whether the fd remains open when close fails with EINTR, and systems disagreed. Since it's important to know the state (otherwise you have either fd leaks or double-close bugs which are extremely dangerous in multithreaded programs), the resolution to Austin Group issue #529 specified the behavior strictly for future versions of POSIX, that EINTR means the fd remains open. This is the right behavior consistent with the definition of EINTR elsewhere, but Linux refuses to accept it. (FWIW there's an easy workaround for this that's possible at the libc syscall wrapper level; see glibc PR #14627.) Fortunately it never arises in practice anyway.
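As a rough application-level analogue of that workaround (a sketch only, not the libc-level fix the bug report discusses; it relies on the Linux behaviour described above, where the descriptor is gone even when EINTR is reported):
#include <errno.h>
#include <unistd.h>

/* Sketch: on Linux the descriptor is released even when close() reports
 * EINTR, so report that case as success rather than tempt the caller
 * into a dangerous retry. */
static int close_once(int fd)
{
    int result = close(fd);

    if (result == -1 && errno == EINTR)
        return 0;
    return result;
}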
Some related questions you might find informative:
What are the reasons to check for error on close()?
Trying to make close sleep on Linux
First of all: EINTR means exactly that: the system call was interrupted. If this happens on a close() call, there is essentially nothing you can do.
Apart from maybe recording the fact that, if the fd belonged to a file, that file is possibly corrupt, there is not much you can do about errors on close() at all - depending on the return value. AFAIK the only case where a close can be retried is EBUSY, but I have yet to see that.
So:
Not checking the result of close() might mean that you miss file corruption, especially truncation.
Depending on the error, most of the time you can do nothing - a failed close() just means something has gone awfully wrong outside the scope of your application.
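To make the "miss file corruption" point concrete, here is a minimal sketch of the checks being advocated (the function name is illustrative; write() is retried on EINTR because it is safely restartable, while close() is checked once and never retried, per the rest of this discussion):
#include <errno.h>
#include <unistd.h>

/* Sketch: both write() and close() are checked; an error from either
 * means the file on disk cannot be trusted to be complete. */
static int write_all_and_close(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* write() is safely restartable */
            close(fd);
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    if (close(fd) == -1)            /* checked, but never retried */
        return -1;
    return 0;
}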
So I've been studying the fclose manpage for quite a while, and my conclusion is that if fclose is interrupted by some signal, according to the manpage there is no way to recover...? Am I missing some point?
Usually, with unbuffered POSIX functions (open, close, write, etc...) there is ALWAYS a way to recover from signal interruption (EINTR) by restarting the call; in contrast, the documentation of the buffered calls states that after a failed fclose attempt another try has undefined behavior... with no hint about HOW to recover instead. Am I just "unlucky" if a signal interrupts fclose? Data might be lost and I can't be sure whether the file descriptor is actually closed or not. I do know that the buffer is deallocated, but what about the file descriptor?
Think about large-scale applications that use lots of fds simultaneously and would run into problems if fds are not properly freed -> I would assume there must be a CLEAN solution to this problem.
So let's assume I'm writing a library and it's not allowed to use sigaction and SA_RESTART and lots of signals are sent, how do I recover if fclose is interrupted?
Would it be a good idea to call close in a loop (instead of fclose) after fclose failed with EINTR? The documentation of fclose simply doesn't mention the state of the file descriptor; UNDEFINED is not very helpful though... if the fd is already closed and I call close again, weird hard-to-debug side effects could occur, so naturally I would rather ignore this case than do the wrong thing... then again, there is no unlimited number of file descriptors available, and resource leakage is some sort of bug (at least to me).
Of course I could check one specific implementation of fclose, but I can't believe someone designed stdio and didn't think about this problem. Is it just the documentation that is bad, or the design of this function?
This corner case really bugs me :(
EINTR and close()
In fact, there are also problems with close(), not only with fclose().
POSIX states that close() returns EINTR, which usually means that the application may retry the call. However, things are more complicated in Linux. See this article on LWN and also this post.
[...] the POSIX EINTR semantics are not really possible on Linux. The file descriptor passed to close() is de-allocated early in the processing of the system call and the same descriptor could already have been handed out to another thread by the time close() returns.
This blog post and this answer explain why it's not a good idea to retry close() failed with EINTR. So in Linux, you can do nothing meaningful if close() failed with EINTR (or EINPROGRESS).
Also note that close() is asynchronous in Linux. E.g., sometimes umount may return EBUSY immediately after closing the last open descriptor on a filesystem, since it's not yet released in the kernel. See the interesting discussion here: page 1, page 2.
EINTR and fclose()
POSIX states for fclose():
After the call to fclose(), any use of stream results in undefined behavior.
Whether or not the call succeeds, the stream shall be disassociated from the file and any buffer set by the setbuf() or setvbuf() function shall be disassociated from the stream. If the associated buffer was automatically allocated, it shall be deallocated.
I believe it means that even if close() failed, fclose() should free all resources and produce no leaks. It's true at least for glibc and uclibc implementations.
Reliable error handling
Call fflush() before fclose().
Since you can't determine whether fclose() failed in its fflush() or in its close() step, you have to explicitly call fflush() before fclose() to ensure that the userspace buffer was successfully handed to the kernel.
Don't retry after EINTR.
If fclose() failed with EINTR, you cannot retry close() and also cannot retry fclose().
Call fsync() if you need.
If you care about data integrity, you should call fsync() or fdatasync() before calling fclose() [1].
If you don't, just ignore EINTR from fclose(). (A combined sketch of these steps follows the notes below.)
Notes
If fflush() and fsync() succeeded and fclose() failed with EINTR, no data is lost and no leaks occur.
You should also ensure that the FILE object is not used between the fflush() and fclose() calls from another thread [2].
[1] See "Everything You Always Wanted to Know About Fsync()" article which explains why fsync() may also be an asynchronous operation.
[2] You can call flockfile() before calling fflush() and fclose(). It should work with fclose() correctly.
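Putting the steps above together, a minimal sketch (assuming a glibc-like implementation where the thread holding the stream lock may call fclose(), per note [2]; the function name and the optional durability flag are illustrative):
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of the recipe above: flush explicitly, optionally sync,
 * then close exactly once and treat EINTR from fclose() as ignorable. */
static int careful_fclose(FILE *fp, int want_durability)
{
    int err = 0;

    /* Per note [2]: keep other threads off the stream.  No funlockfile(),
     * because fclose() destroys the stream, lock and all. */
    flockfile(fp);

    if (fflush(fp) == EOF)
        err = errno;
    if (!err && want_durability && fsync(fileno(fp)) == -1)
        err = errno;

    if (fclose(fp) == EOF && errno != EINTR && !err)
        err = errno;        /* data already flushed, so EINTR alone is harmless */

    if (err) {
        errno = err;
        return EOF;
    }
    return 0;
}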
Think about large-scale applications that use lots of fds simultaneously and would run into problems if fds are not properly freed -> I would assume there must be a CLEAN solution to this problem.
The possibility to retry fflush() and then close() on the underlying file descriptor was already mentioned in the comments. For a large-scale application, I would favour the pattern of using threads and having one dedicated signal-handling thread, while all other threads have signals blocked using pthread_sigmask(). Then, when fclose() fails, you have a real problem.
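A hedged sketch of that pattern (the particular signal set and the thread body are illustrative): the main thread blocks the signals of interest before creating any other thread, so every thread inherits the mask, and one dedicated thread collects them synchronously with sigwait(); blocking calls in the worker threads then never see EINTR.
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* Sketch: block SIGINT/SIGTERM process-wide (threads inherit the mask)
 * and let one dedicated thread receive them synchronously via sigwait(). */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    int sig;

    for (;;) {
        if (sigwait(set, &sig) != 0)
            continue;
        fprintf(stderr, "got signal %d\n", sig);  /* illustrative handling */
        if (sig == SIGTERM)
            return NULL;                          /* let main() join and exit */
    }
}

int main(void)
{
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);       /* before creating any threads */

    pthread_create(&tid, NULL, signal_thread, &set);

    /* ... create worker threads and do the real work here ... */

    pthread_join(tid, NULL);
    return 0;
}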
I was reading DJB's "Some thoughts on security after ten years of Qmail 1.0" and he listed this function for moving a file descriptor:
int fd_move(to,from)
int to;
int from;
{
    if (to == from) return 0;
    if (fd_copy(to,from) == -1) return -1;
    close(from);
    return 0;
}
It occurred to me that this code does not check the return value of close, so I read the man page for close(2), and it seems it can fail with EINTR, in which case the appropriate behavior would seem to be to call close again with the same argument.
Since this code was written by someone with far more experience than I in both C and UNIX, and additionally has stood unchanged in qmail for over a decade, I assume there must be some nuance that I'm missing that makes this code correct. Can anyone explain that nuance to me?
I've got two answers:
He was trying to make a point about factoring out common code and often such examples omit error checking for brevity and clarity.
close(2) may return EINTR, but does it in practice, and if so what would you reasonably do? Retry once? Retry until success? What if you get EIO? That could mean almost anything, so you really have no reasonable recourse except logging it and moving on. If you retry after an EIO, you might get EBADF; then what? Assume that the descriptor is closed and move on?
Every system call can return EINTR, especially one that blocks like read(2) waiting on a slow human. This is a more likely scenario, and a good "get input from terminal" routine will indeed check for this. That also means that write(2) can fail, even when writing a log file. Do you try to log the error that the logger generated, or should you just give up?
When a file descriptor is dup'd, as it is in the fd_copy or dup2 function, you will end up with more than one file descriptor referring to the same thing (i.e. the same struct file in the kernel). Closing one of them will simply decrement its reference count. No operation is performed on the underlying object unless it is the last close. As a result, conditions such as EINTR and EIO are not possible.
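A small hedged illustration of that reference counting (the file name and descriptor number are illustrative, and error checks are omitted): after dup2(), closing either descriptor only drops a reference; the shared open file description stays usable until the last close.
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int from = open("example.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int to = 10;                      /* illustrative target descriptor */

    dup2(from, to);                   /* both now refer to the same struct file */
    close(from);                      /* drops a reference; nothing is flushed */

    write(to, "still works\n", 12);   /* the open file description is intact */
    close(to);                        /* last reference: real teardown happens here */
    return 0;
}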
Another possibility is that his function is used only in an application (or a part of one) which has done something to ensure that the call will not be interrupted by a signal. If you aren't going to do anything important with signals, then you don't have to be responsive to them, and it might make sense to mask them all out, rather than wrap every single blocking system call in an EINTR retry. Except of course the ones that will kill you, so SIGKILL and frequently SIGPIPE if you handle it by quitting, along with SIGSEGV and similar fatal errors which will in any case never be delivered to a correct user-space app.
Anyway, if all he's talking about is security, then quite possibly he doesn't have to retry close. If close failed with EIO, then he would not be able to retry it, it would be a permanent failure. Therefore, it is not necessary for the correctness of his program that close succeeds. It may well be that it is not necessary for the correctness of his program that close be retried on EINTR, either.
Usually you want your program to make a best effort to succeed, and that means retrying on EINTR. But this is a separate concern from security. If your program is designed so that some function failing for any reason isn't a security flaw, then in particular the fact that it happens to have failed EINTR, rather than for a permanent reason, isn't a flaw. DJB has been known to be fairly opinionated, so I would not be at all surprised if he has proved some reason why he doesn't need to retry, and therefore doesn't bother, even if doing so would allow his program to succeed in flushing the handle in certain situations where maybe it currently fails (like being explicitly sent a harmless signal with kill by the user at a crucial moment).
Edit: it occurs to me that retrying on EINTR could potentially itself be a security flaw under certain conditions. It introduces a new behaviour to that section of code: it can loop indefinitely in response to a signal flood, where previously it would make one attempt to close and then return. I don't know for sure that this would cause qmail any problems (after all, close itself makes no guarantees how soon it will return). But if giving up after one attempt does make the code easier to analyse then it could plausibly be a smart move. Or not.
You might think that retrying prevents a DoS flaw, where a signal causes a spurious failure. But retrying allows another (more difficult) DoS flaw, where a signal flood causes an indefinite stall. In terms of binary "can this app be DoSed?", which is the kind of absolute security question DJB was interested in when he wrote qmail and djbdns, it makes no difference. If something can happen once, then normally that means it can happen many times.
Only broken unices ever return EINTR without you explicitly asking for it. The sane semantics for signal() enable restartable system calls ("BSD style"). When building a program on a system with the sysv semantics (interrupting signals), you should always replace calls to signal() with calls to bsd_signal(), which you can define yourself in terms of sigaction() if it doesn't exist.
It's further worth noting that no systems will return EINTR on signal receipt unless you have installed signal handlers. If the default action is left in place, or if the signal is set to no action, it's impossible for system calls to be interrupted.
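For reference, a hedged sketch of such a replacement built on sigaction() (the typedef and function name are mine, not a standard API): SA_RESTART gives the "BSD style" semantics in which interrupted system calls are restarted automatically.
#include <signal.h>
#include <string.h>

typedef void (*sig_handler_fn)(int);

/* Sketch: a bsd_signal()-style wrapper with restartable system calls. */
static sig_handler_fn my_bsd_signal(int signum, sig_handler_fn handler)
{
    struct sigaction act, oldact;

    memset(&act, 0, sizeof act);
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, signum);  /* block this signal while its handler runs */
    act.sa_flags = SA_RESTART;        /* restart interrupted system calls */

    if (sigaction(signum, &act, &oldact) == -1)
        return SIG_ERR;
    return oldact.sa_handler;
}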
What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?
From some cursory testing, it does return right away. I suspect the outcome is either that (a) it still continues to wait for data, but if you actually tried to read from it you'd get EBADF (possibly -- there's a potential race) or (b) that it pretends as though the file descriptor were never passed in. If the latter case is true, passing in a single fd with no timeout would cause a deadlock if it were closed.
From some additional investigation, it appears that both dwc and bothie are right.
bothie's answer to the question boils down to: it's undefined behavior. That doesn't mean that it's unpredictable necessarily, but that different OSes do it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not, based on this post to the linux-kernel mailing list from 2001.
The argument on the linux-kernel mailing list is essentially that it is undefined (and broken) behavior to rely upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since there is a select(2) call also holding a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer. You will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in EBADF, assuming the fd hasn't been recycled. (A concern MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
So thank you all for the help.
I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".
Having said that, doing that is very bad practice anyway, as you'd always have potential race conditions: another file descriptor with the same number could be opened by yet another thread immediately after the second thread closed it, and then the selecting thread would end up waiting on the wrong one.
As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
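One common alternative, offered here only as a sketch of a widely used pattern (sometimes called the self-pipe trick) rather than anything from the answers above: give select() an extra pipe descriptor, and have the other thread write a byte to it when it wants the watched descriptor closed, so the selecting thread wakes up and does the close itself.
#include <sys/select.h>
#include <unistd.h>

/* Sketch: wake the select() loop through a pipe instead of closing its
 * descriptor from another thread.  pipe(wakeup_pipe) is assumed to have
 * been called during setup. */
static int wakeup_pipe[2];

static void selecting_loop(int data_fd)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(data_fd, &rfds);
        FD_SET(wakeup_pipe[0], &rfds);
        int maxfd = data_fd > wakeup_pipe[0] ? data_fd : wakeup_pipe[0];

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) == -1)
            continue;                        /* e.g. EINTR: just retry */

        if (FD_ISSET(wakeup_pipe[0], &rfds)) {
            char c;
            read(wakeup_pipe[0], &c, 1);     /* drain the wakeup byte */
            close(data_fd);                  /* the owning thread closes it; no race */
            return;
        }
        if (FD_ISSET(data_fd, &rfds)) {
            char buf[256];
            ssize_t n = read(data_fd, buf, sizeof buf);
            if (n <= 0) {
                close(data_fd);
                return;
            }
            /* ... handle n bytes of input ... */
        }
    }
}

/* Another thread requests shutdown with: write(wakeup_pipe[1], "x", 1); */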
The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. The main use is for server applications, which open a bunch of file descriptors and then wait for anything to do on them (accept new connections, read requests or send the responses). Those file descriptors will be opened in non-blocking I/O mode so that the server process won't hang in a syscall at any time.
This additionally means that there is no need for separate threads, because all the work that could be done in a thread can be done prior to the select call as well. And if the work takes long, it can be interrupted: select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.
Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why shall it close the file descriptor?
The POSIX standard doesn't provide any hints as to what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect that the result will be very different between different operating systems and even between versions of the same OS.
Regards, Bodo
It's a little confusing what you're asking...
Select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere then there's no reason for select() to wake up.
If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.
It seems as though the following calls do what you'd expect (close the stream and not allow any further input - anything waiting for input on the stream returns an error), but is it guaranteed to be correct across all compilers/platforms?
close(fileno(stdin));
fclose(stdin);
fclose(stdin) causes any further use of stdin (implicit or explicit) to invoke undefined behavior, which is a very bad thing. It does not "inhibit input".
close(fileno(stdin)) causes any further attempts at input from stdin, after the current buffer has been depleted, to fail with EBADF, but only until you open another file, in which case that file will become fd #0 and bad things will happen.
A more robust approach might be:
int fd = open("/dev/null", O_WRONLY);
dup2(fd, 0);
close(fd);
with a few added error checks. This will ensure that all reads (after the current buffer is depleted) result in errors. If you just want them to result in EOF, not an error, use O_RDONLY instead of O_WRONLY.
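The "few added error checks" might look something like this sketch (the function name is illustrative):
#include <fcntl.h>
#include <unistd.h>

/* Sketch: point fd 0 at /dev/null so later reads from stdin fail
 * (O_WRONLY) or hit EOF (O_RDONLY), without ever leaving descriptor 0
 * free to be claimed by the next open(). */
static int silence_stdin(void)
{
    int fd = open("/dev/null", O_WRONLY);

    if (fd == -1)
        return -1;
    if (dup2(fd, 0) == -1) {
        close(fd);
        return -1;
    }
    if (fd != 0)
        close(fd);
    return 0;
}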
DO NOT DO a close on fileno(FILE*). FILE is a buffering object. Looking into its implementation and meddling with its state carries all the caveats and dangers that would come with similar misbehavior on any other software module.
Don't do it.
AGH. Seriously. Nasty.
Nothing is guaranteed correct across every possible operating system. However, calling fclose(stdin) will work on any POSIX-compliant operating system as well as on Windows, so you should be covered for pretty much anything in general use at the moment.
As stated by the previous answer as well as my comment, there is no need to call close on the file handle. fclose() will properly close everything down for you.