Is close/fclose on stdin guaranteed to be correct? - c

It seems as though the following calls do what you'd expect (close the stream and not allow any further input - anything waiting for input on the stream returns error), but is it guaranteed to be correct across all compilers/platforms?
close(fileno(stdin));
fclose(stdin);

fclose(stdin) causes any further use of stdin (implicit or explicit) to invoke undefined behavior, which is a very bad thing. It does not "inhibit input".
close(fileno(stdin)) causes any further attempts at input from stdin, after the current buffer has been depleted, to fail with EBADF, but only until you open another file, in which case that file will become fd #0 and bad things will happen.
A more robust approach might be:
int fd = open("/dev/null", O_WRONLY);
dup2(fd, 0);
close(fd);
with a few added error checks. This will ensure that all reads (after the current buffer is depleted) result in errors. If you just want them to result in EOF, not an error, use O_RDONLY instead of O_WRONLY.
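A minimal sketch of the same approach with the error checks filled in. The helper name `redirect_stdin_to_devnull` and its `flags` parameter are made up for illustration, and the code assumes a POSIX system where /dev/null exists:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: point descriptor 0 at /dev/null.
 *   flags = O_RDONLY -> later read(0, ...) calls return 0 (EOF)
 *   flags = O_WRONLY -> later read(0, ...) calls fail with EBADF
 * Returns 0 on success, -1 on failure with errno set by open() or dup2(). */
int redirect_stdin_to_devnull(int flags)
{
    int fd = open("/dev/null", flags);
    if (fd == -1)
        return -1;

    /* If descriptor 0 was already closed, open() may have returned 0 itself. */
    if (fd != STDIN_FILENO) {
        if (dup2(fd, STDIN_FILENO) == -1) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

Because descriptor 0 is never left unoccupied, a later open() elsewhere in the program cannot accidentally become the new standard input.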

DO NOT DO a close on fileno(FILE*). FILE is a buffering object. Looking into its implementation and meddling with its state carries all the caveats and dangers that would come with similar misbehavior on any other software module.
Don't do it.
AGH. Seriously. Nasty.

Nothing is guaranteed correct across every possible operating system. However, calling fclose(stdin) will work on any POSIX compliant operating system as well as Windows operating systems, so you should hit pretty much anything in general use at the moment.
As stated by the previous answer as well as my comment, there is no need to call close on the file handle. fclose() will properly close everything down for you.

Related

What to do if a posix close call fails?

On my system (Ubuntu Linux, glibc), man page of a close call specifies several error return values it can return. It also says
Not checking the return value of close() is a common but nevertheless serious programming error.
and at the same time
Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed.
So I am not allowed to ignore the return value nor to retry the call.
Given that, how shall I handle the close() call failure?
If the error happened when I was writing something to the file, I am probably supposed to try to write the information somewhere else to avoid the data loss.
If I was only reading the file, can I just log the failure and continue the program pretending nothing happened? Are there any caveats, leak of file descriptors or whatever else?
In practice, close should never be retried on error, and the fd you passed to close is always invalid (closed) after close returns, regardless of whether an error occurred. In some cases, an error may indicate that data was lost (certain NFS setups) or unusual hardware conditions for devices (e.g. tape could not be rewound), so you may want to be cautious to avoid data loss, but you should never attempt to close the fd again.
In theory, POSIX was unclear in the past as to whether the fd remains open when close fails with EINTR, and systems disagreed. Since it's important to know the state (otherwise you have either fd leaks or double-close bugs which are extremely dangerous in multithreaded programs), the resolution to Austin Group issue #529 specified the behavior strictly for future versions of POSIX, that EINTR means the fd remains open. This is the right behavior consistent with the definition of EINTR elsewhere, but Linux refuses to accept it. (FWIW there's an easy workaround for this that's possible at the libc syscall wrapper level; see glibc PR #14627.) Fortunately it never arises in practice anyway.
Some related questions you might find informative:
What are the reasons to check for error on close()?
Trying to make close sleep on Linux
First of all, EINTR means exactly what its name says: the system call was interrupted. If this happens on a close() call, there is nothing you can do.
Apart from maybe keeping track of the fact that, if the fd belonged to a file, this file is possibly corrupt, there is not much you can do about errors on close() at all, depending on the return value. AFAIK the only case where a close can be retried is on EBUSY, but I have yet to see that.
So:
Not checking the result of close() might mean that you miss file corruption, especially truncation.
Depending on the error, most of the time you can do nothing - a failed close() just means something has gone awfully wrong outside the scope of your application.

How to close a file?

I felt at peace with Posix after many years of experience.
Then I read this message from Linus Torvalds, circa 2002:
int ret;
do {
ret = close(fd);
} while(ret == -1 && errno != EBADF);
NO.
The above is
(a) not portable
(b) not current practice
The "not portable" part comes from the fact that (as somebody pointed
out), a threaded environment in which the kernel does close the FD
on errors, the FD may have been validly re-used (by the kernel) for
some other thread, and closing the FD a second time is a BUG.
Not only is looping until EBADF unportable, but any loop is, due to a race condition that I probably would have noticed if I hadn't "made peace" by taking such things for granted.
However, in the GCC C++ standard library implementation, basic_file_stdio.cc, we have
do
__err = fclose(_M_cfile);
while (__err && errno == EINTR);
The primary target for this library is Linux, but it seems not to be heeding Linus.
As far as I've come to understand, EINTR happens only after a system call blocks, which implies that the kernel received the request to free the descriptor before commencing whatever work got interrupted. So there's no need to loop. Indeed, the SA_RESTART signal behavior does not apply to close, so no such loop is generated by default, precisely because it is unsafe.
This is a standard library bug then, right? On every file ever closed by a C++ application.
EDIT: To avoid causing too much alarm before some guru comes along with an answer, I should note that close only seems to be allowed to block under specific circumstances, perhaps none of which ever apply to regular files. I'm not clear on all the details, but you should not see EINTR from close without opting into something by fcntl or setsockopt. Nevertheless the possibility makes generic library code more dangerous.
With respect to POSIX, R..'s answer to a related question is very clear and concise: close() is a non-restartable special case, and no loop should be used.
This was surprising to me, so I decided to describe my findings, followed by my conclusions and chosen solution at end.
This is not really an answer. Consider this more like the opinion of a fellow programmer, including the reasoning behind that opinion.
POSIX.1-2001 and POSIX.1-2008 describe three possible errno values that may occur: EBADF, EINTR, and EIO. The descriptor state after EINTR and EIO is "unspecified", which means it may or may not have been closed. EBADF indicates fd is not a valid descriptor. In other words, POSIX.1 clearly recommends using
if (close(fd) == -1) {
/* An error occurred, see 'errno'. */
}
without any retry looping to close file descriptors.
(Even the Austin Group defect #529 that R.. mentioned does not help with recovering from close() errors: it leaves it unspecified whether any I/O is possible after an EINTR error, even if the descriptor itself is left open.)
For Linux, the close() syscall is defined in fs/open.c, with __do_close() in fs/file.c managing the descriptor table locking, and filp_close() back in fs/open.c taking care of the details.
In summary, the descriptor entry is removed from the table unconditionally first, followed by filesystem-specific flushing (f_op->flush()), followed by notification (dnotify/fsnotify hook), and finally by removing any record or file locks. (Most local filesystems like ext2, ext3, ext4, xfs, bfs, tmpfs, and so on, do not have ->flush(), so given a valid descriptor, close() cannot fail. Only ecryptfs, exofs, fuse, cifs, and nfs have ->flush() handlers in Linux-3.13.6, as far as I can tell.)
This does mean that in Linux, if a write error occurs in the filesystem-specific ->flush() handler during close(), there is no way to retry; the file descriptor is always closed, just like Torvalds said.
The FreeBSD close() man page describes the exact same behaviour.
Neither the OpenBSD nor the Mac OS X close() man pages describe whether the descriptor is closed in case of errors, but I believe they share the FreeBSD behaviour.
It seems clear to me that no loop is necessary or required to close a file descriptor safely. However, close() may still return an error.
errno == EBADF indicates the file descriptor was already closed. If my code encounters this unexpectedly, to me it indicates there is a significant fault in the code logic, and the process should gracefully exit; I'd rather my processes die than produce garbage.
Any other errno values indicate an error in finalizing the file state. In Linux, it is definitely an error related to flushing any remaining data to the actual storage. In particular, I can imagine ENOMEM in case there is no room to buffer the data, EIO if the data could not be sent or written to the actual device or media, EPIPE if connection to the storage was lost, ENOSPC if the storage is already full with no reservation to the unflushed data, and so on. If the file is a log file, I'd have the process report the failure and exit gracefully. If the file contents are still available in memory, I would remove (unlink) the entire file, and retry. Otherwise I'd report the failure to the user.
(Remember that in Linux and FreeBSD, you do not "leak" file descriptors in the error case; they are guaranteed to be closed even if an error occurs. I am assuming all other operating systems I might use behave the same way.)
The helper function I'll use from now on will be something like
#include <unistd.h>
#include <errno.h>

/**
 * closefd - close file descriptor and return error (errno) code
 *
 * @descriptor: file descriptor to close
 *
 * Actual errno will stay unmodified.
 */
static int closefd(const int descriptor)
{
    int saved_errno, result;

    if (descriptor == -1)
        return EBADF;

    saved_errno = errno;

    result = close(descriptor);
    if (result == -1)
        result = errno;

    errno = saved_errno;
    return result;
}
I know the above is safe on Linux and FreeBSD, and I assume it is safe on all other POSIX-y systems. If I encounter one that is not, I can simply replace the above with a custom version, wrapping it in a suitable #ifdef for that OS. The reason this maintains errno unchanged is just a quirk of my coding style; it makes short-circuiting error paths shorter (less repeated code).
If I am closing a file that contains important user information, I will do a fsync() or fdatasync() on it prior to closing. This ensures the data hits the storage, but also causes a delay compared to normal operation; therefore I won't do it for ordinary data files.
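As a rough sketch of that sync-then-close sequence (the name `sync_and_close` is hypothetical, and the "always closed afterwards" assumption is the Linux and FreeBSD behaviour described above, not something POSIX promises):

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: flush fd to storage, then close it exactly once.
 * Returns 0 on success, or the first errno value encountered.
 * Assumption: as on Linux and FreeBSD, the descriptor is treated as closed
 * after this returns, whether or not an error was reported. */
int sync_and_close(int fd)
{
    int err = 0;

    /* Force buffered data to the device before giving up the descriptor. */
    if (fsync(fd) == -1)
        err = errno;

    /* close() is called once and never retried, even on EINTR. */
    if (close(fd) == -1 && err == 0)
        err = errno;

    return err;
}
```

A nonzero return would then be reported to the user rather than handled by closing again.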
Unless I will be unlink()ing the closed file, I will check closefd() return value, and act accordingly. If I can easily retry, I will, but at most once or twice. For log files and generated/streamed files, I only warn the user.
I want to remind anyone reading this far that we cannot make anything completely reliable; it is just not possible. What we can do, and in my opinion should do, is to detect when an error occurs, as reliably as we can. If we can easily and with negligible resource use retry, we should. In all cases, we should make sure the notification (about the error) is propagated to the actual human user. Let the human worry about whether some other action, possibly complex, needs to be done before the operation is retried. After all, a lot of tools are used only as a part of a larger task, and the best course of action usually depends on that larger task.

Restriction of C standard I/O and why we can't use C standard I/O with sockets

I am reading CSAPP recently. In section 10.9, it said that standard I/O should not be used with socket because of the reasons as follows:
(1) The restrictions of standard I/O
Restriction 1: Input functions following output functions. An input
function cannot follow an output function without an intervening call
to fflush, fseek, fsetpos, or rewind. The fflush function empties the
buffer associated with a stream. The latter three functions use the
Unix I/O lseek function to reset the current file position.
Restriction 2: Output functions following input functions. An output
function cannot follow an input function without an intervening call
to fseek, fsetpos, or rewind, unless the input function encounters an
end-of-file.
(2) It is illegal to use the lseek function on a socket.
Question 1: What would happen if I violate the restriction? I wrote a code snippet and it works fine.
Question 2: To work around restriction 2, one approach is as follows:
FILE *fpin, *fpout;
fpin = fdopen(sockfd, "r");
fpout = fdopen(sockfd, "w");
/* Some Work Here */
fclose(fpin);
fclose(fpout);
In the text book, it said,
Closing an already closed descriptor in a threaded program is a
recipe for disaster.
Why?
Your workaround does not work as written, due to the double-close bug you cited. A double close is harmless in single-threaded programs as long as there are no intervening operations which could open new file descriptors (the second close will just fail harmlessly with EBADF), but it is a critical bug in multi-threaded programs. Consider this scenario:
Thread A calls close(n).
Thread B calls open and it returns n which gets stored as int fd1.
Thread A calls close(n) again.
Thread B calls open again and it returns n again, which gets stored as fd2.
Thread B now attempts to write to fd1 and actually writes into the file opened by the second call to open instead of the one first opened.
This can lead to massive file corruption, information leak (imagine writing a password to a socket instead of a local file), etc.
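The scenario depends on descriptor numbers being recycled immediately. A small single-threaded sketch of that recycling follows; the function name `closed_fd_is_reused` is invented for illustration, and /dev/null is assumed to exist:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the number of a freshly closed descriptor is handed out again
 * by the very next open(), 0 if not, -1 if /dev/null could not be opened. */
int closed_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first == -1)
        return -1;
    close(first);

    /* POSIX: open() returns the lowest-numbered descriptor not currently
     * open, so with no other thread running this is the number just freed. */
    int second = open("/dev/null", O_RDONLY);
    if (second == -1)
        return -1;
    close(second);

    return second == first;
}
```

In a multi-threaded program the second open() may happen in a different thread, which is exactly how a stale close(n) ends up closing someone else's file.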
However, the problem is easy to fix. Instead of calling fdopen twice with the same file descriptor, simply use dup to copy it and pass the copy to fdopen. With this simple fix, stdio is perfectly usable with sockets. It's not suitable for asynchronous event loop usage still, but if you're using threads for IO, it works great.
Edit: I think I skipped answering your question 1. What happens if you violate the rules about how to switch between input and output on a stdio stream is undefined behavior. This means testing it and seeing that it "works" is not meaningful; it could mean either:
The C implementation you're using provides a definition (as part of its documentation) for what happens in this case, and it matches the behavior you wanted. In this case, you can use it, but your code will not be portable to other implementations. Doing so is considered very bad practice for this reason. Or,
You just got the result you expected by chance, usually as a side effect of how the relevant functionality is implemented internally on the implementation you're using. In this case, there's no guarantee that it doesn't have corner cases that fail to behave as you expected, or that it will continue to work the same way in future releases, etc.
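For a seekable file, following the rule looks roughly like the sketch below. The function name `write_then_read` is hypothetical; the point is only the positioning call between the output and the input:

```c
#include <stdio.h>

/* Write msg to an update-mode stream, then read the first line back.
 * The fseek() between the output call and the input call is the intervening
 * positioning call the restriction asks for; here it also moves the stream
 * back to the start. Returns 0 on success, -1 on failure. */
int write_then_read(FILE *fp, const char *msg, char *buf, int buflen)
{
    if (fputs(msg, fp) == EOF)
        return -1;

    /* Required before switching from output to input. */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    if (fgets(buf, buflen, fp) == NULL)
        return -1;

    /* Writing again would need another fseek/fsetpos/rewind first,
     * unless the read above had hit end-of-file. */
    return 0;
}
```

On a socket the fseek() call would fail, which is why the one-stream-per-direction arrangement is used there instead.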

C - freopen() for redirecting needs fclose()?

After a call like freopen(file_name, open_mode, stderr), do I need to call fclose(stderr) before the process ends, or is it done automatically?
thanks in advance and sorry for my English
It is not necessary to close an open stream as all open streams are closed at program termination but it is good practice to have a fclose for every fopen call. In the case of stderr here, this stream is already open at startup (you didn't have to call fopen) and so I see no reason to explicitly close it even if some freopen calls were issued.
When a process ends, all its handles are closed automatically. However, it is good style to close every handle you acquire - for example, somebody may want to convert your program to a library, or you may be looking for leaks. Whether you call freopen does not matter for that.
However, in the case of stderr, the situation is a little bit different. Since you didn't specifically open that stream, you shouldn't close it. It's also very likely to be used by other components out of your control, for example atexit functions or stack smashing detection.
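A short sketch of the redirect without an explicit fclose, relying on the streams being flushed and closed at normal program termination. The helper name `redirect_stderr` and the choice of append mode are assumptions for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: send everything written to stderr to the file at path.
 * Returns 0 on success, -1 if freopen() failed; in that case the old stream
 * has already been closed and stderr should not be used. */
int redirect_stderr(const char *path)
{
    /* freopen() closes the file currently behind stderr and attaches the
     * new one to the same FILE object. */
    if (freopen(path, "a", stderr) == NULL)
        return -1;
    return 0;
}
```

After a successful call, code that writes to stderr later, including atexit handlers, still finds a valid stream.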
Yes, that is good practice, but not strictly necessary if your process will end immediately since the OS will clean everything up for you when the process dies.
Of course. And simply fclose(stderr); will do.

What does select(2) do if you close(2) a file descriptor in a separate thread?

What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?
From some cursory testing, it does not return right away. I suspect the outcome is either that (a) it still continues to wait for data, but if you actually tried to read from it you'd get EBADF (possibly -- there's a potential race) or (b) that it pretends as though the file descriptor were never passed in. If the latter case is true, passing in a single fd with no timeout would cause a deadlock if it were closed.
From some additional investigation, it appears that both dwc and bothie are right.
bothie's answer to the question boils down to: it's undefined behavior. That doesn't mean that it's unpredictable necessarily, but that different OSes do it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not based on this post to the linux-kernel mailing list from 2001.
The argument on the linux-kernel mailing list is essentially that it is undefined (and broken) behavior to rely upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since there is a select(2) call also with a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer. You will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in an EBADF, assuming the fd hasn't been recycled. (A concern that MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
So thank you all for the help.
I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".
Having said that, doing that is very bad practice anyway, as you'd always have potential race conditions: another file descriptor with the same number could be opened by yet another thread immediately after the first one was closed, and the selecting thread would then end up waiting on the wrong one.
As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. The main use is for server applications, which open a bunch of file descriptors and then wait for something to do on them (accept new connections, read requests, or send responses). Those file descriptors will be opened in non-blocking I/O mode so that the server process won't hang in a syscall at any time.
This additionally means there is no need for separate threads, because all the work that could be done in the thread can be done prior to the select call as well. And if the work takes long, then it can be interrupted: select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.
Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why shall it close the file descriptor?
The POSIX standard doesn't provide any hints about what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect that the result will be very different between different operating systems and even between versions of the same OS.
Regards, Bodo
It's a little confusing what you're asking...
Select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere then there's no reason for select() to wake up.
If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.

Resources