What does the FD_CLOEXEC fcntl() flag do? - c

It is used like so:
if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
    ...
}
Though I've read man fcntl, I can't figure out what it does.

It sets the close-on-exec flag for the file descriptor, which causes the file descriptor to be automatically (and atomically) closed when any of the exec-family functions succeed.
It also tests the return value to see whether the operation failed, which is of limited use here: there is no condition under which this operation should fail on a valid file descriptor.

It marks the file descriptor so that it will be close()d automatically when the process or any children it fork()s calls one of the exec*() family of functions. This is useful to keep from leaking your file descriptors to random programs run by e.g. system().
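As an illustration, here is a minimal sketch of setting the flag on an existing descriptor without clobbering any other descriptor flags (the helper name is made up, not from the question):
#include <fcntl.h>

/* Hypothetical helper: mark fd close-on-exec, preserving other descriptor flags. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);   /* read the current descriptor flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);   /* add FD_CLOEXEC */
}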

Note that the use of this flag is essential in some multithreaded programs, because using a separate fcntl(2) F_SETFD operation to set the FD_CLOEXEC flag does not suffice to avoid race conditions where one thread opens a file descriptor and attempts to set its close-on-exec flag using fcntl(2) at the same time as another thread does a fork(2) plus execve(2). Depending on the order of execution, the race may lead to the file descriptor returned by open() being unintentionally leaked to the program executed by the child process created by fork(2).
(This kind of race is, in principle, possible for any system call that creates a file descriptor whose close-on-exec flag should be set, and various other Linux system calls provide an equivalent of the O_CLOEXEC flag to deal with this problem.)
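A minimal sketch of the race-free variant the note is describing, assuming a hypothetical path and access mode:
int fd = open("/some/file", O_RDONLY | O_CLOEXEC);   /* close-on-exec set atomically at open() time */
if (fd == -1) {
    /* handle the error */
}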


File Descriptor flags and functions

I would like to know what happens with the O_NONBLOCK flag when I use the given file_des in a function. Does it keep the set flag, or not?
If not, should I reset it inside the function? Is there any other way?
int main(void)
{
    int file_des;

    fcntl(file_des, F_SETFD, O_NONBLOCK);
    function(file_des);
}

void function(int file_des)
{
    // do something with file_des
    // What happens with the O_NONBLOCK flag?
}
File descriptors are process-wide. When used in a function, or a thread, they always work the same way. That way is controlled by status flags. In Linux, there are five status flags:
O_APPEND: Causes all writes to occur at the end of the file, ignoring file position.
O_ASYNC: A signal is generated when read or write is possible; only available for terminals, pseudoterminals, sockets, pipes, and FIFOs. (I do seem to recall it is available also for some character devices, but I have not verified which ones, if any; the man pages do not say.)
O_DIRECT: Skip page cache for I/O. Complicated, with many constraints; do not use except in very limited special circumstances.
O_NOATIME: Do not update last access time.
O_NONBLOCK: Non-blocking I/O. Instead of waiting (blocking) when data is not immediately available, or cannot be immediately sent, return a short count. If nothing can be sent or received, read()/write() etc. return -1 with errno == EWOULDBLOCK.
O_NONBLOCK has no effect on normal files or block devices.
You modify these by setting the new set of status flags using fcntl(fd, F_SETFL, flags), with zero or more flags OR'ed together. (To disable all, use zero.)
The fcntl(fd, F_SETFD, dflags) call sets the set of file descriptor flags. Currently, there is only one such flag, FD_CLOEXEC, which causes the descriptor to be automatically closed when an execve() or other exec family of functions succeeds (including popen() and all others that fork and execute a new process). The corresponding O_CLOEXEC flag is normally passed to the open() call instead, to avoid the race window against another thread doing a fork() in between.
When you use open(filename, flags) or open(filename, flags, mode), the flags argument is a combination of access mode (O_RDONLY, O_WRONLY, or O_RDWR; exactly one must be used), file creation flags (including file descriptor flags), and file status flags, OR'ed together. (Except for O_ASYNC, which cannot be specified at open() time, and must be set afterwards using fcntl(fd, F_SETFL, flags | O_ASYNC).)
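A sketch of the usual way to flip a status flag without disturbing the others (assuming fd is already open; the helper name is made up):
#include <fcntl.h>

/* Hypothetical helper: enable O_NONBLOCK on fd, keeping the other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);   /* read the current status flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}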

What are the unwanted side effects of open in ioctl?

According to man ioctl, opening file descriptors with open may cause unwanted side effects. The manual also states that opening with O_NONBLOCK avoids those issues, but I can't find what the reason for that is, nor what the actual side effects are. Can someone shed light on that? With ioctl, is it always possible and equivalent* to open file descriptors with O_NONBLOCK?
NOTES (from man ioctl)
In order to use this call, one needs an open file descriptor. Often the open(2) call has unwanted side effects, that can be avoided under Linux by giving it the O_NONBLOCK flag.
(* I am aware of what O_NONBLOCK implies, but I don't know if that affects ioctl calls the same way it affects other syscalls. My program, which uses ioctl to write and read from an SPI bus, works perfectly with that flag enabled.)
The obvious place to look for the answer would be the open(2) man page, which, under the heading for O_NONBLOCK, says:
When possible, the file is opened in nonblocking mode.
Neither the open() nor any subsequent operations on the file
descriptor which is returned will cause the calling process to
wait.
[...]
For the handling of FIFOs (named pipes), see also fifo(7).
For a discussion of the effect of O_NONBLOCK in conjunction
with mandatory file locks and with file leases, see fcntl(2).
OK, that wasn't very informative, but let's follow the links and see what the manual pages for fifo(7) and fcntl(2) say:
Normally, opening the FIFO blocks until the other end is opened also.
A process can open a FIFO in nonblocking mode. In this case, opening
for read-only will succeed even if no-one has opened on the write
side yet and opening for write-only will fail with ENXIO (no such
device or address) unless the other end has already been opened.
Under Linux, opening a FIFO for read and write will succeed both in
blocking and nonblocking mode. POSIX leaves this behavior undefined.
So here's at least one "unwanted side effect": even just trying to open a FIFO may block, unless you pass O_NONBLOCK to the open() call (and open it for reading).
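A minimal sketch of how a reader avoids that block (the FIFO path is hypothetical):
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    /* Opening read-only with O_NONBLOCK returns immediately,
     * even if no writer has opened the FIFO yet. */
    int fd = open("/tmp/myfifo", O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        perror("open");
    return 0;
}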
What about fcntl, then? As the open(2) man page notes, the sections to look under are those titled "Mandatory locking" and "File leases". It seems to me that mandatory locks, in this case, are a red herring, though; they only cause blocking when one tries to actually read from or write to the file:
If a process tries to perform an incompatible
access (e.g., read(2) or write(2)) on a file region that has an
incompatible mandatory lock, then the result depends upon whether the
O_NONBLOCK flag is enabled for its open file description. If the
O_NONBLOCK flag is not enabled, then the system call is blocked until
the lock is removed or converted to a mode that is compatible with
the access.
What about leases, then?
When a process (the "lease breaker") performs an open(2) or
truncate(2) that conflicts with a lease established via F_SETLEASE,
the system call is blocked by the kernel and the kernel notifies the
lease holder by sending it a signal (SIGIO by default). [...]
Once the lease has been voluntarily or forcibly removed or
downgraded, and assuming the lease breaker has not unblocked its
system call, the kernel permits the lease breaker's system call to
proceed.
[...] If
the lease breaker specifies the O_NONBLOCK flag when calling open(2),
then the call immediately fails with the error EWOULDBLOCK, but the
other steps still occur as described above.
OK, so that's another unwanted side effect: if the file you're trying to open has a lease on it, the open() call may block until the lease holder has released the lease.
In both cases, the "unwanted side effect" avoided by passing O_NONBLOCK to open() is, unsurprisingly, the open() call itself blocking until some other process has done something. If there are any other kinds of side effects that the man page you cite refers to, I'm not aware of them.
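A minimal sketch of that pattern for the SPI case in the question, assuming a hypothetical device node:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Pass O_NONBLOCK at open() time so that neither open() nor later
     * operations on fd will block waiting on another process. */
    int fd = open("/dev/spidev0.0", O_RDWR | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* ... ioctl(fd, <request>, <arg>) calls as usual ... */
    close(fd);
    return 0;
}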
From Linux System Programming, 2nd Edition, by R. Love (emphasis mine):
O_NONBLOCK
If possible, the file will be opened in nonblocking mode. Neither the open() call, nor any other operation will cause the process to block (sleep) on the I/O. This behaviour may be defined only for FIFOs.
Sometimes, programmers do not want a call to read() to block when there is no data available. Instead, they prefer that the call return immediately, indicating that no data is available. This is called nonblocking I/O; it allows applications to perform I/O, potentially on multiple files, without ever blocking, and thus missing data available in another file.
Consequently, an additional errno value is worth checking: EAGAIN.
See this thread on the kernel mailing list.
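A sketch of how that errno value is typically handled on a descriptor opened with O_NONBLOCK (the wrapper name is made up):
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: returns bytes read, 0 if no data is available right now
 * (or at end-of-file), and -1 on a real error. */
static ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* nothing to read yet; not an error on a nonblocking fd */
    return n;
}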

C: blocking read should return, if filedescriptor is deleted

I am reading in a blocking way from a device/file descriptor.
It might happen that, in a different thread, the device is closed and the file descriptor is deleted. Unfortunately, the read doesn't return or take notice and keeps blocking.
As a workaround I could do a while loop with select() and a timeout. If a timeout happens, I can check the file descriptor and, if it is gone, return instead of calling read.
I am wondering if there is a better way to do this in Linux C?
The code you are describing has an inherent race condition: if another thread could be in a blocking read() on a file descriptor when you close() that file descriptor, the other thread could just as well be about to call read() instead.
You can't call close() unless you know that all other threads are no longer in a position to be using that file descriptor at all.
The easiest way to handle cases like you describe is for one thread to be the 'owning' thread of each file descriptor, that is responsible for closing the file descriptor. Other threads don't directly close it - instead they mark the file descriptor as "to be closed" in some shared data structure and wake up the owning thread.
You can make it possible to wake the owning thread by having it not block in read() but instead block in select() or poll() with another file descriptor - usually a pipe - in the set as well as the target file descriptor. The thread is woken by writing to the other end of that pipe.
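A rough sketch of that wake-up pattern using poll() and a pipe (names and error handling are illustrative, not from the answer):
#include <poll.h>
#include <unistd.h>

/* Owning thread: wait for data on target_fd or a shutdown request on the pipe.
 * Returns 1 if target_fd is readable, 0 if we were woken to close it, -1 on error. */
static int wait_readable_or_wakeup(int target_fd, int wakeup_pipe_rd)
{
    struct pollfd fds[2] = {
        { .fd = target_fd,      .events = POLLIN },
        { .fd = wakeup_pipe_rd, .events = POLLIN },
    };

    if (poll(fds, 2, -1) == -1)
        return -1;
    if (fds[1].revents & POLLIN)
        return 0;   /* another thread wrote to the pipe: close target_fd here */
    return 1;       /* data is ready; read() will not block */
}

/* Any other thread requests the close by writing a byte:
 *     write(wakeup_pipe_wr, "x", 1);
 */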
Once a file descriptor has been closed by another thread, it's not easy to verify that this happened. What if the other thread then opened another file and got the same file descriptor number? After a successful close() call you must not access the file descriptor again, and after a failed close() call POSIX leaves the state of the file descriptor unspecified.
The select() option suffers from the same problem described above.
Your problem is not really any different from any other data race in a multi-threaded program. I suggest you re-write the code so that threads don't access the file descriptor without synchronization, or avoid having multiple threads read from the same file descriptor if that's possible.

What is the order in which a POSIX system clears the file locks that were not unlocked cleanly?

The POSIX specification for fcntl() states:
All locks associated with a file for a given process shall be removed when a file descriptor for that file is closed by that process or the process holding that file descriptor terminates.
Is this operation of unlocking the file segment locks that were held by a terminated process atomic per-file? In other words, if a process had locked byte segments B1..B2 and B3..B4 of a file but did not unlock the segments before terminating, when the system gets around to unlocking them, are segments B1..B2 and B3..B4 both unlocked before another fcntl() operation to lock a segment of the file can succeed? If not atomic per-file, does the order in which these file segments are unlocked by the system depend on the order in which the file segments were originally acquired?
The specification for fcntl() does not say, but perhaps there is a general provision in the POSIX specification that mandates a deterministic order on operations to clean up after a process that exits uncleanly or crashes.
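For concreteness, a minimal sketch of how such segment locks are taken (the byte offsets stand in for B1..B2 and B3..B4 and are made up):
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write-lock the byte range [start, start+len). */
static int lock_range(int fd, off_t start, off_t len)
{
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
    };
    return fcntl(fd, F_SETLK, &fl);   /* -1 with EACCES/EAGAIN if someone else holds it */
}

/* e.g. lock_range(fd, 0, 100); lock_range(fd, 200, 100);
 * If the process exits without unlocking, the system removes both locks,
 * which is exactly the clean-up whose ordering the question asks about. */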
There's a partial answer in section 2.9.7, Thread Interactions with Regular File Operations, of the POSIX specification:
All of the functions chmod(), close(), fchmod(), fcntl(), fstat(), ftruncate(), lseek(), open(), read(), readlink(), stat(), symlink(), and write() shall be atomic with respect to each other in the effects specified in IEEE Std 1003.1-2001 when they operate on regular files. If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them.
So, for a regular file, if a thread of a process holds locks on segments of a file and calls close() on the last file descriptor associated with the file, then the effects of close() (including removing all outstanding locks on the file that are held by the process) are atomic with respect to the effects of a call to fcntl() by a thread of another process to lock a segment of the file.
The specification for exit() states:
These functions shall terminate the calling process with the following consequences:
All of the file descriptors, directory streams[, conversion descriptors, and message catalog descriptors] open in the calling process shall be closed.
...
Presumably, open file descriptors are closed as if by appropriate calls to close(), but unfortunately the specification does not say how open file descriptors are "closed".
The 2004 specification seems even more vague when it comes to the steps of abnormal process termination. The only thing I could find is the documentation for abort(). At least with the 2008 specification, there is a section titled Consequences of Process Termination on the page for _Exit(). The wording, though, is still:
All of the file descriptors, directory streams, conversion descriptors, and message catalog descriptors open in the calling process shall be closed.
UPDATE: I just opened issue 0000498 in the Austin Group Defect Tracker.
I don't think the POSIX specification stipulates whether the releasing of locks is atomic or not, so you should assume that it behaves as inconveniently as possible for you. If you need them to be atomic, they aren't; if you need them to be handled separately, they're atomic; if you don't care, some machines will do it one way and other machines the other way. So, write your code so that it doesn't matter.
I'm not sure how you'd write code to detect the problem.
In practice, I expect that the locks would be released atomically, but the standard doesn't say, so you should not assume.

Process started from system command in C inherits parent fd's

I have a sample application of a SIP server listening on both tcp and udp ports 5060.
At some point in the code, I do a system("pppd file /etc/ppp/myoptions &");
After this, if I do a netstat -apn, it shows that port 5060 is also open for pppd!
Is there any method to avoid this? Is this standard behaviour of the system function in Linux?
Thanks,
Elison
Yes, by default whenever you fork a process (which system() does), the child inherits all the parent's file descriptors. If the child doesn't need those descriptors, it SHOULD close them. The way to do this with system() (or any other method that does a fork+exec) is to set the FD_CLOEXEC flag on all file descriptors that shouldn't be used by the children of your process. This will cause them to be closed automatically whenever any child execs some other program.
In general, ANY TIME your program opens ANY KIND of file descriptor that will live for an extended period of time (such as a listen socket in your example), and which should not be shared with children, you should do
fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);
on the file descriptor.
In more recent revisions of POSIX.1, you can OR the SOCK_CLOEXEC flag into the type of the socket to get this behavior automatically when you create the socket:
listenfd = socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
bind(listenfd, ...
listen(listenfd, ...
which guarantees it will be closed properly even if some other simultaneously running thread does a system() or fork+exec call. Fortunately, this flag has been supported for a while on Linux and the BSDs (but not on OS X, unfortunately).
You should probably avoid the system() function altogether. It's inherently dangerous, in that it invokes the shell, which can be tampered with, and it's rather non-portable, even between Unices.
What you should do is the fork()/exec() dance. It goes something like this:
if (!fork()) {
    // close file descriptors the child should not inherit
    ...
    execlp("pppd", "pppd", "file", "/etc/ppp/myoptions", (char *)NULL);
    perror("exec");
    exit(-1);
}
Yes, this is standard behavior of fork() on Linux, on top of which system() is implemented.
The identifier returned from the socket() call is a valid file descriptor. This value is usable with file-oriented functions such as read(), write(), ioctl(), and close().
The converse, that every file descriptor is a socket, is not true. One cannot open a regular file with open() and pass that descriptor to, e.g., bind() or listen().
When you call system(), the child process inherits the same file descriptors as the parent. This is how stdin (0), stdout (1), and stderr (2) are inherited by child processes. If you arrange to open a socket with a file descriptor of 0, 1 or 2, the child process will inherit that socket as one of the standard I/O file descriptors.
Your child process is inheriting every open file descriptor from the parent, including the socket you opened.
As others have stated, this is standard behavior that programs depend on.
When it comes to preventing it, you have a few options. The first is closing all file descriptors after the fork(), as Dave suggests. The second is the POSIX support for using fcntl() with FD_CLOEXEC to set a 'close on exec' bit on a per-fd basis.
Finally, though, since you mention you are running on Linux, there are a set of changes designed to let you set the bit right at the point of opening things. Naturally, this is platform dependent. An overview can be found at http://udrepper.livejournal.com/20407.html
What this means is that you can bitwise-OR the SOCK_CLOEXEC flag into the 'type' argument of your socket() call. Provided you're running kernel 2.6.27 or later, that is.
system() forks a copy of the current process and then runs the command in that child, so the child inherits the parent's open descriptors; that is why pppd shows up on port 5060. You can do the fork()/exec() yourself and close the unwanted descriptors in the child before the exec, leaving the parent untouched.
