Is there any expert out there that can help me with the following?
I have the following system calls in C:
access()
unlink()
setsockopt()
fcntl()
setsid()
socket()
bind()
listen()
I want to know whether they can fail with return value -1 and errno set to EINTR or EAGAIN.
Do I have to handle EINTR/EAGAIN for these?
The documentation does not mention EINTR/EAGAIN for them, but I see many people handling it.
Which is correct?
Here is how I register signal handlers : https://gitorious.org/zepto-web-server/zepto-web-server/source/b1b03b9ecccfe9646e34caf3eb04689e2bbc54dd:src/signal-dispatcher-utility.c
With this configuration: https://gitorious.org/zepto-web-server/zepto-web-server/source/b1b03b9ecccfe9646e34caf3eb04689e2bbc54dd:src/server-signals-support-utility.c
Also, here is a commit in which I added EINTR/EAGAIN handling to some system calls that I know can return EINTR or EAGAIN: https://gitorious.org/zepto-web-server/zepto-web-server/commit/b1b03b9ecccfe9646e34caf3eb04689e2bbc54dd
Unless you install an interrupting signal handler (one installed with sigaction omitting the SA_RESTART flag, or one installed with the signal function on some systems) you should not expect to see EINTR at all.
Among your particular list of functions, I don't see any that could experience EINTR anyway except fcntl, and for it, only when it's used for locking. The link in John's answer should be helpful answering questions about specific functions, though.
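For the fcntl() locking case, a retry loop is the usual way to absorb EINTR. A minimal sketch, assuming a hypothetical helper named open_locked (it is not part of any library) and a whole-file write lock:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) path and take an exclusive lock on
 * the whole file, waiting if another process holds it. F_SETLKW is the
 * fcntl() command that can block, so it is the one retried on EINTR.
 * Returns the descriptor, or -1 with errno set. */
static int open_locked(const char *path)
{
    struct flock fl;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;          /* l_start = 0 and l_len = 0: whole file */

    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR) {        /* a real error: give up */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        /* interrupted by a signal handler: try again */
    }
    return fd;
}
```

The non-waiting commands (F_GETFL, F_SETFL, F_SETFD, F_SETLK and so on) do not need such a loop.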
See http://man7.org/linux/man-pages/man7/signal.7.html -- start reading near the bottom where it talks about "Interruption of system calls and library functions..." This is a Linux man page, but the info is pretty generally applicable to any Unix/Posix/Linux-flavored system.
There is a section entitled ERRORS in the man page of every *NIX system call. Refer to the manual, for example: http://man7.org/linux/man-pages/man2/accept.2.html. You can also use the command line man accept to view it.
In general, system calls that can block for some time can fail with -1 and errno set to EINTR on signal delivery, while short system calls cannot. For example, accept() can block your process, so it can be interrupted by a signal, but setsid() is so short that it has been written not to be interrupted by signals.
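For the calls that can block, a small wrapper is a common way to absorb EINTR. A minimal sketch, assuming a hypothetical helper named read_retry (not from any library):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call read() again whenever it fails with EINTR.
 * A successful read, end of file, or any other error is returned as is. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

The same pattern applies to accept() and the other blocking calls. It is only appropriate when the program does not want the signal to cut the wait short.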
signal(7) for Linux lists
accept
connect
fcntl
flock
futex
ioctl
open
read
readv
recv
recvfrom
recvmsg
send
sendmsg
sendto
wait
wait3
wait4
waitid
waitpid
write
writev
as possibly interruptible (EINTR) by no-SA_RESTART handlers and
setsockopt
accept
recv
recvfrom
recvmsg
connect
send
sendto
sendmsg
pause
sigsuspend
sigtimedwait
sigwaitinfo
epoll_wait
epoll_pwait
poll
ppoll
select
pselect
msgrcv
msgsnd
semop
semtimedop
clock_nanosleep
nanosleep
read
io_getevents
sleep
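as EINTR-interruptible, even by SA_RESTART handlers. (In signal(7) the socket calls are in this list only when a timeout has been set on the socket with setsockopt; setsockopt itself is not described as interruptible.)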
Furthermore, it lists:
setsockopt
accept
recv
recvfrom
recvmsg
connect
send
sendto
sendmsg
epoll_wait
epoll_pwait
semop
semtimedop
sigtimedwait
sigwaitinfo
read
futex
msgrcv
msgsnd
nanosleep
as EINTR-interruptible by a stopping signal + SIGCONT (the socket calls only when a timeout has been set with setsockopt), and says this particular behavior is Linux-specific and not sanctioned by POSIX.1.
Apart from these, especially if the function's specification doesn't list EINTR, you shouldn't get EINTR.
If you don't trust the system to honor it, you can try bombarding a loop with your suspected system function by SIGSTOP/SIGCONT+a signal with a no-SA_RESTART no-op handler and see if you can elicit an EINTR.
I tried that with:
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
static void chld(int Sig)
{
    int status;
    if(0>wait(&status))
        _exit(1);
    if(!WIFEXITED(status)){
        //this can only interrupt an AS-safe block
        assert(WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM);
        puts("OK");
        exit(0);
    } else {
        switch(WEXITSTATUS(status)){
        case 1: puts("FAIL"); break;
        case 2: puts("EINTR"); break;
        }
    }
    exit(0);
}
static void nop(int Sig)
{
}
int main()
{
    sigset_t full;
    sigfillset(&full);
    sigaction(SIGCHLD, &(struct sigaction){ .sa_handler=chld, .sa_mask=full, .sa_flags=0 } , 0);
    sigaction(SIGUSR1, &(struct sigaction){ .sa_handler=nop, .sa_mask=full, .sa_flags=0 } , 0);
    pid_t p;
    if(0>(p=fork())) { perror(0); return 1; }
    if(p!=0){
        //bombard it with SIGSTOP/SIGCONT/SIGUSR1
        for(;;){
            usleep(1); kill(p, SIGSTOP); kill(p, SIGCONT); kill(p, SIGUSR1);
        }
    }else{
        sigaction(SIGCHLD, &(struct sigaction){ .sa_handler=SIG_DFL }, 0);
        alarm(1); //alarm() cannot fail; the default SIGALRM action ends the child after 1 s
        for(;;){
#if 1
            /*not interruptible*/
            if(0>access("/dev/null", R_OK)){
                if(errno==EINTR)
                    return 2;
                perror(0);
                return 1;
            }
#else
            int fd;
            unlink("fifo");
            if(0>mkfifo("fifo",0600))
                return 1;
            /*interruptible*/
            if(0>(fd=open("fifo", O_RDONLY|O_CREAT, 0600))){
                if(errno==EINTR)
                    return 2;
                perror(0);
                return 1;
            }
            close(fd);
#endif
        }
    }
    return 0;
}
and unlink and access definitely appear to be EINTR-uninterruptible (in compliance with their spec), which means an EINTR-retry loop around them would be unnecessary.
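For the setup calls from the question (socket(), setsockopt(), bind(), listen()), that leaves plain error checking. A minimal sketch, assuming a hypothetical helper named make_listener and a loopback-only TCP listener; the port and backlog parameters are illustrative:

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to
 * 127.0.0.1:port (port 0 lets the kernel pick one), or -1 with errno set.
 * None of these calls waits for anything, so a failure is treated as a
 * real error and is not retried. */
static int make_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        int saved = errno;           /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

An EINTR loop belongs around the later accept() on the returned descriptor, not around the setup.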
Related
When I call open("./fifo",O_RDONLY), the syscall will block because no one is writing to the fifo ./fifo. If a signal is received during that time that has no signal handler, the process ends instantly. So far so good.
But when a signal is received that has a signal handler, the signal handler is executed and the open() syscall is still blocking.
How can I make open() return when I catch the signal?
I tried to block the signal, that does not work because there is no sigmask argument for open() like there is for pselect(). Using O_NONBLOCK does not work either, because then open() will return with an error, whether there is a signal or not. Removing the signal handler is also no good because I want to be able to react to the signal.
My test code:
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static volatile bool end=0;
static void catchSignal(int signal)
{
    (void)signal;
    const char *s="Catched Signal\n";
    write(STDERR_FILENO,s,strlen(s));
    end=1;
}
static int openFile(void)
{
    int fd=open("./in",O_RDONLY);
    if(fd<0)
    {
        perror("can't open file");
        exit(1);
    }
    return fd;
}
int main()
{
    if(SIG_ERR==signal(SIGTERM,catchSignal))
    {
        perror("cant set up signal handler");
        return -1;
    }
    int fd = openFile();
    while(end==0)
    {
        puts("Still running");
        usleep(300UL*1000);
    }
    puts("End now");
    if(fd>0)
    {
        close(fd);
    }
    return 0;
}
The signal() function is problematic because of a history of implementations with different details. According to its Linux manual page:
The only portable use of signal() is to set a signal's disposition to SIG_DFL or SIG_IGN. The semantics when using signal() to establish a signal handler vary across systems (and POSIX.1 explicitly permits this variation); do not use it for this purpose.
(Emphasis in the original)
Instead of signal(), you should be using sigaction():
struct sigaction sa = { .sa_handler = catchSignal };
if (sigaction(SIGTERM, &sa, NULL) == -1)
Note that among the fields of a struct sigaction is sa_flags, a bitmask with which you can select among the various behaviors historically implemented by different versions of signal(). In particular, if you do not include the SA_RESTART flag, as the above indeed does not, then you should not see system calls automatically resume when interrupted by a signal (except for those few that are explicitly specified to do so).
When you strace your program, you see that the signal() function sets the SA_RESTART flag for the signal:
rt_sigaction(SIGTERM, {sa_handler=0x562f2a8c3249, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb504d2d210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
meaning that the open() syscall will be automatically restarted after the signal has been handled.
You can use sigaction() to have more fine-grained control over signal handling and not set the SA_RESTART:
struct sigaction sa;
memset (&sa, 0, sizeof (sa));
sa.sa_handler = catchSignal;
sa.sa_flags = 0;
sigemptyset (&sa.sa_mask);
if (sigaction (SIGTERM, &sa, NULL) == -1) {
    perror("sigaction");
    return -1;
}
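To see the effect on a blocking open(), the following sketch installs a handler without SA_RESTART and lets a timer signal interrupt open() on a FIFO that has no writer. It is a demo under assumptions: the function and handler names are made up, the FIFO lives in a temporary directory under /tmp, and SIGALRM from setitimer() stands in for the SIGTERM of the question.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }   /* empty handler */

/* Hypothetical demo: block in open() on a fresh FIFO and let SIGALRM,
 * handled without SA_RESTART, interrupt it about delay_ms milliseconds later.
 * Returns the errno left by open(), or 0 if open() unexpectedly succeeded. */
static int open_fifo_until_signal(int delay_ms)
{
    char dir[] = "/tmp/eintr-demo-XXXXXX";
    char path[64];
    struct sigaction sa, old;
    struct itimerval timer;
    int fd, err;

    assert(delay_ms > 0 && delay_ms < 1000);
    if (mkdtemp(dir) == NULL)
        return errno;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) == -1) {
        err = errno;
        rmdir(dir);
        return err;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                        /* deliberately no SA_RESTART */
    sigaction(SIGALRM, &sa, &old);

    /* Repeating timer: a tick that lands before open() blocks is harmless,
     * because the next tick will interrupt it. */
    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = delay_ms * 1000L;
    timer.it_interval = timer.it_value;
    setitimer(ITIMER_REAL, &timer, NULL);

    fd = open(path, O_RDONLY);              /* blocks: the FIFO has no writer */
    err = (fd < 0) ? errno : 0;

    memset(&timer, 0, sizeof timer);
    setitimer(ITIMER_REAL, &timer, NULL);   /* disarm the timer */
    sigaction(SIGALRM, &old, NULL);
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return err;
}
```

With sa_flags changed to SA_RESTART, the same open() would be restarted after each tick and would never return.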
You can use O_NONBLOCK, in which case open() returns immediately; once you clear O_NONBLOCK again with fcntl(2), later operations on the descriptor block as usual.
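A minimal sketch of that approach, assuming a hypothetical helper named open_fifo_nowait; creating the FIFO inside the helper is only there to keep the example self-contained:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the FIFO if it does not exist yet, open it for
 * reading without waiting for a writer, then switch the descriptor back to
 * blocking mode. Returns the descriptor, or -1 with errno set. */
static int open_fifo_nowait(const char *path)
{
    int fd, flags, saved;

    assert(path != NULL);
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    fd = open(path, O_RDONLY | O_NONBLOCK);   /* does not wait for a writer */
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
        saved = errno;                        /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Note that read() on a FIFO that no process has open for writing returns 0 (end of file) rather than blocking, even in blocking mode.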
Read the man page, as there is probably some way to make open(2) return -1 with errno set to EINTR. But the normal behaviour is what has been described: the call is reissued, so the signal handler does not cause the call to be interrupted (much code depends on this behaviour). I'm not sure about this, but I think only pause(2), select(2) and friends are interrupted (and return) when a non-ignored signal is received.
It is worth noting that only the thread that is blocked in an interruptible call and receives the signal is woken up and has its call interrupted, and the thread receiving the signal can be any of the threads you have started in your process.
To ensure that all destructors are properly called if the program is terminated from the keyboard (Ctrl+C), the following signal-based approach is used:
a handler that sets an exit flag is installed for SIGINT
if a blocking call (accept(), read(), connect(), etc.) is waiting for completion, it returns -1 and errno is set to EINTR
The problem is that SIGINT can arrive between the check of the exit flag (while (!finish)) and the call to read(). In this case, read() will block until the signal is sent once again.
This is a minimal working example:
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
enum { STDIN, STDOUT, STDERR };
static unsigned char finish=0;
static void handleSignal(int signal) {
    finish=1;
}
int main(int argc, char ** e) {
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler=handleSignal;
    action.sa_flags=0;
    sigaction(SIGINT, &action, NULL);
    char buffer[256];
    puts("<<");
    while (!finish) {
        sleep(2);
        ssize_t n=read(STDIN, buffer, sizeof(buffer));
        if (n==0) {
            // End of stream
            finish=1;
        }
        else if (n<0) {
            // Error or interrupt
            if (errno!=EINTR)
                perror("read");
        }
        else {
            // Convert data to hexadecimal format
            for (size_t i=0; i<n; i++)
                printf("%02x", buffer[i]);
        }
    }
    puts(">>\n");
    return 0;
}
sleep(2) is added for visibility (a real program may do some preparatory work before reading from the file descriptor).
Is there any way to handle signals reliably without using non-cross-platform things like signalfd()?
The pselect(2) system call was invented to solve this exact problem. It's POSIX, so hopefully cross-platform enough for you.
The purpose of pselect is to atomically unblock some signals, wait for I/O as select() does, and reblock them. So your loop can look something like the following pseudocode:
sigprocmask(SIG_BLOCK, {SIGINT});
while (1) {
    if (finish)
        graceful_exit();
    int ret = pselect(1, {STDIN}, ..., { /* empty signal set */});
    if (ret > 0) {
        read(STDIN, buf, size); // will not block
        // process data
        // If you like you can do
        sigprocmask(SIG_UNBLOCK, {SIGINT});
        // work work work
        if (finish)
            graceful_exit();
        // work work work
        sigprocmask(SIG_BLOCK, {SIGINT});
    } else {
        // handle timeout or other errors
    }
}
There is no race here because SIGINT is blocked for the time in between checking the finish flag and the call to pselect, so it cannot be delivered during that window. But the signal is unblocked while pselect is waiting, so if it arrives during that time (or already arrived while it was blocked), pselect will return without further delay. We only call read when pselect has told us it was ready for reading, so it cannot block.
If your program is multithreaded, use pthread_sigmask instead of sigprocmask.
As was noted in comments, you have to make your finish flag volatile, and for best compatibility it should be of type sig_atomic_t.
There is more discussion and another example in the select_tut(2) man page.
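The pseudocode above could be turned into C roughly as follows. This is a sketch under assumptions: the names handle_sigint, setup_sigint and read_unless_sigint are made up for illustration, and instead of an empty signal set it passes pselect() the current mask with only SIGINT removed, so that any other blocked signals stay blocked.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t finish = 0;

static void handle_sigint(int sig) { (void)sig; finish = 1; }

/* Install the handler and block SIGINT, so that the signal can only be
 * delivered while pselect() is waiting. Returns 0 on success, -1 on error. */
static int setup_sigint(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Read from fd, but give up as soon as SIGINT has been seen.
 * Returns what read() returns, or -1 with errno == EINTR once finish is set. */
static ssize_t read_unless_sigint(int fd, void *buf, size_t len)
{
    sigset_t during;                 /* mask in effect only inside pselect() */
    fd_set rfds;

    assert(fd >= 0 && fd < FD_SETSIZE);
    sigprocmask(SIG_SETMASK, NULL, &during);   /* fetch the current mask */
    sigdelset(&during, SIGINT);

    for (;;) {
        if (finish) {                /* SIGINT is blocked here: no race */
            errno = EINTR;
            return -1;
        }
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (pselect(fd + 1, &rfds, NULL, NULL, NULL, &during) > 0)
            return read(fd, buf, len);   /* fd is ready: does not block */
        if (errno != EINTR)
            return -1;
        /* interrupted: loop and look at finish again */
    }
}
```

A caller would run setup_sigint() once at start-up and then use read_unless_sigint() in place of the bare read() in its main loop.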
I'm trying to add a signal handler for proper cleanup to my event-driven application.
My signal handler for SIGINT only changes the value of a global flag variable, which is then checked in the main loop. To avoid races, the signal is blocked at all times, except during the pselect() call. This should cause pending signals to be delivered only during the pselect() call, which should be interrupted and fail with EINTR.
This usually works fine, except if there are already events pending on the monitored file descriptors (e.g. under heavy load, when there's always activity on the file descriptors).
This sample program reproduces the problem:
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
volatile sig_atomic_t stop_requested = 0;
void handle_signal(int sig)
{
    // Use write() and strlen() instead of printf(), which is not async-signal-safe
    const char * out = "Caught stop signal. Exiting.\n";
    size_t len = strlen (out);
    ssize_t writelen = write(STDOUT_FILENO, out, len);
    assert(writelen == (ssize_t) len);
    stop_requested = 1;
}
int main(void)
{
    int ret;
    // Install signal handler
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handle_signal;
        ret = sigaction(SIGINT, &sa, NULL);
        assert(ret == 0);
    }
    // Block SIGINT
    sigset_t old_sigmask;
    {
        sigset_t blocked;
        sigemptyset(&blocked);
        sigaddset(&blocked, SIGINT);
        ret = sigprocmask(SIG_BLOCK, &blocked, &old_sigmask);
        assert(ret == 0);
    }
    ret = raise(SIGINT);
    assert(ret == 0);
    // Create pipe and write data to it
    int pipefd[2];
    ret = pipe(pipefd);
    assert(ret == 0);
    ssize_t writelen = write(pipefd[1], "foo", 3);
    assert(writelen == 3);
    while (stop_requested == 0)
    {
        printf("Calling pselect().\n");
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(pipefd[0], &fds);
        struct timespec * timeout = NULL;
        int ret = pselect(pipefd[0] + 1, &fds, NULL, NULL, timeout, &old_sigmask);
        assert(ret >= 0 || errno == EINTR);
        printf("pselect() returned %d.\n", ret);
        if (FD_ISSET(pipefd[0], &fds))
            printf("pipe is readable.\n");
        sleep(1);
    }
    printf("Event loop terminated.\n");
}
This program installs a handler for SIGINT, then blocks SIGINT, sends SIGINT to itself (which will not be delivered yet because SIGINT is blocked), creates a pipe and writes some data into the pipe, and then monitors the read end of the pipe for readability.
This readability monitoring is done using pselect(), which is supposed to unblock SIGINT, which should then interrupt the pselect() and call the signal handler.
However, on Linux (I tested on 5.6 and 4.19), the pselect() call returns 1 instead and indicates readability of the pipe, without calling the signal handler. Since this test program does not read the data that was written to the pipe, the file descriptor will never cease to be readable, and the signal handler is never called. In real programs, a similar situation might arise under heavy load, where a lot of data might be available for reading on different file descriptors (e.g. sockets).
On the other hand, on FreeBSD (I tested on 12.1), the signal handler is called, and then pselect() returns -1 and sets errno to EINTR. This is what I expected to happen on Linux as well.
Am I misunderstanding something, or am I using these interfaces incorrectly? Or should I just fall back to the old self-pipe trick, which (I believe) would handle this case better?
This is a type of resource starvation caused by always checking for active resources in the same order. When resources are always checked in the same order, if the resources checked first are busy enough the resources checked later may never get any attention.
See What is starvation?.
The Linux implementation of pselect() apparently checks file descriptors before checking for signals. The BSD implementation does the opposite.
For what it's worth, the POSIX documentation for pselect() states:
If none of the selected descriptors are ready for the requested operation, the pselect() or select() function shall block until at least one of the requested operations becomes ready, until the timeout occurs, or until interrupted by a signal.
A strict reading of that description requires checking the descriptors first. If any descriptor is active, pselect() will return that instead of failing with errno set to EINTR.
In that case, if the descriptors are so busy that one is always active, the signal processing gets starved.
The BSD implementation likely starves active descriptors if signals come in too fast.
One common solution is to always process all active resources every time a select() call or similar returns. But you can't do that with your current design that mixes signals with descriptors, because pselect() doesn't even get to checking for a pending signal if there are active descriptors. As @Shawn mentioned in the comments, you can map signals to file descriptors using signalfd(). Then add the descriptor from signalfd() to the file descriptor set passed to pselect().
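A minimal sketch of the signalfd() approach, which is Linux-specific. The helper names open_signal_fd and wait_events are hypothetical, and plain select() is used for brevity since the signal no longer needs to be unblocked during the wait:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: block signo and return a descriptor that becomes
 * readable whenever signo is pending. Returns -1 on error. */
static int open_signal_fd(int signo)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Wait for activity and report every active source, so that a busy data
 * descriptor cannot starve the signal. Returns a bit mask:
 * 1 = data_fd is readable, 2 = a signal was consumed; -1 on error. */
static int wait_events(int data_fd, int sig_fd)
{
    fd_set rfds;
    int maxfd = data_fd > sig_fd ? data_fd : sig_fd;
    int events = 0;

    assert(data_fd >= 0 && sig_fd >= 0 && maxfd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(data_fd, &rfds);
    FD_SET(sig_fd, &rfds);
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    if (FD_ISSET(data_fd, &rfds))
        events |= 1;
    if (FD_ISSET(sig_fd, &rfds)) {
        struct signalfd_siginfo si;
        /* Reading the record removes the signal from the pending set. */
        if (read(sig_fd, &si, sizeof si) == (ssize_t)sizeof si)
            events |= 2;
    }
    return events;
}
```

With the pipe and the pending SIGINT from the sample program, the first call would report both sources at once instead of only the readable pipe.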
Disclaimer: absolute newbie in C; I was mostly using Java before.
In many C beginner tutorials, waitpid is used in process management examples to wait for child processes to finish (or to have a status change, using options like WUNTRACED). However, I couldn't find any information about how to continue if no such status change occurs, either by direct user input or programmatically (e.g. a timeout). So what is a good way to undo waitpid? Something like SIGCONT for stopped processes, but for processes delayed by waitpid instead.
Alternatively if the idea makes no sense, it would be interesting to know why.
How about if I suggest using alarm()? alarm() delivers SIGALRM after the countdown passes (See alarm() man page for more details). But from the signals man page, SIGALRM default disposition is to terminate the process. So, you need to register a signal handler for handling the SIGALRM. Code follows like this...
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
void sigalrm(int signo)
{
    return; // Do nothing !
}
int main()
{
    struct sigaction act, oldact;
    act.sa_handler = sigalrm; // Set the signal handler
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
#ifdef SA_INTERRUPT // If defined, set it to prevent the automatic restart of the system call
    act.sa_flags |= SA_INTERRUPT;
#endif
    sigaction(SIGALRM, &act, &oldact);
    pid_t fk_return = fork();
    if (fk_return == 0) { // Child never returns
        for( ; ; );
    }
    unsigned int wait_sec = 5;
    alarm(wait_sec); // Request SIGALRM
    time_t start = time(NULL);
    pid_t waited = waitpid(-1, NULL, 0);
    int tmp_errno = errno; // Save errno; later function calls may modify it
    time_t end = time(NULL);
    alarm(0); // Clear a pending alarm
    sigaction(SIGALRM, &oldact, NULL);
    if (waited == -1 && tmp_errno == EINTR) {
        printf("Child Timeout, waited for %ld sec\n", (long)(end - start));
        kill(fk_return, SIGINT);
        exit(1);
    }
    else if (waited == -1) // Some other fatal error
        exit(1);
    /* Proceed further */
    return 0;
}
OUTPUT
Child Timeout, waited for 5 sec
Note: You don't need to worry about SIGCHLD because its default disposition is to ignore.
EDIT
For the completeness, it is guaranteed that SIGALRM is not delivered to the child. This is from the man page of alarm()
Alarms created by alarm() are preserved across execve(2) and are not inherited by children created via fork(2).
EDIT 2
I don't know why it didn't strike me at first. A simple approach would be to block SIGCHLD and call sigtimedwait() which supports timeout option. The code goes like this...
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
    sigset_t sigmask;
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &sigmask, NULL);
    pid_t fk_return = fork();
    if (fk_return == 0) { // Child never returns
        for( ; ; );
    }
    if (sigtimedwait(&sigmask, NULL, &((struct timespec){5, 0})) < 0) {
        if (errno == EAGAIN) {
            printf("Timeout\n");
            kill(fk_return, SIGINT);
            exit(1);
        }
    }
    waitpid(fk_return, NULL, 0); // Child should have terminated by now.
    /* Proceed further */
    return 0;
}
OUTPUT
Timeout
The third argument to waitpid takes a set of flags. You want to include the WNOHANG flag, which tells waitpid to return immediately if no child process has exited.
After adding this option, you would sit in a loop and sleep for some period of time, then try again if nothing has exited. Repeat until either a child has exited or your timeout has passed.
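A minimal sketch of such a polling loop. The names wait_or_kill and spawn_sleeper are hypothetical, the millisecond parameters are illustrative, and killing the child with SIGKILL on timeout is one possible policy rather than the only one:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Demo fixture: fork a child that sleeps for `seconds`, then exits with 7. */
static pid_t spawn_sleeper(unsigned seconds)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(seconds);
        _exit(7);
    }
    return pid;
}

/* Poll for the child with WNOHANG every step_ms milliseconds, for roughly
 * timeout_ms in total. On timeout the child is killed and reaped.
 * Returns 1 if the child finished in time, 0 if it had to be killed,
 * -1 on error. *status receives the wait status in both non-error cases. */
static int wait_or_kill(pid_t pid, int *status, int timeout_ms, int step_ms)
{
    struct timespec step;
    int waited = 0;

    assert(pid > 0 && status != NULL && step_ms > 0 && timeout_ms >= 0);
    step.tv_sec = step_ms / 1000;
    step.tv_nsec = (step_ms % 1000) * 1000000L;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;                /* child changed state and was reaped */
        if (r == -1)
            return -1;
        if (waited >= timeout_ms)
            break;                   /* r == 0: still running, out of time */
        nanosleep(&step, NULL);      /* an early wake-up just polls sooner */
        waited += step_ms;
    }
    if (kill(pid, SIGKILL) == -1 || waitpid(pid, status, 0) != pid)
        return -1;
    return 0;
}
```

The price of this approach is latency: the parent notices the child's exit up to step_ms late, and a short step wastes CPU time.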
Waiting for a process to die on a typical Unix system is an absolute PITA. The portable way would be to use various signals to interrupt the wait function: SIGALRM for a timeout, SIGTERM/SIGINT and others for a "user input" event. This relies on global state and thus might be impossible to do.
The non-portable way would be to use pidfd_open with poll/epoll on Linux, kqueue with a EVFILT_PROC filter on BSDs.
Note that on Linux this allows waiting for a process to terminate, you will still have to retrieve status via waitid with P_PIDFD.
If you still want to mix in "user events", add signalfd to the list of descriptors on Linux or EVFILT_SIGNAL filter of kqueue on BSDs.
Another possible solution is to spawn a "process reaper" thread which is responsible for reaping of all processes and setting some event in a process object of your choice: futex word, eventfd etc. Waiting on such objects can be done with a timeout. This requires everyone to agree to use the same interface for process spawning which might or might not be reasonable. Afaik Java implementations use this strategy.
I have just written the following routine to handle the EINTR error:
while((s = sem_wait(&w4compl)) == -1)
{
    if (errno == EINTR)
    {
        perror("call interrupted by sig. handler\n");
        continue;
    }
    else
        printf("Other Error Generated\n");
}
However, I never see the message "call interrupted by sig. handler" printed. How can I test this, so that the if (errno == EINTR) branch is actually executed?
Install a signal handler, and cause a signal to be delivered (using alarm(), setitimer(), or timer_create()+timer_settime()), so that the delivery of the signal will interrupt the sem_wait() call.
Consider this example program:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <semaphore.h>
#include <stdio.h>
#include <errno.h>
static void dummy_handler(int signum)
{
}
static int install_dummy_handler(int signum)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = dummy_handler;
    act.sa_flags = 0;
    return sigaction(signum, &act, NULL);
}
static const char *errname(const int errnum)
{
    switch (errnum) {
    case EINTR: return "EINTR";
    case EINVAL: return "EINVAL";
    default: return "(other)";
    }
}
int main(void)
{
    sem_t s;
    if (install_dummy_handler(SIGALRM) == -1) {
        fprintf(stderr, "Cannot install ALRM signal handler: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }
    sem_init(&s, 0, 0);
    alarm(1);
    if (sem_wait(&s) == -1) {
        const int errnum = errno;
        printf("sem_wait() failed with errno == %s (%d; %s).\n",
               errname(errnum), errnum, strerror(errnum));
    } else
        printf("sem_wait() succeeded.\n");
    return EXIT_SUCCESS;
}
In main(), we install a signal handler for the SIGALRM signal. It does not matter if the signal handler function does anything at all, because it is the delivery of the signal that causes "slow" syscalls to return with an EINTR error. (This holds as long as the SA_RESTART flag was not used when that handler was installed. If you look at act.sa_flags in install_dummy_handler(), you'll see we used no flags at all. All the flags and the use of sigaction() are described in the man 2 sigaction man page.)
In main(), we first initialize our semaphore, then set an alarm for one second. When the real, wall-clock time has elapsed, the SIGALRM signal is raised.
Do note that although SIGALRM is just fine for this example and similar purposes, you'll probably want to use POSIX per-process interval timers instead.
Next, we simply call sem_wait() on the semaphore, and examine the result. In practice, if you compile and run the above example.c using e.g.
gcc -Wall -O2 example.c -lpthread -o example
./example
the program will output
sem_wait() failed with errno == EINTR (4; Interrupted system call).
after one second.
Just about any blocking system call on Linux can fail with EINTR if a signal handler interrupts it.
From the man page (emphasis mine):
sem_wait() decrements (locks) the semaphore pointed to by sem. If
the semaphore's value is greater than zero, then the decrement
proceeds, and the function returns, immediately. If the semaphore
currently has the value zero, then the call blocks until either it
becomes possible to perform the decrement (i.e., the semaphore value
rises above zero), or a signal handler interrupts the call.
To trigger this case, you should make sure that the sem_wait system call is blocked (waiting), and then send a signal (which has a handler) to the thread.
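Some pseudo-code:
sigint_handler:
    return
thread2:
    <Your while loop from the question>
main:
    sigaction(SIGINT, sigint_handler, flags without SA_RESTART) // Set up signal handler; signal() may set SA_RESTART, which restarts sem_wait
    sem_wait(&w4compl) // Take the semaphore so that thread2 blocks (this assumes it was initialized to 1)
    t2 = start_thread(thread2)
    sleep(5) // Hack to make sure thread2 is blocked
    pthread_kill(t2, SIGINT)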