Why does ignoring SIGCONT still make a process continue?

This is my code, ignoring SIGCONT:
#include <signal.h>

int main(void) {
    signal(SIGCONT, SIG_IGN);  /* ask the kernel to ignore SIGCONT */
    while (1);                 /* spin forever */
}
This is what happens:
> ./main &
[1] 25093
> kill -STOP 25093
[1]+ Stopped ./main
> ps -aux | grep 25093
xxx 25093 98.6 0.0 2488 872 pts/8 T 18:23 0:20 ./main
> kill -CONT 25093
> ps -aux | grep 25093
xxx 25093 52.1 0.0 2488 872 pts/8 R 18:23 0:28 ./main
It seems that SIGCONT still made my process continue. Does it mean that a handler of SIGCONT is just a "side-effect"?
I wonder at what time SIGCONT makes the process run again? At what time is the process put into the dispatch queue again? Is it when the kill syscall is performed or when the process is going to be dispatched? (I read a passage about Linux signals that indicates that the dispatch code doesn't treat SIGCONT specially. The code segment is shown below.)
if (ka->sa.sa_handler == SIG_DFL) {
int exit_code = signr;
/* Init gets no signals it doesn't want. */
if (current->pid == 1)
continue;
switch (signr) {
case SIGCONT: case SIGCHLD: case SIGWINCH:
continue;
case SIGTSTP: case SIGTTIN: case SIGTTOU:
if (is_orphaned_pgrp(current->pgrp))
continue;
/* FALLTHRU */
case SIGSTOP:
current->state = TASK_STOPPED;
current->exit_code = signr;
if (!(current->p_pptr->sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
notify_parent(current, SIGCHLD);
schedule();
continue;
case SIGQUIT: case SIGILL: case SIGTRAP:
case SIGABRT: case SIGFPE: case SIGSEGV:
case SIGBUS: case SIGSYS: case SIGXCPU: case SIGXFSZ:
if (do_coredump(signr, regs))
exit_code |= 0x80;
/* FALLTHRU */
default:
sigaddset(&current->pending.signal, signr);
recalc_sigpending(current);
current->flags |= PF_SIGNALED;
do_exit(exit_code);
/* NOTREACHED */
}
}
...
handle_signal(signr, ka, &info, oldset, regs);
return 1;
}
...
return 0;

This is the intended behavior of SIGCONT according to the POSIX standard. Quoting from POSIX.1-2017 chapter 2, section 2.4 "Signal Concepts", subsection 2.4.1:
When SIGCONT is generated for a process that is stopped, the process shall be continued, even if the SIGCONT signal is ignored by the process or is blocked by all threads within the process and there are no threads in a call to a sigwait() function selecting SIGCONT. If SIGCONT is blocked by all threads within the process, there are no threads in a call to a sigwait() function selecting SIGCONT, and SIGCONT is not ignored by the process, the SIGCONT signal shall remain pending on the process until it is either unblocked by a thread or a thread calls a sigwait() function selecting SIGCONT, or a stop signal is generated for the process or any of the threads within the process.
You cannot prevent SIGCONT from resuming execution of your process. The most you can do is block its delivery, meaning that if you add SIGCONT to the set of blocked signals your process will not "notice" it (registered handlers will not run until unblocked), but it will nonetheless resume execution.
In Linux, the "continuing" action of SIGCONT is performed right away on signal generation, i.e. during the kill syscall. This is done before even checking if the signal is blocked or ignored. The code responsible for this is in prepare_signal():
/*
* Handle magic process-wide effects of stop/continue signals. Unlike
* the signal actions, these happen immediately at signal-generation
* time regardless of blocking, ignoring, or handling. This does the
* actual continuing for SIGCONT, but not the actual stopping for stop
* signals. The process stop is done as a signal action for SIG_DFL.
*
* Returns true if the signal should be actually delivered, otherwise
* it should be dropped.
*/
static bool prepare_signal(int sig, struct task_struct *p, bool force)
{
// ...
} else if (sig == SIGCONT) {
unsigned int why;
/*
* Remove all stop signals from all queues, wake all threads.
*/
siginitset(&flush, SIG_KERNEL_STOP_MASK);
flush_sigqueue_mask(&flush, &signal->shared_pending);
for_each_thread(p, t) {
flush_sigqueue_mask(&flush, &t->pending);
task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
if (likely(!(t->ptrace & PT_SEIZED))) {
t->jobctl &= ~JOBCTL_STOPPED;
wake_up_state(t, __TASK_STOPPED);
} else
ptrace_trap_notify(t);
}
// ...
}
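The behavior described in this answer can also be checked from user space. The following is a minimal sketch, assuming a POSIX system; the function name ignored_sigcont_still_continues is made up for this illustration. A child ignores SIGCONT and stops itself; the parent waits until the stop has been reported, sends SIGCONT, and then checks that the child ran on to its _exit() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a stopped child that ignores SIGCONT
 * was nevertheless continued, 0 if not, -1 on setup failure. */
int ignored_sigcont_still_continues(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(SIGCONT, SIG_IGN);   /* ignore SIGCONT, as in the question */
        raise(SIGSTOP);             /* stop ourselves */
        _exit(42);                  /* reached only once we are continued */
    }

    /* Wait until the child has actually stopped. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    kill(pid, SIGCONT);             /* ignored, yet it resumes the child */

    /* The child can only exit with status 42 if it was resumed. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

If the ignored SIGCONT did not resume the child, the second waitpid() would never return, so a return value of 1 is consistent with the POSIX text quoted above.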

Related

SIGTERM signal from parent does not invoke signal handler in child?

I'm writing a program where both the child process and the parent process can send a SIGTERM signal to the child.
The signal handler is something like this:
void custom_signal_handler(int signum, siginfo_t* info, void* ptr) {
if (signum == SIGTERM) {
printf("1\n");
}
else if (signum == SIGCONT) {
printf("2\n");
}
}
(I have simplified the printing in the ifs to keep the code here simpler).
For the SIGCONT signal, only the parent sends it, with kill(childPid, SIGCONT). When this happens, the signal handler in the child prints "2" as intended.
However, SIGTERM can be sent both by the parent, via kill(childPid, SIGTERM), and by the child itself, via raise(SIGTERM). The problem is that "1" is printed only when the child raises SIGTERM, not when the parent sends it.
I have registered the signal handler in the child:
// set up signal handler
struct sigaction custom_action;
memset(&custom_action, 0, sizeof(custom_action));
custom_action.sa_sigaction = custom_signal_handler;
custom_action.sa_flags = SA_SIGINFO;
// assign signal handlers
if (0 != sigaction(SIGCONT, &custom_action, NULL)) {
printf("Signal registration failed: %s\n",strerror(errno));
return -1;
}
if (0 != sigaction(SIGTERM, &custom_action, NULL)) {
printf("Signal registration failed: %s\n",strerror(errno));
return -1;
}
Any ideas? Thanks!
In a comment to the question, OP states
I am sending the SIGTERM from the parent while the relevant child is at "raise(SIGSTOP)". I think that because the child is in SIGSTOP it doesn't run the signal handler.
Correct. While a process is stopped, the only signals that take effect are SIGCONT and SIGKILL (further stop signals, i.e. SIGSTOP, SIGTSTP, SIGTTIN and SIGTTOU, have no effect on an already stopped process). All other signals become pending and are delivered when the process is continued. (Standard POSIX signals are not queued, though, so you can rely on only one instance of each standard signal becoming pending.)
However, I do need to send the SIGTERM only when the child is in SIGSTOP, without sending SIGCONT before.
The target process will receive SIGTERM only after it is continued. That is how stopped processes behave.
Is there a workaround?
Perhaps; it depends on the requirements. But do note that your intended use case involves behaviour that does not comply with POSIX (i.e., you want a stopped process to react to something other than just being continued or killed outright); and that is the direct reason for the problems you have encountered.
The simplest is to use a variant of SIGCONT instead of SIGTERM, to control the terminating of the process; for example, via sigqueue(), providing a payload identifier that tells the SIGCONT signal handler to treat it as a SIGTERM signal instead (and thus distinguishing between normal SIGCONT signals, and those that are stand-ins for SIGTERM).
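A minimal sketch of this stand-in idea, assuming a POSIX system with sigqueue(). The payload value STANDIN_TERM, the flag term_requested and the function names are hypothetical and chosen only for this illustration. The handler treats a SIGCONT as a termination request only when it was queued with the agreed payload.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STANDIN_TERM 1   /* hypothetical payload: "treat this as SIGTERM" */

static volatile sig_atomic_t term_requested = 0;

static void on_sigcont(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)context;
    /* Only a SIGCONT queued with our payload counts as a stand-in. */
    if (info->si_code == SI_QUEUE && info->si_value.sival_int == STANDIN_TERM)
        term_requested = 1;
}

/* Hypothetical demo: returns 1 if the stopped child saw the stand-in,
 * 0 if it did not, -1 on setup failure. */
int standin_sigterm_demo(void)
{
    int status;
    union sigval payload;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction act;
        memset(&act, 0, sizeof act);
        act.sa_sigaction = on_sigcont;
        act.sa_flags = SA_SIGINFO;
        sigemptyset(&act.sa_mask);
        if (sigaction(SIGCONT, &act, NULL) != 0)
            _exit(2);
        raise(SIGSTOP);                    /* wait here, stopped */
        _exit(term_requested ? 0 : 1);     /* report what the handler saw */
    }

    /* Make sure the child is stopped before signalling it. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    payload.sival_int = STANDIN_TERM;
    if (sigqueue(pid, SIGCONT, payload) != 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A plain kill(pid, SIGCONT) reaches the same handler with a different si_code, so ordinary continues are not mistaken for termination requests.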
A more complicated one is to have the process fork a special monitoring child process that regularly sends special "check for pending SIGTERM signals" SIGCONT signals, and dies when the parent dies. The child process can be connected to the parent via a pipe (parent having the write end, child the read end), so that when the parent dies, a read() on the child end returns 0, and the child can exit too. The parent process SIGCONT handler just needs to detect whether the signal was sent by the child process (the si_pid field of the siginfo_t structure should match the child process ID only if the signal was sent by the child) and, if so, check whether a SIGTERM is pending and handle it if it is; otherwise it just raises SIGSTOP. This approach is very fragile because of the many possible race windows, especially around raising SIGSTOP just after receiving SIGCONT. (Blocking SIGCONT in the signal handler is essential. Also, the monitoring child process should probably be in a separate process group, not attached to any terminal, to avoid being stopped by a SIGSTOP targeted at the entire process group.)
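The "check if a SIGTERM is pending" step can be done with sigpending(). A minimal sketch follows; the helper name signal_is_pending is hypothetical. Note that sigpending() only reports signals that are currently blocked and pending, so this check assumes the process keeps SIGTERM blocked while it waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical helper: returns 1 if signo is blocked and pending for the
 * caller, 0 if it is not pending, -1 on error. */
int signal_is_pending(int signo)
{
    sigset_t pending;

    if (sigpending(&pending) != 0)
        return -1;
    return sigismember(&pending, signo);
}
```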
Note that one should only use async-signal-safe functions in signal handlers, and leave errno unchanged, to keep everything working as expected.
For printing messages to standard error, I often use
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
static int wrerr(const char *msg)
{
const int saved_errno = errno;
const char *end = msg;
ssize_t count;
int retval = 0;
/* Find end of string. strlen() is not async-signal safe. */
if (end)
while (*end)
end++;
while (msg < end) {
count = write(STDERR_FILENO, msg, (size_t)(end - msg));
if (count > 0)
msg += count;
else
if (count != -1) {
retval = EIO;
break;
} else
if (errno != EINTR) {
retval = errno;
break;
}
}
errno = saved_errno;
return retval;
}
which is not only async-signal-safe, but also keeps errno unchanged. It returns 0 on success, and an errno error code otherwise.
If we expand the prints a bit for clarity, OP's custom signal handler becomes for example
void custom_signal_handler(int signum, siginfo_t* info, void* context) {
if (signum == SIGTERM) {
wrerr("custom_signal_handler(): SIGTERM\n");
} else
if (signum == SIGCONT) {
wrerr("custom_signal_handler(): SIGCONT\n");
}
}
Do note that when this is used, one's program should not use stderr (from <stdio.h>) at all, to avoid confusion.

Abort function in C

Program 1:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>   /* abort() */
#include <unistd.h>   /* getpid(), sleep() */

void handler(int sig);

int main(void)
{
    printf("PID: %d\n", (int)getpid());
    signal(SIGABRT, handler);
    while (1) {
        printf("Hai\n");
        sleep(1);
        abort();
    }
}

void handler(int sig)
{
    printf("Signal handled\n");
}
Output 1:
$ ./a.out
PID: 32235
Hai
Signal handled
Aborted (core dumped)
$
As per the reference, the abort function works like raise(SIGABRT). So, the signal generated by abort() function is SIGABRT. So for that I created the above program.
In that program, SIGABRT signal is handled. After the execution of signal handler, it doesn't return to the main function from where it is called. Why does it not return to the main function after the handler is completed?
Program 2:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>   /* getpid(), sleep() */

void handler(int sig);

int main(void)
{
    printf("PID: %d\n", (int)getpid());
    signal(SIGABRT, handler);
    while (1) {
        printf("Hai\n");
        sleep(1);
    }
}

void handler(int sig)
{
    printf("Signal handled\n");
}
Output 2:
$ ./a.out
PID: 32247
Hai
Hai
Hai
Signal handled
Hai
Signal handled
Hai
Hai
^C
$
Unlike program 1, program 2 executes as expected. In the above program, the signals are sent to the process via command line through the kill command as shown below.
$ kill -6 32247
$ kill -6 32247
So once the signal occurred, the handler function executed and then control returned to the main function. But that does not happen in program 1. Why does it behave like this? Are the abort function and SIGABRT different?
See this piece of documentation from man 3 abort:
This results in the abnormal termination of the process unless the SIGABRT signal is caught and the signal handler does not return (see longjmp(3)).
And also this:
If the SIGABRT signal is ignored, or caught by a handler that returns, the abort() function will still terminate the process. It does this by restoring the default disposition for SIGABRT and then raising the signal for a second time.
So the only way you can prevent abort() from aborting your program is by longjmp()-ing from the signal handler.
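A minimal sketch of that escape route, assuming a POSIX system. It uses sigsetjmp()/siglongjmp(), the signal-mask-aware variants of setjmp()/longjmp(); the names on_sigabrt, abort_escape and survive_abort are made up for this illustration. Jumping out of a signal handler is fragile in real programs, so this is shown only to illustrate the manual page text.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>

static sigjmp_buf abort_escape;

static void on_sigabrt(int sig)
{
    (void)sig;
    siglongjmp(abort_escape, 1);   /* never returns into abort() */
}

/* Hypothetical demo: returns 1 if control came back after abort(),
 * -1 if the handler could not be installed. */
int survive_abort(void)
{
    struct sigaction act, old;

    act.sa_handler = on_sigabrt;
    act.sa_flags = 0;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGABRT, &act, &old) != 0)
        return -1;

    /* Save the signal mask as well, so that SIGABRT (blocked while the
     * handler runs) is unblocked again after the jump. */
    if (sigsetjmp(abort_escape, 1) == 0)
        abort();                   /* the handler jumps back to sigsetjmp */

    /* Reached only through siglongjmp() from the handler. */
    sigaction(SIGABRT, &old, NULL);
    return 1;
}
```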
abort() is implemented by libc. If the implementation is still running after its raise(SIGABRT), it knows that the user caught (or ignored) SIGABRT. As the documentation says, that makes no difference: the process is terminated anyway.
You can see the exact implementation in the GLIBC source code (stdlib/abort.c):
/* Cause an abnormal program termination with core-dump. */
void
abort (void)
{
struct sigaction act;
sigset_t sigs;
/* First acquire the lock. */
__libc_lock_lock_recursive (lock);
/* Now it's for sure we are alone. But recursive calls are possible. */
/* Unlock SIGABRT. */
if (stage == 0)
{
++stage;
if (__sigemptyset (&sigs) == 0 &&
__sigaddset (&sigs, SIGABRT) == 0)
__sigprocmask (SIG_UNBLOCK, &sigs, (sigset_t *) NULL);
}
/* Flush all streams. We cannot close them now because the user
might have registered a handler for SIGABRT. */
if (stage == 1)
{
++stage;
fflush (NULL);
}
/* Send signal which possibly calls a user handler. */
if (stage == 2)
{
/* This stage is special: we must allow repeated calls of
`abort' when a user defined handler for SIGABRT is installed.
This is risky since the `raise' implementation might also
fail but I don't see another possibility. */
int save_stage = stage;
stage = 0;
__libc_lock_unlock_recursive (lock);
raise (SIGABRT);
__libc_lock_lock_recursive (lock);
stage = save_stage + 1;
}
/* There was a handler installed. Now remove it. */
if (stage == 3)
{
++stage;
memset (&act, '\0', sizeof (struct sigaction));
act.sa_handler = SIG_DFL;
__sigfillset (&act.sa_mask);
act.sa_flags = 0;
__sigaction (SIGABRT, &act, NULL);
}
/* Now close the streams which also flushes the output the user
defined handler might has produced. */
if (stage == 4)
{
++stage;
__fcloseall ();
}
/* Try again. */
if (stage == 5)
{
++stage;
raise (SIGABRT);
}
/* Now try to abort using the system specific command. */
if (stage == 6)
{
++stage;
ABORT_INSTRUCTION;
}
/* If we can't signal ourselves and the abort instruction failed, exit. */
if (stage == 7)
{
++stage;
_exit (127);
}
/* If even this fails try to use the provided instruction to crash
or otherwise make sure we never return. */
while (1)
/* Try for ever and ever. */
ABORT_INSTRUCTION;
}
It is true that the abort function sends the SIGABRT signal, but it doesn't matter whether you catch (or ignore) that signal: the abort function will still terminate your process.
From the linked manual page:
RETURN VALUE
The abort() function never returns.
According to the standard it's not entirely specified what should happen if you handle SIGABRT:
The abort function causes abnormal program termination to occur,
unless the signal SIGABRT is being caught and the signal handler does
not return. Whether open streams with unwritten buffered data are
flushed, open streams are closed, or temporary files are removed is
implementation-defined. An implementation-defined form of the status
unsuccessful termination is returned to the host environment by means
of the function call raise(SIGABRT).
However it's specified what should not happen:
The abort function does not return to its caller.
So the correct behavior is to ensure that an "abnormal termination" occurs. The abort function ensures this by doing its very best to terminate the program abnormally: it tries to terminate in various ways, and if nothing seems to do the trick it enters an infinite loop (which at least ensures that it does not return to the caller).
They are not the same. When a handler for SIGABRT is installed and returns, the abort function raises SIGABRT twice: the first raise runs your handler, and after that abort restores the default action and raises the signal again, which terminates the process.
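This sequence can be observed from a parent process. The sketch below assumes a POSIX system; the function name abort_with_returning_handler and the pipe-based reporting are hypothetical choices for this illustration. The child installs a SIGABRT handler that returns, calls abort(), and the parent checks both that the handler ran and that the child was still killed by SIGABRT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

static void on_sigabrt(int sig)
{
    ssize_t n;
    (void)sig;
    /* write() is async-signal-safe; tell the parent the handler ran. */
    n = write(report_fd, "H", 1);
    (void)n;
    /* Returning here does not save the process from abort(). */
}

/* Hypothetical demo: stores 1 in *handler_ran if the child's handler ran,
 * and returns the signal that terminated the child (0 if it was not
 * killed by a signal, -1 on setup failure). */
int abort_with_returning_handler(int *handler_ran)
{
    int fds[2], status;
    char c;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        struct sigaction act;
        struct rlimit no_core = {0, 0};

        setrlimit(RLIMIT_CORE, &no_core);   /* avoid leaving a core file */
        close(fds[0]);
        report_fd = fds[1];
        act.sa_handler = on_sigabrt;
        act.sa_flags = 0;
        sigemptyset(&act.sa_mask);
        sigaction(SIGABRT, &act, NULL);
        abort();                            /* does not return */
    }

    close(fds[1]);
    /* One byte arrives if the handler ran; EOF (0) if it never did. */
    *handler_ran = (read(fds[0], &c, 1) == 1);
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```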

Proper signal handling and interrupts

Have a question regarding the interruption of a running process that listens to signals. Below is my handler. SIGHUP is used for reloading my config file, SIGCHLD is used to waitpid with nohang on a process it spawns and the others to terminate the process.
void sig_handler( int sig, siginfo_t *siginfo, void *ucontext )
{
if ( sig == SIGHUP ) {
reload = 1;
} else if( sig == SIGCHLD) {
// TODO
} else if ( sig == SIGTERM || sig == SIGKILL || sig == SIGINT ) {
done = 1;
}
}
do {
if(reload) {
// opening files, doing file descriptor stuff
... // processing...
**SIGHUP OCCURS! WHAT HAPPENS?** <<<<<<<<<<< Line: 505 <<<<<<<<<<<<<<<<<<
... // processing...
}
} while(!done);
My current understanding of signals:
1. signal occurs
2. complete current operation on line 505, i.e. open("t.txt");
3. run signal handler code to completion
4. return to line 505 and continue
What I am worried about:
1. signal occurs
2. break out of current code
3. run signal handler code to completion
4. continue from break out code
Questions:
1. Should I enhance my code to block SIGHUP, SIGTERM, SIGCHLD while reloading the config so that I don't have unstable code if a signal occurs? Or is that over-design? (Assuming it doesn't resume after)
2. Say I am in the signal handler for a SIGHUP but then a SIGHUP signal occurs, what happens? (I assume it queues them and runs the handler twice)
Thanks!
Actually, if a signal occurs, your current operation isn't necessarily going to finish before the signal handler is called. However, upon completion of the signal handler, your code resumes from exactly where it was when the signal interrupted it. And since all your signal handler does is set a flag variable, there's no effect on the code that is currently in the middle of whatever else it's doing.
Answers:
1. Why bother? Your code does resume afterwards, and a properly designed signal handler isn't going to destabilize the code.
2. The documentation seems to indicate that handling of the second signal is deferred until the first handler completes. See this question for details.

How to handle/kill Zombie or <defunct> process for cgi script (C++) on "mongoose webserver" (Linux)?

I have a CGI script running on "mongoose webserver" written in C++ (independent of mongoose specific APIs for portability in the future) on Ubuntu 10.04. Whenever I invoke the script from web browser (Chrome), the process works fine but when I run ps -al I see
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 3567 8877 0 80 0 - 23309 hrtime pts/0 00:00:00 mongoose
4 Z 0 3585 3567 7 80 0 - 0 exit pts/0 00:00:00 test <defunct>
I use sudo kill -9 3567 in this case to kill the parent process. I have the following code in my script.
...
#include <sys/wait.h>
...
//==========================================================================
// Define the function to be called when ctrl-c (SIGINT) signal is sent to process
static void signal_callback_handler(int signum)
{
point_of_inspection( __FILE__, __func__, __LINE__, ENABLE_LOG); // Entered the routine
// Cleanup and close up stuff here
while(1)
{
if (signum == SIGTERM)
{
error_log_report("caught signal - premature exit",CAUGHT_SIGNAL_ERROR,ENABLE_LOG);
break;
}
}
clean_exit();
// Terminate program
exit(signum);
}
//======================= Zombies or <defunct> handler ========================
// Signal handler to process terminated children
static void mysig(int nsig)
{
int nStatus, nPid;
while(1)
{
if (nsig == SIGCHLD)
{
nPid = waitpid(-1, &nStatus, WNOHANG);
if(nPid<0)
{
error_log_report("waitpid (nPid<0)",CAUGHT_SIGNAL_ERROR,ENABLE_LOG);
break;
}
if(nPid==0)
{
error_log_report("Caught Signal - Zombies <defunct> (nPid==0)",CAUGHT_SIGNAL_ERROR,ENABLE_LOG);
break;
}
}
}
clean_exit();
exit(nsig);
}
In the main function
int main()
{
//some initialization variables
...
// Register signal and signal handler
signal(SIGTERM, signal_callback_handler);
// To clean up terminated children
signal(SIGCHLD, mysig);
...
return 0;
}
However, it seems not to catch any signal triggered when the user closes the web browser or navigates to a different page, as I do not see any logs. I am wondering if this is a bug in mongoose or in my script (I do not use fork() or threads in my script, but mongoose does use threads. Also, I do not use any mongoose-specific APIs in my script).
I am referring to the signal tutorials here: http://orchard.wccnet.org/~chasselb/linux275/ClassNotes/process/sigbasics.htm and
http://www.gnu.org/s/hello/manual/libc/Process-Completion.html
They updated the code in mongoose.c file to reap zombies. The following is the portion of the code.
#if !defined(_WIN32) && !defined(__SYMBIAN32__)
// Ignore SIGPIPE signal, so if browser cancels the request, it
// won't kill the whole process.
(void) signal(SIGPIPE, SIG_IGN);
// Also ignoring SIGCHLD to let the OS to reap zombies properly.
(void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32
Zombie processes in Unix are processes that have terminated but have not yet been waited for by their parent process. Their presence should be temporary; if they persist, that indicates a bug in the parent, which in the present case is mongoose.
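For comparison with the SIG_IGN approach quoted above, here is a minimal sketch of how a parent can reap its children explicitly in a SIGCHLD handler. It assumes a POSIX system; it is not mongoose's code, and the names reaped, on_sigchld and spawn_and_reap are hypothetical. The handler loops over waitpid() with WNOHANG because a single SIGCHLD may stand for several terminated children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

static void on_sigchld(int sig)
{
    int saved_errno = errno;
    (void)sig;
    /* Reap every child that has already exited, without blocking. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical demo: forks n short-lived children and returns how many
 * of them the SIGCHLD handler reaped (-1 on setup failure). */
int spawn_and_reap(int n)
{
    struct sigaction act, old_act;
    sigset_t block, old_mask, wait_mask;
    int i, spawned = 0;

    act.sa_handler = on_sigchld;
    act.sa_flags = SA_RESTART;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGCHLD, &act, &old_act) != 0)
        return -1;

    /* Block SIGCHLD so no notification slips in before sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);             /* child: exit at once */
        if (pid > 0)
            spawned++;
    }

    /* Sleep until the handler has collected every child. */
    while (reaped < spawned)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_act, NULL);
    return (int)reaped;
}
```

Once every child has been waited for, none of them remains as a <defunct> entry in the process table.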

sigprocmask() issue

The code below is from the book Advanced Programming in the UNIX Environment by W. Richard Stevens.
About this code, the book says:
"If the signal is sent to the process while it is blocked, the signal delivery will be deferred until the signal is unblocked. To the application, this can look as if the signal occurs between the unblocking and the pause (depending on how the kernel implements signals). If this happens, or if the signal does occur between the unblocking and the pause, we have a problem. Any occurrence of the signal in this window of time is lost in the sense that we might not see the signal again, in which case the pause will block indefinitely. This is another problem with the earlier unreliable signals."
And it recommends using sigsuspend() instead of pause() before resetting the signal mask, since it resets the signal mask and puts the process to sleep in a single atomic operation. But I don't want my process to wait for a signal after leaving the critical region. So does this problem apply to my case too? If so, what should I use so that I don't lose a signal while resetting the signal mask with sigprocmask()?
sigset_t newmask, oldmask;
sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);
/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
err_sys("SIG_BLOCK error");
/* critical region of code */
/* reset signal mask, which unblocks SIGINT */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
err_sys("SIG_SETMASK error");
/* window is open */
pause(); /* wait for signal to occur */
/* continue processing */
sigsuspend is used to avoid a race between sigprocmask and pause. If you don't need to halt your thread until a signal is received, there is nothing sigsuspend can do for you. You haven't given enough information to tell whether there are other sources of trouble in your context.
