I am currently trying to write a program that calls fork() to spawn a child process which sends a random number of signals to the parent process. Both the child and the parent process should show the same number, but I have an issue with blocking the signals when incrementing the counter.
I tried multiple methods of blocking the signals but I have failed. Anybody with a suggestion? Thanks a lot.
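#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
int nreceived = 0;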
void handler(int sig)
{
nreceived++;
signal(SIGUSR1,handler);
}
int main()
{
int nsignals;
pid_t pid;
srand(time(NULL));
nsignals = rand() % 256;
signal(SIGUSR1,handler);
if((pid = fork()) > 0)
{
wait(NULL);
printf("Received %d signals from process %d\n",nreceived,pid);
}
else if (pid == 0)
{
for(int i = 0; i < nsignals; i++)
kill(getppid(),SIGUSR1);
printf("Sent %d signals to process %d\n", nsignals, getppid());
}
return 0;
}
As discussed extensively in the comments, it is important to use the POSIX function sigaction() rather than the standard C function signal(), because there are many implementation-defined aspects to signal() (primarily because there were many divergent implementations before the C standard was created, and the standard tried to accommodate existing implementations without breaking any of them).
However, the system is not obligated to queue signals that are not real-time signals (signal numbers in the range SIGRTMIN..SIGRTMAX). SIGUSR1 is not a real-time signal. Frankly, even with signal queueing, I'm not sure whether implementations would handle up to 255 pending signals of a specific type for a process — it isn't an area I've experimented with.
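On Linux, where real-time signals are queued, here is a minimal sketch of what queueing buys you. It is an illustration under assumptions, not a drop-in fix: it assumes SIGRTMIN is available (macOS does not provide real-time signals) and that the per-user limit on pending signals is comfortably above n. It also swaps the counting handler for a different technique: the parent blocks the signal and drains the queue with sigtimedwait() once the child has exited. The name count_rt_signals() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: the child sends n real-time signals to the parent,
   which counts how many arrive. Returns the count, or -1 on error.
   Assumes Linux-style queued real-time signals (SIGRTMIN). */
int count_rt_signals(int n)
{
    const int sig = SIGRTMIN;
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, sig);
    /* Block first so every instance stays pending until we collect it. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    const pid_t parent = getpid();
    const pid_t pid = fork();
    if (pid == 0)
    {
        for (int i = 0; i < n; i++)
        {
            union sigval value = { .sival_int = i };
            if (sigqueue(parent, sig, value) != 0)
                _exit(EXIT_FAILURE);    /* e.g. EAGAIN: queue limit reached */
        }
        _exit(EXIT_SUCCESS);
    }

    int status;
    int ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;

    /* Each queued instance is a separate pending signal. Drain them all
       without blocking: sigtimedwait() fails with EAGAIN when none remain.
       Draining also stops a leftover instance from killing us on unblock. */
    const struct timespec zero = { 0, 0 };
    int received = 0;
    while (sigtimedwait(&set, NULL, &zero) == sig)
        received++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return ok ? received : -1;
}
```

On such a system, count_rt_signals(100) should return 100. With SIGUSR1 in place of SIGRTMIN, the same code would report at most 1, because a standard signal that is already pending is not queued a second time.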
This is the best code I was able to come up with:
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#ifndef SEND_SIGNAL
#define SEND_SIGNAL SIGUSR1
#endif
static const char *arg0;
static volatile sig_atomic_t nreceived = 0;
static _Noreturn void err_syserr(const char *syscall);
static void handler(int sig)
{
assert(sig == SEND_SIGNAL);
nreceived++;
}
int main(int argc, char **argv)
{
if (argc != 1)
{
fprintf(stderr, "Usage: %s\n", argv[0]);
exit(EXIT_FAILURE);
}
arg0 = argv[0];
struct sigaction sa = { .sa_handler = handler, .sa_flags = SA_RESTART };
/* Block all blockable signals */
if (sigfillset(&sa.sa_mask) != 0)
err_syserr("sigfillset");
if (sigaction(SEND_SIGNAL, &sa, 0) != 0)
err_syserr("sigaction");
pid_t pid = fork();
if (pid > 0)
{
int status;
int corpse = wait(&status);
if (corpse != -1)
printf("Child process %d exited with status 0x%.4X\n", corpse, status);
else
fprintf(stderr, "%s: wait() failed: (%d) %s\n", argv[0], errno, strerror(errno));
printf("Caught %d signals from process %d\n", nreceived, pid);
}
else if (pid == 0)
{
srand(time(NULL));
int nsignals = rand() % 256;
for (int i = 0; i < nsignals; i++)
kill(getppid(), SEND_SIGNAL);
printf("Sent %d signals to process %d\n", nsignals, getppid());
}
else
err_syserr("fork");
return 0;
}
static _Noreturn void err_syserr(const char *syscall)
{
fprintf(stderr, "%s: %s() failed: (%d) %s\n", arg0, syscall, errno, strerror(errno));
exit(EXIT_FAILURE);
}
When run as program sig53 (source code sig53.c) on a Mac running macOS Monterey 12.3.1, I got variable numbers of signals received:
$ sig53
Sent 50 signals to process 37973
Child process 37974 exited with status 0x0000
Caught 14 signals from process 37974
$ sig53
Sent 39 signals to process 38442
Child process 38443 exited with status 0x0000
Caught 16 signals from process 38443
$ sig53
Sent 28 signals to process 38478
Child process 38479 exited with status 0x0000
Caught 6 signals from process 38479
$
Sometimes the number received was close to 100, but it was never close to the total number of signals sent.
YMMV on Linux; there may be alternative mechanisms for handling signals there. But for portable code, sending a myriad of signals to a single process at full tilt is not a reliable way of communicating between processes: some of the signals will be delivered, but not necessarily all of them.
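If the goal is simply for the parent to learn how many events the child produced, a pipe is one reliable alternative, because bytes written to a pipe are buffered rather than merged the way pending signals are. This sketch assumes one byte per event is acceptable; count_via_pipe() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child reports n events through a pipe, one byte
   per event; the parent counts the bytes. Returns the count, or -1 on error. */
int count_via_pipe(int n)
{
    int fd[2];
    if (pipe(fd) != 0)                  /* create the pipe BEFORE fork() */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(fd[0]);                   /* child only writes */
        for (int i = 0; i < n; i++)
        {
            char event = 'x';
            if (write(fd[1], &event, 1) != 1)
                _exit(EXIT_FAILURE);
        }
        close(fd[1]);
        _exit(EXIT_SUCCESS);
    }

    close(fd[1]);                       /* so read() sees EOF once the child is done */
    int count = 0;
    char buffer[256];
    ssize_t nbytes;
    while ((nbytes = read(fd[0], buffer, sizeof(buffer))) != 0)
    {
        if (nbytes < 0)
        {
            if (errno == EINTR)
                continue;
            count = -1;
            break;
        }
        count += (int)nbytes;
    }
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return count;
}
```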
Related
I'm trying to communicate between two processes in C using a pipe. Everything works fine until it is supposed to print "hi\n". The output is
(8841) Child here stopping self
(8841) SAYS: 19
DATA WRITED
C: 8
(8841) CONTINUING
This is a simplified version of the program. I know for a fact that the reading part works, but it seems that the write call does not, because it never prints "hi\n". Any clues as to why that is?
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
volatile sig_atomic_t sigchld = 0;
void sigchldHandler(){
sigchld = 1;
return;
}
int main(){
sigset_t mask,prev;
signal(SIGCHLD, sigchldHandler);
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);
int pid = fork();
int fd[2];
pipe(fd);
sigprocmask(SIG_BLOCK, &mask, &prev);
if (pid == 0){
dup2(STDIN_FILENO,fd[0]);
printf("(%d) Child here stopping self\n",getpid());
raise(SIGSTOP);
printf("(%d) CONTINUING\n",getpid());
char* hello = malloc(sizeof("hi\n"));
read(STDIN_FILENO,hello,sizeof("hi\n"));
printf("%s",hello);
exit(0);
}
sleep(0.1);
sigprocmask(SIG_SETMASK, &prev,NULL);
while(1){
if (sigchld){
int status;
int p = waitpid(-1,&status,WNOHANG|WUNTRACED);
if (WIFSTOPPED(status)){
if (WSTOPSIG(status) == SIGSTOP){
printf("(%d) SAYS: %d\n",p, WSTOPSIG(status));
kill(pid,SIGCONT);
printf("DATA WRITED\n");
char* h = "hi\n";
int c=write(fd[1],h,sizeof(h));
printf("C: %i\n",c);
break;
}
}
sigchld = 0;
}
}
}
Primary problem
Your key problem is that you call pipe() after you've called fork(). That means the two processes have completely separate pipes; they are not talking to each other.
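A minimal sketch of the corrected ordering, reduced to just getting "hi\n" from the parent to the child's standard input (no SIGSTOP/SIGCONT). It creates the pipe before fork(), calls dup2(fd[0], STDIN_FILENO) with the old descriptor first, closes the unused ends, and writes strlen() bytes. The function name child_reads_stdin() is hypothetical, and reporting the byte count through the child's exit status is only a convenience for checking the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent sends msg down a pipe that the child sees
   as its standard input. Returns the number of bytes the child read (taken
   from its exit status), or -1 on error. msg must be shorter than 256 bytes. */
int child_reads_stdin(const char *msg)
{
    int fd[2];
    if (pipe(fd) != 0)                  /* pipe() first, so both processes share it */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0)
    {
        dup2(fd[0], STDIN_FILENO);      /* dup2(old, new): read end becomes stdin */
        close(fd[0]);
        close(fd[1]);                   /* otherwise read() would never see EOF */
        char buffer[256];
        ssize_t total = 0, nbytes;
        while ((nbytes = read(STDIN_FILENO, buffer + total,
                              sizeof(buffer) - (size_t)total)) > 0)
            total += nbytes;
        _exit((int)total);
    }

    close(fd[0]);
    size_t len = strlen(msg);           /* not sizeof of a pointer */
    ssize_t written = write(fd[1], msg, len);
    close(fd[1]);                       /* child's read() now returns 0 (EOF) */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        written != (ssize_t)len)
        return -1;
    return WEXITSTATUS(status);
}
```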
Secondary issues
There are other issues too, of course.
You have (in the parent): int c=write(fd[1],h,sizeof(h));. You're writing 8 bytes (your output includes C: 8) because the variable h is a pointer, and a pointer is 8 bytes on your 64-bit system. However, h points to a string of only 4 bytes (including the terminating null byte); you should be using strlen() or thereabouts to limit the amount of data written.
You aren't closing enough file descriptors for comfort.
You have the arguments to dup2() reversed. This too is crucial.
It seems weird to be using dynamic allocation for just 4 bytes of data, but it should work.
You should print the PID along with the value in hello in the child (for consistency, if nothing else). It's good you do that with the other printing.
The parent should probably wait for the child after the loop (after closing the pipe).
The sleep() function takes an unsigned integer; calling sleep(0.1) sleeps for zero seconds. For sub-second sleeping, you need nanosleep() or maybe usleep() (older, no longer part of POSIX, but widely available and easier to use).
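As an illustration of the nanosleep() route, here is a sketch of a millisecond sleep that resumes with the remaining time if a signal handler interrupts it, which matters in a program that catches SIGCHLD. The name sleep_ms() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for ms milliseconds with nanosleep(),
   continuing with the remaining time if interrupted by a signal.
   Returns 0 on success, -1 on any error other than EINTR. */
int sleep_ms(long ms)
{
    struct timespec request = { .tv_sec = ms / 1000,
                                .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec remaining;
    while (nanosleep(&request, &remaining) != 0)
    {
        if (errno != EINTR)
            return -1;                  /* e.g. EINVAL for a negative duration */
        request = remaining;            /* resume with the time still left */
    }
    return 0;
}
```

With this, the parent's sleep(0.1) would become sleep_ms(100).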
Here's working code:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
volatile sig_atomic_t sigchld = 0;
static void sigchldHandler(int signum)
{
sigchld = signum;
}
int main(void)
{
sigset_t mask, prev;
signal(SIGCHLD, sigchldHandler);
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);
int fd[2];
pipe(fd);
int pid = fork();
sigprocmask(SIG_BLOCK, &mask, &prev);
if (pid == 0)
{
/* Child */
dup2(fd[0], STDIN_FILENO);
close(fd[0]);
close(fd[1]);
printf("(%d) Child here stopping self\n", getpid());
raise(SIGSTOP);
printf("(%d) CONTINUING\n", getpid());
char *hello = malloc(sizeof("hi\n"));
int nbytes = read(STDIN_FILENO, hello, sizeof("hi\n"));
printf("(%d) received %d bytes: %.*s\n", getpid(), nbytes, nbytes, hello);
exit(0);
}
/* Parent */
close(fd[0]);
nanosleep(&(struct timespec){.tv_sec = 0, .tv_nsec = 100000000}, NULL);
sigprocmask(SIG_SETMASK, &prev, NULL);
while (1)
{
if (sigchld)
{
int status;
int p = waitpid(-1, &status, WNOHANG | WUNTRACED);
if (WIFSTOPPED(status))
{
if (WSTOPSIG(status) == SIGSTOP)
{
printf("(%d) SAYS: %d\n", p, WSTOPSIG(status));
kill(pid, SIGCONT);
char *h = "hi\n";
int c = write(fd[1], h, strlen(h));
printf("DATA WRITTEN: %i\n", c);
close(fd[1]);
break;
}
}
sigchld = 0;
}
}
int corpse;
int status;
while ((corpse = wait(&status)) > 0)
printf("PID %d exited with status 0x%.4X\n", corpse, status);
return 0;
}
Sample output:
(66589) Child here stopping self
(66589) SAYS: 17
DATA WRITTEN: 3
(66589) CONTINUING
(66589) received 3 bytes: hi
PID 66589 exited with status 0x0000
The difference between 17 (on a Mac running macOS Mojave 10.14.6) and 19 (on a Linux box) is normal; the actual values of signal numbers are not standardized by POSIX (though many of the signals from 1 SIGHUP through 15 SIGTERM have the same numbers across systems because they were standard in 7th Edition Unix).
I'm creating several child processes which send a signal to their parent process and die. I simply count them. But I never get the right count. Some signals never get caught by the handler.
How should I code this?
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int ended = 0;
void handler(int sig){
ended++;
}
int main(int argc, char **argv){
int i;
pid_t pid, ppid;
if (signal(SIGUSR1, handler) == SIG_ERR) {
fprintf(stderr, "signal failed.\n");
exit (-1);
}
ppid = getpid();
for (i = 0; i < 50; i++){
if ((pid = fork()) < 0){
fprintf(stderr, "fork failed.\n");
exit(-1);
}
if (pid == 0){
kill(ppid, SIGUSR1);
exit(0);
}
}
while (wait(NULL) > 0);
printf("ended = %d\n", ended);
return 0;
}
The output of this program is sometimes 47, sometimes 39, but never 50.
The problem here is that a signal acts like a hardware interrupt, with your handler function playing the role of the ISR (Interrupt Service Routine). If several instances of the same signal arrive "at the same time" (that is, while an earlier one is still pending), the Linux kernel treats them as a single signal. Signals are not designed to be used in this manner; a signal should be used to inform one process of the state of another. To communicate between processes, you should use IPC (InterProcess Communication) mechanisms such as message queues, sockets, or pipes.
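In the question, each child dies immediately after signalling, so one way to sidestep the merging entirely is to count the children as they are reaped: each child's termination is reported by wait() exactly once. This sketch assumes the calling process has no other children; count_finished_children() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that exit immediately and count
   them as they are reaped. Returns the number of children reaped. */
int count_finished_children(int n)
{
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* count only the children that exist */
        if (pid == 0)
            _exit(EXIT_SUCCESS);        /* child: terminating is the "message" */
    }

    int ended = 0;
    for (;;)
    {
        pid_t corpse = wait(NULL);
        if (corpse > 0)
            ended++;                    /* one report per child, never merged */
        else if (errno != EINTR)
            break;                      /* ECHILD: no children left */
    }
    return ended;
}
```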
Thanks,
I found that the problem can be solved using real-time signals: just replace SIGUSR1 with SIGRTMIN. Real-time signals are queued (http://man7.org/linux/man-pages/man7/signal.7.html).
Are there any negative side effects to this solution?
It works:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
volatile sig_atomic_t ended = 0;
void handler(int sig)
{
ended++;
}
int main(int argc, char **argv)
{
int i;
pid_t pid, ppid;
/* SIGRTMIN is a real-time signal, so each kill() below is queued separately */
if (signal(SIGRTMIN, handler) == SIG_ERR)
{
fprintf(stderr, "signal failed.\n");
exit (-1);
}
ppid = getpid();
for (i = 0; i < 50; i++)
{
if ((pid = fork()) < 0)
{
fprintf(stderr, "fork failed.\n");
exit(-1);
}
if (pid == 0)
{
kill(ppid, SIGRTMIN);
exit(0);
}
}
while (wait(NULL) > 0);
printf("ended = %d\n", ended);
return 0;
}
Main program: start a certain number of child processes, then send each of them SIGINT right away.
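#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifndef CHILDS
#define CHILDS 5 /* hypothetical default; the original value was not shown */
#endif
int main()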
{
pid_t childs[CHILDS];
char *execv_argv[3];
int n = CHILDS;
execv_argv[0] = "./debugging_procs/wait_time_at_interrupt";
execv_argv[1] = "2";
execv_argv[2] = NULL;
for (int i = 0; i < n; i++)
{
childs[i] = fork();
if (childs[i] == 0)
{
execv(execv_argv[0], execv_argv);
if (errno != 0)
perror(strerror(errno));
_exit(1);
}
}
if (errno != 0)
perror(strerror(errno));
// sleep(1);
for (int i = 0; i < n; i++)
kill(childs[i], SIGINT);
if (errno != 0)
perror(strerror(errno));
// Wait for all children.
while (wait(NULL) > 0);
return 0;
}
Forked program: wait for any signal; if SIGINT is sent, open a certain file, write SIGINT and the current PID to it, and then wait for the specified number of seconds (in this case, I send 2 from the main program).
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
void sigint_handler(int signum)
{
int fd = open("./aux/log1", O_WRONLY | O_APPEND);
char buf[124];
(void)signum;
sprintf(buf, "SIGINT %d\n", getpid());
write(fd, buf, strlen(buf));
close(fd);
}
int main(int argc, char **argv)
{
int wait_time;
wait_time = (argv[1]) ? atoi(argv[1]) : 5;
signal(SIGINT, &sigint_handler);
// Wait for any signal.
pause();
sleep(wait_time);
return 0;
}
The problem is that the log file the children should write to doesn't have n lines, meaning that not all children wrote to it. Sometimes nobody writes anything and the main program doesn't wait at all (meaning that sleep() isn't called in this case).
But if I uncomment sleep(1) in the main program, everything works just as I expected.
I suspect that the child processes don't get enough time to listen to SIGINT.
The program I'm working on is a task control program, and when I run a command like restart my_program; restart my_program I get unstable behaviour. When I call restart, a SIGINT is sent, then a new fork() is called, then another SIGINT is sent, just like the example above.
How can I make sure all children will parse SIGINT without the sleep(1) line? I'm testing my program if it can handle programs that don't exit right away after SIGINT is sent.
If I add for example, printf("child process started\n"); at the top of the child program, it doesn't get printed and the main program doesn't wait for anything, unless I sleep for a second. This happens even with only 1 child process.
Everything is working as it should. Some of your child processes get killed by the signal, before they set up the signal handler, or even before they start executing the child binary.
In your parent process, instead of just wait()ing until there are no more child processes, you could examine the identity and exit status of each of the processes reaped. Replace while (wait(NULL) > 0); with
{
pid_t p;
int status;
while ((p = wait(&status)) > 0) {
if (WIFEXITED(status))
printf("Child %ld exit status was %d.\n", (long)p, WEXITSTATUS(status));
else
if (WIFSIGNALED(status))
printf("Child %ld was killed by signal %d.\n", (long)p, WTERMSIG(status));
else
printf("Child %ld was lost.\n", (long)p);
fflush(stdout);
}
}
and you'll see that the "missing" child processes were terminated by the signals. This means that the child process was killed before it was ready to catch the signal.
I wrote my own example program pair, with complete error checking. Instead of a signal handler, I decided to use sigprocmask() and sigwaitinfo(), just to show another way to do the same thing (and to not be limited to async-signal-safe functions in a signal handler).
parent.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
const char *signal_name(const int signum)
{
static char buffer[32];
switch (signum) {
case SIGINT: return "INT";
case SIGHUP: return "HUP";
case SIGTERM: return "TERM";
default:
snprintf(buffer, sizeof buffer, "%d", signum);
return (const char *)buffer;
}
}
static int compare_pids(const void *p1, const void *p2)
{
const pid_t pid1 = *(const pid_t *)p1;
const pid_t pid2 = *(const pid_t *)p2;
return (pid1 < pid2) ? -1 :
(pid1 > pid2) ? +1 : 0;
}
int main(int argc, char *argv[])
{
size_t count, r, i;
int status;
pid_t *child, *reaped, p;
char dummy;
if (argc < 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s COUNT PATH-TO-BINARY [ ARGS ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will fork COUNT child processes,\n");
fprintf(stderr, "each child process executing PATH-TO-BINARY.\n");
fprintf(stderr, "Immediately after all child processes have been forked,\n");
fprintf(stderr, "they are sent a SIGINT signal.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
if (sscanf(argv[1], " %zu %c", &count, &dummy) != 1 || count < 1) {
fprintf(stderr, "%s: Invalid count.\n", argv[1]);
return EXIT_FAILURE;
}
child = malloc(count * sizeof child[0]);
reaped = malloc(count * sizeof reaped[0]);
if (!child || !reaped) {
fprintf(stderr, "%s: Count is too large; out of memory.\n", argv[1]);
return EXIT_FAILURE;
}
for (i = 0; i < count; i++) {
p = fork();
if (p == -1) {
if (i == 0) {
fprintf(stderr, "Cannot fork child processes: %s.\n", strerror(errno));
return EXIT_FAILURE;
} else {
fprintf(stderr, "Cannot fork child %zu: %s.\n", i + 1, strerror(errno));
count = i;
break;
}
} else
if (!p) {
/* Child process */
execvp(argv[2], argv + 2);
{
const char *errmsg = strerror(errno);
fprintf(stderr, "Child process %ld: Cannot execute %s: %s.\n",
(long)getpid(), argv[2], errmsg);
exit(EXIT_FAILURE);
}
} else {
/* Parent process. */
child[i] = p;
}
}
/* Send all children the INT signal. */
for (i = 0; i < count; i++)
kill(child[i], SIGINT);
/* Reap and report each child. */
r = 0;
while (1) {
p = wait(&status);
if (p == -1) {
if (errno == ECHILD)
break;
fprintf(stderr, "Error waiting for child processes: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (r < count)
reaped[r++] = p;
else
fprintf(stderr, "Reaped an extra child process!\n");
if (WIFEXITED(status)) {
switch (WEXITSTATUS(status)) {
case EXIT_SUCCESS:
printf("Parent: Reaped child process %ld: EXIT_SUCCESS.\n", (long)p);
break;
case EXIT_FAILURE:
printf("Parent: Reaped child process %ld: EXIT_FAILURE.\n", (long)p);
break;
default:
printf("Parent: Reaped child process %ld: Exit status %d.\n", (long)p, WEXITSTATUS(status));
break;
}
fflush(stdout);
} else
if (WIFSIGNALED(status)) {
printf("Parent: Reaped child process %ld: Terminated by %s.\n", (long)p, signal_name(WTERMSIG(status)));
fflush(stdout);
} else {
printf("Parent: Reaped child process %ld: Lost.\n", (long)p);
fflush(stdout);
}
}
if (r == count) {
/* Sort both pid arrays. */
qsort(child, count, sizeof child[0], compare_pids);
qsort(reaped, count, sizeof reaped[0], compare_pids);
for (i = 0; i < count; i++)
if (child[i] != reaped[i])
break;
if (i == count)
printf("Parent: All %zu child processes were reaped successfully.\n", count);
}
return EXIT_SUCCESS;
}
child.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
const char *signal_name(const int signum)
{
static char buffer[32];
switch (signum) {
case SIGINT: return "INT";
case SIGHUP: return "HUP";
case SIGTERM: return "TERM";
default:
snprintf(buffer, sizeof buffer, "%d", signum);
return (const char *)buffer;
}
}
int main(void)
{
const long mypid = getpid();
sigset_t set;
siginfo_t info;
int result;
printf("Child: Child process %ld started!\n", mypid);
fflush(stdout);
sigemptyset(&set);
sigaddset(&set, SIGINT);
sigaddset(&set, SIGHUP);
sigaddset(&set, SIGTERM);
sigprocmask(SIG_BLOCK, &set, NULL);
result = sigwaitinfo(&set, &info);
if (result == -1) {
printf("Child: Child process %ld failed: %s.\n", mypid, strerror(errno));
return EXIT_FAILURE;
}
if (info.si_pid == 0)
printf("Child: Child process %ld terminated by signal %s via terminal.\n", mypid, signal_name(result));
else
if (info.si_pid == getppid())
printf("Child: Child process %ld terminated by signal %s sent by the parent process %ld.\n",
mypid, signal_name(result), (long)info.si_pid);
else
printf("Child: Child process %ld terminated by signal %s sent by process %ld.\n",
mypid, signal_name(result), (long)info.si_pid);
return EXIT_SUCCESS;
}
Compile both using e.g.
gcc -Wall -O2 parent.c -o parent
gcc -Wall -O2 child.c -o child
and run them using e.g.
./parent 100 ./child
where the 100 is the number of child processes to fork, each running ./child.
Errors are output to standard error. Each line from parent to standard output begins with Parent:, and each line from any child to standard output begins with Child:.
On my machine, the last line in the output is always Parent: All # child processes were reaped successfully., which means that every child process that was fork()ed was reaped and reported using wait(). Nothing was lost, and there were no issues with fork() and kill().
(Do note that if you specify more child processes than you are allowed to fork, the parent program does not consider that an error, and just uses the allowed number of child processes for the test.)
On my machine, forking and reaping 100 child processes is enough work for the parent process, so that every child process gets to the part where it is ready to catch the signal.
On the other hand, the parent can handle 10 child processes (running ./parent 10 ./child) so fast that every one of the child processes gets killed by the INT signal before they are ready to handle the signal.
Here is the output from a pretty typical case when running ./parent 20 ./child:
Child: Child process 19982 started!
Child: Child process 19983 started!
Child: Child process 19984 started!
Child: Child process 19982 terminated by signal INT sent by the parent process 19981.
Child: Child process 19992 started!
Child: Child process 19983 terminated by signal INT sent by the parent process 19981.
Child: Child process 19984 terminated by signal INT sent by the parent process 19981.
Parent: Reaped child process 19982: EXIT_SUCCESS.
Parent: Reaped child process 19985: Terminated by INT.
Parent: Reaped child process 19986: Terminated by INT.
Parent: Reaped child process 19984: EXIT_SUCCESS.
Parent: Reaped child process 19987: Terminated by INT.
Parent: Reaped child process 19988: Terminated by INT.
Parent: Reaped child process 19989: Terminated by INT.
Parent: Reaped child process 19990: Terminated by INT.
Parent: Reaped child process 19991: Terminated by INT.
Parent: Reaped child process 19992: Terminated by INT.
Parent: Reaped child process 19993: Terminated by INT.
Parent: Reaped child process 19994: Terminated by INT.
Parent: Reaped child process 19995: Terminated by INT.
Parent: Reaped child process 19996: Terminated by INT.
Parent: Reaped child process 19983: EXIT_SUCCESS.
Parent: Reaped child process 19997: Terminated by INT.
Parent: Reaped child process 19998: Terminated by INT.
Parent: Reaped child process 19999: Terminated by INT.
Parent: Reaped child process 20000: Terminated by INT.
Parent: Reaped child process 20001: Terminated by INT.
Parent: All 20 child processes were reaped successfully.
Of the 20 child processes, 16 were killed by the INT signal before they executed the first printf() (or fflush(stdout)) line. (We could add a printf("Child: Child process %ld executing %s\n", (long)getpid(), argv[2]); fflush(stdout); to parent.c just before the execvp() line, to see if any of the child processes get killed before they execute at all.)
Of the four remaining child processes (19982, 19983, 19984, and 19992), one (19992) was terminated after the first printf() or fflush(), but before it managed to run sigprocmask(), which blocks the signal and prepares the child for catching it.
Only the three remaining child processes (19982, 19983, and 19984) caught the INT signal sent by the parent process.
As you can see, just adding complete error checking, and adding sufficient output (and fflush(stdout); where useful, as standard output is buffered by default), lets you run several test cases, and construct a much better overall picture of what is happening.
The program I'm working on is a task control program, and when I run a command like restart my_program; restart my_program I get unstable behaviour. When I call restart, a SIGINT is sent, then a new fork() is called, then another SIGINT is sent, just like the example above.
In that case, you are sending the signal before the new child process is ready, so the default disposition of the signal (termination, for INT) determines what happens.
The solutions to this underlying problem vary. Note that it is at the core of many init system issues. It is easy to solve if the child (my_program here) co-operates, but difficult in all other cases.
One simple co-operation method is to have the child send a signal to its parent process whenever it is ready for action. To avoid killing parent processes that are unprepared for such information, a signal that is ignored by default (SIGWINCH, for example) can be used.
The option of sleeping for some duration, so that the new child process has enough time to become ready for action, is a common, but pretty unreliable method of mitigating this issue. (In particular, the required duration depends on the child process priority, and the overall load on the machine.)
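A sketch of the SIGWINCH co-operation method for a single child: the parent blocks SIGWINCH before fork(); the child blocks SIGINT (so it is held pending instead of killing the child) and then sends SIGWINCH; only after receiving that notice does the parent send SIGINT. Assumptions: start_then_interrupt() is a hypothetical name, the child is the same program rather than an exec()ed binary, and it accepts SIGINT with sigwait() instead of a handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start one child, wait until it reports "ready" with
   SIGWINCH, then send it SIGINT. Returns 0 if the child accepted SIGINT and
   exited normally, -1 otherwise. */
int start_then_interrupt(void)
{
    sigset_t winch, old;
    sigemptyset(&winch);
    sigaddset(&winch, SIGWINCH);
    /* Block SIGWINCH before fork() so the child's notice stays pending. */
    sigprocmask(SIG_BLOCK, &winch, &old);

    pid_t pid = fork();
    if (pid < 0)
    {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
    {
        sigset_t intr;
        sigemptyset(&intr);
        sigaddset(&intr, SIGINT);
        sigprocmask(SIG_BLOCK, &intr, NULL);  /* ready: SIGINT will now be held */
        kill(getppid(), SIGWINCH);            /* harmless to an unprepared parent */
        int sig;
        sigwait(&intr, &sig);                 /* accept the SIGINT */
        _exit(EXIT_SUCCESS);
    }

    int ready = 0;
    while (!ready)
    {
        siginfo_t info;
        /* Skip EINTR and SIGWINCH from other senders (e.g. a terminal resize). */
        if (sigwaitinfo(&winch, &info) == SIGWINCH && info.si_pid == pid)
            ready = 1;
    }
    kill(pid, SIGINT);

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == EXIT_SUCCESS)
        result = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

Handling one child at a time matters: SIGWINCH is a standard signal, so notices from several children (or a terminal resize at the same moment) could merge into one pending instance. A production version would want a timeout on that wait.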
Try calling waitpid() in the for loop. This way, the next child will only write once the first child is done.
Here is my code to examine this:
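#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
void handler(int n) {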
printf("handler %d\n", n);
int status;
if (wait(&status) < 0)
printf("%s\n", strerror(errno));
}
int main() {
struct sigaction sig;
sigemptyset(&sig.sa_mask);
sig.sa_handler = handler;
sig.sa_flags = 0;
sig.sa_restorer = NULL;
struct sigaction sigold;
sigaction(SIGCHLD, &sig, &sigold);
pid_t pid;
int status;
printf("before fork\n");
if ((pid = fork()) == 0) {
_exit(127);
} else if (pid > 0) {
printf("before waitpid\n");
if (waitpid(pid, &status, 0) < 0)
printf("%s\n", strerror(errno));
printf("after waitpid\n");
}
printf("after fork\n");
return 0;
}
The output is:
before fork
before waitpid
handler 17
No child processes
after waitpid
after fork
So I think waitpid() blocks SIGCHLD and waits for the child to terminate; once the child terminates, it does something and then unblocks SIGCHLD before it returns. That's why we see the "No child processes" error, and why "after waitpid" comes after "handler 17". Am I right? If not, what is the truth? How can the output sequence be explained? Is there a specification for Linux or something like that to check?
The exit information for a process can only be collected once. Your output shows the signal handler being called while your code is in waitpid(), but the handler calls wait(), and that collects the information about the child (which you throw away without reporting). Then, when control gets back to waitpid(), the child's exit status has already been collected, so there's nothing left for waitpid() to report on; hence the "No child processes" error.
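The once-only rule is easy to check in isolation. In this sketch (second_wait_errno() is a hypothetical name, and no SIGCHLD handler is installed), the first waitpid() reaps the child and the second fails with ECHILD, the errno behind the "No child processes" message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child's exit status can be collected only once.
   Returns the errno from a second waitpid() on the same child, 0 if the
   second call unexpectedly succeeded, or -1 if setup failed. */
int second_wait_errno(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(127);                         /* same exit code as the question */

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* first call reaps the child */
        return -1;
    if (waitpid(pid, &status, 0) == -1)     /* nothing left to report */
        return errno;                       /* expected: ECHILD */
    return 0;
}
```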
Here's an adaptation of your program. It abuses things by using printf() inside the signal handler function, but it seems to work despite that, testing on a Mac running macOS Sierra 10.12.4 (compiling with GCC 7.1.0).
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
static void handler(int n)
{
printf("handler %d\n", n);
int status;
int corpse;
if ((corpse = wait(&status)) < 0)
printf("%s: %s\n", __func__, strerror(errno));
else
printf("%s: child %d exited with status 0x%.4X\n", __func__, corpse, status);
}
int main(void)
{
struct sigaction sig = { 0 };
sigemptyset(&sig.sa_mask);
sig.sa_handler = handler;
sig.sa_flags = 0;
sigaction(SIGCHLD, &sig, NULL);
pid_t pid;
printf("before fork\n");
if ((pid = fork()) == 0)
{
_exit(127);
}
else if (pid > 0)
{
printf("before waitpid\n");
int status;
int corpse;
while ((corpse = waitpid(pid, &status, 0)) > 0 || errno == EINTR)
{
if (corpse < 0)
printf("loop: %s\n", strerror(errno));
else
printf("%s: child %d exited with status 0x%.4X\n", __func__, corpse, status);
}
if (corpse < 0)
printf("%s: %s\n", __func__, strerror(errno));
printf("after waitpid loop\n");
}
printf("after fork\n");
return 0;
}
Sample output:
before fork
before waitpid
handler 20
handler: child 29481 exited with status 0x7F00
loop: Interrupted system call
main: No child processes
after waitpid loop
after fork
The status value 0x7F00 is the normal encoding for _exit(127). The signal number is different for macOS from Linux; that's perfectly permissible.
To get the code to compile on Linux (Centos 7 and Ubuntu 16.04 LTS used for the test), using GCC 4.8.5 (almost antediluvian — the current version is GCC 7.1.0) and 5.4.0 respectively, using the command line:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wmissing-prototypes \
> -Wstrict-prototypes -Wold-style-definition sg59.c -o sg59
$
I added #define _XOPEN_SOURCE 800 before the first header, and used:
struct sigaction sig;
memset(&sig, '\0', sizeof(sig));
to initialize the structure with GCC 4.8.5. That sort of shenanigan is occasionally a painful necessity to avoid compiler warnings. I note that although the #define was necessary to expose POSIX symbols, the initializer (struct sigaction sig = { 0 };) was accepted by GCC 5.4.0 without problems.
When I then run the program, I get very similar output to what cong reports getting in a comment:
before fork
before waitpid
handler 17
handler: No child processes
main: child 101681 exited with status 0x7F00
main: No child processes
after waitpid loop
after fork
It is curious indeed that on Linux, the process is sent a SIGCHLD signal and yet wait() cannot wait for it in the signal handler. That is at least counter-intuitive.
We can debate how much it matters that the first argument to waitpid() is pid rather than 0; the error is inevitable on the second iteration of the loop since the first collected the information from the child. In practice, it doesn't matter here. In general, it would be better to use waitpid(0, &status, WNOHANG) or thereabouts; depending on context, 0 instead of WNOHANG might be better.
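Following that suggestion, here is a sketch in which the SIGCHLD handler is the only reaper: it loops on waitpid(-1, &status, WNOHANG) so that several exits merged into one SIGCHLD are all collected, and it saves and restores errno. The main code never calls wait(), so nothing competes with the handler; it waits with sigsuspend() instead. Assumptions: reap_in_handler() is a hypothetical name, the process has no other children, and -1 (any child) is used where the answer mentions 0 (any child in the caller's process group).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* Reap every child that has already exited; one SIGCHLD may stand for several. */
static void on_sigchld(int signum)
{
    (void)signum;
    int saved_errno = errno;            /* waitpid() may overwrite errno */
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical helper: fork n children that _exit(127) and let the handler
   reap them. Returns the number of children the handler reaped. */
int reap_in_handler(int n)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    sigset_t chld, old_mask;
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    /* Keep SIGCHLD blocked except inside sigsuspend(), so testing 'reaped'
       and then waiting cannot race with the handler. */
    sigprocmask(SIG_BLOCK, &chld, &old_mask);
    sigaction(SIGCHLD, &sa, &old_sa);

    reaped = 0;
    int started = 0;
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(127);
        started++;
    }

    sigset_t wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&wait_mask);         /* atomically unblock SIGCHLD and wait */

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return reaped;
}
```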
UPDATE: This appears to be a timing issue. Adding a call to sleep before the call to kill makes everything work as expected.
I have been playing with clone(2) and trying to get a handle on how it works. I am currently having trouble sending signals to a cloned process. I have the following code:
#define _GNU_SOURCE /* needed for the clone() declaration in <sched.h> */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>
volatile sig_atomic_t keep_going = 1;
void handler(int sig) {
printf("Signal Received\n");
keep_going = 0;
}
int thread_main(void* arg) {
struct sigaction usr_action;
sigset_t block_mask;
sigfillset(&block_mask);
usr_action.sa_handler = &handler;
usr_action.sa_mask = block_mask;
usr_action.sa_flags = 0;
sigaction(SIGUSR1, &usr_action, NULL);
printf("Hello from cloned thread\n");
while(keep_going);
return 0; /* clone() uses this as the child's exit status */
}
int main(int argc, char **argv) {
void* stack = malloc(4096);
int flags = SIGCHLD;
int child_tid = clone(&thread_main, (char *)stack + 4096, flags, NULL);
if (child_tid < 0) {
perror("clone");
exit(EXIT_FAILURE);
}
printf("My pid: %d, child_tid: %d\n", (int) getpid(), (int) child_tid);
int kill_ret = kill(child_tid, SIGUSR1);
if (kill_ret < 0) {
perror("kill");
exit(EXIT_FAILURE);
}
int status = 0;
pid_t returned_pid = waitpid(child_tid, &status, 0);
if (returned_pid < 0) {
perror("waitpid");
exit(EXIT_FAILURE);
}
if (WIFEXITED(status)) {
printf("exited, status=%d\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
printf("killed by signal %d\n", WTERMSIG(status));
} else if (WIFSTOPPED(status)) {
printf("stopped by signal %d\n", WSTOPSIG(status));
} else if (WIFCONTINUED(status)) {
printf("continued\n");
}
exit(EXIT_SUCCESS);
}
Which yields the following output:
My pid: 14101, child_tid: 14102
killed by signal 10
The child was obviously killed as a result of the signal; why did the signal handler not get called?
To avoid the race condition, install the signal handler in the parent, before the clone() call. The child inherits a copy of the parent's signal handlers. You can reset the parent's disposition to SIG_DFL later if you want. (Also, getpid() is async-signal-safe, if you want to emulate SIG_DFL behaviour in the parent.)
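Here is a sketch of that advice using fork() instead of clone(), an assumption made to keep the example portable; a clone() without CLONE_SIGHAND likewise gives the child a copy of the caller's handlers. Because the handler exists before the child does, the parent can signal immediately. The child also keeps SIGUSR1 blocked except inside sigsuspend(), so a signal that arrives before it starts waiting is not missed. signal_child_safely() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;                       /* async-signal-safe: just set a flag */
}

/* Hypothetical helper: returns 0 if the child caught SIGUSR1 with the
   inherited handler and exited normally, -1 otherwise. */
int signal_child_safely(void)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    /* Install the handler BEFORE the child exists; the child inherits it. */
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGUSR1 too, so the child can wait without a lost-wakeup race. */
    sigset_t usr1, old_mask;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_BLOCK, &usr1, &old_mask);

    int result = -1;
    pid_t pid = fork();
    if (pid == 0)
    {
        sigset_t wait_mask = old_mask;
        sigdelset(&wait_mask, SIGUSR1);
        while (!got_usr1)
            sigsuspend(&wait_mask);     /* unblock and wait atomically */
        _exit(EXIT_SUCCESS);
    }
    if (pid > 0)
    {
        int status;
        kill(pid, SIGUSR1);             /* no race: the handler is already there */
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
            WEXITSTATUS(status) == EXIT_SUCCESS)
            result = 0;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return result;
}
```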
The child is not receiving the signal because the parent sends it before the child has reached the call to sigaction(), and that's why the child is getting killed. You should avoid setting up the signal handler this way. If you still want to do it this way, make sure the parent waits until the child has set up the signal handler. In that scenario, you should see the expected result.
First, what is strange is that you didn't get this message:
"Hello from cloned thread\n"
Therefore, your child thread gets terminated before it manages to set up the signal handler.
EDIT:
I just saw your comment about sleep. Try adding another variable, which is set once the sigaction() call has been executed. The main thread should block until this variable is set.
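One caveat with a plain variable: with flags = SIGCHLD (no CLONE_VM), the cloned child runs in a separate copy of the parent's memory, just as after fork(), so a flag it sets is not visible to the parent. As a sketch of the same "wait until the handler is installed" handshake, this version signals readiness over a pipe instead, again using fork() in place of clone(). run_with_ready_pipe() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t keep_going = 1;

static void handler(int sig)
{
    (void)sig;
    keep_going = 0;
}

/* Hypothetical helper: the child installs its handler, then reports "ready"
   over a pipe; only then does the parent send SIGUSR1. Returns 0 if the
   child ran its handler and exited normally, -1 otherwise. */
int run_with_ready_pipe(void)
{
    int fd[2];
    if (pipe(fd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(fd[0]);
        sigset_t usr1, wait_mask;
        sigemptyset(&usr1);
        sigaddset(&usr1, SIGUSR1);
        sigprocmask(SIG_BLOCK, &usr1, &wait_mask);  /* hold SIGUSR1 until sigsuspend() */
        sigdelset(&wait_mask, SIGUSR1);

        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGUSR1, &sa, NULL) != 0)
            _exit(EXIT_FAILURE);

        char ready = 'R';
        if (write(fd[1], &ready, 1) != 1)           /* handler is installed now */
            _exit(EXIT_FAILURE);
        close(fd[1]);

        while (keep_going)
            sigsuspend(&wait_mask);
        _exit(EXIT_SUCCESS);
    }

    close(fd[1]);
    char ready;
    ssize_t nbytes = read(fd[0], &ready, 1);        /* blocks until the child is ready */
    close(fd[0]);
    kill(pid, nbytes == 1 ? SIGUSR1 : SIGKILL);     /* EOF means the child failed */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS) ? 0 : -1;
}
```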