I am writing a shell in C and I am trying to add signal handling. In the shell, fork() is called and the child process executes a shell command. The child process is put into its own process group. This way, if Ctrl-C is pressed when a child process is in the foreground, it closes all of the processes that share the same process group id. The shell executes the commands as expected.
The problem is the signals. When, for example, I execute "sleep 5", and then I press Ctrl-C for SIGINT, the "shell>" prompt comes up as expected but the process is still running in the background. If I quickly run "ps" after I press Ctrl-C, the sleep call is still there. Then after the 5 seconds are up and I run "ps" again, it's gone. The same thing happens when I press Ctrl-Z (SIGTSTP). With SIGTSTP, the process goes to the background, as expected, but it doesn't pause execution. It keeps running until it's finished.
Why are these processes being sent to the background like this and continuing to run?
Here is the gist of my code...
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int status;
void sig_handler_parent()
{
printf("\n");
}
void sig_handler_sigchild(int signum)
{
waitpid(-1, &status, WNOHANG);
}
int main()
{
signal(SIGCHLD, sig_handler_sigchild);
signal(SIGINT, sig_handler_parent);
signal(SIGQUIT, sig_handler_parent);
signal(SIGTERM, sig_handler_parent);
signal(SIGCONT, sig_handler_parent);
signal(SIGTSTP, sig_handler_parent);
while (1)
{
printf("shell> ");
// GET COMMAND INPUT HERE
pid = fork();
if (pid == 0)
{
setpgid(getpid(), getpid());
execvp(cmd[0], cmd);
printf("%s: unknown command\n", cmd[0]);
exit(1);
}
else
waitpid(0, &status, WUNTRACED);
}
return 0;
}
p.s. I have already tried setting all of the signal handlers to SIG_DFL before the exec command.
The code you provide does not compile, and an attempt to fix it shows
that you omitted a lot. I am only guessing.
In order to bring you forward, I'll point out a number of facts that
you might have misunderstood. Together with a couple of documentation
links, I hope this is helpful.
Error Handling
First: please make a habit of handling errors, especially when you
know there's something that you don't understand. For example, the
parent (your shell) waits until the child terminates,
waitpid(0, &status, WUNTRACED);
You say,
When, for example, I execute "sleep 5", and then I press Ctrl-C for
SIGINT, the "shell>" prompt comes up as expected but the process is
still running in the background.
What actually happens is that once you press Ctrl-C, the parent (not the
child; see below for why) receives SIGINT (the kernel's terminal
subsystem handles keyboard input, sees that someone holds "Ctrl" and
"C" at the same time, and concludes that all processes in the terminal's
foreground process group must be sent SIGINT).
Change the parent branch to,
pid_t ret = waitpid(0, &status, WUNTRACED);
if (ret == -1)
    perror("waitpid");
With this, you'd see perror() print something like:
waitpid: interrupted system call
You want SIGINT to go to the child, so something must be wrong.
Signal Handlers, fork(), and exec()
Next, what happens to your signal handlers across fork() and
exec()?
The signal overview man
page states,
A child created via fork(2) inherits a copy of its parent's signal
dispositions. During an execve(2), the dispositions of handled
signals are reset to the default; the dispositions of ignored
signals are left unchanged.
So, ideally, what this means is that:
The parent (shell) sees SIGINT, as observed above, and prints
"interrupted system call".
The child's signal handlers are reset back to their defaults. For SIGINT,
this means to terminate.
You do not fiddle with the controlling terminal, so the child
inherits the controlling terminal of the parent. This means that
SIGINT is delivered to both parent and child. Given that the
child's SIGINT behavior is to terminate, I'd bet that no process is
left running.
Except when you use setpgid() to create a new process group.
Process Groups, Sessions, and Controlling Terminal
Someone once called me a UNIX greybeard. While this is true from a
visual point of view, I must reject that compliment because I rarely
hang around in one of the darkest corners of UNIX - the terminal
subsystem. Shell writers have to understand that too though.
In this context, it's the "NOTES" section of the setpgid() man
page. I suggest
you read that, especially where it says,
At any time, one (and only one) of the process groups in the session
can be the foreground process group for the terminal; (...)
The shell (bash maybe) from which you start your shell program has
done so for the foreground invocation of your program, and marked that
as "foreground process group". Effectively this means, "Please, dear
terminal, whenever someone presses Ctrl-C, send a SIGINT to all
processes in that group. I (your parent) just sit and wait (waitpid()) until all is over, and will take control again then."
You create a process group for the child, but don't tell the terminal
about it. What you want is to
Detach the parent from the terminal.
Set the child process group as the foreground process group of the terminal.
Wait for the child (you already do).
Regain terminal foreground.
Further down in the "NOTES" section of said man page, they give links
to how that is done. Follow those, read thoroughly, try out things,
and make sure you handle errors. In most cases, such errors are signs
of misunderstanding. And, in most cases, such errors are fixed by
re-reading the documentation.
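The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a complete job-control implementation: run_foreground and set_foreground are hypothetical helper names, the terminal is assumed to be on STDIN_FILENO, and SIGTTOU is ignored around tcsetpgrp() because a background process calling it would otherwise be stopped. If standard input is not a terminal, the handoff is simply skipped.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: make pgid the terminal's foreground process group. */
static int set_foreground(pid_t pgid)
{
    if (!isatty(STDIN_FILENO))
        return 0;                          /* no terminal, nothing to hand over */
    void (*old)(int) = signal(SIGTTOU, SIG_IGN);
    int rc = tcsetpgrp(STDIN_FILENO, pgid);
    signal(SIGTTOU, old);
    return rc;
}

/* Hypothetical helper: runs argv as the foreground job.
 * Returns the wait status, or -1 on error. */
int run_foreground(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);                     /* child gets its own process group */
        execvp(argv[0], argv);
        _exit(127);                        /* exec failed */
    }
    setpgid(pid, pid);                     /* same call in the parent; result ignored */
    set_foreground(pid);                   /* the child's group now owns the terminal */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, WUNTRACED);   /* also returns if the job is stopped */
    } while (r == -1 && errno == EINTR);

    set_foreground(getpgrp());             /* the shell takes the terminal back */
    return r == -1 ? -1 : status;
}
```

With the child's group in the foreground, Ctrl-C and Ctrl-Z go to the job and not to the shell. A real shell would typically also call tcsetpgrp() in the child before exec, so that the job cannot touch the terminal before it owns it.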
Are you sure that your child process is actually receiving the signals from your tty? I believe you need to make a call to tcsetpgrp to actually tell the controlling terminal to send signals to the process group of your child process.
For example, after you call fork, and before exec, try this from within your child.
tcsetpgrp(STDIN_FILENO, getpid());
Here is the man page for tcsetpgrp(3)
I am tracing an exec'd process from the parent to count the number of forks. In the parent I verify that the child stopped and then resume it (with ptrace(cont..) or SIGCONT). However, it's not resuming.
I read I should set PTRACE_O_TRACEEXEC option failing to do so results in SIGTRAP being sent to tracee upon a call to execve. I've added it in both the child and the parent, but it doesn't seem to make a difference.
I'm using Linux MX; it shouldn't be a problem, as this should work in all recent Linux versions.
argument: /bin/bash -c "echo 'first test' | wc -c"
int main(int argc, char *argv[]) {
int status;
int counter = 0;
pid_t pid = fork();
if (pid < 0)
exit(1);
else if (pid == 0) {
ptrace(PTRACE_TRACEME, pid, NULL, NULL);
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACEEXEC);
raise(SIGSTOP);
execve(argv[1], &argv[1], NULL);
} else {
wait(&status);
if (WIFSTOPPED(status))
printf("child is stopped");
if (status == 0)
printf("The child process terminated normally.");
if (status == 1)
printf("The child process terminated with an error!.");
ptrace(PTRACE_CONT, pid, NULL, NULL); //<< Child should restart here, not sure if pid = childs or parents.
raise(SIGCONT); // if ptrace(cont) doesnt work then this should make the child start.
if (WIFSTOPPED(status))
printf("child is stopped"); // << this shouldnt print bc Ive continued the child process.
while (1) {
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL | PTRACE_O_TRACEFORK);
if (status >> 8 == (SIGTRAP | (PTRACE_O_TRACEFORK << 8)) {
printf("it works");
break;
}
}
}
return 0;
}
The question and some of the code comments indicate some uncertainty about the behavior of ptrace, that in turn suggests unfamiliarity with its documentation. Manual pages are not always easy to understand, but you should definitely start there.
There are several problems with the program presented. Among the more significant are:
PTRACE_TRACEME is the only ptrace command recognized from the tracee. All other ptrace commands must be executed by the tracer, which is the parent in this case. In particular, any PTRACE_SETOPTIONS commands must be executed by the tracer.
The tracer (parent) uses raise() to attempt to send a SIGCONT to the child. This is wrong for two reasons:
raise() sends the specified signal to the calling process, which is not the one that is intended to receive the signal. The kill() function should be used instead to send a signal (of your choice) to another process. But,
A traced process is stopped whenever it receives a signal, but this ptrace-stop is different from "stopped by a signal". The effect of SIGCONT to resume execution applies only to the latter kind of stoppage. Therefore, sending a ptrace-stopped process a SIGCONT is counterproductive if the objective is to make the tracee resume execution. Not only will it not resume the process from ptrace-stop, but when it ultimately is delivered, it will cause the process to reenter ptrace-stop.
The program seems to assume that the second WIFSTOPPED(status) might evaluate to a different value than the first, and in particular, to a value that reflects the child's status at the time of the evaluation of WIFSTOPPED. That is a faulty assumption. WIFSTOPPED and its brethren simply interpret the status integer provided by wait(), which is a static representation of the status of the waited-for process as of the time the wait returned. It will not reflect any status changes since that time.
The program pervasively assumes that all its function calls succeed normally. Generally speaking, this is not a safe assumption. The program should check the return values of its function calls to verify successful completion, and take appropriate action (error message, possibly program termination) in the event of unexpected failure.
Overall, the program does not reflect a good understanding about the usage paradigm of ptrace(), and especially about the role of signals. For example, I take this comment:
I read I should set PTRACE_O_TRACEEXEC option failing to do so results
in SIGTRAP being sent to tracee upon a call to execve.
Yes, if the PTRACE_O_TRACEEXEC is not enabled, then the traced process will be sent a SIGTRAP when it calls execve, but the whole point of that is so that the tracer can catch that and have the option to trace the tracee at that point. As with any signal (other than SIGKILL) delivered to the tracee, the tracer has complete control over whether the signal is actually acted upon by the tracee. Furthermore, if PTRACE_O_TRACEEXEC is in effect then the tracee still stops for tracing when it performs an execve -- the difference is that the status returned to the tracer can distinguish between this case and the one where a SIGTRAP is delivered to the tracee for some other reason (read the docs for details).
So here's a general outline:
The original process forks.
The child executes a PTRACE_TRACEME command to attach itself as tracee to its parent process. (It is not useful for your purposes for the child to execute any other ptrace commands.)
The child proceeds normally to whichever exec-family function is appropriate (all will eventually result in a call to execve). It's not particularly useful for the child to send itself a signal prior to the exec if the only point is to get the attention of the tracer.
The parent loops, repeatedly waiting for the child. It doesn't have to use the wait() function specifically; other functions from that family, such as waitpid(), may also be used.
Upon each successful return from a wait, the parent handles a ptrace or other event. The first of these is likely to be the SIGTRAP generated by the child's initial execve (though it cannot be ruled out that another event, for a different signal, is received first). The status code filled in by wait conveys information about the nature of the event, and in particular, which signal or traced event triggered it.
The parent sets whatever ptrace options it wants when the first ptrace event is received. This can include PTRACE_O_TRACEEXEC if you like, though again, exec events are traced regardless. If you want to count not only the forks of the initial child, but also any forks of that child's descendants, then you probably want to set PTRACE_O_TRACEFORK.
After every event handled, the parent performs a PTRACE_CONT command targeting the tracee. If you are tracing all the forked descendants, then the applicable tracee is not necessarily the initial child.
The parent exits the loop after it has handled a termination event for every traced descendant.
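As a rough illustration of this outline (Linux only), the sketch below counts fork events in a traced process tree. It is a sketch under assumptions, not a definitive tracer: count_forks is a hypothetical name, vfork and clone events are counted as forks too, every SIGSTOP is swallowed (including the one a newly traced descendant starts with), job-control stops are not handled, and the loop ends when waitpid() reports ECHILD, so the caller must have no other children.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv under ptrace and returns the number of
 * fork/vfork/clone events seen in the traced tree, or -1 on error. */
long count_forks(char *const argv[])
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Tracee: PTRACE_TRACEME is the only ptrace request issued here. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);         /* a successful exec stops us with SIGTRAP */
        _exit(127);
    }

    long forks = 0;
    int options_set = 0, failed = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid == -1) {
            if (errno == EINTR)
                continue;
            /* ECHILD: every tracee has terminated and been reaped. */
            return (errno == ECHILD && !failed) ? forks : -1;
        }
        if (!WIFSTOPPED(status))
            continue;                  /* exit or kill of one tracee */

        if (!options_set) {
            /* First stop (the initial exec): the tracer sets the options. */
            long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK |
                        PTRACE_O_TRACECLONE | PTRACE_O_EXITKILL;
            if (ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)opts) == -1) {
                failed = 1;
                kill(pid, SIGKILL);
            }
            options_set = 1;
        }

        int sig = WSTOPSIG(status);
        int event = status >> 16;      /* non-zero for PTRACE_EVENT_* stops */
        if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK ||
            event == PTRACE_EVENT_CLONE)
            forks++;
        /* Swallow ptrace's own stops; pass any other signal on to the tracee. */
        if (event != 0 || sig == SIGTRAP || sig == SIGSTOP)
            sig = 0;
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
    }
}
```

Note that every ptrace request other than PTRACE_TRACEME is made by the parent, and that the tracee is always resumed with PTRACE_CONT, never with SIGCONT.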
I need to spawn a long-running child process and then kill it from the parent code. At the moment I do it like this:
int PID = fork();
if (PID == 0) {
execl("myexec", "myexec", nullptr);
perror("ERROR");
exit(1);
}
// do something in between and then:
kill(PID, SIGKILL);
This does the job in the sense that the child process is stopped, but then it remains as a zombie. I tried to completely remove it by adding:
kill(getpid(), SIGCHLD);
to no avail. I must be doing something wrong but I can't figure out what, so any help would be greatly appreciated. Thanks.
signal(SIGCHLD, SIG_IGN);
kill(getpid(), SIGCHLD);
Presto. No zombie.
By ignoring SIGCHLD we tell the kernel we don't care about exit codes so the zombies just go away immediately.
You have been answered with:
signal(SIGCHLD, SIG_IGN);
to ignore the signal sent to the parent when a child dies. This is an old mechanism to avoid zombies, but zombies are your friends, as my answer will explain.
The zombies are not a bug, but a feature of the system. They are there to complete the fork(2), wait(2), exit(2), kill(2) group of system calls.
When you wait(2) for a child to die, the kernel checks whether a child matching what you asked for in the wait(2) call is still running. If one is, wait(2) blocks, because wait(2) is the system call UNIX uses to hand the parent the exit status of the awaited child. If you call wait() without having forked a child first, wait() gives you an error, because there is nothing to wait for. But what happens if the parent did fork and the child died before the parent got around to calling wait? Should that be taken as an error? No. The system keeps the child's process table entry until one of two things happens: the parent dies (all of its children are then orphaned and adopted by process ID 1, init/systemd, which is continuously blocked in wait calls), or the parent does a wait, in which case the status of one (or the one requested) of the children is reported.
So in a proper usage of the system, it is possible (or necessary) to make a wait call for each fork you make. If you do more waits than forks, you get errors; if you make more forks than waits, you get zombies. To compensate for this, your code should be changed to make the appropriate wait call:
kill(PID, SIGINT); /* don't start with SIGKILL; give your process time to arrange its last will before dying */
res = waitpid(PID, &status, 0);
And this will allow the child to die normally. The child is going to die, because you killed it (except if the child has decided to ignore the signal you sent to it).
The reason there is no race condition here (the child could die before it is wait()ed for) is the zombie process. Zombie processes are not proper processes: they don't accept signals and it is impossible to kill them, because they are dead already (no pun here ;)). They only occupy a process table slot; no other resource is allocated to them. When a parent does a wait call and there is a zombie, the zombie is freed and its accounting info, including the exit status, is transferred to the parent (this is how the accounting is done). If there isn't one (because it died prematurely and you had invoked the above behaviour), you get an error from wait, and the accounting info is transferred to init/systemd, which copes with it. If you decide to ignore the SIGCHLD signal, you cancel the production of zombies, but the accounting is fed the wrong way, to init/systemd, and is not accounted in the parent. Because no such process can be waited for, you cannot distinguish whether the wait fails because the child process died or because you didn't spawn it correctly. More is to come.
Let's say that the child cannot exec a new program and dies (calling exit()). When you kill it, nothing happens, because the target is already a zombie (kill() still reports success for a zombie, but I assume you are not interested in the kill result, as it is irrelevant for this case). You are interested in the status of the child, or how the child died, which means you need to wait for it. If you get a normal exit with a status of 1 (as you call exit(1) in case exec() fails), you will know that the child was not able to exec (you still need to distinguish whether the 1 exit code was produced by the child or by the program later run by the child). If you successfully killed the child, you should get a status telling you that the child was killed by a signal (the one you sent), and you will know that your code is behaving properly.
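To make the status interpretation concrete, here is a minimal sketch. The helper names spawn, kill_and_reap and describe are hypothetical, and the signal is a parameter, so either SIGINT or SIGTERM can be tried.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv as a child; returns its pid, or -1. */
pid_t spawn(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(1);                  /* same exit code as the question's child */
    }
    return pid;
}

/* Hypothetical helper: sends sig to the child and reaps it.
 * Returns the wait status, or -1 on error. */
int kill_and_reap(pid_t pid, int sig)
{
    int status;
    pid_t r;
    if (kill(pid, sig) == -1)
        return -1;
    do {
        r = waitpid(pid, &status, 0);   /* the wait is what removes the zombie */
    } while (r == -1 && errno == EINTR);
    return r == -1 ? -1 : status;
}

/* Turns a wait status into the cases discussed above. */
const char *describe(int status)
{
    if (WIFSIGNALED(status))
        return "killed by a signal";
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return "exited normally";
    if (WIFEXITED(status))
        return "exited with an error (exec may have failed)";
    return "unknown";
}
```

After kill_and_reap() returns, WTERMSIG(status) tells you which signal ended the child, and a second waitpid() on the same pid fails with ECHILD because the process table entry is gone.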
In case you don't want to block your parent process in the wait system call (well, your child program could have decided to ignore signals and the kill had no effect), then you can substitute the above by this:
kill(PID, SIGINT);
res = waitpid(PID, &status, WNOHANG);
That will not block the parent if the child program has decided to ignore the signal you sent to it. In this case, if waitpid returns 0, the child has not changed state yet: either it has not had time to die, or it has decided to ignore the signal, and you need help from the operator (or to be more drastic, e.g. killing it with SIGKILL).
A good approach should be
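void alarm_handler(int signum)
{
    (void)signum; /* nothing to do; the handler only has to interrupt waitpid() */
}
...
struct sigaction sa, saved;
sa.sa_handler = alarm_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0; /* no SA_RESTART, so that waitpid() fails with EINTR */
kill(PID, SIGINT); /* kill it softly (it's your child, man!!) */
sigaction(SIGALRM, &sa, &saved);
alarm(3); /* put an awakener, you will be interrupted in 3s. */
res = waitpid(PID, &status, 0);
int wait_errno = errno; /* keep errno from waitpid() across the cleanup calls */
alarm(0); /* cancel the alarm if the child died in time */
sigaction(SIGALRM, &saved, NULL); /* restore the previous signal handler */
if (res == -1 && wait_errno == EINTR) {
    /* we were interrupted by the alarm, and the child didn't die. */
    kill(PID, SIGKILL); /* be more rude */
    waitpid(PID, &status, 0); /* reap it so no zombie is left */
}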
I encountered this problem while doing the shell lab from the book CSAPP. The lab asks you to implement your own version of a shell to some specification, one part of which is:
Typing ctrl-c (ctrl-z) should cause a SIGINT (SIGTSTP) signal to be
sent to the current foreground job, as well as any descendents of that
job (e.g., any child processes that it forked). If there is no
foreground job, then the signal should have no effect.
So you should complete one of the given functions, called sigint_handler, which is supposed to catch the SIGINT signal and send it along to the foreground job. Below is a piece of code I found online (the code passed the correctness check):
void sigint_handler(int sig)
{
int olderrno = errno;
pid_t pid = fgpid(jobs);
if (pid != 0)
kill(-pid, sig);
errno = olderrno;
return;
}
What I don't understand is this: if SIGINT is sent using kill, then the descendants of the foreground job will also use this handler to catch the SIGINT signal, right? So it looks like a kind of recursive call to me. How does this actually work? Thanks for helping me.
The handler is installed in the main function:
signal(SIGINT, sigint_handler); /* ctrl-c */
and fgpid returns the PID of the current foreground job, or 0 if there is no such job.
Once a child process calls execve(), the child's (usually short-lived) initial address space is freed/released and replaced with space for the specified executable image; now the child no longer has a copy or access to the parent's data or text, like signal handlers.
Now consider a process group associated with the controlling terminal (tty). When a user types CTRL-C (or CTRL-\ or CTRL-Z), the tty driver posts a signal to the one or more processes that are members of the associated process group. The result of delivering a signal is the system default action unless a process has established a different signal disposition (signal(), sigaction(), or related).
The posted code excerpt indicates a relayed event: user types a CTRL-C, tty driver posts a SIGINT to the shell, shell's handler looks for a foreground job, calls kill() with a negative pid to post a signal to members of that process-group.
For related info see these man pages:
man setpgrp
man tty_ioctl (symbols: TIOCSCTTY, TIOCGPGRP, TIOCSPGRP)
alternates:
man tcgetpgrp tcsetpgrp
I am trying to implement a shell program and I want the shell program to ignore SIGINT (Ctrl+C). But in my program the child also ignores the SIGINT signal, which it should not, because exec should take the child process to another program and that program should handle SIGINT by default. What should I do so that the child process terminates when Ctrl+C is pressed?
Newly edited: after I put signal(certain_signal, SIG_DFL) in the block of my child process, my code works fine. But I am still confused about how this works. Does this mean that signals as well as signal disposition can both propagate through execute command?
int main(void){
signal(SIGINT, SIG_IGN);
pid_t result = fork();
if(result == 0){
//child:
//exec some programs
}
else{
waitpid(result, NULL, 0);
//do something
}
}
I believe you have misunderstood slightly how exec modifies signal dispositions. In the Linux exec man page, for example(1), it states that (my emphasis):
All process attributes are preserved during an execve(), except the following:
The dispositions of any signals that are being caught are reset to the default (signal(7)).
<Lots of other irrelevant stuff, in the context of this question>
Signals that are being caught are not the same as signals that are being ignored, as evidenced by the signal man page:
Using these system calls, a process can elect one of the following behaviors to occur on delivery of the signal:
perform the default action;
ignore the signal; or
catch the signal with a signal handler, a programmer-defined function that is automatically invoked when the signal is delivered.
That actually makes some sense because, while ignoring a signal can propagate through the exec call, a signal handler cannot - the function that is meant to handle the signal has been replaced by the exec call, so attempting to call it would most likely be catastrophic.
You can see this behaviour of inheriting the "ignore" disposition by compiling the following two programs, qqp.c:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
int main (void) {
signal (SIGINT, SIG_IGN);
puts("Parent start");
if (fork() == 0)
execl ("./qqc", "qqc", (char *)NULL);
wait(0);
sleep (1);
puts("Parent end");
return 0;
}
and qqc.c:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
int main (void) {
//signal (SIGINT, SIG_DFL);
puts("Child start");
sleep (60);
puts("Child end");
return 0;
}
Note that you could also change the disposition in the first code sample, between the fork and the exec. This would be preferable in cases where you don't actually control what the second code sample will do (such as if you're calling an executable you didn't compile).
Running qqp, neither the parent nor child will exit prematurely no matter how many times you press CTRL-C. But uncomment the line that reverts to default behaviour and you can break out of the child easily.
So, if you want your child to revert to the default behaviour, you need to do that in the child itself, with something like:
signal (SIGINT, SIG_DFL);
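For the case where you do not control the program being executed, the reset can go between fork and exec. The following is a minimal sketch of that idea; run_ignoring_sigint is a hypothetical helper name, and its reset_sigint flag exists only so that both behaviours can be compared.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the caller ignores SIGINT, like the shell in the
 * question, and then runs argv. If reset_sigint is non-zero, the child
 * restores the default action between fork() and exec.
 * Returns the wait status, or -1 on error. */
int run_ignoring_sigint(char *const argv[], int reset_sigint)
{
    void (*old)(int) = signal(SIGINT, SIG_IGN);
    int status = -1;

    pid_t pid = fork();
    if (pid == 0) {
        if (reset_sigint)
            signal(SIGINT, SIG_DFL);   /* otherwise SIG_IGN survives the exec */
        execvp(argv[0], argv);
        _exit(127);                    /* exec failed */
    }
    if (pid > 0) {
        while (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR) {
                status = -1;
                break;
            }
        }
    }
    signal(SIGINT, old);               /* put the caller's disposition back */
    return status;
}
```

One way to observe the difference is to run a command that sends SIGINT to itself, for example /bin/sh -c 'kill -INT $$; exit 0'. Without the reset the command should carry on and exit normally; with the reset it should be terminated by the signal.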
(1) POSIX has a little more detail on what happens:
Signals set to the default action (SIG_DFL) in the calling process image shall be set to the default action in the new process image. Except for SIGCHLD, signals set to be ignored (SIG_IGN) by the calling process image shall be set to be ignored by the new process image. Signals set to be caught by the calling process image shall be set to the default action in the new process image (see <signal.h>). If the SIGCHLD signal is set to be ignored by the calling process image, it is unspecified whether the SIGCHLD signal is set to be ignored or to the default action in the new process image.
And, just on your edit that my proposed solution works, but that it raises another question for you:
Does this mean that signals as well as signal disposition can both propagate through execute command?
Signals themselves do not propagate through the exec call, a signal is actually the "interrupt" being generated. That's different to a signal handler (code to handle a signal) or the signal disposition (what to do when the signal occurs). As shown above, dispositions may survive the exec call but handlers cannot. Signals also do not.
What you're seeing when you press CTRL-C and multiple processes are affected has nothing to do with inheriting signals across the exec boundary, it's more to do with the terminal stuff.
A signal delivered to an individual process will not affect any of its child processes. However, pressing CTRL-C does not send a signal to an individual process. The POSIX terminal interface has a concept of controlling terminals and process groups:
Each process also is a member of a process group. Each terminal device records a process group that is termed its foreground process group. The process groups control terminal access and signal delivery. Signals generated at the terminal are sent to all processes that are members of the terminal's foreground process group.
I am writing a mini-shell(no, not for school :P; for my own enjoyment) and most of the basic functionality is now done but I am stuck when trying to handle SIGTSTP.
Supposedly, when a user presses Ctrl+Z, SIGTSTP should be sent to the Foreground process of the shell if it exists, and Shell should continue normally.
After creating each process(if it's a Foreground process), the following code waits:
if(waitpid(pid, &processReturnStatus, WUNTRACED)>0){//wait stopped too
if(WIFEXITED(processReturnStatus) || WIFSIGNALED(processReturnStatus))
removeFromJobList(pid);
}
And I am handling the signal as follows:
void sigtstpHandler(int signum)
{
signum++;//Just to remove gcc's warning
pid_t pid = findForegroundProcessID();
if(pid > -1){
kill(-pid, SIGTSTP);//Sending to the whole group
}
}
What happens is that when I press Ctrl+Z, the child process does indeed get suspended (I used ps -all to view the state of the processes), but my shell hangs at waitpid; it never returns, even though I passed the WUNTRACED flag, which as far as I understand is supposed to make waitpid return when the process is stopped too.
So what could I possibly have done wrong? Or did I understand waitpid's behavior incorrectly?
Notes:
-findForegroundProcessID() returns the right pid; I double checked that.
-I am changing each process's group right after I fork
-Handling Ctrl+C is working just fine
-If I use another terminal to send SIGCONT after my shell hangs, the child process resumes its work and the shell reaps it eventually.
-I am catching SIGTSTP which as far as I read(and tested) can be caught.
-I tried using waitid instead of waitpid just in case, problem persisted.
EDIT:
void sigchldHandler(int signum)
{
signum++;//Just to remove the warning
pid_t pid;
while((pid = waitpid(-1, &processReturnStatus, 0)) > 0){
removeFromJobList(pid);
}
if(errno != ECHILD)
unixError("kill error");
}
My SIGCHLD handler.
SIGCHLD is delivered for stopped children. The waitpid() call in the signal handler - which doesn't specify WUNTRACED - blocks forever.
You should probably not have the removeFromJobList() processing in two different places. If I had to guess, it sounds like it touches global data structures, and doesn't belong in a signal handler.
Waitpid is not returning because you are not setting a SIGCHLD handler (which I sent you earlier). You have child processes that are not getting reaped. Furthermore, waitpid needs to be in a while loop, not an if (I also sent you that).
The only signal you are supposed to catch is SIGCHLD. The reason is that if your processes are forked properly, the kernel will send the signal to the foreground process, and it will terminate or stop it, or do whatever the signal's proper action is.
When process groups are not set correctly, signals will get sent to the wrong process. One way to test that is by running a foreground process and hitting Ctrl-Z. If your entire shell exits, then the Ctrl-Z signal is getting sent to the entire shell. This means you did not put the new process in a new process group and give it the terminal.
Now here's what you need to do if your Ctrl-Z signal is stopping your entire shell. Once you fork a process, in the child process:
- Set the process in its own group using setpgid.
- Give it a sane terminal by blocking SIGTTOU and then giving it the terminal using tcsetpgrp.
In the parent:
- Also set the child's process group using setpgid. This is because you have no idea whether the child or the parent will execute first, so this avoids a race condition. It doesn't hurt to set it twice.
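The double setpgid call could look roughly like this. It is a sketch with hypothetical helper names (spawn_in_own_group, end_job) and it leaves out the terminal handoff with tcsetpgrp.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv in a new process group whose ID equals
 * the child's PID. Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_in_own_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);             /* child: covers the case where it runs first */
        execvp(argv[0], argv);
        _exit(127);                /* exec failed */
    }
    if (pid > 0)
        setpgid(pid, pid);         /* parent: result ignored, the call can fail
                                      once the child has already exec'd */
    return pid;
}

/* Hypothetical helper: kills every process in the job's group and reaps the
 * group leader. Returns the wait status, or -1 on error. */
int end_job(pid_t pgid)
{
    int status;
    pid_t r;
    if (kill(-pgid, SIGKILL) == -1)
        return -1;
    do {
        r = waitpid(pgid, &status, 0);
    } while (r == -1 && errno == EINTR);
    return r == -1 ? -1 : status;
}
```

When spawn_in_own_group() returns, the group already exists whichever process ran first, so the parent can safely pass the pid to kill(-pid, ...) or tcsetpgrp() straight away.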