I am writing a mini-shell(no, not for school :P; for my own enjoyment) and most of the basic functionality is now done but I am stuck when trying to handle SIGTSTP.
Supposedly, when a user presses Ctrl+Z, SIGTSTP should be sent to the Foreground process of the shell if it exists, and Shell should continue normally.
After creating each process(if it's a Foreground process), the following code waits:
if(waitpid(pid, &processReturnStatus, WUNTRACED)>0){//wait stopped too
if(WIFEXITED(processReturnStatus) || WIFSIGNALED(processReturnStatus))
removeFromJobList(pid);
}
And I am handling the signal as follows:
void sigtstpHandler(int signum)
{
signum++;//Just to remove gcc's warning
pid_t pid = findForegroundProcessID();
if(pid > -1){
kill(-pid, SIGTSTP);//Sending to the whole group
}
}
What happens is that when I press Ctrl+Z, the child process does get suspended indeed(using ps -all to view the state of the processes) but my shell hangs at waitpid it never returns even though I passed WUNTRACED flag which as far as I understood is supposed to make waitpid return when the process is stopped too.
So what could I have possible done wrong? or did I understand waitpid's behavior incorrectly?
Notes:
-findForegroundProcessID() returns the right pid; I double checked that.
-I am changing each process's group when right after I fork
-Handling Ctrl+C is working just fine
-If I use another terminal to send SIGCONT after my shell hangs, the child process resumes its work and the shell reaps it eventually.
-I am catching SIGTSTP which as far as I read(and tested) can be caught.
-I tried using waitid instead of waitpid just in case, problem persisted.
EDIT:
void sigchldHandler(int signum)
{
signum++;//Just to remove the warning
pid_t pid;
while((pid = waitpid(-1, &processReturnStatus, 0)) > 0){
removeFromJobList(pid);
}
if(errno != ECHILD)
unixError("kill error");
}
My SIGCHLD handler.
SIGCHLD is delivered for stopped children. The waitpid() call in the signal handler - which doesn't specify WUNTRACED - blocks forever.
You should probably not have the removeFromJobList() processing in two different places. If I had to guess, it sounds like it touches global data structures, and doesn't belong in a signal handler.
Waitpid is not returning because you are not not setting a sigchld handler (which I sent you earlier). You have child processess that are not getting reaped. Furthermore, waitpid needs to be in a while loop, not an if (also sent you that).
The only signal you are supposed to catch is SIGCHLD. The reason being is that if your processes are forked properly, the kernel will send that signal to the foreground process and it will terminate it or stop it or do whatever the signal is properly.
When process groups are not set correctly, signals will get sent to the wrong process. One way to test that is by running a foreground process and hitting Ctrl-Z. If your entire shell exists, then Ctrl-Z signal is getting sent to the entire shell. This means you did not set the new process in a new process group and gave it a terminal.
Now here's what you need to do if your Ctrl-Z signal is stopping your entire shell. Once you fork a process, in the child process:
- Set the process in its own group using setpgid.
- Give it a sane terminal by blocking SIGTTOU and then giving it the terminal using tcsetpgrp.
In the parent:
- Also set its child process using setpgid. This is because you have no idea if the child or the parent will execute first, so this avoids a race condition. It doesn't hurt to set it twice.
Related
I need to spawn a long-running child process and then kill it from the parent code. At the moment I do like this:
int PID = fork();
if (PID == 0) {
execl("myexec", "myexec", nullptr);
perror("ERROR");
exit(1);
}
// do something in between and then:
kill(PID, SIGKILL);
This does the job in the sense that the child process is stopped, but then it remains as a zombie. I tried to completely remove it by adding:
kill(getpid(), SIGCHLD);
to no avail. I must be doing something wrong but I can't figure out what, so any help would be greatly appreciated. Thanks.
signal(SIGCHLD, SIG_IGN);
kill(getpid(), SIGCHLD);
Presto. No zombie.
By ignoring SIGCHLD we tell the kernel we don't care about exit codes so the zombies just go away immediately.
You have been answered with:
signal(SIGCHLD, SIG_IGN);
to ignore the signal sent to the parent when a child dies. This is an old mechanism to avoid zombies, but zombies are your friends, as my answer will explain.
The zombies are not a bug, but a feature of the system. They are there to complete the fork(2), wait(2), exit(2), kill(2) group of system calls.
When you wait(2) for a child to die, the kernel tests if there's a child running with the characteristics you state in the wait(2). If it exists, the wait(2) will block, because the wait(2) system call is the one used in unix to give the parent the exit status of the waited child. If you use wait() and you have done no fork() a new child previously, wait() should give you an error, because you are calling wait with no fork (i'll stop boldening the system calls in this discussion from here on) but what happens if the parent did a fork but the child died before the parent was capable of making a wait. Should this be taken as an error? No. The system maintains the process table entry for the child proces, until one of two things happen: The parent dies (then all children processess get orphaned, being adopted by process id 1 ---init/systemd--- which is continously blocked in wait calls; or the parent does a wait, in which case the status of one (or the one requested) of the children is reported.
So in a proper usage of the system, it is possible (or necessary) to make a wait call for each fork you make. if you do more waits than forks, you get errors... if you make more forks than waits, you get zombies. In order to compensate this, your code should be changed to make the appropiate wait call.
kill(PID, SIGINT); /* don't use SIGKILL in the first time, give time to your process to arrange its last will before dying */
res = waitpid(PID, &status, 0);
And this will allow the child to die normally. The child is going to die, because you killed it (except if the child has decided to ignore the signal you send to it)
The reason for no race condition here (the child could die before is is wait()ed for) is the zombie process. Zombie processes are not proper processes, they don't accept signals, it is impossible to kill them, because they are dead already (no pun here ;)). They only occupy the process table slot, but no resource is allocated to them. When a parent does a wait call, if there's a zombie, it will be freed and the accounting info will be transferred to the parent (this is how the accounting is done), including the exit status, and if there isn't (because it died prematurely and you had invoked the above behaviour) you will get an error from wait, and the accounting info will be tranferred to init/systemd, which will cope for this. If you decide to ignore the SIGCHLD signal, you are cancelling the production of zombies, but the accounting is being feed in the wron way to init/systemd, and not accounted in the parent. (no such process can be waited for) you cannot distinguish if the wait fails because the child process died or because you didn't spawn it correctly. More is to come.
Let's say that the child cannot exec a new program and it dies (calling exit()). When you kill it, nothing happens, as there's no target process (well, you should receive an error from kill call, but I assume you are not interested in the kill result, as it is irrelevant for this case, you are interested in the status of the child, or how did the child died. This means you need to wait for it. if you get a normal exit, with a status of 1, (as you did an exit in case exec() fails) you will know that the child was not able to exec (you still need to distinguish if the 1 exit code was produced by the child or by the program later run by the child). If you successfully killed the child, you should get a status telling you that the child was killed with signal (the one you sent) and you will know that your code is behaving properly.
In case you don't want to block your parent process in the wait system call (well, your child program could have decided to ignore signals and the kill had no effect), then you can substitute the above by this:
kill(PID, SIGINT);
res = waitpid(PID, &status, WNOHANG);
that will not block the parent, in the case the child program has decided to ignore the signal you send to it. In this case, if wait returns -1 and errno value EINTR, then you know that your child has decided to ignore the signal you sent to it, and you need help from the operator (or be more drastic, e.g. killing it with SIGKILL).
A good approach should be
void alarm_handler()
{
}
...
kill(PID, SIGINT); /* kill it softly (it's your child, man!!) */
void *saved = signal(SIGALRM, alarm_handler);
alarm(3); /* put an awakener, you will be interrupted in 3s. */
res = waitpid(PID, &status, 0);
signal(SIGALRM, saaved); /* restore the previous signal handler */
if (res == -1 && errno == EINTR) {
/* we where interrupted by the alarm, and child didn't die. */
kill(PID, SIGKILL); /* be more rude */
}
I am writing a shell in C and I am trying to add signal handling. In the shell, fork() is called and the child process executes a shell command. The child process is put into its own process group. This way, if Ctrl-C is pressed when a child process is in the foreground, it closes all of the processes that share the same process group id. The shell executes the commands as expected.
The problem is the signals. When, for example, I execute "sleep 5", and then I press Ctrl-C for SIGINT, the "shell>" prompt comes up as expected but the process is still running in the background. If I quickly run "ps" after I press Ctrl-C, the sleep call is still there. Then after the 5 seconds are up and I run "ps" again, it's gone. The same thing happens when I press Ctrl-Z (SIGTSTP). With SIGTSTP, the process goes to the background, as expected, but it doesn't pause execution. It keeps running until it's finished.
Why are these processes being sent to the background like this and continuing to run?
Here is the gist of my code...
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int status;
void sig_handler_parent()
{
printf("\n");
}
void sig_handler_sigchild(int signum)
{
waitpid(-1, &status, WNOHANG);
}
int main()
{
signal(SIGCHLD, sig_handler_sigchild);
signal(SIGINT, sig_handler_parent);
signal(SIGQUIT, sig_handler_parent);
signal(SIGTERM, sig_handler_parent);
signal(SIGCONT, sig_handler_parent);
signal(SIGTSTP, sig_handler_parent);
while (1)
{
printf("shell> ");
// GET COMMAND INPUT HERE
pid = fork();
if (pid == 0)
{
setpgid(getpid(), getpid());
execvp(cmd[0], cmd);
printf("%s: unknown command\n", cmd[0]);
exit(1);
}
else
waitpid(0, &status, WUNTRACED);
}
return 0;
}
p.s. I have already tried setting all of the signal handlers to SIG_DFL before the exec command.
The code you provide does not compile, and an attempt to fix it shows
that you omitted a lot. I am only guessing.
In order to bring you forward, I'll point out a number of facts that
you might have misunderstood. Together with a couple of documentation
links, I hope this is helpful.
Error Handling
First: please make a habit of handling errors, especially when you
know there's something that you don't understand. For example, the
parent (your shell) waits until the child terminates,
waitpid(0, &status, WUNTRACED);
You say,
When, for example, I execute "sleep 5", and then I press Ctrl-C for
SIGINT, the "shell>" prompt comes up as expected but the process is
still running in the background.
What actually happens is that once you press Ctrl-C, the parent (not the
child; see below for why) receives SIGINT (the kernel's terminal
subsystem handles keyboard input, sees that someone holds "Ctrl" and
"C" at the same time, and concludes that all processes with that
controlling terminal must be sent SIGINT).
Change the parent branch to,
int error = waitpid(0, &status, WUNTRACED);
if (error != 0)
perror("waitpid");
With this, you'd see perror() print something like:
waitpid: interrupted system call
You want SIGINT to go to the child, so something must be wrong.
Signal Handlers, fork(), and exec()
Next, what happens to your signal handlers across fork() and
exec()?
The signal overview man
page states,
A child created via fork(2) inherits a copy of its parent's signal
dispositions. During an execve(2), the dispositions of handled
signals are reset to the default; the dispositions of ignored
signals are left unchanged.
So, ideally, what this means is that:
The parent (shell) sees SIGINT, as observed above, and prints
"interrupted system call".
The child's signal handlers are reset back to their defaults. For SIGINT,
this means to terminate.
You do not fiddle with the controlling terminal, so the child
inherits the controlling terminal of the parent. This means that
SIGINT is delivered to both parent and child. Given that the
child's SIGINT behavior is to terminate, I'd bet that no process is
left running.
Except when you use setpgid() to create a new process group.
Process Groups, Sessions, and Controlling Terminal
Someone once called me a UNIX greybeard. While this is true form a
visual point of view, I must reject that compliment because I rarely
hang around in one of the darkest corners of UNIX - the terminal
subsystem. Shell writers have to understand that too though.
In this context, it's the "NOTES" section of the setpgid() man
page. I suggest
you read that, especially where it says,
At any time, one (and only one) of the process groups in the session
can be the foreground process group for the terminal; (...)
The shell (bash maybe) from which you start your shell program has
done so for the foreground invocation of your program, and marked that
as "foreground process group". Effectively this means, "Please, dear
terminal, whenever someone presses Ctrl-C, send a SIGINT to all
processes in that group. I (your parent) just sit and wait (waitpid()) until all is over, and will take control again then.".
You create a process group for the child, but don't tell the terminal
about it. What you want is to
Detach the parent from the terminal.
Set the child process group as the foregroud process group of the terminal.
Wait for the child (you already do).
Regain terminal foreground.
Further down in the "NOTES" section of said man page, they give links
to how that is done. Follow those, read thoroughly, try out things,
and make sure you handle errors. In most cases, such errors are signs
of misunderstanding. And, in most cases, such errors are fixed by
re-reading the documentation.
Are you sure that your child process is actually receiving the signals from your tty? I believe you need to make a call to tcsetpgrp to actually tell the controlling terminal to send signals to the process group of your child process.
For example, after you call fork, and before exec, try this from within your child.
tcsetpgrp(STDIN_FILENO, getpid())
Here is the man page for tcsetpgrp(3)
I'm writting a toy shell for university and I have to update the status of a background process when it ends. So I came up with the idea of making the handler of SIGCHLD do that, since that signal is sent when the process ends. The problem is that in order to implement the command jobs, I have to update the status from "running" to "terminated" and first I have to find that specific process in the array I have dedicated to it, and one way to do it is by searching by pid, since the array stores the information that is display in jobs. Each entry stores the process pid, the status (which is a string) and the command itself.
Now the question is:
Is there a way to get the pid of the process that called the signal when it ended?
Right now this is what my handler function looks like:
void handler(int sig){
int child_pid;
child_pid = wait(NULL);
//finds the process with a pid identical
//to child_pid in the list and updates its status
...
}
Since wait(NULL) returns the pid of the first process that ends since it's called, the status is only updated when another background process ends and therefore the wrong process status is updated.
We haven't been tought many things from the wait() and waitpid()functions apart from that they waits for a process to end, so any insight may be helpful.
While it is not a good idea to use wait in a signal handler, you can do the following to accomplish what you are trying to do.
void handler(int sig){
int child_pid;
int status;
child_pid = waitpid(-1, &status, WUNTRACED | WNOHANG);
if(child_pid > 0)
{
// your code
// make sure to deal with the cases of child_pid < 0 and child_pid == 0
}
}
What you are doing is not technically wrong, however, it would be better to use waitpid. When you use waitpid(-1,...) it works similarly to if you used wait(...) and will not only wait for a specified process but any process that terminates. The main difference is that you can specify WUNTRACED which will suspend execution until a process in the wait set becomes either terminated or stopped. The WNOHANG will tell waitpid to not suspend the execution of the process. You don't want your handler to be suspended.
If multiple signals are sent to the same process i.e. due to multiple children terminating at the same time, then it will seem like only one signal was sent because the signal handler will not be executed again. This is due to how signals are sent; when a signal is created it is put into an exception table for the process to then "receive" it; signals do not use queues. To account for this, you will need to iterate over the waitpid(-1,...) call until it returns 0 to make sure you are reaping all of the terminated children.
Additionally, do watch out for where else you are reaping the child (note that if the child has already been reaped then waitpid will return 0 if you use the WNOHANG flag). I would assume that this is what is causing the behavior you are seeing of the status only updating when another background process ends. For example, because you are making a toy shell I assume that you are waiting for the foreground processes somewhere, and if you use the wait function there as well as in your handler you could get 1 of two things to happen. The child gets reaped in the 'wait for foreground process' method and then when the handler is executed there is nothing for it to reap. And 2, the child gets reaped in the handler method, and then the 'wait for foreground process' never exits.
I'm writing a basic unix shell in C and I would like to catch Cntrl-C signals in the shell and pass them to foreground processes only, but not to background processes. The shell itself should keep running (and it does), and the Background processes should ignore a Cntrl-C and only be killed by a kill signal sent specifically to them, perhaps via a command-line "kill pid". Both foreground and background processes, however, should trigger the handler with SIGCHLD. Right now, however, the shell catches the Cntrl-C signal, and seems to correctly identify that there is no foreground process to pass the signal to, but the background process still dies.
I tried setting the group id of the background process to something else, and that solves the problem, but it creates a new problem. When I do that, my signal handler no longer catches the signal when the background process completes.
So far, I've looked at the man pages for SIGINT, I've read about 20 SO answers, I've tried setting the group id of the child to something different than the parent (solves the problem, but now child can no longer send SIGCHLD to parent), and I've checked that childid != foreground process and foregroundProcess == 0 when I'm running a background process. But still the background process gets killed. Any ideas?
I think my problem is somewhere in my signal handler, but not really sure:
in main:
struct sigaction sa;
sa.sa_handler = &handleSignal; /*passing function ref. to handler */
sa.sa_flags = SA_RESTART;
sigfillset(&sa.sa_mask); /*block all other signals while handling sigs */
sigaction(SIGUSR1, &sa, NULL);
sigaction(SIGINT, &sa, NULL);
sigaction(SIGCHLD, &sa, NULL);
sigaction(SIGTERM, &sa, NULL);
handleSignal looks like this:
void handleSignal(int signal){
int childid;
switch (signal) {
/*if the signal came from a child*/
case SIGCHLD:
/*get the child's id and status*/
childid = waitpid(-1,&childStatus,0);
/*No action for foreground processes that exit w/status 0 */
/*otherwise show pid & showStatus */
if ((childid != foregroundProcess)){
printf("pid %i:",childid);
showStatus(childStatus);
fflush(stdout);
}
break;
/* if signal came from somewhere else, pass it to foreground child */
/* if one exists. */
default:
printf("Caught signal: %i and passing it", signal);
printf(" to child w/pid: %i\n\n:", foregroundProcess);
fflush(stdout);
/*If there is a child, send signal to it. */
if (foregroundProcess){
printf("trying to kill foreground.\n");
fflush(stdout);
kill(foregroundProcess, signal);
}
}
}
Found an answer to my own question. I had already tried changing the group id of the background child process using setpid(0,0); and this worked, but created a different problem. After that call, I was no longer catching SIGCHLD signals from the child in the parent. This is because once the process group of the child is changed, it is essentially no longer connected to the parent for signalling purposes. This solved the problem of the background processes catching Cntrl-C (SIGINT) signals from the parent (undesired behavior), but prevented the background process from signalling the parent when complete. Solved one problem only to create another.
Instead, the solution was to detect whether a child was about to be created as a foreground or a background process, and if background, tell it to ignore the SIGINT signal: signal(SIGINT, SIG_IGN);
Since it's probably the first process on the tty, the kernel will send the signal to all its children when it sends a kernel-initiated SIGINT, SIGHUP, or SIGQUIT. See Terminate sudo python script when the terminal closes for some more details on that, and ideas for tracing/debugging what's happening to make sure you have it right.
An alternative to starting bg child processes out ignoring SIGINT (and SIGQUIT, and maybe SIGHUP?) is to avoid having the kernel deliver those signals to the shell.
I forget, but I think the kernel only delivers SIGINT to the current foreground process. Your shell won't get it if cat or something is running in the foreground. So you only have to worry about what happens while the shell is in the foreground. (I'm only about 80% sure of this, though, and this idea is useless if the shell (and thus all its children) always receive the SIGINT.)
If you disable the tty's signal-sending when the user is editing a command prompt, you can avoid having the shell receive SIGINT. Probably the best way is to do the equivalent of stty -isig, using tcsetattr(3)
// disable interrupts
struct termios term_settings;
tcgetattr(fd, &term_settings); // TODO: check errors
term_settings.c_lflag &= ~ISIG; // clear the interactive signals bit
tcsetattr(fd, TCSANOW, &term_settings);
// do the reverse (settings |= ISIG) after forking, before exec
If you strace that, you'll just see ioctl system calls, because that's the system call the termios library functions are implemented on top of.
I guess this leaves a tiny time window between a child process ending, and wait() returning and the shell disabling terminal interrupts.
IIRC, I read something about it being tricky for bash to keep track of when to swap the terminal between raw (for line editing) and cooked (for commands to be run), but I think that's only because of job control. (^z / fg)
I'm want to create a lot of child processes using the fork > exec procedure. Many processes are ending very fast (in less than two minutes, some even earlier).
My first problem is, I put the spawn process into the background with
./spawnbot > logging.txt
[CTRL+Z]
bg 1
disown
So far so good. Now I don't see any of the spawnbot's messages anymore and they go straight into the logging.txt. However, whenever a new child is created I see all the info about that child in my console again.. I now wanted to start each child with it's own pipe - is there a better way to not have children post their output messages all over the console? Should I just redirect it to /dev/null or is this done with some flag in C?
Secondly, all the children don't really get killed. I have a lot of processes in my ps -ef. What can I do about that? How do I d
First your second question!
Your children stay in 'zombie' mode because the kernel thinks you might still want to retrieve a return value from them..
If you have no intention to get return values from your child processes, you should set the SIGCHLD signal handler in the parent process to SIG_IGN to have the kernel automatically reap your children.
signal(SIGCHLD, SIG_IGN);
The first question depends a it on your implementation..
But general speaking, just after you fork() you should use close() to close the old file descriptors for 0 and 1 and then use dup2() to set them to your wanted values.. No time for an example right now, but hope this pushes you in the right direction..
Your child processes are getting killed. Defunct processes are also called zombie processes; zombies are dead! A zombie process is nothing but an entry in the process table, it doesn't have any code or memory.
When a process dies (by calling _exit, or killed by a signal), it must be reaped by its parent. Every resource used by the process other than the entry in the process table disappears. The parent must call wait or waitpid. Once the parent has been notified of the child process's death, and has had the opportunity to read the child's exit status, the child's entry in the process table disappears as well: the zombie is reaped.
If you never want to be notified of your children's death, ignore the SIGCHLD signal; this tells the kernel that you're not interested in knowing the fate of your children and the zombie will be reaped automatically.
signal(SIGCHLD, SIG_IGN)
If you only want to be notified of your children's deaths in specific circumstances, call sigaction with the SA_NOCLDWAIT flag. When a child dies, if the parent is executing one of the wait family of functions, it'll be notified of the child's death and be told the exit status; otherwise the child's exit status will be discarded.
struct sigaction sa;
sa.sa_handler = &my_sigchld_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_NOCLDWAIT;
sigaction(SIGCHLD, &sa, NULL);
Regarding the output, your children write to the same places as the parent unless you've explicitly redirected them (with close and open, or dup, or a number of other possibilities). Your children are probably printing diagnostic messages to standard error (that's what it's for, after all).
./spawnbot >logging.txt 2>&1
In addition, since you seem to want to detach the children from the terminal, you probably want to make sure they don't receive a SIGHUP if you kill the terminal. So use nohup:
nohup ./spawnbot >logging.txt 2>&1 &
disown