C: Exec/fork > Defunct processes - c

I'm want to create a lot of child processes using the fork > exec procedure. Many processes are ending very fast (in less than two minutes, some even earlier).
My first problem is, I put the spawn process into the background with
./spawnbot > logging.txt
[CTRL+Z]
bg 1
disown
So far so good. Now I don't see any of the spawnbot's messages anymore and they go straight into the logging.txt. However, whenever a new child is created I see all the info about that child in my console again.. I now wanted to start each child with it's own pipe - is there a better way to not have children post their output messages all over the console? Should I just redirect it to /dev/null or is this done with some flag in C?
Secondly, all the children don't really get killed. I have a lot of processes in my ps -ef. What can I do about that? How do I d

First your second question!
Your children stay in 'zombie' mode because the kernel thinks you might still want to retrieve a return value from them..
If you have no intention to get return values from your child processes, you should set the SIGCHLD signal handler in the parent process to SIG_IGN to have the kernel automatically reap your children.
signal(SIGCHLD, SIG_IGN);
The first question depends a it on your implementation..
But general speaking, just after you fork() you should use close() to close the old file descriptors for 0 and 1 and then use dup2() to set them to your wanted values.. No time for an example right now, but hope this pushes you in the right direction..

Your child processes are getting killed. Defunct processes are also called zombie processes; zombies are dead! A zombie process is nothing but an entry in the process table, it doesn't have any code or memory.
When a process dies (by calling _exit, or killed by a signal), it must be reaped by its parent. Every resource used by the process other than the entry in the process table disappears. The parent must call wait or waitpid. Once the parent has been notified of the child process's death, and has had the opportunity to read the child's exit status, the child's entry in the process table disappears as well: the zombie is reaped.
If you never want to be notified of your children's death, ignore the SIGCHLD signal; this tells the kernel that you're not interested in knowing the fate of your children and the zombie will be reaped automatically.
signal(SIGCHLD, SIG_IGN)
If you only want to be notified of your children's deaths in specific circumstances, call sigaction with the SA_NOCLDWAIT flag. When a child dies, if the parent is executing one of the wait family of functions, it'll be notified of the child's death and be told the exit status; otherwise the child's exit status will be discarded.
struct sigaction sa;
sa.sa_handler = &my_sigchld_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_NOCLDWAIT;
sigaction(SIGCHLD, &sa, NULL);
Regarding the output, your children write to the same places as the parent unless you've explicitly redirected them (with close and open, or dup, or a number of other possibilities). Your children are probably printing diagnostic messages to standard error (that's what it's for, after all).
./spawnbot >logging.txt 2>&1
In addition, since you seem to want to detach the children from the terminal, you probably want to make sure they don't receive a SIGHUP if you kill the terminal. So use nohup:
nohup ./spawnbot >logging.txt 2>&1 &
disown

Related

SIGKILL to a subprocess tree on parent termination

I have a daemon application that starts several 3rd party executables (all closed-sources and non modifiable).
I would like to have all the child processes to automatically terminate when the parent exits for any reason (including crashes).
Currently, I am using prctl to achieve this (see also this question):
int ret = fork();
if (ret == 0) {
//Setup other stuff
prctl (PR_SET_PDEATHSIG, SIGKILL);
if (execve( "childexecutable" ) < 0) { /*signal error*/}
}
However, if "childexecutable" also forks and spawns "grandchildren", then "grandchildren" is not killed when my process exits.
Maybe I could create an intermediate process that serves as subreaper, that would then kill "someexecutable" when my process dies, but then wait for SIGCHLD and continue to kill child processes until none is left, but it seems very brittle.
Are there better solutions?
Creating a subreaper is not useful in this case, your grandchildren would be reparented to and reaped by init anyway.
What you could do however is:
Start a parent process and fork a child immediately.
The parent will simply wait for the child.
The child will carry out all the work of your actual program, including spawning any other children via fork + execve.
Upon exit of the child for any reason (including deathly signals e.g. a crash) the parent can issue kill(0, SIGKILL) or killpg(getpgid(0), SIGKILL) to kill all the processes in its process group. Issuing a SIGINT/SIGTERM before SIGKILL would probably be a better idea depending on what child processes you want to run, as they could handle such signals and do a graceful cleanup of used resources (including children) before exiting.
Assuming that none of the children or grandchildren changes their process group while running, this will kill the entire tree of processes upon exit of your program. You could also keep the PR_SET_PDEATHSIG before any execve to make this more robust. Again depending on the processes you want to run a PR_SET_PDEATHSIG with SIGINT/SIGTERM could make more sense than SIGKILL.
You can issue setpgid(getpid(), 0) before doing any of the above to create a new process group for your program and avoid killing any parents when issuing kill(0, SIGKILL).
The logic of the "parent" process should be really simple, just a fork + wait in a loop + kill upon the right condition returned by wait. Of course, if this process crashes too then all bets are off, so take care in writing simple and reliable code.

Don't send SIGINT on CTRL+C to child processes but don't ignore the signal itself

I'm trying to write a Task control program, very much like Supervisor.
I run some programs from a config file and let them run in the background, while in the main process I read and execute other commands.
Before fork()-ing, in the main process I call:
sigaction(SIGINT, &the_handler, NULL);
Where the_handler stores the reference of a simple print function.
When CTRL+C is pressed, the child processes are interrupted as well (which I don't want).
I could run: signal(SIGINT, SIG_IGN); after fork in child process to ignore it, but I would like to still to be able to run this command in bash: $ kill -n 2 <child_pid>, meaning, I don't want to ignore it, right?
So, how to ignore SIGINT from CTRL+C to child processes, but still be able to receive the signal in other ways? Or am I missing something?
The traditional means of doing this is to fork twice. The grand parent forks its children and then waits for them. Each child then forks and exits straight away. Because their parents have exited, the grand children become parented by pid 1. Thus signals sent to the grand parent do not get propagated to the ex-grand children.
See this answer for a bit more detail
https://stackoverflow.com/a/26418006/169346
ETA: You need to call setsid() between the two forks otherwise the grandchild is still in the same process group as the grand parent and will still receive signals that the grand parent receives.

terminal goes down with all children even when SIGKILLed but normal processes don't do the same

I would like to have the same effect in my program as the bash(terminal) does when we kill it using SIGKILL. As we know that we cannot handle SIGKILL in our progams so when ever I kill my program its children are assigned to init process, there is no way to handle it so that I can kill all my child processes and then kill the parent itself. Though when ever we kill terminal all the process created through it are killed even if we kill our terminal by SIGKILL.
For this I did some research and found the below post:
[https://unix.stackexchange.com/questions/54963/how-can-terminal-emulators-kill-their-children-after-recieving-a-sigkill][1]
The post is bit confusing, still what I got out of the post is that if the process you are killing is the process leader of the process group then all its children will be killed.
So for simplicity, I implemented below program to test if it is so:
int main()
{
printf("Curent PID: %u\n", getpid());
// make a new session
pid_t pid = setsid();
printf("New session ID: %u\n", pid);
pid = fork();
switch(pid)
{
case -1:
perror("UNable to fork the process\n");
exit(EXIT_FAILURE);
case 0:
// child process
while (1)
{
sleep(1);
}
break;
}
while (1)
{
printf("Process Leader running\n");
sleep(1);
}
return 0;
}
After running the above program when I killed the parent process the child process didn't got killed. I also modified the above program so that it does not belong to any tty, I thought might be the process leader should not be associated with any tty. I did that by the following way:
Create a normal process (Parent process)
Create a child process from within the above parent process
The process hierarchy at this stage looks like : TERMINAL -> PARENT PROCESS -> CHILD PROCESS
Terminate the the parent process.
The child process now becomes orphan and is taken over by the init process.
Call setsid() function to run the process in new session and have a new group.
then the same above code repeats.
Still when i killed the process leader children were there.
Maybe I didn't got that post on unix.stackexchange or is it the deafult behaviour in LINUX.
One way that I can implement to make all children kill is by catching each TERMINATING SIGNAL like SIGTERM, SIGHUP etc. handle them and write logic inside those signal handlers to kill child first.But still on SIGKILL I can't do anything.
Also I am interested to know that if killing parent process doesn't affect child process even if the parent is process leader or whatever then how bash(terminal) manages to kill all the child processes even if we send SIGKILL to it. Is there some extra logic written for terminals in LINUX kernel.
If there is a way to kill all child processes when parent is killed even using SIGKILL signal, I would be happy to know that too.
Manual page of kill says:
Negative PID values may be used to choose whole process groups
In my understanding to kill the whole group of processes, you have to send negative PID.
Another mechanism is causing that killing a terminal kills its child processes. Processes running from a terminal have their stdin/stdout attached to the terminal. When killing the terminal, those connections are closed and a signal (SIG_HUP) is sent to those processes. A usual program does not handle this signal and is terminated by default.
The advice from Marian is quite correct and well worth researching, but if you choose to follow that route you will likely end up with an implementation of what might be called the "hostage trick".
The hostage trick consists of your root process spawning an artificial child process which spends all its time in the stopped state. This "hostage" will be spawned immediately before the first child process which does real work in your (multi-process) program.
The hostage process is made the leader of its own process group and then enters a loop in which it stops itself with "raise(SIGSTOP)". If it is ever continued, it checks to see whether its parent has terminated (i.e. whether it has been re-parented or cannot signal its parent with the null signal (ESRCH)). If the parent has terminated, then the hostage should terminate, otherwise it should re-suspend with another "raise(SIGSTOP)".
You need to be careful about race conditions: e.g. for the re-parenting test take care to cache the parent-process-id for the hostage as the return value from "getpid()" before "fork()"-ing the hostage and also make "setpgid()" calls downstream of "fork()" in both parent and child. You then need to consider what you do if someone "kill(., SIGKILL)"s the hostage!
True, you can put a SIGCHLD handler in the parent to re-spawn it, but that requires considerable care to preserve the continuity of the identity of the hostage's process group; maybe there were other child processes at the time of the SIGKILL and the replacement hostage should go in the original process group, maybe there weren't and the original process group has evaporated.
Even if you get that right, the fact that you have put a "fork()" call in a handler for an asynchronous signal (SIGCHLD) will likely open another can of worms if your main process uses multiple threads.
Because of these difficulties I would advise against using the hostage trick unless the child processes run code over which you have no control (and to think seriously about the costs in complexity and maintainability even then). If you have control of the code of the child processes, then it is much simpler to use a "pipe()".
You create the pipe in the parent process and manage the file descriptors to ensure that the parent process is the sole writer and that each child process allocates one file descriptor to the read-side. If you do this, then the termination of the parent process (whether due to SIGKILL or any other cause) is communicated to the child processes by the EoF condition on the read side of the pipe as the last writer terminates.
If you want to treat SIGKILL specially, then you can use a protocol on the pipe whereby the parent process sends a termination message advising the children of its termination status on all normal terminations and on catchable fatal signals, and leaves the children to infer that the parent was killed by SIGKILL in the event that the read-side of the pipe delivers an EoF without a preceding termination message.
On Linux prctl(PR_SET_PDEATHSIG... will arrange for a process to receive a signal when it's parent dies, this setting is preserved over exec but not inherited by child processes.

waitpid blocking when it shouldn't

I am writing a mini-shell(no, not for school :P; for my own enjoyment) and most of the basic functionality is now done but I am stuck when trying to handle SIGTSTP.
Supposedly, when a user presses Ctrl+Z, SIGTSTP should be sent to the Foreground process of the shell if it exists, and Shell should continue normally.
After creating each process(if it's a Foreground process), the following code waits:
if(waitpid(pid, &processReturnStatus, WUNTRACED)>0){//wait stopped too
if(WIFEXITED(processReturnStatus) || WIFSIGNALED(processReturnStatus))
removeFromJobList(pid);
}
And I am handling the signal as follows:
void sigtstpHandler(int signum)
{
signum++;//Just to remove gcc's warning
pid_t pid = findForegroundProcessID();
if(pid > -1){
kill(-pid, SIGTSTP);//Sending to the whole group
}
}
What happens is that when I press Ctrl+Z, the child process does get suspended indeed(using ps -all to view the state of the processes) but my shell hangs at waitpid it never returns even though I passed WUNTRACED flag which as far as I understood is supposed to make waitpid return when the process is stopped too.
So what could I have possible done wrong? or did I understand waitpid's behavior incorrectly?
Notes:
-findForegroundProcessID() returns the right pid; I double checked that.
-I am changing each process's group when right after I fork
-Handling Ctrl+C is working just fine
-If I use another terminal to send SIGCONT after my shell hangs, the child process resumes its work and the shell reaps it eventually.
-I am catching SIGTSTP which as far as I read(and tested) can be caught.
-I tried using waitid instead of waitpid just in case, problem persisted.
EDIT:
void sigchldHandler(int signum)
{
signum++;//Just to remove the warning
pid_t pid;
while((pid = waitpid(-1, &processReturnStatus, 0)) > 0){
removeFromJobList(pid);
}
if(errno != ECHILD)
unixError("kill error");
}
My SIGCHLD handler.
SIGCHLD is delivered for stopped children. The waitpid() call in the signal handler - which doesn't specify WUNTRACED - blocks forever.
You should probably not have the removeFromJobList() processing in two different places. If I had to guess, it sounds like it touches global data structures, and doesn't belong in a signal handler.
Waitpid is not returning because you are not not setting a sigchld handler (which I sent you earlier). You have child processess that are not getting reaped. Furthermore, waitpid needs to be in a while loop, not an if (also sent you that).
The only signal you are supposed to catch is SIGCHLD. The reason being is that if your processes are forked properly, the kernel will send that signal to the foreground process and it will terminate it or stop it or do whatever the signal is properly.
When process groups are not set correctly, signals will get sent to the wrong process. One way to test that is by running a foreground process and hitting Ctrl-Z. If your entire shell exists, then Ctrl-Z signal is getting sent to the entire shell. This means you did not set the new process in a new process group and gave it a terminal.
Now here's what you need to do if your Ctrl-Z signal is stopping your entire shell. Once you fork a process, in the child process:
- Set the process in its own group using setpgid.
- Give it a sane terminal by blocking SIGTTOU and then giving it the terminal using tcsetpgrp.
In the parent:
- Also set its child process using setpgid. This is because you have no idea if the child or the parent will execute first, so this avoids a race condition. It doesn't hurt to set it twice.

Using waitpid or sigaction?

I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD and the exit status of the child etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of child or other stuff (upto the programmer) and then do a waitpid so that the child will not go zombie and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns ?
If we have both, the SIGCHLD is obtained first, so the signal handler is called first and thus its waitpid is called successfully and not the waitpid in the parent process code as follows:
my_signal_handler_for_sigchld
{
do something
tmp = waitpid(-1,NULL,0);
print tmp (which is the correct value of the child pid)
}
int main ()
{
sigaction(SIGCHLD, my_signal_handler_for_sigchld)
fork()
if (child) //do something, return
if parent // waitpid(child_pid, NULL,0); print value returned from this waitpid - it is -1
}
Appreciate if someone helps me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
You need to call the waiting syscalls like waitpid or friends -eg wait4 etc- othewise you could have zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc...) but you'll need to wait for it later.
Signal handlers are restricted to call a small set of async-signal-safe-functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside, and test it at later and safer places.

Resources