I am writing a project for class that finds zombies and reaps them in a Linux kernel.
I have found code that will create a single zombie, which gets reaped after a wait(), but my program must reap many, on the order of 1000.
I am very new to kernel manipulation/multi-threading and the resources I have found online dealing with zombies are either too technical, or ambiguous.
This is the code I am using:
pid_t child_pid;
child_pid = fork ();
if (child_pid > 0) {
sleep (60);
} else {
exit (0);
}
Once again, my question is: How should I go about creating multiple zombies, for my program to reap?
Much thanks -Jared
A zombie is simply a terminated process whose parent has not yet read its exit status (in a nutshell: the parent didn't call wait() after the child exited). It no longer runs, but it keeps its entry in the process table.
To achieve what you need, just fork a lot of processes (in a loop, for example) and never call wait().
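A minimal sketch of that loop, assuming the children should exit right away. The names spawn_zombies and reap_all are made up for this illustration, and the per-user process limit may cap how many children you actually get:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork up to `count` children that exit at once. The parent does not wait
 * for them here, so each one stays a zombie. Returns how many were forked. */
int spawn_zombies(int count)
{
    int created = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");   /* e.g. the process limit was reached */
            break;
        }
        if (pid == 0)
            _exit(0);         /* child: terminate immediately */
        created++;
    }
    return created;
}

/* Reap every child of the caller, blocking until none is left.
 * Returns how many children were reaped. */
int reap_all(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, 0) > 0)
        reaped++;
    return reaped;
}
```

Calling spawn_zombies(1000) and then sleeping for a while before reap_all() should leave time to look at the zombies with ps, where they show up in state Z (<defunct>).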
I need to spawn a long-running child process and then kill it from the parent code. At the moment I do like this:
int PID = fork();
if (PID == 0) {
execl("myexec", "myexec", nullptr);
perror("ERROR");
exit(1);
}
// do something in between and then:
kill(PID, SIGKILL);
This does the job in the sense that the child process is stopped, but then it remains as a zombie. I tried to completely remove it by adding:
kill(getpid(), SIGCHLD);
to no avail. I must be doing something wrong but I can't figure out what, so any help would be greatly appreciated. Thanks.
signal(SIGCHLD, SIG_IGN);
kill(getpid(), SIGCHLD);
Presto. No zombie.
By ignoring SIGCHLD we tell the kernel we don't care about exit codes so the zombies just go away immediately.
You have been answered with:
signal(SIGCHLD, SIG_IGN);
to ignore the signal sent to the parent when a child dies. This is an old mechanism to avoid zombies, but zombies are your friends, as my answer will explain.
The zombies are not a bug, but a feature of the system. They are there to complete the fork(2), wait(2), exit(2), kill(2) group of system calls.
When you wait(2) for a child to die, the kernel checks whether a child matching the criteria you passed to wait(2) is still running. If there is one, wait(2) blocks, because wait(2) is the system call Unix uses to give the parent the exit status of the awaited child. If you call wait() without having forked a child first, wait() gives you an error, because there is nothing to wait for (I'll stop boldening the system calls in this discussion from here on). But what happens if the parent did fork, and the child died before the parent got around to calling wait? Should this be treated as an error? No. The system keeps the process table entry for the child process until one of two things happens: the parent dies (then all its children are orphaned and adopted by process ID 1, init/systemd, which is continuously blocked in wait calls), or the parent does a wait, in which case the status of one of the children (or of the one requested) is reported.
So in proper usage of the system, you should make one wait call for each fork you make. If you do more waits than forks, you get errors; if you do more forks than waits, you get zombies. To fix this, your code should be changed to make the appropriate wait call.
kill(PID, SIGINT); /* don't use SIGKILL in the first time, give time to your process to arrange its last will before dying */
res = waitpid(PID, &status, 0);
And this will allow the child to die normally. The child is going to die because you killed it (unless the child has decided to ignore the signal you sent to it).
The reason there is no race condition here (the child could die before it is wait()ed for) is the zombie process. Zombie processes are not proper processes: they don't accept signals and it is impossible to kill them, because they are dead already (no pun here ;)). They only occupy a process table slot; no other resource is allocated to them. When a parent does a wait call and there is a zombie, the zombie is freed and its accounting info, including the exit status, is transferred to the parent (this is how the accounting is done). If there isn't one (because it died prematurely and you had invoked the above behaviour), you get an error from wait, and the accounting info is transferred to init/systemd, which copes with it. If you decide to ignore the SIGCHLD signal, you cancel the production of zombies, but the accounting is fed the wrong way, to init/systemd, and not accounted in the parent. Since wait then fails (no such process can be waited for), you cannot distinguish whether it failed because the child process died or because you didn't spawn it correctly. More is to come.
Let's say that the child cannot exec the new program and dies (calling exit()). When you kill it, nothing happens, as the target has already exited. I assume you are not interested in the result of kill, as it is irrelevant for this case; you are interested in the status of the child, that is, how the child died. This means you need to wait for it. If you get a normal exit with a status of 1 (as you call exit(1) when exec() fails), you will know that the child was not able to exec (you still need to distinguish whether the exit code 1 was produced by the child or by the program later run by the child). If you successfully killed the child, you should get a status telling you that the child was killed by a signal (the one you sent), and you will know that your code is behaving properly.
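To make the status check concrete, here is a small sketch that signals a child, waits for it and decodes the result with the standard WIFEXITED/WIFSIGNALED macros. The helper name stop_and_report and its return convention are assumptions made for this example, not part of any API:

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: signal the child, wait for it, and report how it
 * ended. Returns the exit code (0..255) after a normal exit, -signo if the
 * child was killed by a signal, or -1000 on error. */
int stop_and_report(pid_t pid, int signo)
{
    int status;

    if (kill(pid, signo) == -1)
        perror("kill");             /* not fatal: we still want the status */

    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1000;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* e.g. 1 when the exec failed */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);   /* killed by a signal */
    return -1000;
}
```

With the code from the question, a result of 1 would point at the failed exec path, while -SIGINT (or -SIGKILL) would confirm that the kill did its job.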
In case you don't want to block your parent process in the wait system call (well, your child program could have decided to ignore signals and the kill had no effect), then you can substitute the above by this:
kill(PID, SIGINT);
res = waitpid(PID, &status, WNOHANG);
This will not block the parent if the child program has decided to ignore the signal you sent to it. With WNOHANG, waitpid returns 0 when the child exists but has not terminated yet. That can mean the child ignored the signal, or simply that it has not had time to die, so check again a little later before asking the operator for help (or being more drastic, e.g. killing it with SIGKILL).
A good approach should be
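void alarm_handler(int sig)
{
    (void)sig; /* nothing to do: the handler only has to interrupt waitpid() */
}
...
struct sigaction sa, saved;
sa.sa_handler = alarm_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0; /* no SA_RESTART, so waitpid() fails with EINTR */
kill(PID, SIGINT); /* kill it softly (it's your child, man!!) */
sigaction(SIGALRM, &sa, &saved);
alarm(3); /* put an awakener, you will be interrupted in 3s. */
res = waitpid(PID, &status, 0);
alarm(0); /* cancel the alarm in case the child died in time */
sigaction(SIGALRM, &saved, NULL); /* restore the previous signal handler */
if (res == -1 && errno == EINTR) {
    /* we were interrupted by the alarm, and the child didn't die. */
    kill(PID, SIGKILL); /* be more rude */
    waitpid(PID, &status, 0); /* and reap it */
}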
I have a daemon application that starts several 3rd party executables (all closed-sources and non modifiable).
I would like to have all the child processes to automatically terminate when the parent exits for any reason (including crashes).
Currently, I am using prctl to achieve this (see also this question):
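pid_t ret = fork();
if (ret == 0) {
    // Setup other stuff
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    char *const args[] = { "childexecutable", NULL };
    // execve needs an argument vector and an environment (extern char **environ)
    if (execve("childexecutable", args, environ) < 0) { /* signal error */ }
}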
However, if "childexecutable" also forks and spawns "grandchildren", then "grandchildren" is not killed when my process exits.
Maybe I could create an intermediate process that serves as subreaper, that would then kill "someexecutable" when my process dies, but then wait for SIGCHLD and continue to kill child processes until none is left, but it seems very brittle.
Are there better solutions?
Creating a subreaper is not useful in this case; your grandchildren would be reparented to and reaped by init anyway.
What you could do however is:
Start a parent process and fork a child immediately.
The parent will simply wait for the child.
The child will carry out all the work of your actual program, including spawning any other children via fork + execve.
Upon exit of the child for any reason (including deadly signals, e.g. a crash), the parent can issue kill(0, SIGKILL) or killpg(getpgid(0), SIGKILL) to kill all the processes in its process group. Depending on what child processes you want to run, issuing SIGINT/SIGTERM before SIGKILL would probably be a better idea, as they could handle such signals and do a graceful cleanup of used resources (including children) before exiting.
Assuming that none of the children or grandchildren changes its process group while running, this will kill the entire tree of processes upon exit of your program. You could also keep the PR_SET_PDEATHSIG before any execve to make this more robust. Again, depending on the processes you want to run, a PR_SET_PDEATHSIG with SIGINT/SIGTERM could make more sense than SIGKILL.
You can issue setpgid(getpid(), 0) before doing any of the above to create a new process group for your program and avoid killing any parents when issuing kill(0, SIGKILL).
The logic of the "parent" process should be really simple, just a fork + wait in a loop + kill upon the right condition returned by wait. Of course, if this process crashes too then all bets are off, so take care in writing simple and reliable code.
Child and parent processes execute in parallel, and which one starts first depends on OS scheduling. But what can be done to always start the child before the parent?
This is the pseudo code for my problem,
int start_test()
{
pid_t pid;
pid = fork();
if(pid == 0) {
execv("XXX", XXX);
} else if(pid > 0) {
pid = fork();
if(pid == 0) {
execv("XXX", XXX);
} else {
// Do something
}
}
return 0;
}
int main()
{
start_test();
return 0;
}
I want the first execv to run before the parent creates a new process again. Every execv should happen in sequence.
I don't really know why people keep telling not to rely on this behaviour, it's actually used a lot in tracing programs (strace, ldtrace, ...).
First, fork your process and get the child pid, stop the child, and resume it in the parent:
pid_t pid = fork();
if (pid == -1)
abort();
else if (pid == 0) {
raise(SIGSTOP); // stop the child
} else {
waitpid(pid, NULL, WUNTRACED); // wait until the child is stopped
kill(pid, SIGCONT); // resume the child
}
You can achieve this with pthreads (POSIX threads), but not with processes.
Process scheduling is always in the hands of the kernel, and you cannot manipulate it explicitly. In a parallel-processing system, all runnable processes (whether child, parent or any other) are executed in parallel, and you cannot change that.
The sleep() method could work, but it is a very poor approach to follow.
1. By making use of signal handling.
Right after you fork() a new child process, sleep() or pause() the parent process. The child process will execute while the parent process waits. The child process then sends a custom signal, which the parent process handles in order to continue its execution.
(This is also cumbersome, because you need to handle the signal in the program.)
2. By using the system calls.
By making use of system calls you can control the process state (ready, suspended, terminated, etc.). Certain shell commands implicitly use signals to change the process state/priority. If you know the process ID (pid), you can do:
kill -SIGSTOP [pid]
kill -SIGCONT [pid]
And in case of c-programming you can do:
system("kill -SIGSTOP [pid]"); //pause
and
system("kill -SIGCONT [pid]"); //resume
For more reference you can open this page.
Moreover, if you can specify the actual problem where you are going to implement this, I could make a more suitable suggestion.
Use a binary semaphore with initial value 0. After fork, parent should wait on the semaphore. After child starts, it can signal the semaphore (i.e., make it 1). Then, parent's wait would be over and it will progress.
In Linux, if you want the child process to run first, you need to use the kernel.sched_child_runs_first sysctl parameter.
There is no guarantee for one process to be scheduled before another. Even if you put the parent to sleep(), it could very well happen that the child executes first, if other processes have preempted the parent right after the fork. The child and parent can very well run truly in parallel on two CPUs.
Actually, there is no value in doing so. If some kind of synchronization is required between the two processes, use an explicit mechanism like pipes/signals, etc.
In short: do not write code to rely on behaviour that is not guaranteed.
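One explicit mechanism that fits the question is a pipe whose write end is marked close-on-exec: the parent blocks in read() until the child has really exec'ed (the pipe closes and read() sees end-of-file) or has reported an exec failure. This is only a sketch; the function name spawn_and_wait_for_exec is an assumption for this example:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] in a child and return only after the child has exec'ed it.
 * Returns the child's PID, or -1 with errno set if pipe, fork or the exec
 * itself failed. */
pid_t spawn_and_wait_for_exec(char *const argv[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    /* close-on-exec: a successful exec closes the write end automatically */
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        execvp(argv[0], argv);
        int err = errno;                      /* exec failed: report why */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);                            /* parent keeps the read end only */
    int err = 0;
    ssize_t n;
    while ((n = read(fds[0], &err, sizeof err)) == -1 && errno == EINTR)
        ;                                     /* retry if a signal interrupted us */
    close(fds[0]);

    if (n > 0) {                              /* child reported an exec failure */
        waitpid(pid, NULL, 0);                /* reap it so no zombie is left */
        errno = err;
        return -1;
    }
    return pid;                               /* end-of-file: the exec succeeded */
}
```

Calling this helper twice in a row gives the ordering the question asks for: the second fork happens only after the first program has been exec'ed. It says nothing about how far that program has progressed afterwards.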
Threads provide more mechanisms to synchronize parallel code execution. You might have a look at pthread. Note that threads – different from processes – share resources like memory, etc. which may impose other problems.
Just put wait(0); inside the parent.
So parent will wait until child is done.
if (child) {
    // do whatever
}
if (parent) {
    wait(0);
    // do whatever
}
Nagios lets me configure child_processes_fork_twice=<0/1>.
The documentation says
This option determines whether or not Nagios will fork() child processes twice when it executes host and service checks. By default, Nagios fork()s twice. However, if the use_large_installation_tweaks option is enabled, it will only fork() once.
As far as I know fork() will spawn a new child process. Why would I want to do that twice?
All right, so first of all: what is a zombie process? It's a process that is dead, but whose parent was busy doing other work and so has not collected the child's exit status. In some cases the child runs for a very long time, and the parent cannot wait that long, so it continues with its work (note that the parent doesn't die; it carries on with its remaining tasks and doesn't care about the child). When such a child finally exits, it becomes a zombie.
Now let's get down to business. How does forking twice help here? The important thing to note is that the grandchild does the work that the parent process wants its child to do. The first time fork is called, the first child simply forks again and exits. This way, the parent doesn't have to wait long to collect the child's exit status (since the child's only job is to create another child and exit), so the first child doesn't remain a zombie. As for the grandchild, its parent has already died, so it is adopted by the init process, which always collects the exit status of all its child processes. The parent doesn't have to wait for very long, and no zombie process is left behind. There are other ways to avoid a zombie process; this is just a common technique. Hope this helps!
In Linux, a daemon is typically created by forking twice, with the intermediate process exiting after forking the grandchild. This has the effect of orphaning the grandchild process. As a result, it becomes the responsibility of the OS to clean up after it when it terminates. The reason has to do with what are known as zombie processes: a process that has exited keeps its process table entry until its parent collects its exit status, and the orphaned grandchild's new parent, init, always does that promptly.
Also from the documentation,
Normally Nagios will fork() twice when it executes host and service checks. This is done to (1) ensure a high level of resistance against plugins that go awry and segfault and (2) make the OS deal with cleaning up the grandchild process once it exits.
Unix Programming Faq §1.6.2:
1.6.2 How do I prevent them from occuring?
You need to ensure that your parent process calls wait() (or
waitpid(), wait3(), etc.) for every child process that terminates;
or, on some systems, you can instruct the system that you are
uninterested in child exit states.
Another approach is to fork() twice, and have the immediate child
process exit straight away. This causes the grandchild process to be
orphaned, so the init process is responsible for cleaning it up. For
code to do this, see the function fork2() in the examples section.
To ignore child exit states, you need to do the following (check your
system's manpages to see if this works):
struct sigaction sa;
sa.sa_handler = SIG_IGN;
#ifdef SA_NOCLDWAIT
sa.sa_flags = SA_NOCLDWAIT;
#else
sa.sa_flags = 0;
#endif
sigemptyset(&sa.sa_mask);
sigaction(SIGCHLD, &sa, NULL);
If this is successful, then the wait() functions are prevented from
working; if any of them are called, they will wait until all child
processes have terminated, then return failure with errno == ECHILD.
The other technique is to catch the SIGCHLD signal, and have the
signal handler call waitpid() or wait3(). See the examples section
for a complete program.
This code demonstrates how to use the double fork method to allow the grandchild process to become adopted by init, without risk of zombie processes.
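#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t p1 = fork();
    if (p1 != 0)
    {
        /* parent: reap the short-lived first child, then list processes */
        printf("p1 process id is %d\n", getpid());
        wait(NULL);
        system("ps");
    }
    else
    {
        pid_t p2 = fork();
        int pid = getpid();
        if (p2 != 0)
        {
            /* first child: exits right away, orphaning the grandchild */
            printf("p2 process id is %d\n", pid);
        }
        else
        {
            /* grandchild: will be adopted by init */
            printf("p3 process id is %d\n", pid);
        }
        exit(0);
    }
}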
The parent will fork the new child process, and then wait for it to finish. The child will fork a grandchild process, and then exit(0).
In this case, the grandchild doesn't do anything except exit(0), but could be made to do whatever you'd like the daemon process to do. The grandchild may live long and will be reclaimed by the init process, when it is complete.
How does the program below work and create a zombie process under Linux?
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
int main ()
{
pid_t child_pid;
child_pid = fork ();
if (child_pid > 0) {
sleep (60);
}
else {
exit (0);
}
return 0;
}
It creates a child and doesn't wait for it (with one of the wait* system calls). And zombies are just that: children that the parent hasn't waited for yet. The kernel has to maintain some information for them, mainly the exit status, in order to be able to return it to the parent.
The setsid() command is missing.
Every *nix process produces an exit status that must be reaped. This is supposed to be reaped by the parent process using a wait() statement, if the child is supposed to terminate first.
The setsid() call makes the calling process the leader of a new session with no controlling terminal. Reparenting to init happens when the parent terminates before the child process.
Zombies cannot be removed from the process list with kill -9, even by root; they disappear only once their parent (or init) waits for them. Inexperienced programmers sometimes omit setsid(), which will hide bugs that produce errors that would otherwise clog the disk drive.
In days of old, the system administrator would use zombies to identify inexperienced programmers that need additional training to produce good code.
The exit status harvested by init is sent to syslog when the kernel terminates a program prematurely. That exit status is used to identify the nature of the bug that caused the early termination (error conditions not handled by the programmer).
Exit status reported in this way becomes part of the syslog or klog files, which are commonly used to debug code.