Monitoring and restarting child process when fails/exits

I've created a rudimentary example of monitoring a child process and restarting it when it fails or exits. What is the preferred/more robust method of doing this? Is it good practice to continuously poll for a change in status? My understanding is that I should utilize something like SIGCHLD, but I have been unable to find any good examples.
I'm mainly an embedded C coder and this is my first attempt at understanding fork(). The purpose of this code will eventually be to monitor a call to another program using exec() and restart that program if and when it fails or finishes.
Edit:
After comments from @cnicutar I have rewritten the example below in a way I think makes more sense, in the hope that it is of use to somebody later. I would like the parent to monitor a child process whilst doing other things, and to make a new call to exec on a new fork when the child process fails/finishes. I want to try to use Unix signals to achieve this.
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    pid_t cpid;
    int status;

    cpid = fork();
    for (;;)
    {
        switch (cpid)
        {
        case -1: /* fork failure */
            exit(EXIT_FAILURE);
        case 0: /* child process */
            /* placeholder: substitute the program you want to run */
            execl("/path/to/program", "program", (char *)NULL);
            _exit(EXIT_FAILURE); /* only reached if exec fails */
        default: /* parent process */
            /* with WNOHANG, waitpid returns the child's PID only once it has terminated */
            if (waitpid(-1, &status, WNOHANG) > 0)
                cpid = fork(); /* restart */
            /* do parent stuff here... */
            break;
        }
    }
}

Adding a SIGCHLD handler won't buy you much since you already wait all the time and only do that - there's nothing else to "interrupt".
One thing I would suggest is a threshold such that the process doesn't die/start too often.
"My understanding is that I should utilize something like SIGCHLD but have been unable to find any good examples"
You use SIGCHLD to know when to wait. A typical SIGCHLD handler just does waitpid in a loop until no children are left. In your case you don't need that since your main code is a loop stopped on waitpid.
EDIT
You can find plenty of examples for SIGCHLD handling. One such example is How can I handle sigchld in C. In that code, after the while you can just fork:
while((pid = waitpid(-1, &status, WNOHANG)) > 0)
;
pid = fork();
switch (pid)
...
To reiterate, if you do this, the SIGCHLD handler will be called every time a child dies, and after you properly wait for it you can just fork another. This only makes sense if the parent has better things to do in the meantime than just block on waitpid.
Word to the wise. There are certain functions that must not be called from a signal handler lest you add difficult bugs to your program. Look up "async signal safe" functions: these and only these can be called from a signal handler. Some of the most common functions (like printf and malloc) cannot be called from a signal handler.

Related

Can a fork child determine whether it is a fork or a vfork?

Within the child process, is there any way that it determine whether it was launched as a fork with overlay memory, or a vfork with shared memory?
Basically, our logging engine needs to be much more careful (and not log some classes of activity) in vfork. In fork it needs to cooperate with the parent process in ways that it doesn't in vfork. We know how to do those two things, but not how to decide.
I know I could probably intercept the fork/vfork/clone calls, and store the fork/vfork/mapping status as a flag, but it would make life somewhat simpler if there was an API call the child could make to determine its own state.
Extra marks: Ideally I also need to pick up any places in libraries that have done a fork or vfork and then called back into our code. As for how that can happen: at least one of the libraries we use offers a popen-like API where a client callback is called from the fork child before the exec. Clearly the utility of that callback is significantly restricted in vfork.

All code not specifically designed to work under vfork() doesn't work under vfork().
Technically, you can check if you're in a vfork() child by calling mmap() and checking if the memory mapping was inherited by the parent process under /proc. Do not write this code. It's a really bad idea and nobody should be using it. Really, the best way to tell if you're in a vfork() child or not is to be passed that information. But here comes the punchline. What are you going to do with it?
The things you can't do as a vfork() child include calling fprintf(), puts(), fopen(), or any other standard I/O function, or malloc() for that matter. Unless the code is very carefully designed, you're best off not calling into your logging framework at all, and if it is carefully designed you don't need to know. A better design would most likely be to log your intent before calling vfork() in the first place.
You ask in comments about a library calling fork() and then back into your code. That's already kind of bad. But no library should ever ever call vfork() and back into your code without being explicitly documented as doing so. vfork() is a constrained environment and calling things not expected to be in that environment really should not happen.
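A minimal sketch of the "log before vfork()" idea, assuming the logging can be done with plain fprintf to stderr (a stand-in for the real logging engine). The helper name run_logged is invented for this example. The vfork child does nothing except exec or _exit.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: logs around vfork(), never inside the child.
   Returns the program's exit code, or -1 on failure. */
static int run_logged(const char *path, char *const argv[])
{
    /* Log the intent while we are still an ordinary process. */
    fprintf(stderr, "about to run %s\n", path);

    pid_t pid = vfork();
    if (pid == -1) {
        perror("vfork");
        return -1;
    }
    if (pid == 0) {
        /* vfork child: exec or _exit, nothing else. */
        execv(path, argv);
        _exit(127);
    }

    /* Parent resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    int code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    fprintf(stderr, "%s finished with exit code %d\n", path, code);
    return code;
}
```

With this shape the child never needs to know whether it was created by fork or vfork, because it never calls back into the logger.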

A simple solution could use pthread_atfork(). The callbacks registered with this service are triggered only upon fork(). So, the 3rd parameter of the function, which is called in the child process right after the fork, could update a global variable. The child can check the variable and if it is modified, then it has been forked:
/*
Simple program which demonstrates a solution to
make the child process know if it has been forked or vforked
*/
#include <pthread.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
pid_t forked;
void child_hdl(void)
{
forked = getpid();
}
int main(void)
{
pid_t pid;
pthread_atfork(0, 0, child_hdl);
pid = fork();
if (pid == 0) {
if (forked != 0) {
printf("1. It is a fork()\n");
}
exit(0);
}
// Father continues here
wait(NULL);
pid = vfork();
if (pid == 0) {
if (forked != 0) {
printf("2. It is a fork()\n");
}
_exit(0);
}
// Father continues here
wait(NULL);
return 0;
}
Build/execution:
$ gcc fork_or_vfork.c
$ ./a.out
1. It is a fork()

I came across kcmp today, which looks like it can answer the basic question - i.e. do two tids or pids share the same VM. If you know they represent forked parent/child pids, this can perhaps tell you if they are vfork()ed.
Of course, if they are tids in the same thread group then they will by definition share VM.
https://man7.org/linux/man-pages/man2/kcmp.2.html
int syscall(SYS_kcmp, pid_t pid1, pid_t pid2, int type,
unsigned long idx1, unsigned long idx2);
KCMP_VM
Check whether the processes share the same address space.
The arguments idx1 and idx2 are ignored. See the
discussion of the CLONE_VM flag in clone(2).

If you were created by vfork, your parent will be waiting for you to terminate. Otherwise, it's still running. Here's some very ugly code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
void doCheck()
{
char buf[512];
sprintf(buf, "/proc/%d/wchan", (int) getppid());
int j = open(buf, O_RDONLY);
if (j < 0) { printf("No open!\n"); return; }
int k = read(j, buf, 500);
close(j);
if (k <= 0) { printf("k=%d\n", k); return; }
buf[k] = 0;
char *ptr = strstr(buf, "vfork");
if (ptr != NULL)
printf("I am the vfork child!\n");
else
printf("I am the fork child!\n");
}
int main()
{
if (fork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
if (vfork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
}
This is not perfect: the parent might instead be waiting for a subsequent vfork call to complete.

Why does waitpid in a signal handler need to loop?

I read in an ebook that waitpid(-1, &status, WNOHANG) should be put in a while loop so that if multiple child processes exit simultaneously, they all get reaped.
I tried this by creating and terminating 2 child processes at the same time and reaping them with waitpid WITHOUT using a loop. They were all reaped.
The question is: is it really necessary to put waitpid in a loop?
#include<stdio.h>
#include<sys/wait.h>
#include<signal.h>
int func(int pid)
{
if(pid < 0)
return 0;
func(pid - 1);
}
void sighand(int sig)
{
int i=45;
int stat, pid;
printf("Signal caught\n");
//while( (
pid = waitpid(-1, &stat, WNOHANG);
//) > 0){
printf("Reaped process %d----%d\n", pid, stat);
func(pid);
}
int main()
{
int i;
signal(SIGCHLD, sighand);
pid_t child_id;
if( (child_id=fork()) == 0 ) //child process
{
printf("Child ID %d\n",getpid());
printf("child exiting ...\n");
}
else
{
if( (child_id=fork()) == 0 ) //child process
{
printf("Child ID %d\n",getpid());
printf("child exiting ...\n");
}
else
{
printf("------------Parent with ID %d \n",getpid());
printf("parent exiting ....\n");
sleep(10);
sleep(10);
}
}
}

Yes.
Okay, I'll elaborate.
Each call to waitpid reaps one, and only one, child. Since you put the call inside the signal handler, there is no guarantee that the second child will exit before you finish executing the first signal handler. For two processes that is okay (the pending signal will be handled when you finish), but for more, it might be that two children will finish while you're still handling another one. Since signals are not queued, you will miss a notification.
If that happens, you will not reap all children. To avoid that problem, the loop recommendation was introduced. If you want to see it happen, try running your test with more children. The more you run, the more likely you'll see the problem.
With that out of the way, let's talk about some other issues.
First, your signal handler calls printf. That is a major no-no. Very few functions are signal handler safe, and printf definitely isn't one. You can try to make your signal handler safer, but a much saner approach is to install a signal handler that merely sets a flag, and then do the actual wait call in your main program's flow.
Since your main flow is, typically, to call select/epoll, make sure to look up pselect and epoll_pwait, and to understand what they do and why they are needed.
Even better (but Linux specific), look up signalfd. You might not need the signal handler at all.
Edited to add:
The loop does not change the fact that two signal deliveries are merged into one handler call. What it does is ensure that this one call handles all pending events.
Of course, once that's the case, you must use WNOHANG. The same artifacts that cause signals to be merged might also cause you to handle an event for which a signal is yet to be delivered.
If that happens, then once your first signal handler exits, it will get called again. This time, however, there will be no pending events (as the events were already extracted by the loop). If you do not specify WNOHANG, your wait will block, and the program will be stuck indefinitely.

Detect death of parent process from `setuid` process

I am writing a C application that calls fork() to create child processes. The application runs as root. In the parent process, I use wait() to wait for terminated child processes. In the child processes, I use prctl() with the PR_SET_PDEATHSIG option to detect the death of the parent. That works fine. To reduce the risk of security issues, the child processes call setuid() to change UID. The problem is that the child processes can no longer detect the death of the parent.
I have searched around for an answer and found some useful links, but they do not help:
Detect death of parent process
Enforcing process hierarchies (prctl related) : although this link contains a clear answer, there is no solution.
How to do that correctly?

I just stumbled upon the same issue. The kernel resets the PDEATH signal on a credential change:
https://github.com/torvalds/linux/blob/master/kernel/cred.c#L450
This can be verified with the following code and strace -f:
#include <sys/prctl.h>
#include <unistd.h>
#include <signal.h>
int main(int argc, char *argv[])
{
if (fork() == 0) {
// This works as expected
setgid(1000);
setuid(1000);
prctl(PR_SET_PDEATHSIG, SIGTERM);
// This doesn't work since pdeath_signal will be reset
// setgid(1000);
// setuid(1000);
pause();
}
sleep(1);
kill(getpid(), SIGTERM);
return (0);
}
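The ordering in the code above could be wrapped up roughly as follows. This is only a sketch: the helper name drop_then_arm is invented, and the getppid() comparison is an extra precaution (not part of the code above) for the case where the parent died before prctl() ran, in which case no signal would ever arrive.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for the child: drop privileges first, then arm the
   parent-death signal. parent_pid is the value the parent obtained from
   getpid() before calling fork(). Returns 0 on success, -1 on failure. */
static int drop_then_arm(pid_t parent_pid, uid_t uid, gid_t gid, int sig)
{
    /* Group first: after setuid() we may no longer be allowed to setgid(). */
    if (setgid(gid) == -1 || setuid(uid) == -1)
        return -1;

    /* Armed after the credential change, so it is not reset by it. */
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig) == -1)
        return -1;

    /* If the parent is already gone, the signal will never be sent. */
    if (getppid() != parent_pid)
        return -1;
    return 0;
}
```

In the child this might be called as drop_then_arm(parent_pid, 1000, 1000, SIGTERM), with parent_pid saved in the parent before fork(); the child would exit straight away if it returns -1.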

Waitpid and fork/exec's non-blocking advantage over a syscall?

I always hear that you should never use system() and instead fork/exec because system() blocks the parent process.
If so, am I doing something wrong by calling waitpid(), which also blocks the parent process when I do a fork/exec? Is there a way around calling waitpid...I always thought it was necessary when doing a fork/exec.
pid_t pid = fork();
if (pid == -1)
{
// failed to fork
}
else if (pid > 0)
{
int status;
waitpid(pid, &status, 0);
}
else
{
execve(...);
}

The WNOHANG flag (set in the options argument) will make the call to waitpid() non-blocking.
You'll have to call it periodically to check whether the child has finished yet.
Or you could set up a SIGCHLD handler to take care of the children.
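A short sketch of the periodic WNOHANG check, assuming a child that just sleeps for 200 ms and a parent whose "other work" is a 10 ms nap. The function name poll_child is made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: the child sleeps briefly and exits with child_code;
   the parent polls with WNOHANG. Returns the exit code or -1, and stores
   the number of "still running" polls in *polls. */
static int poll_child(int child_code, int *polls)
{
    struct timespec child_nap = {0, 200 * 1000 * 1000}; /* 200 ms */
    struct timespec tick = {0, 10 * 1000 * 1000};       /* 10 ms */

    *polls = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        nanosleep(&child_nap, NULL);
        _exit(child_code);
    }

    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) /* child has terminated and is now reaped */
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r == -1)
            return -1;
        /* r == 0: child still running, so do other work here. */
        (*polls)++;
        nanosleep(&tick, NULL);
    }
}
```

The return value 0 from waitpid is the important case: it means "nothing to report yet", which is what lets the parent carry on instead of blocking.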

If you want to do other stuff whilst the child process is off doing its thing, you can set up a handler for SIGCHLD that deals with the child finishing/exiting, as in this very simple example.
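#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

pid_t pid;
volatile sig_atomic_t finished = 0;
volatile sig_atomic_t child_status = 0;

/* SIGCHLD handler: reap the child and record its status. */
void zombie_hunter(int sig)
{
    int status;
    (void)sig;
    waitpid(pid, &status, 0);
    child_status = status;
    finished = 1;
}

int main(void)
{
    signal(SIGCHLD, zombie_hunter);
    pid = fork();
    if (pid == -1)
    {
        exit(1);
    }
    else if (pid == 0)
    {
        sleep(10);
        exit(0);
    }
    while (!finished)
    {
        printf("waiting...\n");
        sleep(1);
    }
    /* printf is not async-signal-safe, so report from main, not the handler */
    printf("Got status %d from child\n", (int)child_status);
}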

"I always hear that you should never use system() and instead fork/exec because system() blocks the parent process."
Never say never. If system() has the semantics you want, including, but not limited to, blocking the calling process, then by all means, use it! Do be sure that you understand all those semantics, though.
If your objective is to avoid blocking the parent process, then it is important to understand that the parent can perform an unbounded amount of work between forking a child and collecting it via one of the wait() family of functions. This is very much analogous to starting a new thread, proceeding on with other work, and then eventually joining the thread.
Moreover, if the parent doesn't need to know or care when the child terminates, then it is possible to avoid any need to wait for it at all, ever.

How to prevent creation of zombie processes while using fork() and exec() in Linux?

Is there any way to prevent creation of zombie processes while I am using fork() and exec() to run an application in background? The parent should not wait() for the child to complete. Also I cannot use sigaction() and sigaction.sa_handler because it affects all child processes which I don't want. I want something that will reap that particular child only, or that will prevent from spawning any zombie. Please help.

If you want to create a "detached" process that you don't have to wait for, the best way is to fork twice so that it's a "grandchild" process. Immediately waitpid on the direct child process, which should call _exit immediately after forking again (so this waitpid does not block forward progress in the parent). The grandchild process will be orphaned so it gets inherited by the init process and you never have to deal with it again.
Alternatively you can install a signal handler for SIGCHLD with SA_NOCLDWAIT. However this is a really bad idea in general since its effects are global. It will badly break any library code you use that needs to be able to wait for child processes, including standard library functions like popen, possibly wordexp, possibly grantpt, etc.
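The double-fork approach from the first paragraph might look like the sketch below. The helper name spawn_detached is hypothetical, and exec failures in the grandchild are deliberately not reported back to the caller.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs path detached from the caller. Returns 0 if
   the grandchild was forked, -1 otherwise. */
static int spawn_detached(const char *path, char *const argv[])
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == -1)
            _exit(1);
        if (grandchild == 0) {
            execv(path, argv);
            _exit(127); /* reached only if exec failed */
        }
        _exit(0); /* exit at once, orphaning the grandchild */
    }

    /* Returns almost immediately: the direct child exits right after forking. */
    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached returns, the caller has no children left to wait for, so the grandchild cannot become a zombie of the calling process.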

To prevent zombie processes you need to make the parent wait for the child until the child terminates.
You need to use the waitpid() function, which is declared in 'sys/wait.h'.
Below is some example code showing how the waitpid() function can be used.
#include <unistd.h>
#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <stdlib.h>
int main()
{
pid_t child_pid;
int status;
int local = 0;
/* now create new process */
child_pid = fork();
if (child_pid >= 0) /* fork succeeded */
{
if (child_pid == 0) /* fork() returns 0 for the child process */
{
printf("child process!\n");
// Increment the local and global variables
printf("child PID = %d, parent pid = %d\n", getpid(), getppid());
}
else /* parent process */
{
printf("parent process!\n");
printf("parent PID = %d, child pid = %d\n", getpid(), child_pid);
wait(&status); /* wait for child to exit, and store child's exit status */
}
//code ..

@R: In fairness, there ARE use cases where one might fork a job and where there is absolutely no need to react to the result of the spawned child.
Any call to a wait() function may eventually block the parent if there is no answer, may it not? This might crash an airplane...

You can register a signal handler to prevent the child process from becoming a zombie; this link will be helpful in resolving your problem.
