Daemon X spawns process Y. Sometimes daemon X dies abruptly, and in that case it has no chance to properly terminate its child process Y (that is, process Y remains running in the background). How can I make sure that Y always gets terminated whenever X dies abruptly?
Currently I have implemented daemon X in such a way that, if it dies abruptly, it gets restarted, reads process Y's pid file, and terminates Y using kill(pid, SIGTERM). This solution, however, has its drawbacks: before killing process Y, I need to make sure that it is indeed process Y (because some newer process could be reusing the pid that was in Y's pid file). Even if X checks process Y's name in /proc/<pid>/, there is still a small window in which X could theoretically kill the wrong process.
Since I am not the developer of process Y, I can't call prctl(PR_SET_PDEATHSIG, SIGTERM) from Y.
Also, system("killall Y") is too broad for my use case.
Is there a better way to solve this problem than what I currently have?
The process groups/sessions are probably the way to go. How about this:
If X calls setsid(), remove that call from the code. Then write a very simple wrapper shell script to run the daemon:
#!/bin/sh
/usr/bin/xd || pkill -TERM -s 0
and run it with the setsid command. If the daemon process exits abnormally, pkill will send a fatal signal to all processes in the current session. The session is inherited through fork()/exec(), and the session leader (the shell process) still exists when pkill is executed, so unless a child also calls setsid(), there is no chance of a child escaping or of killing the wrong process.
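If you would rather do this from C than from a shell script, here is a minimal sketch. It swaps the session for a process group, because POSIX has no call that signals a whole session, while kill() with a negative pid signals every member of a process group. The name run_supervised() and the choice of SIGTERM are illustrative assumptions; as with the session version, a descendant that calls setsid() or setpgid() itself escapes the cleanup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv as the leader of a new process group,
 * wait for it, and if it did not exit cleanly, send SIGTERM to whatever
 * is still in that group (children inherit the group across
 * fork()/exec()). Returns the raw wait status, or -1 on error. */
int run_supervised(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);              /* child: become leader of a new group */
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    setpgid(pid, pid);              /* parent does it too, to avoid a race */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* Abnormal exit: clean up whatever the daemon left behind. */
    if (!(WIFEXITED(status) && WEXITSTATUS(status) == 0))
        kill(-pid, SIGTERM);        /* negative pid = whole process group */

    return status;
}
```

For example, run_supervised((char *[]){"/usr/bin/xd", NULL}) would play the role of the wrapper script above.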
For Linux only:
Calling prctl() with the option PR_SET_PDEATHSIG lets a process choose a signal that it will be sent when its parent dies. The setting applies to the calling process itself and is cleared in any child that process later creates with fork().
Verbatim from the Linux man page for prctl():
PR_SET_PDEATHSIG (since Linux 2.1.57)
Set the parent process death signal of the calling process to
arg2 (either a signal value in the range 1..maxsig, or 0 to
clear). This is the signal that the calling process will get
when its parent dies. This value is cleared for the child of
a fork(2) and (since Linux 2.4.36 / 2.6.23) when executing a
set-user-ID or set-group-ID binary.
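Because the setting is kept across execve() of an ordinary binary (it is cleared only by fork() and, as quoted above, for set-user-ID or set-group-ID binaries), Y does not have to cooperate: X can set it in the child after fork() and before exec(). A minimal Linux-only sketch; spawn_with_pdeathsig() is a hypothetical name. Per the man page, the "parent" is the thread that called fork(), so in a multithreaded X the signal fires when that thread exits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): start argv as a child that is sent
 * `sig` automatically when the calling thread dies. Returns the child's
 * pid, or -1 if fork() failed. */
pid_t spawn_with_pdeathsig(char *const argv[], int sig)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t parent = getpid();        /* remember who the parent is */
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent (or -1 on error) */

    /* Child: ask the kernel for `sig` when the parent dies. This is kept
     * across execve() of an ordinary (not set-user-ID) binary such as Y. */
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig) == -1)
        _exit(127);

    /* The parent may have died between fork() and prctl(); then the
     * signal will never arrive, so deliver it ourselves. */
    if (getppid() != parent) {
        raise(sig);
        _exit(128 + sig);           /* in case sig is blocked or ignored */
    }

    execvp(argv[0], argv);
    _exit(127);                     /* exec failed */
}
```

In daemon X this replaces a plain fork()/exec() of Y; the pid file, and the PID-reuse window that comes with it, are no longer needed for cleanup.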
In this code (run on Linux):
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
void child_process()
{
    int count = 0;
    for (; count < 1000; count++)
    {
        printf("Child Process: %04d\n", count);
    }
    printf("Child's process id: %d\n", getpid());
}
void parent_process()
{
    int count = 0;
    for (; count < 1000; count++)
    {
        printf("Parent Process: %04d\n", count);
    }
}
int main()
{
    pid_t pid;
    int status;
    if ((pid = fork()) < 0)
    {
        printf("unable to create child process\n");
        exit(1);
    }
    if (pid == 0)
        child_process();
    if (pid > 0)
    {
        /* wait() blocks until the child terminates and returns its pid */
        printf("Return value of wait: %d\n", wait(&status));
        parent_process();
    }
    return 0;
}
If the wait() call were not present in the code, one of the processes (child or parent) would finish its execution, control would then be given to the Linux terminal, and finally the remaining process (child or parent) would run. The output in such a case is:
Parent Process: 0998
Parent Process: 0999
guest#debian:~/c$ Child Process: 0645 //Control given to terminal & then child process is again picked for processing
Child Process: 0646
Child Process: 0647
If wait() is present in the code, what should the flow of execution be?
When fork() is called, a process tree is created containing the parent and child processes. In the above code, when the child process finishes, the parent is informed about the death of the (zombie) child via the wait() system call. But since parent and child are two separate processes, is it guaranteed that control passes directly to the parent after the child process is over (with no control given to any other process, such as the terminal, at all)? If so, it would be as if the child process were part of the parent process (like a function called from another function).
This comment is, at least, misleading:
//Control given to terminal & then child process is again picked for processing
The "terminal" process doesn't really enter into the equation. It's always running, assuming that you are using a terminal emulator to interact with your program. (If you're using the console, then there is no terminal process. But that's unlikely these days.)
The process in control of the user interface is whatever shell you're using. You type some command-line like
$ ./a.out
and the shell arranges for your program to run. (The shell is an ordinary user program without special privileges, by the way. You could write your own.)
Specifically, the shell:
Uses fork to create a child process.
Uses waitpid to wait for that child process to finish.
The child process sets up any necessary redirects and then uses some exec system call, typically execve, to replace itself with the ./a.out program, passing execve (or whatever) the command line arguments you specified.
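The three steps above can be sketched in C as follows. run_command() is a hypothetical name, and a real shell also handles redirections, process groups and job control, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv the way a shell runs a foreground command: fork, exec in the
 * child, and block in the parent until that child finishes.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();                 /* create a child process */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: redirections would be set up here, then exec */
        execvp(argv[0], argv);
        perror("execvp");
        _exit(127);
    }

    int status;                         /* parent: wait for that child */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    /* only now would the shell print its next prompt */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```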
That's it.
Your program, in ./a.out, uses fork to create a child and then possibly waits for the child to finish before terminating. As soon as your parent process terminates, the shell's waitpid() can return, and as soon as it returns, the shell prints a new command prompt.
So there are at least three relevant processes: the shell, your parent process, and your child process. In the absence of synchronisation functions like waitpid(), there are no guarantees about ordering. So when your parent process calls fork(), the created child could start executing immediately. Or not. If it does start executing immediately, it does not necessarily preempt your parent process, assuming your computer is reasonably modern and has more than one core. They could both be executing at the same time. But that's not going to last very long because your parent process will either immediately call exit or immediately call wait.
When a process calls wait (or waitpid), it is suspended and becomes runnable again when the process it is waiting for terminates. But again there are no guarantees. The mere fact that a process is runnable doesn't mean that it will immediately start running. But generally, in the absence of high load, the operating system will start running it pretty soon. Again, it might be running at the same time as another process, such as your child process (if your parent didn't wait for it to finish).
In short, if you performed your experiment a million times, and your parent waits for your child, then you will see the same result a million times; the child must finish before the parent is unsuspended, and your parent must finish before the shell is unsuspended. (If your parent process printed something before waiting, you would see different results; the parent and child outputs could be in any order, or even overlapped.)
If, on the other hand, your parent does not wait for the child, then you could see any of a number of results, and in a million repetitions you're likely to see more than one of them (but not with the same probability). Since there is no synchronisation between parent and child, the outputs could appear in either order (or be interleaved). And since the child is not synchronised with the shell, its output could appear before or after the shell's prompt, or be interleaved with the shell's prompt. No guarantees, other than that the shell will not resume until your parent is done.
Note that the terminal emulator, which is a completely independent process, is runnable the entire time. It owns a pseudo-terminal ("pty") which is how it emulates a terminal. The pseudo-terminal is a kind of pipe; at one end of the pipe is the process which thinks it's communicating with a console, and at the other end is the terminal emulator which interprets whatever is being written to the pty in order to render it in the GUI, and which sends any keystrokes it receives, suitably modified as a character stream back through the pipe. Since the terminal emulator is never suspended and its execution is therefore interleaved with whatever other processes are active on your computer, it will (more or less) immediately show you any output which is sent by your shell or the processes it starts up. (Again, assuming the machine is not overloaded.)
"Thus, the common method for launching a daemon involves forking once or twice, and making the parent processes die while the child process begins performing its normal function."
I was going through OS concepts and I didn't understand the lines quoted above.
Why is the parent process made to exit (that is, why does the parent die) in the process of creating a daemon?
Can someone please explain?
Traditionally, a daemon process is defined as a process whose parent is the system's init process and which runs in the background. For instance, if you were to execute some program in your terminal, your shell would create a process (either in the foreground or background) and the program would run with your shell as its parent. This is an example of a non-daemon process because its parent is your shell process.
So how do you produce a process whose parent is the init process? Well, a process whose parent process dies before it (the child) has exited becomes an orphan process. An orphan process will in turn be re-parented to the init process. Voila, the process now meets the definition of a daemon.
Tying this back to your quote, if you were to fork once and then kill the parent, you achieve the desired effect. Likewise, if you fork once and then have that child fork another process, followed by killing the first child, you also achieve the desired effect while keeping the (now grandparent) process alive.
This is not a requirement, as any background process could be a daemon. Technically, a daemon process is one that runs to perform some general non-interactive task. In a Unix environment, a daemon is generally set up as a process with certain characteristics: no controlling terminal, a cleared umask, a particular working directory, etc. Forking twice is a common way to obtain a grandchild that is inherited by the init process and has these properties, in effect a process fully detached from any user control (except root, of course).
This applies only if a standard user wants to create a daemon. Some standard daemons are created almost normally (see init, launchd, etc.).
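The fork-twice recipe, in the variant where the original (grandparent) process stays alive, might look roughly like this. launch_daemon() is a hypothetical helper that returns like fork(): 0 in the daemon, 1 in the original process. The extra steps (setsid(), chdir("/"), clearing the umask, pointing stdin/stdout/stderr at /dev/null) are the conventional ones listed above; a production daemon usually does more, such as closing inherited descriptors. The glibc daemon(3) man page notes that glibc's own daemon() does not use the double fork.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 in the detached grandchild (the daemon), 1 in the original
 * process once the intermediate child is gone, -1 on error. */
int launch_daemon(void)
{
    pid_t first = fork();
    if (first < 0)
        return -1;

    if (first == 0) {
        /* First child: a fresh fork is never a process group leader,
         * so setsid() succeeds and drops the controlling terminal. */
        if (setsid() < 0)
            _exit(1);
        assert(getsid(0) == getpid());  /* we now lead a new session */

        pid_t second = fork();
        if (second < 0)
            _exit(1);
        if (second > 0)
            _exit(0);       /* first child exits: the grandchild is
                               orphaned and adopted by init (or a
                               subreaper) */

        /* Grandchild: not a session leader, so opening a tty later
         * cannot make it the controlling terminal. */
        umask(0);
        if (chdir("/") < 0)
            _exit(1);
        int devnull = open("/dev/null", O_RDWR);
        if (devnull >= 0) {
            dup2(devnull, STDIN_FILENO);
            dup2(devnull, STDOUT_FILENO);
            dup2(devnull, STDERR_FILENO);
            if (devnull > STDERR_FILENO)
                close(devnull);
        }
        return 0;           /* caller now runs the daemon's real work */
    }

    /* Original process: reap the short-lived first child and carry on. */
    int status;
    while (waitpid(first, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```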
If the parent exits while the daemon continues running, the daemon is orphaned, and the init process typically adopts it (i.e. becomes the parent).
There are some exceptions, but it is normally expected that a daemon process will be descended from the init process (e.g. the init process will launch daemons during system startup). So, if another process launches a daemon and terminates, it achieves the desired effect.
Note that some other actions are also needed, such as disassociating the daemon from any tty window.
Other answers already explained what happens when parent dies i.e. child is adopted by init process.
But why is this required to make a process a daemon? A daemon is, by definition, a non-interactive program, i.e. it should not be associated with a terminal. That ensures that the daemon keeps working in the background even when the user generates signals with Control-C, hangup, etc. Now, how do you prevent a process from ever attaching to a terminal? Make init its parent by killing the original parent.
init is a special process because:
It's not attached to any terminal.
It's the first process (pid 1) after booting the OS, and that makes it the leader of its session. Note that every UNIX process belongs to a process group, which in turn belongs to a session. The process that creates a session (with setsid()) becomes its session leader.
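In UNIX, only a session leader can acquire a controlling terminal. Being adopted by init does not by itself change a process's session (only setsid() does), so the usual recipe calls setsid() in the first child and then forks again: the grandchild, adopted by init once the first child exits, is not a session leader and hence can never acquire a controlling terminal. That's what we wanted, right?
setsid() is what actually detaches the process from its terminal; the fork before it is needed because setsid() fails with EPERM in a process that is already a process group leader.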
I have the PID of a process that may have children. How can I get the PIDs of all its child processes? I'm writing my own PTY handler, so when the user runs a shell in this handler, they may run more programs (directly from the shell), and every program run becomes a child of the shell. So when I press Ctrl+C, I need to send a signal to the newest process, and for that I need to know its PID.
You should explicitly keep all the pids (the results of fork(2)...) of your child processes (and remove a pid once you have successfully waited for it with wait(2) etc.).
It is up to you to choose the data structures to keep these pids.
Any other approach (e.g. using proc(5), which is what ps and pstree do) is not very portable and is inefficient.
So the basic rule is that every time you call fork you should explicitly keep its result (and test for the 3 cases: 0 if in child process, >0 if in parent process, <0 on error) and use that at wait time.
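A minimal sketch of that bookkeeping, assuming a small fixed-size array is enough (a real handler might use a list or a hash table). spawn_tracked(), newest_child(), interrupt_newest() and reap_tracked() are hypothetical names. Keep in mind that programs started from inside the shell are children of the shell, not of your handler, so this only covers processes your own code forks.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

static pid_t children[MAX_CHILDREN];   /* live children, oldest first */
static int nchildren = 0;

/* fork + exec, recording the child's pid. Returns the pid or -1. */
pid_t spawn_tracked(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    if (nchildren == MAX_CHILDREN)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;                     /* error: nothing to record */
    if (pid == 0) {                    /* child */
        signal(SIGINT, SIG_DFL);       /* let Ctrl+C work even if we ignore it */
        execvp(argv[0], argv);
        _exit(127);
    }
    children[nchildren++] = pid;       /* parent: keep fork()'s result */
    return pid;
}

/* Most recently started child not yet reaped, or -1 if none. */
pid_t newest_child(void)
{
    return nchildren > 0 ? children[nchildren - 1] : -1;
}

/* What the question wants on Ctrl+C: SIGINT to the newest child. */
int interrupt_newest(void)
{
    pid_t pid = newest_child();
    return pid > 0 ? kill(pid, SIGINT) : -1;
}

/* Wait for any child and drop its pid from the table. Returns the
 * reaped pid, or -1 (errno is ECHILD when there are no children). */
pid_t reap_tracked(int *status)
{
    pid_t pid;
    do {
        pid = waitpid(-1, status, 0);
    } while (pid < 0 && errno == EINTR);
    if (pid < 0)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        if (children[i] == pid) {
            for (int j = i; j < nchildren - 1; j++)
                children[j] = children[j + 1];   /* keep creation order */
            nchildren--;
            break;
        }
    }
    return pid;
}
```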
Read Advanced Linux Programming; it has many pages relevant to that subject.
You might also be interested in process groups and sessions. See setpgrp(2), setsid(2), daemon(3), credentials(7), etc. Notice that with a negative or zero pid, kill(2) can send a signal to a process group, and that you could also use killpg(2) for that purpose.
When I call kill() on a process, it returns immediately, because it just sends a signal. I have code that checks some (foreign, neither written nor modifiable by me) processes in an infinite loop, and if they exceed some limits (too much RAM eaten, etc.), it kills them (and writes to syslog, etc.).
The problem is that when the processes are heavily swapped, it takes many seconds to kill them, and because of that, my process runs the same check against the same processes multiple times, attempts to send the signal to the same process many times, and writes this to syslog each time. (This is not done on purpose; it's just a side effect which I am trying to fix.)
I don't care how many times it sends a signal to a process, but I do care how many times it writes to syslog. I could keep a list of PIDs that were already sent the kill signal, but in theory, even if the probability is low, another process could be spawned with the same pid as a previously killed one, which might also need to be killed, and in that case the log entry would be missing.
I don't know if there is a unique identifier for a process, but I doubt it. How could I kill a process synchronously, or keep track of processes that have received the signal and don't need to be logged again?
Even if you could do a "synchronous kill", you still have the race condition where you could kill the wrong process. It can happen whenever the process you want to kill exits by its own volition, or by third-party action, after you see it but before you kill it. During this interval, the PID could be assigned to a new process. There is basically no solution to this problem. PIDs are inherently a local resource that belongs to the parent of the identified process; use of the PID by any other process is a race condition.
If you have more control over the system (for example, controlling the parent of the processes you want to kill) then there may be special-case solutions. There might also be (Linux-specific) solutions based on using some mechanisms in /proc to avoid the race, though I'm not aware of any.
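Newer Linux kernels do provide such a mechanism: a pidfd, a file descriptor that refers to one specific process rather than to a PID number (pidfd_open() needs Linux 5.3, pidfd_send_signal() 5.1; a descriptor from opening /proc/<pid> is also accepted by pidfd_send_signal()). A signal sent through a pidfd cannot reach a process that later reused the PID, and the pidfd becomes readable with poll() when the process terminates, which gives the "synchronous kill" asked about above. A minimal sketch, calling the raw system calls through syscall(2) because older C libraries have no wrappers; the helper names are hypothetical. Seeing a PID and opening its pidfd is still racy, so re-check the process after opening: if the check passes, later signals through that descriptor can only reach that process (or fail with ESRCH). Keeping such descriptors, instead of bare PIDs, in the "already signalled" list also fixes the logging problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for old headers only: these calls share one number on most
 * architectures, but treat the values as an assumption. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Returns a pidfd for pid, or -1 (errno ENOSYS on kernels before 5.3). */
int open_pidfd(pid_t pid)
{
    return (int)syscall(SYS_pidfd_open, pid, 0);
}

/* Send sig through the pidfd, then wait up to timeout_ms for that exact
 * process to terminate. Returns 0 if it is gone, 1 on timeout, -1 on
 * error. If the target is your own child, still reap it with waitpid(). */
int kill_and_wait_pidfd(int pidfd, int sig, int timeout_ms)
{
    assert(pidfd >= 0);
    if (syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0) < 0)
        return errno == ESRCH ? 0 : -1;    /* ESRCH: exited and reaped */

    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);     /* a signal restarts the wait */
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0;                 /* readable means it exited */
}
```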
One other workaround may be to use ptrace on the target process as if you're going to debug it. This allows you to partially "steal" the parent role, avoiding invalidation of the PID while you're still using it and allowing you to get notification when the process terminates. You'd do something like:
Check the process info (e.g. from /proc) to determine that you want to kill it.
ptrace it, temporarily stopping it.
Re-check the process info to make sure you got the process you wanted to kill.
Resume the traced process.
kill it.
Wait (via waitpid) for notification that the process exited.
This will make the script wait for process termination.
kill "$PID"
while kill -0 "$PID" 2>/dev/null
do
sleep 1
done
kill -0 [pid] tests whether a process exists (more precisely, whether it exists and you are allowed to signal it).
The following solution works for most processes that aren't debuggers or processes being debugged in a debugger.
Use ptrace with argument PTRACE_ATTACH to attach to the process. This stops the process you want to kill. At this point, you should probably verify that you've attached to the right process.
Kill the target with SIGKILL. It's now gone.
I can't remember whether the process is now a zombie that you need to reap or whether you need to PTRACE_CONT it first. In either case, you'll eventually have to call waitpid to reap it, at which point you know it's dead.
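A sketch of the ptrace sequence described in the two answers above, assuming Linux and permission to trace the target (same user, and a Yama ptrace_scope setting that allows it). The helper names are hypothetical, and the re-check step is left to the caller between trace_stop() and kill_traced(). SIGKILL terminates a tracee even while it sits in a ptrace-stop, so no PTRACE_CONT is needed before killing it; the tracer's waitpid() then reports the death, and when the tracer is also the real parent, that waitpid() reaps it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to pid and wait until it is stopped. While we are its tracer,
 * it cannot disappear (and its PID cannot be reused) before we have been
 * told about its death. Returns 0 on success, -1 on error. */
int trace_stop(pid_t pid)
{
    assert(pid > 0);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0)
        return -1;                        /* e.g. EPERM */
    int status;
    while (waitpid(pid, &status, 0) < 0) {   /* wait for the attach stop */
        if (errno != EINTR)
            return -1;
    }
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Re-check failed: detach and let the process continue untouched. */
int trace_release(pid_t pid)
{
    return (int)ptrace(PTRACE_DETACH, pid, NULL, NULL);
}

/* Kill a tracee stopped by trace_stop() and wait for the death notice.
 * Returns 0 on success, -1 on error. */
int kill_traced(pid_t pid)
{
    if (kill(pid, SIGKILL) < 0)
        return -1;
    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;                     /* death reported to us */
        ptrace(PTRACE_CONT, pid, NULL, NULL);   /* some other stop */
    }
}
```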
If you are writing this in C, you are sending the signal with the kill system call. Rather than repeatedly sending the terminating signal, just send it once and then loop (or periodically check) with kill(pid, 0). A signal value of zero only tells you whether the process is still alive, so you can act appropriately. When the process is gone, kill will fail with errno set to ESRCH.
when you spawn these processes yourself, the classical waitpid(2) family can be used
when they are not used anywhere else, you can move the processes that are going to be killed into their own cgroup; there can be notifiers on these cgroups which are triggered when the cgroup becomes empty, i.e. when its last process exits (in cgroup v2, the "populated" field of cgroup.events).
to find out whether a process has been killed, you can chdir(2) into /proc/<pid> or open(2) this directory. After the process terminates, the status files there can no longer be accessed. This method is racy (between your check and the action, the process can terminate and a new one with the same pid can be spawned).
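A minimal sketch of that send-once-then-poll approach; kill_and_poll() is a hypothetical name, and the 100 ms interval is arbitrary. Two caveats: kill(pid, 0) fails with EPERM for a live process you may not signal, so that case counts as alive; and your own child lingers as a zombie, still answering kill(pid, 0), until you reap it, so for children use waitpid() instead. The PID-reuse race discussed above still applies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Send sig once, then poll with kill(pid, 0) until the process is gone.
 * Returns 0 when it no longer exists, 1 on timeout, -1 on error. */
int kill_and_poll(pid_t pid, int sig, int timeout_ms)
{
    assert(pid > 0);
    if (kill(pid, sig) < 0)
        return errno == ESRCH ? 0 : -1;    /* ESRCH: already gone */

    const struct timespec step = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    for (int waited = 0; waited <= timeout_ms; waited += 100) {
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return 0;                      /* no such process any more */
        /* success or EPERM: it still exists, so wait and retry */
        nanosleep(&step, NULL);
    }
    return 1;
}
```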
I'm running child processes in C, and I want to pause a process and then let it run again. I'm not really sure how to describe my problem better since I'm new at this, but here's a shot.
So I know that you can run a process after another process exits by using waitpid. But what if the process I'm waiting on doesn't exist yet when the waiting process is created? In this case, I'm thinking of pausing the waiting process; when the awaited process is created and then finishes, it would tell the waiting process to run again. How would you do this? Again, I'm not familiar with this, so I don't know if this is the proper way to do it.
edit: What I'm trying to do
I'm using child processes to run commands via execvp() in parallel, so if I have the sequence sleep 1; sleep 1;, the total sleep time will be 1 second. However, there are cases where I try to run echo blah > file; cat < file; in parallel, and I'm assuming cat reads the file after echo writes blah into it. Therefore, I have to wait for echo to finish before running cat. There are more specifics to this, but in general, assume that any command that writes to a file must be waited on by any command that reads that file later in the script.
In Linux: you can set an alarm() before you call waitpid() so that you wake up after a certain number of seconds; if a SIGALRM handler is installed without SA_RESTART, waitpid() returns -1 with errno set to EINTR, so you know the situation and can kill the misbehaving child. Another way would be to use a mutex, with a block like this in the waiting process:
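/* Sketch: mutex must be a pthread_mutex_t with the PTHREAD_PROCESS_SHARED
   attribute, placed in memory shared by both processes (e.g. via mmap());
   monitored_pid and some_seconds are placeholders. */
if (pthread_mutex_trylock(&mutex) != 0) {      /* busy: child still working */
    sleep(some_seconds);
    if (pthread_mutex_trylock(&mutex) != 0) {  /* still busy after the grace period */
        kill(monitored_pid, SIGKILL);          /* kill the process */
    }
}
and the process that is monitored:
/* entry point */
pthread_mutex_lock(&mutex);
do_stuff();
pthread_mutex_unlock(&mutex);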
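The alarm() idea from this answer can be sketched as follows. A SIGALRM handler is required (the default action would terminate the waiting process), and it is installed without SA_RESTART so that the blocked waitpid() fails with EINTR instead of being restarted. waitpid_timeout() is a hypothetical name; alarm() has one-second resolution and a process has only one alarm, so this does not combine well with other uses of SIGALRM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* only interrupts waitpid() */ }

/* Wait for child pid for at most `seconds`. Returns pid when it finished
 * (status filled in), 0 on timeout, -1 on error. If SIGALRM fired before
 * waitpid() started, the wait would not time out: a known gap of this
 * simple approach. */
pid_t waitpid_timeout(pid_t pid, int *status, unsigned seconds)
{
    assert(seconds > 0);               /* alarm(0) cancels instead of arming */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                   /* no SA_RESTART: let waitpid fail */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(seconds);
    pid_t r = waitpid(pid, status, 0);
    int saved = errno;
    alarm(0);                          /* cancel the alarm if the child won */
    sigaction(SIGALRM, &old, NULL);

    if (r < 0 && saved == EINTR)
        return 0;                      /* timed out (or another signal) */
    errno = saved;
    return r;
}
```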
Any application (process) can only wait with waitpid() on its own direct children. It can't wait on grandchildren or more distant descendants, and it can wait on neither siblings nor ancestors nor on unrelated processes.
If your application is single-threaded, you can't wait on a process that will be created after the waitpid() call starts because there is nothing to do the necessary fork() to create the child.
In a multi-threaded process, you could have one thread waiting for dying children while another thread creates them. For example, the waitpid(-1, ...) call in thread 1 could start at time T0, thread 2 could create a child at T1 (T1 > T0), the child could die at T2, and the waitpid() would pick up the corpse of the child at T3, even though the child was created after the waitpid() started. This only works if the process already has at least one child at T0: with no children at all, waitpid(-1, ...) fails immediately with ECHILD, and waitpid() on a specific pid that does not exist yet fails the same way.
Your higher level problem is probably not completely tractable. You can't tell which processes are accessing a given file just by inspecting the command lines in a 'shell script'. You can see those that probably are using it (because the file name appears on the command line); but there may be other processes that have the name hardwired into them and you can't see that by inspecting the command line.