C - Atomic killpg

I have a process that forks many times. The child processes do a lot of work, including many system calls.
When any child process gets an error from a system call, it prints an error description to stderr and sends SIGUSR1 to the group leader (the main parent process).
SIGUSR1 tells the parent to kill all child processes, free resources and exit the program (to avoid zombie processes).
I need to kill all children at once, atomically: when an error happens in any child process, all child processes must stop their work immediately.
Currently the parent kills the child processes with SIGUSR2. It sends this signal to all process group members with killpg. Every member has a signal handler installed that terminates it with exit, except that the group leader is not killed, because it still needs to free resources.
The problem is that before all child processes are killed, they can still execute about 1-2 lines of code, which is not what I want. I need to stop them immediately.
How can I achieve this?

Signals are delivered asynchronously. Since the parent and the child processes all run concurrently, you cannot expect a child to handle the signal the instant the parent sends it.
Quoting the question: "The problem is that before all child processes get killed, they still can execute about 1-2 rows of code, which is not what I want. I need to stop them immediately."
Your problem is more about coordination and synchronization between processes than about signal handlers. There are two ways I can think of:
Use synchronous signal handling. After a child sends SIGUSR1 to the parent, it stops working and waits for SIGUSR2 with one of the waiting functions, such as sigtimedwait or sigwait. That way it runs no additional code before exiting.
Use pipe or socketpair to create communication channels between the parent and the children. The parent sends a kill instruction to the children, and each child frees the necessary resources and terminates itself. This requires the children to listen on the channel while doing their work.
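The first option can be sketched as follows. This is only a minimal illustration with a single child, not the asker's program; the function name run_handshake is hypothetical. Both signals are blocked before fork() so that neither can arrive early and be lost, and the child runs nothing between reporting the error and receiving the parent's answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child reports an error with SIGUSR1, then blocks in
 * sigwait() until the parent answers with SIGUSR2.
 * Returns the child's exit status (0 if the handshake worked), -1 on error. */
int run_handshake(void)
{
    sigset_t both, usr1, usr2, old;
    int sig, status;

    sigemptyset(&both);
    sigaddset(&both, SIGUSR1);
    sigaddset(&both, SIGUSR2);
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigemptyset(&usr2);
    sigaddset(&usr2, SIGUSR2);

    /* Block both signals before fork(); the child inherits this mask. */
    if (sigprocmask(SIG_BLOCK, &both, &old) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    if (child == 0) {
        kill(getppid(), SIGUSR1);        /* report the failure */
        if (sigwait(&usr2, &sig) != 0)   /* do nothing else until told to stop */
            _exit(2);
        _exit(sig == SIGUSR2 ? 0 : 1);
    }

    /* Parent: wait for the error report, then order the child to stop. */
    if (sigwait(&usr1, &sig) != 0)
        kill(child, SIGKILL);
    else
        kill(child, SIGUSR2);

    if (waitpid(child, &status, 0) == -1)
        status = -1;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return (status != -1 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Note that this only stops the child that reported the error. The other children keep running until the stop signal reaches them, which is the asynchronous delay described above.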

Do you mean that all child processes must stop working as soon as the faulty child sends SIGUSR1?
If that is what you want, I don't think you can achieve it the way you are doing it: after the faulty child sends SIGUSR1 to the leader, the other children continue executing until the leader has processed SIGUSR1.
Do you really need the faulty process to send SIGUSR1 to the leader first? Could the faulty process instead send SIGUSR2 directly to the group, with the leader ignoring that signal (or at least not treating it as a termination signal)?
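A rough sketch of that idea, under several assumptions: the names leader_main and run_group_demo and the worker count are hypothetical, the leader keeps SIGUSR2 blocked rather than ignored, and the workers rely on the default action of SIGUSR2 (termination) instead of a handler. The demo creates its own process group so that it cannot signal unrelated processes, and the failing worker is the last one forked so that every other worker already exists when the signal is sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 2   /* hypothetical number of workers */

/* Body of the group leader: returns how many workers SIGUSR2 terminated. */
static int leader_main(void)
{
    sigset_t usr2;
    int killed = 0, status;

    /* The leader keeps SIGUSR2 blocked, so the group-wide signal never
     * terminates it. Workers inherit the blocked mask across fork(). */
    sigemptyset(&usr2);
    sigaddset(&usr2, SIGUSR2);
    sigprocmask(SIG_BLOCK, &usr2, NULL);

    for (int i = 0; i < WORKERS; i++) {
        pid_t pid = fork();
        if (pid == -1)
            kill(0, SIGKILL);            /* give up: take the whole group down */
        if (pid == 0) {
            signal(SIGUSR2, SIG_DFL);    /* default action: terminate */
            sigprocmask(SIG_UNBLOCK, &usr2, NULL);
            if (i == WORKERS - 1)        /* simulated failure in the last worker */
                killpg(getpgrp(), SIGUSR2);
            for (;;)
                pause();                 /* stands in for real work */
        }
    }

    for (int i = 0; i < WORKERS; i++)
        if (wait(&status) > 0 && WIFSIGNALED(status) && WTERMSIG(status) == SIGUSR2)
            killed++;
    return killed;
}

/* Runs the demo in a separate process group and returns the leader's count. */
int run_group_demo(void)
{
    int status;
    pid_t leader = fork();
    if (leader == -1)
        return -1;
    if (leader == 0) {
        if (setpgid(0, 0) == -1)         /* become leader of a new group */
            _exit(100);
        _exit(leader_main());
    }
    if (waitpid(leader, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This removes one hop (child to leader, leader to group), but delivery to each worker is still asynchronous, so it narrows the window rather than closing it.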

Related

Don't send SIGINT on CTRL+C to child processes but don't ignore the signal itself

I'm trying to write a Task control program, very much like Supervisor.
I run some programs from a config file and let them run in the background, while in the main process I read and execute other commands.
Before fork()-ing, in the main process I call:
sigaction(SIGINT, &the_handler, NULL);
Where the_handler stores the reference of a simple print function.
When CTRL+C is pressed, the child processes are interrupted as well (which I don't want).
I could call signal(SIGINT, SIG_IGN); in the child process after fork to ignore it, but I would still like to be able to run this command in bash: $ kill -n 2 <child_pid>. In other words, I don't want to ignore the signal, right?
So, how to ignore SIGINT from CTRL+C to child processes, but still be able to receive the signal in other ways? Or am I missing something?

The traditional way of doing this is to fork twice. The grandparent forks its children and then waits for them. Each child then forks and exits straight away. Because their parents have exited, the grandchildren are re-parented to pid 1. Thus signals sent to the grandparent do not get propagated to the former grandchildren.
See this answer for a bit more detail
https://stackoverflow.com/a/26418006/169346
ETA: You need to call setsid() between the two forks, otherwise the grandchild is still in the same process group as the grandparent and will still receive signals that the grandparent receives.
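The fork, setsid(), fork sequence might look roughly like this. It is a sketch, not the linked answer's code; spawn_detached and its job callback are hypothetical names, and the pipe exists only so the caller can learn the grandchild's session ID.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts job() in a grandchild that lives in its own session.
 * Returns the grandchild's session ID, or -1 on error. */
pid_t spawn_detached(void (*job)(void))
{
    int fds[2];
    pid_t sid = -1;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        close(fds[0]);
        if (setsid() == -1)              /* new session and process group */
            _exit(1);
        pid_t grandchild = fork();
        if (grandchild == -1)
            _exit(1);
        if (grandchild > 0)
            _exit(0);                    /* intermediate child exits at once */

        /* Grandchild: report our session, then do the real work. */
        sid = getsid(0);
        if (write(fds[1], &sid, sizeof sid) != (ssize_t)sizeof sid)
            _exit(1);
        close(fds[1]);
        if (job)
            job();
        _exit(0);
    }

    close(fds[1]);
    waitpid(child, NULL, 0);             /* reap the intermediate child */
    if (read(fds[0], &sid, sizeof sid) != (ssize_t)sizeof sid)
        sid = -1;
    close(fds[0]);
    return sid;
}
```

Because the grandchild leaves SIGINT at its current disposition and is merely outside the terminal's foreground process group, a SIGINT sent explicitly with kill still reaches it.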

terminal goes down with all children even when SIGKILLed but normal processes don't do the same

I would like my program to behave the way bash (the terminal) does when it is killed with SIGKILL. SIGKILL cannot be handled in a program, so whenever my program is killed its children are re-parented to the init process, and I get no chance to kill all my child processes before the parent dies. Yet when a terminal is killed, all the processes created through it are killed too, even if the terminal is killed with SIGKILL.
For this I did some research and found the post below:
https://unix.stackexchange.com/questions/54963/how-can-terminal-emulators-kill-their-children-after-recieving-a-sigkill
The post is a bit confusing, but what I took from it is that if the process you kill is the leader of its process group, then all its children are killed too.
So, for simplicity, I implemented the program below to test whether that is so:
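#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("Current PID: %ld\n", (long)getpid());
    // make a new session; note that setsid() fails with EPERM if the caller
    // is already a process group leader, which is usually the case when the
    // program is started directly from a shell
    pid_t pid = setsid();
    if (pid == -1)
        perror("setsid");
    printf("New session ID: %ld\n", (long)pid);
    pid = fork();
    switch (pid)
    {
    case -1:
        perror("Unable to fork the process");
        exit(EXIT_FAILURE);
    case 0:
        // child process
        while (1)
        {
            sleep(1);
        }
        break;
    }
    while (1)
    {
        printf("Process Leader running\n");
        sleep(1);
    }
    return 0;
}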
After running the above program, when I killed the parent process the child process was not killed. I also modified the program so that it does not belong to any tty, thinking that perhaps the process leader should not be associated with a tty. I did that in the following way:
Create a normal process (parent process).
Create a child process from within the above parent process.
The process hierarchy at this stage looks like: TERMINAL -> PARENT PROCESS -> CHILD PROCESS
Terminate the parent process.
The child process now becomes an orphan and is taken over by the init process.
Call the setsid() function to run the process in a new session and a new group.
Then the same code as above runs.
Still, when I killed the process leader, the children remained.
Maybe I did not understand that post on unix.stackexchange, or is this the default behaviour in Linux?
One way I could make all children die is to catch each terminating signal, such as SIGTERM and SIGHUP, and put logic in those signal handlers to kill the children first. But I still can't do anything about SIGKILL.
I would also like to know: if killing the parent process does not affect its children, even when the parent is a process leader, how does bash (the terminal) manage to kill all its child processes even when it is sent SIGKILL? Is there some extra logic for terminals in the Linux kernel?
If there is a way to kill all child processes when the parent is killed, even with SIGKILL, I would be happy to know that too.

The manual page of kill says:
Negative PID values may be used to choose whole process groups
As I understand it, to kill a whole process group you have to pass a negative PID.
A different mechanism is what causes a terminal's child processes to die when the terminal is killed. Processes started from a terminal have their stdin/stdout attached to the terminal. When the terminal is killed, those connections are closed and a signal (SIGHUP) is sent to those processes. A typical program does not handle this signal and is terminated by default.
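The negative-PID form can be tried out with a small sketch like the one below. The function name run_group_kill_demo is hypothetical, and the pipe is only there so the caller can observe that both the group leader and its child are gone after a single kill() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if every process in the demo group died after one
 * kill(-pgid, SIGTERM), -1 otherwise. */
int run_group_kill_demo(void)
{
    int fds[2];
    char byte;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        close(fds[0]);
        signal(SIGTERM, SIG_DFL);        /* make sure SIGTERM terminates us */
        if (setpgid(0, 0) == -1)         /* become leader of a new group */
            _exit(1);
        pid_t grandchild = fork();       /* inherits the new group */
        if (grandchild == -1)
            _exit(1);
        if (grandchild == 0 && write(fds[1], "R", 1) != 1)
            _exit(1);                    /* "R": the grandchild is running */
        for (;;)
            pause();                     /* both processes just wait */
    }

    close(fds[1]);
    if (read(fds[0], &byte, 1) != 1) {   /* wait until the group is complete */
        waitpid(child, NULL, 0);
        close(fds[0]);
        return -1;
    }

    /* A negative PID addresses the whole process group. */
    if (kill(-child, SIGTERM) == -1)
        return -1;
    waitpid(child, NULL, 0);

    /* EOF means the grandchild, the last holder of the write end, is gone. */
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    return n == 0 ? 0 : -1;
}
```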

The advice from Marian is quite correct and well worth researching, but if you choose to follow that route you will likely end up with an implementation of what might be called the "hostage trick".
The hostage trick consists of your root process spawning an artificial child process which spends all its time in the stopped state. This "hostage" will be spawned immediately before the first child process which does real work in your (multi-process) program.
The hostage process is made the leader of its own process group and then enters a loop in which it stops itself with "raise(SIGSTOP)". If it is ever continued, it checks to see whether its parent has terminated (i.e. whether it has been re-parented or cannot signal its parent with the null signal (ESRCH)). If the parent has terminated, then the hostage should terminate, otherwise it should re-suspend with another "raise(SIGSTOP)".
You need to be careful about race conditions: e.g. for the re-parenting test take care to cache the parent-process-id for the hostage as the return value from "getpid()" before "fork()"-ing the hostage and also make "setpgid()" calls downstream of "fork()" in both parent and child. You then need to consider what you do if someone "kill(., SIGKILL)"s the hostage!
True, you can put a SIGCHLD handler in the parent to re-spawn it, but that requires considerable care to preserve the continuity of the identity of the hostage's process group; maybe there were other child processes at the time of the SIGKILL and the replacement hostage should go in the original process group, maybe there weren't and the original process group has evaporated.
Even if you get that right, the fact that you have put a "fork()" call in a handler for an asynchronous signal (SIGCHLD) will likely open another can of worms if your main process uses multiple threads.
Because of these difficulties I would advise against using the hostage trick unless the child processes run code over which you have no control (and to think seriously about the costs in complexity and maintainability even then). If you have control of the code of the child processes, then it is much simpler to use a "pipe()".
You create the pipe in the parent process and manage the file descriptors to ensure that the parent process is the sole writer and that each child process allocates one file descriptor to the read-side. If you do this, then the termination of the parent process (whether due to SIGKILL or any other cause) is communicated to the child processes by the EoF condition on the read side of the pipe as the last writer terminates.
If you want to treat SIGKILL specially, then you can use a protocol on the pipe whereby the parent process sends a termination message advising the children of its termination status on all normal terminations and on catchable fatal signals, and leaves the children to infer that the parent was killed by SIGKILL in the event that the read-side of the pipe delivers an EoF without a preceding termination message.
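The pipe approach can be sketched as below, assuming a single child and no termination-message protocol. All names (worker, run_lifeline_demo, the "R" and "E" report bytes) are hypothetical. An extra intermediate process plays the role of the parent so that the demo can kill it with SIGKILL and observe that the child notices.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child: blocks until the parent's write end of the lifeline is closed. */
static void worker(int lifeline_rd, int report_wr)
{
    char c;
    if (write(report_wr, "R", 1) != 1)   /* tell the observer we are ready */
        _exit(1);
    /* The parent never writes, so read() returns 0 (EOF) only once the last
     * write end is closed, that is, once the parent is gone. */
    while (read(lifeline_rd, &c, 1) > 0)
        ;
    if (write(report_wr, "E", 1) != 1)   /* report that EOF was seen */
        _exit(1);
    _exit(0);
}

/* Returns 0 if the child saw EOF after its parent was SIGKILLed. */
int run_lifeline_demo(void)
{
    int report[2];
    char c = 0;

    if (pipe(report) == -1)
        return -1;

    pid_t parent = fork();               /* stands in for the real parent */
    if (parent == -1)
        return -1;

    if (parent == 0) {
        int lifeline[2];
        close(report[0]);
        if (pipe(lifeline) == -1)
            _exit(1);
        pid_t child = fork();
        if (child == -1)
            _exit(1);
        if (child == 0) {
            close(lifeline[1]);          /* the child must not hold a write end */
            worker(lifeline[0], report[1]);
        }
        close(lifeline[0]);
        close(report[1]);
        for (;;)
            pause();                     /* "works" until it is killed */
    }

    close(report[1]);
    int ready = (read(report[0], &c, 1) == 1 && c == 'R');
    kill(parent, SIGKILL);               /* uncatchable, no cleanup runs */
    waitpid(parent, NULL, 0);

    ssize_t n = ready ? read(report[0], &c, 1) : -1;
    close(report[0]);
    return (n == 1 && c == 'E') ? 0 : -1;
}
```

In a real program the child would typically watch the read end with poll() or select() alongside its other work instead of blocking in read().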

On Linux, prctl(PR_SET_PDEATHSIG, ...) arranges for a process to receive a signal when its parent dies. This setting is preserved across exec but is not inherited by child processes.
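A Linux-only sketch of this, with the hypothetical function name run_pdeathsig_demo and SIGTERM chosen arbitrarily as the death signal. As in any use of this call, the child re-checks its parent afterwards, because the parent may already have died before prctl() took effect. An intermediate process plays the parent so that the demo can kill it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the child was terminated after its parent was SIGKILLed. */
int run_pdeathsig_demo(void)
{
    int fds[2];
    char c = 0;

    if (pipe(fds) == -1)
        return -1;

    pid_t parent = fork();               /* stands in for the real parent */
    if (parent == -1)
        return -1;

    if (parent == 0) {
        close(fds[0]);
        pid_t me = getpid();
        pid_t child = fork();
        if (child == -1)
            _exit(1);
        if (child == 0) {
            signal(SIGTERM, SIG_DFL);    /* default action: terminate */
            if (prctl(PR_SET_PDEATHSIG, (unsigned long)SIGTERM) == -1)
                _exit(1);
            if (getppid() != me)         /* parent died before prctl() */
                _exit(1);
            if (write(fds[1], "R", 1) != 1)
                _exit(1);
            for (;;)
                pause();                 /* stands in for real work */
        }
        close(fds[1]);
        for (;;)
            pause();
    }

    close(fds[1]);
    int ready = (read(fds[0], &c, 1) == 1 && c == 'R');
    kill(parent, SIGKILL);
    waitpid(parent, NULL, 0);

    /* EOF: the child received SIGTERM, died, and its write end was closed. */
    ssize_t n = ready ? read(fds[0], &c, 1) : -1;
    close(fds[0]);
    return n == 0 ? 0 : -1;
}
```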

How to kill the parent process and its children on ctrl+C or ctrl+Z

I have a main process in my program that fork()s some child processes and then goes into an endless loop (the child processes are endless as well). Now I want to kill all processes, close a socket, detach shared memory, and clean up similar things when the program is terminated with Ctrl+C or Ctrl+Z. I searched the internet and found that I could do that by sending signals like SIGSTOP and SIGINT, but I don't know how to do it. So, how can I accomplish this in my program?

From outside the program, you can send any process a signal using the kill command.
By default, kill sends the SIGTERM signal, which terminates a process and frees its allocated resources. You can use the ps command to find the process IDs of your program's processes. Pressing Ctrl+C sends SIGINT to every process in the terminal's foreground process group, which normally includes the forked children, so every process that keeps the default SIGINT action is terminated. Children that handle or ignore SIGINT, or that have moved to another process group, survive. If you only forked and did not exec a new program, all of your child processes have the same name as the parent, which means you can use the killall command to terminate them all in one go. If you are logged in remotely, logging out causes a SIGHUP signal to be sent to the processes you started during the session, which terminates them by default.
From inside the program, there is a kill() function that works much like the command. You still need the process IDs, so it is important that your parent code remembers the child process ID returned by fork.

When your process is terminated abruptly, the kernel still releases its resources.
However, if you want to control the behaviour (the order of cleanup, for example), you should install a signal handler. See sigaction(2).
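A minimal sketch of such a handler, assuming the cleanup itself runs in the main loop rather than in the handler. The names shutdown_requested, on_signal and install_shutdown_handlers are hypothetical. Note that Ctrl+Z sends SIGTSTP, which can be caught in the same way, whereas SIGSTOP and SIGKILL cannot be caught at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; the main loop polls it and performs the cleanup. */
volatile sig_atomic_t shutdown_requested = 0;

static void on_signal(int signo)
{
    (void)signo;
    shutdown_requested = 1;   /* only async-signal-safe work belongs here */
}

/* Installs the handler for Ctrl+C (SIGINT) and plain kill (SIGTERM).
 * Returns 0 on success, -1 on error. */
int install_shutdown_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    return 0;
}
```

The main loop would then run while shutdown_requested is 0 and, once it is set, signal the children with kill(), reap them with waitpid(), close the socket and detach the shared memory in whatever order is required.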

kill - does it kill the process right away?

What exactly does kill do?
I have a parent process which creates 100 (as an example) child processes one after another. At the end of any child's job, I kill the child with kill(pid_of_child, SIGKILL) and I can no longer see it in the ps output. But if something goes wrong with the parent process and I exit from it with exit(1) (at this point only 1 child is there, which I can check in ps), I see a lot of <defunct> processes whose ppid is the pid of the parent process.
How is that possible? Did kill not kill the child processes entirely?

kill doesn't kill anything. It sends signals to the target process. SIGKILL is just a signal. Now, the standard action for SIGKILL (the only action, actually, since SIGKILL can't be handled or ignored by a process) is to terminate the process, that's true.
The "<defunct>" process is a child that hasn't been reaped, meaning that the parent hasn't called wait() to retrieve the exit status of the child. Until the parent calls wait(), the defunct (or "zombie") process will hang around.
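As a small sketch of the point about reaping: sending SIGKILL and then calling waitpid() removes the zombie entry. The helper names spawn_idle_child and kill_and_reap are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that just waits to be signalled; returns its pid. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        for (;;)
            pause();
    return pid;
}

/* Sends SIGKILL, then reaps the child so that no <defunct> entry remains.
 * Returns the signal that terminated the child, or -1 on error. */
int kill_and_reap(pid_t pid)
{
    int status;

    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1)  /* collects the exit status */
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```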

Whenever a process ends, no matter how it ends (kill or otherwise), it will stay in the kernel's process table until its parent process retrieves its exit status (with wait and friends). Leaving it in the process table avoids a number of nasty race conditions.
If your parent process has exited, the children should get reassigned to init, which periodically reaps its children.

Yes, SIGKILL terminates the process, but in any case (either normal exit or terminated), processes have an exit status, which needs to be kept around for potential readers - as such an entry in the process table may remain until this is done. See http://en.wikipedia.org/wiki/Zombie_process .

How to detect defunct processes on Linux?

I have a parent and a child process written in C. Somewhere in the parent process, a HUP signal is sent to the child. I want my parent process to detect whether the child is dead. But when I send SIGHUP, the child process becomes a zombie. How can I detect in the parent process whether the child is a zombie? I tried the code below, but it doesn't give me the desired result, since the child process is still there but is defunct.
kill(childPID, 0);
One more question; can I kill the zombie child without killing the parent?
Thanks.

From Wikipedia:
On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution but still has an entry in the process table. This entry is still needed to allow the process that started the (now zombie) process to read its exit status.
If the parent fetches the exit status by calling wait, waitpid or the like, the zombie should disappear.
You can detect whether a process is alive through the wait functions (man wait).
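A sketch of detection with waitpid() and WNOHANG, which checks without blocking and reaps the zombie in the same call. kill(childPID, 0) cannot make this distinction, because it still succeeds while the zombie entry exists. The names poll_child and run_detect_demo are hypothetical, and the demo gives up after a bounded number of polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the child has terminated (it is reaped by this call),
 * 0 if it is still running, -1 on error. */
int poll_child(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == 0)
        return 0;
    return r == pid ? 1 : -1;
}

/* Forks a child, sends it SIGHUP and polls until the death is detected.
 * Returns the signal that terminated the child, or -1 on error. */
int run_detect_demo(void)
{
    int status = 0;
    struct timespec nap = {0, 10 * 1000 * 1000};   /* 10 ms between polls */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGHUP, SIG_DFL);         /* default action: terminate */
        for (;;)
            pause();
    }

    if (poll_child(pid, &status) != 0)   /* the child should still be running */
        return -1;
    kill(pid, SIGHUP);

    for (int tries = 0; tries < 200; tries++) {
        int r = poll_child(pid, &status);
        if (r == 1)
            return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
        if (r == -1)
            return -1;
        nanosleep(&nap, NULL);
    }

    kill(pid, SIGKILL);                  /* give up and clean up */
    waitpid(pid, NULL, 0);
    return -1;
}
```

This also covers the second question: a zombie cannot be killed, since it has already terminated, but the parent makes it disappear by waiting for it.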