waitpid can detect when the status of a pid/tid changes, but it is a blocking function.
I want to monitor all threads of a specific pid, be notified when one of them changes state, and print its tid.
For now I start as many threads as there are threads in that process; each one calls waitpid on a single tid, and when the blocking call returns I print the tid that changed.
How can I get a signal when a tid changes, so that I can monitor all the tids from a single thread?
I don't want to monitor every pid in the system, only a specific pid/tid.
These tids/pids are not children of my process.
You can call
int status;
pid_t pid = waitpid(-1, &status, 0);
to wait for a state change in any child process.
So you do not have to specify in advance which pid to monitor, and you can react to any status change. This way you do not need to start one thread for each pid.
As to the signal part of your question: A SIGCHLD is sent to your process when a child process exits. This signal is ignored by default, but you can install a custom signal handler for it, of course.
If you only want to reap specific pids, Linux provides the option WNOWAIT for waitid(), which only reports the state but does not actually reap the child process. You can then check whether the pid is one of those you want to monitor and, if so, call waitpid() on it to reap it.
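A minimal sketch of that peek-then-reap idea, assuming Linux/glibc. The helper names reap_if_wanted and demo_reap are made up for illustration; the peek uses waitid() because that is the call that accepts WNOWAIT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: look at the next terminated child without reaping
 * it, and reap it only if it is the pid we care about.
 * Returns the reaped pid, 0 if a different child terminated (it is left
 * unreaped), or -1 on error (e.g. ECHILD when there are no children). */
pid_t reap_if_wanted(pid_t wanted)
{
    siginfo_t info;

    /* WNOWAIT reports the child but leaves it in a waitable state. */
    if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT) == -1)
        return -1;

    if (info.si_pid != wanted)
        return 0;                     /* not ours: leave it alone */

    /* A second wait without WNOWAIT actually reaps the child. */
    return waitpid(wanted, NULL, 0);
}

/* Hypothetical demo: fork one child that exits at once, then reap it.
 * Returns 0 on success, -1 on failure. */
int demo_reap(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);
    return reap_if_wanted(child) == child ? 0 : -1;
}
```

Note that a child that is left unreaped will be reported again by the next waitid() call, so a real program needs some way to skip over it.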
If the processes are not children, waitpid() cannot be used in general. One option is to attach to these processes with ptrace(), so that you get notified when one of them exits. This might have unwanted side effects, however.
If you're using POSIX threads, then you could use pthread_cleanup_push and pthread_cleanup_pop to call a "cleanup" function when your thread is exiting.
This "cleanup" function could then send one of the user signals (SIGUSR1 or SIGUSR2) to the process which then catches it and treats it as a signal about thread termination.
If you use sigqueue, you can attach the thread id to the signal, so that the signal handler knows which thread just exited.
You can use pthread_sigmask to block the user signal in all threads, to make sure it's only delivered to the main process thread (or use pthread_sigqueue to send to the main process thread specifically).
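Here is a rough sketch of the cleanup-handler idea, under a few assumptions of mine: the "thread id" is an application-level integer passed to the worker, the names notify_exit, worker and wait_for_worker_exit are invented for the example, and the main thread collects the queued signal synchronously with sigwaitinfo instead of an asynchronous handler, which keeps the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Cleanup handler: runs when the worker exits and reports its id. */
static void notify_exit(void *arg)
{
    union sigval value;
    value.sival_int = *(int *)arg;   /* hypothetical application-level id */
    sigqueue(getpid(), SIGUSR1, value);
}

static void *worker(void *arg)
{
    pthread_cleanup_push(notify_exit, arg);
    /* ... the real work would go here ... */
    pthread_cleanup_pop(1);          /* 1 = run the handler on normal exit too */
    return NULL;
}

/* Starts one worker with the given id and returns the id carried by its
 * exit notification, or -1 on error. */
int wait_for_worker_exit(int id)
{
    sigset_t set;
    siginfo_t info;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block SIGUSR1 before creating threads so every thread inherits it. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, &id) != 0)
        return -1;

    /* Accept the queued signal here rather than in a handler. */
    if (sigwaitinfo(&set, &info) == -1)
        return -1;
    pthread_join(tid, NULL);
    return info.si_value.sival_int;
}
```

This only works for threads you create yourself, and it leaves SIGUSR1 blocked in the calling thread. Build with -pthread.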
I am reading https://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551, and there the author handles SIGCHLD in a handler that calls waitpid rather than wait.
In Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are running children that have not yet terminated.
The handler is as follows:
void sig_chld(int signo)
{
    pid_t pid;
    int stat;
    // while ((pid = wait(&stat)) > 0)
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)
    {
        printf("child %d terminated\n", pid);
    }
}
The question is: even if I use the blocking version, wait (as in the commented-out line), the children are reaped anyway (which is what I want, in order not to have zombies), so why bother whether the call is blocking or non-blocking?
I assume that in the non-blocking case (i.e. with waitpid) the handler can be called multiple times (when some children have terminated and others are still running). But I could still just block and wait in that handler for all children to terminate. So there is no difference between calling the handler multiple times or just once. Or is there any other reason for the non-blocking call and for calling the handler multiple times?
The while loop condition will run one more time than there are zombie child processes that need to be waited for. So if you use wait() instead of waitpid() with the WNOHANG flag, it'll potentially block forever if you have another still running child - as wait() only returns early with an ECHILD error if there are no child processes at all. A robust generic handler will use waitpid() to avoid that.
Picture a case where the parent process starts multiple children to do various things, and periodically sends them instructions about what to do. When the first one exits, using wait() in a loop in the SIGCHLD handler will cause it to block forever while the other child processes are hanging around waiting for more instructions that they'll never receive.
Or, say, an inetd server that listens for network connections and forks off a new process to handle each one. Some services finish quickly, some can run for hours or days. If it uses a signal handler to catch exiting children and calls wait() in a loop there, then once the handler is triggered by a short-lived service process exiting, the server won't be able to do anything else until the long-lived one exits.
It took me a second to understand what the problem was, so let me spell it out. As everyone else has pointed out, wait(2) may block. waitpid(2) will not block if the WNOHANG option is specified.
You are correct to say that the first call to wait(2) should not block, since the signal handler will only be called if a child has exited. However, when the signal handler is called, several children may have exited. Signals are delivered asynchronously, and can be consolidated. So, if two children exit at close to the same time, it's possible that the operating system only sends one signal to the parent process. Thus, a loop must be used to check iteratively whether more than one child has exited.
Now that we've established that the loop is necessary for checking if multiple children have exited, it is clear that we can't use wait(2) since it will block on the second iteration if there is a child that has not exited.
TL;DR The loop is necessary, and hence using waitpid(2) is necessary.
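To make that concrete, here is a small sketch along the lines of the handler above. The reap_children function and the counter are my own additions for the demo: it forks n children that exit at once and counts how many the handler reaps, even if their SIGCHLD signals get merged.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;   /* children reaped so far */

static void sig_chld(int signo)
{
    int saved_errno = errno;           /* waitpid may overwrite errno */
    (void)signo;
    /* Reap every child that has already exited; never block. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Forks n children that exit immediately and returns how many the
 * handler reaped, or -1 on error. */
int reap_children(int n)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int started = 0;

    reaped = 0;
    sa.sa_handler = sig_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Keep SIGCHLD blocked except while waiting in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                  /* child: exit right away */
        if (pid > 0)
            started++;
    }

    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&wait_mask);        /* atomically unblock and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

With a single wait() call per handler invocation, some of the children could stay zombies whenever signals are merged; with wait() in a loop, the handler would block as soon as one child is still running.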
I'm writing a toy shell for university and I have to update the status of a background process when it ends. So I came up with the idea of making the SIGCHLD handler do that, since that signal is sent when the process ends. The problem is that, in order to implement the jobs command, I have to update the status from "running" to "terminated", and first I have to find that specific process in the array I have dedicated to it. One way to do that is to search by pid, since the array stores the information that is displayed by jobs. Each entry stores the process pid, the status (which is a string) and the command itself.
Now the question is:
Is there a way to get the pid of the process that called the signal when it ended?
Right now this is what my handler function looks like:
void handler(int sig){
    int child_pid;
    child_pid = wait(NULL);
    //finds the process with a pid identical
    //to child_pid in the list and updates its status
    ...
}
Since wait(NULL) returns the pid of the first process that ends since it's called, the status is only updated when another background process ends and therefore the wrong process status is updated.
We haven't been taught much about the wait() and waitpid() functions apart from the fact that they wait for a process to end, so any insight may be helpful.
While it is not a good idea to use wait in a signal handler, you can do the following to accomplish what you are trying to do.
void handler(int sig){
    int child_pid;
    int status;
    child_pid = waitpid(-1, &status, WUNTRACED | WNOHANG);
    if(child_pid > 0)
    {
        // your code
        // make sure to deal with the cases of child_pid < 0 and child_pid == 0
    }
}
What you are doing is not technically wrong; however, it would be better to use waitpid. waitpid(-1, ...) works like wait(...): it does not wait for one specified process but for any child that changes state. The main difference is that you can pass options. WUNTRACED makes waitpid also report children that have been stopped, not only those that have terminated. WNOHANG tells waitpid to return immediately instead of suspending the caller when no child has changed state. You don't want your handler to be suspended.
If several children terminate at about the same time, it can seem as if only one signal was sent, because the signal handler is not executed once per child. This is due to how signals are delivered: a standard signal that is generated while another instance of it is already pending is not queued; the process only records that the signal is pending. To account for this, you will need to call waitpid(-1, ...) in a loop until it returns 0 (or -1), to make sure you are reaping all of the terminated children.
Additionally, do watch out for where else you are reaping the child (note that with the WNOHANG flag, waitpid returns 0 if there are children but none of them is waiting to be reaped, and -1 with errno set to ECHILD if there are no children left). I would assume that this is what is causing the behavior you are seeing, where the status only updates when another background process ends. For example, because you are making a toy shell, I assume that you are waiting for the foreground processes somewhere, and if you use the wait function there as well as in your handler, one of two things can happen. First, the child gets reaped in the 'wait for foreground process' code, and then when the handler is executed there is nothing for it to reap. Second, the child gets reaped in the handler, and then the 'wait for foreground process' code never returns.
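As an illustration, a handler that reaps in a loop and looks each pid up in a job table could look roughly like this. The table layout, the enum used in place of your status string, and run_one_job are assumptions made for the example, not your actual data structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 16                    /* hypothetical table size */

enum job_state { JOB_FREE, JOB_RUNNING, JOB_TERMINATED };

struct job {
    pid_t pid;
    volatile sig_atomic_t state;       /* written by the handler */
};

static struct job jobs[MAX_JOBS];

/* SIGCHLD handler: reap every finished child and update its entry. */
static void handler(int sig)
{
    int saved_errno = errno;
    pid_t pid;
    (void)sig;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        for (int i = 0; i < MAX_JOBS; i++)
            if (jobs[i].state == JOB_RUNNING && jobs[i].pid == pid)
                jobs[i].state = JOB_TERMINATED;
    }
    errno = saved_errno;
}

/* Starts one background child, waits until the handler has marked it,
 * and returns the final state of its entry (or -1 on error). */
int run_one_job(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD so the handler cannot run before the entry exists. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);                      /* stand-in for running the command */
    jobs[0].pid = pid;
    jobs[0].state = JOB_RUNNING;

    /* Wait with SIGCHLD unblocked until the handler updates the entry. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (jobs[0].state == JOB_RUNNING)
        sigsuspend(&wait_mask);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return jobs[0].state;
}
```

A real shell would not wait for a background job like run_one_job does; the wait is only there so the example can show the entry changing. The important part is blocking SIGCHLD while the table is being modified.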
I am making a program that creates numerous processes using fork(), each of which then calls an exec function on the same program (this is required by the professor).
I need it to react to CTRL+C (SIGINT) and ask the user if he/she wants to leave. The problem is that the signal handler is installed in all the child processes too, so, when the signal is sent, the user has to answer as many times as there are processes.
I only want it to ask the user once per CTRL+C.
What solutions can I implement?
When you call fork(), the parent process will get back the pid of the child. You can send a SIGTERM or SIGKILL signal to the children through the kill syscall when the parent receives the SIGINT signal.
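Something like the following sketch, assuming the children can simply ignore SIGINT (a disposition of "ignore" is kept across exec) so that only the parent reacts. The function name demo_single_prompt and the use of raise(SIGINT) to simulate CTRL+C are mine, just to keep the example self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 8                 /* hypothetical limit for the demo */

static volatile sig_atomic_t got_sigint;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;                    /* only the parent runs this */
}

/* Forks up to n children, simulates CTRL+C in the parent, then stops
 * the children with SIGTERM. Returns how many children were terminated
 * by SIGTERM, or -1 on error. */
int demo_single_prompt(int n)
{
    pid_t pids[MAX_CHILDREN];
    struct sigaction sa;
    int started = 0, killed = 0;

    if (n > MAX_CHILDREN)
        n = MAX_CHILDREN;

    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();            /* the parent keeps the child's pid */
        if (pid == 0) {
            signal(SIGINT, SIG_IGN);   /* children never ask the user */
            for (;;)
                pause();               /* stand-in for exec + real work */
        }
        if (pid > 0)
            pids[started++] = pid;
    }

    raise(SIGINT);                     /* stand-in for the user's CTRL+C */
    if (!got_sigint)
        return -1;
    /* Here the parent alone would ask the user for confirmation. */

    for (int i = 0; i < started; i++)
        kill(pids[i], SIGTERM);
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return killed;
}
```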
You can set a global variable pid and populate it with the result of getpid() on launch. Then, inside the signal handler, test getpid() against pid before executing your code. In other words: if you are the main process, proceed; if not, exit.
End result: you will have a signal handler that is run once, by the main process.
I have a process which forks a lot. The child processes do a lot of work and make further system calls.
When ANY child process gets an error from a system call, it prints an error description to stderr and sends SIGUSR1 to the group leader (the main parent process).
SIGUSR1 tells the parent to kill all child processes, free resources and exit (to avoid zombie processes).
I need to kill all children at once. Atomically. So when any error happens in ANY child process, all child processes stop their work immediately.
Currently the parent process kills all child processes with SIGUSR2: it sends this signal to all process group members (killpg), and all of them have a signal handler installed which kills them (exit). The group leader won't get killed, though (it still needs to free resources).
The problem is that before all child processes get killed, they still can execute about 1-2 rows of code, which is not what I want. I need to stop them immediately.
How can I achieve this?
Signals are delivered asynchronously. Since both the parent and the child processes are running, you cannot expect a child process to handle the signal immediately when the parent sends it.
The problem is that before all child processes get killed, they still can execute about 1-2 rows of code, which is not what I want. I need to stop them immediately.
Your problem is more one of coordination and synchronization between processes than of signal handlers. There are two ways I can think of:
Use synchronized signals. That is, when a child sends SIGUSR1 to the parent, it stops working and waits for the SIGUSR2 signal with one of the waiting functions, such as sigtimedwait or sigwait. This way it will not run any additional code before exiting.
Use pipe or socketpair to create communication channels between the parent and the children. That is, the parent sends a kill instruction to the children, and each child frees the necessary resources and terminates itself. This requires the children to listen on the channel while doing their work.
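A minimal sketch of the first option with a single child, assuming both signals are blocked before fork() so they can only be consumed by sigwait. The name demo_sync_stop and the exit code 42 are arbitrary, and the parent uses kill() on one child here instead of killpg() on the whole group.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One child reports an error with SIGUSR1, then runs no further code
 * until the parent answers with SIGUSR2.
 * Returns the child's exit code (42 in this sketch) or -1 on error. */
int demo_sync_stop(void)
{
    sigset_t both, old, report, stop;
    int sig, status;

    sigemptyset(&both);
    sigaddset(&both, SIGUSR1);
    sigaddset(&both, SIGUSR2);
    /* Block both signals before fork() so the child inherits the mask. */
    sigprocmask(SIG_BLOCK, &both, &old);

    pid_t parent = getpid();
    pid_t child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (child == 0) {
        sigemptyset(&stop);
        sigaddset(&stop, SIGUSR2);
        kill(parent, SIGUSR1);         /* report the (simulated) error */
        sigwait(&stop, &sig);          /* do nothing until told to stop */
        _exit(42);
    }

    sigemptyset(&report);
    sigaddset(&report, SIGUSR1);
    if (sigwait(&report, &sig) != 0)   /* wait for the error report */
        return -1;
    kill(child, SIGUSR2);              /* tell the child to stop */

    pid_t done = waitpid(child, &status, 0);
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (done != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

This only guarantees that the child which reported the error does nothing more. The other children still keep running until SIGUSR2 reaches them.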
Do you mean that all child processes must stop working as soon as the faulty child sends SIGUSR1?
If this is what you want, I don't think you can achieve it the way you are doing it: when the faulty child sends SIGUSR1 to the leader, the other children will continue execution until the SIGUSR1 is processed by the leader.
Do you really need the faulty process to send SIGUSR1 to the leader first? Would it not be possible for the faulty process to send SIGUSR2 directly to the group, a signal which the leader can simply ignore (or, at least, not treat as a termination signal)?
I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD and the exit status of the child etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of child or other stuff (upto the programmer) and then do a waitpid so that the child will not go zombie and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns ?
If we have both, the SIGCHLD is obtained first, so the signal handler is called first and thus its waitpid is called successfully and not the waitpid in the parent process code as follows:
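#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void my_signal_handler_for_sigchld(int signo)
{
    /* do something */
    pid_t tmp = waitpid(-1, NULL, 0);
    printf("handler: %d\n", (int)tmp);   /* the correct value of the child pid */
}

int main(void)
{
    struct sigaction sa = { .sa_handler = my_signal_handler_for_sigchld };
    sigaction(SIGCHLD, &sa, NULL);
    pid_t child_pid = fork();
    if (child_pid == 0)
        return 0;                        /* child: do something, return */
    /* parent: the value returned from this waitpid is -1 */
    printf("main: %d\n", (int)waitpid(child_pid, NULL, 0));
    return 0;
}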
I would appreciate it if someone could help me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
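For example, Approach 1 might be sketched like this with a dummy pipe and poll(). The function name demo_pipe_watch and the exit code 7 are only there for the demo; a real event loop would poll many descriptors, not just one.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Detects the termination of one child through a dummy pipe, then reaps
 * it. Returns the child's exit code (7 in this sketch) or -1 on error. */
int demo_pipe_watch(void)
{
    int fds[2];
    int status;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);                 /* child keeps only the write end */
        _exit(7);                      /* exiting closes the write end */
    }

    close(fds[1]);                     /* otherwise end-of-file never arrives */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    /* Returns once every write end is closed, i.e. the child is gone. */
    int ready = poll(&pfd, 1, -1);
    close(fds[0]);
    if (ready != 1)
        return -1;

    /* The pipe told us the child died; now reap the zombie. */
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One caveat: if the child passes the write end on to its own children, the pipe only closes when the last holder exits, so waitpid may be called while the direct child is still running.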
You need to call one of the waiting syscalls, such as waitpid or its friends (e.g. wait4); otherwise you could end up with zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc.), but you'll still need to wait for it later.
Signal handlers are restricted to calling a small set of async-signal-safe functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside the handler and test it later, in a safer place.
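A small sketch of that flag pattern, with made-up names (child_event, demo_flag): the handler only sets the flag, and the reaping happens later in normal code, where functions like printf would also be allowed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_event;  /* set by the handler only */

static void on_sigchld(int signo)
{
    (void)signo;
    child_event = 1;                   /* nothing else happens in here */
}

/* Forks one child, waits for the flag, then reaps outside the handler.
 * Returns the number of children reaped (1 in this sketch) or -1. */
int demo_flag(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int reaped = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD so the flag cannot change between test and wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);

    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_event)
        sigsuspend(&wait_mask);        /* a real program would do work here */
    child_event = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);

    /* The "later and safer place": reap everything that has exited. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```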