I am trying to create a wrapper on Linux which controls how many concurrent executions of something are allowed at once. To do so, I am using a system wide counting semaphore. I create the semaphore, do a sem_wait(), launch the child process and then do a sem_post() when the child terminates. That is fine.
The problem is how to safely handle signals sent to this wrapper. If it doesn't catch signals, the command might terminate without doing a sem_post(), causing the semaphore count to permanently decrease by one. So, I created a signal handler which does the sem_post(). But still, there is a problem.
If the handler is attached before the sem_wait() is performed, a signal could arrive before the sem_wait() completes, causing a sem_post() to occur without a sem_wait(). The reverse is possible if I do the sem_wait() before setting up the signal handler.
The obvious next step was to block signals during the setup of the handler and the sem_wait(). This is pseudocode of what I have now:
void handler(int sig)
{
    sem_post(sem);
    exit(1);
}
...
sigprocmask(...); /* Block signals */
sigaction(...); /* Set signal handler */
sem_wait(sem);
sigprocmask(...); /* Unblock signals */
RunChild();
sem_post(sem);
exit(0);
The problem now is that the sem_wait() can block and during that time, signals are blocked. A user attempting to kill the process may end up resorting to "kill -9" which is behaviour I don't want to encourage since I cannot handle that case no matter what. I could use sem_trywait() for a small time and test sigpending() but that impacts fairness because there is no longer a guarantee that the process waiting on the semaphore the longest will get to run next.
Is there a truly safe solution here which allows me to handle signals during semaphore acquisition? I am considering resorting to a "Do I have the semaphore" global and removing the signal blocking, but that is not 100% safe since acquiring the semaphore and setting the global isn't atomic. It might still be better than blocking signals while waiting.
Are you sure sem_wait() causes signals to be blocked? I don't think this is the case. The man page for sem_wait() says that the EINTR error code is returned from sem_wait() if it is interrupted by a signal.
You should be able to handle this error code and then your signals will be received. Have you run into a case where signals have not been received?
I would make sure you handle every error code that sem_wait() can return. Such failures may be rare, but if you want to be 100% sure, you have to cover every case.
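Here is a rough sketch of what handling EINTR could look like. The names got_signal, on_signal, install_handler and acquire_or_abort are made up for illustration and are not from the question. It assumes Linux, and it assumes the handler is installed without SA_RESTART, since otherwise sem_wait() may be restarted automatically instead of returning EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Install the handler WITHOUT SA_RESTART so sem_wait() fails with EINTR. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Returns 0 if the semaphore was acquired, 1 if a signal asked us to
 * give up (semaphore NOT held), and -1 on any other sem_wait() error. */
int acquire_or_abort(sem_t *sem)
{
    while (!got_signal) {
        if (sem_wait(sem) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: loop and re-check the flag before waiting again. */
    }
    return 1;
}
```

Because the caller only posts when acquire_or_abort() returned 0, there is no unmatched sem_post(). One gap remains: a signal that arrives after the flag check but before sem_wait() starts waiting is not noticed until something else wakes the call, so this narrows the window rather than closing it.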
Are you sure you are approaching the problem correctly? If you want to wait for a child terminating, you may want to use the waitpid() system call. As you observed, it is not reliable to expect the child to do the sem_post() if it may receive signals.
I know this is old, but for the benefit of those still reading this courtesy of Google...
The simplest (and only?) robust solution to this problem is to use a System V semaphore with the SEM_UNDO flag, which lets the client acquire the semaphore resource in a way that is automatically returned by the kernel NO MATTER HOW THE PROCESS EXITS.
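A minimal sketch of that idea, assuming Linux. The limiter_* function names are hypothetical, and IPC_PRIVATE is used only to keep the example self-contained; a real system-wide wrapper would presumably share a key (for example from ftok()) between invocations.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller has to define union semun itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a set with one semaphore initialised to `slots`. */
int limiter_create(int slots)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    union semun arg;

    if (id == -1)
        return -1;
    arg.val = slots;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Apply `delta` with SEM_UNDO, so the kernel reverts this process's
 * net adjustment when the process exits, however it exits. */
static int limiter_change(int id, short delta)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = SEM_UNDO;
    while (semop(id, &op, 1) == -1) {
        if (errno != EINTR)
            return -1;   /* real error */
        /* interrupted by a handled signal: retry */
    }
    return 0;
}

int limiter_acquire(int id) { return limiter_change(id, -1); }
int limiter_release(int id) { return limiter_change(id, 1); }
int limiter_value(int id)   { return semctl(id, 0, GETVAL); }
int limiter_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

The wrapper would call limiter_acquire(), run the child, and call limiter_release(). If the wrapper dies in between, even from kill -9, the kernel applies the undo and the slot comes back, so no signal handler is needed just to protect the count.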
There are linux kernel threads that do some work every now and then, then either go to sleep or block on a semaphore. They can be in this state for several seconds - quite a long time for a thread.
If threads need to be stopped for some reason, at least if unloading the driver they belong to, I am looking for a way to get them out of sleep or out of the semaphore without waiting the whole sleep time or triggering the semaphore as often as required.
I have found and read a lot about this, but the advice is contradictory and I am still not sure how things work. I would appreciate it if you could shed some light on the following points.
msleep_interruptible
What is able to interrupt that?
down_interruptible
The name of this semaphore function implies interruptibility. Same question here: what can interrupt it?
kthread_stop
It's described as setting kthread_should_stop to true and waking the thread... but this function blocks until the sleep time is over (even when using msleep_interruptible) or the semaphore is triggered.
What am I understanding wrong?
Use a signal to unblock - really?
My search found a signal can interrupt the thread. Other hits say a signal is not the best way to operate on threads.
If a signal is the best choice - which signal do I use to unblock the thread but not mess it up too much?
SIGINT is a termination signal - I don't intend to terminate something, just make it go on.
More information
The threads run a loop that checks a termination flag, does some work and then blocks in a sleep or on a semaphore. They are used in two situations.
Situation 1.
A producer-consumer scenario that uses semaphores to synchronize producer and consumer. They work perfectly to make threads wait for work and start running on setting the semaphore.
Currently I'm setting a termination flag, then releasing the semaphore. This unblocks the thread, which then checks the flag and terminates. This isn't my major problem. However, of course, I'd like to know whether there is a better way.
Code sample
while (keep_running) {
    do_your_work();
    down_interruptible(&mysemaphore); // Intention: break out of this
}
Situation 2.
A thread that periodically logs things. This thread sleeps for some seconds between runs of its work. After the flag is set, the thread terminates at its next run, but this can take several seconds. I want to break the sleep if necessary.
Code sample
while (keep_running) {
    do_your_work();
    msleep(15000); // Intention: break out of this - msleep_interruptible?
}
(This question might be somewhat related to "pthread_exit in signal handler causes segmentation fault".) I'm writing a deadlock prevention library in which a checking thread continuously does graph calculations to detect deadlock; if it finds one, it signals one of the conflicting threads. When that thread catches the signal, it releases all the mutexes it owns and exits. There are multiple resource mutexes (obviously) and one critical-region mutex; every call that acquires or releases a resource lock or does graph calculations must obtain this lock first.
Now here is the problem. With 2 competing threads (not counting the checking thread), the program sometimes deadlocks after one thread gets killed. gdb says the dead thread owns the critical-region lock and never released it. After adding a breakpoint in the signal handler and stepping through, it appears that the lock belongs to someone else (as expected) right before pthread_exit(), but the ownership magically goes to this thread after pthread_exit().
The only guess I can think of is that the thread to be killed was blocked in pthread_mutex_lock while trying to get the critical-region lock (because it wanted another resource mutex) when the signal arrived and interrupted pthread_mutex_lock. Since this call is not signal-proof, did something weird happen? For example, might the signal handler have returned, after which the thread got the lock and then exited? I don't know. Any insight is appreciated!
pthread_exit is not async-signal-safe, and thus the only way you can call it from a signal handler is if you ensure that the signal is not interrupting any non-async-signal-safe function.
As a general principle, using signals as a method of communication with threads is usually a really bad idea. You end up mixing two issues that are already difficult enough on their own: thread-safety (proper synchronization between threads) and reentrancy within a single thread.
If your goal with signals is just to instruct a thread to terminate, a better mechanism might be pthread_cancel. To use this safely, however, the thread that will be cancelled must set up cancellation handlers at the proper points and/or disable cancellation temporarily when it's not safe (with pthread_setcancelstate). Also, be aware that pthread_mutex_lock is not a cancellation point. There's no safe way to interrupt a thread that's blocked waiting to obtain a mutex, so if you need interruptibility like this, you probably need either a more elaborate synchronization setup with condition variables (condvar waits are cancellable), or you could use semaphores instead of mutexes.
Edit: If you really do need a way to terminate threads waiting for mutexes, you could replace calls to pthread_mutex_lock with calls to your own function that loops calling pthread_mutex_timedlock and checking for an exit flag on each timeout.
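A rough sketch of such a replacement function. The names lock_or_exit and exit_requested and the 100 ms polling interval are arbitrary choices for illustration, not anything from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Set to true by whoever wants waiting threads to give up. */
static atomic_bool exit_requested = false;

/* Returns 0 once the mutex is locked, ECANCELED if the exit flag was
 * seen first (mutex NOT held), or another pthread_mutex_timedlock error. */
int lock_or_exit(pthread_mutex_t *m)
{
    for (;;) {
        struct timespec deadline;
        int rc;

        if (atomic_load(&exit_requested))
            return ECANCELED;

        /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME time. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000L * 1000L;   /* now + 100 ms */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        rc = pthread_mutex_timedlock(m, &deadline);
        if (rc != ETIMEDOUT)
            return rc;   /* locked (0) or a real error */
        /* Timed out: loop around and re-check the flag. */
    }
}
```

The thread can then exit from ordinary code when it sees ECANCELED, at a point where it knows exactly which locks it holds, instead of from a signal handler. The cost is a worst-case delay of one polling interval before a waiting thread notices the flag.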
I have a main thread, which stays in the main function, i.e. I do not create it explicitly with pthread_create, because that's not necessary. This thread opens a file, then creates other threads, waits for them to finish their work (i.e., does the join), and cleans up everything (pointers, semaphores, condition variables and so on...).
Now, I have to apply this code to block SIGINT:
sigset_t set;
int sig;

sigemptyset(&set);
sigaddset(&set, SIGINT);
pthread_sigmask(SIG_BLOCK, &set, NULL);
while (1) {
    sigwait(&set, &sig);
    switch (sig) {
    case SIGINT:
        /* handle interrupts */
        break;
    default:
        /* unexpected signal */
        pthread_exit((void *)-1);
    }
}
and it says: "You must use the main() function to launch the N+1 threads and wait for their completion. If a SIGINT signal arrives at the program it should be handled by the main thread in order to shut down the program and its threads in a clean way."
My question is where I should put this code. Is it wrong to put it in a background thread created in main()? I already have a loop, with an exit flag, that creates and joins all the other threads, so I don't understand whether this code belongs directly in the main function, where everything is done/called to start the program. If I put it in a thread, together with the handler that cleans up, is this considered busy waiting?
"It says"? What says? The homework assignment?
The first thing you should understand about programming with threads and signals is that you have very little control over which thread a signal is delivered to. If your main thread wants to be the one to get the signal, it should block the signal before creating any new threads and possibly unblock it after it finishes creating them, to ensure that the signal is not delivered to them.
However, if you're following best practices for signal handlers, it probably doesn't matter which thread handles the signal. All the signal handler should do is set a global flag or write a byte to a pipe (whichever works best to get the main thread to notice that the signal happened). (Note that you cannot use condition variables or any locking primitives from signal handlers!) As in the code fragment in your question, blocking the signal and using sigwait is also possible (be aware, again, that it needs to be blocked in all threads), but most programs can't afford to stop and wait just for signals; they need to wait for condition variables and/or input from files as well. One way to solve this issue is to make a dedicated thread to call sigwait, but that's rather wasteful. A better solution, if you're already using select, would be to switch to pselect, which can wait for signals as well as file descriptor events (at the same time).
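For the simple case where the main thread has nothing to do except wait, the sigwait arrangement could look roughly like the sketch below. run_until_sigint, worker, stop_requested and the fixed limit of 16 workers are invented for the example, and the worker body is only a stand-in for real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool stop_requested = false;

/* Placeholder worker: polls the stop flag every 10 ms. */
static void *worker(void *arg)
{
    const struct timespec tick = { 0, 10L * 1000L * 1000L };

    (void)arg;
    while (!atomic_load(&stop_requested))
        nanosleep(&tick, NULL);
    return NULL;
}

/* Blocks SIGINT, starts the workers, sleeps in sigwait() until SIGINT,
 * then stops and joins them. Returns the number of workers joined,
 * or -1 on a setup error. */
int run_until_sigint(int nworkers)
{
    enum { MAX_WORKERS = 16 };
    pthread_t tid[MAX_WORKERS];
    sigset_t set;
    int sig = 0, started = 0, i;

    if (nworkers < 0 || nworkers > MAX_WORKERS)
        return -1;

    /* Block BEFORE pthread_create: new threads inherit the mask, so
     * SIGINT can only be collected by the sigwait() call below. */
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }

    if (started == nworkers)
        sigwait(&set, &sig);   /* sleeps in the kernel, no busy waiting */

    atomic_store(&stop_requested, true);
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started;
}
```

Since sigwait() puts the thread to sleep until the signal is pending, this is not busy waiting, and no signal handler is installed at all, so the restrictions on what a handler may call do not come into play.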
Rather than asking us for the answers (which would be hard to give anyway without seeing the full program you're trying to make this work with), you'd be much better off trying to really understand the intricacies of signals with threads.
Without keeping a list of current threads, I'm trying to see that a realtime signal gets delivered to all threads in my process. My idea is to go about it like this:
Initially the signal handler is installed and the signal is unblocked in all threads.
When one thread wants to send the 'broadcast' signal, it acquires a mutex and sets a global flag that the broadcast is taking place.
The sender blocks the signal (using pthread_sigmask) for itself, and enters a loop repeatedly calling raise(sig) until sigpending indicates that the signal is pending (there were no threads remaining with the signal blocked).
As threads receive the signal, they act on it but wait in the signal handler for the broadcast flag to be cleared, so that the signal will remain masked.
The sender finishes the loop by unblocking the signal (in order to get its own delivery).
When the sender handles its own signal, it clears the global flag so that all the other threads can continue with their business.
The problem I'm running into is that pthread_sigmask is not being respected. Everything works right if I run the test program under strace (presumably due to different scheduling timing), but as soon as I run it alone, the sender receives its own signal (despite having blocked it..?) and none of the other threads ever get scheduled.
Any ideas what might be wrong? I've tried using sigqueue instead of raise, probing the signal mask, adding sleep all over the place to make sure the threads are patiently waiting for their signals, etc. and now I'm at a loss.
Edit: Thanks to psmears' answer, I think I understand the problem. Here's a potential solution. Feedback would be great:
At any given time, I can know the number of threads running, and I can prevent all thread creation and exiting during the broadcast signal if I need to.
The thread that wants to do the broadcast signal acquires a lock (so no other thread can do it at the same time), then blocks the signal for itself, and sends num_threads signals to the process, then unblocks the signal for itself.
The signal handler atomically increments a counter, and each instance of the signal handler waits until that counter is equal to num_threads to return.
The thread that did the broadcast also waits for the counter to reach num_threads, then it releases the lock.
One possible concern is that the signals will not get queued if the kernel is out of memory (Linux seems to have that issue). Do you know if sigqueue reliably informs the caller when it's unable to queue the signal (in which case I would loop until it succeeds), or could signals possibly be silently lost?
Edit 2: It seems to be working now. According to the documentation for sigqueue, it returns EAGAIN if it fails to queue the signal. But for robustness, I decided to just keep calling sigqueue until num_threads-1 signal handlers are running, interleaving calls to sched_yield after I've sent num_threads-1 signals.
There was a race condition at thread creation time, counting new threads, but I solved it with a strange (ab)use of read-write locks. Thread creation is "reading" and the broadcast signal is "writing", so unless there's a thread trying to broadcast, it doesn't create any contention at thread-creation.
raise() sends the signal to the current thread (only), so other threads won't receive it. I suspect that the fact that strace makes things work is a bug in strace (due to the way it works it ends up intercepting all signals sent to the process and re-raising them, so it may be re-raising them in the wrong way...).
You can probably get round that using kill(getpid(), <signal>) to send the signal to the current process as a whole.
However, another potential issue you might see is that sigpending() can indicate that the signal is pending on the process before all threads have received it - all that means is that there is at least one such signal pending for the process, and no CPU has yet become available to run a thread to deliver it...
Can you describe more details of what you're aiming to achieve? And how portable you want it to be? There's almost certainly a better way of doing it (signals are almost always a major headache, especially when mixed with threads...)
In a multithreaded program, raise(sig) is equivalent to pthread_kill(pthread_self(), sig).
Try kill(getpid(), sig) instead.
Given that you can apparently lock thread creation and destruction, could you not just have the "broadcasting" thread post the required updates to thread-local-state in a per-thread queue, which each thread checks whenever it goes to use the thread-local-state? If there's outstanding update(s), it first applies them.
You are trying to synchronize a set of threads.
From a design pattern point of view the pthread native solution for your problem would be a pthread barrier.
I create a thread with pthread_create(&thread1, &attrs, //... , //...); and, if some condition occurs, I need to kill this thread. How do I kill it?
First store the thread id
pthread_create(&thr, ...)
then later call
pthread_cancel(thr)
However, this is not a recommended programming practice! It's better to use an inter-thread communication mechanism like semaphores or messages to communicate to the thread that it should stop execution.
Note that pthread_kill(...) does not actually terminate the receiving thread, but instead delivers a signal to it, and it depends on the signal and signal handlers what happens.
There are two approaches to this problem.
Use a signal: The thread installs a signal handler using sigaction() which sets a flag, and the thread periodically checks the flag to see whether it must terminate. When the thread must terminate, issue the signal to it using pthread_kill() and wait for its termination with pthread_join(). This approach requires pre-synchronization between the parent thread and the child thread, to guarantee that the child thread has already installed the signal handler before it is able to handle the termination signal;
Use a cancellation point: The thread terminates whenever a cancellation function is executed. When the thread must terminate, execute pthread_cancel() and wait for its termination with pthread_join(). This approach requires detailed usage of pthread_cleanup_push() and pthread_cleanup_pop() to avoid resource leakage. These last two calls might mess with the lexical scope of the code (since they may be macros yielding { and } tokens) and are very difficult to maintain properly.
(Note that if you have already detached the thread using pthread_detach(), you cannot join it again using pthread_join().)
Both approaches can be very tricky, but either might be especially useful in a given situation.
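To make the cancellation approach a bit more concrete, here is a small sketch. The names cancellable_worker, release_buffer, cleanup_runs and start_and_cancel are hypothetical, and the 64-byte buffer just stands in for whatever resource the thread would otherwise leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Counts cleanup-handler runs so the caller can confirm it ran. */
static int cleanup_runs = 0;

static void release_buffer(void *p)
{
    free(p);
    cleanup_runs++;
}

static void *cancellable_worker(void *arg)
{
    char *buf = malloc(64);

    (void)arg;
    if (buf == NULL)
        return NULL;

    /* push and pop must appear as a pair in the same lexical scope. */
    pthread_cleanup_push(release_buffer, buf);
    for (;;)
        sleep(1);              /* sleep() is a cancellation point */
    pthread_cleanup_pop(1);    /* never reached; closes the push above */
    return NULL;
}

/* Returns 1 if the thread ended by cancellation, 0 if it returned
 * normally, -1 if a pthread call failed. */
int start_and_cancel(void)
{
    pthread_t thr;
    void *result;

    if (pthread_create(&thr, NULL, cancellable_worker, NULL) != 0)
        return -1;
    if (pthread_cancel(thr) != 0)
        return -1;
    if (pthread_join(thr, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```

With the default deferred cancellation type, the request is only acted on at a cancellation point, so the handler is always registered before the thread can be cancelled inside sleep(). Every additional resource would need its own push/pop pair, which is where the maintenance burden mentioned above comes from.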
I agree with Antti: a better practice would be to implement one or more checkpoints where the thread checks whether it should terminate. These checkpoints can be implemented in a number of ways, e.g. a shared variable protected by a lock, or an event that the thread checks for (the thread can opt to wait zero time).
Take a look at the pthread_kill() function.
pthread_exit(0)
This terminates the calling thread, so it has to be called from within the thread that should stop.