There are Linux kernel threads that do some work every now and then, then either go to sleep or block on a semaphore. They can stay in this state for several seconds - quite a long time for a thread.
When the threads need to be stopped for some reason, at least when the driver they belong to is unloaded, I am looking for a way to get them out of the sleep or out of the semaphore without waiting out the whole sleep time or triggering the semaphore as often as required.
I have found and read a lot about this, but the advice is contradictory and I am still not sure how things work. So it would be great if you could shed some light on that.
msleep_interruptible
What is able to interrupt that?
down_interruptible
This semaphore function implies interruptibility. Same question here: what can interrupt this semaphore?
kthread_stop
It's described as setting kthread_should_stop() to return true and waking the thread... but this function blocks until the sleep time is over (even when using msleep_interruptible) or until the semaphore is triggered.
What am I understanding wrong?
Use a signal to unblock - really?
My search found a signal can interrupt the thread. Other hits say a signal is not the best way to operate on threads.
If a signal is the best choice - which signal do I use to unblock the thread but not mess it up too much?
SIGINT is a termination signal - I don't intend to terminate anything, just to make it continue.
More information
The threads run a loop that checks a termination flag, does some work and then blocks in a sleep or on a semaphore. They are used for:
Situation 1.
A producer-consumer scenario that uses semaphores to synchronize producer and consumer. They work perfectly to make the threads wait for work and start running when the semaphore is raised.
Currently I'm setting a termination flag and then raising the semaphore. This unblocks the thread, which then checks the flag and terminates. This isn't my major problem; however, of course I'd like to know about a better way. (The stop sequence I currently use is sketched below, after the code sample.)
Code sample
while (keep_running) {
do_your_work();
down_interruptible(&mysemaphore); // Intention: break out of this
}
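For completeness, the stop path I described above looks roughly like this sketch (the completion and the worker_done / stop_worker names are just for this sketch; the completion is there so the module unload can wait until the thread has really left its loop):

    #include <linux/kthread.h>
    #include <linux/semaphore.h>
    #include <linux/completion.h>

    static bool keep_running = true;
    static struct semaphore mysemaphore;          /* sema_init(&mysemaphore, 0) at setup */
    static DECLARE_COMPLETION(worker_done);       /* name made up for this sketch */

    static int worker_fn(void *arg)
    {
            while (READ_ONCE(keep_running)) {
                    do_your_work();               /* placeholder from the sample above */
                    /* Woken either by real work (the producer's up()) or by the
                       stop path below; the loop condition is re-checked next. */
                    if (down_interruptible(&mysemaphore))
                            continue;
            }
            complete(&worker_done);
            return 0;
    }

    /* Stop path, e.g. from the driver's module_exit function: */
    static void stop_worker(void)
    {
            WRITE_ONCE(keep_running, false);
            up(&mysemaphore);                     /* unblock down_interruptible() */
            wait_for_completion(&worker_done);    /* wait until the loop has exited */
    }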
Situation 2.
A thread that periodically logs things. This thread sleeps some seconds between doing its work. After setting the flag, this thread terminates at its next run, but that can take several seconds. I want to break the sleep if necessary (see the sketch after the code sample).
Code sample
while (keep_running) {
do_your_work();
msleep(15000); // Intention: break out of this - msleep_interruptible?
}
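One idea I've come across (I'm not sure it's the right approach) is to replace the plain msleep() with a wait on a wait queue whose condition is kthread_should_stop(), so that kthread_stop() wakes the thread immediately instead of after the full 15 seconds. A rough sketch of that idea (logger_wq, logger_fn and logger_task are only names for the sketch):

    #include <linux/kthread.h>
    #include <linux/wait.h>
    #include <linux/jiffies.h>

    static DECLARE_WAIT_QUEUE_HEAD(logger_wq);

    static int logger_fn(void *arg)
    {
            while (!kthread_should_stop()) {
                    do_your_work();
                    /* Sleep up to 15 s, but return as soon as kthread_stop()
                       sets the stop flag and wakes this thread. The return
                       value (timed out vs. woken) is ignored here. */
                    wait_event_interruptible_timeout(logger_wq,
                                                     kthread_should_stop(),
                                                     msecs_to_jiffies(15000));
            }
            return 0;
    }

    /* Stopping it would then simply be: kthread_stop(logger_task); */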
Related
I'm trying to code a multi-threaded worker and job-giver program.
The job-giver thread pushes jobs with random delays into an array; a job might become due 1 second later or 10000... seconds later, it all depends on the job giver.
The worker thread nanosleep()s until the job with the shortest delay is due, then processes it and removes it from the job array.
All works fine, except when the job giver pushes a job with a shorter delay into the array while the worker thread is still nanosleep()ing towards the old shortest job; the new job then gets delayed more than expected.
For now, as a quick fix, I installed a signal handler with signal() that handles SIGUSR1.
When the job giver pushes a new shortest job, it sends SIGUSR1 to the whole program, which cancels the worker's nanosleep().
But I don't think that's the best way to do it, since the signal goes to the whole program and I just want to cancel one thread's nanosleep().
So, in summary: how can I cancel another thread's nanosleep() from the main thread without touching signals?
Note: I'm using pthreads on Linux with C.
Note: delays are specified in nanoseconds. With the current setup I'm able to keep the loss down to about 50µs.
You can possibly use pthread_kill() to deliver the SIGUSR1 to a single thread
From the manpage:
The pthread_kill() function sends the signal sig to thread, a thread in the same process as the caller. The signal is asynchronously directed to thread.
If sig is 0, then no signal is sent, but error checking is still performed.
This should only affect the single thread you target.
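A rough sketch of how that fits the nanosleep() case (the handler body, worker_tid and the helper names are just for illustration): the worker installs a no-op SIGUSR1 handler without SA_RESTART so that nanosleep() returns early with EINTR, and the job giver targets exactly that one thread with pthread_kill().

    #include <errno.h>
    #include <pthread.h>
    #include <signal.h>
    #include <time.h>

    static pthread_t worker_tid;                  /* saved when the worker is created */

    static void wakeup_handler(int sig) { (void)sig; /* only exists to interrupt nanosleep() */ }

    static void install_handler(void)
    {
            struct sigaction sa = { 0 };
            sa.sa_handler = wakeup_handler;       /* no SA_RESTART: nanosleep must fail with EINTR */
            sigemptyset(&sa.sa_mask);
            sigaction(SIGUSR1, &sa, NULL);
    }

    /* Worker side: a sleep that can be cut short. */
    static void interruptible_sleep(const struct timespec *delay)
    {
            struct timespec remaining = *delay;
            if (nanosleep(&remaining, &remaining) == -1 && errno == EINTR) {
                    /* woken early: go back and pick the new shortest job */
            }
    }

    /* Job-giver side: wake only the worker thread, not the whole program. */
    static void wake_worker(void)
    {
            pthread_kill(worker_tid, SIGUSR1);
    }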
You have a big XY problem here. Sleeping and signals are not the way to implement coordination between threads. I'd go so far as to say that sleeping in a multithreaded program is almost always indicative of some sort of bug.
The tool for what you are trying to do is Condition Variables. If you're not familiar with them, I'd highly recommend the Condition Variables part of this tutorial. Instead of sleeping, your workers should be doing a timed wait on the condition variable, called in a loop, and exiting from the loop when the condition they're waiting for is true.
Let's say we have two mutexes, one called x and the other y.
x is used for general locking, i.e. making sure multiple threads don't access the shared data at the same time: pthread_mutex_lock and pthread_mutex_unlock.
y is used in place of nanosleep: pthread_cond_wait, pthread_cond_timedwait and pthread_cond_signal.
To suspend, I use cond_wait on y and then resume with cond_signal on y. If I need to suspend for some time, like nanosleep, I use cond_timedwait on y and resume it the same way with cond_signal on y.
Source:
stackoverflow.com/questions/59286893/canceling-nanosleep-from-another-thread#comment104779089_59286893
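Putting the answer and the quoted comment together, a cancellable sleep built on pthread_cond_timedwait might look roughly like this (the wake_requested flag, the deadline handling and the names are just for this sketch; pthread_cond_timedwait takes an absolute CLOCK_REALTIME time by default):

    #include <errno.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <time.h>

    static pthread_mutex_t y_mutex = PTHREAD_MUTEX_INITIALIZER;   /* the "y" mutex */
    static pthread_cond_t  y_cond  = PTHREAD_COND_INITIALIZER;
    static bool wake_requested = false;

    /* Worker: sleep until 'deadline' (absolute time) or until woken early. */
    static void sleep_until_or_woken(const struct timespec *deadline)
    {
            pthread_mutex_lock(&y_mutex);
            while (!wake_requested) {
                    int rc = pthread_cond_timedwait(&y_cond, &y_mutex, deadline);
                    if (rc == ETIMEDOUT)
                            break;                /* the full delay elapsed */
            }
            wake_requested = false;
            pthread_mutex_unlock(&y_mutex);
    }

    /* Job giver: cancel the worker's sleep because a shorter job arrived. */
    static void cancel_sleep(void)
    {
            pthread_mutex_lock(&y_mutex);
            wake_requested = true;
            pthread_cond_signal(&y_cond);
            pthread_mutex_unlock(&y_mutex);
    }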
I have a boss thread that spawns up to M worker threads. Over the lifetime of the program, workers may be added and removed. When the program-wide shutdown flag is signalled, I want to await the completion of these workers.
Currently, any of the threads can add/remove threads, but that's not strictly a requirement as long as any thread can initiate a spawn/removal.
What's stopping me from using a counting semaphore or pthread_barrier_wait() is that it expects a fixed number of threads.
I can't loop pthread_join() over all workers either because I'd risk leaking zombie threads that have exited and possibly since then been replaced.
The boss thread itself has no other purpose than spawning the threads initially and making sure that the process exits gracefully.
I've spent days on and off on this problem and cannot come up with something robust and simple; are there any fairly well-established ways to accomplish this with POSIX threads?
1) "Currently, any of the threads can add/remove threads"
and
2) "are there any fairly well-established ways to accomplish this with POSIX threads"
Yes. Don't do (1). Have the boss thread do it.
Or, you can protect the code which spawns threads with a critical section or mutex (I assume you are already doing this). They should check a flag to see if shutdown is in progress, and if it is, don't spawn any more threads.
You can also have a counter of "ideal number of threads" and "actual number of threads" and have threads suicide if they find "ideal > actual". (I.e. they should decrement actual, exit the critical section/mutex, then quit).
When you need to initiate shutdown, use the SAME mutex/section to set the flag. Once done, you know the number of threads cannot increase, so you can use the most recent value.
Indeed, to exit you can just have the boss thread set "ideal" to zero, exit the mutex, and then repeatedly sleep 10ms until all threads have exited. Worst case is you wait an extra 10ms to quit. If that's too much, cut it to 1ms.
These are just ideas. The central concept is that all thread creation/removal, and messages about thread creation/removal should be protected by a mutex to ensure that only one thread is adding/removing/querying status at a time. Once you have that in place, there is more than one way to do it...
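A rough sketch of the ideal/actual counter idea (the names and the exact structure are only illustrative, not a drop-in implementation):

    #include <pthread.h>

    static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int ideal_threads  = 0;   /* how many workers we want */
    static int actual_threads = 0;   /* how many are currently alive */

    /* Called by each worker at the top of its loop; returns 1 if this worker
       should quit because there are more workers than wanted. */
    static int should_i_quit(void)
    {
            int quit = 0;
            pthread_mutex_lock(&pool_mutex);
            if (ideal_threads < actual_threads) {
                    actual_threads--;             /* decrement inside the mutex, then quit */
                    quit = 1;
            }
            pthread_mutex_unlock(&pool_mutex);
            return quit;
    }

    /* Shutdown: after this, ideal < actual for every live worker, so each one
       exits the next time it checks; the boss then just polls (e.g. sleeping
       10 ms per iteration) until actual_threads reaches zero. */
    static void begin_shutdown(void)
    {
            pthread_mutex_lock(&pool_mutex);
            ideal_threads = 0;
            pthread_mutex_unlock(&pool_mutex);
    }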
Threads that want to initiate spawns/removals should ask the boss thread to actually do it for them. Then the boss thread doesn't have to worry about threads it doesn't know about, and you can use one of the simple methods you described in your question.
I'll take the opposite tack from some of the other answers, since I have to do this now and again.
(1) Give every spawned thread access to a single pipe file descriptor either through the data passed through pthread_create or globally. Only the boss thread reads the pipe. Each thread announces its creation and termination to the boss via the pipe by passing its tid and boss adds or removes it from its list and pthread_joins it as appropriate. Boss can block on the pipe w/o having to do anything special.
(2) Do more or less the above with some other mechanism: a global counter and list with an accompanying condition variable to wake up the boss, a message queue, etc.
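A bare-bones sketch of option (1) (the announcement struct and names are made up, error handling is omitted): each worker writes a small record into the pipe when it starts and again just before it returns, and the boss blocks on read(), keeping its list and joining workers as they announce their exit.

    #include <pthread.h>
    #include <unistd.h>

    enum { WORKER_STARTED, WORKER_EXITING };

    struct announcement {
            int       event;                      /* WORKER_STARTED or WORKER_EXITING */
            pthread_t tid;
    };

    static int announce_fd[2];                    /* created once with pipe(announce_fd) */

    static void announce(int event)
    {
            struct announcement a = { event, pthread_self() };
            write(announce_fd[1], &a, sizeof a);  /* small writes to a pipe are atomic */
    }

    /* Boss loop: block on the pipe, track live workers, join the finished ones. */
    static void boss_loop(void)
    {
            struct announcement a;
            while (read(announce_fd[0], &a, sizeof a) == sizeof a) {
                    if (a.event == WORKER_STARTED) {
                            /* add a.tid to the list of live workers */
                    } else {
                            /* remove a.tid from the list */
                            pthread_join(a.tid, NULL);
                    }
            }
    }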
I'm curious whether I am able to do the following with the unistd C function alarm(int signal):
Having my main... and, for each thread that is created, initializing a SIGALRM with that function, which should close my thread when it triggers. Is this possible? Or is only one SIGALRM per main/process allowed?
Each thread in a process has an independent signal mask, which
indicates the set of signals that the thread is currently blocking. A
thread can manipulate its signal mask using pthread_sigmask(3). In a
traditional single-threaded application, sigprocmask(2) can be used to
manipulate the signal mask.
from man 7 signal.
The problem is that alarm works per process, not per thread, so if the sigmask of the threads is the same, you can't really know which one will receive the signal.
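For completeness, making the masks differ is the usual pthread_sigmask pattern (a sketch, not specific to alarm()): block SIGALRM in main() before creating any threads, so every thread inherits the blocked mask, and unblock it only in the one thread that is supposed to receive it.

    #include <pthread.h>
    #include <signal.h>

    /* Call in main() before pthread_create(): new threads inherit the mask. */
    static void block_sigalrm_everywhere(void)
    {
            sigset_t set;
            sigemptyset(&set);
            sigaddset(&set, SIGALRM);
            pthread_sigmask(SIG_BLOCK, &set, NULL);
    }

    /* The one thread that should handle the alarm unblocks it for itself. */
    static void unblock_sigalrm_here(void)
    {
            sigset_t set;
            sigemptyset(&set);
            sigaddset(&set, SIGALRM);
            pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    }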
OK, so first, alarm() actually takes an unsigned int value, which is the number of seconds before it expires. So your example int signal isn't the correct signature for alarm(), just FYI.
As far as this goes:
for each thread that is created, initialize a SIGALRM
The SIGALRM is generated for the process, not per thread, so you will have to catch the alarm and have some internal strategy to know which thread you raised it for and handle that accordingly. Once you have your handler, you can raise the alarm over and over again; however, keep in mind:
Alarm requests are not stacked;
So you'll have to do this one at a time. It's still possible, but not as straightforward as you were hoping.
For a very rough example of what I'm talking about (a code sketch follows this list):
you have a "manager" which keeps track of requests
thread 1 tells the manager it needs to handle something in 10s
the manager "records" this and calls set alarm(10)
thread 2 tells the manager it needs to be woken up in 3 seconds
the manager calls alarm(0) to kill the alarm, calls alarm(3) then notes that once that goes off it needs to call alarm(7) to finish thread 1's sleep time
in your alarm handler you just call the manager and let it know an alarm went off and it will wake the appropriate thread (2) then reset the alarm for the next one.
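A bare-bones sketch of the handler-plus-rearm part of that idea (the names are made up, and the bookkeeping of which thread to wake is left out):

    #include <signal.h>
    #include <unistd.h>

    /* Only one alarm() can be pending per process, so the "manager" keeps only
       the earliest deadline armed and re-arms for the next deadline afterwards.
       The handler itself only sets a flag. */
    static volatile sig_atomic_t alarm_fired = 0;

    static void on_alarm(int sig)
    {
            (void)sig;
            alarm_fired = 1;      /* the manager notices this, wakes the right thread, re-arms */
    }

    static void install_alarm_handler(void)
    {
            struct sigaction sa = { 0 };
            sa.sa_handler = on_alarm;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGALRM, &sa, NULL);
    }

    static void arm_next_alarm(unsigned int seconds)
    {
            alarm(0);             /* cancel whatever was pending: requests are not stacked */
            alarm(seconds);       /* arm the new, earliest deadline */
    }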
Without keeping a list of current threads, I'm trying to see that a realtime signal gets delivered to all threads in my process. My idea is to go about it like this:
Initially the signal handler is installed and the signal is unblocked in all threads.
When one thread wants to send the 'broadcast' signal, it acquires a mutex and sets a global flag that the broadcast is taking place.
The sender blocks the signal (using pthread_sigmask) for itself, and enters a loop repeatedly calling raise(sig) until sigpending indicates that the signal is pending (meaning there were no threads remaining with the signal unblocked).
As threads receive the signal, they act on it but wait in the signal handler for the broadcast flag to be cleared, so that the signal will remain masked.
The sender finishes the loop by unblocking the signal (in order to get its own delivery).
When the sender handles its own signal, it clears the global flag so that all the other threads can continue with their business.
The problem I'm running into is that pthread_sigmask is not being respected. Everything works right if I run the test program under strace (presumably due to different scheduling timing), but as soon as I run it alone, the sender receives its own signal (despite having blocked it..?) and none of the other threads ever get scheduled.
Any ideas what might be wrong? I've tried using sigqueue instead of raise, probing the signal mask, adding sleep all over the place to make sure the threads are patiently waiting for their signals, etc. and now I'm at a loss.
Edit: Thanks to psmears' answer, I think I understand the problem. Here's a potential solution. Feedback would be great:
At any given time, I can know the number of threads running, and I can prevent all thread creation and exiting during the broadcast signal if I need to.
The thread that wants to do the broadcast signal acquires a lock (so no other thread can do it at the same time), then blocks the signal for itself, and sends num_threads signals to the process, then unblocks the signal for itself.
The signal handler atomically increments a counter, and each instance of the signal handler waits until that counter is equal to num_threads to return.
The thread that did the broadcast also waits for the counter to reach num_threads, then it releases the lock.
One possible concern is that the signals will not get queued if the kernel is out of memory (Linux seems to have that issue). Do you know if sigqueue reliably informs the caller when it's unable to queue the signal (in which case I would loop until it succeeds), or could signals possibly be silently lost?
Edit 2: It seems to be working now. According to the documentation for sigqueue, it returns EAGAIN if it fails to queue the signal. But for robustness, I decided to just keep calling sigqueue until num_threads-1 signal handlers are running, interleaving calls to sched_yield after I've sent num_threads-1 signals.
There was a race condition at thread creation time, counting new threads, but I solved it with a strange (ab)use of read-write locks. Thread creation is "reading" and the broadcast signal is "writing", so unless there's a thread trying to broadcast, it doesn't create any contention at thread-creation.
raise() sends the signal to the current thread (only), so other threads won't receive it. I suspect that the fact that strace makes things work is a bug in strace (due to the way it works it ends up intercepting all signals sent to the process and re-raising them, so it may be re-raising them in the wrong way...).
You can probably get round that using kill(getpid(), <signal>) to send the signal to the current process as a whole.
However, another potential issue you might see is that sigpending() can indicate that the signal is pending on the process before all threads have received it - all that means is that there is at least one such signal pending for the process, and no CPU has yet become available to run a thread to deliver it...
Can you describe more details of what you're aiming to achieve? And how portable you want it to be? There's almost certainly a better way of doing it (signals are almost always a major headache, especially when mixed with threads...)
In a multithreaded program, raise(sig) is equivalent to pthread_kill(pthread_self(), sig).
Try kill(getpid(), sig)
Given that you can apparently lock thread creation and destruction, could you not just have the "broadcasting" thread post the required updates to thread-local-state in a per-thread queue, which each thread checks whenever it goes to use the thread-local-state? If there's outstanding update(s), it first applies them.
You are trying to synchronize a set of threads.
From a design pattern point of view the pthread native solution for your problem would be a pthread barrier.
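A minimal sketch of a barrier (the thread count and names are placeholders): every participating thread calls pthread_barrier_wait(), and none of them continues until all of them have arrived.

    #include <pthread.h>

    #define NUM_THREADS 4                         /* barriers need a fixed count */

    static pthread_barrier_t barrier;

    static void setup(void)
    {
            pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    }

    static void *worker(void *arg)
    {
            (void)arg;
            /* ... per-thread work ... */
            pthread_barrier_wait(&barrier);       /* blocks until all NUM_THREADS arrive */
            /* ... all threads continue from here together ... */
            return NULL;
    }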
Looks like Linux doesn't implement pthread_suspend and pthread_continue, but I really need them.
I have tried cond_wait, but it is too slow. The work being threaded mostly executes in 50us, but occasionally executes upwards of 500ms. The problem with cond_wait is two-fold: the mutex locking takes time comparable to the microsecond executions, and I don't need locking. Second, I have many worker threads and I don't really want to make N condition variables when they need to be woken up.
I know exactly which thread is waiting for which work and could just pthread_continue that thread. A thread knows when there is no more work and can easily pthread_suspend itself. This would use no locking, avoid the stampede, and be faster. Problem is....no pthread_suspend or _continue.
Any ideas?
Make the thread wait for a specific signal.
Use pthread_sigmask and sigwait.
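A sketch of that approach (SIGUSR1 and the helper names are arbitrary choices): SIGUSR1 is blocked in every thread, a worker "suspends" itself by calling sigwait() on it, and whoever wants to "continue" that worker sends the signal with pthread_kill(). Because the signal stays blocked, a wake-up sent before the worker reaches sigwait() is not lost; it just stays pending.

    #include <pthread.h>
    #include <signal.h>

    static sigset_t wake_set;

    /* Call once in main() before creating threads; the mask is inherited. */
    static void init_wake_signal(void)
    {
            sigemptyset(&wake_set);
            sigaddset(&wake_set, SIGUSR1);
            pthread_sigmask(SIG_BLOCK, &wake_set, NULL);
    }

    /* Worker: "pthread_suspend" itself until someone wakes it. */
    static void suspend_self(void)
    {
            int sig;
            sigwait(&wake_set, &sig);             /* blocks until SIGUSR1 is sent to this thread */
    }

    /* Anyone else: "pthread_continue" a specific worker. */
    static void continue_thread(pthread_t tid)
    {
            pthread_kill(tid, SIGUSR1);
    }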
Have the threads block on a pipe read. Then dispatch the data through the pipe. The threads will awaken as a result of the arrival of the data they need to process. If the data is very large, just send a pointer through the pipe.
If specific data needs to go to specific threads you need one pipe per thread. If any thread can process any data, then all threads can block on the same pipe and they will awaken round robin.
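A sketch of the pipe variant (the job struct and names are invented for the example): the dispatcher writes a pointer into the pipe and a worker blocked in read() wakes up holding exactly one job, with no mutex or condition variable involved.

    #include <unistd.h>

    struct job;                                   /* whatever the unit of work is */

    static int work_pipe[2];                      /* created once with pipe(work_pipe) */

    /* Dispatcher: hand one job to whichever worker wakes up. */
    static void dispatch(struct job *j)
    {
            write(work_pipe[1], &j, sizeof j);    /* pointer-sized writes are atomic */
    }

    /* Worker: blocks here until a job arrives. */
    static struct job *wait_for_job(void)
    {
            struct job *j = NULL;
            if (read(work_pipe[0], &j, sizeof j) != sizeof j)
                    return NULL;                  /* pipe closed: time to exit */
            return j;
    }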
It seems to me that such a solution (that is, using "pthread_suspend" and "pthread_continue") is inevitably racy.
An arbitrary amount of time can elapse between the worker thread finishing work and deciding to suspend itself, and the suspend actually happening. If the main thread decides during that time that that worker thread should be working again, the "continue" will have no effect and the worker thread will suspend itself regardless.
(Note that this doesn't apply to methods of suspending that allow the "continue" to be queued, like the sigwait() and read() methods mentioned in other answers).
Maybe try pthread_cancel, but be careful about any locks that need to be released. Read the man page to understand the cancellation state.
Why do you care which thread does the work? It sounds like you designed yourself into a corner and now you need a trick to get yourself out of it. If you let whatever thread happened to already be running do the work, you wouldn't need this trick, and you would need fewer context switches as well.