Managing a variable number of worker threads with graceful exit - c

I have a boss thread that spawns up to M worker threads. Over the lifetime of the program, workers may be added and removed. When the program-wide shutdown flag is signalled, I want to await the completion of these workers.
Currently, any of the threads can add/remove threads, but that's not strictly a requirement, as long as any thread can initiate a spawn/removal.
What's stopping me from using a counting semaphore or pthread_barrier_wait() is that it expects a fixed number of threads.
I can't loop pthread_join() over all workers either because I'd risk leaking zombie threads that have exited and possibly since then been replaced.
The boss thread itself has no other purpose than spawning the threads initially and making sure that the process exits gracefully.
I've spent days on and off on this problem and cannot come up with something robust and simple; are there any fairly well-established ways to accomplish this with POSIX threads?

1) "Currently, any of the threads can add/remove threads"
and
2) "are there any fairly well-established ways to accomplish this with POSIX threads"
Yes. Don't do (1). Have the boss thread do it.
Or, you can protect the code which spawns threads with a critical section or mutex (I assume you are already doing this). They should check a flag to see if shutdown is in progress, and if it is, don't spawn any more threads.
You can also keep two counters, an "ideal number of threads" and an "actual number of threads", and have a thread exit on its own when it finds "actual > ideal". (I.e. it should decrement actual, leave the critical section/mutex, then quit.)
When you need to initiate shutdown, use the SAME mutex/section to set the flag. Once done, you know the number of threads cannot increase, so you can use the most recent value.
Indeed, to exit you can just have the boss thread set "ideal" to zero, release the mutex, and then sleep 10 ms at a time until all threads have exited. Worst case, you wait an extra 10 ms to quit. If that's too much, cut it to 1 ms.
These are just ideas. The central concept is that all thread creation/removal, and messages about thread creation/removal should be protected by a mutex to ensure that only one thread is adding/removing/querying status at a time. Once you have that in place, there is more than one way to do it...
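As a rough illustration of the ideas above, here is a minimal sketch, assuming detached workers and a single mutex around all pool state. Every name in it (pool_lock, ideal, actual, spawn_worker, remove_worker, shutdown_and_wait) is made up for the example, and the 1 ms sleep stands in for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names; one mutex guards every piece of pool state. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int ideal;               /* workers we want */
static int actual;              /* workers that currently exist */
static bool shutting_down;

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);
        if (actual > ideal) {           /* surplus worker: leave */
            assert(actual > 0);
            actual--;
            pthread_mutex_unlock(&pool_lock);
            return NULL;
        }
        pthread_mutex_unlock(&pool_lock);
        sleep_ms(1);                    /* stand-in for a short unit of work */
    }
}

/* Callable from any thread; refuses once shutdown has begun. */
static bool spawn_worker(void)
{
    pthread_t tid;
    bool ok = false;

    pthread_mutex_lock(&pool_lock);
    if (!shutting_down && pthread_create(&tid, NULL, worker, NULL) == 0) {
        pthread_detach(tid);            /* never joined; `actual` tracks it */
        ideal++;
        actual++;
        ok = true;
    }
    pthread_mutex_unlock(&pool_lock);
    return ok;
}

/* Callable from any thread; some worker will notice and exit. */
static void remove_worker(void)
{
    pthread_mutex_lock(&pool_lock);
    if (ideal > 0)
        ideal--;
    pthread_mutex_unlock(&pool_lock);
}

/* Boss only: stop spawning, ask everyone to leave, poll until they have. */
static void shutdown_and_wait(void)
{
    int left;

    pthread_mutex_lock(&pool_lock);
    shutting_down = true;
    ideal = 0;
    pthread_mutex_unlock(&pool_lock);
    do {
        sleep_ms(10);
        pthread_mutex_lock(&pool_lock);
        left = actual;
        pthread_mutex_unlock(&pool_lock);
    } while (left > 0);
}

/* Returns the number of workers left after shutdown, or -1 on a logic error. */
int pool_demo(void)
{
    for (int i = 0; i < 4; i++)
        spawn_worker();
    remove_worker();
    shutdown_and_wait();
    return spawn_worker() ? -1 : actual;
}
```

Note that a detached worker may still be executing its last few instructions after it has decremented the counter. That is harmless if the process exits right afterwards, but it matters if the boss goes on to free resources the workers use.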

Threads that want to initiate spawns/removals should ask the boss thread to actually do it for them. Then the boss thread doesn't have to worry about threads it doesn't know about, and you can use one of the simple methods you described in your question.

I'll take the opposite tack from some of the other answers, since I have to do this now and again.
(1) Give every spawned thread access to a single pipe file descriptor either through the data passed through pthread_create or globally. Only the boss thread reads the pipe. Each thread announces its creation and termination to the boss via the pipe by passing its tid and boss adds or removes it from its list and pthread_joins it as appropriate. Boss can block on the pipe w/o having to do anything special.
(2) Do more or less the above with some other mechanism. Global ctr and list with accompanying condition variable to wake up boss; a message queue, etc.
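A minimal sketch of option (1), assuming a global pipe and a message that is nothing more than the raw pthread_t of the exiting thread. The names (notify_fd, pipe_demo) and the message format are invented for the example. For brevity only terminations are announced here; a tagged message could carry creations as well.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

enum { WORKERS = 4 };

/* Hypothetical global pipe: [0] is read by the boss, [1] written by workers. */
static int notify_fd[2];

static void *worker(void *arg)
{
    (void)arg;
    /* ... real work would go here ... */

    /* Announce termination. A write of at most PIPE_BUF bytes is atomic,
     * so messages from different threads never interleave. */
    pthread_t self = pthread_self();
    ssize_t n = write(notify_fd[1], &self, sizeof self);
    (void)n;
    return NULL;
}

/* Returns the number of workers the boss joined, or -1 if pipe() failed. */
int pipe_demo(void)
{
    int joined = 0;

    if (pipe(notify_fd) != 0)
        return -1;

    for (int i = 0; i < WORKERS; i++) {
        pthread_t tid;
        int rc = pthread_create(&tid, NULL, worker, NULL);
        assert(rc == 0); (void)rc;
    }

    /* The boss simply blocks on the pipe and joins whoever reports in. */
    while (joined < WORKERS) {
        pthread_t done;
        if (read(notify_fd[0], &done, sizeof done) != (ssize_t)sizeof done)
            break;
        if (pthread_join(done, NULL) == 0)
            joined++;
    }

    close(notify_fd[0]);
    close(notify_fd[1]);
    return joined;
}
```

Because the boss joins each thread only after that thread has announced its own exit, pthread_join() returns almost immediately and no exited thread is left unjoined.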

Related

Does sem_post wake up a random process

Suppose 10 processes are waiting on a semaphore using sem_wait(),
and an 11th process calls sem_post() on that semaphore.
Which of the 10 processes will enter the critical section?
Is it random? Do all the processes wake up and compete for the lock,
with the CPU granting it to one of them while the rest go back to the waiting state?
The POSIX standard doesn't specify which thread will be woken up. Moreover, without artificial delays it's impossible for threads to start waiting on a semaphore in a well-defined order.
In practice, it's likely to be the thread which has been waiting the longest, as a queue structure is used to record threads waiting on a synchronization object. It definitely won't be a 'random' thread. But it's also not something you should depend on for the correctness of your code.

Close all threads, except the main

Is there a way to close all created threads if I don't have a list of their identifiers?
It is assumed that I only need the main thread, and the rest can be closed.
It's usually a good idea to have threads in charge of their own lifetime, periodically checking for some event indicating they should shut down. This usually makes the architecture of your code much easier to understand.
What I'm talking about is along the lines of (pseudo-code):
def main():
    # Start up all threads.
    synchronised runFlag = true
    for count = 1 to 10:
        start thread threadFn, receiving id[count]
    sleep for a bit
    # Tell them all to exit, then wait.
    synchronised runFlag = false
    for count = 1 to 10:
        wait for thread id[count] to exit
    exit program

def threadFn():
    initialise
    # Thread will do its stuff until told to stop.
    while synchronised runFlag:
        do something relatively quick
    exit thread
The periodic checking is a balance between efficiency of the thread loop and the amount of time you may have to wait for the thread to exit.
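For reference, the pseudo-code could be written in C with POSIX threads roughly as follows. This is only a sketch: it assumes a C11 atomic_bool is an acceptable stand-in for the "synchronised" flag, and the names (run_flag, thread_fn, run_flag_demo) and sleep durations are arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

enum { NUM_THREADS = 10 };

/* Stand-in for the "synchronised runFlag" of the pseudo-code. */
static atomic_bool run_flag = true;

static void *thread_fn(void *arg)
{
    (void)arg;
    struct timespec quick = { 0, 1000000L };    /* 1 ms of pretend work */

    /* Do something relatively quick until told to stop. */
    while (atomic_load(&run_flag))
        nanosleep(&quick, NULL);
    return NULL;
}

/* Returns the number of threads that were joined. */
int run_flag_demo(void)
{
    pthread_t id[NUM_THREADS];
    int joined = 0;

    /* Start up all threads. */
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&id[i], NULL, thread_fn, NULL);
        assert(rc == 0); (void)rc;
    }

    struct timespec a_bit = { 0, 50000000L };   /* "sleep for a bit": 50 ms */
    nanosleep(&a_bit, NULL);

    /* Tell them all to exit, then wait. */
    atomic_store(&run_flag, false);
    for (int i = 0; i < NUM_THREADS; i++)
        if (pthread_join(id[i], NULL) == 0)
            joined++;
    return joined;
}
```

The 1 ms step in thread_fn is the trade-off mentioned above: a shorter step means a faster shutdown but more time spent checking the flag.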
And, yes, I'm aware that pseudo-code uses identifiers (that you specifically stated you didn't have), but that's just one example of how to effect shutdown. You could equally, for example:
maintain a (synchronised) thread count incremented as a thread starts and decremented when it stops, then wait for it to reach zero;
have threads continue to run while a synchronised counter hasn't changed from the value it was when the thread started (you could just increment the counter in main then freely create a new batch of threads, knowing that the old ones would eventually disappear since the counter is different).
do one of a half dozen other things, depending on your needs :-)
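The first two alternatives can be combined without keeping any thread IDs. The sketch below is one possible arrangement, not the only one: it assumes detached threads, a mutex-protected generation counter and a live-thread count, and all names (generation, live, start_batch, retire_current_batch) are hypothetical. The creator passes the generation to the thread, so a thread that is slow to start cannot pick up a newer value by mistake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  exited = PTHREAD_COND_INITIALIZER;
static unsigned generation;     /* bumped to retire every older thread */
static int live;                /* threads started and not yet finished */

static void *batch_thread(void *arg)
{
    unsigned mine = (unsigned)(uintptr_t)arg;   /* generation at creation */
    struct timespec quick = { 0, 1000000L };    /* 1 ms of pretend work */

    pthread_mutex_lock(&lock);
    while (generation == mine) {                /* run until retired */
        pthread_mutex_unlock(&lock);
        nanosleep(&quick, NULL);
        pthread_mutex_lock(&lock);
    }
    live--;
    pthread_cond_broadcast(&exited);            /* wake anyone waiting on the count */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void start_batch(int n)
{
    pthread_mutex_lock(&lock);
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        int rc = pthread_create(&tid, NULL, batch_thread,
                                (void *)(uintptr_t)generation);
        assert(rc == 0); (void)rc;
        pthread_detach(tid);                    /* no IDs are kept */
        live++;
    }
    pthread_mutex_unlock(&lock);
}

static void retire_current_batch(void)
{
    pthread_mutex_lock(&lock);
    generation++;
    pthread_mutex_unlock(&lock);
}

/* Blocks until at most `target` threads remain; returns the count seen. */
static int wait_until_live_at_most(int target)
{
    pthread_mutex_lock(&lock);
    while (live > target)
        pthread_cond_wait(&exited, &lock);
    int now = live;
    pthread_mutex_unlock(&lock);
    return now;
}

/* Returns the live count after the first batch has gone, or -1 on error. */
int generation_demo(void)
{
    start_batch(3);
    retire_current_batch();         /* the three old threads wind down */
    start_batch(2);                 /* the new batch is unaffected */
    int mid = wait_until_live_at_most(2);
    retire_current_batch();
    int end = wait_until_live_at_most(0);
    return end == 0 ? mid : -1;
}
```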
This "lifetime handled by thread" approach is often the simplest way to achieve things since the thread is fully in control of when things happen to it. The one thing you don't want is a thread being violently killed from outside while it holds a resource lock of some sort.
Some threading implementations have ways to handle that with, for example, cancellability points, so you can cancel a thread from outside and it will die at such time it allows itself to. But, in my experience, that just complicates things.
In any case, pthread_cancel requires a thread ID so is unsuitable based on your requirements.
Is there a way to close all created threads if I don't have a list of their identifiers?
No, with POSIX threads there is not.
It is assumed that I only need the main thread, and the rest can be closed.
What you could do is have main() call fork() and let the calling main() (the parent) return, which will end the parent process along with all its threads.
The fork()ed off child process would live on as a copy of the original parent process' main() but without any other threads.
If you go this route, be aware that the threads of the process going down might very well run into undefined behaviour, so strange things might happen, including messy leftovers.
All in all a bad approach.
Is there a way to close all created threads if I don't have a list of their identifiers? It is assumed that I only need the main thread, and the rest can be closed.
Technically, you can fork your process and terminate the parent. Only the thread calling fork exists in the new child process. However, the mutexes locked by other threads remain locked and this is why forking a multi-threaded process without immediately calling exec may be unwise.

Correct way to unblock a kernel thread

There are linux kernel threads that do some work every now and then, then either go to sleep or block on a semaphore. They can be in this state for several seconds - quite a long time for a thread.
If the threads need to be stopped for some reason, at least when unloading the driver they belong to, I am looking for a way to get them out of the sleep or the semaphore without waiting for the whole sleep time or triggering the semaphore as often as required.
I found and read a lot about this, but the advice is conflicting and I am still not sure how things work, so I would appreciate it if you could shed some light on that.
msleep_interruptible
What is able to interrupt that?
down_interruptible
This semaphore function implies interrupt-ability. Same here, what can interrupt this semaphore?
kthread_stop
It's described as setting kthread_should_stop to true and waking the thread... but this function blocks until the sleep time is over (even when using msleep_interruptible) or the semaphore is triggered.
What am I understanding wrong?
Use a signal to unblock - really?
My search found a signal can interrupt the thread. Other hits say a signal is not the best way to operate on threads.
If a signal is the best choice - which signal do I use to unblock the thread but not mess it up too much?
SIGINT is a termination signal - I don't intend to terminate something, just make it go on.
More information
The threads run a loop that checks a termination flag, does some work and then blocks in a sleep or on a semaphore. They are used in the following situations.
Situation 1.
A producer-consumer scenario that uses semaphores to synchronize producer and consumer. They work perfectly to make threads wait for work and start running on setting the semaphore.
Currently I'm setting a termination flag, then calling up() on the semaphore. This unblocks the thread, which then checks the flag and terminates. This isn't my major problem. However, of course I'd like to know about a better way.
Code sample
while (keep_running) {
    do_your_work();
    down_interruptible(&mysemaphore); // Intention: break out of this
}
Situation 2.
A thread that periodically logs things. This thread sleeps some seconds between doing its work. After setting the flag this thread terminates at its next run, but this can take several seconds. I want to break the sleep if necessary.
Code sample
while (keep_running) {
    do_your_work();
    msleep(15000); // Intention: break out of this - msleep_interruptible?
}

How to create a passive waiting FIFO of thread in C?

I'm trying to figure out a way to put some threads into a passive wait and wake them up in the order in which they arrived at the barrier. A fixed number of threads will arrive.
I first thought about a semaphore initialised to 0 so that it would block, but the threads would be released in a random order. I would like to implement a system that releases the threads in the order they came to the synchronisation barrier, like a FIFO.
I also thought about using two semaphores: one that would block, then release a thread and check its position. If it is the right thread, it just goes on; if not, it is blocked by the second semaphore. However, this scheme seems long and tedious.
Does anyone have an idea or suggestion that would help me?
Thank you very much :)
On Linux, you can just use a condition variable and a mutex to block and unblock threads in the same FIFO order.
This is because all waiters on a condition variable are appended to the futex wait queue in the kernel in order, and the waiters are woken in the same FIFO order, as long as you keep the mutex locked while signaling the condition variable.
However, as commenters mentioned, it is a poor idea to depend on thread execution order.
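If FIFO release really is required, one way to avoid depending on the kernel's wake-up order is to make the order explicit with tickets. Note that this is a different technique from the one described above: each arriving thread takes a ticket under the mutex and waits until its ticket is being served. The sketch below is only an illustration; the struct and function names are invented, and N is an arbitrary fixed thread count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { N = 5 };                     /* fixed number of threads expected */

/* Hypothetical FIFO gate: tickets make the release order explicit. */
struct fifo_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned next_ticket;           /* handed to the next arriving thread */
    unsigned now_serving;           /* ticket allowed through next */
    unsigned passed;                /* threads released so far */
    bool     open;                  /* set once every thread has arrived */
    unsigned release_order[N];      /* ticket of the i-th released thread */
};

static struct fifo_gate gate = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
};

static void *arrive(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate.lock);
    unsigned ticket = gate.next_ticket++;       /* records arrival order */
    pthread_cond_broadcast(&gate.cond);         /* let main notice the arrival */

    /* Passive wait: sleep until the gate is open and it is our turn. */
    while (!gate.open || gate.now_serving != ticket)
        pthread_cond_wait(&gate.cond, &gate.lock);

    gate.release_order[gate.passed++] = ticket;
    gate.now_serving++;
    pthread_cond_broadcast(&gate.cond);         /* wake the next ticket holder */
    pthread_mutex_unlock(&gate.lock);
    return NULL;
}

/* Returns 1 if the threads were released in arrival order, else 0. */
int fifo_demo(void)
{
    pthread_t tid[N];

    for (int i = 0; i < N; i++) {
        int rc = pthread_create(&tid[i], NULL, arrive, NULL);
        assert(rc == 0); (void)rc;
    }

    /* Wait until all N threads are parked at the gate, then open it. */
    pthread_mutex_lock(&gate.lock);
    while (gate.next_ticket < N)
        pthread_cond_wait(&gate.cond, &gate.lock);
    gate.open = true;
    pthread_cond_broadcast(&gate.cond);
    pthread_mutex_unlock(&gate.lock);

    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);

    for (unsigned i = 0; i < N; i++)
        if (gate.release_order[i] != i)
            return 0;
    return 1;
}
```

The broadcast wakes every waiter on each release, so this trades some efficiency for a guaranteed order. With a small, fixed number of threads that is usually acceptable.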

Semaphore queues

I'm extending the functionality of a semaphore. I ran into a roadblock when I realized I don't know the implementation of an actual semaphore and to make sure my code ran correctly, I needed to know this.
I know a semaphore blocks a thread that calls sem_wait() while another thread currently has it locked. The blocked thread is then put on a wait list for that semaphore.
My question relates to what happens on a sem_post(). Is the next thread pulled off the waiting list, set as the locking thread, and allowed to be unblocked? Or is the scheme for posting completely different?
Thanks!
The next thread to unblock on its sem_wait() will be whatever thread the OS decides is the next one to context switch into. Nobody makes any guarantee of ordering; it depends on your OS's scheduling strategy. It might be the thread that has been off the CPU for the longest, or the one that has been assigned the highest "priority", or the one that has historically had certain resource-usage statistics, or whatever.
Most likely, your current thread (the one that called sem_post()) will continue running for a while, until it either starts waiting for user input, blocks on another semaphore, or runs out of its OS-allotted time slice. Then, the OS will switch in some totally unrelated process to run for a fraction of a second (probably Firefox or something), then go off and handle some network traffic, get itself a cup of tea, and, finally, when it gets around to it, pick whichever of your other threads it feels like, based on something like whether past history suggests that the particular thread is more CPU-bound or I/O-bound.
In many OSes, priority is given to I/O-bound processes that haven't been around for very long. The theory is that new processes might be short-lived (if it's been around for five hours already, odds are it won't be finishing up in the next 1ms) so we might as well get them over with. I/O-bound processes are likely to continue to be I/O-bound, which means that chances are they are going to switch off the CPU shortly while waiting for other resources. Basically, the OS wants to find the process that it's going to be able to be done with ASAP, so it can get back to sipping its tea and running your malware.
Semaphores have two operations:
P() To acquire the semaphore (you seem to call this sem_wait)
V() To release the semaphore (you seem to call this sem_post)
Semaphores also have an integer associated with them, which is the number of concurrent threads allowed to pass P() without blocking. Other calls to P() will block until V() is called to free up spots.
That is the classic definition of a semaphore.
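The counting behaviour can be seen with a POSIX unnamed semaphore. This is a small sketch, assuming a platform where sem_init() is supported (Linux, for instance). It uses sem_trywait() in place of a blocking P() so that the example never hangs; the function name semaphore_demo is made up.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Returns how many P() calls passed before one would have blocked,
 * or -1 if the semaphore did not behave as described. */
int semaphore_demo(void)
{
    sem_t slots;
    int passed = 0;

    int rc = sem_init(&slots, 0, 2);        /* two threads may pass P() */
    assert(rc == 0); (void)rc;

    /* Non-blocking P(): succeeds while the count is positive. */
    while (sem_trywait(&slots) == 0)
        passed++;

    /* The failed attempt is the one a plain sem_wait() would block on. */
    int would_block = (errno == EAGAIN);

    sem_post(&slots);                       /* V(): free up one spot */
    int after_post = (sem_trywait(&slots) == 0);

    sem_destroy(&slots);
    return (would_block && after_post) ? passed : -1;
}
```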
Edit: Semaphores do not make any guarantee of order. They don't have to actually use a queue or other FIFO structure. When only one thread is allowed at a time, when it calls V(), another (possibly random) thread will then return from its P() call and continue.
According to the POSIX standard (IEEE Std 1003.1), sem_post() behaves as follows:
If the semaphore value resulting from this operation is positive, then no threads were blocked waiting for the semaphore to become unlocked; the semaphore value is simply incremented.
If the value of the semaphore resulting from this operation is zero, then one of the threads blocked waiting for the semaphore shall be allowed to return successfully from its call to sem_wait(). If the Process Scheduling option is supported, the thread to be unblocked shall be chosen in a manner appropriate to the scheduling policies and parameters in effect for the blocked threads. In the case of the schedulers SCHED_FIFO and SCHED_RR, the highest priority waiting thread shall be unblocked, and if there is more than one highest priority thread blocked waiting for the semaphore, then the highest priority thread that has been waiting the longest shall be unblocked. If the Process Scheduling option is not defined, the choice of a thread to unblock is unspecified.
If the Process Sporadic Server option is supported, and the scheduling policy is SCHED_SPORADIC, the semantics are as per SCHED_FIFO above.
