I have three threads: one is the main thread and the other two are worker threads. When there is work to be done, the main thread wakes up one of the two workers. When awakened, each worker performs some computation; if it finds more work to do along the way, it can either wake up the other worker or keep the job for itself (by adding the work to a local queue, for example).
While the worker threads have work to do, the main thread must wait for the work to be done. I have implemented this with condition variables as follows (the code shown here hides a lot of details; please ask if anything is unclear):
MAIN THREAD (pseudocode):
//this function can be called from the main several times. It blocks the main thread till the work is done.
void new_work(){
//signaling to worker threads if work is available
//Now that the threads have been awakened, it's time to sleep till they have finished.
pthread_mutex_lock(&main_lock);
while (work > 0) //work is a shared atomic integer, incremented each time there's work to do and decremented when finished executing some work unit
pthread_cond_wait(&main_cond, &main_lock);
pthread_mutex_unlock(&main_lock);
}
WORKER THREADS:
while (1){
pthread_mutex_lock(&main_lock);
if (work == 0)
pthread_cond_signal(&main_cond);
pthread_mutex_unlock(&main_lock);
//code to let the worker thread wait again -- PROBLEM!
while (I have work to do, in my queue){
do_work()
}
}
Here is the problem: when a worker thread wakes up the main thread, I can't be sure the worker has already called wait to put itself back into a waiting state for new work. Even if I implement this wait with another condition variable, it can happen that the main thread wakes up, does some work, and reaches a point where it has to wake a worker that has not called wait yet, and this can lead to bad results. I've tried several ways to solve this issue but couldn't find a solution; maybe there is an obvious one I'm missing.
Can you provide a scheme to solve this kind of problem? I'm using C, and I can use whatever synchronization mechanism you think is suited, such as pthreads or POSIX semaphores.
Thanks
The usual way to handle this is to have a single work queue and protect it from overflow and underflow. Something like this (where I have left off the "pthread_" prefixes):
mutex_t queue_mutex;
cond_t queue_not_full, queue_not_empty;
void enqueue_work(Work w) {
mutex_lock(&queue_mutex);
while (queue_full())
cond_wait(&queue_not_full, &queue_mutex);
add_work_to_queue(w);
cond_signal(&queue_not_empty);
mutex_unlock(&queue_mutex);
}
Work dequeue_work() {
mutex_lock(&queue_mutex);
while (queue_empty())
cond_wait(&queue_not_empty, &queue_mutex);
Work w = remove_work_from_queue();
cond_signal(&queue_not_full);
mutex_unlock(&queue_mutex);
return w;
}
Note the symmetry between these functions: enqueue <-> dequeue, empty <-> full, not_empty <-> not_full.
This provides a thread-safe bounded-size queue for any number of threads producing work and any number of threads consuming work. (Actually, it is sort of the canonical example for the use of condition variables.) If your solution does not look exactly like this, it should probably be pretty close...
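For completeness, here is a minimal compilable version of the same sketch with the pthread_ prefixes restored; the fixed capacity, the Work typedef, and the circular-buffer layout are illustrative assumptions, not requirements:

#include <pthread.h>

#define QUEUE_CAP 16

typedef int Work;                              /* placeholder work item */

static Work queue[QUEUE_CAP];
static int head, tail, count;                  /* circular-buffer state */

static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t queue_not_empty = PTHREAD_COND_INITIALIZER;

void enqueue_work(Work w) {
    pthread_mutex_lock(&queue_mutex);
    while (count == QUEUE_CAP)                 /* queue_full() */
        pthread_cond_wait(&queue_not_full, &queue_mutex);
    queue[tail] = w;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&queue_not_empty);
    pthread_mutex_unlock(&queue_mutex);
}

Work dequeue_work(void) {
    pthread_mutex_lock(&queue_mutex);
    while (count == 0)                         /* queue_empty() */
        pthread_cond_wait(&queue_not_empty, &queue_mutex);
    Work w = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_cond_signal(&queue_not_full);
    pthread_mutex_unlock(&queue_mutex);
    return w;
}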
If you want the main thread to distribute work to the other two and then wait until both threads have completed their work before moving on, you might be able to accomplish this with a barrier.
A barrier is a synchronization construct that you can use to make threads wait at a certain point in your code until a set number of threads are all ready to move on. Essentially, you initialize a pthread barrier, saying that x number of threads must wait on it before any are allowed to continue. As each thread finishes its work and is ready to go on, it will wait on the barrier, and once x number of threads have reached the barrier, they are all allowed to continue.
In your case, you might be able to do something like:
pthread_barrier_t barrier;
pthread_barrier_init(&barrier, NULL, 3);
master()
{
while (work_to_do) {
put_work_on_worker_queues();
pthread_barrier_wait(&barrier);
}
}
worker()
{
while(1) {
while (work_on_my_queue()) {
do_work();
}
pthread_barrier_wait(&barrier);
}
}
This should make your main thread give out work, then wait for both worker threads to complete the work they were given (if any) before moving on.
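A compilable sketch of this idea follows, with one adjustment: crossing the barrier twice per round (once after the master publishes work, once after the workers drain their queues) makes the hand-off airtight, since otherwise a worker could reach the barrier before its queue was filled. The queue functions, worker count, and round count are stand-ins, not part of the pseudocode above:

#include <pthread.h>

#define NUM_WORKERS 2
#define ROUNDS 5

static pthread_barrier_t barrier;             /* NUM_WORKERS + 1 participants */

/* hypothetical stand-ins for the real queue operations */
static void put_work_on_worker_queues(void) { }
static int work_on_my_queue(void) { return 0; }
static void do_work(void) { }

static void *worker(void *arg) {
    for (int round = 0; round < ROUNDS; round++) {
        pthread_barrier_wait(&barrier);       /* wait for work to be published */
        while (work_on_my_queue())
            do_work();
        pthread_barrier_wait(&barrier);       /* report this round complete */
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_WORKERS];
    pthread_barrier_init(&barrier, NULL, NUM_WORKERS + 1);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int round = 0; round < ROUNDS; round++) {
        put_work_on_worker_queues();
        pthread_barrier_wait(&barrier);       /* release workers on this round */
        pthread_barrier_wait(&barrier);       /* wait until all queues are drained */
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}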
Could you have a "new job" queue which is managed by the main thread? The main thread could dish out one job at a time to each worker thread. The main thread would also listen for completed jobs from the workers. If a worker thread finds a new job that needs doing, it just adds it to the "new job" queue, and the main thread will distribute it.
Pseudocode:
JobQueue NewJobs;
Job JobForWorker[NUM_WORKERS];
workerthread()
{
while(wait for new job)
{
do job (this may include adding new jobs to NewJobs queue)
signal job complete to main thread
}
}
main thread()
{
while(whatever)
{
wait for job completion on any worker thread
now a worker thread is free put a new job on it
}
}
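A hedged C sketch of this arrangement; the Job type, NUM_WORKERS, the NewJobs helpers (new_jobs_empty, pop_new_job), and the shutdown handling (omitted) are all assumptions made for illustration. Each worker owns one job slot; the main thread fills empty slots and sleeps until some worker signals completion:

#include <pthread.h>

#define NUM_WORKERS 2

typedef struct { int payload; } Job;           /* placeholder job type */

int new_jobs_empty(void);                      /* hypothetical NewJobs helpers */
Job pop_new_job(void);

static Job slot[NUM_WORKERS];                  /* JobForWorker */
static int slot_full[NUM_WORKERS];             /* 1 = job pending or running */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t job_ready = PTHREAD_COND_INITIALIZER;   /* main -> workers */
static pthread_cond_t job_done = PTHREAD_COND_INITIALIZER;    /* workers -> main */

void *workerthread(void *arg) {
    int id = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&m);
        while (!slot_full[id])                 /* wait for a new job */
            pthread_cond_wait(&job_ready, &m);
        Job j = slot[id];
        pthread_mutex_unlock(&m);

        (void)j; /* do the job; it may also push new jobs onto NewJobs */

        pthread_mutex_lock(&m);
        slot_full[id] = 0;                     /* signal job complete */
        pthread_cond_signal(&job_done);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

void main_loop(void) {
    pthread_mutex_lock(&m);
    for (;;) {                                 /* shutdown handling omitted */
        int idle = -1;
        for (int i = 0; i < NUM_WORKERS; i++)
            if (!slot_full[i]) { idle = i; break; }
        if (idle < 0 || new_jobs_empty()) {
            pthread_cond_wait(&job_done, &m);  /* sleep until a job completes */
            continue;
        }
        slot[idle] = pop_new_job();            /* dish out one job */
        slot_full[idle] = 1;
        pthread_cond_broadcast(&job_ready);
    }
}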
I believe that what you have here is a variation on the producer-consumer problem. What you are doing is writing up an ad-hoc implementation of a counting semaphore (one that is used to provide more than just mutual exclusion).
If I've read your question right, what you are trying to do is have the worker threads block until there is a unit of work available and then perform a unit of work once it becomes available. Your issue is with the case where there is too much work available and the main thread tries to unblock a worker that is already working. I would structure your code as follows.
sem_t main_sem;
sem_init(&main_sem, 0, 0);
void new_work() {
    sem_post(&main_sem);
    pthread_mutex_lock(&main_lock);            /* main_lock/main_cond/work from the question */
    while (work > 0)
        pthread_cond_wait(&main_cond, &main_lock);
    pthread_mutex_unlock(&main_lock);
}
void do_work() {
    while (1) {
        sem_wait(&main_sem);
        // do stuff
        // do more stuff
        pthread_mutex_lock(&main_lock);
        if (work == 0)                         /* 'work' is decremented per finished unit */
            pthread_cond_signal(&main_cond);
        pthread_mutex_unlock(&main_lock);
    }
}
Now, if the worker threads generate more work, they can simply sem_post to the semaphore and defer the pthread_cond_signal until all the work is done.
Note, however, that if you actually need the main thread to always block while a worker is working, it's not useful to push the work to another thread when you could just call a function that does the work.
I am encountering an issue where I have a hard time telling which synchronization primitive I should use.
I am creating n parallel threads that work on a region of memory; each is assigned to a specific part of this region and can accomplish its task independently of the others. At some point, though, I need to collect the results of all the threads' work, which is a good case for using barriers, and this is what I'm doing.
I must use one of the n worker threads to collect the results of all their work. For this I have the following code, which follows the computation code in my thread function:
if (pthread_barrier_wait(thread_args->barrier)) {
// Runs in exactly one (unspecified) thread once all threads have arrived at the barrier
// This is where I want to collect the results of the worker threads
}
So far so good, but here is where I get stuck: the code above is in a loop, as I want the threads to do work again for a certain number of loop spins. The idea is that each time pthread_barrier_wait unblocks, it means all threads have finished their work and the next iteration of the loop / parallel work can start again.
The problem with this is that the result collector block's statements are not guaranteed to execute before the other threads start working on this region again, so there is a race condition. I am thinking of using a POSIX condition variable like this:
// This code is placed in the thread entry point function, inside
// a loop that also contains the code doing the parallel
// processing code.
if (pthread_barrier_wait(thread_args->barrier)) {
// We lock the mutex
pthread_mutex_lock(thread_args->mutex);
collectAllWork(); // We process the work from all threads
// Set ready to 1
thread_args->ready = 1;
// We broadcast the condition variable and check it was successful
if (pthread_cond_broadcast(thread_args->cond)) {
printf("Error while broadcasting\n");
exit(1);
}
// We unlock the mutex
pthread_mutex_unlock(thread_args->mutex);
} else {
// Wait until the other thread has finished its work so
// we can start working again
pthread_mutex_lock(thread_args->mutex);
while (thread_args->ready == 0) {
pthread_cond_wait(thread_args->cond, thread_args->mutex);
}
pthread_mutex_unlock(thread_args->mutex);
}
There are multiple issues with this:
For some reason pthread_cond_broadcast never wakes any other thread waiting on pthread_cond_wait; I have no idea why.
What happens if a thread calls pthread_cond_wait after the collector thread has broadcast? I believe while (thread_args->ready == 0) and thread_args->ready = 1 prevent this, but then see the next point...
On the next loop spin, ready will still be set to 1, so no thread will call pthread_cond_wait again. I don't see a place to properly set ready back to 0: if I do it in the else block after pthread_cond_wait, there is the possibility that another thread that wasn't cond-waiting yet reads 1 and starts waiting even though I have already broadcast from the if block.
Note I am required to use barriers for this.
How can I solve this issue?
You could use two barriers (work and collector):
while (true) {
//do work
//every thread waits until the last thread has finished its work
if (pthread_barrier_wait(thread_args->work_barrier)) {
//only one gets through, then does the collecting
collectAllWork();
}
//every thread will wait until the collector has reached this point
pthread_barrier_wait(thread_args->collect_barrier);
}
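A compilable rendering of the same loop, assuming a fixed thread count and iteration count and stub work/collect functions; note that the serial thread is identified by comparing against PTHREAD_BARRIER_SERIAL_THREAD rather than testing for non-zero, since error returns are also non-zero:

#include <pthread.h>

#define NUM_THREADS 4
#define ITERS 10

static pthread_barrier_t work_barrier, collect_barrier;

static void do_work(void) { /* per-thread computation */ }
static void collectAllWork(void) { /* gather results from all threads */ }

static void *thread_fn(void *arg) {
    for (int iter = 0; iter < ITERS; iter++) {
        do_work();
        /* every thread waits until the last one has finished its work */
        if (pthread_barrier_wait(&work_barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
            collectAllWork();                  /* exactly one thread collects */
        /* nobody starts the next iteration before collection is done */
        pthread_barrier_wait(&collect_barrier);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    pthread_barrier_init(&work_barrier, NULL, NUM_THREADS);
    pthread_barrier_init(&collect_barrier, NULL, NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, thread_fn, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&work_barrier);
    pthread_barrier_destroy(&collect_barrier);
    return 0;
}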
You could use a kind of double buffering.
Each worker would have two storage slots for results.
Between the barriers the workers would store their results to one slot while the collector would read results from the other slot.
This approach has a few advantages:
no extra barriers
no condition queues
no locking
slot identifier does not even have to be atomic, because each thread can keep its own copy and toggle it whenever it reaches a barrier
much more performant, as workers can keep working while the collector is processing the other slot
Example workflow:
Iteration 1.
workers write to slot 0
collector does nothing because no data is ready
all wait for barrier
Iteration 2.
workers write to slot 1
collector reads from slot 0
all wait for barrier
Iteration 3.
workers write to slot 0
collector reads from slot 1
all wait for barrier
Iteration 4.
go to iteration 2
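A sketch of this double-buffered hand-off, assuming one dedicated collector thread, a single shared barrier, and stub compute/collect functions; the thread and iteration counts are illustrative:

#include <pthread.h>

#define NUM_WORKERS 3
#define ITERS 8

static int results[2][NUM_WORKERS];            /* two result slots per worker */
static pthread_barrier_t barrier;              /* NUM_WORKERS + 1 participants */

static int compute(int id, int iter) { return id + iter; }    /* stand-in */
static void collect(int *slot_data) { (void)slot_data; }      /* stand-in */

static void *worker(void *arg) {
    int id = *(int *)arg;
    int slot = 0;                              /* thread-local copy, no atomics */
    for (int iter = 0; iter < ITERS; iter++) {
        results[slot][id] = compute(id, iter); /* write the current slot */
        pthread_barrier_wait(&barrier);
        slot = 1 - slot;                       /* toggle at every barrier */
    }
    return NULL;
}

static void *collector(void *arg) {
    (void)arg;
    int slot = 0;
    for (int iter = 0; iter < ITERS; iter++) {
        if (iter > 0)
            collect(results[1 - slot]);        /* read the slot written last round */
        pthread_barrier_wait(&barrier);
        slot = 1 - slot;
    }
    collect(results[1 - slot]);                /* the final slot, after the loop */
    return NULL;
}

int main(void) {
    pthread_t w[NUM_WORKERS], c;
    int ids[NUM_WORKERS];
    pthread_barrier_init(&barrier, NULL, NUM_WORKERS + 1);
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        pthread_create(&w[i], NULL, worker, &ids[i]);
    }
    pthread_create(&c, NULL, collector, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(w[i], NULL);
    pthread_join(c, NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}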
I have a multi-threaded application which has a producer-consumer model.
Basically I have 2 structs.
the first one is a struct which contains all the necessary information for the work to be done.
the second one is a struct which is tied to a worker thread, it contains a pointer to the first struct.
like this:
typedef struct worker_struct {
/* information for the work to be done */
} worker_struct;
typedef struct thread_specs {
worker_struct *work;
unsigned short thread_id;
unsigned short pred_cond;
pthread_mutex_t busy_mutex;
pthread_cond_t work_signal;
} thread_specs;
Now, this is all fine and dandy; however, from my producer I know which work needs to be done, and I want to link a worker thread to that work. My problem is that I have no idea how to figure out whether a worker thread is currently busy or not.
I have a predicate condition with a conditional wait like so:
while ( thread_stuff->pred_cond == 0 ) {
retval = pthread_cond_wait( &(thread_stuff->work_signal), &(thread_stuff->busy_mutex) );
if (retval !=0 ) {
strerror_r(retval, strerror_buf, ERRNO_BUFSIZE);
printf("cond wait error! thread: %u, error: %s\n", thread_stuff->thread_id, strerror_buf);
}
}
Now, how can I make sure a thread is not busy? If I set a variable protected by a mutex AFTER the thread has woken up from the signal, I get a race condition, as I have no guarantee that the variable gets set before my consumer checks again for waiting threads.
The only way I see is to do a pthread_mutex_trylock() with the same mutex that is coupled with the condition wait; however, this seems kind of expensive and inelegant.
Is there some other, better way to do something like this, that is, to figure out whether a thread is currently waiting at the predicate condition?
regards
In a typical producer-consumer relationship, producers and consumers are disconnected from each other and communicate through some shared data structure such as a FIFO queue. Producers create jobs and place them on a queue. Consumers remove items from the queue and process them. So there is no need for producers to know whether there is an available consumer. They just queue a job, and the next available consumer will pick it up.
Such a design makes it easy to add or remove producers or consumers because they can be independent of each other.
If you need some kind of signal that a job is currently processing or is complete, you would typically use some kind of signaling mechanism such as an event.
If you want to limit the number of scheduled but not yet processed work items, you would limit the size of the FIFO queue.
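As an illustration of the signaling mechanism mentioned above, here is one hedged way to build a per-job completion event in C with a flag, a mutex, and a condition variable; all names are assumptions:

#include <pthread.h>

typedef struct {
    /* ... job payload ... */
    int done;                                  /* completion flag, guarded by lock */
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
} Job;

/* consumer side: called after the job has been processed */
void mark_done(Job *j) {
    pthread_mutex_lock(&j->lock);
    j->done = 1;
    pthread_cond_broadcast(&j->done_cond);
    pthread_mutex_unlock(&j->lock);
}

/* producer side: blocks until the job is complete */
void wait_done(Job *j) {
    pthread_mutex_lock(&j->lock);
    while (!j->done)
        pthread_cond_wait(&j->done_cond, &j->lock);
    pthread_mutex_unlock(&j->lock);
}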
A similar question of mine was answered quite well some time back.
C - Guarantee condvars are ready for signalling
It guarantees, via a protected status boolean, that your condvar is indeed ready to signal, meaning the waiting thread is actually asleep. If you're just using a simple setup (i.e., one condvar per thread), then you can use this to detect whether a thread is asleep or not, as sketched below.
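A minimal sketch of that protected status boolean applied to the structs in this question; the sleeping field is an assumed addition to thread_specs:

/* worker side: flip the flag under the same mutex used by the wait */
pthread_mutex_lock(&(thread_stuff->busy_mutex));
thread_stuff->sleeping = 1;                    /* assumed new field */
while (thread_stuff->pred_cond == 0)
    pthread_cond_wait(&(thread_stuff->work_signal), &(thread_stuff->busy_mutex));
thread_stuff->sleeping = 0;
pthread_mutex_unlock(&(thread_stuff->busy_mutex));

/* producer side: the flag can only be tested safely under the same mutex */
pthread_mutex_lock(&(thread_stuff->busy_mutex));
int is_idle = thread_stuff->sleeping;          /* 1 means parked in cond_wait */
pthread_mutex_unlock(&(thread_stuff->busy_mutex));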
I have the following problem to solve:
Consider an application with three types of threads: Calculus-A, Calculus-B, and Finalization. Whenever a Calculus-A thread ends, it calls the routine endA(), which returns immediately. Whenever a Calculus-B thread ends, it calls the routine endB(), which returns immediately. Finalization threads call the routine wait(),
which returns only once two Calculus-A threads and two Calculus-B threads have completed. In other words, for exactly 2 completions of Calculus-A and 2 completions of Calculus-B, one Finalization thread is allowed to continue.
There is an undetermined number of threads of the 3 types. The order in which the routines are called is not known. Finalization threads are answered in their order of arrival.
Implement the routines endA(), endB(), and wait() using semaphores. Besides variable initialization, the only allowed operations are P and V. Solutions with busy-waiting are not acceptable.
Here is my solution:
semaphore calcA = 2;
semaphore calcB = 2;
semaphore wait = -3;
void endA()
{
P(calcA);
V(wait);
}
void endB()
{
P(calcB);
V(wait);
}
void wait()
{
P(wait);
P(wait);
P(wait);
P(wait);
V(calcA);
V(calcA);
V(calcB);
V(calcB);
}
I believe there will be a deadlock due to wait's initialization, and also if wait() executes before endA() and endB(). Is there another solution for this?
I tend to view semaphore problems as problems where one must identify "sources of waiting" and define for each a semaphore and a protocol for their access.
With that in mind, the "sources of waiting" are
Completions of CalcA
Completions of CalcB
Maybe, if I understood this right, a wait on whole completion groups, consisting of two CalcAs and two CalcBs. I say maybe because I'm not sure what "Finalization threads are answered in their order of arrival" means.
Completions of CalcA and CalcB should therefore increment their respective counters. At the other end, one Finalization thread gains exclusive access to the counters and waits in any order for the needed number of completions to constitute a completion group. It then unlocks access to the next group.
My code is below; since I'm unfamiliar with the Dutch P and V, I will use take()/give().
semaphore calcA = 0;
semaphore calcB = 0;
semaphore groupSem = 1;
void endA(){
give(calcA);
}
void endB(){
give(calcB);
}
void wait(){
take(groupSem);
take(calcA);
take(calcA);
take(calcB);
take(calcB);
give(groupSem);
}
The groupSem semaphore ensures all-or-nothing: the thread that enters the critical section will get the next two completions of each of CalcA and CalcB. If groupSem wasn't there, the first thread to enter wait could take two As and block, then be overtaken by another thread that grabs two As and two Bs and then runs away.
A worse problem without groupSem: suppose the second thread takes two As and one B and then blocks, and the first thread grabs the second B. If the result of finalization is what allows more runs of Calculus-A and Calculus-B, you may have a deadlock, because there may be no further opportunity for A and B instances to complete, leaving the finalization threads hanging and unable to produce more calculation instances.
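For reference, here is the same protocol expressed with POSIX semaphores, where sem_wait plays the role of take()/P and sem_post the role of give()/V; the routine is renamed wait_group() only because wait() collides with the POSIX process call:

#include <semaphore.h>

sem_t calcA, calcB, groupSem;

void init(void) {
    sem_init(&calcA, 0, 0);                    /* completions of CalcA */
    sem_init(&calcB, 0, 0);                    /* completions of CalcB */
    sem_init(&groupSem, 0, 1);                 /* one finalizer claims a group at a time */
}

void endA(void) { sem_post(&calcA); }
void endB(void) { sem_post(&calcB); }

void wait_group(void) {
    sem_wait(&groupSem);                       /* gain exclusive access to the counters */
    sem_wait(&calcA);
    sem_wait(&calcA);
    sem_wait(&calcB);
    sem_wait(&calcB);
    sem_post(&groupSem);                       /* unlock access for the next group */
}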
I have an application that waits for clients to connect. Each time a client connects, a new frame gets created (with the new socket file descriptor). I know how many clients will connect; after I reach that number, I just run pthread_join in a for loop.
My problem is that I would like the main thread to control all the other threads. My goal is to have each thread send the same message back to the client, at the same time, and only once. There are multiple messages a thread can send.
My current thinking is to define a list of command, as follows:
char *commands[] = {
(char*) "TERMINATE\0",
.... };
And then specify a command number that represents which command to use in that char* array. All threads will do something like
write(sockfd, buffer[commandNumber], length[commandNumber]);
I thought about waiting on a condition variable, but I see two problems:
1) I want to make sure that each thread, although synchronized, executes the command only once.
2) The main thread that initiates the command has to know when all the threads are done executing the command.
The only way I see to accomplish 2) is to keep a counter (protected by a mutex) that each thread increments after executing the command. I am also not sure how to prevent a thread from running the command twice.
What is the best way to coordinate multiple threads to execute a single action at once, and also to know when that action has finished executing in every thread?
You might use a barrier to gate the operation.
Synchronizing the send
The main thread initializes a barrier named "Ready" to N+1. Then it begins accept()ing N client connections, spawning a worker thread for each. The new worker threads immediately wait on barrier "Ready".
After spawning the Nth (and last) worker, the main thread sets the desired command (perhaps using a global commandNumber). Then the main thread waits on barrier "Ready". As soon as all workers and the main thread have arrived (reaching the barrier's limit of N+1), all threads are released, knowing that they are ready to issue their command immediately.
(A common alternate approach is to use a predicate and condition variable rather than a barrier. For example, the main thread might spawn the Nth worker and then cond_broadcast() that it has set a flag ready = 1. This approach is flawed: the main thread cannot know that the Nth worker, or indeed any of the workers, is yet waiting on that condition. The barrier solves this problem.)
Indicating completion
Another N+1 barrier, "AllDone", could be used to indicate that the workers are all done. A semaphore initialized to 0, posted once by each worker, and waited on N times by the main thread would do the same (POSIX semaphores cannot be initialized to a negative value). Having the workers close() their connections and the main thread select()ing or poll()ing connections would convey the same information, too.
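A condensed sketch of the barrier scheme in C; N, the command table, the 0 = TERMINATE convention, and the socket plumbing are assumptions made for illustration:

#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define N 4

static pthread_barrier_t ready, all_done;      /* both sized N + 1 */
static int command_number;                     /* written by main, read by workers */
static const char *commands[] = { "TERMINATE", /* ... */ };

void init_barriers(void) {
    pthread_barrier_init(&ready, NULL, N + 1);
    pthread_barrier_init(&all_done, NULL, N + 1);
}

void *worker(void *arg) {
    int sockfd = *(int *)arg;
    for (;;) {
        pthread_barrier_wait(&ready);          /* park until main fires a command */
        int cmdno = command_number;            /* safe: ordered by the barrier */
        write(sockfd, commands[cmdno], strlen(commands[cmdno]));
        pthread_barrier_wait(&all_done);       /* report completion to main */
        if (cmdno == 0)                        /* 0 = TERMINATE in this sketch */
            break;
    }
    return NULL;
}

/* main thread, after accept()ing N connections and spawning N workers: */
void fire_command(int cmd) {
    command_number = cmd;                      /* workers are still parked at "ready" */
    pthread_barrier_wait(&ready);              /* release all workers at once */
    pthread_barrier_wait(&all_done);           /* return only when every send is done */
}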
I'm creating n threads and then starting their execution after a barrier breakdown.
In global data space:
int bkdown = 0;
In main():
pthread_barrier_init(&bar,NULL,n);
for(i=0;i<n;i++)
{
pthread_create(&threadIdArray[i],NULL,runner,NULL);
if(i==n-2)printf("breakdown imminent!\n");
if(i==n-1)printf("breakdown already occurred!\n");
}
In thread runner function:
void *runner(void *param)
{
pthread_barrier_wait(&bar);
if(bkdown==0){bkdown=1;printf("barrier broken down!\n");}
...
pthread_exit(NULL);
}
Expected order:
breakdown imminent!
barrier broken down!
breakdown already occurred!
Actual order: (tested repeatedly)
breakdown imminent!
breakdown already occurred!
barrier broken down!
Could someone explain why I am not getting the "broken down" message before the "already occurred" message?
The order in which threads are run is dependent on the operating system. Just because you start a thread doesn't mean the OS is going to run it immediately.
If you really want to control the order in which threads are executed, you have to put some kind of synchronization in there (with mutexes or condition variables.)
for(i=0;i<n;i++)
{
pthread_create(&threadIdArray[i],NULL,runner,NULL);
if(i==n-2)printf("breakdown imminent!\n");
if(i==n-1)printf("breakdown already occurred!\n");
}
Nothing stops this loop from executing until i == n-1. pthread_create() just fires off a thread to be run; it doesn't wait for the thread to start or end. Thus you're at the mercy of the scheduler, which might decide to continue executing your loop, or switch to one of the newly created threads (or do both, on an SMP system).
You're also initializing the barrier to n, so in any case none of the threads will get past the barrier until you've created all of them.
In addition to the answers of nos and Starkey, you have to take into account that you have another serialization in your code that is often neglected: you are doing IO on the same FILE variable, namely stdout.
The access to that variable is mutexed internally and the order in which your n+1 threads (including your calling thread) get access to that mutex is implementation defined, take it basically as random in your case.
So the order in which you get your printf output is the order in which your threads pass through these wormholes.
You can get the expected order in one of two ways:
1) Create each thread with a higher priority than the main thread. This will ensure that each new thread runs immediately after creation and waits on the barrier.
2) Move the "breakdown imminent!\n" print before the pthread_create() and use a sched_yield() call after every pthread_create(); this will schedule the newly created thread for execution, as sketched below.
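Option 2 applied to the loop from the question (reusing its declarations; sched_yield(), declared in <sched.h>, is only a hint to the scheduler, so the expected order becomes likely rather than guaranteed):

for (i = 0; i < n; i++)
{
    if (i == n-1) printf("breakdown imminent!\n");    /* before creating the nth thread */
    pthread_create(&threadIdArray[i], NULL, runner, NULL);
    sched_yield();                                    /* give the new thread a chance to reach the barrier */
    if (i == n-1) printf("breakdown already occurred!\n");
}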