How to properly synchronize threads at barriers - c

I am encountering an issue where I have a hard time telling which synchronization primitive I should use.
I am creating n parallel threads that work on a region of memory. Each is assigned a specific part of this region and can accomplish its task independently of the others. At some point, though, I need to collect the results of all the threads' work, which is a good case for barriers, so that is what I'm using.
I must use one of the n worker threads to collect the result of all their work, for this I have the following code that follows the computation code in my thread function:
if (pthread_barrier_wait(thread_args->barrier)) {
// Only gets called on the last thread that goes through the barrier
// This is where I want to collect the results of the worker threads
}
So far so good, but now is where I get stuck: the code above is in a loop as I want the threads to accomplish work again for a certain number of loop spins. The idea is that each time pthread_barrier_wait unblocks it means all threads have finished their work and the next iteration of the loop / parallel work can start again.
The problem with this is that the result collector block statements are not guaranteed to execute before other threads start working on this region again, so there is a race condition. I am thinking of using a UNIX condition variable like this:
// This code is placed in the thread entry point function, inside
// a loop that also contains the code doing the parallel
// processing code.
if (pthread_barrier_wait(thread_args->barrier)) {
// We lock the mutex
pthread_mutex_lock(thread_args->mutex);
collectAllWork(); // We process the work from all threads
// Set ready to 1
thread_args->ready = 1;
// We broadcast the condition variable and check it was successful
if (pthread_cond_broadcast(thread_args->cond)) {
printf("Error while broadcasting\n");
exit(1);
}
// We unlock the mutex
pthread_mutex_unlock(thread_args->mutex);
} else {
// Wait until the other thread has finished its work so
// we can start working again
pthread_mutex_lock(thread_args->mutex);
while (thread_args->ready == 0) {
pthread_cond_wait(thread_args->cond, thread_args->mutex);
}
pthread_mutex_unlock(thread_args->mutex);
}
There are multiple issues with this:
For some reason pthread_cond_broadcast never unlocks any other thread waiting on pthread_cond_wait, I have no idea why.
What happens if a thread pthread_cond_waits after the collector thread has broadcasted? I believe while (thread_args->ready == 0) and thread_args->ready = 1 prevents this, but then see next point...
On the next loop spin, ready will still be set to 1 hence no thread will call pthread_cond_wait again. I don't see any place where to properly set ready back to 0: if I do it in the else block after pthread_cond_wait, there is the possibility that another thread that wasn't cond waiting yet reads 1 and starts waiting even if I already broadcasted from the if block.
Note I am required to use barriers for this.
How can I solve this issue?

You could use two barriers (work and collector):
while (true) {
    // do work
    // every thread waits until the last thread has finished its work
    if (pthread_barrier_wait(thread_args->work_barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
        // exactly one (unspecified) thread gets this return value and does the collecting
        collectAllWork();
    }
    // every thread waits here until the collector has reached this point
    pthread_barrier_wait(thread_args->collect_barrier);
}
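A minimal sketch of the two-barrier loop above, filled out so it can be compiled. The names `shared_t`, `worker_arg_t`, `run_rounds`, `MAX_WORKERS` and the arithmetic standing in for the real work are assumptions made for illustration; they are not from the question. The collector is whichever thread receives `PTHREAD_BARRIER_SERIAL_THREAD`, and the second barrier keeps the other workers from overwriting their results until it is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16                  /* hypothetical upper bound */

typedef struct {
    pthread_barrier_t work_barrier;     /* passed once all results of a round are written */
    pthread_barrier_t collect_barrier;  /* passed once the collector has read them */
    int nthreads;
    int rounds;
    long partial[MAX_WORKERS];          /* one result slot per worker */
    long total;                         /* written only by the collecting thread */
} shared_t;

typedef struct {
    shared_t *shared;
    int id;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    shared_t *s = arg->shared;

    for (int round = 0; round < s->rounds; round++) {
        /* Stand-in for the real parallel work on this thread's part. */
        s->partial[arg->id] = (long)(arg->id + 1) * (round + 1);

        int rc = pthread_barrier_wait(&s->work_barrier);
        assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
        if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
            /* Exactly one thread per round collects the results. */
            for (int i = 0; i < s->nthreads; i++)
                s->total += s->partial[i];
        }
        /* Nobody starts the next round before the collector gets here. */
        pthread_barrier_wait(&s->collect_barrier);
    }
    return NULL;
}

/* Runs nthreads workers for the given number of rounds and returns the total. */
long run_rounds(int nthreads, int rounds)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    shared_t s = { .nthreads = nthreads, .rounds = rounds, .total = 0 };
    pthread_t tid[MAX_WORKERS];
    worker_arg_t args[MAX_WORKERS];

    pthread_barrier_init(&s.work_barrier, NULL, (unsigned)nthreads);
    pthread_barrier_init(&s.collect_barrier, NULL, (unsigned)nthreads);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg_t){ .shared = &s, .id = i };
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&s.work_barrier);
    pthread_barrier_destroy(&s.collect_barrier);
    return s.total;
}
```

Calling `run_rounds(4, 3)` from `main` runs four workers for three rounds. The return values of `pthread_create` and `pthread_barrier_init` are ignored here to keep the sketch short; real code should check them.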

You could use a kind of double buffering.
Each worker would have two storage slots for results.
Between the barriers the workers would store their results to one slot while the collector would read results from the other slot.
This approach has a few advantages:
no extra barriers
no condition queues
no locking
slot identifier does not even have to be atomic because each thread could have its own copy of it and toggle it whenever reaching a barrier
much more performant as workers can work when collector is processing the other slot
Exemplary workflow:
Iteration 1.
workers write to slot 0
collector does nothing because no data is ready
all wait for barrier
Iteration 2.
workers write to slot 1
collector reads from slot 0
all wait for barrier
Iteration 3.
workers write to slot 0
collector reads from slot 1
all wait for barrier
Iteration 4.
go to iteration 2
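The workflow above could look roughly like the following sketch. It assumes that worker 0 doubles as the collector (the question requires one of the workers to collect), so the overlap is between worker 0 collecting and the other workers already writing into the other slot. `shared_t`, `collect`, `run_double_buffered` and the stand-in arithmetic are hypothetical names chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16                /* hypothetical upper bound */

typedef struct {
    pthread_barrier_t barrier;        /* the single per-round barrier */
    int nthreads;
    int rounds;
    long slot[2][MAX_WORKERS];        /* two result slots per worker */
    long total;                       /* touched only by the collector (worker 0) */
} shared_t;

typedef struct {
    shared_t *shared;
    int id;
} worker_arg_t;

static void collect(shared_t *s, int which)
{
    for (int i = 0; i < s->nthreads; i++)
        s->total += s->slot[which][i];
}

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    shared_t *s = arg->shared;
    int cur = 0;                      /* private copy of the slot index */

    for (int round = 0; round < s->rounds; round++) {
        /* The slot written in the previous round is stable now: nobody
           writes to it again until after the next barrier. */
        if (arg->id == 0 && round > 0)
            collect(s, cur ^ 1);

        /* Stand-in for the real work, written to the current slot. */
        s->slot[cur][arg->id] = (long)(arg->id + 1) * (round + 1);

        int rc = pthread_barrier_wait(&s->barrier);
        assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
        (void)rc;
        cur ^= 1;                     /* every thread toggles its own copy */
    }
    /* The results of the final round still have to be collected. */
    if (arg->id == 0 && s->rounds > 0)
        collect(s, cur ^ 1);
    return NULL;
}

/* Runs the double-buffered loop and returns the collected total. */
long run_double_buffered(int nthreads, int rounds)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    shared_t s = { .nthreads = nthreads, .rounds = rounds, .total = 0 };
    pthread_t tid[MAX_WORKERS];
    worker_arg_t args[MAX_WORKERS];

    pthread_barrier_init(&s.barrier, NULL, (unsigned)nthreads);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg_t){ .shared = &s, .id = i };
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&s.barrier);
    return s.total;
}
```

The price of dropping the second barrier is that results are collected one round late, plus one extra collection after the loop ends.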

Related

Why is there a while loop in wait function of a semaphore, when if can be used too?

this is my code:
wait(){
while(S<=0)
//puts the thread in the block list until it wakes up(by calling post)
S = S-1
}
there is a while loop in the wait function of a semaphore, can't I use an if statement simply?
Because we can't assume that, between a thread being woken up and it reacquiring the lock, another thread has not already come along and taken the resource this is guarding:
wait(){
    Some lock_guard(mutex);          // You lock here.
    while(S<=0) {
        condition.wait(lock_guard);  // While you wait here
                                     // the lock is released.
                                     // When the condition/semaphore is signalled
                                     // one or more threads may be released,
                                     // but they must acquire the lock before
                                     // they return from wait.
                                     //
                                     // Another thread may enter this function,
                                     // acquire the lock and decrement S below
                                     // before the waiting thread acquires the
                                     // lock, and thus it must be resuspended.
    }
    S = S-1;
}
Why is there a while loop in wait function of a semaphore, when if can be used too?
I take the
//puts the thread in the block list until it wakes up(by calling post)
comment as a place-holder for code that really does do what the comment describes, and the code overall to be meant as schematic for an implementation of a semaphore (else there is no semaphore to be found in it, and the [linux-kernel] tag also inclines me in this direction). In that event ...
Consider the case that two threads are blocked trying to decrement the semaphore. A third thread increments the semaphore to value 1, causing both of the first two to unblock. Only one of the erstwhile-blocked threads can be allowed to decrement the semaphore at that point, else its value would drop below zero. The other needs to detect that it cannot proceed after all, and go back to waiting. That's what the loop accomplishes.
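To make the scenario concrete, here is a minimal sketch of a counting semaphore built from a pthread mutex and condition variable. The type `csem_t` and the functions `csem_init`, `csem_wait`, `csem_post` and `csem_demo` are hypothetical names for this example, not an existing API. `csem_post` deliberately broadcasts, so several waiters can wake for a single increment; the `while` loop is what sends the losers back to sleep.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonzero;
    int value;
} csem_t;

void csem_init(csem_t *s, int value)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = value;
}

void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    /* Re-check after every wakeup: another thread may have taken the
       unit first, and condition variables may also wake spuriously. */
    while (s->value <= 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    assert(s->value >= 0);   /* would fail if 'while' were replaced by 'if' and two waiters raced */
    pthread_mutex_unlock(&s->lock);
}

void csem_post(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_broadcast(&s->nonzero);  /* wakes every waiter; only one may proceed */
    pthread_mutex_unlock(&s->lock);
}

static void *waiter(void *p)
{
    csem_wait(p);
    return NULL;
}

/* Starts 'waiters' blocked threads, posts once per thread, and returns
   the final semaphore value. */
int csem_demo(int waiters)
{
    csem_t s;
    pthread_t tid[8];

    assert(waiters > 0 && waiters <= 8);
    csem_init(&s, 0);
    for (int i = 0; i < waiters; i++)
        pthread_create(&tid[i], NULL, waiter, &s);
    for (int i = 0; i < waiters; i++)
        csem_post(&s);
    for (int i = 0; i < waiters; i++)
        pthread_join(tid[i], NULL);

    int final_value = s.value;
    pthread_cond_destroy(&s.nonzero);
    pthread_mutex_destroy(&s.lock);
    return final_value;
}
```

With an `if` instead of the `while`, two threads woken by the same broadcast could both decrement, and the value could go negative.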
What you have here is called active waiting. The thread or process waits for the variable S to change its value to 1 in order to access the critical section. A single if would only check once and then go on to the next instruction (in this case an instruction from the critical section, which would be a huge error). That's why it should wait in a loop: in order to actually wait, not only check the condition once.
But your code is not doing what you think it does.
while(S == 0) {}
or
while(S == 0);
would do the work. Your code constantly does S = S - 1, and with your condition it creates an infinite loop. S in semaphores should never go lower than 0, as that would mean that one thread went into the critical section without permission.

Synchronization between threads using pthread library

For example, we have 5 pieces of data. (Assume we have a lot of space, so different versions of the data will not overlap each other.)
DATA0, DATA1, DATA2, DATA3, DATA4.
We have 3 threads(less than 5) working on those data.
Thread 1, working on DATA1 (version 0), has accessed some data from both DATA0(version 0) and DATA2(version 0), and create DATA1(version 1).
Thread 2, working on DATA3 (version 0), has accessed some data from both DATA2(version 0) and DATA4(version 0), and create DATA3(version 1).
Thread 3, working on DATA2 (version 0), has accessed some data from both DATA1(version 0) and DATA3(version 0), and create DATA2(version 1).
Now, suppose thread 1 finishes first. It has several choices: it can work on DATA0 (to create DATA0 version 1), since DATA1 (version 0) and DATA4 (version 0) are available (assume DATA0 and DATA4 are neighbors). It can also work on DATA2 if it finds that both DATA1 (version 1) and DATA3 (version 1) are available, and create DATA2 (version 2).
The requirement is that the next version of a piece of data can be processed once its neighbor data is ready (at one version lower).
At last, I want all threads to exit when all data arrive at version 10.
Question: How to implement this scheme using pthread library.
Note: I want to have data in different versions at the same time, so to create a barrier and make sure all data reach the same version is not an option.
Let's discuss the implementation. To have all versions (0~10) stored we would need 5*11*sizeof(data) space. Let us create two arrays of size 5 x 11. The first array is DATA, such that DATA[i][j] is the j-th version of data i. The second array is an 'Access Matrix', A, which denotes the state of an index. The state can be:
Not started
In Progress
Completed
Algorithm: Each thread searches for an index [i][j] in the matrix such that indices [i-1][j-1] and [i+1][j-1] are 'Completed'. It sets A[i][j] to 'In Progress' while working on it. If i=0, i-1 refers to n-1; if i=n-1, i+1 refers to 0 (like a circular queue). When all entries in the last column are 'Completed', the thread terminates. Otherwise it searches for new data that is not completed.
Using pthread library to realize this:
Important variables: mutex, conditional variables.
pthread_mutex_t mutex= PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condvar= PTHREAD_COND_INITIALIZER;
mutex is a 'lock'. We use it when we need to make an operation atomic. An atomic operation is one that must be done in one step, without other threads interleaving with it. 'condvar' is a condition variable. Using it, a thread can sleep until a condition is reached; when it is reached, the thread is woken up. This avoids busy waiting in a loop.
Here, our atomic operation is updating A. Reason: if threads update A simultaneously, it may lead to race conditions, such as more than one thread working on the same data in parallel.
To realize this, we search and set A inside the lock. Once A is set, we release the lock and work on the data. But if no data that could be worked on was found, we wait on the condition variable condvar. When we call wait on condvar, we also pass mutex. The wait function, called while holding the lock, releases the mutex and waits for the condition variable to be signaled. Once it is signaled, it reacquires the lock and proceeds with execution. While waiting, the thread is in a sleeping state and hence does not waste CPU time.
Whenever any thread finishes working on a piece of data, it may make one or more other pieces of data ready to be worked on. Hence, after a thread finishes work, it signals all other threads to check for 'workable' data before continuing the algorithm. Pseudocode for this is as follows:
Read the comments and function names; they describe the workings of the pthread library in detail. When compiling with gcc, add the -lpthread flag. For further details of the library, the man pages of these functions are more than sufficient.
void thread(void)
{
    //Note: If several threads reach the line pthread_mutex_lock(&mutex),
    // each will sleep until the lock is released and it can acquire it.
    pthread_mutex_lock(&mutex); //Do the searching inside the lock
    while(lastColumnNotDone){ //Re-checked under the lock on every pass
        //Search for a workable index and, if found, set A[i][j] to 'In Progress'
        if(found)
        {
            //As A has been updated and set to 'In Progress', there is no need to hold the lock.
            //We release it before starting work so other threads can use it.
            pthread_mutex_unlock(&mutex);
            //WORK ON DATA
            pthread_mutex_lock(&mutex); //Reacquire the lock before touching A again
            //Set A[i][j] to 'Completed'
            pthread_cond_broadcast(&condvar); //Wakes up all threads waiting on the condition variable 'condvar'
        }
        else //No workable data found
            //We pass the address of mutex as the second parameter. pthread_cond_wait releases
            //the mutex while waiting and reacquires it before returning once condvar is signaled.
            pthread_cond_wait(&condvar,&mutex);
    }
    pthread_mutex_unlock(&mutex);
}
Search and checking if all data is completed in the while loop condition can be optimized but that is a different algorithms question. Key idea here is use of pthread library and thread concept.
A is a common access matrix. Do NOT update it outside of lock.
While checking anything with respect to A, such as finding a process or checking if all data is done, lock must be held. Otherwise A can be changed by a different thread at the same time a thread is reading it.
We acquire and release locks using the functions pthread_mutex_lock and pthread_mutex_unlock. Remember, these functions take a pointer to the mutex, not its value, because the mutex is a variable that needs to be accessed and updated.
Avoid holding the lock for long periods of time. Doing so makes other threads wait a long time for small access needs.
When calling wait, be sure that the lock is held. Wait unlocks the mutex passed as the second parameter for the duration of its wait. After receiving the signal to wake up, it acquires the lock once again before returning.
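The scheme described in this answer might be fleshed out as follows. This is only a sketch under several assumptions: the "work" is a made-up rule (each new version is the sum of the item and its two ring neighbors at the previous version), a cell additionally requires its own previous version to be 'Completed', and the names `claim`, `thread_fn`, `run_versions` and `final_done` are invented for this example.

```c
#include <assert.h>
#include <pthread.h>

#define N 5            /* data items, arranged in a ring */
#define VERSIONS 11    /* versions 0..10 */
#define NTHREADS 3

enum state { NOT_STARTED, IN_PROGRESS, COMPLETED };

static long DATA[N][VERSIONS];
static enum state A[N][VERSIONS];      /* access matrix, guarded by mutex */
static int final_done;                 /* completed cells in the last column */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condvar = PTHREAD_COND_INITIALIZER;

/* Caller must hold mutex. Finds a workable cell and marks it IN_PROGRESS. */
static int claim(int *pi, int *pj)
{
    for (int j = 1; j < VERSIONS; j++) {
        for (int i = 0; i < N; i++) {
            int l = (i + N - 1) % N, r = (i + 1) % N;
            if (A[i][j] == NOT_STARTED && A[l][j - 1] == COMPLETED &&
                A[i][j - 1] == COMPLETED && A[r][j - 1] == COMPLETED) {
                A[i][j] = IN_PROGRESS;
                *pi = i;
                *pj = j;
                return 1;
            }
        }
    }
    return 0;
}

static void *thread_fn(void *unused)
{
    (void)unused;
    int i, j;

    pthread_mutex_lock(&mutex);
    while (final_done < N) {
        if (claim(&i, &j)) {
            pthread_mutex_unlock(&mutex);          /* work without the lock */
            int l = (i + N - 1) % N, r = (i + 1) % N;
            DATA[i][j] = DATA[l][j - 1] + DATA[i][j - 1] + DATA[r][j - 1];
            pthread_mutex_lock(&mutex);
            A[i][j] = COMPLETED;
            if (j == VERSIONS - 1)
                final_done++;
            pthread_cond_broadcast(&condvar);      /* new cells may be workable */
        } else {
            pthread_cond_wait(&condvar, &mutex);   /* sleep until something completes */
        }
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Brings all items to the last version and returns the sum of that column. */
long run_versions(void)
{
    pthread_t tid[NTHREADS];
    long sum = 0;

    final_done = 0;
    for (int i = 0; i < N; i++) {
        DATA[i][0] = i;                            /* version 0 is given */
        A[i][0] = COMPLETED;
        for (int j = 1; j < VERSIONS; j++)
            A[i][j] = NOT_STARTED;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, thread_fn, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    for (int i = 0; i < N; i++)
        sum += DATA[i][VERSIONS - 1];
    return sum;
}
```

With this made-up rule the column sum triples at every version, so starting from 0+1+2+3+4 = 10 the last column sums to 10 * 3^10 = 590490 regardless of how the threads interleave, which makes the result easy to check.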

Wait for first of various threads in C

I have something like a list of things to calculate, and they are somewhat dependent on each other (some of those calculations may show that other calculations on the list are not needed).
Also I want to calculate always two of them at a time (two child threads and one main thread (which is a child thread of another one, but that's another story)).
So I want the main thread to wait for ANY of the two threads: the one that finishes first makes the main thread continue. After it continues, it will run some code to see whether the other running thread can be killed (if the one that finished shows that the other is not needed), and also to run a new thread.
The idea is to do something like this:
while (/*list not empty*/) {
/*detect which two entries need to be calculated*/
/*detect if running thread can be killed*/
if (/*running thread can be killed*/) {
pthread_join(/*threadnum*/, NULL)
}
switch (/*how many threads already running?*/) {
case 0:
pthread_create(/*&threadnum*/, NULL, /*calculate*/, /*foo*/);
case 1:
pthread_create(/*&threadnum*/, NULL, /*calculate*/, /*foo*/);
break;
}
/* !!!!!!!!! Question code is next line: !!!!!!!!!*/
pthread_join(/*What goes here?*/, NULL);
// If it is impossible to do with pthread_join, what else can be used?
}
My first approach (if this was impossible) would be to store in an array the status of both threads, and check every second (with a while and sleep(1)) if any of them finished, but that would make me lose time (between 0 and 1 seconds) every time a thread finishes. So I want to avoid that if possible.
EDIT: pthread_cond_wait(/* something */) seems the way to go. However I want it to be easy: the main thread and both child threads share a global variable (parent) that is set to 0 if child threads are running, and is set to 1 when one of them stops. Ideally I want to control from the main thread everything in this way:
while (/*list not empty*/) {
/*detect which two entries need to be calculated*/
/*detect if running thread can be killed*/
if (/*running thread can be killed*/) {
pthread_join(/*threadnum*/, NULL)
}
switch (/*how many threads already running?*/) {
case 0:
pthread_create(/*&threadnum*/, NULL, /*calculate*/, /*foo*/);
case 1:
pthread_create(/*&threadnum*/, NULL, /*calculate*/, /*foo*/);
break;
}
/* !!!!!!!!! Question code is next line: !!!!!!!!!*/
pthread_cond_wait(parent, /*wtf?*/)
}
Now I have an idea to stop the parent until that condition is met, which I can set to 1 inside the child threads.
Instead of making the main thread monitor and try to kill other threads, make the other threads communicate directly amongst themselves.
For example, if thread A finishes and it becomes clear that the computation in thread B is no longer needed, simply set a boolean flag. Make thread B check that flag between steps of its computation, and give up if the flag is set.
Trying to interrupt threads is bad practice--you're better off just setting a flag that the threads will check.
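As a rough sketch of how the flag approach can be combined with the condition variable mentioned in the question's edit: the parent sleeps on a condition variable until the first child reports completion, then sets a flag that the other child polls between steps. Everything here (`job_t`, `calculate`, `run_pair`, the step counts and the 1 ms pause standing in for a computation step) is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t first_done = PTHREAD_COND_INITIALIZER;
static int winner = -1;                 /* id of the first finisher, guarded by lock */
static atomic_bool cancel_flag[2];      /* set by the parent, polled by the children */

typedef struct {
    int id;
    long steps;
    bool cancelled;
} job_t;

static void *calculate(void *p)
{
    job_t *job = p;
    struct timespec pause = { 0, 1000000 };   /* 1 ms stands in for one step */

    for (long s = 0; s < job->steps; s++) {
        /* Check the flag between steps and give up if it is set. */
        if (atomic_load(&cancel_flag[job->id])) {
            job->cancelled = true;
            return NULL;
        }
        nanosleep(&pause, NULL);
    }
    pthread_mutex_lock(&lock);
    if (winner < 0)
        winner = job->id;
    pthread_cond_signal(&first_done);         /* wake the waiting parent */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if job 0 finished first and job 1 stopped because of the flag. */
int run_pair(void)
{
    job_t jobs[2] = { { 0, 5, false }, { 1, 600000, false } };
    pthread_t tid[2];
    int first;

    assert(sizeof jobs / sizeof jobs[0] == 2);
    winner = -1;
    atomic_store(&cancel_flag[0], false);
    atomic_store(&cancel_flag[1], false);
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, calculate, &jobs[i]);

    pthread_mutex_lock(&lock);
    while (winner < 0)                        /* wait for ANY child to finish */
        pthread_cond_wait(&first_done, &lock);
    first = winner;
    pthread_mutex_unlock(&lock);

    /* Suppose the result shows that the other computation is not needed. */
    atomic_store(&cancel_flag[1 - first], true);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return first == 0 && jobs[1].cancelled;
}
```

Because the cancelled thread exits on its own, `pthread_join` returns promptly and no thread is ever interrupted from outside.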

How to implement a barrier using semaphores

I have the following problem to solve:
Consider an application where there are three types of threads: Calculus-A, Calculus-B and Finalization. Whenever a thread of type Calculus-A ends, it calls the routine endA(), which returns immediately. Whenever a thread of type Calculus-B ends, it calls the routine endB(), which returns immediately. Threads of type Finalization call the routine wait(), which returns only once two Calculus-A threads and two Calculus-B threads have completed. In other words, for exactly 2 conclusions of Calculus-A and 2 conclusions of Calculus-B, one Finalization thread is allowed to continue.
There is an undetermined number of threads of the 3 types. It is not known the order of the routines called by threads. Threads Completion are answered in the order of arrival.
Implement routines endA(), endB() and wait() using semaphores. Besides the variables initialization, the only possible operations are P and V. Solutions with busy-waiting are not acceptable.
Here is my solution:
semaphore calcA = 2;
semaphore calcB = 2;
semaphore wait = -3;
void endA()
{
P(calcA);
V(wait);
}
void endB()
{
P(calcB);
V(wait);
}
void wait()
{
P(wait);
P(wait);
P(wait);
P(wait);
V(calcA);
V(calcA);
V(calcB);
V(calcB);
}
I believe that there will be a deadlock due to wait's initialization if wait() executes before endA() and endB(). Is there any other solution for this?
I tend to view semaphore problems as problems where one must identify "sources of waiting" and define for each a semaphore and a protocol for their access.
With that in mind, the "sources of waiting" are
Completions of CalcA
Completions of CalcB
Maybe, if I understood this right, a wait on whole completion groups, consisting of two CalcAs and two CalcBs. I say maybe because I'm not sure what "Threads Completion are answered in the order of arrival." means.
Completions of CalcA and CalcB should therefore increment their respective counters. At the other end, one Finalization thread gains exclusive access to the counters and waits in any order for the needed number of completions to constitute a completion group. It then unlocks access to the next group.
My code is below, although since I'm unfamiliar with the Dutch V and P I will use take()/give().
semaphore calcA = 0;
semaphore calcB = 0;
semaphore groupSem = 1;
void endA(){
give(calcA);
}
void endB(){
give(calcB);
}
void wait(){
take(groupSem);
take(calcA);
take(calcA);
take(calcB);
take(calcB);
give(groupSem);
}
The groupSem semaphore ensures all-or-nothing: the thread that enters the critical section will get the next two completions of each of CalcA and CalcB. If groupSem wasn't there, the first thread to enter wait could take two As and block, then be overtaken by another thread that grabs two As and two Bs and then runs away.
A worse problem that exists if the groupSem isn't there is if this second thread takes two As, one B and then blocks, and then the first thread grabs the second B. If somehow the result of the finalization allows more runs of CalculationA and CalculationB, then you may have a deadlock, because there may be no more opportunity for instances of calculation A and B to complete, therefore leaving the finalization threads hanging, unable to produce more calculation instances.
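For reference, the take()/give() solution above maps directly onto POSIX unnamed semaphores, with `sem_wait` as take (P) and `sem_post` as give (V). The sketch below assumes a platform that supports `sem_init` (for example Linux) and renames `wait()` to `wait_group()` to avoid clashing with the POSIX `wait()` function. The counter `groups_done` and the driver `run_finalizers` are additions for demonstration only.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t calcA, calcB, groupSem;
static int groups_done;            /* only modified while holding groupSem */

void endA(void) { sem_post(&calcA); }   /* give(calcA) */
void endB(void) { sem_post(&calcB); }   /* give(calcB) */

void wait_group(void)
{
    sem_wait(&groupSem);           /* one Finalization thread at a time */
    sem_wait(&calcA);
    sem_wait(&calcA);
    sem_wait(&calcB);
    sem_wait(&calcB);
    groups_done++;
    sem_post(&groupSem);           /* let the next Finalization thread in */
}

static void *finalization(void *unused)
{
    (void)unused;
    wait_group();
    return NULL;
}

/* Starts n Finalization threads, then reports 2n completions of each
   calculation type. Returns how many groups were released. */
int run_finalizers(int n)
{
    pthread_t tid[8];
    int a, b;

    assert(n > 0 && n <= 8);
    sem_init(&calcA, 0, 0);
    sem_init(&calcB, 0, 0);
    sem_init(&groupSem, 0, 1);
    groups_done = 0;

    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, finalization, NULL);
    for (int i = 0; i < 2 * n; i++) {
        endA();
        endB();
    }
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);

    /* Every completion was consumed by exactly one group. */
    sem_getvalue(&calcA, &a);
    sem_getvalue(&calcB, &b);
    assert(a == 0 && b == 0);

    sem_destroy(&calcA);
    sem_destroy(&calcB);
    sem_destroy(&groupSem);
    return groups_done;
}
```

The sketch ignores the possibility of `sem_wait` returning early with `EINTR`; code that installs signal handlers would need to retry in that case.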

Problem with thread synchronization and condition variables in C

I have three threads: one is the main thread and the other two are worker threads. The main thread, when there is work to be done, wakes up one of the two worker threads. Each worker, when awakened, performs some computation, and while doing this, if it finds more work to do, it can wake up the other worker thread or simply decide to do the job by itself (by adding work to a local queue, for example).
While the worker threads have work to do, the main thread must wait for the work to be done. I have implemented this with condition variables as follows (the code reported here hides a lot of details, please ask if something is unclear):
MAIN THREAD (pseudocode):
//this function can be called from main several times. It blocks the main thread until the work is done.
void new_work(){
    //signaling to worker threads if work is available
    //Now that the threads have been awakened, it's time to sleep until they have finished.
    pthread_mutex_lock(&main_lock);
    while (work > 0) //work is a shared atomic integer, incremented each time there's work to do and decremented when a work unit has finished executing
        pthread_cond_wait(&main_cond, &main_lock);
    pthread_mutex_unlock(&main_lock);
}
WORKER THREADS:
while (1){
pthread_mutex_lock(&main_lock);
if (work == 0)
pthread_cond_signal(&main_cond);
pthread_mutex_unlock(&main_lock);
//code to let the worker thread wait again -- PROBLEM!
while (I have work to do, in my queue){
do_work()
}
}
Here is the problem: when a worker thread wakes up the main thread, I can't be sure that the worker has already called a wait to put itself in a waiting state for new work. Even if I implement this wait with another condition variable, it can happen that the main thread wakes up and does some work until it reaches a point at which it has to wake up a thread that has not called wait yet, and this can lead to bad results. I've tried several ways to solve this issue but couldn't find a solution; maybe there is an obvious way to solve it that I'm missing.
Can you provide a scheme to solve this kind of problem? I'm using the C language and I can use whatever synchronization mechanism you think can be suited, like pthreads or posix semaphores.
Thanks
The usual way to handle this is to have a single work queue and protect it from overflow and underflow. Something like this (where I have left off the "pthread_" prefixes):
mutex queue_mutex;
cond_t queue_not_full, queue_not_empty;
void enqueue_work(Work w) {
    mutex_lock(&queue_mutex);
    while (queue_full())
        cond_wait(&queue_not_full, &queue_mutex);
    add_work_to_queue(w);
    cond_signal(&queue_not_empty);
    mutex_unlock(&queue_mutex);
}
Work dequeue_work() {
    mutex_lock(&queue_mutex);
    while (queue_empty())
        cond_wait(&queue_not_empty, &queue_mutex);
    Work w = remove_work_from_queue();
    cond_signal(&queue_not_full);
    mutex_unlock(&queue_mutex);
    return w;
}
Note the symmetry between these functions: enqueue <-> dequeue, empty <-> full, not_empty <-> not_full.
This provides a thread-safe bounded-size queue for any number of threads producing work and any number of threads consuming work. (Actually, it is sort of the canonical example for the use of condition variables.) If your solution does not look exactly like this, it should probably be pretty close...
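Spelled out with the full pthread names, the bounded queue might look like the sketch below. The ring buffer of `int` work items, the `CAPACITY` of 4, and the names `queue_t`, `producer` and `run_queue` are assumptions for this example; a real queue would carry whatever the `Work` type is.

```c
#include <assert.h>
#include <pthread.h>

#define CAPACITY 4                     /* hypothetical queue size */

typedef struct {
    int items[CAPACITY];               /* ring buffer of work items */
    int head, count;
    pthread_mutex_t mutex;
    pthread_cond_t not_full, not_empty;
} queue_t;

void enqueue_work(queue_t *q, int w)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == CAPACITY)       /* protect against overflow */
        pthread_cond_wait(&q->not_full, &q->mutex);
    q->items[(q->head + q->count) % CAPACITY] = w;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

int dequeue_work(queue_t *q)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == 0)              /* protect against underflow */
        pthread_cond_wait(&q->not_empty, &q->mutex);
    int w = q->items[q->head];
    q->head = (q->head + 1) % CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mutex);
    return w;
}

typedef struct {
    queue_t *q;
    int n;
} producer_arg_t;

static void *producer(void *p)
{
    producer_arg_t *arg = p;
    for (int i = 1; i <= arg->n; i++)
        enqueue_work(arg->q, i);       /* blocks whenever the queue is full */
    return NULL;
}

/* One producer enqueues 1..n, the caller consumes them; returns their sum. */
long run_queue(int n)
{
    queue_t q = { .head = 0, .count = 0 };
    producer_arg_t arg = { &q, n };
    pthread_t tid;
    long sum = 0;

    assert(n >= 0);
    pthread_mutex_init(&q.mutex, NULL);
    pthread_cond_init(&q.not_full, NULL);
    pthread_cond_init(&q.not_empty, NULL);

    pthread_create(&tid, NULL, producer, &arg);
    for (int i = 0; i < n; i++)
        sum += dequeue_work(&q);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    pthread_mutex_destroy(&q.mutex);
    return sum;
}
```

Since the queue holds only four items while the producer pushes many more, both wait loops get exercised: the producer sleeps on `not_full` and the consumer on `not_empty`.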
If you want the main thread to distribute work to the other two, then wait until both threads have completed their work before moving on, you might be able to accomplish this with a barrier.
A barrier is a synchronization construct that you can use to make threads wait at a certain point in your code until a set number of threads are all ready to move on. Essentially, you initialize a pthread barrier, saying that x number of threads must wait on it before any are allowed to continue. As each thread finishes its work and is ready to go on, it will wait on the barrier, and once x number of threads have reached the barrier, they are all allowed to continue.
In your case, you might be able to do something like:
pthread_barrier_t barrier;
pthread_barrier_init(&barrier, NULL, 3); // second argument: attributes (NULL for defaults); third: number of threads
master()
{
    while (work_to_do) {
        put_work_on_worker_queues();
        pthread_barrier_wait(&barrier);
    }
}
worker()
{
    while(1) {
        while (work_on_my_queue()) {
            do_work();
        }
        pthread_barrier_wait(&barrier);
    }
}
This should make your main thread give out work, then wait for both worker threads to complete the work they were given (if any) before moving on.
Could you have "new job" queue, which is managed by the main thread? The main thread could dish out 1 job at a time to each worker thread. The main thread would also listen for completed jobs by the workers. If a worker thread finds a new job that needs doing just add it to the "new job" queue and the main thread will distribute it.
Pseudocode:
JobQueue NewJobs;
Job JobForWorker[NUM_WORKERS];
workerthread()
{
while(wait for new job)
{
do job (this may include adding new jobs to NewJobs queue)
signal job complete to main thread
}
}
main thread()
{
while(whatever)
{
wait for job completion on any worker thread
now a worker thread is free put a new job on it
}
}
I believe that what you have here is a variation on the producer-consumer problem. What you are doing is writing up an ad-hoc implementation of a counting semaphore (one that is used to provide more than just mutual exclusion).
If I've read your question right, what you are trying to do is have the worker threads block until there is a unit of work available and then perform a unit of work once it becomes available. Your issue is with the case where there is too much work available and the main thread tries to unblock a worker that is already working. I would structure your code as follows.
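sem_t main_sem;
sem_init(&main_sem, 0, 0);
void new_work() {
    sem_post(&main_sem);
    pthread_mutex_lock(&main_lock);
    while (work > 0) // work is the shared counter from the question
        pthread_cond_wait(&main_cond, &main_lock);
    pthread_mutex_unlock(&main_lock);
}
void do_work() {
    while (1) {
        sem_wait(&main_sem);
        // do stuff
        // do more stuff
        pthread_mutex_lock(&main_lock);
        pthread_cond_signal(&main_cond); // signal the condition variable, not the semaphore
        pthread_mutex_unlock(&main_lock);
    }
}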
Now, if the worker threads generate more work then they can simply sem_post to the semaphore and simply defer the pthread_cond_signal till all the work is done.
Note however, if you actually need the main thread to always block when the worker is working, it's not useful to push the work to another thread when you could just call a function that does the work.

Resources