I'm writing a toy program to help me understand how barriers work. In the program I have each thread checking to see if an element in one array is in another array. I was able to have all threads start searching simultaneously but I'd like to have them all meet up after filtering in order to correctly add any numbers to the array.
Without doing so, the the wrong numbers might be added to the array.
EDIT: Someone suggested I'd add pthread_barrier_wait(&after_filtering); in the call to filter_threads and that seemed to do trick. Now I know that I have to add a wait call in main as well as in the filter function, but I don't quite understand the flow of execution and how the wait calls in main work. I know that the wait in the thread function ensures that all threads reach that point before continuing but doesn't that happen as soon as the threads created? Meaning that nums should have values of 99 instead of 4, 3, 8, 1?
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
pthread_barrier_t before_filtering, after_filtering;
int nums[4] = {99, 99, 99, 99};
int arr[15] = {12, 5, 31, 8, 1, 6, 24, 4, 81, 42};
void filter(int i, int a[]){
// Loop through the list
int j;
for (j = 0; j < 10; j++)
{
if (nums[i] == a[j])
nums[i] = 0;
}
}
void *filter_threads(void *id){
int *myid = (int *)id;
printf("thread %d\n",*myid);
pthread_barrier_wait(&before_filtering);
filter(*myid, arr);
pthread_barrier_wait(&after_filtering);
}
int main(void)
{
pthread_t tids[3];
int index[3];
pthread_barrier_init(&before_filtering, NULL, 4);
pthread_barrier_init(&after_filtering, NULL, 4);
int i, j;
for (i = 0; i < 3; i++)
{
index[i] = i + 1;
pthread_create(&tids[i], NULL, filter_threads, &index[i]);
}
// Cannot call any filter function without first populating nums
nums[0] = 4;
nums[1] = 3;
nums[2] = 8;
nums[3] = 1;
pthread_barrier_wait(&before_filtering);
filter(0, arr);
pthread_barrier_wait(&after_filtering);
// Add new numbers to arr...
printf("New nums: ");
for (i = 0; i < 4; i++)
printf("%d, ", nums[i]);
printf("\n");
for (i = 0; i < 3; i++)
pthread_join(tids[i], NULL);
// Print new arr...
pthread_barrier_destroy(&before_filtering);
pthread_barrier_destroy(&after_filtering);
}
I tried adding another wait call after the filtering but now the program just hangs. How can I accomplish this?
A barrier is simply a mechanism to ensure that all N threads reach a certain point in the code before continuing. Hence, if you call pthread_barrier_init with a count of 4, any thread calling pthread_barrier_wait on that same barrier will not continue until three other threads have also called pthread_barrier_wait.
So, as your code provides now: the three created threads will, once started, execute the pthread_barrier_wait(&before_filtering) where they will all block until after the main thread has executed sleep(3) and then initialized the nums array. The main thread then calls pthread_barrier_wait(&before_filtering). This releases the main thread and all the others to continue execution.
After executing the filter function, each sub-thread and the main thread should execute pthread_barrier_wait(&after_filtering). Otherwise, the main thread will be stalled waiting for three other threads to wait on the barrier.
That being said, there is no need to use the second barrier at all. At that point, the main thread is really just waiting for the other three threads to finish their task and quit. The pthread_join on each sub-thread being done by the main thread accomplishes the same thing: i.e. the pthread_join on a thread will not return until that thread has finish execution.
Related
I am working with a large project and I am trying to create a test that does the following thing: first, create 5 threads. Each one of this threads will create 4 threads, which in turn each one creates 3 other threads. All this happens until 0 threads.
I have _ThreadInit() function used to create a thread:
status = _ThreadInit(mainThreadName, ThreadPriorityDefault, &pThread, FALSE);
where the 3rd parameter is the output(the thread created).
What am I trying to do is start from a number of threads that have to be created n = 5, the following way:
for(int i = 0; i < n; i++){
// here call the _ThreadInit method which creates a thread
}
I get stuck here. Please, someone help me understand how it should be done. Thanks^^
Building on Eugene Sh.'s comment, you could create a function that takes a parameter, which is the number of threads to create, that calls itself recursively.
Example using standard C threads:
#include <stdbool.h>
#include <stdio.h>
#include <threads.h>
int MyCoolThread(void *arg) {
int num = *((int*)arg); // cast void* to int* and dereference
printf("got %d\n", num);
if(num > 0) { // should we start any threads at all?
thrd_t pool[num];
int next_num = num - 1; // how many threads the started threads should start
for(int t = 0; t < num; ++t) { // loop and create threads
// Below, MyCoolThread creates a thread that executes MyCoolThread:
if(thrd_create(&pool[t], MyCoolThread, &next_num) != thrd_success) {
// We failed to create a thread, set `num` to the number of
// threads we actually created and break out.
num = t;
break;
}
}
int result;
for(int t = 0; t < num; ++t) { // join all the started threads
thrd_join(pool[t], &result);
}
}
return 0;
}
int main() {
int num = 5;
MyCoolThread(&num); // fire it up
}
Statistics from running:
1 thread got 5
5 threads got 4
20 threads got 3
60 threads got 2
120 threads got 1
120 threads got 0
I'm pretty new to C, so I'm not sure where even to start digging about my problem.
I'm trying to port python number-crunching algos to C, and since there's no GIL in C (woohoo), I can change whatever I want in memory from threads, for as long as I make sure there are no races.
I did my homework on mutexes, however, I cannot wrap my head around use of mutexes in case of continuously running threads accessing the same array over and over.
I'm using p_threads in order to split the workload on a big array a[N].
Number crunching algorithm on array a[N] is additive, so I'm splitting it using a_diff[N_THREADS][N] array, writing changes to be applied to a[N] array from each thread to a_diff[N_THREADS][N] and then merging them all together after each step.
I need to run the crunching on different versions of array a[N], so I pass them via global pointer p (in the MWE, there's only one a[N])
I'm synchronizing threads using another global array SYNC_THREADS[N_THREADS] and make sure threads quit when I need them to by setting END_THREADS global (I know, I'm using too many globals - I don't care, code is ~200 lines). My question is in regard to this synchronization technique - is it safe to do so and what is cleaner/better/faster way to achieve that?
MWEe:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N_THREADS 3
#define N 10000000
#define STEPS 3
double a[N]; // main array
double a_diff[N_THREADS][N]; // diffs array
double params[N]; // parameter used for number-crunching
double (*p)[N]; // pointer to array[N]
// structure for bounds for crunching the array
struct bounds {
int lo;
int hi;
int thread_num;
};
struct bounds B[N_THREADS];
int SYNC_THREADS[N_THREADS]; // for syncing threads
int END_THREADS = 0; // signal to terminate threads
static void *crunching(void *arg) {
// multiple threads run number-crunching operations according to assigned low/high bounds
struct bounds *data = (struct bounds *)arg;
int lo = (*data).lo;
int hi = (*data).hi;
int thread_num = (*data).thread_num;
printf("worker %d started for bounds [%d %d] \n", thread_num, lo, hi);
int i;
while (END_THREADS != 1) { // END_THREADS tells threads to terminate
if (SYNC_THREADS[thread_num] == 1) { // SYNC_THREADS allows threads to start number-crunching
printf("worker %d working... \n", thread_num );
for (i = lo; i <= hi; ++i) {
a_diff[thread_num][i] += (*p)[i] * params[i]; // pretend this is an expensive operation...
}
SYNC_THREADS[thread_num] = 0; // thread disables itself until SYNC_THREADS is back to 1
printf("worker %d stopped... \n", thread_num );
}
}
return 0;
}
int i, j, th,s;
double joiner;
int main() {
// pre-fill arrays
for (i = 0; i < N; ++i) {
a[i] = i + 0.5;
params[i] = 0.0;
}
// split workload between workers
int worker_length = N / N_THREADS;
for (i = 0; i < N_THREADS; ++i) {
B[i].thread_num = i;
B[i].lo = i * worker_length;
if (i == N_THREADS - 1) {
B[i].hi = N;
} else {
B[i].hi = i * worker_length + worker_length - 1;
}
}
// pointer to parameters to be passed to worker
struct bounds **data = malloc(N_THREADS * sizeof(struct bounds*));
for (i = 0; i < N_THREADS; i++) {
data[i] = malloc(sizeof(struct bounds));
data[i]->lo = B[i].lo;
data[i]->hi = B[i].hi;
data[i]->thread_num = B[i].thread_num;
}
// create thread objects
pthread_t threads[N_THREADS];
// disallow threads to crunch numbers
for (th = 0; th < N_THREADS; ++th) {
SYNC_THREADS[th] = 0;
}
// launch workers
for(th = 0; th < N_THREADS; th++) {
pthread_create(&threads[th], NULL, crunching, data[th]);
}
// big loop of iterations
for (s = 0; s < STEPS; ++s) {
for (i = 0; i < N; ++i) {
params[i] += 1.0; // adjust parameters
// zero diff array
for (i = 0; i < N; ++i) {
for (th = 0; th < N_THREADS; ++th) {
a_diff[th][i] = 0.0;
}
}
p = &a; // pointer to array a
// allow threads to process numbers and wait for threads to complete
for (th = 0; th < N_THREADS; ++th) { SYNC_THREADS[th] = 1; }
// ...here threads started by pthread_create do calculations...
for (th = 0; th < N_THREADS; th++) { while (SYNC_THREADS[th] != 0) {} }
// join results from threads (number-crunching is additive)
for (i = 0; i < N; ++i) {
joiner = 0.0;
for (th = 0; th < N_THREADS; ++th) {
joiner += a_diff[th][i];
}
a[i] += joiner;
}
}
}
// join workers
END_THREADS = 1;
for(th = 0; th < N_THREADS; th++) {
pthread_join(threads[th], NULL);
}
return 0;
}
I see that workers don't overlap time-wise:
worker 0 started for bounds [0 3333332]
worker 1 started for bounds [3333333 6666665]
worker 2 started for bounds [6666666 10000000]
worker 0 working...
worker 1 working...
worker 2 working...
worker 2 stopped...
worker 0 stopped...
worker 1 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 0 stopped...
worker 2 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 2 stopped...
worker 0 stopped...
Process returned 0 (0x0) execution time : 1.505 s
and I make sure worker's don't get into each other workspaces by separating them via a_diff[thead_num][N] sub-arrays, however, I'm not sure that's always the case and that I'm not introducing hidden races somewhere...
I didn't realize what was the question :-)
So, the question is if you're thinking well with your SYNC_THREADS and END_THREADS synchronization mechanism.
Yes!... Almost. The problem is that threads are burning CPU while waiting.
Conditional Variables
To make threads wait for an event you have conditional variables (pthread_cond). These offer a few useful functions like wait(), signal() and broadcast():
wait(&cond, &m) blocks a thread in a given condition variable. [note 2]
signal(&cond) unlocks a thread waiting in a given condition variable.
broadcast(&cond) unlocks all threads waiting in a given condition variable.
Initially you'd have all the threads waiting [note 1]:
while(!start_threads)
pthread_cond_wait(&cond_start);
And, when the main thread is ready:
start_threads = 1;
pthread_cond_broadcast(&cond_start);
Barriers
If you have data dependencies between iterations, you'd want to make sure threads are executing the same iteration at any given moment.
To synchronize threads at the end of each iteration, you'll want to take a look at barriers (pthread_barrier):
pthread_barrier_init(count): initializes a barrier to synchronize count threads.
pthread_barrier_wait(): thread waits here until all count threads reach the barrier.
Extending functionality of barriers
Sometimes you'll want the last thread reaching a barrier to compute something (e.g. to increment the counter of number of iterations, or to compute some global value, or to check if execution should stop). You have two alternatives
Using pthread_barriers
You'll need to essentially have two barriers:
int rc = pthread_barrier_wait(&b);
if(rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD)
if(shouldStop()) stop = 1;
pthread_barrier_wait(&b);
if(stop) return;
Using pthread_conds to implement our own specialized barrier
pthread_mutex_lock(&mutex)
remainingThreads--;
// all threads execute this
executedByAllThreads();
if(remainingThreads == 0) {
// reinitialize barrier
remainingThreads = N;
// only last thread executes this
if(shouldStop()) stop = 1;
pthread_cond_broadcast(&cond);
} else {
while(remainingThreads > 0)
pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);
Note 1: why is pthread_cond_wait() inside a while block? May seem a bit odd. The reason behind it is due to the existence of spurious wakeups. The function may return even if no signal() or broadcast() was issued. So, just to guarantee correctness, it's usual to have an extra variable to guarantee that if a thread suddenly wakes up before it should, it runs back into the pthread_cond_wait().
From the manual:
When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
(...)
If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it shall return zero due to spurious wakeup.
Note 2:
An noted by Michael Burr in the comments, you should hold a companion lock whenever you modify the predicate (start_threads) and pthread_cond_wait(). pthread_cond_wait() will release the mutex when called; and re-acquires it when it returns.
PS: It's a bit late here; sorry if my text is confusing :-)
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
The problem with the following program is that the main thread ends before the other threads get a chance to display their results. Another issue is that the threads display isn't in order.
The output needs to be:
This is thread 0.
This is thread 1.
This is thread 2. etc.
code:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <time.h>
void *text(void *arg);
long code[] = { 4, 6, 3, 1, 5, 0, 2 }; // Order in which to start threads
int num = 0;
int main()
{
int i;
pthread_t tid[7];
// Initialize random number generator
time_t seconds;
time(&seconds);
srand((unsigned int) seconds);
// Create our threads
for (i = 0; i < 7; i++)
pthread_create(&tid[i], NULL, text, (void*)code[i]);
// Exit main
return 0;
}
void *text(void *arg)
{
long n = (long)arg;
int rand_sec = rand() % (3 - 1 + 1) + 1; // Random num seconds to sleep
while (num != n) {} // Busy wait used to wait for our turn
num++; // Let next thread go
sleep(rand_sec); // Sleep for random amount of time
printf("This is thread %d.\n", n);
// Exit thread
pthread_exit(0);
}
Can anyone help with using a mutex to fix the synchronization problem?
You can use pthread_join to make your main routine wait until your other threads have finished, e.g.
int main()
{
int i;
pthread_t tid[7];
// Initialize random number generator
time_t seconds;
time(&seconds);
srand((unsigned int) seconds);
// Create our threads
for (i = 0; i < 7; i++)
pthread_create(&tid[i], NULL, text, (void*)code[i]);
for(i = 0; i < 7; i++)
pthread_join(tid[i], NULL);
// Exit main
return 0;
}
Also, your code which waits for num to be the thread ID might have some issues when you start turning optimisations on. Your num variable should be declared volatile if it is shared between threads, like so:
volatile int num;
That said, busy waiting on num like so is very, very inefficient. You should consider using condition variables to notify threads when num changes, so they can check if it is their turn once, then sleep if not. Have a look at all of the pthread_cond_* functions for more information.
I am trying to learn how locks work in multi-threading. When I execute the following code without lock, it worked fine even though the variable sum is declared as a global variable and multiple threads are updating it. Could anyone please explain why here threads are working perfectly fine on a shared variable without locks?
Here is the code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NTHREADS 100
#define ARRAYSIZE 1000000
#define ITERATIONS ARRAYSIZE / NTHREADS
double sum=0.0, a[ARRAYSIZE];
pthread_mutex_t sum_mutex;
void *do_work(void *tid)
{
int i, start, *mytid, end;
double mysum=0.0;
/* Initialize my part of the global array and keep local sum */
mytid = (int *) tid;
start = (*mytid * ITERATIONS);
end = start + ITERATIONS;
printf ("Thread %d doing iterations %d to %d\n",*mytid,start,end-1);
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
mysum = mysum + a[i];
}
/* Lock the mutex and update the global sum, then exit */
//pthread_mutex_lock (&sum_mutex); //here I tried not to use locks
sum = sum + mysum;
//pthread_mutex_unlock (&sum_mutex);
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
int i, start, tids[NTHREADS];
pthread_t threads[NTHREADS];
pthread_attr_t attr;
/* Pthreads setup: initialize mutex and explicitly create threads in a
joinable state (for portability). Pass each thread its loop offset */
pthread_mutex_init(&sum_mutex, NULL);
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i=0; i<NTHREADS; i++) {
tids[i] = i;
pthread_create(&threads[i], &attr, do_work, (void *) &tids[i]);
}
/* Wait for all threads to complete then print global sum */
for (i=0; i<NTHREADS; i++) {
pthread_join(threads[i], NULL);
}
printf ("Done. Sum= %e \n", sum);
sum=0.0;
for (i=0;i<ARRAYSIZE;i++){
a[i] = i*1.0;
sum = sum + a[i]; }
printf("Check Sum= %e\n",sum);
/* Clean up and exit */
pthread_attr_destroy(&attr);
pthread_mutex_destroy(&sum_mutex);
pthread_exit (NULL);
}
With and without lock I got the same answer!
Done. Sum= 4.999995e+11
Check Sum= 4.999995e+11
UPDATE: Change suggested by user3386109
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
//pthread_mutex_lock (&sum_mutex);
sum = sum + a[i];
//pthread_mutex_lock (&sum_mutex);
}
EFFECT :
Done. Sum= 3.878172e+11
Check Sum= 4.999995e+11
Mutexes are used to prevent race conditions which are undesirable situations when you have two or more threads accessing a shared resource. Race conditions such as the one in your code happen when the shared variable sum is being accessed by multiple threads. Sometimes the access to the shared variable will be interleaved in such a way that the result is incorrect and sometimes the result will be correct.
For example lets say you have two threads, thread A and thread B both adding 1 to a shared value, sum, which starts at 5. If thread A reads sum and then thread B reads sum and then thread A writes a new value followed by thread B writing a new value you will can an incorrect result, 6 as opposed to 7. However it is also possible than thread A reads and then writes a value (specifically 6) followed by thread B reading and writing a value (specifically 7) and then you get the correct result. The point being that some interleavings of operations result in the correct value and some interleavings result in an incorrect value. The job of the mutex is to force the interleaving to always be correct.
The task is to have 5 threads present at the same time, and the user assigns each a burst time. Then a round robin algorithm with a quantum of 2 is used to schedule the threads. For example, if I run the program with
$ ./m 1 2 3 4 5
The output should be
A 1
B 2
C 2
D 2
E 2
C 1
D 2
E 2
E 1
But for now my output shows only
A 1
B 2
C 2
Since the program errs where one thread does not end for the time being, I think the problem is that this thread cannot unlock to let the next thread grab the lock. My sleep() does not work, either. But I have no idea how to modify my code in order to fix them. My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
double times[5];
char process[] = {'A', 'B', 'C', 'D', 'E'};
int turn = 0;
void StartNext(int tid) //choose the next thread to run
{
int i;
for(i = (tid + 1) % 5; times[i] == 0; i = (i + 1) % 5)
if(i == tid) //if every thread has finished
return;
turn = i;
}
void *Run(void *tid) //the thread function
{
int i = (int)tid;
while(times[i] != 0)
{
while(turn != i); //busy waiting till it is its turn
if(times[i] > 2)
{
printf("%c 2\n", process[i]);
sleep(2); //sleep is to simulate the actual running time
times[i] -= 2;
}
else if(times[i] > 0 && times[i] <= 2) //this thread will have finished after this turn
{
printf("%c %lf\n", process[i], times[i]);
sleep(times[i]);
times[i] = 0;
}
StartNext(i); //choose the next thread to run
}
pthread_exit(0);
}
int main(int argc, char **argv)
{
pthread_t threads[5];
int i, status;
if(argc == 6)
{
for(i = 0; i < 5; i++)
times[i] = atof(argv[i + 1]); //input the burst time of each thread
for(i = 0; i < 5; i++)
{
status = pthread_create(&threads[i], NULL, Run, (void *)i); //Create threads
if(status != 0)
{
printf("While creating thread %d, pthread_create returned error code %d\n", i, status);
exit(-1);
}
pthread_join(threads[i], 0); //Join threads
}
}
return 0;
}
The program is directly runnable. Could anyone help me figure it out? Thanks!
Some things I've figured out reading your code:
1. At the beginning of the Run function, you convert tid (which is a pointer to void) directly to int. Shouldn't you dereference it?
It is better to make int turn volatile, so that the compiler won't make any assumptions about its value not changing.
When you call the function sleep the second time, you pass a parameter that has type double (times[i]), and you should pass an unsigned int parameter. A direct cast like (unsigned int) times[i] should solve that.
You're doing the pthread_join before creating the other threads. When you create thread 3, and it enters its busy waiting state, the other threads won't be created. Try putting the joins after the for block.