WinApi C multithreading: how to wait for a thread to finish? - c

I'm writing a multithreaded program to calculate Fibonacci, Power and Factorial. Instead of using Sleep, I would like to wait for threads to finish, and I'd like to display ids of threads in the order they finish (first finished, first displayed). How should I do this?
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
unsigned int n = 0;
int priorytety[3] = { THREAD_PRIORITY_BELOW_NORMAL,THREAD_PRIORITY_NORMAL, THREAD_PRIORITY_ABOVE_NORMAL};
HANDLE watki[3];
DWORD WINAPI Fibbonaci(void *argumenty){
unsigned long long int prevPrev = 0;
unsigned long long int prev = 1;
unsigned long long int wynik = 1;
while (wynik <= n){
wynik = prev + prevPrev;
prevPrev = prev;
prev = wynik;
}
printf("fibbonaci : %llu \n", wynik);
ExitThread(wynik);
//return wynik;
}
DWORD WINAPI Potegi(void *argumenty){
unsigned long long int wynik = 2;
while (wynik <= n){
wynik = wynik << 1;
}
printf("potegi : %llu \n", wynik);
return wynik;
}
DWORD WINAPI Silnia(void *argumenty){
//printf("%d", atoi(argv[argc-1]));
unsigned long long int wynik = 1;
unsigned long long int i = 1;
while (wynik <= n){
wynik = wynik * i;
i = i + 1;
}
printf("silnia : %llu \n", wynik);
return wynik;
}
int main(){
int i;
DWORD id;
system("cls");
scanf_s("%d", &n);
LPTHREAD_START_ROUTINE WINAPI funkcje[3] = { Fibbonaci, Potegi, Silnia };
for (i = 0; i < 3; i++)
{
watki[i] = CreateThread(
NULL, // atrybuty bezpieczeństwa
10000, // inicjalna wielkość stosu
funkcje[i] , // funkcja wątku
(void *)n,// dane dla funkcji wątku
0, // flagi utworzenia
&id);
if (watki[i] != INVALID_HANDLE_VALUE)
{
//printf("Utworzylem watek o identyfikatorze %x\n", id);
// ustawienie priorytetu
SetThreadPriority(watki[i], priorytety[1]);
}
}
Sleep(10000);
getchar();
}

#WhozCraig is correct that you should use WaitForMultipleObjects() to wait for all the threads to finish. Read this SO post for more information.
That, however, will not tell you the order in which they ended, only when all have completed. Adding code to each function to print its thread ID should do that (use GetCurrentThreadId()). For example:
printf("potegi : %llu, thread ID %ld \n", wynik, GetCurrentThreadId());
Now we must not forget that there is time between the printf statement and when the thread actually finishes. You are not doing any work there, but technically the thread is still active. In a multithreaded environment, you cannot predict how much time will elapse between the printf and when the thread truly terminates, no matter how little code appears to be there.
If this difference is important to you, then you would need to join each thread independently and see which one terminates first. You could repeatedly call WaitForSingleObject() on each thread handle with a zero timeout and detect which one terminates first. Yes, there is a slight race condition if the third thread finishes slightly before the second while you are checking on the first, and then you check the second thread and notice it has terminated. You'll miss the fact that the third finished first. And this polling technique alters the experiment by consuming a lot of CPU while it is waiting.
Personally, I think you are better off just recording the time (based on the system clock) when each thread finished computing its result, not when its thread terminated. Use GetTickCount() or QueryPerformanceCounter() to measure the time.

Related

How to solve the dining philosophers problem with only mutexes?

I wrote this program to solve the dining philosophers problem using Dijkstra's algorithm, notice that I'm using an array of booleans (data->locked) instead of an array of binary semaphores.
I'm not sure if this solution is valid (hence the SO question).
Will access to the data->locked array in both test and take_forks functions cause data races? if so is it even possible to solve this problem using Dijkstra's algorithm with only mutexes?
I'm only allowed to use mutexes, no semaphores, no condition variables (it's an assignment).
Example of usage:
./a.out 4 1000 1000
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#define NOT_HUNGRY 1
#define HUNGRY 2
#define EATING 3
#define RIGHT ((i + 1) % data->n)
#define LEFT ((i + data->n - 1) % data->n)
typedef struct s_data
{
int n;
int t_sleep;
int t_eat;
int *state;
bool *locked;
pthread_mutex_t *state_mutex;
} t_data;
typedef struct s_arg
{
t_data *data;
int i;
} t_arg;
int ft_min(int a, int b)
{
if (a < b)
return (a);
return (b);
}
int ft_max(int a, int b)
{
if (a > b)
return (a);
return (b);
}
// if the LEFT and RIGHT threads are not eating
// and thread number i is hungry, change its state to EATING
// and signal to the while loop in `take_forks` to stop blocking.
// if a thread has a state of HUNGRY then it's guaranteed
// to be out of the critical section of `take_forks`.
void test(int i, t_data *data)
{
if (
data->state[i] == HUNGRY
&& data->state[LEFT] != EATING
&& data->state[RIGHT] != EATING
)
{
data->state[i] = EATING;
data->locked[i] = false;
}
}
// set the state of the thread number i to HUNGRY
// and block until the LEFT and RIGHT threads are not EATING
// in which case they will call `test` from `put_forks`
// which will result in breaking the while loop
void take_forks(int i, t_data *data)
{
pthread_mutex_lock(data->state_mutex);
data->locked[i] = true;
data->state[i] = HUNGRY;
test(i, data);
pthread_mutex_unlock(data->state_mutex);
while (data->locked[i]);
}
// set the state of the thread number i to NOT_HUNGRY
// then signal to the LEFT and RIGHT threads
// so they can start eating when their neighbors are not eating
void put_forks(int i, t_data *data)
{
pthread_mutex_lock(data->state_mutex);
data->state[i] = NOT_HUNGRY;
test(LEFT, data);
test(RIGHT, data);
pthread_mutex_unlock(data->state_mutex);
}
void *philosopher(void *_arg)
{
t_arg *arg = _arg;
while (true)
{
printf("%d is thinking\n", arg->i);
take_forks(arg->i, arg->data);
printf("%d is eating\n", arg->i);
usleep(arg->data->t_eat * 1000);
put_forks(arg->i, arg->data);
printf("%d is sleeping\n", arg->i);
usleep(arg->data->t_sleep * 1000);
}
return (NULL);
}
void data_init(t_data *data, pthread_mutex_t *state_mutex, char **argv)
{
int i = 0;
data->n = atoi(argv[1]);
data->t_eat = atoi(argv[2]);
data->t_sleep = atoi(argv[3]);
pthread_mutex_init(state_mutex, NULL);
data->state_mutex = state_mutex;
data->state = malloc(data->n * sizeof(int));
data->locked = malloc(data->n * sizeof(bool));
while (i < data->n)
{
data->state[i] = NOT_HUNGRY;
data->locked[i] = true;
i++;
}
}
int main(int argc, char **argv)
{
pthread_mutex_t state_mutex;
t_data data;
t_arg *args;
pthread_t *threads;
int i;
if (argc != 4)
{
fputs("Error\nInvalid argument count\n", stderr);
return (1);
}
data_init(&data, &state_mutex, argv);
args = malloc(data.n * sizeof(t_arg));
i = 0;
while (i < data.n)
{
args[i].data = &data;
args[i].i = i;
i++;
}
threads = malloc(data.n * sizeof(pthread_t));
i = 0;
while (i < data.n)
{
pthread_create(threads + i, NULL, philosopher, args + i);
i++;
}
i = 0;
while (i < data.n)
pthread_join(threads[i++], NULL);
}
Your spin loop while (data->locked[i]); is a data race; you don't hold the lock while reading it data->locked[i], and so another thread could take the lock and write to that same variable while you are reading it. In fact, you rely on that happening. But this is undefined behavior.
Immediate practical consequences are that the compiler can delete the test (since in the absence of a data race, data->locked[i] could not change between iterations), or delete the loop altogether (since it's now an infinite loop, and nontrivial infinite loops are UB). Of course other undesired outcomes are also possible.
So you have to hold the mutex while testing the flag. If it's false, you should then hold the mutex until you set it true and do your other work; otherwise there is a race where another thread could get it first. If it's true, then drop the mutex, wait a little while, take it again, and retry.
(How long is a "little while", and what work you choose to do in between, are probably things you should test. Depending on what kind of fairness algorithms your pthread implementation uses, you might run into situations where take_forks succeeds in retaking the lock even if put_forks is also waiting to lock it.)
Of course, in a "real" program, you wouldn't do it this way in the first place; you'd use a condition variable.

Buffer mutex and condition variables in C

I only just started writing multithreading in C and don't have a full understanding of how to implement it. I'm writing a code that reads an input file and puts into a buffer struct array. When the buffer has no more available space, request_t is blocked waiting for available space. It is controlled by thread Lift_R. The other threads lift 1-3 operate lift() and it writes whats in buffer to the output file depending number of int sec. sec and size and given values through command line. This will free up space for request to continue reading the input.
Can someone please help me with how to implement these functions properly. I know there are other questions relating to this, but I want my code to meet specific conditions.
(NOTE: lift operates in FIFO and threads use mutual exclusion)
This is what I wrote so far, I haven't implemented any waiting conditions or FIFO yet, I'm currently focusing on the writing to file and debugging and am soon getting to wait and signal.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include "list.h"
pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER; //declare thread conditions
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; //declare mutex
int sec; //time required by each lift to serve a request
int size; //buffer size
buffer_t A[];
write_t write;
void *lift(void *vargp)
{
pthread_mutex_lock(&lock);
FILE* out;
out = fopen("sim_output.txt", "w");
//gather information to print
if (write.p == NULL) //only for when system begins
{
write.p = A[1].from;
}
write.rf = A[1].from;
write.rt = A[1].to;
write.m = (write.p - A[1].from) + (A[1].to - A[1].from);
if (write.total_r == NULL) //for when the system first begins
{
write.total_r = 0;
}
else
{
write.total_r++;
}
if (write.total_m == NULL)
{
write.total_m = write.m;
}
else
{
write.total_m = write.total_m + write.m;
}
write.c = A[1].to;
//Now write the information
fprintf(out, "Previous position: Floor %d\n", write.p);
fprintf(out, "Request: Floor %d to Floor %d\n", write.rf, write.rt);
fprintf(out, "Detail operations:\n");
fprintf(out, " Go from Floor %d to Floor %d\n", write.p, write.rf);
fprintf(out, " Go from Floor %d to Floor %d\n", write.rf, write.rt);
fprintf(out, " #movement for this request: %d\n", write.m);
fprintf(out, " #request: %d\n", write.total_r);
fprintf(out, " Total #movement: %d\n", write.total_m);
fprintf(out, "Current Position: Floor %d\n", write.c);
write.p = write.c; //for next statement
pthread_mutex_unlock(&lock);
return NULL;
}
void *request_t(void *vargp)
{
pthread_mutex_lock(&lock); //Now only request can operate
FILE* f;
FILE* f2;
f = fopen("sim_input.txt", "r");
if (f == NULL)
{
printf("input file empty\n");
exit(EXIT_FAILURE);
}
f2 = fopen("sim_output.txt", "w");
int i = 0;
for (i; i < size; i++)
{
//read the input line by line and into the buffer
fscanf(f, "%d %d", &A[i].from, &A[i].to);\
//Print buffer information to sim_output
fprintf(f2, "----------------------------\n");
fprintf(f2, "New Lift Request from Floor %d to Floor %d \n", A[i].from, A[i].to);
fprintf(f2, "Request No %d \n", i);
fprintf(f2, "----------------------------\n");
}
printf("Buffer is full");
fclose(f);
fclose(f2);
pthread_mutex_unlock(&lock);
return NULL;
}
void main(int argc, char *argv[]) // to avoid segmentation fault
{
size = atoi(argv[0]);
if (!(size >= 1))
{
printf("buffer size too small\n");
exit(0);
}
else
{
A[size].from = NULL;
A[size].to = NULL;
}
sec = atoi(argv[1]);
pthread_t Lift_R, lift_1, lift_2, lift_3;
pthread_create(&Lift_R, NULL, request_t, NULL);
pthread_join(Lift_R, NULL);
pthread_create(&lift_1, NULL, lift, NULL);
pthread_join(lift_1, NULL);
pthread_create(&lift_2, NULL, lift, NULL);
pthread_join(lift_2, NULL);
pthread_create(&lift_3, NULL, lift, NULL);
pthread_join(lift_3, NULL);
}
And here is the struct files:
#include <stdbool.h>
typedef struct Buffer
{
int from;
int to;
}buffer_t; //buffer arrary to store from and to values from sim_input
typedef struct Output
{
int l; //lift number
int p; //previous floor
int rf; //request from
int rt; //request to
int total_m; //total movement
int c; // current position
int m; //movement
int total_r; //total requests made
}write_t;
Between reading your code and question I see a large conceptual gap. There are some technical problems in the code (eg. you never fclose out); and a hard to follow sequence.
So, this pattern:
pthread_create(&x, ?, func, arg);
pthread_join(x, ...);
Can be replaced with:
func(arg);
so, your really aren't multithreaded at all; it is exactly as if:
void main(int argc, char *argv[]) // to avoid segmentation fault
{
size = atoi(argv[0]);
if (!(size >= 1))
{
printf("buffer size too small\n");
exit(0);
}
else
{
A[size].from = NULL;
A[size].to = NULL;
}
sec = atoi(argv[1]);
request_t(0);
lift(0);
lift(0);
lift(0);
}
and, knowing that, I hope you can see the futility in:
pthread_mutex_lock(&lock);
....
pthread_mutex_unlock(&lock);
So, start with a bit of a rethink of what you are doing. It sounds like you have a lift device which needs to take inbound requests, perhaps sort them, then process them. Likely 'forever'.
This probably means a sorted queue; however one not sorted by an ordinary criteria. The lift traverses the building in both directions, but means to minimize changes in direction. This involves traversing the queue with both an order ( >, < ) and a current-direction.
You would likely want request to simply evaluate the lift graph, and determine where to insert the new request.
The lift graph would be a unidirectional list of where the lift goes next. And, perhaps a rule that the list only consults its list as it stops at a given floor.
So, the Request can take a lock of the graph, alter it to reflect the new requestion, then unlock it.
The Lift can simply:
while (!Lift_is_decommissioned) {
pthread_mutex_lock(&glock);
Destination = RemoveHead(&graph);
pthread_mutex_unlock(&glock);
GoTo(Destination);
}
And the Request can be:
pthread_mutex_lock(&glock);
NewDestination = NewEvent.floorpressed;
NewDirection = NewEvent.floorpressed > NewEvent.curfloor ? Up : Down;
i = FindInsertion(&graph, NewDestination, NewDirection);
InsertAt(&graph, i, NewDestination);
pthread_mutex_unlock(&glock);
Which may be a bit surprising that there is no difference between pressing a "goto floor" button from within the lift, and a "I want lift here now" from outside the lift.
But, with this sort of separation, you can have the lift simply follow the recipe above, and the handlers for the buttons invoke the other pseudo code above.
The FindInsertion() may be a bit hairy though....

Some threads never get execution when invoked in large amount

Consider the following program,
static long count = 0;
void thread()
{
printf("%d\n",++count);
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t,NULL,(void *(*)(void*))thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
This program is expected to print 1 to 30000. But it some times prints 29945, 29999, 29959, etc. Why this is happening?
Because count isn't atomic, so you have a race condition both in the increment and in the subsequent print.
The instruction you need is atomic_fetch_add, to increment the counter and avoid the race condition. The example on cppreference illustrates the exact problem you laid out.
Your example can be made to work with just a minor adjustment:
#include <stdio.h>
#include <signal.h>
#include <sys/resource.h>
#include <pthread.h>
#include <stdatomic.h>
static atomic_long count = 1;
void * thread(void *data)
{
printf("%ld\n", atomic_fetch_add(&count, 1));
return NULL;
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t, NULL, thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
I made a handful of other changes, such as fixing the thread function signature and using the correct printf format for printing longs. But the atomic issue is why you weren't printing all the numbers you expected.
Why this is happening?
Because you have a data race (undefined behavior).
In particular, this statement:
printf("%d\n",++count);
modifies a global (shared) variable without any locking. Since the ++ does not atomically increment it, it's quite possible for multiple threads to read the same value (say 1234), increment it, and store the updated value in parallel, resulting in 1235 being printed repeatedly (two or more times), and one or more of the increments being lost.
A typical solution is to either use mutex to avoid the data race, or (rarely) an atomic variable (which guarantees atomic increment). Beware: atomic variables are quite hard to get right. You are not ready to use them yet.

C gettimeofday Doesn't Work Properly With GoLang

I've a simple C function which I call from GoLang. In my C function I'm using gettimeofday function. I understand that this function doesn't give accurate time and I'm okay with that as I only need to get some idea. I just want to get the time difference to run a big for loop.When I run the C function from a C program, it works fine. The for loop takes around 3 seconds to finish. But if I call the same function from GoLang, it looks like that it doesn't take any time in the for loop The program finishes immediately.
Here goes my code.
timetest.h
#include <sys/time.h>
#include<stdio.h>
int TimeTest(){
int i=0;
int j = 1000000000;
unsigned long long startMilli=0;
unsigned long long endMilli=0;
struct timeval te;
gettimeofday(&te, NULL);
startMilli = te.tv_sec*1000LL + te.tv_usec/1000;
for (i=0; i<1000000000; i++){
if (j-1 == i){
printf("matched\n");
}
}
gettimeofday(&te, NULL);
endMilli = te.tv_sec*1000LL + te.tv_usec/1000;
printf("Start = %d End = %d Total time %d\n", startMilli, endMilli, endMilli - startMilli);
return endMilli - startMilli;
}
test.go
package main
// #include <sys/time.h>
// #include "timetest.h"
import "C"
import (
"fmt"
)
func main(){
diff := C.TimeTest()
fmt.Println(diff)
}
Main.c
#include"timetest.h"
int main(){
int diff = TimeTest();
printf("%d\n", diff);
return 0;
}

How do we extend the program from 4 to 8 threads in C [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
Improve this question
I have started learning C in Uni, and now I'm stuck on posix threads. I have a program that has a single thread, 2 threads and 4 threads as an example from lecture. I need your help to extend this program from 4 to 8/16/32 and how it will perform a difference or not?
Thank you in advance.
Here is the code for 4 thread programm:
/****************************************************************************
This program finds groups of three numbers that when multiplied together
equal 98931313. Compile with:
cc -o factorise4 factorise4.c -lrt -pthread
Kevan Buckley, University of Wolverhampton, October 2012
*****************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <errno.h>
#include <sys/stat.h>
#include <string.h>
#include <time.h>
#include <pthread.h>
#include <math.h>
#define goal 98931313
typedef struct arguments {
int start;
int n;
} arguments_t;
void factorise(int n) {
pthread_t t1, t2, t3, t4;
//1st pthread
arguments_t t1_arguments;
t1_arguments.start = 0;
t1_arguments.n = n;
//2nd pthread
arguments_t t2_arguments;
t2_arguments.start = 250;
t2_arguments.n = n;
//3rd pthread
arguments_t t3_arguments;
t3_arguments.start = 500;
t3_arguments.n = n;
//4th pthread
arguments_t t4_arguments;
t4_arguments.start = 750;
t4_arguments.n = n;
void *find_factors();
//creating threads
pthread_create(&t1, NULL, find_factors, &t1_arguments);
pthread_create(&t2, NULL, find_factors, &t2_arguments);
pthread_create(&t3, NULL, find_factors, &t3_arguments);
pthread_create(&t4, NULL, find_factors, &t4_arguments);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
pthread_join(t3, NULL);
pthread_join(t4, NULL);
}
//Using 3 loops, 1 loop represents one value that we need to find, and go throught it until 98931313 not will be find.
void *find_factors(arguments_t *args){
int a, b, c;
for(a=args->start;a<args->start+250;a++){
for(b=0;b<1000;b++){
for(c=0;c<1000;c++){
if(a*b*c == args->n){
printf("solution is %d, %d, %d\n", a, b, c);// Printing out the answer
}
}
}
}
}
// Calculate the difference between two times.
long long int time_difference(struct timespec *start, struct timespec *finish, long long int *difference) {
long long int ds = finish->tv_sec - start->tv_sec;
long long int dn = finish->tv_nsec - start->tv_nsec;
if(dn < 0 ) {
ds--;
dn += 1000000000;
}
*difference = ds * 1000000000 + dn;
return !(*difference > 0);
}
//Prints elapsed time
int main() {
struct timespec start, finish;
long long int time_elapsed;
clock_gettime(CLOCK_MONOTONIC, &start);
factorise(goal); //This is our goal = 98931313
clock_gettime(CLOCK_MONOTONIC, &finish);
time_difference(&start, &finish, &time_elapsed);
printf("Time elaipsed was %lldns or %0.9lfs\n", time_elapsed, (time_elapsed/1.0e9));
return 0;
}
I'll give you a hint:
If you call a function twice manually, you can put its results into two separate variables:
int y0 = f(0);
int y1 = f(1);
You as well can put them into one array:
int y[2];
y[0] = f(0);
y[1] = f(1);
Or into a memory area on heap (obtained via malloc()):
int * y = malloc(2 * sizeof(*y));
y[0] = f(0);
y[1] = f(1);
In the latter two cases, you can replace the two function calls with
for (i = 0; i < 2; i++) {
y[i] = f(i);
}
Another hint:
For a changed number of threads, you will as well have to change your parameter set.
And another hint:
Thread creation, in your case, can be put into a function:
void facthread_create(pthread_t * thread, int start, int n)
{
arguments_t arguments;
arguments.start = start;
arguments.n = n;
void *find_factors();
//creating thread
pthread_create(thread, NULL, find_factors, &arguments);
}
But - there is a caveat: we have a race condition here. As soon as the thread starts, we can return and the stack space occupied by arguments is freed. So we use an improved version here which is useful for cooperation:
We add a field to arguments_t:
typedef struct arguments {
char used;
int start;
int n;
} arguments_t;
We set used to 0:
void facthread_create(pthread_t * thread, int start, int n)
{
arguments_t arguments;
arguments.start = start;
arguments.n = n;
arguments.used = 0;
void *find_factors();
//creating thread
pthread_create(thread, NULL, find_factors, &arguments);
while (!arguments.used); // wait until thread has "really" started
}
Set used to 1 once the data has safely copied:
void *find_factors(arguments_t *args){
arguments_t my_args = *args; // is this valid? Don't remember... If not, do it element-wise...
*args.used = 1; // Inform the caller that it is safe to continue
int a, b, c;
for(a=my_args.start;a<my_args.start+250;a++){
...
You should get a command line parameter (maybe -t for threads). Then instead of calling factorise from main, have a for loop which does the thread create with the parameter which is calculated from the loop number. Something like:
for (int i = 0; i < threads; i++) {
arguments.start = 250 * i;
arguments.n = n;
pthread_start(...)
}
Note that you should allocate the argument structs before the for loop for clarity.
Let me know if you need more help.
Here is some more help:
0) get the number of threads and the skip (in your case 250) from the command line.
1) create a control stuct which contains the args for the thread, the thread id, etc.
2) using the args, allocate the control struct and fill it in.
3) do a for loop to spawn off the treads.
4) do another for loop to wait for the threads to complete.
For some extra complexity, you could introduce a global variable which any thread could set to signal the other threads that the work is done and they should exit. But don't do this until you get the simple case correct.
If you post some updated code, I will help you some more.

Resources