The goal of this program is to copy a string taken from user input, word for word, using multithreading. Each thread copies every fourth word: the first thread copies the first and fifth words, the second copies the second and sixth words, and so on. I have done quite a bit of research on mutexes and I believe I have implemented the mutex lock properly; however, the string still comes out as jumbled nonsense when it prints. Can someone shed some light on why the threads aren't synchronizing?
#include <stdio.h>
#include <pthread.h>
#include <string.h>
#include <stdlib.h>
void *processString(void *);
char msg1[100];
char msg2[100];
char * reg;
char * token;
char * tokens[10];
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t = PTHREAD_COND_INITIALIZER;
int main(){
int i = 0, j;
pthread_t workers[4];
printf("Please input a string of words separated by whitespace characters: \n");
scanf("%99[^\n]", msg1); //take in a full string including whitespace characters
//tokenize string into individual words
token = strtok(msg1, " ");
while(token != NULL){
tokens[i] = (char *) malloc (sizeof(token));
tokens[i] = token;
token = strtok(NULL, " ");
i++;
}
for(j = 0; j < 4; j++){
if(pthread_create(&workers[j], NULL, processString, (void *) j))
printf("Error creating pthreads");
}
for(i = 0; i < 4; i++){
pthread_join(workers[i], NULL);
}
pthread_mutex_destroy(&lock);
printf("%s\n", msg2);
return 0;
}
//each thread copies every fourth word
void *processString(void *ptr){
int j = (int) ptr, i = 0;
pthread_mutex_lock(&lock);
while(tokens[i * 4 + j] != NULL){
reg = (char *) malloc (sizeof(tokens[i * 4 + j]));
reg = tokens[i * 4 + j];
strcat(msg2, reg);
strcat(msg2, " ");
i++;
}
pthread_mutex_unlock(&lock);
return NULL;
}
As @EOF wrote in comments, mutexes provide only mutual exclusion. They prevent multiple cooperating threads from running concurrently, but they do not inherently provide any control over the order in which they are acquired by such threads. Additionally, as I described in comments myself, mutexes do provide mutual exclusion: if one thread holds a mutex then no other thread will be able to acquire that mutex, nor proceed past an attempt to do so, until that mutex is released.
There is no native synchronization object that provides directly for making threads take turns. That's not usually what you want threads to do, after all. You can arrange for it with semaphores, but that gets messy quickly as you add more threads. A pretty clean solution involves using a shared global variable to indicate which thread's turn it is to run. Access to that variable must be protected by a mutex, since all threads involved must both read and write it, but there's a problem with that: what if the thread that currently holds the mutex is not the one whose turn it is to run?
It is possible for all the threads to loop, continuously acquiring the mutex, testing the variable, and either proceeding or releasing the mutex. Unfortunately, such a busy wait tends to perform very poorly, and in general, you can't be confident that the thread that can make progress at any given point in the execution will manage to acquire the mutex in bounded time.
This is where condition variables come in. A condition variable is a synchronization object that permits any number of threads to suspend activity until some condition is satisfied, as judged by another, non-suspended thread. Using such a tool avoids the performance-draining busy-wait, and in your case it can help ensure that all your threads get their chance to run in bounded time. The general-purpose per-thread usage model for condition variables is as follows:
1. Acquire the mutex protecting the shared variable(s) by which to judge whether I can proceed.
2. Test whether I can proceed. If so, jump to step 5.
3. I can't proceed right now. Perform a wait on the condition variable.
4. I have awakened from the wait; go back to step 2.
5. Do the work I need to do.
6. Broadcast a signal to wake all threads waiting on the condition variable.
7. Release the mutex.
Variations on that are possible, but I recommend that you do not vary from it until and unless you know exactly why you want to do so, and exactly why the variation you have in mind is safe. Note, too, that when a thread performs a wait on a condition variable associated with a given mutex, it automatically releases that mutex while it waits, and re-acquires it before returning from the wait. This allows other threads to proceed in the meantime, and, in particular, to wait on the same condition variable.
As it applies to your problem, the shared state you want your threads to test is the aforementioned variable that indicates which thread's turn it is, and the condition you want your threads to wait on is that it has become a different thread's turn (but this is implicit in the way you use the condition variable; condition variables themselves are generic). Note also that this means that part of the work each thread must do before signaling the other threads is to update which thread's turn it is. And since each thread may need to take multiple turns, you will want to wrap the whole procedure in a loop.
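The procedure above can be sketched roughly as follows. This is a minimal illustration, not a drop-in fix for the posted program: the names `next_word`, `turn_lock`, `turn_changed`, `copy_words` and `copy_round_robin` are made up for the example, the input is assumed to fit in the fixed-size buffers, and a failed `pthread_create` is simply treated as fatal.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_WORKERS 4
#define MAX_WORDS 32

static char *words[MAX_WORDS];      /* tokens of the input string */
static size_t word_count;
static char output[256];            /* the re-assembled copy */
static size_t next_word;            /* shared state: index of the word whose turn it is */
static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;

/* Worker `id` copies words id, id + 4, id + 8, ... each in its proper turn. */
static void *copy_words(void *arg)
{
    size_t id = *(size_t *)arg;

    pthread_mutex_lock(&turn_lock);
    for (size_t mine = id; mine < word_count; mine += NUM_WORKERS) {
        while (next_word != mine)              /* not my turn: wait, then re-test */
            pthread_cond_wait(&turn_changed, &turn_lock);
        strcat(output, words[mine]);           /* my turn: do the work */
        strcat(output, " ");
        next_word++;                           /* pass the turn on */
        pthread_cond_broadcast(&turn_changed); /* wake the others to re-test */
    }
    pthread_mutex_unlock(&turn_lock);
    return NULL;
}

/* Tokenizes `input` in place; returns the copy, or NULL if it would not fit. */
static const char *copy_round_robin(char *input)
{
    pthread_t workers[NUM_WORKERS];
    size_t ids[NUM_WORKERS];

    if (strlen(input) + 2 > sizeof output)
        return NULL;
    word_count = 0;
    next_word = 0;
    output[0] = '\0';
    for (char *tok = strtok(input, " "); tok != NULL && word_count < MAX_WORDS;
         tok = strtok(NULL, " "))
        words[word_count++] = tok;

    for (size_t i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;                            /* each thread gets its own slot */
        if (pthread_create(&workers[i], NULL, copy_words, &ids[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(EXIT_FAILURE);
        }
    }
    for (size_t i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return output;
}
```

Calling `copy_round_robin()` on a writable buffer holding the input line should give back the words in their original order, each followed by a space. Note that the mutex is held across the whole loop but is released inside every `pthread_cond_wait()`, which is what lets the other threads take their turns.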
I have a program in which multiple threads are in a loop where they acquire a binary semaphore and then increase a global counter. However, by printing out the thread IDs, I notice that only one thread ever acquires the semaphore. Here's my MRE:
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>
#define NUM_THREADS 10
#define MAX_COUNTER 100
struct threadCtx {
sem_t sem;
unsigned int counter;
};
static void *
threadFunc(void *args)
{
struct threadCtx *ctx = args;
pthread_t self;
bool done = false;
self = pthread_self();
while (!done) {
sem_wait(&ctx->sem);
if ( ctx->counter == MAX_COUNTER ) {
done = true;
}
else {
sleep(1);
printf("Thread %u increasing the counter to %u\n", (unsigned int)self, ++ctx->counter);
}
sem_post(&ctx->sem);
}
return NULL;
}
int main() {
pthread_t threads[NUM_THREADS];
struct threadCtx ctx = {.counter = 0};
sem_init(&ctx.sem, 0, 1);
for (int k=0; k<NUM_THREADS; k++) {
pthread_create(threads+k, NULL, threadFunc, &ctx);
}
for (int k=0; k<NUM_THREADS; k++) {
pthread_join(threads[k], NULL);
}
sem_destroy(&ctx.sem);
return 0;
}
The output is
Thread 1004766976 increasing the counter to 1
Thread 1004766976 increasing the counter to 2
Thread 1004766976 increasing the counter to 3
...
If I remove the call to sleep, the behavior is closer to what I would expect (i.e., the threads being woken up in a seemingly indeterminate manner). Why would this be?
David Schwartz's answer explains what is happening at a low level. That is to say, he's looking at it from the perspective of an OS developer or a hardware designer. Nothing wrong with that, but let's look at your program from the perspective of a Software Architect:
You've got multiple threads all executing the same loop. The loop locks the mutex,* it does some "work," and then it releases the mutex. OK, but what does it do next? Almost the very next thing that your loop does after releasing the mutex is it locks the mutex again. Your loop spends practically 100% of its time doing "work" with the mutex locked.
So, what's the point of running that same loop in multiple threads when there's never any opportunity for two or more threads to work at the same time?
If you want to use threads to do a parallel computation, you need to find/invent safe ways for the threads to do most of their work with the mutex unlocked. They should only lock a mutex for just long enough to post a result or to take another assignment.
Sometimes that means writing code that is less efficient than single threaded code would be. But suppose that program (A) has a single thread that makes almost 100% use of a CPU, while program (B) uses eight CPUs but only uses them with 50% efficiency. Which program is going to win?
* I know, your example uses a sem_t (semaphore) object. But "semaphore" is what you are using. "Mutex" is the role in which you are using it.
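As a rough illustration of "do most of the work with the mutex unlocked", here is a sketch in which each thread sums its own slice of an array into a local variable and takes the lock only to post the result. The names `sum_job`, `sum_slice` and `parallel_sum` are invented for this example and have nothing to do with the code in the question.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4

struct sum_job {
    const int *data;
    size_t begin, end;          /* this thread's slice is [begin, end) */
    long *total;                /* shared result */
    pthread_mutex_t *lock;      /* protects *total */
};

static void *sum_slice(void *arg)
{
    struct sum_job *job = arg;
    long local = 0;

    /* The bulk of the work touches no shared state, so no lock is held. */
    for (size_t i = job->begin; i < job->end; i++)
        local += job->data[i];

    /* Lock only long enough to post the result. */
    pthread_mutex_lock(job->lock);
    *job->total += local;
    pthread_mutex_unlock(job->lock);
    return NULL;
}

static long parallel_sum(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    struct sum_job jobs[NUM_THREADS];
    int created[NUM_THREADS];
    pthread_mutex_t lock;
    long total = 0;
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;

    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;
        if (end > n) end = n;
        jobs[t] = (struct sum_job){ data, begin, end, &total, &lock };
        created[t] = pthread_create(&threads[t], NULL, sum_slice, &jobs[t]) == 0;
        if (!created[t])
            sum_slice(&jobs[t]);    /* fall back to summing this slice inline */
    }
    for (int t = 0; t < NUM_THREADS; t++)
        if (created[t])
            pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&lock);
    return total;
}
```

Each thread holds the lock for a single addition, so the threads can genuinely run at the same time for almost all of their work.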
Why would this be?
Context switches are expensive and your implementation is, wisely, minimizing them. Your threads are all fighting over the same resource; trying to schedule them closely would make performance much worse, probably for the entire system.
Since the thread that keeps getting the semaphore never uses up its timeslice, it will keep getting the resource. It is your responsibility to write code to do the work that you want done. It's the implementation's responsibility to execute your code as efficiently as it can, and that's what it's doing.
Most likely, what's going on under the hood is this:
The thread that keeps getting the semaphore can always make forward progress except when it is sleeping. But when it is sleeping, no other thread that needs the semaphore can make forward progress.
The thread that keeps getting the semaphore never exhausts its timeslice because it sleeps before that happens.
So there is no reason for the implementation to ever block this thread other than when it is sleeping, meaning that no other thread can get the semaphore. If you don't want this thread to keep sleeping with the semaphore and blocking other threads, then write different code.
I tried to build a program that creates threads and assigns a Print function to each of them, while the main thread calls printf directly.
First, I wrote it without any synchronization and expected to get randomized output.
Later I added a mutex to the Print function assigned to the threads and expected to get chronological output, but it seems the mutex had no effect on the output.
Should I use a mutex on the printf function in the main process as well?
Thanks in advance
My code:
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
pthread_t threadID[20];
pthread_mutex_t lock;
void* Print(void* _num);
int main(void)
{
int num = 20, indx = 0, k = 0;
if (pthread_mutex_init(&lock, NULL))
{
perror("err pthread_mutex_init\n");
return errno;
}
for (; indx < num; ++indx)
{
if (pthread_create(&threadID[indx], NULL, Print, &indx))
{
perror("err pthread_create\n");
return errno;
}
}
for (; k < num; ++k)
{
printf("%d from main\n", k);
}
indx = 0;
for (; indx < num; ++indx)
{
if (pthread_join(threadID[indx], NULL))
{
perror("err pthread_join\n");
return errno;
}
}
pthread_mutex_destroy(&lock);
return 0;
}
void* Print(void* _indx)
{
pthread_mutex_lock(&lock);
printf("%d from thread\n", *(int*)_indx);
pthread_mutex_unlock(&lock);
return NULL;
}
All questions of program bugs notwithstanding, pthreads mutexes provide only mutual exclusion, not any guarantee of scheduling order. This is typical of mutex implementations. Similarly, pthread_create() only creates and starts threads; it does not make any guarantee about scheduling order, such as would justify an assumption that the threads reach the pthread_mutex_lock() call in the same order that they were created.
Overall, if you want to order thread activities based on some characteristic of the threads, then you have to manage that yourself. You need to maintain a sense of which thread's turn it is, and provide a mechanism sufficient to make a thread notice when its turn arrives. In some circumstances, with some care, you can do this by using semaphores instead of mutexes. The more general solution, however, is to use a condition variable together with your mutex, and some shared variable that serves to indicate whose turn it currently is.
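For the simple case of "run once each, in index order", the semaphore variant might look like the sketch below: one semaphore per thread, where each thread waits on its own and then posts the next one. The names `turn`, `order`, `ordered_print` and `run_in_order` are hypothetical, and the sketch assumes unnamed POSIX semaphores are available (they are on Linux; macOS does not implement `sem_init`).

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 5

static sem_t turn[NUM_THREADS];  /* turn[i] is posted when thread i may go */
static int order[NUM_THREADS];   /* order in which the threads actually ran */
static int order_len;

static void *ordered_print(void *arg)
{
    int id = *(int *)arg;

    sem_wait(&turn[id]);                  /* block until my turn arrives */
    printf("%d from thread\n", id);
    order[order_len++] = id;              /* only the turn holder gets here */
    if (id + 1 < NUM_THREADS)
        sem_post(&turn[id + 1]);          /* hand the turn to the next thread */
    return NULL;
}

/* Returns the number of threads that ran; order[] then holds their ids. */
static int run_in_order(void)
{
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    order_len = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        sem_init(&turn[i], 0, i == 0);    /* only thread 0 may start */

    /* Create the threads in reverse to show creation order is irrelevant. */
    for (int i = NUM_THREADS - 1; i >= 0; i--) {
        ids[i] = i;                       /* a separate slot per thread */
        if (pthread_create(&threads[i], NULL, ordered_print, &ids[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(EXIT_FAILURE);
        }
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        sem_destroy(&turn[i]);
    return order_len;
}
```

This stays manageable only because every thread takes exactly one turn; once threads need repeated turns, the condition-variable approach tends to be easier to get right.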
The code passes the address of the same local variable to all threads. Meanwhile, this variable gets updated by the main thread.
Instead pass it by value cast to void*.
Fix:
pthread_create(&threadID[indx], NULL, Print, (void*)(intptr_t)indx) /* intptr_t needs #include <stdint.h> */
// ...
printf("%d from thread\n", (int)(intptr_t)_indx);
Now, since there is no data shared between the threads, you can remove that mutex.
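A self-contained sketch of passing the index by value, assuming the value fits in a pointer (which holds for an `int` on mainstream platforms). The `seen` array and the function names are only there to make the effect observable and are not part of the original program.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 20

static int seen[NUM_THREADS];   /* seen[i] counts how often index i arrived */

static void *print_index(void *arg)
{
    /* The value travels inside the pointer itself; nothing is dereferenced. */
    int idx = (int)(intptr_t)arg;

    printf("%d from thread\n", idx);
    seen[idx]++;                /* each thread touches only its own slot */
    return NULL;
}

/* Returns the number of threads that were started and joined. */
static int spawn_with_values(void)
{
    pthread_t ids[NUM_THREADS];
    int started = 0;

    for (int i = 0; i < NUM_THREADS; i++)
        seen[i] = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&ids[i], NULL, print_index,
                           (void *)(intptr_t)i) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);
    return started;
}
```

Going through `intptr_t` avoids the compiler warning about converting between an integer and a pointer of a different size. Every index should show up exactly once, although the lines may still print in any order.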
All the threads created in the for loop have a different value of indx. Because of the operating system scheduler, you can never be sure which thread will run. Therefore, the values are printed in an unpredictable order that depends on the scheduler. The second for-loop, running in the parent thread, starts immediately after the child threads have been created. Again, the scheduler decides which thread runs next.
Every major operating system uses interrupts to preempt running threads. While the parent thread runs its for-loop, an interrupt may occur and hand control to the scheduler, which then decides which thread to run. Therefore, the numbers printed by the parent's for-loop are interleaved unpredictably with the children's output, because all threads run "concurrently".
Joining a thread means waiting for that thread to finish. If you want to make sure you print all numbers in the parent for-loop in chronological order, without letting a child thread interrupt it, then move the for-loop so that it comes after the thread joining.
I'm currently learning about concurrency at my University. In this context I have to implement the reader/writer problem in C, and I think I'm on the right track.
My thought on the problem is, that we need two locks rd_lock and wr_lock. When a writer thread wants to change our global variable, it tries to grab both locks, writes to the global and unlocks. When a reader wants to read the global, it checks if wr_lock is currently locked, and then reads the value, however one of the reader threads should grab the rd_lock, but the other readers should not care if rd_lock is locked.
I am not allowed to use the implementation already in the pthread library.
typedef struct counter_st {
int value;
} counter_t;
counter_t * counter;
pthread_t * threads;
int readers_tnum;
int writers_tnum;
pthread_mutex_t rd_lock;
pthread_mutex_t wr_lock;
void * reader_thread() {
while(true) {
pthread_mutex_lock(&rd_lock);
pthread_mutex_trylock(&wr_lock);
int value = counter->value;
printf("%d\n", value);
pthread_mutex_unlock(&rd_lock);
}
}
void * writer_thread() {
while(true) {
pthread_mutex_lock(&wr_lock);
pthread_mutex_lock(&rd_lock);
// TODO: increment value of counter->value here.
counter->value += 1;
pthread_mutex_unlock(&rd_lock);
pthread_mutex_unlock(&wr_lock);
}
}
int main(int argc, char **args) {
readers_tnum = atoi(args[1]);
writers_tnum = atoi(args[2]);
pthread_mutex_init(&rd_lock, 0);
pthread_mutex_init(&wr_lock, 0);
// Initialize our global variable
counter = malloc(sizeof(counter_t));
counter->value = 0;
pthread_t * threads = malloc((readers_tnum + writers_tnum) * sizeof(pthread_t));
int started_threads = 0;
// Spawn reader threads
for(int i = 0; i < readers_tnum; i++) {
int code = pthread_create(&threads[started_threads], NULL, reader_thread, NULL);
if (code != 0) {
printf("Could not spawn a thread.");
exit(-1);
} else {
started_threads++;
}
}
// Spawn writer threads
for(int i = 0; i < writers_tnum; i++) {
int code = pthread_create(&threads[started_threads], NULL, writer_thread, NULL);
if (code != 0) {
printf("Could not spawn a thread.");
exit(-1);
} else {
started_threads++;
}
}
}
Currently it just prints a lot of zeroes, when run with 1 reader and 1 writer, which means, that it never actually executes the code in the writer thread. I know that this is not going to work as intended with multiple readers, however I don't understand what is wrong, when running it with one of each.
Don't think of the locks as "reader lock" and "writer lock".
Because you need to allow multiple concurrent readers, readers cannot hold a mutex. (If they do, they are serialized; only one can hold a mutex at the same time.) They can take one for a short duration (before they begin the access, and after they end the access), to update state, but that's it.
Split the timeline for having a rwlock into three parts: "grab rwlock", "do work", "release rwlock".
For example, you could use one mutex, one condition variable, and a counter. The counter holds the number of active readers. The condition variable is signaled on by the last reader, and by writers just before they release the mutex, to wake up a waiting writer. The mutex protects both, and is held by writers for the whole duration of their write operation.
So, in pseudocode, you might have
Function rwlock_rdlock:
    Take mutex
    Increment counter
    Release mutex
End Function
Function rwlock_rdunlock:
    Take mutex
    Decrement counter
    If counter == 0, Then:
        Signal_on cond
    End If
    Release mutex
End Function
Function rwlock_wrlock:
    Take mutex
    While counter > 0:
        Wait_on cond
    End While
End Function
Function rwlock_unlock:
    Signal_on cond
    Release mutex
End Function
Remember that whenever you wait on a condition variable, the mutex is atomically released for the duration of the wait, and automatically grabbed when the thread wakes up. So, for waiting on a condition variable, a thread will have the mutex both before and after the wait, but not during the wait itself.
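Translated into C with pthreads, the pseudocode might look like the sketch below. The type name `my_rwlock` and the small demo at the end are my own additions, `rwlock_wrunlock` plays the role of the pseudocode's `rwlock_unlock`, and error checking is left out for brevity. Like the pseudocode, it makes no attempt to stop a steady stream of readers from starving writers.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;  /* protects readers; held by a writer while it writes */
    pthread_cond_t cond;    /* signalled when a writer might be able to proceed */
    int readers;            /* number of active readers */
} my_rwlock;

static void rwlock_init(my_rwlock *rw)
{
    pthread_mutex_init(&rw->mutex, NULL);
    pthread_cond_init(&rw->cond, NULL);
    rw->readers = 0;
}

static void rwlock_rdlock(my_rwlock *rw)
{
    pthread_mutex_lock(&rw->mutex);      /* blocks while a writer is active */
    rw->readers++;
    pthread_mutex_unlock(&rw->mutex);
}

static void rwlock_rdunlock(my_rwlock *rw)
{
    pthread_mutex_lock(&rw->mutex);
    if (--rw->readers == 0)
        pthread_cond_signal(&rw->cond);  /* last reader out wakes a writer */
    pthread_mutex_unlock(&rw->mutex);
}

static void rwlock_wrlock(my_rwlock *rw)
{
    pthread_mutex_lock(&rw->mutex);
    while (rw->readers > 0)
        pthread_cond_wait(&rw->cond, &rw->mutex);
    /* Returns with the mutex held: that is what excludes everyone else. */
}

static void rwlock_wrunlock(my_rwlock *rw)
{
    pthread_cond_signal(&rw->cond);      /* let another waiting writer re-test */
    pthread_mutex_unlock(&rw->mutex);
}

/* --- hypothetical demo: 2 writers and 2 readers share one counter --- */
static my_rwlock demo_lock;
static int demo_value;

static void *demo_writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        rwlock_wrlock(&demo_lock);
        demo_value++;
        rwlock_wrunlock(&demo_lock);
    }
    return NULL;
}

static void *demo_reader(void *arg)
{
    int *last = arg;                     /* per-reader slot for the value read */
    for (int i = 0; i < 1000; i++) {
        rwlock_rdlock(&demo_lock);
        *last = demo_value;
        rwlock_rdunlock(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter value (2000 if both writers ran). */
static int run_rwlock_demo(void)
{
    pthread_t t[4];
    int ok[4];
    int last[2] = {0, 0};

    demo_value = 0;
    rwlock_init(&demo_lock);
    for (int i = 0; i < 4; i++)
        ok[i] = pthread_create(&t[i], NULL, i < 2 ? demo_writer : demo_reader,
                               &last[i % 2]) == 0;
    for (int i = 0; i < 4; i++)
        if (ok[i])
            pthread_join(t[i], NULL);
    return demo_value;
}
```

Readers hold the mutex only while adjusting the counter, so any number of them can be inside their read sections at once. A writer keeps the mutex for its whole write, which is what keeps new readers out.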
Now, the above approach is not the only one you might implement.
In particular, you might note that in the above scheme, there is a different "unlock" operation you must use, depending on whether you took a read or a write lock on the rwlock. In the POSIX pthread_rwlock_t implementation, there is just one pthread_rwlock_unlock().
Whatever scheme you design, it is important to check that it works correctly in all situations: a lone read-locker, a lone write-locker, several read-lockers, several write-lockers, a lone write-locker and one read-locker, a lone write-locker and several read-lockers, several write-lockers and a lone read-locker, and several read- and write-lockers.
For example, let's consider the case when there are several active readers, and a writer wants to write-lock the rwlock.
The writer grabs the mutex. It then notices that the counter is nonzero, so it starts waiting on the condition variable. When the last reader -- note how the order of the readers exiting does not matter, since a simple counter is used! -- unlocks its readlock on the rwlock, it signals on the condition variable, which wakes up the writer. The writer then grabs the mutex, sees the counter is zero, and proceeds to do its work. During that time, the mutex is held by the writer, so all new readers will block, until the writer releases the mutex. Because the writer will also signal on the condition variable when it releases the mutex, it is a race between other waiting writers and waiting readers, who gets to go next.
I'm creating a multi-thread program in C and I've some troubles.
There you have the function which create the threads :
void create_thread(t_game_data *game_data)
{
size_t i;
t_args *args = malloc(sizeof(t_args));
i = 0;
args->game = game_data;
while (i < 10)
{
args->initialized = 0;
args->id = i;
printf("%zu CREATION\n", i);//TODO: Debug
pthread_create(&game_data->object[i]->thread_id, NULL, &do_action, args);
i++;
while (args->initialized == 0)
continue;
}
}
Here you have my args struct :
typedef struct s_args
{
t_game_data *game;
size_t id;
int initialized;
} t_args;
And finally, the function which handle the created threads
void *do_action(void *v_args)
{
t_args *args;
t_game_data *game;
size_t id;
args = v_args;
game = args->game;
id = args->id;
args->initialized = 1;
[...]
return (NULL);
}
The problem is :
The main thread will create new thread faster than the new thread can init his variables :
args = v_args;
game = args->game;
id = args->id;
So, sometime, 2 different threads will get same id from args->id.
To solve that, I use an variable initialized as a bool so make "sleep" the main thread during the new thread's initialization.
But I think that is really sinful.
Maybe there is a way to do that with a mutex? But I heard it wasn't "legal" to unlock a mutex which does not belong his thread.
Thanks for your answers!
The easiest solution to this problem would be to pass a different t_args object to each new thread. To do that, move the allocation inside the loop, and make each thread responsible for freeing its own argument struct:
void create_thread(t_game_data *game_data) {
for (size_t i = 0; i < 10; i++) {
t_args *args = malloc(sizeof(t_args));
if (!args) {
/* ... handle allocation error ... */
} else {
args->game = game_data;
args->id = i;
printf("%zu CREATION\n", i);//TODO: Debug
if (pthread_create(&game_data->object[i]->thread_id, NULL,
&do_action, args) != 0) {
// thread creation failed
free(args);
// ...
}
}
}
}
// ...
void *do_action(void *v_args) {
t_args *args = v_args;
t_game_data *game = args->game;
size_t id = args->id;
free(v_args);
args = v_args = NULL;
// ...
return (NULL);
}
But you also write:
To solve that, I use an variable initialized as a bool so make "sleep"
the main thread during the new thread's initialization.
But I think that is really sinful. Maybe there is a way to do that
with a mutex? But I heard it wasn't "legal" to unlock a mutex which
does not belong his thread.
If you nevertheless wanted one thread to wait for another thread to modify some data, as your original strategy requires, then you must employ either atomic data or some kind of synchronization object. Your code otherwise contains a data race, and therefore has undefined behavior. In practice, you cannot assume in your original code that the main thread will ever see the new thread's write to args->initialized. "Sinful" is an unusual way to describe that, but maybe appropriate if you belong to the Church of the Holy C.
You could solve that problem with a mutex by protecting just the test of args->initialized in your loop -- not the whole loop -- with a mutex, and protecting the threads' write to that object with the same mutex, but that's nasty and ugly. It would be far better to wait for the new thread to increment a semaphore (not a busy wait, and the initialized variable is replaced by the semaphore), or to set up and wait on a condition variable (again not a busy wait, but the initialized variable or an equivalent is still needed).
The problem is that in create_thread you are passing the same t_args structure to each thread. In reality, you probably want to create your own t_args structure for each thread.
What's happening is your 1st thread is starting up with the args passed to it. Before that thread can run do_action the loop is modifying the args structure. Since thread2 and thread1 will both be pointing to the same args structure, when they run do_action they will have the same id.
Oh, and don't forget to not leak your memory
Your solution should work in theory except for a couple of major problems.
The main thread will sit spinning in the while loop that checks the flag using CPU cycles (this is the least bad problem and can be OK if you know it won't have to wait long)
Compiler optimisers can get trigger happy with respect to empty loops. They are also often unaware that a variable may get modified by other threads and can make bad decisions on that basis.
On multi-core systems, the main thread may never see the change to args->initialized, or at least not until much later, if the change is in the cache of another core that hasn't been flushed back to main memory yet.
You can use John Bollinger's solution that mallocs a new set of args for each thread and it is fine. The only down side is a malloc/free pair for each thread creation. The alternative is to use "proper" synchronisation functions like Santosh suggests. I would probably consider this except I would use a semaphore as being a bit simpler than a condition variable.
A semaphore is an atomic counter with two operations: wait and signal. The wait operation decrements the semaphore if its value is greater than zero, otherwise it puts the thread into a wait state. The signal operation increments the semaphore, unless there are threads waiting on it. If there are, it wakes one of the threads up.
The solution is therefore to create a semaphore with an initial value of 0, start the thread and wait on the semaphore. The thread then signals the semaphore when it is finished with the initialisation.
#include <semaphore.h>
// other stuff
sem_t semaphore;
void create_thread(t_game_data *game_data)
{
size_t i;
t_args args;
i = 0;
if (sem_init(&semaphore, 0, 0) == -1) // third arg is initial value
{
// error
}
args.game = game_data;
while (i < 10)
{
args.id = i;
printf("%zu CREATION\n", i);//TODO: Debug
pthread_create(&game_data->object[i]->thread_id, NULL, &do_action, &args);
sem_wait(&semaphore);
i++;
}
sem_destroy(&semaphore);
}
void *do_action(void *v_args) {
t_args *args = v_args;
t_game_data *game = args->game;
size_t id = args->id;
sem_post(&semaphore);
// Rest of the thread work
return NULL;
}
Because of the synchronisation, I can reuse the args struct safely, in fact, I don't even need to malloc it - it's small so I declare it local to the function.
Having said all that, I still think John Bollinger's solution is better for this use-case but it's useful to be aware of semaphores generally.
You should consider using a condition variable for this. You can find an example here: http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_28.html
Basically wait in the main thread and signal in your other threads.
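A minimal sketch of that wait/signal handshake, under some assumptions: the struct `handshake_args` is a simplified stand-in for the question's `t_args` (the `seen` pointer is hypothetical and only records what each thread received), and the `initialized` flag is kept as the predicate so that a signal sent before the main thread starts waiting is not lost.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 10

typedef struct {
    size_t id;              /* value handed to the new thread */
    int initialized;        /* predicate: set once the thread has copied id */
    pthread_mutex_t lock;   /* protects initialized */
    pthread_cond_t ready;   /* signalled when initialized becomes 1 */
    size_t *seen;           /* hypothetical: where threads record their id */
} handshake_args;

static void *do_action(void *v_args)
{
    handshake_args *args = v_args;
    size_t id = args->id;           /* copy everything before signalling */
    size_t *seen = args->seen;

    pthread_mutex_lock(&args->lock);
    args->initialized = 1;
    pthread_cond_signal(&args->ready);
    pthread_mutex_unlock(&args->lock);
    /* args must not be touched from here on: the creator may reuse it. */

    seen[id] = id + 1;              /* stand-in for the real thread work */
    return NULL;
}

/* Returns the number of threads started; seen[] needs NUM_THREADS slots. */
static int create_threads(size_t *seen)
{
    pthread_t threads[NUM_THREADS];
    handshake_args args = { .seen = seen };
    int started = 0;

    pthread_mutex_init(&args.lock, NULL);
    pthread_cond_init(&args.ready, NULL);
    for (size_t i = 0; i < NUM_THREADS; i++) {
        args.id = i;
        args.initialized = 0;
        if (pthread_create(&threads[i], NULL, do_action, &args) != 0)
            break;
        started++;

        /* Sleep (no busy wait) until the new thread has taken its copy. */
        pthread_mutex_lock(&args.lock);
        while (!args.initialized)
            pthread_cond_wait(&args.ready, &args.lock);
        pthread_mutex_unlock(&args.lock);
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&args.ready);
    pthread_mutex_destroy(&args.lock);
    return started;
}
```

The `while` loop around `pthread_cond_wait()` matters: it covers both spurious wakeups and the case where the new thread signals before the creator has started waiting.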
I'd like to create a multithreaded program in C (Linux) with:
An infinite loop with an infinite number of tasks
One thread per task
A limit on the total number of threads: if, for instance, the total number of threads is more than MAX_THREADS_NUMBER, sleep() until the total drops below MAX_THREADS_NUMBER, then continue.
Summary: I need to run an infinite number of tasks (one task per thread) and I'd like to know how to implement that using pthreads in C.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THREADS 50
pthread_t thread[MAX_THREADS];
int counter;
pthread_mutex_t lock;
void* doSomeThing(void *arg)
{
pthread_mutex_lock(&lock);
counter += 1;
printf("Job %d started\n", counter);
pthread_mutex_unlock(&lock);
return NULL;
}
int main(void)
{
int i = 0;
int ret;
if (pthread_mutex_init(&lock, NULL) != 0)
{
printf("\n mutex init failed\n");
return 1;
}
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
// Wait all threads to finish
for (i = 0; i < MAX_THREADS; i++) {
pthread_join(thread[i], NULL);
}
pthread_mutex_destroy(&lock);
return 0;
}
How to make this loop infinite?
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
I need something like this:
while (1) {
if (thread_number > MAX_THREADS_NUMBER)
sleep(1);
ret = pthread_create(...);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
Your current program is based on a simple dispatch design: the initial thread creates worker threads, assigning each one a task to perform. Your question is, how you make this work for any number of tasks, any number of worker threads. The answer is, you don't: your chosen design makes such a modification basically impossible.
Even if I were to answer your stated questions, it would not make the program behave the way you'd like. It might work after a fashion, but it'd be like a bicycle with square wheels: not very practical, nor robust -- not even fun after you stop laughing at how silly it looks.
The solution, as I wrote in a comment to the original question, is to change the underlying design: from a simple dispatch to a thread pool approach.
Implementing a thread pool requires two things: First, is to change your viewpoint from starting a thread and having it perform a task, to each thread in the "pool" grabbing a task to perform, and returning to the "pool" after they have performed it. Understanding this is the hard part. The second part, implementing a way for each thread to grab a new task, is simple: this typically centers around a data structure, protected with locks of some sort. The exact data structure does depend on what the actual work to do is, however.
Let's assume you wanted to parallelize the calculation of the Mandelbrot set (or rather, the escape time, or the number of iterations needed before a point can be ruled to be outside the set; the Wikipedia page contains pseudocode for exactly this). This is one of the "embarrassingly parallel" problems; those where the sub-problems (here, each point) can be solved without any dependencies.
Here's how I'd do the core of the thread pool in this case. First, the escape time or iteration count needs to be recorded for each point. Let's say we use an unsigned int for this. We also need the number of points (it is a 2D array), a way to calculate the complex number that corresponds to each point, plus some way to know which points have either been computed, or are being computed. Plus mutually exclusive locking, so that only one thread will modify the data structure at once. So:
typedef struct {
int x_size, y_size;
size_t stride;
double r_0, i_0;
double r_dx, i_dx;
double r_dy, i_dy;
unsigned int *iterations;
sem_t done;
pthread_mutex_t mutex;
int x, y;
} fractal_work;
When an instance of fractal_work is constructed, x_size and y_size are the number of columns and rows in the iterations map. The number of iterations (or escape time) for point x,y is stored in iterations[x+y*stride]. The real part of the complex coordinate for that point is r_0 + x*r_dx + y*r_dy, and imaginary part is i_0 + x*i_dx + y*i_dy (which allows you to scale and rotate the fractal freely).
When a thread grabs the next available point, it first locks the mutex, and copies the x and y values (for itself to work on). Then, it increases x. If x >= x_size, it resets x to zero, and increases y. Finally, it unlocks the mutex, and calculates the escape time for that point.
However, if x == 0 && y >= y_size, the thread posts on the done semaphore and exits, letting the initial thread know that the fractal is complete. (The initial thread just needs to call sem_wait() once for each thread it created.)
The thread worker function is then something like the following:
void *fractal_worker(void *data)
{
fractal_work *const work = (fractal_work *)data;
int x, y;
while (1) {
pthread_mutex_lock(&(work->mutex));
/* No more work to do? */
if (work->x == 0 && work->y >= work->y_size) {
sem_post(&(work->done));
pthread_mutex_unlock(&(work->mutex));
return NULL;
}
/* Grab this task (point), advance to next. */
x = work->x;
y = work->y;
if (++(work->x) >= work->x_size) {
work->x = 0;
++(work->y);
}
pthread_mutex_unlock(&(work->mutex));
/* z.r = work->r_0 + (double)x * work->r_dx + (double)y * work->r_dy;
z.i = work->i_0 + (double)x * work->i_dx + (double)y * work->i_dy;
TODO: implement the fractal iteration,
and count the iterations (say, n)
save the escape time (number of iterations)
in the work->iterations array; e.g.
work->iterations[(size_t)x + work->stride*(size_t)y] = n;
*/
}
}
The program first creates the fractal_work data structure for the worker threads to work on, initializes it, then creates some number of threads giving each thread the address of that fractal_work structure. It can then call fractal_worker() itself too, to "join the thread pool". (This pool automatically "drains", i.e. threads will return/exit, when all points in the fractal are done.)
Finally, the main thread calls sem_wait() on the done semaphore, as many times as it created worker threads, to ensure all the work is done.
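Put together, a complete version might look like the sketch below. It repeats the structure and worker from above with the TODO filled in by a plain Mandelbrot iteration. The `max_iterations` field, the `escape_time()` helper, `fractal_run()` and the `MAX_HELPERS` cap are my own additions for the example. The helpers are also joined before the mutex is destroyed, so that none of them can still be inside `pthread_mutex_unlock()` at that point.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define MAX_HELPERS 16

typedef struct {
    int x_size, y_size;
    size_t stride;
    double r_0, i_0;
    double r_dx, i_dx;
    double r_dy, i_dy;
    unsigned int *iterations;
    unsigned int max_iterations;   /* assumption: iteration cap, not in the original */
    sem_t done;
    pthread_mutex_t mutex;
    int x, y;
} fractal_work;

/* Mandelbrot escape time for c = cr + ci*i, capped at `limit`. */
static unsigned int escape_time(double cr, double ci, unsigned int limit)
{
    double zr = 0.0, zi = 0.0;
    unsigned int n = 0;

    while (n < limit && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

static void *fractal_worker(void *data)
{
    fractal_work *const work = data;

    for (;;) {
        pthread_mutex_lock(&work->mutex);
        if (work->x == 0 && work->y >= work->y_size) {   /* no more work */
            sem_post(&work->done);
            pthread_mutex_unlock(&work->mutex);
            return NULL;
        }
        int x = work->x, y = work->y;                    /* grab this point */
        if (++work->x >= work->x_size) {
            work->x = 0;
            ++work->y;
        }
        pthread_mutex_unlock(&work->mutex);

        /* The actual computation runs with the mutex unlocked. */
        double cr = work->r_0 + x * work->r_dx + y * work->r_dy;
        double ci = work->i_0 + x * work->i_dx + y * work->i_dy;
        work->iterations[(size_t)x + work->stride * (size_t)y] =
            escape_time(cr, ci, work->max_iterations);
    }
}

/* Fills work->iterations using `helpers` extra threads plus the caller.
   Assumes x_size and y_size are positive. Returns 0 on success, -1 if
   the mutex or semaphore cannot be set up. */
static int fractal_run(fractal_work *work, int helpers)
{
    pthread_t ids[MAX_HELPERS];
    int started = 0;

    if (helpers > MAX_HELPERS)
        helpers = MAX_HELPERS;
    work->x = work->y = 0;
    if (pthread_mutex_init(&work->mutex, NULL) != 0)
        return -1;
    if (sem_init(&work->done, 0, 0) != 0) {
        pthread_mutex_destroy(&work->mutex);
        return -1;
    }
    for (int i = 0; i < helpers; i++) {
        if (pthread_create(&ids[started], NULL, fractal_worker, work) != 0)
            break;                 /* carry on with the threads we have */
        started++;
    }
    fractal_worker(work);          /* the caller joins the pool too */

    /* One post per worker, including the caller's own. */
    for (int i = 0; i < started + 1; i++)
        sem_wait(&work->done);
    /* Join before destroying, so no helper is still using the mutex. */
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);
    sem_destroy(&work->done);
    pthread_mutex_destroy(&work->mutex);
    return 0;
}
```

Because the caller works through the same loop as the helpers, the map still gets filled in if no helper thread can be created at all.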
The exact fields in the fractal_work structure above do not matter. However, it is at the very core of the thread pool. Typically, there is at least one mutex or rwlock protecting the work details, so that each worker thread gets unique work details, as well as some kind of flag or condition variable or semaphore to let the original thread know that the task is now complete.
In a multithreaded server, there is usually only one instance of the structure (or variables) describing the work queue. It may even contain things like minimum and maximum number of threads, allowing the worker threads to control their own number to dynamically respond to the amount of work available. This sounds magical, but is actually simple to implement: when a thread has completed its work, or is woken up in the pool with no work, and is holding the mutex, it first examines how many queued jobs there are, and what the current number of worker threads is. If there are more than the minimum number of threads, and no work to do, the thread reduces the number of threads, and exits. If there are less than the maximum number of threads, and there is a lot of work to do, the thread first creates a new thread, then grabs the next task to work on. (Yes, any thread can create new threads into the process. They are all on equal footing, too.)
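To make the work-queue idea a little more concrete, here is a sketch of a fixed-size pool fed from a bounded queue. It deliberately leaves out the dynamic growing and shrinking described above, and every name in it (`work_queue`, `queue_submit`, `queue_close`, `pool_worker`, `run_pool_demo`) is invented for the illustration. Error checks on the init calls are omitted.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64
#define POOL_SIZE 4

typedef void (*job_fn)(void *);

typedef struct {
    job_fn fn[QUEUE_CAP];        /* ring buffer of pending jobs */
    void *arg[QUEUE_CAP];
    size_t head, count;
    int closed;                  /* set once no more jobs will arrive */
    pthread_mutex_t mutex;       /* protects every field above */
    pthread_cond_t not_empty;    /* a job was queued, or the queue closed */
    pthread_cond_t not_full;     /* a slot was freed */
} work_queue;

static void queue_init(work_queue *q)
{
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full, which throttles the producer. */
static void queue_submit(work_queue *q, job_fn fn, void *arg)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->mutex);
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    q->fn[tail] = fn;
    q->arg[tail] = arg;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

static void queue_close(work_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);   /* release every idle worker */
    pthread_mutex_unlock(&q->mutex);
}

static void *pool_worker(void *data)
{
    work_queue *q = data;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->mutex);
        if (q->count == 0) {                 /* closed and drained */
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        job_fn fn = q->fn[q->head];
        void *arg = q->arg[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->mutex);
        fn(arg);                             /* run the job unlocked */
    }
}

static pthread_mutex_t tally_lock = PTHREAD_MUTEX_INITIALIZER;
static int tally;

static void count_job(void *arg)             /* trivial stand-in for a task */
{
    (void)arg;
    pthread_mutex_lock(&tally_lock);
    tally++;
    pthread_mutex_unlock(&tally_lock);
}

static int run_pool_demo(int jobs)
{
    work_queue q;
    pthread_t workers[POOL_SIZE];
    int started = 0;
    tally = 0;
    queue_init(&q);
    for (int i = 0; i < POOL_SIZE; i++)
        if (pthread_create(&workers[started], NULL, pool_worker, &q) == 0)
            started++;
    for (int i = 0; started > 0 && i < jobs; i++)
        queue_submit(&q, count_job, NULL);   /* may block: that is the limit */
    queue_close(&q);
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    return tally;
}
```

The number of threads never exceeds `POOL_SIZE`, however many jobs are submitted, and the producer sleeps on `not_full` instead of polling with `sleep()`. For an endless stream of tasks, the submitting loop would simply never call `queue_close()`.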
A lot of the code in a practical multithreaded application that uses one or more thread pools to do work is some sort of bookkeeping. Thread pool approaches concentrate very much on the data, and on the computation that needs to be performed on the data. I'm sure there are much better examples of thread pools out there somewhere; the hard part is to think of a good task for the application to perform, as the data structures are so task-dependent, and many computations are so simple that parallelizing them makes no sense (since creating new threads does have a small computational cost, it'd be silly to waste time creating threads when a single thread does the same work in the same or less time).
Many tasks that benefit from parallelization, on the other hand, require information to be shared between workers, and that requires a lot of thinking to implement correctly. (For example, although solutions exist for parallelizing molecular dynamics simulations efficiently, most simulators still calculate and exchange data in separate steps, rather than at the same time. It's just that hard to do right, you see.)
All this means that you cannot expect to be able to write the code unless you understand the concept. Indeed, truly understanding the concepts is the hard part: writing the code is comparatively easy.
Even in the above example, there are certain tripping points: Does the order of posting the semaphore and releasing the mutex matter? (Well, it depends on what the thread that is waiting for the fractal to complete does -- and indeed, if it is waiting yet.) If it was a condition variable instead of a semaphore, it would be essential that the thread that is interested in the fractal completion is waiting on the condition variable, otherwise it would miss the signal/broadcast. (This is also why I used a semaphore.)