This question is a follow-up to a question which happened to be more complex than I had initially thought would be. In a program I'm writing the main thread takes care of GUI-driven data updates, a producer thread (with a number of sub-threads, because the producer task is "embarrassingly parallel") writes to the circular buffer, while the real-time consumer thread reads from it. Original platform of development was OSX/Darwin, but I'd like to make the code more portable, UNIX source compatible. Everything can easily be written in POSIX, except for the following OSX-specific GCD command for which I can't estimate a POSIX equivalent, if any. It launches the producer thread, from which its subthreads are being launched programmatically, depending on the number of available logical CPU cores:
void dproducer (bool on, int cpuNum, uData* data)
{
if (on == true)
{
data->state = starting;
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{
producerSum(on, cpuNum, data);
});
}
return;
}
This is the block diagram of the program:
For clarity I'm also adding the producerSum code. It's an infinite loop whose execution can either wait for the consumer thread to do the work, or get interrupted by changing data->state, which has global scope:
void producerSum(bool on, int cpuNum, uData* data)
{
int rc;
pthread_t threads[cpuNum]; //subthreads
tData thread_args[cpuNum];
void* resulT;
static float frames [4096];
while(on){
memset(frames, 0, 4096*sizeof(float));
if( (fbuffW = (float**)calloc(cpuNum + 1, sizeof(float*)))!= NULL)
for (int i=0; i<cpuNum; ++i){
fbuffW[i] = (float*)calloc(data->frames, sizeof(float));
thread_args[i].tid = i; //ord. number of thread
thread_args[i].cpuCount = cpuNum; //counter increment step
thread_args[i].data = data;
rc = pthread_create(&threads[i], NULL, producerTN, (void *) &thread_args[i]);
if(rc != 0)printf("rc = %s\n",strerror(rc));
}
for (int i=0; i<cpuNum; ++i) rc = pthread_join(threads[i], &resulT);
//each subthread writes to fbuffW[i] and results get summed
for(UInt32 samp = 0; samp < data->frames; samp++)
for(int i = 0; i < cpuNum; i++)
frames[samp] += fbuffW[i][samp];
switch (data->state) { ... } //graceful interruption, resuming and termination mechanism
{ … } //simple code for copying frames[] to the circular buffer
pthread_cond_wait (&cond, &mutex);//wait for the real-time consumer
for(int i = 0; i < cpuNum; i++) free(fbuffW[i]); free(fbuffW);
} //end while(on)
return;
}
The syncing inside the producer thread is being successfully handled by pthread_create( ) and pthread_join( ), while necessary coordination between the producer and consumer threads is being successfully handled by a variable of pthread_mutex_t and a variable of pthread_cond_t (with corresponding locking, unlocking, broadcasting and waiting commands). uData is a program defined struct (or class instance). Any direction where to look at would help indeed.
Thanks for reading this post!
A dispatch queue is just what it sounds like: a queue, as in the standard FIFO list data structure. It holds tasks. Those tasks can be represented by Objective-C Blocks as in your code or by function pointers and context pointer values. You'll presumably need to avoid Blocks if you're aiming for cross-platform compatibility. In fact, since you only ever dispatch one task, your tasks can just encapsulate the parameters (on, cpuNum, and data) and not the code (the call to producerSum()).
The queues are serviced by threads from a thread pool. GCD manages the threads and pool. At least on OS X, there's integration with the OS such that the pool's size is governed by overall system load, which you won't be able to reproduce in a cross-platform manner.
Operations on a dispatch queue are thread-safe. This includes adding tasks to them and the worker threads removing tasks from them.
You're going to have to implement all of this. It's definitely possible, but it will be a bother. In many ways, the queues and the thread pool constitute a producer-consumer architecture. Basically, your GCD-based solution was a bit of a cheat because you just used a producer-consumer API to implement your producer-consumer design. Now, you're going to have to really implement a producer-consumer design without the crutch.
There's basically no more to it than the thread-creation and POSIX condition variables you're already using.
dispatch_async() is basically just locking the mutex for the queue of tasks, adding the task to the queue, signalling the condition variable, and unlocking the mutex. Each worker thread will just wait on the condition variable and, when it wakes, lock the mutex, pop a task off the queue if there's one, unlock the mutex, and run the task if it got one. You probably also need a mechanism to signal the worker thread that it's time to gracefully terminate.
Related
This might be a dumb question, i'm very sorry if that's the case. But i'm struggling to take advantage of the multiple cores in my computer to perform multiple computations at the same time in my Quad-Core MacBook. This is not for any particular project, just a general question, since i want to learn for when i eventually do need to do this kind of things
I am aware of threads, but the seem to run in the same core, so i don't seem to gain any performance using them for compute-bound operations (They are very useful for socket based stuff tho!).
I'm also aware of processed that can be created with fork, but i'm nor sure they are guaranteed to use more CPU, or if they, like threads, just help with IO-bound operations.
Finally i'm aware of CUDA, allowing paralellism in the GPU (And i think OpenCL and Compute Shaders also allows my code to run in the CPU in parallel) but i'm currently looking for something that will allow me to take advantage of the multiple CPU cores that my computer has.
In python, i'm aware of the multiprocessing module, which seems to provide an API very similar to threads, but there i do seem to gain an edge by running multiple functions performing computations in parallel. I'm looking into how could i get this same advantage in C, but i don't seem to be able
Any help pointing me to the right direction would be very much appreciated
Note: I'm trying to achive true parallelism, not concurrency
Note 2: I'm only aware of threads and using multiple processes in C, with threads i don't seem to be able to win the performance boost i want. And i'm not very familiar with processes, but i'm still not sure if running multiple processes is guaranteed to give me the advantage i'm looking for.
A simple program to heat up your CPU (100% utilization of all available cores).
Hint: The thread starting function does not return, program exit via [CTRL + C]
#include <pthread.h>
void* func(void *arg)
{
while (1);
}
int main()
{
#define NUM_THREADS 4 //use the number of cores (if known)
pthread_t threads[NUM_THREADS];
for (int i=0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, func, NULL);
for (int i=0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
return 0;
}
Compilation:
gcc -pthread -o thread_test thread_test.c
If i start ./thread_test, all cores are at 100%.
A word to fork and pthread_create:
fork creates a new process (the current process image will be copied and executed in parallel), while pthread_create will create a new thread, sometimes called a lightweight process.
Both, processes and threads will run in 'parallel' to the parent process.
It depends, when to use a child process over a thread, e.g. a child is able to replace its process image (via exec family) and has its own address space, while threads are able to share the address space of the current parent process.
There are of course a lot more differences, for that i recommend to study the following pages:
man fork
man pthreads
I am aware of threads, but the seem to run in the same core, so i don't seem to gain any performance using them for compute-bound operations (They are very useful for socket based stuff tho!).
No, they don't. Except if you block and your threads don't block, you'll see alll of them running. Just try this (beware that this consumes all your cpu time) that starts 16 threads each counting in a busy loop for 60 s. You will see all of them running and makins your cores to fail to their knees (it runs only a minute this way, then everything ends):
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 16 /* had 16 cores, so I used this. Put here as many
* threads as cores you have. */
struct thread_data {
pthread_t thread_id; /* the thread id */
struct timespec end_time; /* time to get out of the tunnel */
int id; /* the array position of the thread */
unsigned long result; /* number of times looped */
};
void *thread_body(void *data)
{
struct thread_data *p = data;
p->result = 0UL;
clock_gettime(CLOCK_REALTIME, &p->end_time);
p->end_time.tv_sec += 60; /* 60 s. */
struct timespec now;
do {
/* just get the time */
clock_gettime(CLOCK_REALTIME, &now);
p->result++;
/* if you call printf() you will see them slowing, as there's a
* common buffer that forces all thread to serialize their outputs
*/
/* check if we are over */
} while ( now.tv_sec < p->end_time.tv_sec
|| now.tv_nsec < p->end_time.tv_nsec);
return p;
} /* thread_body */
int main()
{
struct thread_data thrd_info[N];
for (int i = 0; i < N; i++) {
struct thread_data *d = &thrd_info[i];
d->id = i;
d->result = 0;
printf("Starting thread %d\n", d->id);
int res = pthread_create(&d->thread_id,
NULL, thread_body, d);
if (res < 0) {
perror("pthread_create");
exit(EXIT_FAILURE);
}
printf("Thread %d started\n", d->id);
}
printf("All threads created, waiting for all to finish\n");
for (int i = 0; i < N; i++) {
struct thread_data *joined;
int res = pthread_join(thrd_info[i].thread_id,
(void **)&joined);
if (res < 0) {
perror("pthread_join");
exit(EXIT_FAILURE);
}
printf("PTHREAD %d ended, with value %lu\n",
joined->id, joined->result);
}
} /* main */
Linux and all multithread systems work the same, they create a new execution unit (if both don't share the virtual address space, they are both processes --not exactly so, but this explains the main difference between a process and a thread--) and the available processors are given to each thread as necessary. Threads are normally encapsulated inside processes (they share ---not in linux, if that has not changed recently--- the process id, and virtual memory) Processes run each in a separate virtual space, so they can only share things through the system resources (files, shared memory, communication sockets/pipes, etc.)
The problem with your test case (you don't show it so I have go guess) is that probably you will make all threads in a loop in which you try to print something. If you do that, probably the most time each thread is blocked trying to do I/O (to printf() something)
Stdio FILEs have the problem that they share a buffer between all threads that want to print on the same FILE, and the kernel serializes all the write(2) system calls to the same file descriptor, so if the most of the time you pass in the loop is blocked in a write, the kernel (and stdio) will end serializing all the calls to print, making it to appear that only one thread is working at a time (all the threads will become blocked by the one that is doing the I/O) This busy loop will make all the threads to run in parallel and will show you how the cpu is collapsed.
Parallelism in C can be achieved by using the fork() function. This function simulates a thread by allowing two threads to run simultaneously and share data. The first thread forks itself, and the second thread is then executed as if it was launched from main(). Forking allows multiple processes to be Run concurrently without conflicts arising.
To make sure that data is shared appropriately between the two threads, use the wait() function before accessing shared resources. Wait will block execution of the current program until all database connections are closed or all I/O has been completed, whichever comes first.
Consider the following section of a C function:
for (int i = 0; i < n; ++i) {
thread_arg *arg = (thread_arg *) malloc(sizeof(thread_arg));
arg->random_value = random_value;
arg->message = &(message[i * 10]);
if (pthread_create(NULL, NULL, thread_start, (void *) &arg)) {
perror("pthread_create");
exit(EXIT_FAILURE);
}
}
In this for loop, I create n threads which all perform a common routine with different parameters. This for loop is part of a bigger function which returns a data structure which gets modified by all threads in parallel. Thus, it is important that this bigger function won't return before all threads are done.
I was hoping to find a simpler way then giving an individual ID to all these threads and joining afterwards with pthread_join.Is there any general approach to say to a function something like "hey, don't return until all threads you've created returned"?
There are at least two other ways:
Use pthread barriers. The name barrier is used in a completely different sense than you usually hear it when talking about concurrency. Here, it's a synchronization primitive that lets each of a set of threads (waiters on it) block until all of them have reached it, then unblocks them all together. You'd first initialize the barrier in some shared location with n+1 as the count, then have both the function itself and all the n threads it created call pthread_barrier_wait before finishing. Assuming you do it this way, after returning from the wait, the n threads can no longer access the shared state; they need to immediately return.
Create the same thing (or a simplified version of it) with a condvar and mutex. Have a count, protected by a mutex, of how many of the n threads are still working. The function that created them can then do:
pthread_mutex_lock(&cnt_mtx);
while (count > 0) pthread_cond_wait(&cnt_cv, &cnt_mtx);
pthread_mutex_unlock(&cnt_mtx);
Generally, though, I'd use pthread_join here. That's what it's for.
This question already has an answer here:
Pthread_create() incorrect start routine parameter passing
(1 answer)
Closed 3 years ago.
I tried to build a program which should create threads and assign a Print function to each one of them, while the main process should use printf function directly.
Firstly, I made it without any synchronization means and expected to get a randomized output.
Later I tried to add a mutex to the Print function which was assigned to the threads and expected to get a chronological output but it seems like the mutex had no effect about the output.
Should I use a mutex on the printf function in the main process as well?
Thanks in advance
My code:
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
pthread_t threadID[20];
pthread_mutex_t lock;
void* Print(void* _num);
int main(void)
{
int num = 20, indx = 0, k = 0;
if (pthread_mutex_init(&lock, NULL))
{
perror("err pthread_mutex_init\n");
return errno;
}
for (; indx < num; ++indx)
{
if (pthread_create(&threadID[indx], NULL, Print, &indx))
{
perror("err pthread_create\n");
return errno;
}
}
for (; k < num; ++k)
{
printf("%d from main\n", k);
}
indx = 0;
for (; indx < num; ++indx)
{
if (pthread_join(threadID[indx], NULL))
{
perror("err pthread_join\n");
return errno;
}
}
pthread_mutex_destroy(&lock);
return 0;
}
void* Print(void* _indx)
{
pthread_mutex_lock(&lock);
printf("%d from thread\n", *(int*)_indx);
pthread_mutex_unlock(&lock);
return NULL;
}
All questions of program bugs notwithstanding, pthreads mutexes provide only mutual exclusion, not any guarantee of scheduling order. This is typical of mutex implementations. Similarly, pthread_create() only creates and starts threads; it does not make any guarantee about scheduling order, such as would justify an assumption that the threads reach the pthread_mutex_lock() call in the same order that they were created.
Overall, if you want to order thread activities based on some characteristic of the threads, then you have to manage that yourself. You need to maintain a sense of which thread's turn it is, and provide a mechanism sufficient to make a thread notice when it's turn arrives. In some circumstances, with some care, you can do this by using semaphores instead of mutexes. The more general solution, however, is to use a condition variable together with your mutex, and some shared variable that serves as to indicate who's turn it currently is.
The code passes the address of the same local variable to all threads. Meanwhile, this variable gets updated by the main thread.
Instead pass it by value cast to void*.
Fix:
pthread_create(&threadID[indx], NULL, Print, (void*)indx)
// ...
printf("%d from thread\n", (int)_indx);
Now, since there is no data shared between the threads, you can remove that mutex.
All the threads created in the for loop have different value of indx. Because of the operating system scheduler, you can never be sure which thread will run. Therefore, the values printed are in random order depending on the randomness of the scheduler. The second for-loop running in the parent thread will run immediately after creating the child threads. Again, the scheduler decides the order of what thread should run next.
Every OS should have an interrupt (at least the major operating systems have). When running the for-loop in the parent thread, an interrupt might happen and leaves the scheduler to make a decision of which thread to run. Therefore, the numbers being printed in the parent for-loop are printed randomly, because all threads run "concurrently".
Joining a thread means waiting for a thread. If you want to make sure you print all numbers in the parent for loop in chronological order, without letting child thread interrupt it, then relocate the for-loop section to be after the thread joining.
I'm currently learning about concurrency at my University. In this context I have to implement the reader/writer problem in C, and I think I'm on the right track.
My thought on the problem is, that we need two locks rd_lock and wr_lock. When a writer thread wants to change our global variable, it tries to grab both locks, writes to the global and unlocks. When a reader wants to read the global, it checks if wr_lock is currently locked, and then reads the value, however one of the reader threads should grab the rd_lock, but the other readers should not care if rd_lock is locked.
I am not allowed to use the implementation already in the pthread library.
typedef struct counter_st {
int value;
} counter_t;
counter_t * counter;
pthread_t * threads;
int readers_tnum;
int writers_tnum;
pthread_mutex_t rd_lock;
pthread_mutex_t wr_lock;
void * reader_thread() {
while(true) {
pthread_mutex_lock(&rd_lock);
pthread_mutex_trylock(&wr_lock);
int value = counter->value;
printf("%d\n", value);
pthread_mutex_unlock(&rd_lock);
}
}
void * writer_thread() {
while(true) {
pthread_mutex_lock(&wr_lock);
pthread_mutex_lock(&rd_lock);
// TODO: increment value of counter->value here.
counter->value += 1;
pthread_mutex_unlock(&rd_lock);
pthread_mutex_unlock(&wr_lock);
}
}
int main(int argc, char **args) {
readers_tnum = atoi(args[1]);
writers_tnum = atoi(args[2]);
pthread_mutex_init(&rd_lock, 0);
pthread_mutex_init(&wr_lock, 0);
// Initialize our global variable
counter = malloc(sizeof(counter_t));
counter->value = 0;
pthread_t * threads = malloc((readers_tnum + writers_tnum) * sizeof(pthread_t));
int started_threads = 0;
// Spawn reader threads
for(int i = 0; i < readers_tnum; i++) {
int code = pthread_create(&threads[started_threads], NULL, reader_thread, NULL);
if (code != 0) {
printf("Could not spawn a thread.");
exit(-1);
} else {
started_threads++;
}
}
// Spawn writer threads
for(int i = 0; i < writers_tnum; i++) {
int code = pthread_create(&threads[started_threads], NULL, writer_thread, NULL);
if (code != 0) {
printf("Could not spawn a thread.");
exit(-1);
} else {
started_threads++;
}
}
}
Currently it just prints a lot of zeroes, when run with 1 reader and 1 writer, which means, that it never actually executes the code in the writer thread. I know that this is not going to work as intended with multiple readers, however I don't understand what is wrong, when running it with one of each.
Don't think of the locks as "reader lock" and "writer lock".
Because you need to allow multiple concurrent readers, readers cannot hold a mutex. (If they do, they are serialized; only one can hold a mutex at the same time.) They can take one for a short duration (before they begin the access, and after they end the access), to update state, but that's it.
Split the timeline for having a rwlock into three parts: "grab rwlock", "do work", "release rwlock".
For example, you could use one mutex, one condition variable, and a counter. The counter holds the number of active readers. The condition variable is signaled on by the last reader, and by writers just before they release the mutex, to wake up a waiting writer. The mutex protects both, and is held by writers for the whole duration of their write operation.
So, in pseudocode, you might have
Function rwlock_rdlock:
Take mutex
Increment counter
Release mutex
End Function
Function rwlock_rdunlock:
Take mutex
Decrement counter
If counter == 0, Then:
Signal_on cond
End If
Release mutex
End Function
Function rwlock_wrlock:
Take mutex
While counter > 0:
Wait_on cond
End Function
Function rwlock_unlock:
Signal_on cond
Release mutex
End Function
Remember that whenever you wait on a condition variable, the mutex is atomically released for the duration of the wait, and automatically grabbed when the thread wakes up. So, for waiting on a condition variable, a thread will have the mutex both before and after the wait, but not during the wait itself.
Now, the above approach is not the only one you might implement.
In particular, you might note that in the above scheme, there is a different "unlock" operation you must use, depending on whether you took a read or a write lock on the rwlock. In POSIX pthread_rwlock_t implementation, there is just one pthread_rwlock_unlock().
Whatever scheme you design, it is important to examine it whether it works right in all situations: a lone read-locker, a lone-write-locker, several read-lockers, several-write-lockers, a lone write-locker and one read-locker, a lone write-locker and several read-lockers, several write-lockers and a lone read-locker, and several read- and write-lockers.
For example, let's consider the case when there are several active readers, and a writer wants to write-lock the rwlock.
The writer grabs the mutex. It then notices that the counter is nonzero, so it starts waiting on the condition variable. When the last reader -- note how the order of the readers exiting does not matter, since a simple counter is used! -- unlocks its readlock on the rwlock, it signals on the condition variable, which wakes up the writer. The writer then grabs the mutex, sees the counter is zero, and proceeds to do its work. During that time, the mutex is held by the writer, so all new readers will block, until the writer releases the mutex. Because the writer will also signal on the condition variable when it releases the mutex, it is a race between other waiting writers and waiting readers, who gets to go next.
I'd like to create multi-threads program in C (Linux) with:
Infinite loop with infinite number of tasks
One thread per one task
Limit the total number of threads, so if for instance total threads number is more then MAX_THREADS_NUMBER, do sleep(), until total threads number become less then MAX_THREADS_NUMBER, continue after.
Resume: I need to do infinite number of tasks(one task per one thread) and I'd like to know how to implement it using pthreads in C.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THREADS 50
pthread_t thread[MAX_THREADS];
int counter;
pthread_mutex_t lock;
void* doSomeThing(void *arg)
{
pthread_mutex_lock(&lock);
counter += 1;
printf("Job %d started\n", counter);
pthread_mutex_unlock(&lock);
return NULL;
}
int main(void)
{
int i = 0;
int ret;
if (pthread_mutex_init(&lock, NULL) != 0)
{
printf("\n mutex init failed\n");
return 1;
}
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
// Wait all threads to finish
for (i = 0; i < MAX_THREADS; i++) {
pthread_join(thread[i], NULL);
}
pthread_mutex_destroy(&lock);
return 0;
}
How to make this loop infinite?
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
I need something like this:
while (1) {
if (thread_number > MAX_THREADS_NUMBER)
sleep(1);
ret = pthread_create(...);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
Your current program is based on a simple dispatch design: the initial thread creates worker threads, assigning each one a task to perform. Your question is, how you make this work for any number of tasks, any number of worker threads. The answer is, you don't: your chosen design makes such a modification basically impossible.
Even if I were to answer your stated questions, it would not make the program behave the way you'd like. It might work after a fashion, but it'd be like a bicycle with square wheels: not very practical, nor robust -- not even fun after you stop laughing at how silly it looks.
The solution, as I wrote in a comment to the original question, is to change the underlying design: from a simple dispatch to a thread pool approach.
Implementing a thread pool requires two things: First, is to change your viewpoint from starting a thread and having it perform a task, to each thread in the "pool" grabbing a task to perform, and returning to the "pool" after they have performed it. Understanding this is the hard part. The second part, implementing a way for each thread to grab a new task, is simple: this typically centers around a data structure, protected with locks of some sort. The exact data structure does depend on what the actual work to do is, however.
Let's assume you wanted to parallelize the calculation of the Mandelbrot set (or rather, the escape time, or the number of iterations needed before a point can be ruled to be outside the set; the Wikipedia page contains pseudocode for exactly this). This is one of the "embarrassingly parallel" problems; those where the sub-problems (here, each point) can be solved without any dependencies.
Here's how I'd do the core of the thread pool in this case. First, the escape time or iteration count needs to be recorded for each point. Let's say we use an unsigned int for this. We also need the number of points (it is a 2D array), a way to calculate the complex number that corresponds to each point, plus some way to know which points have either been computed, or are being computed. Plus mutually exclusive locking, so that only one thread will modify the data structure at once. So:
typedef struct {
int x_size, y_size;
size_t stride;
double r_0, i_0;
double r_dx, i_dx;
double r_dy, i_dy;
unsigned int *iterations;
sem_t done;
pthread_mutex_t mutex;
int x, y;
} fractal_work;
When an instance of fractal_work is constructed, x_size and y_size are the number of columns and rows in the iterations map. The number of iterations (or escape time) for point x,y is stored in iterations[x+y*stride]. The real part of the complex coordinate for that point is r_0 + x*r_dx + y*r_dy, and imaginary part is i_0 + x*i_dx + y*i_dy (which allows you to scale and rotate the fractal freely).
When a thread grabs the next available point, it first locks the mutex, and copies the x and y values (for itself to work on). Then, it increases x. If x >= x_size, it resets x to zero, and increases y. Finally, it unlocks the mutex, and calculates the escape time for that point.
However, if x == 0 && y >= y_size, the thread posts on the done semaphore and exits, letting the initial thread know that the fractal is complete. (The initial thread just needs to call sem_wait() once for each thread it created.)
The thread worker function is then something like the following:
void *fractal_worker(void *data)
{
fractal_work *const work = (fractal_work *)data;
int x, y;
while (1) {
pthread_mutex_lock(&(work->mutex));
/* No more work to do? */
if (work->x == 0 && work->y >= work->y_size) {
sem_post(&(work->done));
pthread_mutex_unlock(&(work->mutex));
return NULL;
}
/* Grab this task (point), advance to next. */
x = work->x;
y = work->y;
if (++(work->x) >= work->x_size) {
work->x = 0;
++(work->y);
}
pthread_mutex_unlock(&(work->mutex));
/* z.r = work->r_0 + (double)x * work->r_dx + (double)y * work->r_dy;
z.i = work->i_0 + (double)x * work->i_dx + (double)y * work->i_dy;
TODO: implement the fractal iteration,
and count the iterations (say, n)
save the escape time (number of iterations)
in the work->iterations array; e.g.
work->iterations[(size_t)x + work->stride*(size_t)y] = n;
*/
}
}
The program first creates the fractal_work data structure for the worker threads to work on, initializes it, then creates some number of threads giving each thread the address of that fractal_work structure. It can then call fractal_worker() itself too, to "join the thread pool". (This pool automatically "drains", i.e. threads will return/exit, when all points in the fractal are done.)
Finally, the main thread calls sem_wait() on the done semaphore, as many times as it created worker threads, to ensure all the work is done.
The exact fields in the fractal_work structure above do not matter. However, it is at the very core of the thread pool. Typically, there is at least one mutex or rwlock protecting the work details, so that each worker thread gets unique work details, as well as some kind of flag or condition variable or semaphore to let the original thread know that the task is now complete.
In a multithreaded server, there is usually only one instance of the structure (or variables) describing the work queue. It may even contain things like minimum and maximum number of threads, allowing the worker threads to control their own number to dynamically respond to the amount of work available. This sounds magical, but is actually simple to implement: when a thread has completed its work, or is woken up in the pool with no work, and is holding the mutex, it first examines how many queued jobs there are, and what the current number of worker threads is. If there are more than the minimum number of threads, and no work to do, the thread reduces the number of threads, and exits. If there are less than the maximum number of threads, and there is a lot of work to do, the thread first creates a new thread, then grabs the next task to work on. (Yes, any thread can create new threads into the process. They are all on equal footing, too.)
A lot of the code in a practical multithreaded application using one or more thread pools to do work, is some sort of bookkeeping. Thread pool approaches very much concentrates on the data, and the computation needed to be performed on the data. I'm sure there are much better examples of thread pools out there somewhere; the hard part is to think of a good task for the application to perform, as the data structures are so task-dependent, and many computations are so simple that parallelizing them makes no sense (since creating new threads does have a small computational cost, it'd be silly to waste time creating threads when a single thread does the same work in the same or less time).
Many tasks that benefit from parallelization, on the other hand, require information to be shared between workers, and that requires a lot of thinking to implement correctly. (For example, although solutions exist for parallelizing molecular dynamics simulations efficiently, most simulators still calculate and exchange data in separate steps, rather than at the same time. It's just that hard to do right, you see.)
All this means that you cannot expect to be able to write the code, unless you understand the concept. Indeed, truly understanding the concepts are the hard part: writing the code is comparatively easy.
Even in the above example, there are certain tripping points: Does the order of posting the semaphore and releasing the mutex matter? (Well, it depends on what the thread that is waiting for the fractal to complete does -- and indeed, if it is waiting yet.) If it was a condition variable instead of a semaphore, it would be essential that the thread that is interested in the fractal completion is waiting on the condition variable, otherwise it would miss the signal/broadcast. (This is also why I used a semaphore.)