I'm having some difficulty understanding multithreading. Here is the situation:
I am going to select some integers from an array and store them in another array, subject to some conditions. The conditions are quite complicated; basically it's a huge set of comparisons between array[i] and all the other elements array[j] (j != i). Let's call it checkCondition().
First, I create the threads. Here is my code; note that dataPackage is a struct containing the array:
pthread_t tid[thread_number]; // one ID per thread, so that every thread can be joined
for(int i = 0; i < thread_number; i++){
    if(pthread_create(&tid[i], NULL, checkPermissible, &dataPackage) != 0){
        perror("Error occurs when creating thread!!!");
    }
}
for(int i = 0; i < thread_number; i++){
    pthread_join(tid[i], NULL);
}
Here is the content of checkPermissible():
void* checkPermissible(void* content){
    readThread *dataPackage = content;
    for(int i = 0; i < (*dataPackage).arrayLength; i++){
        // if the condition is true, insert it into result (a pointer)
        // a mutex prevents different threads from inserting a value at the same time
        insert(array[i], (*dataPackage).result);
    }
    return NULL;
}
However, this is no different from not using pthreads at all, since every thread runs the same loop over the whole array. How do I implement checkPermissible() so that I actually get the benefit of multiple threads? I'm quite confused about this.
My idea is to divide the array into noOfThread parts, one per thread. For example, with an array[20] and 4 threads:
Thread 1: compute checkCondition with array[0] to array[4]
Thread 2: compute checkCondition with array[5] to array[9]
Thread 3: compute checkCondition with array[10] to array[14]
Thread 4: compute checkCondition with array[15] to array[19]
Something like that, but I don't know how to achieve it.
First, you can pass lower and upper bounds (or start and end addresses) to each thread in your structure, as follows:
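struct readThread {
    int low;
    int hi;
    int * myarray;
};

for (int i=low;i<hi;++i)   // each thread only checks myarray[low] .. myarray[hi-1]

or

struct readThread {
    int * start;
    int * end;
};

for (int* i=start; i<end; ++i)   // each thread only checks *start .. *(end-1)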
The first one is simpler and easier to understand. Either way, the array is split so that each thread works on its own range.
There are other ways, like creating a separate copy of each part of the array for each thread.
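As a minimal sketch of the first (low/hi) variant, assume a hypothetical check_condition() that, like the question's checkCondition(), compares array[i] with every other element (here it keeps values that occur only once). Each thread writes its selections into its own slice of the output, starting at its own low index, so no mutex is needed; the main thread compacts the slices after joining. The names chunk, check_range, select_parallel and MAX_THREADS are illustrative, not from the question.

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for checkCondition(): array[i] qualifies if no
   other element is equal to it. Like the real condition, it compares
   array[i] with every array[j], j != i. */
static int check_condition(const int *array, int length, int i)
{
    for (int j = 0; j < length; j++)
        if (j != i && array[j] == array[i])
            return 0;
    return 1;
}

/* Per-thread work description (the readThread idea with low/hi bounds). */
struct chunk {
    const int *array; /* whole input array, shared read-only by all threads */
    int length;       /* length of the whole input array */
    int low, hi;      /* this thread checks indices low .. hi-1 */
    int *out;         /* this thread's private output slots: result + low */
    int count;        /* number of values this thread selected */
};

static void *check_range(void *arg)
{
    struct chunk *c = arg;
    c->count = 0;
    for (int i = c->low; i < c->hi; i++)
        if (check_condition(c->array, c->length, i))
            c->out[c->count++] = c->array[i];
    return NULL;
}

/* Splits array[0..length) into nthreads contiguous ranges, checks them in
   parallel and stores the selected values in result (capacity >= length)
   in their original order. Returns how many were selected, or -1. */
int select_parallel(const int *array, int length, int nthreads, int *result)
{
    pthread_t tid[MAX_THREADS];
    int created[MAX_THREADS];
    struct chunk chunks[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    int base = length / nthreads, extra = length % nthreads, start = 0;
    for (int t = 0; t < nthreads; t++) {
        int len = base + (t < extra ? 1 : 0); /* spread the remainder */
        chunks[t] = (struct chunk){ array, length, start, start + len,
                                    result + start, 0 };
        start += len;
        created[t] = pthread_create(&tid[t], NULL, check_range, &chunks[t]) == 0;
        if (!created[t])
            check_range(&chunks[t]); /* fall back to doing this range inline */
    }

    /* Join in order and compact the per-thread slots to the front of result.
       The write index never passes chunks[t].hi, so this never touches the
       slots of threads that may still be running. */
    int total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        for (int k = 0; k < chunks[t].count; k++)
            result[total++] = chunks[t].out[k];
    }
    return total;
}
```

For example, with {3, 1, 3, 2, 5, 1, 7} and 3 threads, select_parallel() stores 2, 5, 7 and returns 3. Because every thread only reads the shared input and writes to its own slots, the mutex around insert() becomes unnecessary.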
I'm having an issue with my code. Disclaimer: I'm new to C and trying to learn it on my own. Anyway, I'm trying to get the minimum and maximum of an array. I broke the array into 4 parts to make 4 separate arrays and then passed each of those as one of the parameters of its thread. With that said, I'm only able to get the maximum for each part of the array, not the minimum, and I don't understand why.
I think we can simplify your code, avoid all these unnecessary malloc calls, and simplify your algorithm for finding a min/max pair in an array.
Start by having a thread function that takes as input the following: an array (represented by a pointer), an index in the array at which to start searching, and an index at which to stop. Further, this function will need two output parameters: the smallest and largest integers found in that subset of the array.
Start with the parameter declaration. It is similar to your MaxMin, but has both input and output members:
struct ThreadParameters
{
    // input
    int* array;
    int start;
    int end;

    // output
    int smallest;
    int largest;
};
And then a thread function that scans from array[start] all the way up to (but not including) array[end], and puts the results of its scan into the smallest and largest members of the above struct:
void* find_min_max(void* args)
{
    struct ThreadParameters* params = (struct ThreadParameters*)args;
    int *array = params->array;
    int start = params->start;
    int end = params->end;
    int smallest = array[start];
    int largest = array[start];

    for (int i = start; i < end; i++)
    {
        if (array[i] < smallest)
        {
            smallest = array[i];
        }
        if (array[i] > largest)
        {
            largest = array[i];
        }
    }

    // write the result back to the parameter structure
    params->smallest = smallest;
    params->largest = largest;

    return NULL;
}
And while we are at it, use capital letters for your macros:
#define THREAD_COUNT 4
Now, you could keep your "4 separate arrays" design, but there's no reason to, since the thread function can scan any range of any array. So let's declare a single global array as follows:
#define ARRAY_SIZE 400
int arr[ARRAY_SIZE];
Capital-letter names are the preferred convention for macros.
fillArray becomes simpler:
void fillArray()
{
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        arr[i] = rand() % 1000 + 1;
    }
}
Now main becomes a whole lot simpler with these techniques:
We'll leverage the stack to allocate our thread parameter structures (no malloc and free).
We'll simply start 4 threads, passing each thread a pointer to a ThreadParameters struct. Since the threads won't outlive main, this is safe.
After starting the threads, we just wait for each of them to finish.
Then we scan the list of thread parameters to get the final smallest and largest.
main becomes much easier to manage:
int main()
{
    int smallest;
    int largest;

    // declare an array of threads and associated parameter instances
    pthread_t threads[THREAD_COUNT] = {0};
    struct ThreadParameters thread_parameters[THREAD_COUNT] = {0};

    // initialize the array
    fillArray();

    // smallest and largest need to be set to something
    smallest = arr[0];
    largest = arr[0];

    // start all the threads
    for (int i = 0; i < THREAD_COUNT; i++)
    {
        thread_parameters[i].array = arr;
        thread_parameters[i].start = i * (ARRAY_SIZE / THREAD_COUNT);
        thread_parameters[i].end = (i+1) * (ARRAY_SIZE / THREAD_COUNT);
        thread_parameters[i].largest = 0;
        pthread_create(&threads[i], NULL, find_min_max, &thread_parameters[i]);
    }

    // wait for all the threads to complete
    for (int i = 0; i < THREAD_COUNT; i++)
    {
        pthread_join(threads[i], NULL);
    }

    // Now aggregate the "smallest" and "largest" results from all thread runs
    for (int i = 0; i < THREAD_COUNT; i++)
    {
        if (thread_parameters[i].smallest < smallest)
        {
            smallest = thread_parameters[i].smallest;
        }
        if (thread_parameters[i].largest > largest)
        {
            largest = thread_parameters[i].largest;
        }
    }

    printf("Smallest is %d\n", smallest);
    printf("Largest is %d\n", largest);
    return 0;
}
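One caveat with the split above: start = i * (ARRAY_SIZE / THREAD_COUNT) silently skips the last ARRAY_SIZE % THREAD_COUNT elements whenever the size is not a multiple of the thread count (400 and 4 happen to divide evenly). Here is a minimal sketch of the same approach that computes the boundaries as n * t / nthreads instead, so every element is scanned; parallel_min_max, scan_range, range_result and MAX_THREADS are illustrative names.

```c
#include <pthread.h>

#define MAX_THREADS 16

struct range_result {
    const int *array; /* input: the whole array */
    int start, end;   /* input: scan array[start] .. array[end - 1] */
    int smallest;     /* output */
    int largest;      /* output */
};

static void *scan_range(void *args)
{
    struct range_result *r = args;
    r->smallest = r->largest = r->array[r->start]; /* range is never empty */
    for (int i = r->start + 1; i < r->end; i++) {
        if (r->array[i] < r->smallest) r->smallest = r->array[i];
        if (r->array[i] > r->largest)  r->largest  = r->array[i];
    }
    return NULL;
}

/* Finds the min and max of array[0 .. n-1] using up to nthreads threads.
   Boundaries are n * t / nthreads, so the n % nthreads leftover elements
   are spread over the threads instead of being skipped.
   Returns 0 on success, -1 on bad arguments or pthread_create failure. */
int parallel_min_max(const int *array, int n, int nthreads,
                     int *smallest, int *largest)
{
    pthread_t tid[MAX_THREADS];
    struct range_result parts[MAX_THREADS];
    int created = 0;
    if (n < 1 || nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (nthreads > n)
        nthreads = n; /* give every thread at least one element */

    for (int t = 0; t < nthreads; t++) {
        parts[t].array = array;
        parts[t].start = (int)((long long)n * t / nthreads);
        parts[t].end   = (int)((long long)n * (t + 1) / nthreads);
        if (pthread_create(&tid[t], NULL, scan_range, &parts[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    if (created != nthreads)
        return -1;

    /* combine the per-thread results, as main does above */
    *smallest = parts[0].smallest;
    *largest = parts[0].largest;
    for (int t = 1; t < nthreads; t++) {
        if (parts[t].smallest < *smallest) *smallest = parts[t].smallest;
        if (parts[t].largest > *largest)   *largest  = parts[t].largest;
    }
    return 0;
}
```

With n = 5 and 4 threads, for instance, the ranges are [0,1), [1,2), [2,3) and [3,5), so the fifth element is not lost.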
I'm pretty new to C, so I'm not sure where even to start digging about my problem.
I'm trying to port Python number-crunching algos to C, and since there's no GIL in C (woohoo), I can change whatever I want in memory from threads, as long as I make sure there are no races.
I did my homework on mutexes; however, I can't wrap my head around the use of mutexes in the case of continuously running threads accessing the same array over and over.
I'm using pthreads to split the workload on a big array a[N].
The number-crunching algorithm on array a[N] is additive, so I'm splitting it using an a_diff[N_THREADS][N] array: each thread writes the changes to be applied to a[N] into its own row of a_diff, and I merge them all together after each step.
I need to run the crunching on different versions of array a[N], so I pass them via the global pointer p (in the MWE, there's only one a[N]).
I'm synchronizing the threads using another global array, SYNC_THREADS[N_THREADS], and I make sure the threads quit when I need them to by setting the END_THREADS global (I know, I'm using too many globals; I don't care, the code is ~200 lines). My question is about this synchronization technique: is it safe, and what is a cleaner/better/faster way to achieve it?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N_THREADS 3
#define N 10000000
#define STEPS 3

double a[N]; // main array
double a_diff[N_THREADS][N]; // diffs array
double params[N]; // parameter used for number-crunching
double (*p)[N]; // pointer to array[N]

// structure for bounds for crunching the array
struct bounds {
    int lo;
    int hi;
    int thread_num;
};
struct bounds B[N_THREADS];

int SYNC_THREADS[N_THREADS]; // for syncing threads
int END_THREADS = 0; // signal to terminate threads

static void *crunching(void *arg) {
    // multiple threads run number-crunching operations according to assigned low/high bounds
    struct bounds *data = (struct bounds *)arg;
    int lo = (*data).lo;
    int hi = (*data).hi;
    int thread_num = (*data).thread_num;
    printf("worker %d started for bounds [%d %d] \n", thread_num, lo, hi);
    int i;
    while (END_THREADS != 1) { // END_THREADS tells threads to terminate
        if (SYNC_THREADS[thread_num] == 1) { // SYNC_THREADS allows threads to start number-crunching
            printf("worker %d working... \n", thread_num );
            for (i = lo; i <= hi; ++i) {
                a_diff[thread_num][i] += (*p)[i] * params[i]; // pretend this is an expensive operation...
            }
            SYNC_THREADS[thread_num] = 0; // thread disables itself until SYNC_THREADS is back to 1
            printf("worker %d stopped... \n", thread_num );
        }
    }
    return 0;
}

int i, j, th,s;
double joiner;

int main() {
    // pre-fill arrays
    for (i = 0; i < N; ++i) {
        a[i] = i + 0.5;
        params[i] = 0.0;
    }
    // split workload between workers
    int worker_length = N / N_THREADS;
    for (i = 0; i < N_THREADS; ++i) {
        B[i].thread_num = i;
        B[i].lo = i * worker_length;
        if (i == N_THREADS - 1) {
            B[i].hi = N - 1; // inclusive bound (the worker loop uses i <= hi); N would be one past the end
        } else {
            B[i].hi = i * worker_length + worker_length - 1;
        }
    }
    // pointer to parameters to be passed to worker
    struct bounds **data = malloc(N_THREADS * sizeof(struct bounds*));
    for (i = 0; i < N_THREADS; i++) {
        data[i] = malloc(sizeof(struct bounds));
        data[i]->lo = B[i].lo;
        data[i]->hi = B[i].hi;
        data[i]->thread_num = B[i].thread_num;
    }
    // create thread objects
    pthread_t threads[N_THREADS];
    // disallow threads to crunch numbers
    for (th = 0; th < N_THREADS; ++th) {
        SYNC_THREADS[th] = 0;
    }
    // launch workers
    for(th = 0; th < N_THREADS; th++) {
        pthread_create(&threads[th], NULL, crunching, data[th]);
    }
    // big loop of iterations
    for (s = 0; s < STEPS; ++s) {
        for (i = 0; i < N; ++i) {
            params[i] += 1.0; // adjust parameters
        }
        // zero diff array
        for (i = 0; i < N; ++i) {
            for (th = 0; th < N_THREADS; ++th) {
                a_diff[th][i] = 0.0;
            }
        }
        p = &a; // pointer to array a
        // allow threads to process numbers and wait for threads to complete
        for (th = 0; th < N_THREADS; ++th) { SYNC_THREADS[th] = 1; }
        // ...here threads started by pthread_create do calculations...
        for (th = 0; th < N_THREADS; th++) { while (SYNC_THREADS[th] != 0) {} }
        // join results from threads (number-crunching is additive)
        for (i = 0; i < N; ++i) {
            joiner = 0.0;
            for (th = 0; th < N_THREADS; ++th) {
                joiner += a_diff[th][i];
            }
            a[i] += joiner;
        }
    }
    // tell the workers to leave their loop
    END_THREADS = 1;
    // join workers
    for(th = 0; th < N_THREADS; th++) {
        pthread_join(threads[th], NULL);
    }
    return 0;
}
I see that workers don't overlap time-wise:
worker 0 started for bounds [0 3333332]
worker 1 started for bounds [3333333 6666665]
worker 2 started for bounds [6666666 10000000]
worker 0 working...
worker 1 working...
worker 2 working...
worker 2 stopped...
worker 0 stopped...
worker 1 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 0 stopped...
worker 2 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 2 stopped...
worker 0 stopped...
Process returned 0 (0x0) execution time : 1.505 s
and I make sure the workers don't get into each other's workspaces by separating them via the a_diff[thread_num][N] sub-arrays. However, I'm not sure that's always the case, or that I'm not introducing hidden races somewhere...
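At first I didn't realize what the question was :-)
So, the question is whether your SYNC_THREADS and END_THREADS synchronization mechanism is sound.
Yes!... Almost. The problem is that the threads are burning CPU while waiting. Strictly speaking, polling plain int variables that another thread writes is also a data race in C11, so the compiler is allowed to, for example, hoist the read out of the while loop.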
Condition Variables
To make threads wait for an event, you have condition variables (pthread_cond_t). These come with a few useful functions, abbreviated below as wait(), signal() and broadcast() (pthread_cond_wait(), pthread_cond_signal() and pthread_cond_broadcast()):
wait(&cond, &m) blocks the calling thread on the given condition variable (m is its companion mutex). [note 2]
signal(&cond) unblocks at least one thread waiting on the given condition variable.
broadcast(&cond) unblocks all threads waiting on the given condition variable.
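Initially you'd have all the threads waiting [note 1]:
pthread_mutex_lock(&mutex);
while(!start_threads)
    pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);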
And, when the main thread is ready:
pthread_mutex_lock(&mutex);
start_threads = 1;
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
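Put together, the waiting side and the starting side form a one-shot "start gate". Here is a minimal, self-contained sketch of it; struct gate, the gate_* functions and the small demo are illustrative names, not a standard API. The mutexes and condition variables are initialized with pthread_mutex_init()/pthread_cond_init() because they live in automatic storage, where the static initializer macros are not guaranteed to apply.

```c
#include <pthread.h>

/* A one-shot "start gate": threads block in gate_wait() until gate_open(). */
struct gate {
    pthread_mutex_t mutex; /* companion mutex protecting 'open' */
    pthread_cond_t cond;   /* where the waiting threads sleep */
    int open;              /* the predicate (start_threads in the text) */
};

void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->open = 0;
}

void gate_destroy(struct gate *g)
{
    pthread_cond_destroy(&g->cond);
    pthread_mutex_destroy(&g->mutex);
}

void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->mutex);
    while (!g->open) /* re-check after every wakeup (spurious wakeups) */
        pthread_cond_wait(&g->cond, &g->mutex);
    pthread_mutex_unlock(&g->mutex);
}

void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->open = 1;                      /* change the predicate under the lock */
    pthread_cond_broadcast(&g->cond); /* wake every waiting thread */
    pthread_mutex_unlock(&g->mutex);
}

/* Demo: workers pass the gate, then bump a shared counter. */
struct demo {
    struct gate gate;
    pthread_mutex_t lock; /* protects passed */
    int passed;
};

static void *demo_worker(void *arg)
{
    struct demo *d = arg;
    gate_wait(&d->gate);
    pthread_mutex_lock(&d->lock);
    d->passed++;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Starts nthreads (1..8) workers, checks that none gets past the closed
   gate, opens it and joins them. Returns how many passed, or -1. */
int gate_demo(int nthreads)
{
    pthread_t tid[8];
    struct demo d;
    int created = 0, early;
    if (nthreads < 1 || nthreads > 8)
        return -1;
    gate_init(&d.gate);
    pthread_mutex_init(&d.lock, NULL);
    d.passed = 0;
    while (created < nthreads &&
           pthread_create(&tid[created], NULL, demo_worker, &d) == 0)
        created++;
    pthread_mutex_lock(&d.lock);
    early = d.passed; /* the gate is still closed, so this must be 0 */
    pthread_mutex_unlock(&d.lock);
    gate_open(&d.gate);
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    gate_destroy(&d.gate);
    pthread_mutex_destroy(&d.lock);
    return (early == 0 && created == nthreads) ? d.passed : -1;
}
```

The workers sleep inside pthread_cond_wait() instead of spinning, which is the CPU saving over the SYNC_THREADS polling loop.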
If you have data dependencies between iterations, you'd want to make sure threads are executing the same iteration at any given moment.
To synchronize threads at the end of each iteration, you'll want to take a look at barriers (pthread_barrier):
pthread_barrier_init(&b, NULL, count): initializes barrier b to synchronize count threads.
pthread_barrier_wait(&b): the calling thread waits here until all count threads have reached the barrier.
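Applied to the question's loop, a pair of barriers can replace both SYNC_THREADS and the busy-wait in main: the workers sleep in one barrier until main releases a step, and main sleeps in a second barrier until every worker has finished it. Below is a minimal sketch, assuming a toy workload (step s adds a[i] * (s + 1) to every a[i]) and illustrative names (struct pool, worker, run_steps, MAX_WORKERS). The main thread takes part in both barriers, hence the count nworkers + 1. Note that pthread_barrier_* is an optional POSIX feature: available on Linux, but not on macOS.

```c
#define _POSIX_C_SOURCE 200809L /* pthread_barrier_* are POSIX (not on macOS) */
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 8

/* Shared state, loosely modelled on the question's a[], a_diff[][] and
   params[]; the names and the toy "work" are illustrative only. */
struct pool {
    pthread_barrier_t start; /* workers park here between steps */
    pthread_barrier_t done;  /* main waits here until every worker is done */
    int stop;                /* replaces END_THREADS */
    double *a;               /* array being crunched */
    double *diff;            /* nworkers rows of n partial results (a_diff) */
    int n, nworkers;
    double factor;           /* per-step parameter set by main (params) */
};

struct worker_arg { struct pool *pool; int id; };

static void *worker(void *arg)
{
    struct worker_arg *w = arg;
    struct pool *p = w->pool;
    int lo = (int)((long long)p->n * w->id / p->nworkers);
    int hi = (int)((long long)p->n * (w->id + 1) / p->nworkers);
    double *my_diff = p->diff + (size_t)w->id * p->n;

    for (;;) {
        pthread_barrier_wait(&p->start); /* blocks instead of spinning */
        if (p->stop)                     /* set by main before the barrier */
            break;
        for (int i = lo; i < hi; i++)
            my_diff[i] = p->a[i] * p->factor; /* only this worker's row */
        pthread_barrier_wait(&p->done);
    }
    return NULL;
}

/* Runs 'steps' steps; step s adds a[i] * (s + 1) to every a[i].
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int run_steps(double *a, int n, int nworkers, int steps)
{
    pthread_t tid[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    struct pool p;
    if (n < 1 || nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    p.diff = calloc((size_t)nworkers * n, sizeof *p.diff);
    if (p.diff == NULL)
        return -1;
    p.a = a; p.n = n; p.nworkers = nworkers; p.stop = 0; p.factor = 0.0;
    pthread_barrier_init(&p.start, NULL, nworkers + 1); /* +1: main joins in */
    pthread_barrier_init(&p.done, NULL, nworkers + 1);
    for (int t = 0; t < nworkers; t++) {
        args[t] = (struct worker_arg){ &p, t };
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0)
            abort(); /* a missing worker would deadlock the barriers */
    }

    for (int s = 0; s < steps; s++) {
        p.factor = s + 1.0;             /* workers are parked: safe to write */
        pthread_barrier_wait(&p.start); /* release the workers */
        pthread_barrier_wait(&p.done);  /* wait for all of them to finish */
        for (int i = 0; i < n; i++) {   /* merge the additive contributions */
            double sum = 0.0;
            for (int t = 0; t < nworkers; t++)
                sum += p.diff[(size_t)t * n + i];
            a[i] += sum;
        }
    }

    p.stop = 1;                     /* written before the barrier... */
    pthread_barrier_wait(&p.start); /* ...so every worker sees it after */
    for (int t = 0; t < nworkers; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&p.start);
    pthread_barrier_destroy(&p.done);
    free(p.diff);
    return 0;
}
```

The stop flag needs no atomics in this design: main writes it only while the workers are parked, and pthread_barrier_wait() is one of the POSIX functions that synchronize memory, so the workers read the new value after the barrier.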
Extending the functionality of barriers
Sometimes you'll want the last thread reaching a barrier to compute something (e.g. to increment the iteration counter, to compute some global value, or to check whether execution should stop). You have two alternatives:
Using pthread_barriers
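You'll essentially need to wait at a barrier twice:
int rc = pthread_barrier_wait(&b);
if(rc == PTHREAD_BARRIER_SERIAL_THREAD) { // returned to exactly one (unspecified) thread
    if(shouldStop()) stop = 1;
}
pthread_barrier_wait(&b); // second wait: no thread reads stop before it has been set
if(stop) return NULL;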
Using pthread_conds to implement our own specialized barrier
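pthread_mutex_lock(&mutex);
// all threads execute this
int my_generation = generation;
remainingThreads--;
if(remainingThreads == 0) {
    // reinitialize barrier
    remainingThreads = N;
    generation++;
    // only last thread executes this
    if(shouldStop()) stop = 1;
    pthread_cond_broadcast(&cond);
} else {
    // wait on the generation, not on remainingThreads: the last thread has
    // already reset remainingThreads to N, so testing it would wait forever
    while(my_generation == generation)
        pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);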
Note 1: Why is pthread_cond_wait() inside a while loop? It may seem a bit odd. The reason is spurious wakeups: the function may return even if no signal() or broadcast() was issued. So, to guarantee correctness, the wait is guarded by a predicate variable that is re-checked after every wakeup; a thread that wakes up before it should simply goes back into pthread_cond_wait().
From the manual:
When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it shall return zero due to spurious wakeup.
Note 2:
As noted by Michael Burr in the comments, you should hold the companion mutex whenever you modify or test the predicate (start_threads) and when you call pthread_cond_wait(). pthread_cond_wait() atomically releases the mutex while it waits and re-acquires it before returning.
PS: It's a bit late here; sorry if my text is confusing :-)
I've got the following example; let's say I want each thread to count from 0 to 9.
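void* iterate(void* arg) {
    int i = 0;
    while(i<10) {
        i++;
    }
    pthread_exit(NULL);
}

int main() {
    int j = 0;
    pthread_t tid[100];
    while(j<100) {
        pthread_create(&tid[j], NULL, iterate, NULL);
        pthread_join(tid[j], NULL);
        j++;
    }
}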
The variable i is in a critical section: it will be overwritten multiple times, and therefore the threads will fail to count. Using
int* i=(int*)calloc(1,sizeof(int));
doesn't solve the problem either. I don't want to use a mutex. What is the most common solution for this problem?
As other users have commented, there are several problems in your example:
The variable i is not shared (to be shared it would have to be, for instance, a global variable), nor is it in a critical section (it is local to each thread). To protect a critical section you need locks or transactional memory.
You don't need to create and destroy threads every iteration. Just create a number of threads at the beginning and wait for them to finish (join).
pthread_exit() is not necessary, just return from the thread function (with a value).
A counter is a bad example for threads. It requires atomic operations to avoid overwriting the updates of other threads. Actually, a multithreaded counter is a typical example of why atomic accesses are necessary (see this tutorial, for example).
I recommend starting with some tutorials, like this or this.
I also recommend frameworks like OpenMP, they simplify the semantics of multithreaded programs.
EDIT: an example with a shared counter and 4 threads:
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

void* iterate(void* arg) {
    int i = 0;
    while(i++ < 10) {
        // enter critical section
        pthread_mutex_lock(&mutex);
        ++counter;
        pthread_mutex_unlock(&mutex);
        // leave critical section
    }
    return NULL;
}

int main() {
    int j;
    pthread_t tid[NUM_THREADS];
    for(j = 0; j < NUM_THREADS; ++j)
        pthread_create(&tid[j], NULL, iterate, NULL);

    // let the threads do their magic

    for(j = 0; j < NUM_THREADS; ++j)
        pthread_join(tid[j], NULL);

    printf("%d\n", counter);
    return 0;
}
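Since the question asked for a solution without a mutex: C11 atomics (<stdatomic.h>) make each increment indivisible, which is exactly what this counter needs. Here is a minimal sketch, assuming a C11 compiler that provides <stdatomic.h> (i.e. does not define __STDC_NO_ATOMICS__); count_with_atomics is an illustrative name.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4

static atomic_int counter = 0; /* shared, but every update is atomic */

static void *iterate(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10; i++)
        atomic_fetch_add(&counter, 1); /* indivisible read-modify-write */
    return NULL;
}

/* Runs NUM_THREADS workers and returns the final count
   (10 per thread that could be created). */
int count_with_atomics(void)
{
    pthread_t tid[NUM_THREADS];
    int created = 0;
    atomic_store(&counter, 0);
    while (created < NUM_THREADS &&
           pthread_create(&tid[created], NULL, iterate, NULL) == 0)
        created++;
    for (int j = 0; j < created; j++)
        pthread_join(tid[j], NULL);
    return atomic_load(&counter);
}
```

With 4 threads adding 10 each, the result is 40 on every run. Atomics work well for a single variable like this; once several variables must change together, a mutex is usually the simpler tool.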
I'm writing a program that needs to pass a matrix from a parent process to its child (that's why I'm using fork()). I've just read this and this to solve the problem myself, but I still can't understand how to use read() and write() with the pipe I've created so far. I know these functions write series of bytes, but I'm not sure about using them with structures or dynamically allocated variables (like a matrix).
Here is the code I used to test (note the comments I put):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

// Structure definition (Matrix)
typedef struct {
    int **mat;
    int rows;
    int cols;
} Matrix;

int main() {
    // Create the pipe
    int file_desc[2];
    if (pipe(file_desc) != 0) exit(1);

    // Create two processes
    if (fork() == 0) {
        /** Instructions for the child process */
        // Read the matrix structure from the pipe
        Matrix *received = NULL;
        read(file_desc[0], received, sizeof *received);
        if (received != NULL) {
            // Print the received matrix
            int i, j;
            printf("The matrix I've just *received* from the parent is:\n");
            for (i = 0; i < received->rows; i++) {
                for (j = 0; j < received->cols; j++) printf("%d\t", received->mat[i][j]);
                printf("\n");
            }
        } else printf("received = NULL :'(\n");
    } else {
        /** Instructions for the parent process */
        /* Create a matrix dynamically.
         * In fact, in the real program I have a function to create a matrix given the
         * rows and columns, and fill it with random values, so it returns a Matrix *
         * (pointer to Matrix), but for testing purposes I've only written this
         * (also useful if I need an array of Matrix elements, for example)
         * */
        Matrix *myMatrix = calloc(1, sizeof *myMatrix);
        // Put the contents into the variable
        myMatrix->rows = 2;
        myMatrix->cols = 2;
        myMatrix->mat = calloc(myMatrix->rows, sizeof *(myMatrix->mat));
        int i, j;
        for (i = 0; i < myMatrix->rows; i++)
            (myMatrix->mat)[i] = calloc(myMatrix->cols, sizeof **(myMatrix->mat));
        // Fill the matrix with some values (testing)
        (myMatrix->mat)[0][0] = 4;
        (myMatrix->mat)[0][1] = 2;
        (myMatrix->mat)[1][0] = 1;
        (myMatrix->mat)[1][1] = 3;
        // Print the matrix
        printf("The matrix I've just filled in the parent is:\n");
        for (i = 0; i < myMatrix->rows; i++) {
            for (j = 0; j < myMatrix->cols; j++) printf("%d\t", myMatrix->mat[i][j]);
            printf("\n");
        }
        // Write the matrix structure to the pipe (here is where I have the problem!)
        write(file_desc[1], myMatrix, sizeof *myMatrix);
        // Wait for the child process to terminate
        wait(NULL);
        printf("The child process has just finished, the parent process continues.\n");
    }
    return 0;
}
In fact, I tried first with a pointer to an int and it worked. But when I run this program, I receive this output:
The matrix I've just filled in the parent is:
4 2
1 3
received = NULL :'(
The child process has just finished, the parent process continues.
And I don't know why I get the NULL -- I'm almost sure I'm using the write() instruction incorrectly. Any help about this will be appreciated =)
EDIT: I think the matrix could be converted to text, for example, and then passed as a string to the child, which would parse it and convert it back into a Matrix structure. I don't know if this approach is the best. Is there another approach besides this one?
EDIT: I tried the same code with a fixed-size array (changing int **mat; to int mat[2][2]; inside the structure declaration), but the user must be able to change the matrix size.
This is a serious problem:
Matrix *received = NULL;
read(file_desc[0], received, sizeof *received);
received is a null pointer. That read is going to try to write data to NULL, which is an invalid address. It would be much simpler to write:
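Matrix received;
read(file_desc[0], &received, sizeof received);
That fixes the invalid write, but note that it copies only the struct itself: received.mat holds an address in the parent's heap (the matrix was allocated after fork()), so it means nothing in the child. You need to send the dimensions and then the values themselves, and rebuild the rows on the receiving side.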
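Because the mat pointer only has meaning inside the process that allocated it, the pipe has to carry the dimensions and then the values. Here is a minimal sketch, assuming both ends are the same program on the same machine (as with fork()), so the raw int bytes can be sent without conversion; write_all, read_all, send_matrix, receive_matrix and free_matrix are illustrative helpers, not library functions.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    int **mat;
    int rows;
    int cols;
} Matrix;

/* read()/write() may transfer fewer bytes than requested; loop until done. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1; /* error, or the writer closed the pipe early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

void free_matrix(Matrix *m)
{
    if (m == NULL) return;
    if (m->mat != NULL)
        for (int i = 0; i < m->rows; i++)
            free(m->mat[i]);
    free(m->mat);
    free(m);
}

/* Sends the dimensions, then the values row by row (never the pointers). */
int send_matrix(int fd, const Matrix *m)
{
    if (write_all(fd, &m->rows, sizeof m->rows) != 0 ||
        write_all(fd, &m->cols, sizeof m->cols) != 0)
        return -1;
    for (int i = 0; i < m->rows; i++)
        if (write_all(fd, m->mat[i], (size_t)m->cols * sizeof **m->mat) != 0)
            return -1;
    return 0;
}

/* Reads what send_matrix() wrote and rebuilds the rows in this process's
   heap. Returns NULL on failure; release the result with free_matrix(). */
Matrix *receive_matrix(int fd)
{
    Matrix *m = calloc(1, sizeof *m);
    if (m == NULL) return NULL;
    if (read_all(fd, &m->rows, sizeof m->rows) != 0 ||
        read_all(fd, &m->cols, sizeof m->cols) != 0 ||
        m->rows < 1 || m->cols < 1 ||
        (m->mat = calloc((size_t)m->rows, sizeof *m->mat)) == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 0; i < m->rows; i++) {
        m->mat[i] = malloc((size_t)m->cols * sizeof **m->mat);
        if (m->mat[i] == NULL ||
            read_all(fd, m->mat[i], (size_t)m->cols * sizeof **m->mat) != 0) {
            free_matrix(m); /* rows not yet allocated are NULL (calloc) */
            return NULL;
        }
    }
    return m;
}
```

In the question's program, the parent would call send_matrix(file_desc[1], myMatrix) and the child receive_matrix(file_desc[0]). Converting to text, as suggested in the first EDIT, is only needed when the two ends may disagree on int size or byte order.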
I'm looking to do a matrix multiply using threads where each thread does a single multiplication and then the main thread will add up all of the results and place them in the appropriate spot in the final matrix (after the other threads have exited).
The way I am trying to do it is to create a single one-dimensional array that holds the result of each thread. Then I would go through the array, add up the results, and place them in the final matrix.
Ex: If you have the matrices:
A = [{1,4}, {2,5}, {3,6}]
B = [{8,7,6}, {5,4,3}]
Then I want an array holding [8, 20, 7, 16, 6, 12, 16 etc]
I would then loop through the array adding up every 2 numbers and placing them in my final array.
This is a HW assignment so I am not looking for exact code, but some logic on how to store the results in the array properly. I'm struggling with how to keep track of where I am in each matrix so that I don't miss any numbers.
EDIT2: Forgot to mention that there must be a single thread for every single multiplication to be done. Meaning for the example above, there will be 18 threads each doing its own calculation.
EDIT: I'm currently using this code as a base to work off of.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
    int i; /* row */
    int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {
    int i,j, count = 0;
    for(i = 0; i < M; i++) {
        for(j = 0; j < N; j++) {
            //Assign a row and column for each thread
            struct v *data = (struct v *) malloc(sizeof(struct v));
            data->i = i;
            data->j = j;
            /* Now create the thread passing it data as a parameter */
            pthread_t tid; //Thread ID
            pthread_attr_t attr; //Set of thread attributes
            //Get the default attributes
            pthread_attr_init(&attr);
            //Create the thread
            pthread_create(&tid,&attr,runner,data);
            //Make sure the parent waits for all thread to complete
            pthread_join(tid, NULL);
            count++;
        }
    }
    //Print out the resulting matrix
    for(i = 0; i < M; i++) {
        for(j = 0; j < N; j++) {
            printf("%d ", C[i][j]);
        }
        printf("\n");
    }
    return 0;
}

//The thread will begin control in this function
void *runner(void *param) {
    struct v *data = param; // the structure that holds our data
    int n, sum = 0; //the counter and sum
    //Row multiplied by column
    for(n = 0; n< K; n++){
        sum += A[data->i][n] * B[n][data->j];
    }
    //assign the sum to its coordinate
    C[data->i][data->j] = sum;
    free(data); // release the argument malloc'd by main
    //Exit the thread
    pthread_exit(0);
}
You need to store M * K * N element-wise products. The idea is presumably that the threads will all run in parallel, or at least will be able to do, so each thread needs its own distinct storage location of appropriate type. A straightforward way to do that would be to create an array with that many elements ... but of what element type?
Each thread will need to know not only where to store its result, but also which multiplication to perform. All of that information needs to be conveyed via a single argument of type void *. One would typically, then, create a structure type suitable for holding all the data needed by one thread, create an instance of that structure type for each thread, and pass pointers to those structures. Sounds like you want an array of structures, then.
The details could be worked out in a variety of ways, but the one that seems most natural to me is to give the structure members for the two factors, and a member in which to store the product. I would then have the main thread declare a 3D array of such structures (if the needed total number is smallish) or else dynamically allocate one. For example,
struct multiplication {
// written by the main thread; read by the compute thread:
int factor1;
int factor2;
// written by the compute thread; read by the main thread:
int product;
} partial_result[M][K][N];
How to write code around that is left as the exercise it is intended to be.
Not sure how many threads you would need to dispatch, and I am also not sure whether you would use join later to pick them up. I am guessing you are in C here, so I would use a thread number (not the pthread_t) as a way to track which row to process, something like:
#define NUM_THREADS 64

/*
 * struct to pass parameters to a dispatched thread
 */
typedef struct {
    int value;              /* thread number */
    char somechar[128];     /* char data passed to thread */
    unsigned long ret;
    struct foo *row;
} thread_parm_t;
Here I am guessing that each thread will pick up its row data through the pointer row, which points to some type struct foo that you define: a bunch of integers or floats, or even complex types, whatever you need to pass to the thread.
/*
 * the thread to actually crunch the row data
 */
void *thr_rowcrunch( void *parm );

pthread_t tid[NUM_THREADS]; /* POSIX array of thread IDs */
Then, in your main code, something like:
thread_parm_t *parm=NULL;
Then dispatch the threads with something like:
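for ( i = 0; i < NUM_THREADS; i++) {
    parm = malloc(sizeof(thread_parm_t));
    parm->value = i;
    strcpy(parm->somechar, char_data_to_pass );   /* must fit in 128 bytes (needs <string.h>) */
    parm->row = malloc(sizeof *parm->row);        /* row needs storage before it is filled */
    fill_in_row ( parm->row, my_row_data );       /* your own helper */
    pthread_create(&tid[i], NULL, thr_rowcrunch, (void *)parm);
}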
Then, later on:
for ( i = 0; i < NUM_THREADS; i++)
pthread_join(tid[i], NULL);
However, the real work needs to be done in thr_rowcrunch(void *parm), which receives the row data; each thread also knows its own thread number. What you do in that dispatched thread, however, I can only guess at.
Just trying to help here, not sure if this is clear.
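To make the dispatch pattern concrete without guessing at the row-crunching itself, here is a minimal sketch in which each thread just sums its row. It keeps one array of parameter structs instead of a malloc per thread, and the main thread reads each result only after pthread_join(). The type row_parm_t and the functions thr_rowsum and sum_rows_parallel are illustrative names, loosely modelled on thread_parm_t and thr_rowcrunch above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread parameters, in the spirit of thread_parm_t. */
typedef struct {
    int value;      /* thread number, used as the row index */
    const int *row; /* this thread's row of the matrix */
    int cols;       /* number of elements in the row */
    long ret;       /* result, written by the thread */
} row_parm_t;

static void *thr_rowsum(void *parm)
{
    row_parm_t *p = parm;
    long sum = 0;
    for (int j = 0; j < p->cols; j++)
        sum += p->row[j];
    p->ret = sum; /* only this thread writes this struct */
    return NULL;
}

/* Dispatches one thread per row of a rows x cols matrix stored row-major
   in data, joins them and copies each row sum into sums[value].
   Returns 0 on success, -1 on bad arguments or failure. */
int sum_rows_parallel(const int *data, int rows, int cols, long *sums)
{
    if (rows < 1 || cols < 0)
        return -1;
    pthread_t *tid = malloc((size_t)rows * sizeof *tid);
    row_parm_t *parm = malloc((size_t)rows * sizeof *parm);
    int created = 0;
    if (tid == NULL || parm == NULL) {
        free(tid);
        free(parm);
        return -1;
    }
    for (int i = 0; i < rows; i++) {
        parm[i].value = i;
        parm[i].row = data + (size_t)i * cols;
        parm[i].cols = cols;
        parm[i].ret = 0;
        if (pthread_create(&tid[i], NULL, thr_rowsum, &parm[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        sums[parm[i].value] = parm[i].ret; /* value says which row this was */
    }
    free(tid);
    free(parm);
    return created == rows ? 0 : -1;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, sums becomes {6, 15}.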