Dividing work up between threads? (pthread) [closed]

Dividing work up between threads? (pthread) [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
I was creating a program to do some maths on some numbers for a school project. Say I have 10 threads but 42 items to process, I want them to process all the items evenly and take on an even amount of jobs. I'm using the POSIX pthread library, I know it's something to do with mutex but I'm not entirely sure.
Here's a simplified example of what I'm doing, however I want to balance the work load out evenly.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
int numbers = { 1, 78, 19, 49, 14, 1, 14. 19, 57, 15, 95, 19, 591, 591 };
void* method() {
for(size_t i = 0; i < 14; i++) {
printf("%d\n", (numbers[i] * 2));
}
}
int main(int argc, char const *argv[]) {
pthread_t th[10];
for (size_t i = 0; i < 10; i++) {
pthread_create(&th[i], NULL, method, NULL);
}
return 0;
}

You want each thread to process given indices in the table. You don't have to protect the table with mutex as long as you divide work properly between threads so they won't race for the same data.
An idea:
/* this structure will wrap all thread's data */
struct work
{
size_t start, end;
pthread_t tid;
};
void* method(void*);
#define IDX_N 42 /* in this example */
int main(int argc, char const *argv[])
{
struct work w[10];
size_t idx_start, idx_end, idx_n = IDX_N / 10;
idx_start = 0;
idx_end = idx_start + idx_n;
for (size_t i = 0; i < 10; i++)
{
w[i].start = idx_start; /* starting index */
w[i].end = idx_end; /* ending index */
/* pass the information about starting and ending point for each
* thread by pointing it's argument to appropriate work struct */
pthread_create(&w[i], NULL, method, (void*)&work[i]);
idx_start = idx_end;
idx_end = (idx_end + idx_n < IDX_N ? idx_end + idx_n : IDX_N);
}
return 0;
}
void*
method(void* arg)
{
struct work *w = (struct work* arg);
/* now each thread can learn where it should start and stop
* by examining indices that were passed to it in argument */
for(size_t i = w->start; i < w->end; i++)
printf("%d\n", (numbers[i] * 2));
return NULL;
}
For a little bit more complex example you can check this and this.

If you know ahead of time (i.e., before starting the threads) how many items you need to process, you just need to partition them among the threads. For example, tell the first thread to process items 0-9, the next to process 10-19, or whatever.

Related

Create an array with increasing decimal values in C [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
Improve this question
Super noob question here, but I haven't worked much in C and I'm trying to create an array but it doesn't seem to be working. I have played around a bit in an online compilator but I just can't get it right.
What I want is an array containing 100 elements. I want the first element to be 8, the last element to be 12, and every element should increase by 0.04. So [8, 8.04, 8.08, ..... , 11.96, 12].
Can anyone help a newbie out? :)

#define NUMS 101
int main()
{
double arr[NUMS];
double start = 8.0, end = 12.0;
double gap = (end - start) / (NUMS - 1);
int i;
for (i = 0; i < NUMS; ++i)
arr[i] = start + i * gap;
}

Here is code example based on Blaze snipped. This code first fills array and then prints it out. Also as jwismar mentioned you need 101 elements.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[]) {
float a[101];
for (int i = 0; i < sizeof(a)/sizeof(a[0]); i++)
a[i] = 8.0 + (i*0.04);
for (int i = 0; i < sizeof(a)/sizeof(a[0]); i++) {
printf("%f\n",a[i]);
}
return 0;
}

BTW if you want to start from 8 and end with 12 then you need 101 elements.
#include <stdio.h>
#include <stdlib.h>
int main()
{
double arr[101];
int i;
arr[0] = 8;
for (i = 1; i < 101; i++)
arr[i] = arr[i - 1] + 0.04;
for (i = 0; i < 101; i++)
printf("%f\n",arr[i]);
}

Threading program issues [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
The problem with the following program is that the main thread ends before the other threads get a chance to display their results. Another issue is that the threads display isn't in order.
The output needs to be:
This is thread 0.
This is thread 1.
This is thread 2. etc.
code:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <time.h>
void *text(void *arg);
long code[] = { 4, 6, 3, 1, 5, 0, 2 }; // Order in which to start threads
int num = 0;
int main()
{
int i;
pthread_t tid[7];
// Initialize random number generator
time_t seconds;
time(&seconds);
srand((unsigned int) seconds);
// Create our threads
for (i = 0; i < 7; i++)
pthread_create(&tid[i], NULL, text, (void*)code[i]);
// Exit main
return 0;
}
void *text(void *arg)
{
long n = (long)arg;
int rand_sec = rand() % (3 - 1 + 1) + 1; // Random num seconds to sleep
while (num != n) {} // Busy wait used to wait for our turn
num++; // Let next thread go
sleep(rand_sec); // Sleep for random amount of time
printf("This is thread %d.\n", n);
// Exit thread
pthread_exit(0);
}
Can anyone help with using a mutex to fix the synchronization problem?

You can use pthread_join to make your main routine wait until your other threads have finished, e.g.
int main()
{
int i;
pthread_t tid[7];
// Initialize random number generator
time_t seconds;
time(&seconds);
srand((unsigned int) seconds);
// Create our threads
for (i = 0; i < 7; i++)
pthread_create(&tid[i], NULL, text, (void*)code[i]);
for(i = 0; i < 7; i++)
pthread_join(tid[i], NULL);
// Exit main
return 0;
}
Also, your code which waits for num to be the thread ID might have some issues when you start turning optimisations on. Your num variable should be declared volatile if it is shared between threads, like so:
volatile int num;
That said, busy waiting on num like so is very, very inefficient. You should consider using condition variables to notify threads when num changes, so they can check if it is their turn once, then sleep if not. Have a look at all of the pthread_cond_* functions for more information.

How to create thread blocks?

I have two questions.
First:
I need to create thread blocks gradually not more then some max value, for example 20.
For example, first 20 thread go, job is finished, only then 20 second thread go, and so on in a loop.
Total number of jobs could be much larger then total number of threads (in our example 20), but total number of threads should not be bigger then our max value (in our example 20).
Second:
Could threads be added continuously? For example, 20 threads go, one thread job is finished, we see that total number of threads is 19 but our max value is 20, so we can create one more thread, and one more thread go :)
So we don't waste a time waiting another threads job to be done and our total threads number is not bigger then our some max value (20 in our example) - sounds cool.
Conclusion:
For total speed I consider the second variant would be much faster and better, and I would be very graceful if you help me with this, but also tell how to do the first variant.
Here is me code (it's not working properly and the result is strange - some_array elements become wrong after eleven step in a loop, something like this: Thread counter = 32748):
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <math.h>
#define num_threads 5 /* total max number of threads */
#define lines 17 /* total jobs to be done */
/* args for thread start function */
typedef struct {
int *words;
} args_struct;
/* thread start function */
void *thread_create(void *args) {
args_struct *actual_args = args;
printf("Thread counter = %d\n", *actual_args->words);
free(actual_args);
}
/* main function */
int main(int argc, char argv[]) {
float block;
int i = 0;
int j = 0;
int g;
int result_code;
int *ptr[num_threads];
int some_array[lines] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
pthread_t threads[num_threads];
/* counting how many block we need */
block = ceilf(lines / (double)num_threads);
printf("blocks= %f\n", block);
/* doing ech thread block continuously */
for (g = 1; g <= block; g++) {
//for (i; i < num_threads; ++i) { i < (num_threads * g),
printf("g = %d\n", g);
for (i; i < lines; ++i) {
printf("i= %d\n", i);
/* locate memory to args */
args_struct *args = malloc(sizeof *args);
args->words = &some_array[i];
if(pthread_create(&threads[i], NULL, thread_create, args)) {
free(args);
/* goto error_handler */
}
}
/* wait for each thread to complete */
for (j; j < lines; ++j) {
printf("j= %d\n", j);
result_code = pthread_join(threads[j], (void**)&(ptr[j]));
assert(0 == result_code);
}
}
return 0;
}

Prevent False Sharing without using padding

I'm currently learning about pthreads in C and came across the issue of False Sharing. I think I understand the concept of it and I've tried experimenting a bit.
Below is a short program that I've been playing around with. Eventually I'm going to change it into a program to take a large array of ints and sum it in parallel.
#include <stdio.h>
#include <pthread.h>
#define THREADS 4
#define NUMPAD 14
struct s
{
int total; // 4 bytes
int my_num; // 4 bytes
int pad[NUMPAD]; // 4 * NUMPAD bytes
} sum_array[4];
static void *worker(void * ind) {
const int curr_ind = *(int *) ind;
for (int i = 0; i < 10; ++i) {
sum_array[curr_ind].total += sum_array[curr_ind].my_num;
}
printf("%d\n", sum_array[curr_ind].total);
return NULL;
}
int main(void) {
int args[THREADS] = { 0, 1, 2, 3 };
pthread_t thread_ids[THREADS];
for (size_t i = 0; i < THREADS; ++i) {
sum_array[i].total = 0;
sum_array[i].my_num = i + 1;
pthread_create(&thread_ids[i], NULL, worker, &args[i]);
}
for (size_t i = 0; i < THREADS; ++i) {
pthread_join(thread_ids[i], NULL);
}
}
My question is, is it possible to prevent false sharing without using padding? Here struct s has a size of 64 bytes so that each struct is on its own cache line (assuming that the cache line is 64 bytes). I'm not sure how else I can achieve parallelism without padding.
Also, if I were to sum an array of a varying size between 1000-50,000 bytes, how could I prevent false sharing? Would I be able to pad it out using a similar program? My current thoughts are to put each int from the big array, into an array of struct s and then use parallelism to sum it. However I'm not sure if this is the optimal solution.

Partition the problem: In worker(), sum into a local variable, then add the local variable to the array:
static void *worker(void * ind) {
const int curr_ind = *(int *) ind;
int localsum = 0;
for (int i = 0; i < 10; ++i) {
localsum += sum_array[curr_ind].my_num;
}
sum_array[curr_ind].total += localsum;
printf("%d\n", sum_array[curr_ind].total);
return NULL;
}
This may still have false sharing after the loop, but that is one time per thread. Thread creation overhead is much more significant than a single cache-miss. Of course, you probably want to have a loop that actually does something time-consuming, as your current code can be optimized to:
static void *worker(void * ind) {
const int curr_ind = *(int *) ind;
int localsum = 10 * sum_array[curr_ind].my_num;
sum_array[curr_ind].total += localsum;
printf("%d\n", sum_array[curr_ind].total);
return NULL;
}
The runtime of which is definitely dominated by thread creation and synchronization in printf().

Matrix Multiply with Threads (each thread does single multiply)

I'm looking to do a matrix multiply using threads where each thread does a single multiplication and then the main thread will add up all of the results and place them in the appropriate spot in the final matrix (after the other threads have exited).
The way I am trying to do it is to create a single row array that holds the results of each thread. Then I would go through the array and add + place the results in the final matrix.
Ex: If you have the matrices:
A = [{1,4}, {2,5}, {3,6}]
B = [{8,7,6}, {5,4,3}]
Then I want an array holding [8, 20, 7, 16, 6, 12, 16 etc]
I would then loop through the array adding up every 2 numbers and placing them in my final array.
This is a HW assignment so I am not looking for exact code, but some logic on how to store the results in the array properly. I'm struggling with how to keep track of where I am in each matrix so that I don't miss any numbers.
Thanks.
EDIT2: Forgot to mention that there must be a single thread for every single multiplication to be done. Meaning for the example above, there will be 18 threads each doing its own calculation.
EDIT: I'm currently using this code as a base to work off of.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10
int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];
struct v {
int i; /* row */
int j; /* column */
};
void *runner(void *param); /* the thread */
int main(int argc, char *argv[]) {
int i,j, count = 0;
for(i = 0; i < M; i++) {
for(j = 0; j < N; j++) {
//Assign a row and column for each thread
struct v *data = (struct v *) malloc(sizeof(struct v));
data->i = i;
data->j = j;
/* Now create the thread passing it data as a parameter */
pthread_t tid; //Thread ID
pthread_attr_t attr; //Set of thread attributes
//Get the default attributes
pthread_attr_init(&attr);
//Create the thread
pthread_create(&tid,&attr,runner,data);
//Make sure the parent waits for all thread to complete
pthread_join(tid, NULL);
count++;
}
}
//Print out the resulting matrix
for(i = 0; i < M; i++) {
for(j = 0; j < N; j++) {
printf("%d ", C[i][j]);
}
printf("\n");
}
}
//The thread will begin control in this function
void *runner(void *param) {
struct v *data = param; // the structure that holds our data
int n, sum = 0; //the counter and sum
//Row multiplied by column
for(n = 0; n< K; n++){
sum += A[data->i][n] * B[n][data->j];
}
//assign the sum to its coordinate
C[data->i][data->j] = sum;
//Exit the thread
pthread_exit(0);
}
Source: http://macboypro.wordpress.com/2009/05/20/matrix-multiplication-in-c-using-pthreads-on-linux/

You need to store M * K * N element-wise products. The idea is presumably that the threads will all run in parallel, or at least will be able to do, so each thread needs its own distinct storage location of appropriate type. A straightforward way to do that would be to create an array with that many elements ... but of what element type?
Each thread will need to know not only where to store its result, but also which multiplication to perform. All of that information needs to be conveyed via a single argument of type void *. One would typically, then, create a structure type suitable for holding all the data needed by one thread, create an instance of that structure type for each thread, and pass pointers to those structures. Sounds like you want an array of structures, then.
The details could be worked a variety of ways, but the one that seems most natural to me is to give the structure members for the two factors, and a member in which to store the product. I would then have the main thread declare a 3D array of such structures (if the needed total number is smallish) or else dynamically allocate one. For example,
struct multiplication {
// written by the main thread; read by the compute thread:
int factor1;
int factor2;
// written by the compute thread; read by the main thread:
int product;
} partial_result[M][K][N];
How to write code around that is left as the exercise it is intended to be.

Not sure haw many threads you would need to dispatch and I am also not sure if you would use join later to pick them up. I am guessing you are in C here so I would use the thread id as a way to track which row to process .. something like :
#define NUM_THREADS 64
/*
* struct to pass parameters to a dispatched thread
*/
typedef struct {
int value; /* thread number */
char somechar[128]; /* char data passed to thread */
unsigned long ret;
struct foo *row;
} thread_parm_t;
Where I am guessing that each thread will pick up its row data in the pointer *row which has some defined type foo. A bunch of integers or floats or even complex types. Whatever you need to pass to the thread.
/*
* the thread to actually crunch the row data
*/
void *thr_rowcrunch( void *parm );
pthread_t tid[NUM_THREADS]; /* POSIX array of thread IDs */
Then in your main code segment something like :
thread_parm_t *parm=NULL;
Then dispatch the threads with something like :
for ( i = 0; i < NUM_THREADS; i++) {
parm = malloc(sizeof(thread_parm_t));
parm->value = i;
strcpy(parm->somechar, char_data_to-pass );
fill_in_row ( parm->row, my_row_data );
pthread_create(&tid[i], NULL, thr_insert, (void *)parm);
}
Then later on :
for ( i = 0; i < NUM_THREADS; i++)
pthread_join(tid[i], NULL);
However the real work needs to be done in thr_rowcrunch( void *parm ) which receives the row data and then each thread just knows its own thread number. The guts of what you do in that dispatched thread however I can only guess at.
Just trying to help here, not sure if this is clear.