Adding two vectors using pthreads without a global sum variable - c

I am trying to calculate the sum of two vectors a and b using pthreads in C. I am given one function that computes the sum sequentially and another that is supposed to do so in parallel. My program runs, but it computes different sums when there are multiple threads. I have used a mutex around the critical section, but I still cannot see where I am going wrong. With one thread I get the correct answer, since a single thread does all the work; with more than one thread the answers are wrong. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
// type for value of vector element
typedef short value_t;
// type for vector dimension / indices
typedef long index_t;
// function type to combine two values
typedef value_t (*function_t)(const value_t x, const value_t y);
// struct to store the respective values of the vectors a,b and c
typedef struct{
index_t start;
index_t end;
value_t *arr;
value_t *brr;
value_t *crr;
value_t *part_sum;
pthread_mutex_t *mutex;
}arg_struct;
// function to combine two values
value_t add(const value_t x, const value_t y) {
return ((x+y)*(x-y)) % ((int)x+1) +27;
}
// function to initialize the vectors a,b and c
void vectorInit(index_t n, value_t a[n], value_t b[n], value_t c[n]) {
for(index_t i=0; i<n; i++) {
a[i] = (value_t)(2*i);
b[i] = (value_t)(n-i);
c[i] = 0;
}
}
// function to count the sum of two variables sequentially
value_t vectorOperation(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f) {
value_t sum = 0;
for(index_t i=0; i<n; i++) {
sum += (c[i] = f(a[i], b[i]));
}
return sum;
}
/* Thread function */
void* vector_sum(void* arg)
{
arg_struct *param = (arg_struct*)arg;
/*
for(index_t i= param->start; i<param->end; i++)
{
pthread_mutex_lock(&param->mutex);
*param->part_sum += vectorOperation(i,param->arr,param->brr,param->crr,add);
pthread_mutex_unlock(&param->mutex);
}
*/
index_t n = param->end - param->start;
pthread_mutex_lock(&(*param->mutex));
// Each thread uses the vectorOperation function to calculate the sum sequentially(Also the critical area)
*param->part_sum = *param->part_sum + vectorOperation(n,param->arr,param->brr,param->crr,add);
//*param->part_sum += vectorOperation(param->end-param->start,param->arr,param->brr,param->crr,add);
pthread_mutex_unlock(&(*param->mutex));
pthread_exit(NULL);
}
// Sum of two vectors in parallel.
value_t vectorOperationParallel(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f, int p) {
value_t sum = 0;
pthread_t threads[p];
arg_struct thread_args[p];
pthread_mutex_t mutex;
pthread_mutex_init(&mutex,NULL);
index_t div = (n+p-1)/p;
for(int i=0; i<p; i++)
{
thread_args[i].start = i*div;
thread_args[i].end = (i+1)*div;
thread_args[i].arr = a;
thread_args[i].brr = b;
thread_args[i].crr = c;
for(int j =0; j<div; j++)
{
thread_args[i].arr[j] = a[thread_args[i].start+j];
thread_args[i].brr[j] = b[thread_args[i].start+j];
thread_args[i].crr[j] = c[thread_args[i].start+j];
}
thread_args[i].part_sum = &sum;
thread_args[i].mutex = &mutex;
pthread_create(&threads[i],NULL,vector_sum, (void*)&thread_args[i]);
}
for(int i=0; i<p; i++)
{
pthread_join(threads[i],NULL);
}
return sum;
}
int main(int argc, char **argv)
{
// check for correct argument count
if (argc != 3)
{
printf ("usage: %s vector_size n_threads\n", argv[0]);
exit (EXIT_FAILURE);
}
// get arguments
// vector size
index_t n = (index_t)atol (argv[1]);
// number of threads
int p = atoi (argv[2]);
// check for plausible values
if((p < 1) || (p > 1000)) {
printf("illegal number of threads\n");
exit (EXIT_FAILURE);
}
// allocate memory
value_t *a = malloc(n * sizeof(*a));
value_t *b = malloc(n * sizeof(*b));
value_t *c = malloc(n * sizeof(*c));
if((a == NULL) || (b == NULL) || (c == NULL)) {
printf("no more memory\n");
exit(EXIT_FAILURE);
}
// initialize vectors a,b,c
vectorInit(n, a, b, c);
// work on vectors sequentially
value_t c1sum = vectorOperation(n, a, b, c, add);
// work on vectors parallel for all thread counts from 1 to p
for(int thr=1; thr<= p; thr++) {
// do operation
value_t c2sum = vectorOperationParallel(n, a, b, c, add, thr);
// check result
if(c1sum != c2sum) {
printf("!!! error: vector results are not identical !!!\nsum1=%ld, sum2=%ld\n", (long)c1sum, (long)c2sum);
return EXIT_FAILURE;
}
else
printf("The results are equal: sum1=%ld, sum2=%ld\n",(long)c1sum, (long)c2sum);
}
return EXIT_SUCCESS;
}

I am not certain, but these look like the problems.
First, the variable names are horrible.
Then, as n.m. commented:
pthread_mutex_init in a loop is probably a bad idea
You calculate index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
and later you use (i + 1) * div to distribute the elements. Because div is rounded up, the last thread may try to access more elements than there are available.
Example with 13 elements and 5 threads:
index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
//(13 + 5 - 1) / 5 = 3
thread_args[i].end = (i + 1) * div; // for the last i ( = 4)
//(4 + 1) * 3 = 15
As soon as you access an index >= 13 you read past the end of the arrays (undefined behaviour).
Then you make a copy of parts of your original array (I would assume this is slower than just passing a reference to the original). Note that thread_args[i].arr is the same pointer as a, so this loop does not copy into separate storage: it overwrites the first div elements of the original arrays, and every thread then sums those same first div elements.
You don't seem to use the result array *thread_args[i].crr at all.
You only need the mutex for the shared sum. Each thread could work on its own range of the original arrays through plain pointers without any mutex, because every addition is self-contained and does not access memory used by another addition.
To calculate the sum of all values you could use the return value of each thread instead of a pointer to one value shared by all of them. This way no mutex is needed at all, and it would likely be faster.
I am not sure if I found everything, but this may help you improve this a good bit.
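To make the points above concrete, here is a minimal sketch, not the asker's assignment code. It assumes plain int vectors and ordinary addition instead of value_t and the add function, and the names slice_t, slice_sum and parallel_sum are made up for illustration. Each thread gets a clamped index range into the original arrays and writes its partial sum into its own slot, so no mutex and no global sum variable are needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread job; the names are illustrative, not from the question. */
typedef struct {
    const int *a;
    const int *b;
    int *c;
    size_t start;  /* first index this thread handles */
    size_t end;    /* one past the last index, clamped to n */
    long partial;  /* written only by the owning thread */
} slice_t;

static void *slice_sum(void *arg) {
    slice_t *s = arg;
    long sum = 0;
    for (size_t i = s->start; i < s->end; i++) {
        s->c[i] = s->a[i] + s->b[i];
        sum += s->c[i];
    }
    s->partial = sum;  /* no mutex: no other thread touches this slot */
    return NULL;
}

/* Returns 0 on success and stores the total in *out;
   returns -1 if a thread could not be started. */
int parallel_sum(size_t n, const int *a, const int *b, int *c, int p, long *out) {
    assert(p > 0 && out != NULL);
    pthread_t threads[p];
    slice_t slices[p];
    size_t chunk = (n + (size_t)p - 1) / (size_t)p;  /* rounded up */
    int started = 0;
    for (int t = 0; t < p; t++) {
        size_t start = (size_t)t * chunk;
        size_t end = start + chunk;
        if (start > n) start = n;  /* clamp so no thread runs past n */
        if (end > n) end = n;
        slices[t] = (slice_t){ a, b, c, start, end, 0 };
        if (pthread_create(&threads[t], NULL, slice_sum, &slices[t]) != 0)
            break;
        started++;
    }
    long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += slices[t].partial;  /* combined after join, by one thread */
    }
    *out = total;
    return started == p ? 0 : -1;
}
```

With 13 elements and 5 threads the last slice is clamped to the range 12 to 13 instead of 12 to 15, which is exactly the out-of-bounds case described above.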

Related

I am trying to pass 2 sums from a subroutine back to the main function in C

I am currently trying to take a sum from two different subroutine and pass it back to the main function, but every time I do this, it just comes up with a zero value and I am unsure why. I have tried putting my print statements in the main function and just doing calculations in the subroutines and that still didn't work, so I know that my variables aren't returning right and my sum is an actual number. How do I pass my variable sum back to my main function correctly?
Here is my code:
#include<stdio.h>
int X[2000];
int Y[2000];
int main()
{
FILE*fpdata1= NULL;
FILE*fpdata2 = NULL;
fpdata1=fopen("DataSet1.txt","r");
fpdata2=fopen("DataSet2.txt","r");
if(fpdata1==NULL || fpdata2 == NULL)
{
printf("file couldn't be found");
}
int i=0;
while(i<2000)
{
fscanf(fpdata1,"%d!",&X[i]);
fscanf(fpdata2,"%d!",&Y[i]);
// printf("This is X: %d\n",X[i]);
// printf("This is Y: %d\n",Y[i]);
i++;
}
fclose(fpdata1);
fclose(fpdata2);
avgX(X);
avgY(Y);
float sum;
float sumY;
float totalsum;
float totalavg;
totalsum= sum + sumY;
totalavg= totalsum/4000;
printf("Sum X: %f\n\n",sum);
printf("Total sum: %f\n\n",totalsum);
printf("The total average is: %0.3f\n\n",totalavg);
return 0;
}
int avgX(int X[])
{
int i=0;
float averageX;
float sum;
sum = 0;
while (i<2000)
{
sum += X[i];
i++;
}
averageX = sum/2000;
printf("Sum of X: %f\n\n",sum);
printf("The sum of Data Set 1 is: %0.3f\n\n",averageX);
return(sum);
}
int avgY(int Y[])
{
int i=0;
float averageY;
float sumY;
sumY = 0;
while (i<2000)
{
sumY += Y[i];
i++;
}
averageY = sumY/2000;
printf("Sum of Y: %f\n\n",sumY);
printf("The sum of Data Set 2 is: %0.3f\n\n",averageY);
return (sumY);
}
Firstly, it would appear you are expecting the lines
avgX(X);
avgY(Y);
to somehow update the sum and sumY variables in the main function. This is a fundamental misunderstanding of how memory is accessed.
Local variable declarations with the same identifier are not shared between functions. They can be accessed only from within the function in which they are declared (and only for the duration of the function call).
In this example, the apples variables in each of the functions have absolutely no correlation to one another. Expecting this program to print 15 is wrong. This program has undefined behavior because foo and bar read values from uninitialized variables.
void foo(void) {
int apples;
/* This is undefined behaviour,
* as apples was never initialized. Do not do this. */
apples += 5;
}
void bar(void) {
int apples;
/* This is undefined behaviour,
* as apples was never initialized. Do not do this. */
printf("%d\n", apples);
}
int main(void) {
int apples = 10;
foo();
bar();
return 0;
}
Instead of this, you'll want to utilize the arguments and return values of your functions. In this example, in main we pass the value of apples as an argument to foo, which adds 5 to this value and returns the result. We assign this return value, overwriting our previous value.
int foo(int val) {
return val + 5;
}
void bar(int val) {
printf("%d\n", val);
}
int main(void) {
int apples = 10;
apples = foo(apples);
bar(apples);
return 0;
}
Again, note that the val parameters do not refer to some "shared variable"; they are local to foo and bar individually.
As for the specifics of your program:
The functions avgX and avgY do the exact same thing, just with different identifiers.
It would be better to write a more generic summation function with an additional length parameter so that you are not hard-coding data sizes everywhere.
int sum_ints(int *values, size_t length) {
int result = 0;
for (size_t i = 0; i < length; i++)
result += values[i];
return result;
}
You can then easily write averaging logic utilizing this function.
You do check that your file pointers are not invalid, which is good, but you don't halt the program or otherwise remedy the issue.
It is potentially naive to assume a file will always contain exactly 2000 entries. You can use the return value of fscanf, which is the number of conversions that took place, to test if you've failed to read data. It's also used to signify errors (EOF is returned if input ends or fails before the first conversion).
Though the fact that global variables are zeroed-out saves you from potentially operating on unpopulated data (in the event the files contain fewer than 2000 entries), it would be best to avoid global variables when there is an alternative option.
It might be better to separate the reading of files to its own function, so that failures can be handled per-file, and reading limits can be untethered.
int main(void) or int main(int argc, char **argv) are the correct, valid signatures for main.
With all that said, here is a substantially refactored version of your code. Note that an implicit conversion takes place when we assign the integer return value of sum_ints to our floating point variables.
#include <stdio.h>
#include <stdlib.h>
#define DATA_SIZE 2000
int sum_ints(int *values, size_t length) {
int result = 0;
for (size_t i = 0; i < length; i++)
result += values[i];
return result;
}
size_t read_int_file(int *dest, size_t sz, const char *fname) {
FILE *file;
size_t i;
if ((file = fopen(fname, "r")) == NULL) {
fprintf(stderr, "Critical: Failed to open file: %s\n", fname);
exit(EXIT_FAILURE);
}
for (i = 0; i < sz; i++)
if (fscanf(file, "%d!", dest + i) != 1)
break;
fclose(file);
return i;
}
int main(void) {
int data_x[DATA_SIZE] = { 0 },
data_y[DATA_SIZE] = { 0 };
size_t data_x_len = read_int_file(data_x, DATA_SIZE, "DataSet1.txt");
size_t data_y_len = read_int_file(data_y, DATA_SIZE, "DataSet2.txt");
float sum_x = sum_ints(data_x, data_x_len),
sum_y = sum_ints(data_y, data_y_len);
float total_sum = sum_x + sum_y;
float total_average = total_sum / (data_x_len + data_y_len);
printf("Sums: [X = %.2f] [Y = %.2f] [Total = %.2f]\n"
"The total average is: %0.3f\n",
sum_x, sum_y, total_sum,
total_average);
}
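As a follow-up to the remark about writing averaging logic on top of the summation function, here is a small sketch. sum_ints is repeated so the snippet stands alone; average_ints is a hypothetical helper name, and returning 0.0 for an empty array is an assumption made here, not something from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the sum_ints function shown in the answer. */
int sum_ints(int *values, size_t length) {
    int result = 0;
    for (size_t i = 0; i < length; i++)
        result += values[i];
    return result;
}

/* Hypothetical helper: returns 0.0 for an empty array
   instead of dividing by zero. */
double average_ints(int *values, size_t length) {
    assert(values != NULL || length == 0);
    if (length == 0)
        return 0.0;
    /* Convert before dividing so the fractional part is kept. */
    return (double)sum_ints(values, length) / (double)length;
}
```

The caller receives the result through the return value, for example double avg = average_ints(data_x, data_x_len); nothing is shared through same-named local variables.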

C - pointing members of array of struct into another array ( no duplicate struct data just point to it )

I have two identical arrays of structs, one in reverse order.
The problem is that I don't want to duplicate the same data in the two arrays. I would like a reversed array whose elements point to the elements of the first array, so that editing a struct member through either the first array or the reversed array takes effect in both.
you can view the source and run it online here https://onlinegdb.com/SJbepdWxS
#include <stdio.h>
typedef struct point{
int id;
float x,y,z;
} point;
void printPoints(point *pts,int len){
int i = 0;
while (pts !=NULL && i < len){
printf("id %d x %f y%f z %f\n",pts->id,pts->x,pts->y,pts->z);
pts++;
i++;
}
}
void translatePoints(point *pts,int len,float t){
int i = 0;
while (pts !=NULL && i < len){
pts->x = pts->x + t;
pts->y = pts->y + t;
pts->z = pts->z + t;
pts++;
i++;
}
}
void reversePoints(point *pts, int len, point *rev){
int i = 0;
int j = len;
while (i < len){
j=len-i-1;
rev[j]=pts[i];
i++;
}
}
int main()
{
int i;
int t1=200;
int t2=300;
int len=3;
point points[len];
point rev_points[len];
for(i=0; i<len ; i++){
points[i].id=i;
points[i].x=10+i;
points[i].y=20+i;
points[i].z=30+i;
}
//point * pts = points;
printf("\nprint points \n\n");
printPoints(points,len);
printf("\ntranslate points %d...\n\n",t1);
translatePoints(points,len,t1);
printf("\nprint points\n\n");
printf("\nreverse points to rev_points\n");
reversePoints(points,len,rev_points);
printf("\nprint rev_points \n\n");
printPoints(rev_points,len);
printf("\ntranslate rev_points %d...\n\n",t2);
translatePoints(rev_points,len,t2);
printf("\nprint rev_points\n\n");
printPoints(rev_points,len);
printf("\nprint points\n\n");
printPoints(points,len);
return 0;
}
I expect the struct values of both arrays to change when I change a value in one of the two arrays.
But when I change struct values in the first array, the second array does not change, and the other way around.
One way to look at this is a set of points and two permutations on the set. This sets up a points array, which is used as a set, and forward_points and reverse_points as arrays of pointers to the point array that we are going to use as permutations.
#include <stdio.h>
struct Point {
int id;
float x,y,z;
};
/* Print a point. */
static void printPoint(struct Point *point) {
printf("id %d x %f y%f z %f\n",point->id,point->x,point->y,point->z);
}
/* These print out an array of pointers to point. */
static void printPointsRef(struct Point **ref, int len) {
struct Point **end = ref + len;
while(ref < end) printPoint(*(ref++));
}
/* This translates all the `pts` up to `len` by `(1,1,1)*t`. */
static void translatePoints(struct Point *pts, int len, float t) {
struct Point *end = pts + len;
while(pts < end) {
pts->x = pts->x + t;
pts->y = pts->y + t;
pts->z = pts->z + t;
pts++;
}
}
/* Helper function to `main`. */
static void printPoints(struct Point **forward_points,
struct Point **reverse_points, int len) {
printf("print points\nprint points forward:\n");
printPointsRef(forward_points,len);
printf("print points reverse:\n");
printPointsRef(reverse_points,len);
printf("\n");
}
int main(void)
{
const int len = 3;
/* This is the actual points structure. */
struct Point points[len];
/* These are arrays of pointers to points; they are
permutations of `points`. */
struct Point *forward_points[len], *reverse_points[len];
int i;
const int t1=200;
for(i=0; i<len; i++) {
/* Initialise element `i` of `points`. */
points[i].id=i;
points[i].x=10+i;
points[i].y=20+i;
points[i].z=30+i;
/* Initialise element `i` of `forward_points`
to point to `points[i]`, and `reverse_points`
to point the other way (it doesn't matter that
the later points are not initialised yet; they
will be before anything is read through them.) */
forward_points[i] = &points[i];
reverse_points[i] = &points[len - 1 - i];
}
printPoints(forward_points, reverse_points, len);
/* Translation is a vector space operation and doesn't
care about order; we just do it on the original points. */
printf("translate points %d...\n\n",t1);
translatePoints(points,len,t1);
printPoints(forward_points, reverse_points, len);
return 0;
}
Of course, there are no integrity constraints on the pointers; nothing stops one from pointing at anything, null, the same elements, or anything else.
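One use of the pointer-array idea is keeping several differently ordered views of the same points without copying them. The sketch below sorts an array of pointers by x with the standard qsort; sort_view_by_x and compare_by_x are hypothetical names, and only the pointers are reordered while the underlying structs stay where they are.

```c
#include <assert.h>
#include <stdlib.h>

struct Point {
    int id;
    float x, y, z;
};

/* qsort hands the comparator pointers to the array elements,
   and each element is itself a pointer to a Point. */
static int compare_by_x(const void *lhs, const void *rhs) {
    const struct Point *a = *(struct Point *const *)lhs;
    const struct Point *b = *(struct Point *const *)rhs;
    return (a->x > b->x) - (a->x < b->x);
}

/* Reorders only the pointers in `view`; the Points themselves never move. */
void sort_view_by_x(struct Point **view, size_t len) {
    assert(view != NULL || len == 0);
    if (len < 2)
        return;
    qsort(view, len, sizeof *view, compare_by_x);
}
```

Writing through any view (for example view[0]->x = 42) changes the single underlying struct, so every other view sees the new value.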
I added another struct with a single member that is a pointer
typedef struct ptr_point{
point * p;
} ptr_point;
I edited the function reversePoints
void reversePoints(point *pts, int len, ptr_point *rev){
// This function is used only to test pointers
int i = 0;
int j = len;
while (i < len){
j=len-i-1;
rev[j].p = &pts[i];
i++;
}
}
and added another function to print ptr_points
void printPtrPoints(ptr_point *pts,int len){
int i = 0;
while (i < len){
printf("id %d x %f y%f z %f\n",pts->p->id,pts->p->x,pts->p->y,pts->p->z);
pts++;
i++;
}
}
and declared the second array as a ptr_point array
ptr_point rev_points[len];
In conclusion: the data in the second array is no longer replicated; each element points to a struct in the first array.
The need to avoid replicating data arises with millions of coordinate points: keeping several copies, each sorted for example by x, y, z and so on, occupies a lot of memory and is hard to manage.
This fix, however, forces me to go through an extra pointer (pts->p->x instead of pts->x) to read or set values.
I don't know if this is the best solution, but it has solved the problem of not duplicating the data.
you can run the source with fixes here: https://onlinegdb.com/SknP_i-eS
Thank you all for the advice.

Return two values with pop function from priority queue

I have a priority queue whose pop function returns just int y, but I need it to return both int x and int y. I found that I can use a struct (struct point) to return two values from a function, but I can't figure out how to implement it (changing int out to a struct and using it in main).
Structs:
typedef struct { int x; int y; int pri; } q_elem_t;
typedef struct { q_elem_t *buf; int n, alloc; } pri_queue_t, *pri_queue;
struct point{int PointX; int PointY;};
Pop function:
int priq_pop(pri_queue q, int *pri)
{
int out;
if (q->n == 1) return 0;
q_elem_t *b = q->buf;
out = b[1].y;
if (pri) *pri = b[1].pri;
/* pull last item to top, then down heap. */
--q->n;
int n = 1, m;
while ((m = n * 2) < q->n) {
if (m + 1 < q->n && b[m].pri > b[m + 1].pri) m++;
if (b[q->n].pri <= b[m].pri) break;
b[n] = b[m];
n = m;
}
b[n] = b[q->n];
if (q->n < q->alloc / 2 && q->n >= 16)
q->buf = realloc(q->buf, (q->alloc /= 2) * sizeof(b[0]));
return out;
}
Use in main():
/* pop them and print one by one */
int c;
while ((c = priq_pop(q, &p)))
printf("%d: %d\n", p, c);
I'm just starting with C, so I will be grateful for any help.
You can declare your structures like so:
typedef struct queue_element_struct { // It's good practice to name your structs
int x,y;
int pri;
} queue_element_t;
typedef struct priority_queue_struct {
queue_element_t *buf;
int n, alloc;
} pri_queue_t, *pri_queue; // pri_queue is a typedef for pri_queue_t *
Then change your function to return a pointer to a queue_element_t structure
queue_element_t * priq_pop(pri_queue q, int *pri)
Change
int out;
if (q->n == 1) return 0;
q_elem_t *b = q->buf;
out = b[1].y;
To
// Nothing to pop: return NULL before allocating anything
if (q->n == 1) return NULL;
queue_element_t *b = q->buf;
// Create new pointer to queue_element_t structure
// that will be returned by this function; the caller must free() it
queue_element_t *out = malloc(sizeof *out);
if (!out) {
// Could not allocate
return NULL;
}
// Set data from queue
out->x = b[1].x;
out->y = b[1].y;
I don't know exactly what your function does, but that is how you return a pointer to a structure in C. The caller is responsible for calling free on the returned pointer.
You said you're just starting with C, so I recommend:
“Code Complete” book by Steve McConnell. It is very useful to comment your code (no matter how small)
properly name your variables: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Variable_Names
learn about pointers. All you can read about them, read it.
You could make your queue data of type struct point
Structs:
typedef struct point{int PointX; int PointY;} q_data;
typedef struct { q_data d; int pri; } q_elem_t;
typedef struct { q_elem_t *buf; int n, alloc; } pri_queue_t, *pri_queue;
Pop function:
q_data priq_pop(pri_queue q, int *pri)
{
q_data out = {0,0};
if (q->n == 1) return out;
q_elem_t *b = q->buf;
out = b[1].d;
if (pri) *pri = b[1].pri;
/* pull last item to top, then down heap. */
--q->n;
int n = 1, m;
while ((m = n * 2) < q->n) {
if (m + 1 < q->n && b[m].pri > b[m + 1].pri) m++;
if (b[q->n].pri <= b[m].pri) break;
b[n] = b[m];
n = m;
}
b[n] = b[q->n];
if (q->n < q->alloc / 2 && q->n >= 16)
q->buf = realloc(q->buf, (q->alloc /= 2) * sizeof(b[0]));
return out;
}
Use in main():
/* pop them and print one by one */
q_data c;
while (q->n > 1) { /* a struct value cannot be tested in a loop condition */
c = priq_pop(q, &p);
printf("%d: %d, %d\n", p, c.PointX, c.PointY);
}
Something like this should do the trick. I didn't test it though, so there might be errors.
good luck!
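Another common way to get two values out of a pop is to return a status and write the popped data through an out-parameter, which also makes the "queue is empty" case easy to test in a loop. The sketch below is only an illustration of that pattern: small_queue, sq_push and sq_pop are hypothetical names, and it uses a fixed-size buffer with a linear scan for the lowest pri instead of the heap from the question.

```c
#include <assert.h>
#include <stddef.h>

struct point { int PointX; int PointY; };
typedef struct { struct point d; int pri; } q_elem_t;

/* Hypothetical fixed-capacity queue; a linear scan replaces the heap
   to keep the sketch short. */
typedef struct { q_elem_t buf[16]; size_t n; } small_queue;

/* Returns 1 on success, 0 if the queue is full. */
int sq_push(small_queue *q, struct point d, int pri) {
    assert(q != NULL);
    if (q->n == sizeof q->buf / sizeof q->buf[0])
        return 0;
    q->buf[q->n].d = d;
    q->buf[q->n].pri = pri;
    q->n++;
    return 1;
}

/* Returns 1 and fills *out (and *pri, if not NULL) with the element that has
   the lowest pri value; returns 0 if the queue is empty. */
int sq_pop(small_queue *q, struct point *out, int *pri) {
    assert(q != NULL && out != NULL);
    if (q->n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < q->n; i++)
        if (q->buf[i].pri < q->buf[best].pri)
            best = i;
    *out = q->buf[best].d;
    if (pri)
        *pri = q->buf[best].pri;
    q->buf[best] = q->buf[--q->n];  /* fill the hole with the last element */
    return 1;
}
```

In main this reads as while (sq_pop(&q, &c, &p)) printf("%d: %d, %d\n", p, c.PointX, c.PointY); with struct point c and int p declared beforehand.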
In C++ you would use a vector or something similar to store an array of structs. Unfortunately, in C you can't fall back on this.
Why not use an array, though? Your queue could be an array of q_elem_t:
q_elem_t my_array[100];
For more about making an array of structs see here: How do you make an array of structs in C?
The only thing with an array is that you need to either pick an arbitrary fixed size (i.e. array[100]) or manage the memory of the array dynamically. If you are starting out it might just be best to declare an array of size 100.
To me it looks like the confusion comes from the lack of a data structure. An array is a good starting point, but if you want to learn more check out linked lists and things like that.
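For the "dynamically control the memory" option, a growable array is usually built on realloc. This is a rough sketch under a few assumptions: elem_vec, elem_vec_push and elem_vec_free are hypothetical names, the capacity starts at 4 and doubles, and the struct must be zero-initialised before first use.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x; int y; int pri; } q_elem_t;

/* Hypothetical growable array of q_elem_t; start from elem_vec v = {0}. */
typedef struct { q_elem_t *items; size_t len; size_t cap; } elem_vec;

/* Appends e, growing the buffer when it is full.
   Returns 1 on success, 0 if memory could not be allocated. */
int elem_vec_push(elem_vec *v, q_elem_t e) {
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* Keep the old pointer until realloc succeeds, so nothing leaks. */
        q_elem_t *grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return 0;
        v->items = grown;
        v->cap = new_cap;
    }
    v->items[v->len++] = e;
    return 1;
}

/* Releases the buffer and resets the vector to its empty state. */
void elem_vec_free(elem_vec *v) {
    assert(v != NULL);
    free(v->items);
    v->items = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Assigning the result of realloc to a temporary first matters: writing it straight back into v->items would lose the only pointer to the old block if the call failed.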

Multithreading in C, get the average of 4 arrays

I'm new to multithreading and had my first lesson yesterday. So I've written a program to get the average of 4 big arrays: each array gets its own thread, and main waits for all the threads and gives the average of the 4 arrays. This is possible because each thread gives the average of one array. The arrays just come from a header file containing float arrays.
It compiles but gives me a segmentation fault and I don't see why.
#include "gemiddelde.h"
#include <stdlib.h>
#include <stdio.h>
float *gemiddelde(void *arg)
{
float *a;
int i;
a = (float *)arg;
float * som;
for( i = 0; i < 100000; i++)
*som += a[i];
*som = *som / 100000;
return som;
}
int main()
{
pthread_t t1,t2,t3,t4;
float * som1, * som2, * som3, * som4, *result;
pthread_create(&t1,NULL,gemiddelde,a1);
pthread_create(&t2,NULL,gemiddelde,a2);
pthread_create(&t3,NULL,gemiddelde,a3);
pthread_create(&t4,NULL,gemiddelde,a4);
pthread_join(t1,som1);
pthread_join(t2,som2);
pthread_join(t3,som3);
pthread_join(t4,som4);
usleep(1);
*result = *som1 + *som2 + *som3 + *som4;
printf("Gemiddelde is: %f ", *result);
return 0;
}
Can someone help me?
Kind regards,
In
*result = *som1 + *som2 + *som3 + *som4;
result is used uninitialized. Make it a plain float instead of a pointer.
From your current code, the segfault occurs because the som* pointers (and som inside the thread function) aren't initialized: they do not point to any valid memory.
Your code is very problematic, because the thread code requires memory to store the result, and as it stands your code is plain wrong because it doesn't have any memory and just dereferences an uninitialized pointer. But even allocating memory inside the thread is not a great idea, because it's not clear who is responsible for it and who will clean it up. So it's much better to allocate all your required memory in the main function. First some boilerplate to set up the thread argument data:
typedef struct thread_arg_type_
{
float * data;
size_t len;
float retval;
} thread_arg_type;
thread_arg_type * create_thread_arg(size_t n)
{
thread_arg_type * result = malloc(sizeof(thread_arg_type));
if (!result) return NULL;
float * const p = malloc(n * sizeof(float));
if (!p)
{
free(result);
return NULL;
}
result->len = n;
result->data = p;
return result;
}
void free_thread_arg(thread_arg_type * r)
{
if (r) free(r->data);
free(r);
}
Now here's how we use it:
int main()
{
thread_arg_type * arg;
pthread_t t;
arg = create_thread_arg(array1_size);
pthread_create(&t, NULL, getmiddle, arg);
// ...
pthread_join(t, NULL);
printf("The result is: %f.\n", arg->retval);
free_thread_arg(arg);
}
And finally we must adapt getmiddle:
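void * getmiddle(void * varg)
{
// pthread_create expects a void *(*)(void *) start routine, so convert here
thread_arg_type * arg = varg;
arg->retval = 0;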
for(unsigned int i = 0; i != arg->len; ++i)
arg->retval += arg->data[i];
arg->retval /= arg->len;
return NULL;
}
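Putting the pieces together for several arrays could look roughly like the sketch below. It is an illustration, not the original program: mean_job, mean_worker and mean_of_means are hypothetical names, the arrays are borrowed rather than allocated per job, and the final value is the mean of the per-array means, which equals the overall average only when all arrays have the same length.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    const float *data;
    size_t len;
    float mean;  /* result slot, owned by main, written by exactly one thread */
} mean_job;

static void *mean_worker(void *arg) {
    mean_job *job = arg;
    double sum = 0.0;
    for (size_t i = 0; i < job->len; i++)
        sum += job->data[i];
    job->mean = job->len ? (float)(sum / (double)job->len) : 0.0f;
    return NULL;
}

/* Runs one thread per job. Returns 0 and stores the mean of the per-array
   means in *out, or returns -1 if a thread could not be created. */
int mean_of_means(mean_job *jobs, size_t count, float *out) {
    assert(jobs != NULL && out != NULL && count > 0);
    pthread_t threads[count];
    size_t started = 0;
    while (started < count &&
           pthread_create(&threads[started], NULL, mean_worker, &jobs[started]) == 0)
        started++;
    double total = 0.0;
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);  /* results are read only after join */
        total += jobs[i].mean;
    }
    if (started != count)
        return -1;
    *out = (float)(total / (double)count);
    return 0;
}
```

No usleep is needed: pthread_join already guarantees that a thread has finished and that its result is visible before main reads it.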

C pthread Segmentation fault

So I was trying to make a GPGPU emulator with C and pthreads but ran into a rather strange problem, and I have no idea why it's occurring. The code is as below:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
// simplifies malloc
#define MALLOC(a) (a *)malloc(sizeof(a))
// Index of x/y coordinate
#define x (0)
#define y (1)
// Defines size of a block
#define BLOCK_DIM_X (3)
#define BLOCK_DIM_Y (2)
// Defines size of the grid, i.e., how many blocks
#define GRID_DIM_X (5)
#define GRID_DIM_Y (7)
// Defines the number of threads in the grid
#define GRID_SIZE (BLOCK_DIM_X * BLOCK_DIM_Y * GRID_DIM_X * GRID_DIM_Y)
// execution environment for the kernel
typedef struct exec_env {
int threadIdx[2]; // thread location
int blockIdx[2];
int blockDim[2];
int gridDim[2];
float *A,*B; // parameters for the thread
float *C;
} exec_env;
// kernel
void *kernel(void *arg)
{
exec_env *env = (exec_env *) arg;
// compute number of threads in a block
int sz = env->blockDim[x] * env->blockDim[y];
// compute the index of the first thread in the block
int k = sz * (env->blockIdx[y]*env->gridDim[x] + env->blockIdx[x]);
// compute the index of a thread inside a block
k = k + env->threadIdx[y]*env->blockDim[x] + env->threadIdx[x];
// check whether it is in range
assert(k >= 0 && k < GRID_SIZE && "Wrong index computation");
// print coordinates in block and grid and computed index
/*printf("tx:%d ty:%d bx:%d by:%d idx:%d\n",env->threadIdx[x],
env->threadIdx[y],
env->blockIdx[x],
env->blockIdx[y], k);
*/
// retrieve two operands
float *A = &env->A[k];
float *B = &env->B[k];
printf("%f %f \n",*A, *B);
// retrieve pointer to result
float *C = &env->C[k];
// do actual computation here !!!
// For assignment replace the following line with
// the code to do matrix addition and multiplication.
*C = *A + *B;
// free execution environment (not needed anymore)
free(env);
return NULL;
}
// main function
int main(int argc, char **argv)
{
float A[GRID_SIZE] = {-1};
float B[GRID_SIZE] = {-1};
float C[GRID_SIZE] = {-1};
pthread_t threads[GRID_SIZE];
int i=0, bx, by, tx, ty;
//Error location
/*for (i = 0; i < GRID_SIZE;i++){
A[i] = i;
B[i] = i+1;
printf("%f %f\n ", A[i], B[i]);
}*/
// Step 1: create execution environment for threads and create thread
for (bx=0;bx<GRID_DIM_X;bx++) {
for (by=0;by<GRID_DIM_Y;by++) {
for (tx=0;tx<BLOCK_DIM_X;tx++) {
for (ty=0;ty<BLOCK_DIM_Y;ty++) {
exec_env *e = MALLOC(exec_env);
assert(e != NULL && "memory exhausted");
e->threadIdx[x]=tx;
e->threadIdx[y]=ty;
e->blockIdx[x]=bx;
e->blockIdx[y]=by;
e->blockDim[x]=BLOCK_DIM_X;
e->blockDim[y]=BLOCK_DIM_Y;
e->gridDim[x]=GRID_DIM_X;
e->gridDim[y]=GRID_DIM_Y;
// set parameters
e->A = A;
e->B = B;
e->C = C;
// create thread
pthread_create(&threads[i++],NULL,kernel,(void *)e);
}
}
}
}
// Step 2: wait for completion of all threads
for (i=0;i<GRID_SIZE;i++) {
pthread_join(threads[i], NULL);
}
// Step 3: print result
for (i=0;i<GRID_SIZE;i++) {
printf("%f ",C[i]);
}
printf("\n");
return 0;
}
OK, this code runs fine, but as soon as I uncomment the "Error location" block (the for loop which assigns A[i] = i and B[i] = i + 1), I get a segmentation fault on Unix, and random 0s within C on Cygwin. I must admit my fundamentals in C are pretty poor, so it is highly likely that I missed something. If someone can give an idea of what's going wrong it'd be greatly appreciated. Thanks.
It works when you comment that out because i is still 0 when the 4 nested loops start.
You have this:
for (i = 0; i < GRID_SIZE;i++){
A[i] = i;
B[i] = i+1;
printf("%f %f\n ", A[i], B[i]);
}
/* What value is `i` now ? */
And then
pthread_create(&threads[i++],NULL,kernel,(void *)e);
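                        ^
So pthread_create will try to access some interesting indexes indeed: i is already GRID_SIZE after the first loop, so threads[i++] writes past the end of the threads array. Reset i to 0 before the nested loops, or use a separate counter for the thread index.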
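One way to avoid depending on a shared counter at all is to compute the slot from the loop variables, using the same formula the kernel uses for k. The sketch below is an assumption-laden illustration: flat_index is a hypothetical helper, and the dimensions are written as an enum instead of the question's macros so the snippet stands alone.

```c
#include <assert.h>

/* Same dimensions as in the question, written as an enum for this sketch. */
enum {
    BLOCK_DIM_X = 3,
    BLOCK_DIM_Y = 2,
    GRID_DIM_X = 5,
    GRID_DIM_Y = 7,
    GRID_SIZE = BLOCK_DIM_X * BLOCK_DIM_Y * GRID_DIM_X * GRID_DIM_Y
};

/* Hypothetical helper: maps block and thread coordinates to a unique slot,
   mirroring the computation of k inside the kernel. */
int flat_index(int bx, int by, int tx, int ty) {
    int block_size = BLOCK_DIM_X * BLOCK_DIM_Y;
    int k = block_size * (by * GRID_DIM_X + bx) + ty * BLOCK_DIM_X + tx;
    assert(k >= 0 && k < GRID_SIZE);
    return k;
}
```

Inside the nested loops the call would then read pthread_create(&threads[flat_index(bx, by, tx, ty)], NULL, kernel, (void *)e); so an earlier loop that leaves i at GRID_SIZE can no longer affect the index.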

Resources