I'm new to multithreading; I had my first lesson yesterday. I've written a program to get the average of 4 big arrays. Each array is handled by its own thread, and main waits for all the threads and then computes the average of the 4 arrays. This works because each thread returns the average of one array. Each array is just a float array defined in a header file.
It compiles but gives me a segmentation fault and I don't see why.
#include "gemiddelde.h"
#include <stdlib.h>
#include <stdio.h>
float *gemiddelde(void *arg)
{
float *a;
int i;
a = (float *)arg;
float * som;
for( i = 0; i < 100000; i++)
*som += a[i];
*som = *som / 100000;
return som;
}
int main()
{
pthread_t t1,t2,t3,t4;
float * som1, * som2, * som3, * som4, *result;
pthread_create(&t1,NULL,gemiddelde,a1);
pthread_create(&t2,NULL,gemiddelde,a2);
pthread_create(&t3,NULL,gemiddelde,a3);
pthread_create(&t4,NULL,gemiddelde,a4);
pthread_join(t1,som1);
pthread_join(t2,som2);
pthread_join(t3,som3);
pthread_join(t4,som4);
usleep(1);
*result = *som1 + *som2 + *som3 + *som4;
printf("Gemiddelde is: %f ", *result);
return 0;
}
Can someone help me?
Kind regards,
In
*result = *som1 + *som2 + *som3 + *som4;
result is used uninitialized. Make it a plain float instead of a pointer.
In your current code, the segfault occurs because the som* pointers are never initialized, so they hold indeterminate addresses.
The thread function needs memory to store its result, and as it stands it has none: it just dereferences an uninitialized pointer. But even allocating memory inside the thread is not a great idea, because it's not clear who is responsible for it and who will clean it up. So it's much better to allocate all the required memory in the main function. First, some boilerplate to set up the thread argument data:
typedef struct thread_arg_type_
{
float * data;
size_t len;
float retval;
} thread_arg_type;
thread_arg_type * create_thread_arg(size_t n)
{
thread_arg_type * result = malloc(sizeof(thread_arg_type));
if (!result) return NULL;
float * const p = malloc(n * sizeof(float));
if (!p)
{
free(result);
return NULL;
}
result->len = n;
result->data = p;
return result;
}
void free_thread_arg(thread_arg_type * r)
{
if (r) free(r->data);
free(r);
}
Now here's how we use it:
int main()
{
thread_arg_type * arg;
pthread_t t;
arg = create_thread_arg(array1_size);
pthread_create(&t, NULL, getmiddle, arg);
// ...
pthread_join(t, NULL);
printf("The result is: %f.\n", arg->retval);
free_thread_arg(arg);
}
And finally we must adapt getmiddle:
void * getmiddle(void * p) // pthread_create expects void *(*)(void *)
{
thread_arg_type * arg = p;
arg->retval = 0;
for(size_t i = 0; i != arg->len; ++i)
arg->retval += arg->data[i];
arg->retval /= arg->len;
return NULL;
}
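Putting the pieces above together for the four arrays in the question might look like the sketch below. It is only an illustration under some assumptions: the names avg_task, average_worker, average_arrays and MAX_ARRAYS are made up here, the arrays are assumed to have equal length, and the tasks point at the caller's arrays instead of copying them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_ARRAYS 16   /* hypothetical upper bound for this sketch */

/* Per-thread argument: the input array plus a slot for the result. */
typedef struct {
    const float *data;
    size_t len;
    float retval;
} avg_task;

/* Thread entry point with the signature pthread_create expects. */
static void *average_worker(void *arg)
{
    avg_task *task = arg;
    float sum = 0.0f;
    for (size_t i = 0; i < task->len; ++i)
        sum += task->data[i];
    task->retval = sum / (float)task->len;
    return NULL;
}

/* Averages `count` arrays of equal length `len`, one thread per array.
   Returns 0 and stores the overall mean in *out, or -1 if a thread
   could not be created. */
int average_arrays(const float *const arrays[], size_t count, size_t len, float *out)
{
    assert(count > 0 && count <= MAX_ARRAYS && len > 0 && out != NULL);
    pthread_t threads[MAX_ARRAYS];
    avg_task tasks[MAX_ARRAYS];   /* all result storage lives in the caller */
    size_t started = 0;
    int status = 0;

    for (; started < count; ++started) {
        tasks[started].data = arrays[started];
        tasks[started].len = len;
        tasks[started].retval = 0.0f;
        if (pthread_create(&threads[started], NULL, average_worker, &tasks[started]) != 0) {
            status = -1;
            break;
        }
    }

    /* Join only the threads that were actually started. */
    float total = 0.0f;
    for (size_t i = 0; i < started; ++i) {
        pthread_join(threads[i], NULL);
        total += tasks[i].retval;
    }
    if (status == 0)
        *out = total / (float)count;
    return status;
}
```

Because every thread writes only to its own avg_task, no mutex is needed, and main reads the results only after pthread_join has returned.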
I am working on a simple perceptron model in C and decided I wanted some sort of abstraction using opaque pointers. The code below may give more clues to the problem.
perceptron.h
#ifndef __PERCEPTRON_H__
#define __PERCEPTRON_H__
typedef struct _Perceptron _perceptron;
typedef struct{
//public
float * input;
float * weigths;
int size;
//private
void * m_perceptron;
}Perceptron;
Perceptron * InitPerceptron();
void FreePerceptron(Perceptron * instance);
void FeedForward(float input[],float weights[],int size,Perceptron * perceptron);
#endif
and this is my perceptron.c
typedef struct _Perceptron{
float * input;
int size;
}_perceptron;
static void _FeedForward(float input[],float weights[],int size,_perceptron * p){
if(p->input == NULL)
p->input = (float *)calloc(size,sizeof(float));
for(int i = 0;i < size;i++)
p->input[i] = input[i] * weights[i];
for(int i = 0;i < size;i++)
printf("%f ",p->input[i]);
}
Perceptron * InitPerceptron(){
Perceptron * instance = (Perceptron *)malloc(sizeof(Perceptron));
instance ->m_perceptron = (_perceptron *)malloc(sizeof(_perceptron));
return instance;
}
void FeedForward(float input[],float weights[],int size,Perceptron * perceptron) {
if(perceptron->input == NULL)
perceptron->input = (float *)malloc(size*sizeof(float));
if(perceptron->weigths == NULL)
perceptron->weigths = (float *)malloc(size*sizeof(float));
for(int i = 0;i < size;i++){
perceptron->input[i] = input[i];
perceptron->weigths[i] = weights[i];
}
perceptron->size = size;
_FeedForward(perceptron->input,perceptron->weigths,perceptron->size,perceptron->m_perceptron);
}
void FreePerceptron(Perceptron * instance){
free(instance->m_perceptron);
free(instance);
}
and this is my main.c
float input[] = {1,2,3};
float weights[] = {1,1,1};
Perceptron * perceptron = InitPerceptron();
FeedForward(input,weights,3,perceptron);
FreePerceptron(perceptron);
return 0;
(gdb) r
Starting program: E:\repos\NeuralNetwork\bin\neuralnetwork.exe
[New Thread 14564.0x49c4]
[New Thread 14564.0x3a44]
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004017b4 in FeedForward ()
(gdb)
This is the error I am getting.
My initial guess was that I am accessing the perceptron and _perceptron pointers while they are NULL, so I tried removing the FeedForward call, and then it seems to work just fine. My next guess is that the inputs are passed in the wrong way, but that does not seem likely. What could be the cause of the segmentation fault? Could it be in the main function itself, is it related to using malloc for the float pointers, or is my use of opaque pointers wrong?
The values stored here are uninitialized at the moment of comparison:
if(perceptron->input == NULL)
and
if(perceptron->weigths == NULL)
and
if(p->input == NULL)
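malloc does not initialize the memory it returns, so these pointers hold indeterminate values. They are most likely not NULL, so the allocation is skipped and the code then writes through a garbage pointer. Set them explicitly to NULL right after allocating the structs to solve the problem.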
Side note: It is "weights", not "weigths".
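A minimal sketch of an initializer that sets every pointer member explicitly, assuming the struct layout from the question. PerceptronPrivate is a hypothetical stand-in for the question's _perceptron type, and the field is spelled weights here.

```c
#include <assert.h>
#include <stdlib.h>

/* Private part; PerceptronPrivate is a hypothetical stand-in for _perceptron. */
typedef struct {
    float *input;
    int size;
} PerceptronPrivate;

typedef struct {
    float *input;
    float *weights;
    int size;
    void *m_perceptron;
} Perceptron;

/* Allocates both structs and puts every member into a known state,
   so later "== NULL" checks behave as intended. */
Perceptron *InitPerceptron(void)
{
    Perceptron *instance = malloc(sizeof *instance);
    PerceptronPrivate *priv = malloc(sizeof *priv);
    if (instance == NULL || priv == NULL) {
        free(instance);   /* free(NULL) is a no-op */
        free(priv);
        return NULL;
    }
    priv->input = NULL;
    priv->size = 0;
    instance->input = NULL;
    instance->weights = NULL;
    instance->size = 0;
    instance->m_perceptron = priv;
    return instance;
}

/* Releases the buffers as well as the two structs. */
void FreePerceptron(Perceptron *instance)
{
    if (instance == NULL)
        return;
    PerceptronPrivate *priv = instance->m_perceptron;
    assert(priv != NULL);
    free(priv->input);
    free(priv);
    free(instance->input);
    free(instance->weights);
    free(instance);
}
```

The question's FreePerceptron frees only the two structs, so the float buffers allocated in FeedForward would leak; the version above frees them too.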
Hello guys,
I have a problem in my code and I do not know how to fix it (a segmentation fault, core dumped)!
My teacher wants me to write a program that creates N threads and makes them do some calculations. I have 3 global 2D arrays A, B, C (I have them as pointers because I do not know the size; the user gives it as an argument). I try to allocate memory for them in the main function.
The problem is that I get a segmentation fault when I try to create the threads in "pthread_create(&tid[id],NULL,add,(void *)(long) i);" :(.
I can't figure out why this is happening. I tried using gdb, but the result was only that the problem is in pthread_create.
However, when I comment out the arrays (A, B, C) and the mallocs they use, it runs (but the final result is 0).
I am using VirtualBox (with Ubuntu inside, if that helps :D).
The following code is what I wrote so far:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
long int p,N,Total_Sum;
long int **A,**B,**C;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_barrier_t bar;
void * add(void *arg){
long int i,j,Local_Sum=0;
long int lines,start,end,id;
id = (long int)arg;
lines = N/p;
start = id*lines;
end = start+lines;
for(i=start;i<end;i++){
for(j=0;j<N;j++){
A[i][j] = 1;
B[i][j] = 1;
}
}
for(i=start;i<end;i++){
for(j=0;j<N;j++){
C[i][j] = A[i][j] * B[i][j];
Local_Sum += C[i][j];
printf("C[%ld][%ld] = %ld\n",i,j,C[i][j]);
}
}
pthread_mutex_lock(&mutex);
Total_Sum += Local_Sum;
pthread_mutex_unlock(&mutex);
pthread_barrier_wait(&bar);
pthread_exit(0);
}
int main(int argc, char *argv[]){
long int i,j,id;
pthread_t *tid;
if(argc!=3){
printf("Provide Number Of Threads And Size\n");
exit(1);
}
p = atoi(argv[1]);
tid = (pthread_t *) malloc(p*sizeof(pthread_t));
if(tid == NULL){
printf("Could Not Allocate Memory\n");
exit(1);
}
pthread_barrier_init(&bar,NULL,p);
N = atoi(argv[2]);
A = (long int**) malloc(N*sizeof(long int*));
B = (long int**) malloc(N*sizeof(long int*));
C = (long int**) malloc(N*sizeof(long int*));
for(i=0;i<N;i++){
A[i] = (long int*) malloc(N*sizeof(long int));
B[i] = (long int*) malloc(N*sizeof(long int));
C[i] = (long int*) malloc(N*sizeof(long int));
}
if((A==NULL) || (B == NULL) || (C == NULL)){
printf("Count Not Allocate Memory\n");
exit(1);
}
for(i=0;i<p;i++){
pthread_create(&tid[id],NULL,add,(void *)(long) i);
}
for(i=0;i<p;i++){
pthread_join(tid[id],NULL);
}
for(i=0;i<N;i++){
free(A[i]);
free(B[i]);
free(C[i]);
}
free(A);
free(B);
free(C);
printf("Final Result Is Equal To: %ld\n",Total_Sum);
return 0;
}
******I know it gets a little bit messy because of the mutex and the barriers, but ask me if you need further details :D.******
Thanks!!!!!!
I think the only problem is the indexes in the following lines:
for(i=0;i<p;i++){
pthread_create(&tid[id],NULL,add,(void *)(long) i);
}
for(i=0;i<p;i++){
pthread_join(tid[id],NULL);
}
id has only been declared, but never initialized! Maybe it's just a typo and you wanted to use i as the index into tid.
The solution should be:
for(i=0;i<p;i++){
pthread_create(&tid[i],NULL,add,(void *)(long) i);
}
for(i=0;i<p;i++){
pthread_join(tid[i],NULL);
}
The answer to the source of your core dump problem has already been addressed, but to address the other things you asked, or stated:
1st:
Regarding your statement: i have them as pointers because i don not know the size,the user gives it as argument.
Often, in C, you can avoid calloc/malloc in your code by using variable-length arrays (VLAs) instead. They are available in C99, and are an optional feature from C11 onward. (See links.)
VLA a
VLA b
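As a rough illustration of the VLA idea, here is a sketch of a function that takes an n-by-n matrix whose size is only known at run time. The name fill_and_sum is made up for this example, and it assumes a compiler with VLA support.

```c
#include <assert.h>
#include <stddef.h>

/* Fills an n-by-n matrix with `value` and returns the sum of all entries.
   The parameter m[n][n] is a variably modified type: its dimensions come
   from the earlier parameter n. */
long fill_and_sum(size_t n, long m[n][n], long value)
{
    assert(n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            m[i][j] = value;
            sum += m[i][j];
        }
    }
    return sum;
}
```

A caller would declare the matrix as long m[n][n]; after reading n. Keep in mind that a VLA declared as a local variable typically lives on the stack, so a size taken from user input should be bounded; for large N, heap allocation remains the safer choice.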
2nd:
Regarding your statement: I know it gets a little bit messy...
It's really not that messy, but you could consider cleaning up the memory allocation/freeing steps by moving most of the work into a function:
long int **A,**B,**C;
int N;
...
//in main
N = atoi(argv[2]);
A = Create2D(N, N);
B = Create2D(N, N);
C = Create2D(N, N);
...
free2D(A, N);
free2D(B, N);
free2D(C, N);
long ** Create2D(int c, int r)
{
long **arr;
int y = 0;
arr = calloc(c, sizeof(long *));
for(y=0;y<c;y++)
{
arr[y] = calloc(r, sizeof(long)); // r elements per row
}
return arr;
}
void free2D(long **arr, int c)
{
int i;
if(!arr) return;
for(i=0;i<c;i++)
{
if(arr[i])
{
free(arr[i]);
arr[i] = NULL;
}
}
free(arr);
arr = NULL;
}
Side note:
There is nothing absolutely wrong with your memory statements as they are:
A = (long int**) malloc(N*sizeof(long int*));
However, although C++ requires it, there is no reason to cast the return value of malloc, calloc or realloc in C. (See discussion here.) The following is sufficient (in C):
A = malloc(N*sizeof(long int*));
I am trying to calculate the sum of two vectors a and b using pthreads in C. I am given a function that computes the sum in sequential form and another which does so in parallel form. My program is working properly but computing different sums when there are multiple threads. I have used proper thread synchronization on the critical area, but still cannot see where I am going wrong. I get the correct answer on the first thread since there is only one thread doing the job and then I get wrong answers on multiple threads. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
// type for value of vector element
typedef short value_t;
// type for vector dimension / indices
typedef long index_t;
// function type to combine two values
typedef value_t (*function_t)(const value_t x, const value_t y);
// struct to store the respective values of the vectors a,b and c
typedef struct{
index_t start;
index_t end;
value_t *arr;
value_t *brr;
value_t *crr;
value_t *part_sum;
pthread_mutex_t *mutex;
}arg_struct;
// function to combine two values
value_t add(const value_t x, const value_t y) {
return ((x+y)*(x-y)) % ((int)x+1) +27;
}
// function to initialize the vectors a,b and c
void vectorInit(index_t n, value_t a[n], value_t b[n], value_t c[n]) {
for(index_t i=0; i<n; i++) {
a[i] = (value_t)(2*i);
b[i] = (value_t)(n-i);
c[i] = 0;
}
}
// function to count the sum of two variables sequentially
value_t vectorOperation(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f) {
value_t sum = 0;
for(index_t i=0; i<n; i++) {
sum += (c[i] = f(a[i], b[i]));
}
return sum;
}
/* Thread function */
void* vector_sum(void* arg)
{
arg_struct *param = (arg_struct*)arg;
/*
for(index_t i= param->start; i<param->end; i++)
{
pthread_mutex_lock(&param->mutex);
*param->part_sum += vectorOperation(i,param->arr,param->brr,param->crr,add);
pthread_mutex_unlock(&param->mutex);
}
*/
index_t n = param->end - param->start;
pthread_mutex_lock(&(*param->mutex));
// Each thread uses the vectorOperation function to calculate the sum sequentially(Also the critical area)
*param->part_sum = *param->part_sum + vectorOperation(n,param->arr,param->brr,param->crr,add);
//*param->part_sum += vectorOperation(param->end-param->start,param->arr,param->brr,param->crr,add);
pthread_mutex_unlock(&(*param->mutex));
pthread_exit(NULL);
}
// Sum of two vectors in parallel.
value_t vectorOperationParallel(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f, int p) {
value_t sum = 0;
pthread_t threads[p];
arg_struct thread_args[p];
pthread_mutex_t mutex;
pthread_mutex_init(&mutex,NULL);
index_t div = (n+p-1)/p;
for(int i=0; i<p; i++)
{
thread_args[i].start = i*div;
thread_args[i].end = (i+1)*div;
thread_args[i].arr = a;
thread_args[i].brr = b;
thread_args[i].crr = c;
for(int j =0; j<div; j++)
{
thread_args[i].arr[j] = a[thread_args[i].start+j];
thread_args[i].brr[j] = b[thread_args[i].start+j];
thread_args[i].crr[j] = c[thread_args[i].start+j];
}
thread_args[i].part_sum = &sum;
thread_args[i].mutex = &mutex;
pthread_create(&threads[i],NULL,vector_sum, (void*)&thread_args[i]);
}
for(int i=0; i<p; i++)
{
pthread_join(threads[i],NULL);
}
return sum;
}
int main(int argc, char **argv)
{
// check for correct argument count
if (argc != 3)
{
printf ("usage: %s vector_size n_threads\n", argv[0]);
exit (EXIT_FAILURE);
}
// get arguments
// vector size
index_t n = (index_t)atol (argv[1]);
// number of threads
int p = atoi (argv[2]);
// check for plausible values
if((p < 1) || (p > 1000)) {
printf("illegal number of threads\n");
exit (EXIT_FAILURE);
}
// allocate memory
value_t *a = malloc(n * sizeof(*a));
value_t *b = malloc(n * sizeof(*b));
value_t *c = malloc(n * sizeof(*c));
if((a == NULL) || (b == NULL) || (c == NULL)) {
printf("no more memory\n");
exit(EXIT_FAILURE);
}
// initialize vectors a,b,c
vectorInit(n, a, b, c);
// work on vectors sequentially
value_t c1sum = vectorOperation(n, a, b, c, add);
// work on vectors parallel for all thread counts from 1 to p
for(int thr=1; thr<= p; thr++) {
// do operation
value_t c2sum = vectorOperationParallel(n, a, b, c, add, thr);
// check result
if(c1sum != c2sum) {
printf("!!! error: vector results are not identical !!!\nsum1=%ld, sum2=%ld\n", (long)c1sum, (long)c2sum);
return EXIT_FAILURE;
}
else
printf("The results are equal: sum1=%ld, sum2=%ld\n",(long)c1sum, (long)c2sum);
}
return EXIT_SUCCESS;
}
Okay I am not sure but this seems to be what is wrong.
First, the variable names are hard to follow.
Then, as n.m. commented:
pthread_mutex_init in a loop is probably a bad idea
You calculate index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
and later use (i + 1) * div as the end index when distributing the elements. Because div is rounded up, the last thread may try to access more elements than are available.
example:
index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
//(13 + 3 - 1) / 3 = 5, for 13 elements and 3 threads
thread_args[i].end = (i + 1) * div; // for the last i ( = 2)
//(2 + 1) * 5 = 15
As soon as you access i >= 13 you get garbage values (undefined behaviour)
Then you make a copy of parts of your original array (I would assume this is slower than just passing a reference to the original). Note that arr, brr and crr already point at a, b and c, so the copy loop overwrites the start of the original arrays instead of filling a separate buffer.
You don't seem to use the result array *thread_args[i].crr at all.
You only need the mutex for the shared sum, because each thread otherwise works on its own part of the arrays. You could even pass pointers into the original arrays to the threads without any mutex, as long as the threads do not all update the same sum variable: every addition is self-contained and does not touch memory used by another addition.
To calculate the sum of all values, each thread could hand back its own partial sum instead of updating a shared variable through a pointer. That way no locking is needed, and it should be faster.
I am not sure if I found everything, but this may help you improve this a good bit.
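The points above (clamped slice bounds, no copying, one partial sum per thread) could be sketched roughly as follows. This is not the asker's code: slice_task, sum_slice, parallel_pair_sum and MAX_THREADS are names invented for the example, a plain a[i] + b[i] replaces the question's add function, and a long accumulator is used instead of short to avoid overflow.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */

typedef struct {
    const short *a;
    const short *b;
    size_t start;    /* first index of this thread's slice */
    size_t end;      /* one past the last index, clamped to n */
    long partial;    /* written only by the owning thread */
} slice_task;

static void *sum_slice(void *arg)
{
    slice_task *t = arg;
    long sum = 0;
    for (size_t i = t->start; i < t->end; ++i)
        sum += t->a[i] + t->b[i];
    t->partial = sum;
    return NULL;
}

/* Sums a[i] + b[i] over n elements using p threads.
   Returns 0 and stores the total in *out, or -1 if a thread
   could not be created. */
int parallel_pair_sum(const short *a, const short *b, size_t n, int p, long *out)
{
    assert(p >= 1 && p <= MAX_THREADS && out != NULL);
    pthread_t threads[MAX_THREADS];
    slice_task tasks[MAX_THREADS];
    size_t chunk = (n + (size_t)p - 1) / (size_t)p;   /* ceiling division */
    int started = 0;
    int status = 0;

    for (; started < p; ++started) {
        size_t start = (size_t)started * chunk;
        size_t end = start + chunk;
        if (start > n) start = n;   /* clamp so no slice runs past n */
        if (end > n) end = n;
        tasks[started] = (slice_task){ a, b, start, end, 0 };
        if (pthread_create(&threads[started], NULL, sum_slice, &tasks[started]) != 0) {
            status = -1;
            break;
        }
    }

    /* Partial sums are combined after the joins, so no mutex is needed. */
    long total = 0;
    for (int i = 0; i < started; ++i) {
        pthread_join(threads[i], NULL);
        total += tasks[i].partial;
    }
    if (status == 0)
        *out = total;
    return status;
}
```

With 13 elements and 3 threads the slices are [0,5), [5,10) and [10,13): the last one is clamped to 13 instead of running to 15.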
I am a beginner with C and currently struggling with using structs in functions. Even though I pass my functions a pointer to my struct and use it to change the struct's values, it seems that my functions are unable to change the pointer itself.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
typedef struct Vector
{
int n;
double *entry;
}Vector;
//defining the struct vector with n for its length and entry as a pointer which will become an array for its values
Vector *newVector(int n)
{
int i = 0;
Vector *X = NULL;
assert(n > 0);
X = malloc(sizeof(Vector));
assert(X != NULL);
X->n = n;
X->entry = malloc(n+1*sizeof(double));
assert(X->entry != NULL);
X->entry[0] = 0;
for (i=1; i<n+1; ++i) {
scanf("%lf",&X->entry[i]);
}
return X;
}
//making a new vector struct from the size given and returning its pointer (the array is made so it goes from 1-n instead of 0-(n-1)
void delVector(Vector *X)
{
assert(X != NULL);
assert(X->entry != NULL);
free(X->entry);
free(X);
X = NULL;
}
int getVectorLength (Vector *X)
{
assert(X != NULL);
return X->n;
}
void setVectorLength(Vector *X, int k)
{
assert(X != NULL);
assert(k>0);
delVector(X);
printf("so far.\n");
X = newVector(k);
} //deleting the vector and then replacing it with the new sized vector
double getVectorEntry (Vector *X, int h)
{
return X->entry[h];
}
void setVectorEntry (Vector *X, int h)
{
printf("Value %d.\n",h);
scanf("%lf",&X->entry[h]);
}
main()
{
Vector *h = NULL;
h = newVector(3);
printf("%f\n",h->entry[1]);
printf("%f\n",getVectorEntry(h,1));
setVectorEntry(h,1);
printf("%f\n",getVectorEntry(h,1));
printf("%d\n",getVectorLength(h));
setVectorLength(h,6);
printf("%f\n",getVectorEntry(h,6));
setVectorEntry(h,6);
printf("so far.\n");
printf("%f\n",getVectorEntry(h,6));
}
Running the code makes it crash once it reaches delVector. If I comment delVector out, it crashes at X = newVector(k); for a reason I also cannot find (it enters newVector, but crashes before I can input anything). So what is causing the errors?
Thanks a lot in advance!
Let's just focus on the setVectorLength function
void setVectorLength(Vector *X, int k)
{
assert(X != NULL);
assert(k>0);
delVector(X);
printf("so far.\n");
X = newVector(k);
}
Here, Vector *X is a local variable that only exists in this function. It points to some location in memory, but using it you can't modify the original pointer. So there are two ways around it.
1. You don't delete and recreate the vector. Instead do something like this
void setVectorLength(Vector *X, int k)
{
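double *p = realloc(X->entry, (k+1)*sizeof(double)); // resizes the array, keeping old values; k+1 because entries are indexed 1..k
if (p == NULL) return; // on failure the old array is left untouched
X->entry = p;
X->n = k;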
}
2. Use a pointer to a pointer. Note that this is slower and unnecessary; prefer the first solution.
void setVectorLength(Vector **X, int k)
{
assert(*X != NULL);
assert(k>0);
delVector(*X);
printf("so far.\n");
*X = newVector(k);
}
The variable Vector **X points to the original variable, so you can modify it.
The problem seems to be in this function:
Vector *newVector(int n)
{
int i = 0;
Vector *X = NULL;
assert(n > 0);
X = malloc(sizeof(Vector));
assert(X != NULL);
X->n = n;
X->entry = malloc(n+1*sizeof(double));
assert(X->entry != NULL);
X->entry[0] = 0;
for (i=1; i<n+1; ++i) {
scanf("%lf",&X->entry[i]);
}
return X;
}
In the statement above, you are allocating n + sizeof(double) bytes for X->entry, but you need (n + 1) * sizeof(double) bytes.
You should allocate memory for X->entry as
X->entry = malloc((n + 1) * sizeof(double));
Your allocation is wrong: you have to group the n and the 1 with parentheses, just as you would on paper. Note that the expression you have, n + 1 * sizeof(double), means the same as n + sizeof(double).
X->entry = malloc((n + 1) * sizeof(*X->entry));
would be the right way!
As it was, your code invokes undefined behavior, and one possible consequence is that the program crashes, most likely with a segmentation fault because it accesses memory outside the allocated block.
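To make the precedence issue concrete, the two expressions can be compared side by side. The helper names bytes_as_written and bytes_intended are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* What the question's expression computes: multiplication binds tighter
   than addition, so this is n + (1 * sizeof(double)). */
size_t bytes_as_written(size_t n)
{
    assert(n > 0);
    return n + 1 * sizeof(double);
}

/* What was intended: room for n + 1 doubles. */
size_t bytes_intended(size_t n)
{
    assert(n > 0);
    return (n + 1) * sizeof(double);
}
```

On a platform where sizeof(double) is 8, n = 3 gives 11 bytes as written versus the 32 bytes that four doubles need, so writing entry[1] already runs past the end of the block.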
void setVectorLength(Vector *X, int k)
{
assert(X != NULL);
assert(k>0);
delVector(X);
printf("so far.\n");
X = newVector(k);
}
In this function, X is a copy of the pointer variable the caller passes in. It contains a copy of the address. Modifying a function parameter doesn't change the caller's value. Just as doing k = 0 would have no effect on the caller, reassigning X is a local change that is invisible to the caller.
If you want the caller to see the result of newVector(k), you have a couple options. You could change setVectorLength() to work like newVector() and return the new pointer.
Vector *setVectorLength(Vector *X, int k) {
...
printf("so far.\n");
return newVector(k);
}
int main() {
...
h = setVectorLength(h, 6);
}
Or you could add another layer of indirection—make X a pointer to a pointer. That would require the caller to pass the address of the variable they want changed. Instead of passing h you'd pass &h.
void setVectorLength(Vector **X, int k) {
...
printf("so far.\n");
*X = newVector(k);
}
int main() {
...
setVectorLength(&h, 6);
}
By the way, the way you've written setVectorLength(), it will destroy any existing data in the vector. You may want to copy the data from the old vector to the new one. (Or, as #pi_pi3 suggests, modify the vector in place instead of destroying it and creating a new one.)
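A sketch of the in-place variant that keeps the existing data, assuming the question's 1-based layout (entry[1..n] in use, entry[0] as padding). resizeVector is a hypothetical name, and unlike newVector it does not read values from stdin; new entries are simply zeroed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Vector {
    int n;
    double *entry;   /* entry[1..n] are used, entry[0] is padding */
} Vector;

/* Resizes X in place to k entries, keeping existing values and zeroing
   any new ones. Returns 0 on success, or -1 if the allocation fails,
   in which case X is left unchanged. */
int resizeVector(Vector *X, int k)
{
    assert(X != NULL && X->entry != NULL && k > 0);
    double *grown = realloc(X->entry, ((size_t)k + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;               /* the old block is still valid */
    for (int i = X->n + 1; i <= k; ++i)
        grown[i] = 0.0;          /* only runs when the vector grows */
    X->entry = grown;
    X->n = k;
    return 0;
}
```

Since the Vector struct itself is never freed or replaced, the caller's pointer stays valid and no pointer-to-pointer parameter is needed.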
I was trying to make a GPGPU emulator with C and pthreads, but ran into a rather strange problem and I have no idea why it's occurring. The code is below:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
// simplifies malloc
#define MALLOC(a) (a *)malloc(sizeof(a))
// Index of x/y coordinate
#define x (0)
#define y (1)
// Defines size of a block
#define BLOCK_DIM_X (3)
#define BLOCK_DIM_Y (2)
// Defines size of the grid, i.e., how many blocks
#define GRID_DIM_X (5)
#define GRID_DIM_Y (7)
// Defines the number of threads in the grid
#define GRID_SIZE (BLOCK_DIM_X * BLOCK_DIM_Y * GRID_DIM_X * GRID_DIM_Y)
// execution environment for the kernel
typedef struct exec_env {
int threadIdx[2]; // thread location
int blockIdx[2];
int blockDim[2];
int gridDim[2];
float *A,*B; // parameters for the thread
float *C;
} exec_env;
// kernel
void *kernel(void *arg)
{
exec_env *env = (exec_env *) arg;
// compute number of threads in a block
int sz = env->blockDim[x] * env->blockDim[y];
// compute the index of the first thread in the block
int k = sz * (env->blockIdx[y]*env->gridDim[x] + env->blockIdx[x]);
// compute the index of a thread inside a block
k = k + env->threadIdx[y]*env->blockDim[x] + env->threadIdx[x];
// check whether it is in range
assert(k >= 0 && k < GRID_SIZE && "Wrong index computation");
// print coordinates in block and grid and computed index
/*printf("tx:%d ty:%d bx:%d by:%d idx:%d\n",env->threadIdx[x],
env->threadIdx[y],
env->blockIdx[x],
env->blockIdx[y], k);
*/
// retrieve two operands
float *A = &env->A[k];
float *B = &env->B[k];
printf("%f %f \n",*A, *B);
// retrieve pointer to result
float *C = &env->C[k];
// do actual computation here !!!
// For assignment replace the following line with
// the code to do matrix addition and multiplication.
*C = *A + *B;
// free execution environment (not needed anymore)
free(env);
return NULL;
}
// main function
int main(int argc, char **argv)
{
float A[GRID_SIZE] = {-1};
float B[GRID_SIZE] = {-1};
float C[GRID_SIZE] = {-1};
pthread_t threads[GRID_SIZE];
int i=0, bx, by, tx, ty;
//Error location
/*for (i = 0; i < GRID_SIZE;i++){
A[i] = i;
B[i] = i+1;
printf("%f %f\n ", A[i], B[i]);
}*/
// Step 1: create execution environment for threads and create thread
for (bx=0;bx<GRID_DIM_X;bx++) {
for (by=0;by<GRID_DIM_Y;by++) {
for (tx=0;tx<BLOCK_DIM_X;tx++) {
for (ty=0;ty<BLOCK_DIM_Y;ty++) {
exec_env *e = MALLOC(exec_env);
assert(e != NULL && "memory exhausted");
e->threadIdx[x]=tx;
e->threadIdx[y]=ty;
e->blockIdx[x]=bx;
e->blockIdx[y]=by;
e->blockDim[x]=BLOCK_DIM_X;
e->blockDim[y]=BLOCK_DIM_Y;
e->gridDim[x]=GRID_DIM_X;
e->gridDim[y]=GRID_DIM_Y;
// set parameters
e->A = A;
e->B = B;
e->C = C;
// create thread
pthread_create(&threads[i++],NULL,kernel,(void *)e);
}
}
}
}
// Step 2: wait for completion of all threads
for (i=0;i<GRID_SIZE;i++) {
pthread_join(threads[i], NULL);
}
// Step 3: print result
for (i=0;i<GRID_SIZE;i++) {
printf("%f ",C[i]);
}
printf("\n");
return 0;
}
OK, this code runs fine, but as soon as I uncomment the "Error location" block (the for loop which assigns A[i] = i and B[i] = i + 1), I get a segmentation fault on Unix, and random 0s within C on Cygwin. I must admit my C fundamentals are pretty poor, so it is quite likely that I missed something. If someone can give me an idea of what's going wrong, it'd be greatly appreciated. Thanks.
It works when you comment that out because i is still 0 when the 4 nested loops start.
You have this:
for (i = 0; i < GRID_SIZE;i++){
A[i] = i;
B[i] = i+1;
printf("%f %f\n ", A[i], B[i]);
}
/* What value is `i` now ? */
And then
pthread_create(&threads[i++],NULL,kernel,(void *)e);
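                        ^^^
After the first loop i equals GRID_SIZE, so pthread_create writes to threads[GRID_SIZE] and beyond, past the end of the array. Reset i to 0 before the nested loops, or use a separate counter for the thread index.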
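One way to avoid this class of bug is to give the thread index its own counter, so that an earlier loop cannot leave it in the wrong state. A small sketch, with ROWS, COLS, add_hundred and run_grid as made-up names and a much smaller grid than in the question:

```c
#include <assert.h>
#include <pthread.h>

#define ROWS 2
#define COLS 3
#define GRID_SIZE (ROWS * COLS)

/* Trivial worker: adds 100 to the slot it was given. */
static void *add_hundred(void *arg)
{
    int *slot = arg;
    *slot += 100;
    return NULL;
}

/* Returns the number of threads created and joined, or -1 on failure. */
int run_grid(int results[GRID_SIZE])
{
    pthread_t threads[GRID_SIZE];
    int i;

    /* Initialization loop: leaves i == GRID_SIZE when it finishes. */
    for (i = 0; i < GRID_SIZE; i++)
        results[i] = i;

    /* Dedicated counter, unaffected by the loop above. */
    int created = 0;
    int failed = 0;
    for (int row = 0; row < ROWS && !failed; row++) {
        for (int col = 0; col < COLS && !failed; col++) {
            assert(created < GRID_SIZE);   /* guards the array bound */
            if (pthread_create(&threads[created], NULL, add_hundred,
                               &results[created]) != 0)
                failed = 1;
            else
                created++;
        }
    }

    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return failed ? -1 : created;
}
```

The assert documents the invariant that the original code silently broke: the index passed to pthread_create must stay below the size of the threads array.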