error C2102: '&' requires l-value - c

The line
gsl_blas_daxpy(-a,&gsl_matrix_column(D, q).vector,y);
causes the error
error C2102: '&' requires l-value
I have no control over the GSL functions, so I don't know how to fix this. Removing the "&" didn't work; afterwards I get
error C2198: 'gsl_blas_daxpy' : too few arguments for call
I'm using Visual Studio 2010.
GSL_EXPORT int gsl_blas_daxpy (double alpha,
const gsl_vector * X,
gsl_vector * Y);
#include <stdio.h>
#include <math.h>
#include <time.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
#define M (10) // Number of columns in dictionary */
#define N ((int)(M/2)) // Number of rows in dictionary */
int K = 0.07*M; //Number of non-zero elements in signal - the sparsity
int P=1; //number of signals
double epsilon = 1.0e-7; // Residual error
int numOfIterations = N; /* Max num of iterations - same as num of elements in signal */
double sign(double x){return (x>=0) - (x<0);} // Sign function
int main(int argc, char** argv)
int n, m, k, iter, q;
double normi, normf, tmp , norm=sqrt(N), htime;
gsl_matrix *D; // A random dictionary used for encoding the sparse signal NxM
gsl_vector *x; // Sparse info signal (encoder input) MxP
gsl_vector *z; // Evaluated Sparse info signal (decoder output) MxP
gsl_vector *r; // Residual error vector MxP
gsl_vector *y; // Sparse representation of signal (encoder output) NxP
gsl_vector_view v;
clock_t start; //for measuring performance
printf("\nDictionary is:NxM=%dx%d,and the signal sparsity is K=%d", N, M, K);
srand(time(NULL)); //Initialize srand
start =clock(); //Initialize clock
/* Initiallize D as a Bernoulli random dictionary */
D = gsl_matrix_alloc (N, M);
for(m=0; m<M; m++)
for(n=0; n<N; n++)
gsl_matrix_set (D, n, m, tmp); //D[n,m]=tmp
/* Create a random K-sparse info signal */
x = gsl_vector_alloc(M);
for(k=0; k<K; k++)
gsl_vector_set(x, rand()%M, 2.0*rand()/(float)RAND_MAX - 1.0); //put random values at k random positions
/* Allocate memory for solution (evaluated signal) */
z = gsl_vector_calloc(M);
/* Allocate memory for residual vector */
r = gsl_vector_calloc(M);
/* Allocate memory for the encoded signal vector (its representation) */
y = gsl_vector_alloc(N);
printf("\nTime data allocation: %f", htime);
/* Encoding the signal (x to y) */
start = clock();
gsl_blas_dgemv(CblasNoTrans, 1, D, x, 0, y); // y = Dx
printf("\nTime for encoding: %f", htime);
/* Decoding the signal */
start = clock();
normi = gsl_blas_dnrm2(y); // ||y|| (L2 norm)
epsilon = sqrt(epsilon * normi);
normf = normi;
iter = 0;
/*iterate till the computational error is small enough*/
while(normf > epsilon && iter < numOfIterations)
gsl_blas_dgemv(CblasTrans, 1, D, y, 0, r); // r=D'*y
q = gsl_blas_idamax(r); //index of max element in residual vector
tmp = gsl_vector_get(r, q); //the max element in r
gsl_vector_set(z, q, gsl_vector_get(z, q)+tmp); // z[q]=z[q]+ tmp
v=gsl_matrix_column(D, q); // choose the dictrionary's atom (coloum) with the index of largest element in r
gsl_blas_daxpy(-tmp,&v.vector,y); // y = y-tmp*v
normf = gsl_blas_dnrm2(y); // ||y|| (L2 norm)
htime = ((double)clock()-start)/CLOCKS_PER_SEC;
printf("\nTime for decoding: %f", htime);
tmp = 100.0*(normf*normf)/(normi*normi); // the error at end of algorithm
printf("\nComputation residual error: %f",tmp);
/* Check the solution (evaluated signal) against the original signal */
printf("\nSolution (first column),Reference (second column):");
getchar(); // wait for pressing a key
for(m=0; m<M; m++)
printf("\n%.3f\t%.3f", gsl_vector_get(x, m),gsl_vector_get(z, m));
normi = gsl_blas_dnrm2(x);
gsl_blas_daxpy(-1.0, x, z); // z = z-x
normf = gsl_blas_dnrm2(z); // ||z|| (L2 norm)
tmp = 100.0*(normf*normf)/(normi*normi); //final error
printf("\nSolution residual error: %f\n",tmp);
/* Memory clean up and shutdown*/
gsl_vector_free(y); gsl_vector_free(r);
gsl_vector_free(z); gsl_vector_free(x);

gsl_matrix_column(D, q).vector is an R-value. You can't take its address. You need an L-value, so assign it to a named variable first, then pass the address of that variable to the function.

If you make a more permanent home for the return value of gsl_matrix_column, (this particular) problem will go away.
Here is some simplified code that illustrates how one might capture a return value in an addressable slot:
struct _foo {
    int i;
};
struct _foo bar () {
    struct _foo result = { 5 };
    return result;
}
/* won't compile: 'lvalue required as unary & operand' */
void qux () {
    int *j = &bar().i;
}
/* compiles OK */
void qal () {
    struct _foo result = bar();
    int* j = &result.i;
}

gsl_vector_view c=gsl_matrix_column(D, q);
I think introducing a temporary variable lets you pass a pointer to it to the function.
EDIT: To understand the problem, I looked up what the functions expect:
int gsl_blas_daxpy (double alpha, const gsl_vector * x, gsl_vector * y)
gsl_vector_view gsl_matrix_column (gsl_matrix * m, size_t j)
with some explanation:
A vector view can be passed to any subroutine which takes a vector
argument just as a directly allocated vector would be, using &view.vector.
and an example:
for (j = 0; j < 10; j++)
{
    gsl_vector_view column = gsl_matrix_column (m, j);
    double d;
    d = gsl_blas_dnrm2 (&column.vector);
    printf ("matrix column %d, norm = %g\n", j, d);
}

Now we have another problem:
Are you aware that with M = 10, int K = 0.07*M stores 0.7 in an int, so K = 0?
#define M (10) // Number of columns in dictionary */
int K = 0.07*M; //Number of non-zero elements in signal - the sparsity
gsl_vector_alloc does not initialize the vector x, so x will contain garbage values, not 0. Did you mean x = gsl_vector_calloc(M); with a c? That sets x to 0.
/* Create a random K-sparse info signal */
x = gsl_vector_alloc(M);
for(k=0; k<K; k++) // K=0, so the loop is skipped and x is not modified.
gsl_vector_set(x, rand()%M, 2.0*rand()/(float)RAND_MAX - 1.0); //put random values at k random positions
(And here you will have at most K random values, but possibly fewer, because rand()%M can pick the same position twice.)
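Both points can be sketched with plain arrays instead of GSL vectors, so the example stands alone. The helper names sparsity_count and fill_k_sparse are hypothetical, and rounding the count up to at least one non-zero entry is only an assumption about what was intended:

```c
#include <assert.h>
#include <stdlib.h>

/* Number of non-zero entries for a given fraction of m.
 * Rounds up so that a small fraction never truncates to 0
 * (assumption: at least one non-zero entry is wanted). */
static int sparsity_count(double fraction, int m)
{
    int k = (int)(fraction * m);      /* truncates, e.g. 0.07*10 -> 0 */
    if ((double)k < fraction * m)
        k++;                          /* round up: 0.7 -> 1 */
    return k;
}

/* Zero x[0..m-1], then write k random values in [-1, 1] at k DISTINCT
 * positions chosen by a partial Fisher-Yates shuffle.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int fill_k_sparse(double *x, int m, int k)
{
    assert(x != NULL);
    if (m < 0 || k < 0 || k > m)
        return -1;
    int *idx = malloc((size_t)m * sizeof *idx);
    if (idx == NULL && m > 0)
        return -1;
    for (int i = 0; i < m; i++) {
        idx[i] = i;
        x[i] = 0.0;                   /* explicit zeroing, like calloc */
    }
    for (int i = 0; i < k; i++) {
        int j = i + rand() % (m - i); /* pick among the unused positions */
        int tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        x[idx[i]] = 2.0 * rand() / (double)RAND_MAX - 1.0;
    }
    free(idx);
    return 0;
}
```

With GSL the same idea applies: allocate x with gsl_vector_calloc and call gsl_vector_set once for each of the k distinct indices.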


Adding two vectors using pthreads without a global sum variable

I am trying to calculate the sum of two vectors a and b using pthreads in C. I am given a function that computes the sum in sequential form and another which does so in parallel form. My program is working properly but computing different sums when there are multiple threads. I have used proper thread synchronization on the critical area, but still cannot see where I am going wrong. I get the correct answer on the first thread since there is only one thread doing the job and then I get wrong answers on multiple threads. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
// type for value of vector element
typedef short value_t;
// type for vector dimension / indices
typedef long index_t;
// function type to combine two values
typedef value_t (*function_t)(const value_t x, const value_t y);
// struct to store the respective values of the vectors a,b and c
typedef struct{
index_t start;
index_t end;
value_t *arr;
value_t *brr;
value_t *crr;
value_t *part_sum;
pthread_mutex_t *mutex;
// function to combine two values
value_t add(const value_t x, const value_t y) {
return ((x+y)*(x-y)) % ((int)x+1) +27;
// function to initialize the vectors a,b and c
void vectorInit(index_t n, value_t a[n], value_t b[n], value_t c[n]) {
for(index_t i=0; i<n; i++) {
a[i] = (value_t)(2*i);
b[i] = (value_t)(n-i);
c[i] = 0;
// function to count the sum of two variables sequentially
value_t vectorOperation(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f) {
value_t sum = 0;
for(index_t i=0; i<n; i++) {
sum += (c[i] = f(a[i], b[i]));
return sum;
/* Thread function */
void* vector_sum(void* arg)
arg_struct *param = (arg_struct*)arg;
for(index_t i= param->start; i<param->end; i++)
*param->part_sum += vectorOperation(i,param->arr,param->brr,param->crr,add);
index_t n = param->end - param->start;
// Each thread uses the vectorOperation function to calculate the sum sequentially(Also the critical area)
*param->part_sum = *param->part_sum + vectorOperation(n,param->arr,param->brr,param->crr,add);
//*param->part_sum += vectorOperation(param->end-param->start,param->arr,param->brr,param->crr,add);
// Sum of two vectors in parallel.
value_t vectorOperationParallel(index_t n, value_t a[n], value_t b[n], value_t c[n], function_t f, int p) {
value_t sum = 0;
pthread_t threads[p];
arg_struct thread_args[p];
pthread_mutex_t mutex;
index_t div = (n+p-1)/p;
for(int i=0; i<p; i++)
thread_args[i].start = i*div;
thread_args[i].end = (i+1)*div;
thread_args[i].arr = a;
thread_args[i].brr = b;
thread_args[i].crr = c;
for(int j =0; j<div; j++)
thread_args[i].arr[j] = a[thread_args[i].start+j];
thread_args[i].brr[j] = b[thread_args[i].start+j];
thread_args[i].crr[j] = c[thread_args[i].start+j];
thread_args[i].part_sum = &sum;
thread_args[i].mutex = &mutex;
pthread_create(&threads[i],NULL,vector_sum, (void*)&thread_args[i]);
for(int i=0; i<p; i++)
return sum;
int main(int argc, char **argv)
// check for correct argument count
if (argc != 3)
printf ("usage: %s vector_size n_threads\n", argv[0]);
// get arguments
// vector size
index_t n = (index_t)atol (argv[1]);
// number of threads
int p = atoi (argv[2]);
// check for plausible values
if((p < 1) || (p > 1000)) {
printf("illegal number of threads\n");
// allocate memory
value_t *a = malloc(n * sizeof(*a));
value_t *b = malloc(n * sizeof(*b));
value_t *c = malloc(n * sizeof(*c));
if((a == NULL) || (b == NULL) || (c == NULL)) {
printf("no more memory\n");
// initialize vectors a,b,c
vectorInit(n, a, b, c);
// work on vectors sequentially
value_t c1sum = vectorOperation(n, a, b, c, add);
// work on vectors parallel for all thread counts from 1 to p
for(int thr=1; thr<= p; thr++) {
// do operation
value_t c2sum = vectorOperationParallel(n, a, b, c, add, thr);
// check result
if(c1sum != c2sum) {
printf("!!! error: vector results are not identical !!!\nsum1=%ld, sum2=%ld\n", (long)c1sum, (long)c2sum);
printf("The results are equal: sum1=%ld, sum2=%ld\n",(long)c1sum, (long)c2sum);
I am not sure this is everything, but here is what looks wrong.
First, the variable names are hard to follow.
Then, as n.m. commented:
pthread_mutex_init in a loop is probably a bad idea
You calculate index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
Later you use div * num_of_threads to distribute the elements. This way you may try to access more elements than are available.
index_t div = (elements_in_vector + num_of_threads - 1) / num_of_threads;
//(13 + 3 - 1) / 3 = 5
thread_args[i].end = (i + 1) * div; // for the last i ( = 2)
//(2 + 1) * 5 = 15
As soon as you access an index i >= 13 you read past the end of the array (undefined behaviour).
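One way to keep every thread inside the array is to clamp both bounds to the vector length. The sketch below uses a hypothetical helper, chunk_range, which is not part of the original code; for 13 elements and 3 threads it yields [0,5), [5,10) and [10,13):

```c
#include <assert.h>

/* Half-open range [start, end) of elements handled by thread i of p,
 * for a vector of n elements. The chunk size is rounded up, so both
 * bounds are clamped to n to keep the late threads inside the array. */
static void chunk_range(long n, int p, int i, long *start, long *end)
{
    assert(n >= 0 && p > 0 && i >= 0 && i < p);
    long div = (n + p - 1) / p;   /* ceil(n / p) */
    long s = (long)i * div;
    long e = s + div;
    if (s > n) s = n;             /* this thread has nothing to do */
    if (e > n) e = n;             /* the last chunk is shorter */
    *start = s;
    *end = e;
}
```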
Then you make a copy of parts of your original array (I would assume this is slower than just passing a reference to the original).
You don't seem to use the result array *thread_args[i].crr at all.
You only need the mutex for the sum of all values, since every thread works on its own part of the arrays. You could even pass pointers to the original arrays to the threads without a mutex if they did not all share the sum variable. Every addition is self-contained and does not access the memory of another addition, so no mutex is needed for it.
To calculate the sum of all values, you could use the return value of each thread instead of a reference to one value that you pass to every thread. That would be much faster.
I am not sure I found everything, but this may help you improve the code a good bit.
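A minimal sketch of the mutex-free approach follows. It uses a close variant of the return-value idea: each thread writes its partial sum into its own slot, and the main thread adds the slots up after pthread_join. The names slice_t, slice_sum and parallel_sum are hypothetical, the element operation is a plain a[i] + b[i] rather than the question's add(), and the total is kept in a long instead of value_t:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    const short *a, *b;   /* shared, read-only inputs */
    long start, end;      /* half-open range owned by this thread */
    long sum;             /* private result slot, written by one thread only */
} slice_t;

/* Each thread adds up its own slice; no shared state is modified. */
static void *slice_sum(void *arg)
{
    slice_t *s = arg;
    long acc = 0;
    for (long i = s->start; i < s->end; i++)
        acc += s->a[i] + s->b[i];
    s->sum = acc;
    return NULL;
}

/* Stores the sum of a[i] + b[i] over n elements in *out, using p threads.
 * Returns 0 on success, -1 if memory or a thread could not be obtained. */
static int parallel_sum(const short *a, const short *b, long n, int p, long *out)
{
    assert(a && b && out && n >= 0 && p > 0);
    pthread_t *tid = malloc((size_t)p * sizeof *tid);
    slice_t *sl = malloc((size_t)p * sizeof *sl);
    if (tid == NULL || sl == NULL) {
        free(tid);
        free(sl);
        return -1;
    }
    long div = (n + p - 1) / p;               /* chunk size, rounded up */
    int created = 0, rc = 0;
    for (int i = 0; i < p; i++) {
        long s = (long)i * div, e = s + div;
        sl[i].a = a;
        sl[i].b = b;
        sl[i].start = s > n ? n : s;          /* clamp to the array length */
        sl[i].end = e > n ? n : e;
        sl[i].sum = 0;
        if (pthread_create(&tid[i], NULL, slice_sum, &sl[i]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    long total = 0;
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);           /* slot is safe to read after join */
        total += sl[i].sum;
    }
    free(tid);
    free(sl);
    if (rc == 0)
        *out = total;
    return rc;
}
```

Depending on the toolchain, this may need to be compiled with -pthread.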

C Code for numerical integration works on one computer but blows up on another [closed]

I have written a code for a simple pendulum with numerical integration using rk4 method. Here's an image of expected result.
It works on my laptop, running Ubuntu 14.04, 64 bit, (it gives a sine wave as the result), but doesn't work on my PC, which runs Debian 8 and is also 64 bit.
Here's an image of the wrong plot.
Any reason why this would be happening?
Here's the code:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
int N = 2;
float h = 0.001;
struct t_y_couple {
float t;
float *y;
struct t_y_couple integrator_rk4(float dt, float t, float *p1);
void oscnetwork_opt(float t, float *y, float *dydt);
int main(void) {
/* initializations*/
struct t_y_couple t_y;
int i, iter, j;
// time span for which to run simulation
int tspan = 20;
// total number of time iterations = tspan*step_size
int tot_time = (int)ceil(tspan / h);
// Time array
float T[tot_time];
// pointer definitions
float *p, *q;
// vector to hold values for each differential variable for all time
// iterations
float Y[tot_time][2];
// N = total number of coupled differential equations to solve
// initial conditions vector for time = 0
Y[0][0] = 0;
Y[0][1] = 3.14;
// set the time array
T[0] = 0;
// This loop calls the RK4 code
for (i = 0; i < tot_time - 1; i++) {
p = &Y[i][0]; // current time
q = &Y[i + 1][0]; // next time step
// printf("\n\n");
// for (j=0;j<N;j++)
// call the RK4 integrator with current time value, and current
// values of voltage
t_y = integrator_rk4(h, T[i], p);
// Return the time output of integrator into the next iteration of time
T[i + 1] = t_y.t;
// copy the output of the integrator into the next iteration of voltage
q = memcpy(q, t_y.y, (2) * sizeof(float));
printf("%f ", T[i + 1]);
for (iter = 0; iter < N; iter++)
printf("%f ", *(p + iter));
return 0;
struct t_y_couple integrator_rk4(float dt, float t, float y[2]) {
// initialize all the pointers
float y1[2], y2[2], y3[2], yout[2];
float tout, dt_half;
float k1[2], k2[2], k3[2], k4[2];
// initialize iterator
int i;
struct t_y_couple ty1;
tout = t + dt;
dt_half = 0.5 * dt;
float addition[2];
// return the differential array into k1
oscnetwork_opt(t, y, k1);
// multiply the array k1 by dt_half
for (i = 0; i < 2; i++)
y1[i] = y[i] + (k1[i]) * dt_half;
// add k1 to each element of the array y
// do the same thing 3 times
oscnetwork_opt(t + dt_half, y1, k2);
for (i = 0; i < 2; i++)
y2[i] = y[i] + (k2[i]) * dt_half;
oscnetwork_opt(t + dt_half, y2, k3);
for (i = 0; i < 2; i++)
y3[i] = y[i] + (k3[i]) * dt; // full step: k4 is evaluated at t + dt
oscnetwork_opt(tout, y3, k4);
// Make the final additions with k1,k2,k3 and k4 according to the RK4 code
for (i = 0; i < 2; i++) {
addition[i] = ((k1[i]) + (k2[i]) * 2 + (k3[i]) * 2 + (k4[i])) * dt / 6;
// add this to the original array
for (i = 0; i < 2; i++)
yout[i] = y[i] + addition[i];
// return a struct with the current time and the updated voltage array
ty1.t = tout;
ty1.y = yout;
return ty1;
// function to return the vector with coupled differential variables for each
// time iteration
void oscnetwork_opt(float t, float y[2], float *dydt) {
int i;
dydt[0] = y[1];
dydt[1] = -(1) * sin(y[0]);
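You have a lifetime problem with the variable yout in integrator_rk4(). You assign the address of yout to ty1.y, but yout is a local array that stops existing when the function returns, and you then read it in main(). This is undefined behavior, which is why the result can differ between machines.
Quick fix: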
struct t_y_couple {
    float t;
    float y[2];
};
struct t_y_couple integrator_rk4(float dt, float t, float y[2]) {
    float y1[2], y2[2], y3[2], yout[2];
    // ...
    ty1.t = tout;
    ty1.y[0] = yout[0];
    ty1.y[1] = yout[1];
    return ty1;
}
You have a lot of useless allocation and you made "spaghetti code" with your global variable. You should not cast the return of malloc.
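Another way to avoid the dangling pointer is to let the caller own the output buffer. The following is only a sketch under assumptions: rk4_step, deriv_fn and small_angle_pendulum are hypothetical names, and the right-hand side is the small-angle approximation (theta'' = -theta) so that the block needs no math library; for the full pendulum the derivative would use -sin(y[0]) as in the question.

```c
#include <assert.h>

/* Right-hand side of y' = f(t, y) for a system of two variables. */
typedef void (*deriv_fn)(double t, const double y[2], double dydt[2]);

/* Small-angle pendulum: theta'' = -theta (stand-in for -sin(theta)). */
static void small_angle_pendulum(double t, const double y[2], double dydt[2])
{
    (void)t;            /* the system does not depend on time explicitly */
    dydt[0] = y[1];
    dydt[1] = -y[0];
}

/* One classical RK4 step. The result is written to yout, which the CALLER
 * owns, so nothing points into this function's stack frame after it
 * returns. Returns the new time t + dt. */
static double rk4_step(deriv_fn f, double t, double dt,
                       const double y[2], double yout[2])
{
    double k1[2], k2[2], k3[2], k4[2], tmp[2];
    assert(f && y && yout);

    f(t, y, k1);
    for (int i = 0; i < 2; i++) tmp[i] = y[i] + 0.5 * dt * k1[i];
    f(t + 0.5 * dt, tmp, k2);
    for (int i = 0; i < 2; i++) tmp[i] = y[i] + 0.5 * dt * k2[i];
    f(t + 0.5 * dt, tmp, k3);
    for (int i = 0; i < 2; i++) tmp[i] = y[i] + dt * k3[i];  /* full step */
    f(t + dt, tmp, k4);
    for (int i = 0; i < 2; i++)
        yout[i] = y[i] + dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    return t + dt;
}
```

In main() the call would then look like t = rk4_step(small_angle_pendulum, t, h, Y[i], Y[i + 1]); with no memcpy and no struct holding a pointer.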

Matrix multiplication with MKL

I have the CSR coordinates of a matrix.
/* alloc space for COO matrix */
int *coo_rows = (int*) malloc(K.n_rows * sizeof(int));
int *coo_cols = (int*) malloc(K.n_rows * sizeof(int));
float *coo_vals = (float*) malloc(K.n_rows * sizeof(float));
/*Load coo values*/
int *rowptrs = (int*) malloc((N_unique+1)*sizeof(int));
int *colinds = (int*) malloc(K.n_rows *sizeof(int));
double *vals = (double*) malloc(K.n_rows *sizeof(double));
/* take csr values */
int job[] = {
2, // job(1)=2 (coo->csr with sorting)
0, // job(2)=1 (one-based indexing for csr matrix)
0, // job(3)=1 (one-based indexing for coo matrix)
0, // empty
n1, // job(5)=nnz (sets nnz for csr matrix)
0 // job(6)=0 (all output arrays filled)
int info;
mkl_scsrcoo(job, &n, vals, colinds, rowptrs, &n1, coo_vals, coo_rows, coo_cols, &info);
assert(info == 0 && "Converted COO->CSR");
Now I want to apply the mkl_dcsrmm function to compute C := alpha*A*B + beta*C with beta = 0;
/* function declaration */
void mkl_dcsrmm (char *transa, MKL_INT *m, MKL_INT *n, MKL_INT *k, double *alpha, char *matdescra, double *val, MKL_INT *indx, MKL_INT *pntrb, MKL_INT *pntre, double *b, MKL_INT *ldb, double *beta, double *c, MKL_INT *ldc);
Since now I have.
int A_rows = ..., A_cols = ..., C_cols = ...
double alpha = 1.0;
mkl_dcsrmm ((char*)"N", &A_rows, &C_cols, &A_cols, &alpha, char *matdescra, vals, coo_cols, rowptrs, colinds , double *b, MKL_INT *ldb, double *beta, double *c, MKL_INT *ldc);
I found some difficulties on filling the inputs. Could you please help me to fill the rest of the inputs?
A specific input for which I want to go in more details is the matdescra. I borrowed the following code from cspblas_ccsr example
char matdescra[6];
matdescra[0] = 'g';
matdescra[1] = 'l';
matdescra[2] = 'n';
matdescra[3] = 'c';
but I have some questions about that. The matrix A I am working is not triangular and the initialization of this char array engage you to make such a declaration, how should I configure the parameters of the matdescra array?
Here is what I use, and what works for me.
char matdescra[6] = {'g', 'l', 'n', 'c', 'x', 'x'};
G: General. D: Diagonal
L/U Lower/Upper triangular (ignored with G)
N: non-unit diagonal (ignored with G)
C: zero-based indexing.
Complete Example
Here is a complete example. I first create a random matrix by filling a dense matrix with a specified density of Non-Zero elements. Then I convert it to a sparse matrix in CSR-format. Finally, I do the multiplication using mkl_dcsrmm. As a possible check (check not done), I do the same multiplication using the cblas_dgemm function with the dense matrix.
#include "mkl.h"
#include "mkl_spblas.h"
#include <stddef.h> // For NULL
#include <stdlib.h> // for rand()
#include <assert.h>
#include <stdio.h>
#include <limits.h>
// Compute C = A * B; where A is sparse and B is dense.
int main() {
MKL_INT m=10, n=5, k=11;
const double sparsity = 0.9; ///< #param sparsity Values below which are set to zero (sampled from uniform(0,1)-distribution).
double *A_dense;
double *B;
double *C;
double alpha = 1.0;
double beta = 0.0;
const int allignment = 64;
// Seed the RNG to always be the same
// Allocate memory to matrices
A_dense = (double *)mkl_malloc( m*k*sizeof( double ), allignment);
B = (double *)mkl_malloc( k*n*sizeof( double ), allignment);
C = (double *)mkl_malloc( m*n*sizeof( double ), allignment);
if (A_dense == NULL || B == NULL || C == NULL) {
printf("ERROR: Can't allocate memory for matrices. Aborting... \n\n");
return 1;
// Initializing matrix data
int i;
int nzmax = 0;
for (i = 0; i < (m*k); i++) {
    double val = rand() / (double)RAND_MAX;
    if ( val < sparsity ) {
        A_dense[i] = 0.0;
    } else {
        A_dense[i] = val;
        nzmax++; // count the non-zeros so the CSR arrays can be sized
    }
}
for (i = 0; i < (k*n); i++) {
    B[i] = rand();
}
for (i = 0; i < (m*n); i++) {
    C[i] = 0.0;
}
// Convert A to a sparse matrix in CSR format.
// INFO:
MKL_INT job[6];
job[0] = 0; // convert TO CSR.
job[1] = 0; // Zero-based indexing for input.
job[2] = 0; // Zero-based indexing for output.
job[3] = 2; // adns is a whole matrix A.
job[4] = nzmax; // Maximum number of non-zero elements allowed.
job[5] = 3; // all 3 arays are generated for output.
/* JOB: conversion parameters
* m: number of rows of A.
* k: number of columns of A.
* adns: (input/output). Array containing non-zero elements of the matrix A.
* lda: specifies the leading dimension of adns. must be at least max(1, m).
* acsr: (input/output) array containing non-zero elements of the matrix A.
* ja: array containing the column indices.
* ia length m+1, rowIndex.
* info: 0 if successful. i if interrupted at i-th row because of lack of space.
*/
int info = -1;
printf("nzmax:\t %d\n", nzmax);
double *A_sparse = mkl_malloc(nzmax * sizeof(double), allignment);
if (A_sparse == NULL) {
printf("ERROR: Could not allocate enough space to A_sparse.\n");
return 1;
MKL_INT *A_sparse_cols = mkl_malloc(nzmax * sizeof(MKL_INT), allignment);
if (A_sparse_cols == NULL) {
printf("ERROR: Could not allocate enough space to A_sparse_cols.\n");
return 1;
MKL_INT *A_sparse_rowInd = mkl_malloc((m+1) * sizeof(MKL_INT), allignment);
if (A_sparse_rowInd == NULL) {
printf("ERROR: Could not allocate enough space to A_sparse_rowInd.\n");
return 1;
mkl_ddnscsr(job, &m, &k, A_dense, &k, A_sparse, A_sparse_cols, A_sparse_rowInd, &info);
if(info != 0) {
printf("WARNING: info=%d, expected 0.\n", info);
assert(info == 0);
char transa = 'n';
MKL_INT ldb = n, ldc=n;
char matdescra[6] = {'g', 'l', 'n', 'c', 'x', 'x'};
// G: General. D: Diagonal
// L/U Lower/Upper triangular (ignored with G)
// N: non-unit diagonal (ignored with G)
// C: zero-based indexing.
// The fourth argument is the number of columns of A (k), not m.
mkl_dcsrmm(&transa, &m, &n, &k, &alpha, matdescra, A_sparse, A_sparse_cols,
A_sparse_rowInd, &(A_sparse_rowInd[1]), B, &ldb, &beta, C, &ldc);
// The same computation in dense format
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
m, n, k, alpha, A_dense, k, B, n, beta, C, n);
return 0;
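To see what the val, indx, pntrb and pntre arguments mean, here is a plain-C sketch of the same product for the zero-based, general-matrix case with alpha = 1 and beta = 0. It only illustrates the data layout and is not how MKL implements the routine; csr_mm is a hypothetical name.

```c
#include <assert.h>

/* C = A * B, with A (m rows) in zero-based CSR form and B, C dense and
 * row-major with n columns. pntrb[i] and pntre[i] are the start and
 * one-past-the-end positions of row i inside val[] and indx[]; for the
 * usual 3-array CSR layout they are simply rowptr and rowptr + 1. */
static void csr_mm(int m, int n,
                   const double *val, const int *indx,
                   const int *pntrb, const int *pntre,
                   const double *b, double *c)
{
    assert(val && indx && pntrb && pntre && b && c);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++)
            c[i * n + j] = 0.0;                 /* beta = 0 */
        for (int p = pntrb[i]; p < pntre[i]; p++) {
            double a_ip = val[p];               /* A[i][indx[p]] */
            const double *b_row = b + indx[p] * n;
            for (int j = 0; j < n; j++)
                c[i * n + j] += a_ip * b_row[j];
        }
    }
}
```

This is why the MKL call passes the row-pointer array for pntrb and the same array shifted by one element for pntre.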

C pthread Segmentation fault

so I was trying to make a GPGPU emulator with c & pthreads but ran into a rather strange problem which I have no idea why its occurring. The code is as below:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
// simplifies malloc
#define MALLOC(a) (a *)malloc(sizeof(a))
// Index of x/y coordinate
#define x (0)
#define y (1)
// Defines size of a block
#define BLOCK_DIM_X (3)
#define BLOCK_DIM_Y (2)
// Defines size of the grid, i.e., how many blocks
#define GRID_DIM_X (5)
#define GRID_DIM_Y (7)
// Defines the number of threads in the grid
#define GRID_SIZE (BLOCK_DIM_X * BLOCK_DIM_Y * GRID_DIM_X * GRID_DIM_Y)
// execution environment for the kernel
typedef struct exec_env {
int threadIdx[2]; // thread location
int blockIdx[2];
int blockDim[2];
int gridDim[2];
float *A,*B; // parameters for the thread
float *C;
} exec_env;
// kernel
void *kernel(void *arg)
exec_env *env = (exec_env *) arg;
// compute number of threads in a block
int sz = env->blockDim[x] * env->blockDim[y];
// compute the index of the first thread in the block
int k = sz * (env->blockIdx[y]*env->gridDim[x] + env->blockIdx[x]);
// compute the index of a thread inside a block
k = k + env->threadIdx[y]*env->blockDim[x] + env->threadIdx[x];
// check whether it is in range
assert(k >= 0 && k < GRID_SIZE && "Wrong index computation");
// print coordinates in block and grid and computed index
/*printf("tx:%d ty:%d bx:%d by:%d idx:%d\n",env->threadIdx[x],
env->blockIdx[y], k);
// retrieve two operands
float *A = &env->A[k];
float *B = &env->B[k];
printf("%f %f \n",*A, *B);
// retrieve pointer to result
float *C = &env->C[k];
// do actual computation here !!!
// For assignment replace the following line with
// the code to do matrix addition and multiplication.
*C = *A + *B;
// free execution environment (not needed anymore)
return NULL;
// main function
int main(int argc, char **argv)
float A[GRID_SIZE] = {-1};
float B[GRID_SIZE] = {-1};
float C[GRID_SIZE] = {-1};
pthread_t threads[GRID_SIZE];
int i=0, bx, by, tx, ty;
//Error location
/*for (i = 0; i < GRID_SIZE;i++){
A[i] = i;
B[i] = i+1;
printf("%f %f\n ", A[i], B[i]);
// Step 1: create execution environment for threads and create thread
for (bx=0;bx<GRID_DIM_X;bx++) {
for (by=0;by<GRID_DIM_Y;by++) {
for (tx=0;tx<BLOCK_DIM_X;tx++) {
for (ty=0;ty<BLOCK_DIM_Y;ty++) {
exec_env *e = MALLOC(exec_env);
assert(e != NULL && "memory exhausted");
// set parameters
e->A = A;
e->B = B;
e->C = C;
// create thread
pthread_create(&threads[i++],NULL,kernel,(void *)e);
// Step 2: wait for completion of all threads
for (i=0;i<GRID_SIZE;i++) {
pthread_join(threads[i], NULL);
// Step 3: print result
for (i=0;i<GRID_SIZE;i++) {
printf("%f ",C[i]);
return 0;
Ok this code here runs fine, but as soon as I uncomment the "Error Location" (for loop which assigns A[i] = i and B[i] = i + 1, I get snapped by a segmentation fault in unix, and by these random 0s within C in cygwin. I must admit my fundamentals in C is pretty poor, so it may be highly likely that I missed something. If someone can give an idea on what's going wrong it'd be greatly appreciated. Thanks.
It works when you comment that loop out because i is still 0 when the 4 nested loops start.
You have this:
for (i = 0; i < GRID_SIZE;i++){
    A[i] = i;
    B[i] = i+1;
    printf("%f %f\n ", A[i], B[i]);
}
/* What value is `i` now ? */
And then
pthread_create(&threads[i++],NULL,kernel,(void *)e);
So after the first loop i equals GRID_SIZE, and pthread_create writes to threads[GRID_SIZE] and beyond, past the end of the array. Reset i to 0 before the nested loops.
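The simplest fix is to set i = 0 before the nested loops. An alternative that removes the shared counter altogether is to derive the slot in threads[] from the loop coordinates, using the same formula as the kernel's index k. In this sketch thread_slot is a hypothetical helper, and GRID_SIZE is assumed to be the product of the four dimensions:

```c
#include <assert.h>

#define BLOCK_DIM_X (3)
#define BLOCK_DIM_Y (2)
#define GRID_DIM_X  (5)
#define GRID_DIM_Y  (7)
/* Assumption: total number of threads in the grid. */
#define GRID_SIZE   (BLOCK_DIM_X * BLOCK_DIM_Y * GRID_DIM_X * GRID_DIM_Y)

/* Slot in threads[] for the thread at block (bx, by), position (tx, ty).
 * Same formula as the kernel's index k, so no running counter is needed. */
static int thread_slot(int bx, int by, int tx, int ty)
{
    int per_block = BLOCK_DIM_X * BLOCK_DIM_Y;
    int slot = per_block * (by * GRID_DIM_X + bx) + ty * BLOCK_DIM_X + tx;
    assert(slot >= 0 && slot < GRID_SIZE);
    return slot;
}
```

The create call would then read pthread_create(&threads[thread_slot(bx, by, tx, ty)], NULL, kernel, (void *)e); and stays correct no matter what i was used for earlier.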

C File Input/Trapezoid Rule Program

This is a bit of a two-parter. I'm trying to do this entirely in C. First, here is my program:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
#include <string.h>
double f(double x);
void Trap(double a, double b, int n, double* integral_p);
int main(int argc, char* argv[]) {
double integral=0.0; //Integral Result
double a=6, b=10; //Left and Right Points
int n; //Number of Trapezoids (Higher=more accurate)
int degree;
if (argc != 3) {
printf("Error: Invalid Command Line arguements, format:./trapezoid N filename");
n = atoi(argv[2]);
FILE *fp = fopen( argv[1], "r" );
# pragma omp parallel
Trap(a, b, n, &integral);
printf("With n = %d trapezoids....\n", n);
printf("of the integral from %f to %f = %.15e\n",a, b, integral);
return 0;
double f(double x) {
double return_val;
return_val = pow(3.0*x,5)+pow(2.5*x,4)+pow(-1.5*x,3)+pow(0*x,2)+pow(1.7*x,1)+4;
return return_val;
void Trap(double a, double b, int n, double* integral_p) {
double h, x, my_integral;
double local_a, local_b;
int i, local_n;
int my_rank = omp_get_thread_num();
int thread_count = omp_get_num_threads();
h = (b-a)/n;
local_n = n/thread_count;
local_a = a + my_rank*local_n*h;
local_b = local_a + local_n*h;
my_integral = (f(local_a) + f(local_b))/2.0;
for (i = 1; i <= local_n-1; i++) {
x = local_a + i*h;
my_integral += f(x);
my_integral = my_integral*h;
# pragma omp critical
*integral_p += my_integral;
As you can see, it calculates trapezoidal rule given an interval.
First of all it DOES work, if you hardcode the values and the function. But I need to read from a file in the format of
3.0 2.5 -1.5 0.0 1.7 4.0
6 10
Which means:
It is of degree 5 (no more than 50 ever)
3.0x^5 +2.5x^4 −1.5x^3 +1.7x+4 is the polynomial (we skip ^2 since it's 0)
and the Interval is from 6 to 10
My main concern is the f(x) function, which I have hardcoded. I have no idea how to make it handle up to degree 50 other than literally typing out 50 pow calls and reading in the coefficients. Does anyone have any ideas?
Also, what would be the best way to read in the file? fgetc? I'm not sure how to read input in C, especially since everything I read in comes back as an int. Is there some way to convert the values?
For a large degree polynomial, would something like this work?
double f(double x, double coeff[], int nCoeff)
{
    double return_val = 0.0;
    int exponent = nCoeff-1;
    int i;
    for(i=0; i<nCoeff-1; ++i, --exponent)
        return_val = coeff[i]*pow(x, exponent) + return_val; /* coefficient times x^exponent */
    /* add on the final constant, 4, in our example */
    return return_val + coeff[nCoeff-1];
}
In your example, you would call it like:
double coefficients[] = {3.0, 2.5, -1.5, 0, 1.7, 4};
/* This expresses 3x^5 + 2.5x^4 - 1.5x^3 + 0x^2 + 1.7x + 4 */
my_integral = f(x, coefficients, 6);
By passing an array of coefficients (the exponents are assumed), you don't have to deal with variadic arguments. The hardest part is constructing the array, and that is pretty simple.
It should go without saying, if you put the coefficients array and number-of-coefficients into global variables, then the signature of f(x) doesn't need to change:
double f(double x)
// access glbl_coeff and glbl_NumOfCoeffs, instead of parameters
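The coefficient array also makes it possible to drop pow() entirely. The sketch below evaluates the polynomial with Horner's rule, which is a different technique from the loop above; poly_eval is a hypothetical name, and the coefficients are assumed to be ordered from the highest power down to the constant, as in the input file.

```c
#include <assert.h>

/* Evaluate c[0]*x^(n-1) + c[1]*x^(n-2) + ... + c[n-1] by Horner's rule:
 * one multiply and one add per coefficient, no pow() calls. */
static double poly_eval(const double *coeff, int n_coeff, double x)
{
    assert(coeff != 0 && n_coeff > 0);
    double result = coeff[0];
    for (int i = 1; i < n_coeff; i++)
        result = result * x + coeff[i];
    return result;
}
```

For the example file this would be called as poly_eval(coefficients, 6, x), and a degree-50 polynomial only needs a 51-element array.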
For your f() function, consider making it variadic (varargs is another name).
This way you could pass the function one argument telling it how many "pows" you want, with each subsequent argument being a double value. Is this what you are asking for with the f() function part of your question?
