I have been getting this error for more than 6 hours now when trying to compile C code with the -fopenmp flag using gcc:
error: invalid controlling predicate
for ( int i = 0; i < N; i++ )
I browsed stackoverflow and I stripped down my code up till the point where it is an exact copy of an example from an OpenMP handbook, but still it doesn't compile.
#include <stdio.h>
#include <math.h>
#ifdef _OPENMP
#include <omp.h>
#endif
int main(int argc, char *argv[]) {
double N; sscanf (argv[1]," %lf", &N);
double integral = 0.0;
#pragma omp parallel for reduction(+: integral)
for ( int i = 0; i < N; i++ )
integral = integral + i;
printf("%20.18lf\n", integral);
return 0;
}
Any suggestions..?
Found it, sorry for the clutter..
To all other C newbies like myself: the error was in the double N. OpenMP wants your loop to run up to an INTEGER N, not a double.
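A minimal sketch of that fix, assuming the count is read into a long with strtol instead of into a double with sscanf. The names sum_below and parse_count are hypothetical helpers, not part of the original program. With an integer loop variable and an integer bound, GCC should accept the loop under -fopenmp.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sum 0 + 1 + ... + (n - 1). Both the loop variable and the bound are
   integers, so the loop can be used with "omp parallel for". */
double sum_below(long n)
{
    double integral = 0.0;
    #pragma omp parallel for reduction(+: integral)
    for (long i = 0; i < n; i++)
        integral += (double)i;
    return integral;
}

/* Hypothetical helper: parse a non-negative count from a string.
   Returns -1 if the string is not a valid non-negative integer. */
long parse_count(const char *s)
{
    char *end;
    long n = strtol(s, &end, 10);
    if (end == s || *end != '\0' || n < 0)
        return -1;
    return n;
}
```

In main you would call parse_count(argv[1]) after checking argc, then print sum_below(n). Without -fopenmp the pragma is ignored and the function runs serially.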
I'm writing a simple example to understand how things work in OpenMP programs.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <omp.h>
int main (int argc ,char* argv[]){
omp_set_num_threads(4);
int j =0;
#pragma omp parallel private (j)
{
int i;
for(i=1;i<2;i++){
printf("from thread %d : i is equel to %d and j is equal to %d\n ",omp_get_thread_num(),i,j);
}
}
}
So in this example I should get j=0 each time,
unfortunately the result is j == 0 3 times , and j == 32707 one time.
What is wrong with my example?
Use firstprivate(j) rather than private(j) if you want each thread to have a private copy of j whose initial value is the value j had before entering the parallel region. With private(j), each thread's copy is uninitialized.
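A small sketch of the difference, assuming a hypothetical helper count_threads_seeing that counts how many threads observe the initial value. With firstprivate every thread should see it, so the returned count should equal the number of threads.

```c
#include <stdio.h>

/* Returns how many threads saw j equal to j_init inside the parallel
   region, and stores the number of threads in *nthreads_out.
   firstprivate(j) gives each thread its own copy of j, initialized
   from the value j had before the region. */
int count_threads_seeing(int j_init, int *nthreads_out)
{
    int j = j_init;
    int seen = 0;
    int nthreads = 0;

    #pragma omp parallel firstprivate(j) reduction(+: seen, nthreads)
    {
        nthreads += 1;          /* one contribution per thread */
        if (j == j_init)
            seen += 1;          /* this thread's copy was initialized */
    }

    *nthreads_out = nthreads;
    return seen;
}
```

Replacing firstprivate(j) with private(j) in this sketch would leave each thread's j indeterminate, which is what produced the stray 32707 in the question.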
I noticed that two different ways of initializing an array in C seem to result in very different running times after compiling with O3 optimization. Here is a minimal (albeit meaningless) example to replicate the difference:
#include <stdio.h>
#include <time.h>
int main(void) {
int i, j, k;
int size=10000;
int a[size];
clock_t time1 = clock();
for (i=0; i<size; i++) {
for (j=0; j<300000; j++) {
for (k=0; k<700000; k++) {
a[i] = j+k;
}
}
}
clock_t time2 = clock();
double time = (double)(time2-time1)/CLOCKS_PER_SEC*1000.0;
printf("%f\n", time);
getchar();
return 0;
}
Compile this program with gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 with O3 optimization turned on. This program takes about 0.02s to finish on my computer.
Now, change the array initialization from "int a[size];" to "static int a[10000];" and keep everything else the same. Again compile with the same environment and O3 optimization. This time, the program runs for about 0.001s.
Can anyone explain why there is such a difference? Thanks!
I think this largely depends on the compiler. My GCC 5.4 completely removes the loop when static is present, probably because it can figure out that the computations have no side effects ("dead code elimination"). For some reason it fails to do so when a VLA is present (that's a missed optimization).
As a side note, to reliably measure performance you need to prevent the compiler from optimizing too much. In your case I'd suggest separating array creation from the computation, e.g. like this:
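#include <stdio.h>
#include <time.h>
/* noinline and noclone are GCC attributes; they keep the compiler from
   specializing the loops for the array and size known in main. */
void __attribute__((noinline, noclone)) benchmark(int *a, int size) {
    int i, j, k;  /* loop counters must be declared in this function */
    for (i=0; i<size; i++)
        for (j=0; j<300000; j++)
            for (k=0; k<700000; k++)
                a[i] = j+k;
}
int main(void) {
    int size=10000;
    int a[size];
    clock_t time1 = clock();
    benchmark(a, size);
    clock_t time2 = clock();
    /* elapsed CPU time in milliseconds */
    double time = (double)(time2-time1)/CLOCKS_PER_SEC*1000.0;
    printf("%f\n", time);
    getchar();
    return 0;
}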
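Another common way to keep the work from being discarded is to actually use the result. The sketch below is an assumption-laden variant, not the answerer's code: fill_and_sum is a hypothetical function that returns a checksum of the array, so the stores feed a value the program later prints and cannot simply be treated as dead. The compiler may still simplify the inner loop.

```c
#include <stdio.h>
#include <time.h>

/* Fill a[0..size-1] and return a checksum of the final contents.
   inner must be at least 1 so that every element is written. */
long fill_and_sum(int *a, int size, int inner)
{
    long sum = 0;
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < inner; j++)
            a[i] = i + j;       /* last store leaves a[i] == i + inner - 1 */
        sum += a[i];            /* consume the value that was stored */
    }
    return sum;
}
```

In main you would take clock() before and after the call and print both the elapsed time and the returned checksum.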
I have a function, which adds the given arguments and prints the result.
With integer numbers, there were no problems at all. Used atoi to change string argument -> int.
e.g. : ./main 3 4 5 will print 12.
But what if I have ./main 4.5 6 5.5? How do I do something like this in C? How can the function "see" that it now has to convert the arguments to float?
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char* argv[] )
{
int i , sum = 0;
for(i=1; i < (argc); ++i)
sum += atol(argv[i]);
printf("%d\n", sum);
return 0;
}
In C, there is no function overloading as in C++, so you should use atof, like this:
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char* argv[] )
{
int i;
double sum = 0;
for(i = 1; i < (argc); ++i)
sum += atof(argv[i]);
printf("%f\n", sum);
return 0;
}
to treat numbers as reals, not integers.
Output:
gsamaras@gsamaras-A15:~$ ./a.out 4.5 6 5.5
16.000000
since now 6 is treated like 6.0.
You might want to read this as well: How to convert string to float?
I have tested the code below. It prints the floating-point sum with 2 decimal places.
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char* argv[] )
{
int i;
double sum = 0;
for(i=1; i<argc; i++)
sum += atof(argv[i]);
printf("%.2f\n", sum);
return 0;
}
You should use double to store floating point numbers, atof to parse strings and the %f printf specifier.
strtod also works. It is declared in <stdlib.h>, so with that header included (as in the code below) there should be no implicit declaration warning:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
int i;
double sum=0;
for(i=1; i < argc; ++i)
sum += strtod(argv[i],NULL);
printf("%f\n", sum);
return 0;
}
The manual also states the following as an issue with using atoi():
The atoi() function converts the initial portion of the string pointed to by nptr to int.
The behavior is the same as
strtol(nptr, (char **) NULL, 10);
except that atoi() does not detect errors.
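The same caveat applies to atof(). A minimal sketch of error detection with strtod follows, assuming two hypothetical helpers, parse_double and sum_args, that are not part of any answer above. It uses the end pointer to reject empty or partially numeric input and errno to catch out-of-range values.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a double.
   Returns 1 and stores the value in *out on success, 0 on failure. */
int parse_double(const char *s, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(s, &end);
    if (end == s)
        return 0;               /* no conversion was performed */
    if (*end != '\0')
        return 0;               /* trailing characters after the number */
    if (errno == ERANGE)
        return 0;               /* value out of range for double */
    *out = v;
    return 1;
}

/* Hypothetical helper: sum n strings; returns 0 if any of them is invalid. */
int sum_args(int n, char *args[], double *sum)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double v;
        if (!parse_double(args[i], &v))
            return 0;
        total += v;
    }
    *sum = total;
    return 1;
}
```

From main you would call sum_args(argc - 1, argv + 1, &sum) and print an error message when it returns 0.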
I wrote some C code that I would like to parallelize using OpenMP (I am a beginner and I have just a few days to solve this task).
Let's start from main. First I initialize 6 vectors (Vx,Vy,Vz,thetap,phip,theta). Then there is a for loop that cycles up to Nmax. Inside this loop I allocate memory for an array of the structure defined at the very top of the code; the array is called coll_CPU and its size increases every cycle. I then pick some of the values from the vectors mentioned above and place them into the structure, so at this point coll_CPU is filled with Ncoll elements. During this process I use some functions declared outside of main (these functions are random number generators).
Now comes the important part: in my serial code I use a for loop to pass every single element of the structure to a function called CollisioniCPU (this function just takes the inputs and multiplies them by 2). My goal is to parallelize this loop so that each of my CPUs contributes to this operation and speeds up the process.
Here is the code:
main.c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <memory.h>
#include <string.h>
#include <time.h>
#include <omp.h>
#define pi2 6.283185307
#define pi 3.141592654
#define IMUL(a,b) __mul24(a,b)
typedef struct {
int seme;
} iniran;
typedef struct{
int jp1;
int jp2;
float kx;
float ky;
float kz;
float vAx;
float vAy;
float vAz;
float vBx;
float vBy;
float vBz;
float tetaAp;
float phiAp;
float tetaA;
float tetaBp;
float phiBp;
float tetaB;
float kAx;
float kAy;
float kAz;
float kBx;
float kBy;
float kBz;
int caso;
} stato_struct;
stato_struct *coll_CPU=0;
unsigned int timer;
#include "DSMC_kernel_float.c"
//=============================================================
float min(float *a, float*b){
if(*a<*b){
return *a;
}
else{
return *b;
}
}
//=============================================================
float max(float *a, float*b){
if(*a>*b){
return *a;
}
else{
return *b;
}
}
//=============================================================
float rf(int *idum){
static int iff=0;
static int inext, inextp, ma[55];
int mj, mk;
int i, k, ii;
float ret_val;
if (*idum<0 || iff==0) {
iff=1;
mj=161803398 - abs(*idum);
mj %= 1000000000;
ma[54]=mj;
mk=1;
for (i=1; i<=54; ++i){
ii=(i*21)%55;
ma[ii-1]=mk;
mk=mj-mk;
if (mk<0) {
mk += 1000000000;
}
mj= ma[ii-1];
}
for(k=1; k<=4; ++k) {
for(i=1; i<=55; ++i){
ma[i-1] -= ma[(i+30)%55];
if (ma[i-1]<0){
ma[i-1] += 1000000000;
}
}
}
inext=0;
inextp=31;
*idum=1;
}
++inext;
if (inext==56){
inext=1;
}
++inextp;
if (inextp==56){
inextp=1;
}
mj=ma[inext-1]-ma[inextp-1];
if (mj<0){
mj += 1000000000;
}
ma[inext-1]=mj;
ret_val=mj*1.0000000000000001e-9;
return ret_val;
}
//============================================================
int genk(float *kx, float *ky, float *kz, int *p2seme){
// float sqrtf(float), sinf(float), cosf(float);
extern float rf(int *);
static float phi;
*kx=rf(p2seme) * 2. -1.f;
*ky= sqrtf(1. - *kx * *kx);
phi=pi2*rf(p2seme);
*kz=*ky * sinf(phi);
*ky *= cosf(phi);
return 0;
}
//==============================================================
int main(void){
float msec_kernel;
int Np=10000, Nmax=512;
int id,jp,jcoll,Ncoll,jp1, jp2, ind;
float Vx[Np],Vy[Np],Vz[Np],teta[Np],tetap[Np],phip[Np];
float kx, ky, kz, Vrx, Vry, Vrz, scalprod, fk;
float kAx, kAy, kAz, kBx, kBy, kBz;
iniran1.seme=7593;
for(jp=1;jp<=Np;jp++){
if(jp<=Np/2){
Vx[jp-1]=2.5;
Vy[jp-1]=0;
Vz[jp-1]=0;
tetap[jp-1]=0;
phip[jp-1]=0;
teta[jp-1]=0;
}
for (Ncoll=1;Ncoll<=Nmax;Ncoll += 10){
coll_CPU=(stato_struct*) malloc(Ncoll*sizeof(stato_struct));
jcoll=0;
while (jcoll<Ncoll){
jp1=1+floorf(Np*rf(&iniran1.seme));
jp2=1+floorf(Np*rf(&iniran1.seme));
genk(&kx,&ky,&kz,&iniran1.seme);
Vrx=Vx[jp2-1]-Vx[jp1-1];
Vry=Vy[jp2-1]-Vy[jp1-1];
Vrz=Vz[jp2-1]-Vz[jp1-1];
scalprod=Vrx*kx+Vry*ky+Vrz*kz;
if (scalprod<0) {
genk(&kAx,&kAy,&kAz,&iniran1.seme);
genk(&kBx,&kBy,&kBz,&iniran1.seme);
coll_CPU[jcoll].jp1= jp1;
coll_CPU[jcoll].jp2=jp2;
coll_CPU[jcoll].kx=kx;
coll_CPU[jcoll].ky=ky;
coll_CPU[jcoll].kz=kz;
coll_CPU[jcoll].vAx=Vx[jp1-1];
coll_CPU[jcoll].vAy=Vy[jp1-1];
coll_CPU[jcoll].vAz=Vz[jp1-1];
coll_CPU[jcoll].vBx=Vx[jp2-1];
coll_CPU[jcoll].vBy=Vy[jp2-1];
coll_CPU[jcoll].vBz=Vz[jp2-1];
coll_CPU[jcoll].tetaAp=tetap[jp1-1];
coll_CPU[jcoll].phiAp=phip[jp1-1];
coll_CPU[jcoll].tetaA=teta[jp1-1];
coll_CPU[jcoll].tetaBp=tetap[jp2-1];
coll_CPU[jcoll].phiBp=phip[jp2-1];
coll_CPU[jcoll].tetaB=teta[jp2-1];
coll_CPU[jcoll].kAx=kAx;
coll_CPU[jcoll].kAy=kAy;
coll_CPU[jcoll].kAz=kAz;
coll_CPU[jcoll].kBx=kBx;
coll_CPU[jcoll].kBy=kBy;
coll_CPU[jcoll].kBz=kBz;
coll_CPU[jcoll].caso=1;
jcoll++;
}
}
clock_t t;
t = clock();
#pragma omp parallel for private(id) //HERE IS WHERE I TRIED TO DO THE PARALLELIZATION BUT WITH NO SUCCESS. WHAT DO I HAVE TO TYPE INSTEAD???
for(id=0;id<Nmax;id++){
CollisioniCPU(coll_CPU,id);
}
t = clock() - t;
msec_kernel = ((float)t*1000)/CLOCKS_PER_SEC;
printf("Tempo esecuzione kernel:%e s\n",msec_kernel*1e-03);
for (ind=0;ind<Ncoll;ind++){
if (coll_CPU[ind].caso==4)
Ncoll_eff++;
else if (coll_CPU[ind].caso==0)
Ncoll_div++;
else
Ncoll_dim++;
}
free(coll_CPU);
}
return 0;
}
DSMC_kernel_float.c
void CollisioniCPU(stato_struct *coll_CPU, int id){
float vettA[6], vettB[6];
vettA[0]=coll_CPU[id].vAx;
vettA[1]=coll_CPU[id].vAy;
vettA[2]=coll_CPU[id].vAz;
vettA[3]=coll_CPU[id].tetaAp;
vettA[4]=coll_CPU[id].phiAp;
vettA[5]=coll_CPU[id].tetaA;
vettB[0]=coll_CPU[id].vBx;
vettB[1]=coll_CPU[id].vBy;
vettB[2]=coll_CPU[id].vBz;
vettB[3]=coll_CPU[id].tetaBp;
vettB[4]=coll_CPU[id].phiBp;
vettB[5]=coll_CPU[id].tetaB;
coll_CPU[id].vAx=2*vettA[0];
coll_CPU[id].vAy=2*vettA[1];
coll_CPU[id].vAz=2*vettA[2];
coll_CPU[id].tetaAp=2*vettA[3];
coll_CPU[id].phiAp=2*vettA[4];
coll_CPU[id].tetaA=2*vettA[5];
coll_CPU[id].vBx=2*vettB[0];
coll_CPU[id].vBy=2*vettB[1];
coll_CPU[id].vBz=2*vettB[2];
coll_CPU[id].tetaBp=2*vettB[3];
coll_CPU[id].phiBp=2*vettB[4];
coll_CPU[id].tetaB=2*vettB[5];
}
To compile the program I type this line in the terminal: gcc -fopenmp time_analysis.c -o time_analysis -lm, followed by export OMP_NUM_THREADS=1. However, once I run the executable I get this error message:
Error in `./time_analysis': double free or corruption (!prev): 0x00000000009602c0 ***
Aborted
What does this error mean? What have I done wrong in the main function when I tried to parallelize the for loop? And most important: what should I type instead to make my code run in parallel? Please help me out if you can, because I seriously have no time to study OpenMP from scratch and I need to get this job done right away.
Changing the inner loop as follows should bring you one step further.
#pragma omp parallel for private(id)
for(id=0;id<Ncoll;id++){
CollisioniCPU(coll_CPU,id);
}
Your OpenMP line seems okay, but I doubt that it will lead to significant improvements in runtime. You should optimize the surrounding code as well. Allocating the memory once outside of your loops would be a good start.
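The reason the loop bound matters: coll_CPU is allocated with Ncoll elements, but the original loop runs up to Nmax, so for most iterations of the outer loop it writes past the end of the allocation. That kind of heap overrun is consistent with the "double free or corruption" abort. Below is a rough sketch of both suggestions (loop over the allocated count, allocate once), using a hypothetical reduced struct stato_small and hypothetical functions collide, process_all and run_batches in place of the original code.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-in for stato_struct (three fields only). */
typedef struct {
    float vAx, vAy, vAz;
} stato_small;

/* Same idea as CollisioniCPU: double the fields of element id. */
static void collide(stato_small *c, int id)
{
    c[id].vAx *= 2;
    c[id].vAy *= 2;
    c[id].vAz *= 2;
}

/* Process exactly the n elements that exist; each iteration touches
   only its own element, so the iterations are independent. */
void process_all(stato_small *c, int n)
{
    int id;
    #pragma omp parallel for
    for (id = 0; id < n; id++)
        collide(c, id);
}

/* Allocate once for the largest batch and reuse the buffer.
   Returns 0 on success, -1 if the allocation fails. */
int run_batches(int nmax)
{
    stato_small *buf = malloc((size_t)nmax * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (int ncoll = 1; ncoll <= nmax; ncoll += 10) {
        for (int k = 0; k < ncoll; k++)     /* placeholder initialization */
            buf[k].vAx = buf[k].vAy = buf[k].vAz = 1.0f;
        process_all(buf, ncoll);
    }
    free(buf);
    return 0;
}
```

The loop variable of an "omp parallel for" is private automatically, so the private(id) clause is harmless but not required.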
By the way, is there any reason for this verbose coding style, rather than a more compact and readable version like this one?
void CollisioniCPU(stato_struct *coll_CPU, int id) {
stato_struct *ptr = coll_CPU + id;
ptr->vAx *= 2;
ptr->vAy *= 2;
ptr->vAz *= 2;
ptr->tetaAp *= 2;
ptr->phiAp *= 2;
ptr->tetaA *= 2;
ptr->vBx *= 2;
ptr->vBy *= 2;
ptr->vBz *= 2;
ptr->tetaBp *= 2;
ptr->phiBp *= 2;
ptr->tetaB *= 2;
}
I'm a first grader when it comes to C and need help with storing 5 random values in an array and outputting them. Here's where I'm at.
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
struct score_card {int A_ones; int B_twos; int C_threes; int D_fours; int E_fives; int F_sixes; int G_chance;};
int dice_rolls[5];
int randomize(void);
int value;
int main(void) {
struct score_card test;
randomize;
int i;
for(i = 0; i <= 4; i++){
printf("%d\n", dice_rolls[i]);
}
printf("\n");
return 0;
}
int randomize(void){
int i;
srand(time(0));
for(i = 0; i <= 4; i++){
value = rand() % 6 + 1;
dice_rolls[i] = value;
}
}
The output is :
6294304
6294308
6294312
6294316
6294320
The goal was to use the modulo operator to get values from 1 to 6 and store them in the dice_rolls array.
I see two immediate problems.
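First, terminate each number with a newline so the values are not strung together, and pass the element itself rather than its address. The numbers in your output increase by 4, which is what the addresses of consecutive int elements look like. The output line should be:
printf("%d\n", dice_rolls[i]);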
Secondly, you're not actually calling randomize. The correct way to call it is with:
randomize();
The statement randomize; is simply an expression giving you the address of the function. It's as useless in this case as the expression 42; which also does nothing. However it's valid C so the compiler doesn't necessarily complain.
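Putting both points together, here is a minimal sketch of the rolling part. The function roll_dice and the constant NUM_DICE are hypothetical names for illustration. It returns void because it has nothing to return, and it takes the array as a parameter instead of relying on a global.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_DICE 5  /* hypothetical name for the number of dice */

/* Fill rolls[0..n-1] with values from 1 to 6.
   rand() % 6 yields 0..5, so adding 1 gives 1..6. */
void roll_dice(int rolls[], int n)
{
    for (int i = 0; i < n; i++)
        rolls[i] = rand() % 6 + 1;
}
```

In main, call srand((unsigned)time(NULL)) once, then roll_dice(dice_rolls, NUM_DICE) with the parentheses, and print each dice_rolls[i] with "%d\n".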