I am having an issue with gprof while using WSL1 and gcc on Ubuntu. The only column it fills in is "calls"; everything else is shown as 0.00. I do not think this is because the program runs too fast, because typing time ./go returns:
real = 0.569s; user = 0.547s; sys = 0.000s.
The main program (go.c) is:
#include <stdio.h>
#include "functions.h"
#define maxloop 1e7
int main(int argc, char*argv[]) {
int i;
double x;
double xsum = 0.0;
for (i = 1; i < maxloop; i++) {
x = myFun1(i) + myFun2(i) + myFun3(i);
xsum += x;
}
printf("xsum = %.6f\n", xsum);
return 0;
}
The file with the functions (functions.c) is:
#include <math.h>
double myFun1(double x) {
double a = sin(x);
return a;
}
double myFun2(double x){
double a = pow(x,3);
return a;
}
double myFun3(double x){
double a = sqrt(x);
return a;
}
The header (functions.h) is:
double myFun1(double x);
double myFun2(double x);
double myFun3(double x);
I am compiling in the terminal as:
gcc -pg -o go go.c functions.c -lm
Running the gprof as:
gprof ./go -p -b
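Upgrade to WSL2 for support of the profiling features required by gprof. The "calls" column comes from the instrumentation that -pg compiles into each function, which is why it still works; the time columns are filled from periodic profiling-timer samples, which WSL1 does not appear to deliver.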
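One way to check whether a given system delivers those samples is sketched below. It assumes that gprof's timings on Linux come from SIGPROF signals driven by the ITIMER_PROF interval timer (which is how glibc's profiling runtime works, as far as I know); the function name count_prof_ticks and the 10 ms / 0.2 s values are made up for illustration.

```c
#define _XOPEN_SOURCE 700   /* for sigaction and setitimer under strict -std= modes */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t prof_ticks = 0;

/* Signal handler: count one profiling sample. */
static void on_sigprof(int sig)
{
    (void)sig;
    prof_ticks++;
}

/* Hypothetical helper: arms a 10 ms profiling timer, burns about 0.2 s
 * of CPU time, and returns how many SIGPROF samples arrived.
 * Returns -1 if the handler or the timer could not be set up. */
int count_prof_ticks(void)
{
    struct sigaction sa;
    struct itimerval on, off;
    clock_t start;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    memset(&on, 0, sizeof on);
    on.it_interval.tv_usec = 10000;   /* repeat every 10 ms of CPU time */
    on.it_value.tv_usec = 10000;      /* first expiry after 10 ms */
    if (setitimer(ITIMER_PROF, &on, NULL) != 0)
        return -1;

    prof_ticks = 0;
    start = clock();
    while ((double)(clock() - start) / CLOCKS_PER_SEC < 0.2) {
        /* busy loop: the profiling timer only advances while we use CPU */
    }

    memset(&off, 0, sizeof off);      /* all-zero value disarms the timer */
    setitimer(ITIMER_PROF, &off, NULL);
    return (int)prof_ticks;
}
```

Calling count_prof_ticks() from a small main and printing the result gives a quick indication: if it comes back as 0 or -1, the time columns in gprof would be expected to stay at 0.00 on that system.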
Suppose that we #include main.c in another C file in the compilation unit.
I understand that this is illegal since we should only include header files. However, why does it cause a link error rather than a compile error when we build everything with make?
The C file that calls main should be something like this:
#include "main.c"
int add (int x, int y) { return x+y; }
It is generally a bad idea to #include C files, but there is nothing strictly illegal about it. The #include directive effectively copies the source code directly into the current file. What you currently have can be made to work:
$ cat main.c
#include <stdlib.h>
#include <stdio.h>
int add(int, int);
int
main(int argc, char **argv)
{
int a = argc > 1 ? strtol(argv[1], NULL, 10) : 1;
int b = argc > 2 ? strtol(argv[2], NULL, 10) : 1;
printf("%d + %d = %d\n", a, b, add(a, b));
return 0;
}
$ cat add.c
#include "main.c"
int add(int x, int y) { return x + y; }
$ cc add.c
$ ./a.out
1 + 1 = 2
$ ./a.out 3 -5
3 + -5 = -2
If you are getting linkage problems, it is probably because you are trying to do something like:
$ cc main.c add.c
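This is problematic because you now have two definitions of main: one from compiling main.c directly and one from the copy that add.c includes. Each file compiles cleanly on its own, so the error only shows up at link time.
Possibly what you want is to remove the #include from add.c and do something like: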
$ cat add1.c
int add(int x, int y) { return x + y; }
$ cc -c add1.c
$ cc main.c add1.o
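The more conventional arrangement is to put the declaration in a header that both files include, so the compiler can check every call against the definition. The sketch below shows the two pieces in one listing; the file name add.h and the guard macro ADD_H are assumptions for illustration, not something from the question.

```c
/* ---- add.h (hypothetical header): declaration only ---- */
#ifndef ADD_H
#define ADD_H

/* Including this from several .c files is harmless: a declaration
 * may be repeated, while a definition may appear only once. */
int add(int x, int y);

#endif /* ADD_H */

/* ---- add1.c: the single definition ----
 * In a real project this file would begin with: #include "add.h" */
int add(int x, int y)
{
    return x + y;
}
```

main.c would then #include "add.h" instead of declaring add by hand, and the build stays the same as above: cc -c add1.c followed by cc main.c add1.o.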
I am trying to write a simple hypergeometric test in C.
This code works for small numbers but fails for large numbers.
For example: ./hyperG 35 50 90 3400 works,
but ./hyperG 307107 486302 9073845 12147105 fails.
Using an online debugger I get:
Program received signal SIGSEGV, Segmentation fault.
__ieee754_log_avx (x=2898563) at ../sysdeps/ieee754/dbl-64/e_log.c:76
76 ../sysdeps/ieee754/dbl-64/e_log.c: No such file or directory.
Any suggestions would be appreciated.
code follows:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/*
* program: hyperG
* purpose: Calculate a probability based on a hypergeometric distribution
* input : k, n, M, N
* output : Return a probability
*
*/
// Variables
double k = 0.0;
double n = 0.0;
double M = 0.0;
double N = 0.0;
double logchoos( double n, double k);
double logfact( double n);
// Probability of at least k out of n tries of having something
// that occurs M out of N times
double hypergeometric( double k, double n, double M, double N)
{
double p = 0.0;
int i = 0;
for(i = k; i <= n; i++){
p += exp(logchoos((N-M),(n-i)) + logchoos(M,i) - logchoos(N,n));
}
return p;
}
// Compute the log of "n choose k", i.e. log(n! / ((n-k)! k!))
double logchoos( double n, double k)
{
double result = logfact(n) - logfact(n-k) - logfact(k);
return result;
}
// Calculate the log factorial
double logfact( double n )
{
double fac;
if( n < 1)
return 0.0;
else
fac = log(n) + logfact(n-1);
return(fac);
}
int main(int argc, char *argv[])
{
// Get command line arguments
k = atof(argv[1]);
n = atof(argv[2]);
M = atof(argv[3]);
N = atof(argv[4]);
double prob = hypergeometric(k,n,M,N);
printf("%e\n",prob);
return 0;
}
My MakeFile:
CC = gcc
CFLAGS = -Wall -g
LDFLAGS = -lm
FILES = hyperG.c
build: $(FILES)
$(CC) $(FILES) -o hyperG $(CFLAGS) $(LDFLAGS)
clean:
rm -f hyperG
Since it works for smaller values but fails for larger ones, it is probably running out of stack: logfact recurses once per unit of n, so an argument in the millions means millions of nested calls. If you change the recursion to a while loop, it no longer segfaults for the input 307107 486302 9073845 12147105, although it takes a long time to complete:
double logfact( double n )
{
double fac = 0;
while(n > 1)
{
fac += log(n);
n = n-1;
}
return(fac);
}
I wrote a C program that I would like to parallelize using OpenMP (I am a beginner and have only a few days for this task). Starting from main: I first initialize 6 vectors (Vx, Vy, Vz, thetap, phip, theta). Then a for loop cycles up to Nmax; inside it I allocate memory for an array of the structure defined at the very top of the code. The array is called coll_CPU and grows on every cycle. I then pick some values from the vectors above and store them in the structure, so at this point coll_CPU holds Ncoll elements. Along the way I use some functions declared outside main (random number generators).
Now comes the important part: in my serial code a for loop passes every element of the structure array to a function called CollisioniCPU (it just takes the inputs and multiplies them by 2). My goal is to parallelize this loop so that every CPU core contributes to the work and speeds it up.
Here are the codes:
main.c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <memory.h>
#include <string.h>
#include <time.h>
#include <omp.h>
#define pi2 6.283185307
#define pi 3.141592654
#define IMUL(a,b) __mul24(a,b)
typedef struct {
int seme;
} iniran;
typedef struct{
int jp1;
int jp2;
float kx;
float ky;
float kz;
float vAx;
float vAy;
float vAz;
float vBx;
float vBy;
float vBz;
float tetaAp;
float phiAp;
float tetaA;
float tetaBp;
float phiBp;
float tetaB;
float kAx;
float kAy;
float kAz;
float kBx;
float kBy;
float kBz;
int caso;
} stato_struct;
stato_struct *coll_CPU=0;
unsigned int timer;
#include "DSMC_kernel_float.c"
//=============================================================
float min(float *a, float*b){
if(*a<*b){
return *a;
}
else{
return *b;
}
}
//=============================================================
float max(float *a, float*b){
if(*a>*b){
return *a;
}
else{
return *b;
}
}
//=============================================================
float rf(int *idum){
static int iff=0;
static int inext, inextp, ma[55];
int mj, mk;
int i, k, ii;
float ret_val;
if (*idum<0 || iff==0) {
iff=1;
mj=161803398 - abs(*idum);
mj %= 1000000000;
ma[54]=mj;
mk=1;
for (i=1; i<=54; ++i){
ii=(i*21)%55;
ma[ii-1]=mk;
mk=mj-mk;
if (mk<0) {
mk += 1000000000;
}
mj= ma[ii-1];
}
for(k=1; k<=4; ++k) {
for(i=1; i<=55; ++i){
ma[i-1] -= ma[(i+30)%55];
if (ma[i-1]<0){
ma[i-1] += 1000000000;
}
}
}
inext=0;
inextp=31;
*idum=1;
}
++inext;
if (inext==56){
inext=1;
}
++inextp;
if (inextp==56){
inextp=1;
}
mj=ma[inext-1]-ma[inextp-1];
if (mj<0){
mj += 1000000000;
}
ma[inext-1]=mj;
ret_val=mj*1.0000000000000001e-9;
return ret_val;
}
//============================================================
int genk(float *kx, float *ky, float *kz, int *p2seme){
// float sqrtf(float), sinf(float), cosf(float);
extern float rf(int *);
static float phi;
*kx=rf(p2seme) * 2. -1.f;
*ky= sqrtf(1. - *kx * *kx);
phi=pi2*rf(p2seme);
*kz=*ky * sinf(phi);
*ky *= cosf(phi);
return 0;
}
//==============================================================
int main(void){
float msec_kernel;
int Np=10000, Nmax=512;
int id,jp,jcoll,Ncoll,jp1, jp2, ind;
float Vx[Np],Vy[Np],Vz[Np],teta[Np],tetap[Np],phip[Np];
float kx, ky, kz, Vrx, Vry, Vrz, scalprod, fk;
float kAx, kAy, kAz, kBx, kBy, kBz;
iniran1.seme=7593;
for(jp=1;jp<=Np;jp++){
if(jp<=Np/2){
Vx[jp-1]=2.5;
Vy[jp-1]=0;
Vz[jp-1]=0;
tetap[jp-1]=0;
phip[jp-1]=0;
teta[jp-1]=0;
}
for (Ncoll=1;Ncoll<=Nmax;Ncoll += 10){
coll_CPU=(stato_struct*) malloc(Ncoll*sizeof(stato_struct));
jcoll=0;
while (jcoll<Ncoll){
jp1=1+floorf(Np*rf(&iniran1.seme));
jp2=1+floorf(Np*rf(&iniran1.seme));
genk(&kx,&ky,&kz,&iniran1.seme);
Vrx=Vx[jp2-1]-Vx[jp1-1];
Vry=Vy[jp2-1]-Vy[jp1-1];
Vrz=Vz[jp2-1]-Vz[jp1-1];
scalprod=Vrx*kx+Vry*ky+Vrz*kz;
if (scalprod<0) {
genk(&kAx,&kAy,&kAz,&iniran1.seme);
genk(&kBx,&kBy,&kBz,&iniran1.seme);
coll_CPU[jcoll].jp1= jp1;
coll_CPU[jcoll].jp2=jp2;
coll_CPU[jcoll].kx=kx;
coll_CPU[jcoll].ky=ky;
coll_CPU[jcoll].kz=kz;
coll_CPU[jcoll].vAx=Vx[jp1-1];
coll_CPU[jcoll].vAy=Vy[jp1-1];
coll_CPU[jcoll].vAz=Vz[jp1-1];
coll_CPU[jcoll].vBx=Vx[jp2-1];
coll_CPU[jcoll].vBy=Vy[jp2-1];
coll_CPU[jcoll].vBz=Vz[jp2-1];
coll_CPU[jcoll].tetaAp=tetap[jp1-1];
coll_CPU[jcoll].phiAp=phip[jp1-1];
coll_CPU[jcoll].tetaA=teta[jp1-1];
coll_CPU[jcoll].tetaBp=tetap[jp2-1];
coll_CPU[jcoll].phiBp=phip[jp2-1];
coll_CPU[jcoll].tetaB=teta[jp2-1];
coll_CPU[jcoll].kAx=kAx;
coll_CPU[jcoll].kAy=kAy;
coll_CPU[jcoll].kAz=kAz;
coll_CPU[jcoll].kBx=kBx;
coll_CPU[jcoll].kBy=kBy;
coll_CPU[jcoll].kBz=kBz;
coll_CPU[jcoll].caso=1;
jcoll++;
}
}
clock_t t;
t = clock();
#pragma omp parallel for private(id) //HERE IS WHERE I TRIED TO DO THE PARALLELIZATION BUT WITH NO SUCCESS. WHAT DO I HAVE TO TYPE INSTEAD???
for(id=0;id<Nmax;id++){
CollisioniCPU(coll_CPU,id);
}
t = clock() - t;
msec_kernel = ((float)t*1000)/CLOCKS_PER_SEC;
printf("Tempo esecuzione kernel:%e s\n",msec_kernel*1e-03);
for (ind=0;ind<Ncoll;ind++){
if (coll_CPU[ind].caso==4)
Ncoll_eff++;
else if (coll_CPU[ind].caso==0)
Ncoll_div++;
else
Ncoll_dim++;
}
free(coll_CPU);
}
return 0;
}
DSMC_kernel_float.c
void CollisioniCPU(stato_struct *coll_CPU, int id){
float vettA[6], vettB[6];
vettA[0]=coll_CPU[id].vAx;
vettA[1]=coll_CPU[id].vAy;
vettA[2]=coll_CPU[id].vAz;
vettA[3]=coll_CPU[id].tetaAp;
vettA[4]=coll_CPU[id].phiAp;
vettA[5]=coll_CPU[id].tetaA;
vettB[0]=coll_CPU[id].vBx;
vettB[1]=coll_CPU[id].vBy;
vettB[2]=coll_CPU[id].vBz;
vettB[3]=coll_CPU[id].tetaBp;
vettB[4]=coll_CPU[id].phiBp;
vettB[5]=coll_CPU[id].tetaB;
coll_CPU[id].vAx=2*vettA[0];
coll_CPU[id].vAy=2*vettA[1];
coll_CPU[id].vAz=2*vettA[2];
coll_CPU[id].tetaAp=2*vettA[3];
coll_CPU[id].phiAp=2*vettA[4];
coll_CPU[id].tetaA=2*vettA[5];
coll_CPU[id].vBx=2*vettB[0];
coll_CPU[id].vBy=2*vettB[1];
coll_CPU[id].vBz=2*vettB[2];
coll_CPU[id].tetaBp=2*vettB[3];
coll_CPU[id].phiBp=2*vettB[4];
coll_CPU[id].tetaB=2*vettB[5];
}
To compile the program I type gcc -fopenmp time_analysis.c -o time_analysis -lm in the terminal, followed by export OMP_NUM_THREADS=1. However, when I run the executable I get this error message:
Error in `./time_analysis': double free or corruption (!prev): 0x00000000009602c0 ***
Aborted
What does this error mean? What have I done wrong in main when trying to parallelize the for loop? And most importantly, what should I type instead to make my code run in parallel? Please help if you can; I have no time to study OpenMP from scratch and need to get this done right away.
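The loop you parallelized runs id from 0 to Nmax-1, but coll_CPU only holds Ncoll elements in each pass of the outer loop, so whenever Ncoll < Nmax it reads and writes past the end of the allocation. That corrupts the heap, which is what the "double free or corruption" abort is reporting. Bounding the loop by Ncoll, as follows, should bring you one step further.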
#pragma omp parallel for private(id)
for(id=0;id<Ncoll;id++){
CollisioniCPU(coll_CPU,id);
}
Your OpenMP line seems okay, but I doubt that it will lead to significant improvements in runtime. You should optimize the surrounding code as well. Allocating the memory once outside of your loops would be a good start.
By the way, is there any reason for this verbose coding style, rather than a more compact and readable version such as this one?
void CollisioniCPU(stato_struct *coll_CPU, int id) {
stato_struct *ptr = coll_CPU + id;
ptr->vAx *= 2;
ptr->vAy *= 2;
ptr->vAz *= 2;
ptr->tetaAp *= 2;
ptr->phiAp *= 2;
ptr->tetaA *= 2;
ptr->vBx *= 2;
ptr->vBy *= 2;
ptr->vBz *= 2;
ptr->tetaBp *= 2;
ptr->phiBp *= 2;
ptr->tetaB *= 2;
}
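A minimal sketch of how the pieces could fit together is shown below. It uses a cut-down stand-in for the struct, and the names stato_min, double_all and wall_seconds are hypothetical, chosen only for this example. It also swaps the timing call: clock() reports CPU time for the whole process, so with several threads it can show no speed-up even when the loop finishes sooner, whereas omp_get_wtime() reports wall-clock time.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical cut-down stand-in for stato_struct. */
typedef struct {
    float vAx, vAy, vAz;
} stato_min;

/* Doubles the fields of the first ncoll elements. The loop bound is the
 * number of elements actually filled, not the maximum size. */
void double_all(stato_min *coll, int ncoll)
{
    int id;   /* the loop variable of a parallel for is private automatically */

    #pragma omp parallel for
    for (id = 0; id < ncoll; id++) {
        coll[id].vAx *= 2;
        coll[id].vAy *= 2;
        coll[id].vAz *= 2;
    }
}

/* Wall-clock seconds when built with -fopenmp; falls back to CPU time
 * from clock() when OpenMP is not enabled. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}
```

In main, the array could then be allocated once with Nmax elements before the outer loop, passed as double_all(coll_CPU, Ncoll) on each pass, timed by taking the difference of two wall_seconds() calls, and freed once after the loop ends.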
I have been trying for hours and it is driving me crazy. The last error I get is:
demo_cblas.c:(.text+0x83): undefined reference to `clapack_sgetrf'
demo_cblas.c:(.text+0xa3): undefined reference to `clapack_sgetri'
I am compiling the code using
/usr/bin/gcc -o demo_cblas demo_cblas.c -L /usr/lib64 -l :libgfortran.so.3 -L /usr/lib64 \
-llapack -L /usr/lib64 -lblas
I have tried with and without libgfortran, and with different compilers (gcc-33, gcc-47, gcc-48). The test code is not mine; it comes from this forum ...
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include "clapack.h"
#include "cblas.h"
void invertMatrix(float *a, unsigned int height){
int info, ipiv[height];
info = clapack_sgetrf(CblasColMajor, height, height, a, height, ipiv);
info = clapack_sgetri(CblasColMajor, height, a, height, ipiv);
}
void displayMatrix(float *a, unsigned int height, unsigned int width)
{
int i, j;
for(i = 0; i < height; i++){
for(j = 0; j < width; j++)
{
printf("%1.3f ", a[height*j + i]);
}
printf("\n");
}
printf("\n");
}
int main(int argc, char *argv[])
{
int i;
float a[9], b[9], c[9];
srand(time(NULL));
for(i = 0; i < 9; i++)
{
a[i] = 1.0f*rand()/RAND_MAX;
b[i] = a[i];
}
displayMatrix(a, 3, 3);
return 0;
}
I am on SUSE 12.3 (64-bit). In /usr/lib64 I have liblapack.a, liblapack.so, ..., libblas.a, libblas.so, ... and libgfortran.so.3.
The same code without the function "invertMatrix" (the one using the library) compiles fine.
Any ideas or suggestions?
Thank you all for your help.
Vava
I'm quite positive that you also need to link to libcblas, which is the C wrapper library for libblas. Note that libblas is a Fortran library and therefore does not contain the clapack_* functions you're calling.
I've just got this working on FreeBSD with:
gcc -o test test.c \
-llapack -lblas -lalapack -lcblas
I'd installed math/atlas (from ports) and the lapack and blas packages.
See my question here
When I debug the following program, the Eclipse CDT Variables view shows "i" as 2 rather than 2.0000000.
#include <stdio.h>
int main()
{
float i = 2.0;
float j = 2.1;
return 0;
}