Related
I am using Ruby Fiddle to access a C function to perform some heavy calculations. The C function works perfectly well when called directly, but when used via Fiddle it returns various rows of nans and infs in an unpredictable way. The function operates on matrices that are passed as pointers to arrays.
I have debugged the C code and everything works fine. I have also saved the various parameters passed to the C function to file to be sure that Fiddle was not passing some weird values, but there is no obvious (at least for me) problem.
Additionally it seems that for 'smaller' matrices this does not happen.
Apologies in advance for the code being very long, but this is the only way to exactly reproduce what it is happening. Data for testing is in this file. (gist). You can just copy and paste the Ruby and C test data to the code below.
If it helps, I am working on macos Catalina and Ruby 2.2.4 (requirement).
The C code is the following:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
double *mrt(
double *person_total_shortwave,
int rows_pts, int cols_pts,
double *dry_bulb_temperatures,
double *ground_temperatures,
double *atmospheric_longwave,
double *sky_view_factors,
double *shading_view_factors_matrix,
int rows_svf, int cols_svf,
double *shading_temperatures_matrix,
int rows_st, int cols_st,
int size_dbt,
double person_emissivity_shortwave, double person_emissivity_longwave,
double surroundings_emissivity, double ground_emissivity, double ground_person_view_factor);
int save_to_file(char *filename, int m, int n, double *mat)
{
FILE *fp;
int i, j;
if ((fp = freopen(filename, "w", stdout)) == NULL)
{
printf("Cannot open file.\n");
exit(1);
}
for (i = 0; i < m; i++)
for (j = 0; j < n; j++)
if (j == n - 1)
printf("%.17g \n", mat[i * n + j]);
else
printf("%.17g ", mat[i * n + j]);
fclose(fp);
return 0;
}
int main(void)
{
// REPLACE WITH C DATA FROM FILE
double *mrt_result;
mrt_result = mrt(person_total_shortwave[0],
rows_pts, cols_pts,
dry_bulb_temperatures,
ground_temperatures,
atmospheric_longwave,
sky_view_factors,
shading_view_factors_matrix[0],
rows_svf, cols_svf,
shading_temperatures_matrix[0],
rows_st, cols_st,
size_dbt,
person_emissivity_shortwave, person_emissivity_longwave,
surroundings_emissivity, ground_emissivity, ground_person_view_factor);
// save_to_file(rows_pts, cols_pts, mrt_result);
return 0;
}
// https://www.dropbox.com/s/ix06edrrctad421/Calculation%20of%20Mean%20Radiant%20Temperature3.odt?dl=0
double *mrt(
double *person_total_shortwave,
int rows_pts, int cols_pts,
double *dry_bulb_temperatures,
double *ground_temperatures,
double *atmospheric_longwave,
double *sky_view_factors,
double *shading_view_factors_matrix,
int rows_svf, int cols_svf,
double *shading_temperatures_matrix,
int rows_st, int cols_st,
int size_dbt,
double person_emissivity_shortwave, double person_emissivity_longwave,
double surroundings_emissivity, double ground_emissivity, double ground_person_view_factor)
{
save_to_file("PTS.txt", rows_pts, cols_pts, person_total_shortwave);
save_to_file("SVF.txt", 1, cols_svf, sky_view_factors);
save_to_file("SVSM.txt", rows_svf, cols_svf, shading_view_factors_matrix);
save_to_file("STM.txt", rows_st, cols_st, shading_temperatures_matrix);
double *mrt_mat = (double *)calloc(rows_pts * cols_pts, sizeof(double));
save_to_file("MRT_MAT0.txt", rows_pts, cols_pts, mrt_mat);
int t, c, k;
double sigma = 5.67E-8;
double body_area = 1.94;
double tmrt4_shortwave, tmrt4_longwave_ground, tmrt4_atm_longwave, tmrt4_surroundings;
double tmrt4_shading[rows_svf];
memset(tmrt4_shading, 0.0, rows_svf * sizeof(double));
double tmrt4;
double surroundings_view_factor;
// t runs through the timesteps
// c runs through the points in the mesh
for (t = 0; t < rows_pts; t++)
{
for (c = 0; c < cols_pts; c++)
{
tmrt4_shortwave = 1.0 / (sigma * body_area) * (person_emissivity_shortwave / person_emissivity_longwave) * person_total_shortwave[t * cols_pts + c];
// We are assuming that the ground is at ambient temperature
tmrt4_longwave_ground = ground_person_view_factor * ground_emissivity * pow((273.15 + ground_temperatures[t]), 4);
// Here we are using the actual long wave radiation from the sky
tmrt4_atm_longwave = (1.0 - ground_person_view_factor) / sigma * sky_view_factors[c] * atmospheric_longwave[t];
// We need to remove the contribution of all the shading devices
// k runs through the shading devices
surroundings_view_factor = 1.0 - sky_view_factors[c];
for (k = 0; k < rows_svf; k++)
{
surroundings_view_factor -= shading_view_factors_matrix[k * cols_svf + c];
}
tmrt4_surroundings = (1.0 - ground_person_view_factor) * surroundings_view_factor * surroundings_emissivity * pow((273.15 + dry_bulb_temperatures[t] - 0.0), 4);
// We now need to account for all contributions of the shading devices
for (k = 0; k < rows_svf; k++)
{
tmrt4_shading[k] = (1.0 - ground_person_view_factor) * (shading_view_factors_matrix[k * cols_svf + c]) * surroundings_emissivity * pow((273.15 + shading_temperatures_matrix[k * cols_svf + t]), 4);
}
// Finally we add them all (see paper) for the total contribution
tmrt4 = tmrt4_shortwave + tmrt4_longwave_ground + tmrt4_atm_longwave + tmrt4_surroundings;
for (k = 0; k < rows_svf; k++)
{
tmrt4 += tmrt4_shading[k];
}
// Just convert to celsius
mrt_mat[t * cols_pts + c] = pow(tmrt4, 0.25) - 273.15;
}
}
save_to_file("MRT_MAT.txt", rows_pts, cols_pts, mrt_mat);
// double x = 1.5;
// while (1)
// {
// x *= sin(x) / atan(x) * tanh(x) * sqrt(x);
// }
return mrt_mat;
}
and I compile is with clang -g --extra-warnings utils.c -o utils.
The Ruby code is the following
require "fiddle"
require "fiddle/import"
module RG
extend Fiddle::Importer
#handler.handlers.each { |h| h.close unless h.close_enabled? } unless #handler.nil?
GC.start
dlload File.join("utils")
extern "double* mrt(double*, int, int, double*, double*, double*, double*, double*, int, int, double*, int, int, int, double, double, double, double, double)"
def self.mat_to_ptr(matrix)
return Fiddle::Pointer[matrix.flatten.pack("E*")]
end
def self.ptr_to_mat(ptr, rows, cols)
length = rows * cols * Fiddle::SIZEOF_DOUBLE
mat = ptr[0, length]
return mat.unpack("E*").each_slice(cols).to_a
end
def self.mean_radiant_temperature(
person_total_shortwave,
dry_bulb_temperatures,
ground_temperatures,
atmospheric_longwave,
sky_view_factors,
shading_view_factors_matrix,
shading_temperatures_matrix,
person_emissivity_shortwave,
person_emissivity_longwave,
surroundings_emissivity,
ground_emissivity,
ground_person_view_factor
)
rows_pts = person_total_shortwave.size
cols_pts = person_total_shortwave[0].size
person_total_shortwave_pointer = RG.mat_to_ptr(person_total_shortwave)
dry_bulb_temperatures_pointer = RG.mat_to_ptr(dry_bulb_temperatures)
ground_temperatures_pointer = RG.mat_to_ptr(ground_temperatures)
size_dbt = dry_bulb_temperatures.size
atmospheric_longwave_pointer = RG.mat_to_ptr(atmospheric_longwave)
sky_view_factors_pointer = RG.mat_to_ptr(sky_view_factors)
rows_svf = shading_view_factors_matrix.size
if rows_svf > 0
cols_svf = shading_view_factors_matrix[0].size
else
cols_svf = 0
end
shading_view_factors_matrix_pointer = RG.mat_to_ptr(shading_view_factors_matrix)
rows_st = shading_temperatures_matrix.size
if rows_st > 0
cols_st = shading_temperatures_matrix[0].size
else
cols_st = 0
end
shading_temperatures_matrix_pointer = RG.mat_to_ptr(shading_temperatures_matrix)
mrt_pointer = mrt(
person_total_shortwave_pointer,
rows_pts, cols_pts,
dry_bulb_temperatures_pointer,
ground_temperatures_pointer,
atmospheric_longwave_pointer,
sky_view_factors_pointer,
shading_view_factors_matrix_pointer,
rows_svf, cols_svf,
shading_temperatures_matrix_pointer,
rows_st, cols_st,
size_dbt,
person_emissivity_shortwave,
person_emissivity_longwave,
surroundings_emissivity,
ground_emissivity,
ground_person_view_factor
)
return RG.ptr_to_mat(mrt_pointer, rows_pts, cols_pts)
end
end
// REPLACE WITH RUBY DATA FROM FILE
mean_radiant_temperatures = RG.mean_radiant_temperature(
person_total_shortwave,
dry_bulb_temperatures,
ground_temperatures,
atmospheric_longwave,
sky_view_factors,
shading_view_factors_matrix,
shading_temperatures_matrix,
person_emissivity_shortwave,
person_emissivity_longwave,
surroundings_emissivity,
ground_emissivity,
ground_person_view_factor
)
File.open("MRT_MAT_RUBY.txt", "w") do |f|
mean_radiant_temperatures.each do |row|
f.puts row.join(' ')
end
end
If you want to test it, first launch ./utils. It will save a few files on the local folder. Have a look at MRT_MAT.txt.
Now launch the ruby code several times. It will generate the same file, but you will notice that "often" the file will contain random rows with nan and inf. The same data is returned to Ruby and saved in the file MRT_MAT_RUBY.txt in the local directory.
I am quite familiar with Ruby, but C is not really my strength. It would be great being able to debug the C code when called from Ruby, but I don't really know how to do it.
I guess the string created by pack method in mat_to_ptr method is collected by GC.
You should keep the string until pte_to_mat is called.
I am fairly new to C and how arrays and memory allocation works. I'm solving a very simple function right now, vector_average(), which computes the mean value between two successive array entries, i.e., the average between (i) and (i + 1). This average function is the following:
void
vector_average(double *cc, double *nc, int n)
{
//#pragma omp parallel for
double tbeg ;
double tend ;
tbeg = Wtime() ;
for (int i = 0; i < n; i++) {
cc[i] = .5 * (nc[i] + nc[i+1]);
}
tend = Wtime() ;
printf("vector_average() took %g seconds\n", tend - tbeg);
}
My goal is to set int n extremely high, to the point where it actually takes some time to complete this loop (hence, why I am tracking wall time in this code). I'm passing this function a random test function of x, f(x) = sin(x) + 1/3 * sin(3 x), denoted in this code as x_nc, in main() in the following form:
int
main(int argc, char **argv)
{
int N = 1.E6;
double x_nc[N+1];
double dx = 2. * M_PI / N;
for (int i = 0; i <= N; i++) {
double x = i * dx;
x_nc[i] = sin(x) + 1./3. * sin(3.*x);
}
double x_cc[N];
vector_average(x_cc, x_nc, N);
}
But my problem here is that if I set int N any higher than 1.E5, it segfaults. Please provide any suggestions for how I might set N much higher. Perhaps I have to do something with malloc, but, again, I am new to all of this stuff and I'm not quite sure how I would implement this.
-CJW
A function only has 1M stack memory on Windows or other system. Obviously, the size of temporary variable 'x_nc' is bigger than 1M. So, you should use heap to save data of x_nc:
int
main(int argc, char **argv)
{
int N = 1.E6;
double* x_nc = (double*)malloc(sizeof(dounble)*(N+1));
double dx = 2. * M_PI / N;
for (int i = 0; i <= N; i++) {
double x = i * dx;
x_nc[i] = sin(x) + 1./3. * sin(3.*x);
}
double* x_cc = (double*)malloc(sizeof(double)*N);
vector_average(x_cc, x_nc, N);
free(x_nc);
free(x_cc);
return 0;
}
I have known that when encountered with segmentation fault 11, it means the program has attempted to access an area of memory that it is not allowed to access.
Here I am trying to calculate a Fourier transform, using the following code.
It works well when nPoints = 2^15 (or of course with less points) , however it corrupts when I further increase the points to 2^16. I am wondering, is that caused by occupying too much memory? But I did not notice too much memory occupation during the operation. And although it use recursion, it transforms in-place. I thought it would occupy not so much memory. Then, where's the problem?
Thanks in advance
PS: one thing I forgot to say is, the result above was on Max OS (8G memory).
When I running the code on Windows (16G memory), it corrupts when nPoints = 2^14. So it makes me confused whether it's caused by the memory allocation, as the Windows PC has a larger memory (but it's really hard to say, because the two operation systems utilize different memory strategy).
#include <stdio.h>
#include <tgmath.h>
#include <string.h>
// in place FFT with O(n) memory usage
long double PI;
typedef long double complex cplx;
void _fft(cplx buf[], cplx out[], int n, int step)
{
if (step < n) {
_fft(out, buf, n, step * 2);
_fft(out + step, buf + step, n, step * 2);
for (int i = 0; i < n; i += 2 * step) {
cplx t = exp(-I * PI * i / n) * out[i + step];
buf[i / 2] = out[i] + t;
buf[(i + n)/2] = out[i] - t;
}
}
}
void fft(cplx buf[], int n)
{
cplx out[n];
for (int i = 0; i < n; i++) out[i] = buf[i];
_fft(buf, out, n, 1);
}
int main()
{
const int nPoints = pow(2, 15);
PI = atan2(1.0l, 1) * 4;
double tau = 0.1;
double tSpan = 12.5;
long double dt = tSpan / (nPoints-1);
long double T[nPoints];
cplx At[nPoints];
for (int i = 0; i < nPoints; ++i)
{
T[i] = dt * (i - nPoints / 2);
At[i] = exp( - T[i]*T[i] / (2*tau*tau));
}
fft(At, nPoints);
return 0;
}
You cannot allocate very large arrays in the stack. The default stack size on macOS is 8 MiB. The size of your cplx type is 32 bytes, so an array of 216 cplx elements is 2 MiB, and you have two of them (one in main and one in fft), so that is 4 MiB. That fits on the stack, but, at that size, the program runs to completion when I try it. At 217, it fails, which makes sense because then the program has two arrays taking 8 MiB on stack. The proper way to allocate such large arrays is to include <stdlib.h> and use cmplx *At = malloc(nPoints * sizeof *At); followed by if (!At) { /* Print some error message about being unable to allocate memory and terminate the program. */ }. You should do that for At, T, and out. Also, when you are done with each array, you should free it, as with free(At);.
To calculate an integer power of two, use the integer operation 1 << power, not the floating-point operation pow(2, 16). We have designed pow well on macOS, but, on other systems, it may return approximations even when exact results are possible. An approximate result may be slightly less than the exact integer value, so converting it to an integer truncates to the wrong result. If it may be a power of two larger than suitable for an int, then use (type) 1 << power, where type is a suitably large integer type.
the following, instrumented, code clearly shows that the OPs code repeatedly updates the same locations in the out[] array and actually does not update most of the locations in that array.
#include <stdio.h>
#include <tgmath.h>
#include <assert.h>
// in place FFT with O(n) memory usage
#define N_POINTS (1<<15)
double T[N_POINTS];
double At[N_POINTS];
double PI;
// prototypes
void _fft(double buf[], double out[], int step);
void fft( void );
int main( void )
{
PI = 3.14159;
double tau = 0.1;
double tSpan = 12.5;
double dt = tSpan / (N_POINTS-1);
for (int i = 0; i < N_POINTS; ++i)
{
T[i] = dt * (i - (N_POINTS / 2));
At[i] = exp( - T[i]*T[i] / (2*tau*tau));
}
fft();
return 0;
}
void fft()
{
double out[ N_POINTS ];
for (int i = 0; i < N_POINTS; i++)
out[i] = At[i];
_fft(At, out, 1);
}
void _fft(double buf[], double out[], int step)
{
printf( "step: %d\n", step );
if (step < N_POINTS)
{
_fft(out, buf, step * 2);
_fft(out + step, buf + step, step * 2);
for (int i = 0; i < N_POINTS; i += 2 * step)
{
double t = exp(-I * PI * i / N_POINTS) * out[i + step];
buf[i / 2] = out[i] + t;
buf[(i + N_POINTS)/2] = out[i] - t;
printf( "index: %d buf update: %d, %d\n", i, i/2, (i+N_POINTS)/2 );
}
}
}
Suggest running via (where untitled1 is the name of the executable and on linux)
./untitled1 > out.txt
less out.txt
the out.txt file is 8630880 bytes
An examination of that file shows the lack of coverage and shows that any one entry is NOT the sum of the prior two entries, so I suspect this is not a valid Fourier transform,
I've been experimenting with SSE intrinsics and I seem to have run into a weird bug that I can't figure out. I am computing the inner product of two float arrays, 4 elements at a time.
For testing I've set each element of both arrays to 1, so the product should be == size.
It runs correctly, but whenever I run the code with size > ~68000000 the code using the sse intrinsics starts computing the wrong inner product. It seems to get stuck at a certain sum and never exceeds this number. Here is an example run:
joe:~$./test_sse 70000000
sequential inner product: 70000000.000000
sse inner product: 67108864.000000
sequential time: 0.417932
sse time: 0.274255
Compilation:
gcc -fopenmp test_sse.c -o test_sse -std=c99
This error seems to be consistent amongst the handful of computers I've tested it on. Here is the code, perhaps someone might be able to help me figure out what is going on:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>
#include <math.h>
#include <assert.h>
#include <xmmintrin.h>
double inner_product_sequential(float * a, float * b, unsigned int size) {
double sum = 0;
for(unsigned int i = 0; i < size; i++) {
sum += a[i] * b[i];
}
return sum;
}
double inner_product_sse(float * a, float * b, unsigned int size) {
assert(size % 4 == 0);
__m128 X, Y, Z;
Z = _mm_set1_ps(0.0f);
float arr[4] __attribute__((aligned(sizeof(float) * 4)));
for(unsigned int i = 0; i < size; i += 4) {
X = _mm_load_ps(a+i);
Y = _mm_load_ps(b+i);
X = _mm_mul_ps(X, Y);
Z = _mm_add_ps(X, Z);
}
_mm_store_ps(arr, Z);
return arr[0] + arr[1] + arr[2] + arr[3];
}
int main(int argc, char ** argv) {
if(argc < 2) {
fprintf(stderr, "usage: ./test_sse <size>\n");
exit(EXIT_FAILURE);
}
unsigned int size = atoi(argv[1]);
srand(time(0));
float *a = (float *) _mm_malloc(size * sizeof(float), sizeof(float) * 4);
float *b = (float *) _mm_malloc(size * sizeof(float), sizeof(float) * 4);
for(int i = 0; i < size; i++) {
a[i] = b[i] = 1;
}
double start, time_seq, time_sse;
start = omp_get_wtime();
double inner_seq = inner_product_sequential(a, b, size);
time_seq = omp_get_wtime() - start;
start = omp_get_wtime();
double inner_sse = inner_product_sse(a, b, size);
time_sse = omp_get_wtime() - start;
printf("sequential inner product: %f\n", inner_seq);
printf("sse inner product: %f\n", inner_sse);
printf("sequential time: %f\n", time_seq);
printf("sse time: %f\n", time_sse);
_mm_free(a);
_mm_free(b);
}
You are running into the precision limit of single precision floating point numbers. The number 16777216 (2^24), which is the value of each component of the vector Z when reaching the "limit" inner product, is represented in 32-bit floating point as hexadecimal 0x4b800000 or binary 0 10010111 00000000000000000000000, i.e. the 23-bit mantissa is all zeros (implicit leading 1 bit), and the 8-bit exponent part is 151 representing the exponent 151 - 127 = 24. If you add a 1 to that value this would require to increase the exponent but then the added one cannot be represented in the mantissa any longer, so in single precision floating point arithmetic 2^24 + 1 = 2^24.
You do not see that in your sequential function because there you are using a 64-bit double precision value to store the result, and as we are working on a x86 platform, internally most probably an 80-bit excess precision register is used.
You can force to use single precision throughout in your sequential code by rewriting it as
float sum;
float inner_product_sequential(float * a, float * b, unsigned int size) {
sum = 0;
for(unsigned int i = 0; i < size; i++) {
sum += a[i] * b[i];
}
return sum;
}
and you will see 16777216.000000 as maximum computed value.
My aim is to calculate the numerical integral of a probability distribution function (PDF) of the distance of an electron from the nucleus of the hydrogen atom in C programming language. I have written a sample code however it fails to find the numerical value correctly due to the fact that I cannot increase the limit as much as its necessary in my opinion. I have also included the library but I cannot use the values stated in the following post as integral boundaries: min and max value of data type in C . What is the remedy in this case? Should switch to another programming language maybe? Any help and suggestion is appreciated, thanks in advance.
Edit: After some value I get the error segmentation fault. I have checked the actual result of the integral to be 0.0372193 with Wolframalpha. In addition to this if I increment k in smaller amounts I get zero as a result that is why I defined r[k]=k, I know it should be smaller for increased precision.
#include <stdio.h>
#include <math.h>
#include <limits.h>
#define a0 0.53
int N = 200000;
// This value of N is the highest possible number in long double
// data format. Change its value to adjust the precision of integration
// and computation time.
// The discrete integral may be defined as follows:
long double trapezoid(long double x[], long double f[]) {
int i;
long double dx = x[1]-x[0];
long double sum = 0.5*(f[0]+f[N]);
for (i = 1; i < N; i++)
sum+=f[i];
return sum*dx;
}
main() {
long double P[N], r[N], a;
// Declare and initialize the loop variable
int k = 0;
for (k = 0; k < N; k++)
{
r[k] = k ;
P[k] = r[k] * r[k] * exp( -2*r[k] / a0);
//printf("%.20Lf \n", r[k]);
//printf("%.20Lf \n", P[k]);
}
a = trapezoid(r, P);
printf("%.20Lf \n", a);
}
Last Code:
#include <stdio.h>
#include <math.h>
#include <limits.h>
#include <stdlib.h>
#define a0 0.53
#define N LLONG_MAX
// This value of N is the highest possible number in long double
// data format. Change its value to adjust the precision of integration
// and computation time.
// The discrete integral may be defined as follows:
long double trapezoid(long double x[],long double f[]) {
int i;
long double dx = x[1]-x[0];
long double sum = 0.5*(f[0]+f[N]);
for (i = 1; i < N; i++)
sum+=f[i];
return sum*dx;
}
main() {
printf("%Ld", LLONG_MAX);
long double * P = malloc(N * sizeof(long double));
long double * r = malloc(N * sizeof(long double));
// Declare and initialize the loop variable
int k = 0;
long double integral;
for (k = 1; k < N; k++)
{
P[k] = r[k] * r[k] * expl( -2*r[k] / a0);
}
integral = trapezoid(r, P);
printf("%Lf", integral);
}
Edit last code working:
#include <stdio.h>
#include <math.h>
#include <limits.h>
#include <stdlib.h>
#define a0 0.53
#define N LONG_MAX/100
// This value of N is the highest possible number in long double
// data format. Change its value to adjust the precision of integration
// and computation time.
// The discrete integral may be defined as follows:
long double trapezoid(long double x[],long double f[]) {
int i;
long double dx = x[1]-x[0];
long double sum = 0.5*(f[0]+f[N]);
for (i = 1; i < N; i++)
sum+=f[i];
return sum*dx;
}
main() {
printf("%Ld \n", LLONG_MAX);
long double * P = malloc(N * sizeof(long double));
long double * r = malloc(N * sizeof(long double));
// Declare and initialize the loop variable
int k = 0;
long double integral;
for (k = 1; k < N; k++)
{
r[k] = k / 100000.0;
P[k] = r[k] * r[k] * expl( -2*r[k] / a0);
}
integral = trapezoid(r, P);
printf("%.15Lf \n", integral);
free((void *)P);
free((void *)r);
}
In particular I have changed the definition for r[k] by using a floating point number in the division operation to get a long double as a result and also as I have stated in my last comment I cannot go for Ns larger than LONG_MAX/100 and I think I should investigate the code and malloc further to get the issue. I have found the exact value that is obtained analytically by taking the limits; I have confirmed the result with TI-89 Titanium and Wolframalpha (both numerically and analytically) apart from doing it myself. The trapezoid rule worked out pretty well when the interval size has been decreased. Many thanks for all the posters here for their ideas. Having a value of 2147483647 LONG_MAX is not that particularly large as I expected by the way, should the limit not be around ten to power 308?
Numerical point of view
The usual trapezoid method doesn't work with improper integrals. As such, Gaussian quadrature rules are much better, since they not only provide 2n-1 exactness (that is, for a polynomial of degree 2n-1 they will return the correct solution), but also manage improper integrals by using the right weight function.
If your integral is improper in both sides, you should try the Gauss-Hermite quadrature, otherwise use the Gauss-Laguerre quadrature.
The "overflow" error
long double P[N], r[N], a;
P has a size of roughly 3MB, and so does r. That's too much memory. Allocate the memory instead:
long double * P = malloc(N * sizeof(long double));
long double * r = malloc(N * sizeof(long double));
Don't forget to include <stdlib.h> and use free on both P and r if you don't need them any longer. Also, you may not access the N-th entry, so f[N] is wrong.
Using Gauss-Laguerre quadrature
Now Gauss-Laguerre uses exp(-x) as weight function. If you're not familiar with Gaussian quadrature: the result of E(f) is the integral of w * f, where w is the weight function.
Your f looks like this, and:
f x = x^2 * exp (-2 * x / a)
Wait a minute. f already contains exp(-term), so we can substitute x with t = x * a /2 and get
f' x = (t * a/2)^2 * exp(-t) * a/2
Since exp(-t) is already part of our weight function, your function fits now perfectly into the Gauss-Laguerre quadrature. The resulting code is
#include <stdio.h>
#include <math.h>
/* x[] and a[] taken from
* https://de.wikipedia.org/wiki/Gau%C3%9F-Quadratur#Gau.C3.9F-Laguerre-Integration
* Calculating them by hand is a little bit cumbersome
*/
const int gauss_rule_length = 3;
const double gauss_x[] = {0.415774556783, 2.29428036028, 6.28994508294};
const double gauss_a[] = {0.711093009929, 0.278517733569, 0.0103892565016};
double f(double x){
return x *.53/2 * x *.53/2 * .53/2;
}
int main(){
int i;
double sum = 0;
for(i = 0; i < gauss_rule_length; ++i){
sum += gauss_a[i] * f(gauss_x[i]);
}
printf("%.10lf\n",sum); /* 0.0372192500 */
return 0;
}