Gradient descent returning nan


I need to write a function to get a curve fit of a dataset. The code below is what I have. It attempts to use gradient descent to find polynomial coefficients which best fit the data.
//solves for y using the form y = a + bx + cx^2 ...
double calc_polynomial(int degree, double x, double* coeffs) {
double y = 0;
for (int i = 0; i <= degree; i++)
y += coeffs[i] * pow(x, i);
return y;
}
//find polynomial fit
//returns an array of coefficients degree + 1 long
double* poly_fit(double* x, double* y, int count, int degree, double learningRate, int iterations) {
double* coeffs = malloc(sizeof(double) * (degree + 1));
double* sums = malloc(sizeof(double) * (degree + 1));
for (int i = 0; i <= degree; i++)
coeffs[i] = 0;
for (int i = 0; i < iterations; i++) {
//reset sums each iteration
for (int j = 0; j <= degree; j++)
sums[j] = 0;
//update weights
for (int j = 0; j < count; j++) {
double error = calc_polynomial(degree, x[j], coeffs) - y[j];
//update sums
for (int k = 0; k <= degree; k++)
sums[k] += error * pow(x[j], k);
}
//subtract sums
for (int j = 0; j <= degree; j++)
coeffs[j] -= sums[j] * learningRate;
}
free(sums);
return coeffs;
}
And my testing code:
double x[] = { 0, 1, 2, 3, 4 };
double y[] = { 5, 3, 2, 3, 5 };
int size = sizeof(x) / sizeof(*x);
int degree = 1;
double* coeffs = poly_fit(x, y, size, degree, 0.01, 1000);
for (int i = 0; i <= degree; i++)
printf("%lf\n", coeffs[i]);
The code above works when degree = 1, but anything higher causes the coefficients to come back as nan.
I've also tried replacing
coeffs[j] -= sums[j] * learningRate;
with
coeffs[j] -= (1/count) * sums[j] * learningRate;
but then I get back 0s instead of nan.
Anyone know what I'm doing wrong?

I tried degree = 2 with iterations = 10 and got results other than nan (values of a few thousand). After that, each additional iteration seems to make the magnitude of the results about 3 times larger.
From this observation, I guessed that the results are being multiplied by count.
In the expression
coeffs[j] -= (1/count) * sums[j] * learningRate;
Both 1 and count are integers, so 1/count is an integer division, and it evaluates to zero whenever count is larger than 1.
Instead, you can divide the result of the multiplication by count.
coeffs[j] -= sums[j] * learningRate / count;
Another way is using 1.0 (double value) instead of 1.
coeffs[j] -= (1.0/count) * sums[j] * learningRate;
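The difference between the two forms can be checked in isolation. The sketch below is only an illustration: the helper names step_int_div and step_fp_div are hypothetical and do not appear in the question, and both assume count is not zero.

```c
/* Mirrors (1/count) * sums[j] * learningRate: 1/count is evaluated in
   int arithmetic first, so it is 0 whenever count > 1. */
double step_int_div(double sum, double learning_rate, int count) {
    return (1 / count) * sum * learning_rate;
}

/* Mirrors sums[j] * learningRate / count: the product is already a
   double, so the division by count is done in floating point. */
double step_fp_div(double sum, double learning_rate, int count) {
    return sum * learning_rate / count;
}
```

With sum = 10.0, learning_rate = 0.01 and count = 5, the first helper returns 0 (which is why the coefficients never move away from 0), while the second returns roughly 0.02.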

Aside:
A candidate NaN source is adding two infinities of opposite sign. Given that OP is using pow(x, k), which grows rapidly, other techniques help.
Consider a chained multiplication (Horner's method) rather than pow(). The result is usually more numerically stable. calc_polynomial() for example:
double calc_polynomial(int degree, double x, double* coeffs) {
double y = 0;
// for (int i = 0; i <= degree; i++)
for (int i = degree; i >= 0; i--) {
//y += coeffs[i] * pow(x, i);
y = y*x + coeffs[i];
}
return y;
}
Similar code could be used for the main() body.

Related

Miscalculation of Lagrange interpolation formula for higher degree

I am approximating Runge’s function using Lagrange’s interpolation formula for 50 interpolation points. I have written the following program to do this, but I am getting the wrong value for x = -0.992008. The value I get is 4817543.091313, but it should be 5197172.55933613. I got this expected value from a web site (link). The code used is as follows:
#include <stdio.h>
#include <math.h>
double
runge(double x)
{
return (1 / (1 + (25 * x * x)));
}
double
ab(double x)
{
if (x < 0)
return -1 * x;
return x;
}
double
lag_func(double x, double *y_i, double *x_i, int n)
{
double ex = 0.0;
for (int i = 0; i <= n; i++) {
double numer = 1.0,
denom = 1.0,
prod = 1.0;
for (int j = 0; j <= n; j++) {
if (i != j) {
numer = (x - x_i[j]);
denom = (x_i[i] - x_i[j]);
prod *= numer / denom;
}
}
ex += (prod) * y_i[i];
}
return ex;
}
int
main()
{
int n;
scanf("%d", &n);
double y_i[n + 1],
x_i[n + 1];
for (int i = 0; i < n + 1; i++) {
x_i[i] = ((2 * (double) i) / (double) n) - 1;
y_i[i] = runge(x_i[i]);
}
printf("%lf\n", lag_func(-0.992008, y_i, x_i, n));
return 0;
}

The web site is rounding its Runge coefficients to six digits. Given the magnitudes of the terms involved, up to 3.9978•10^11, this introduces multiple errors up to around 2•10^5.
This can be seen by inserting y_i[i] = round(y_i[i] * 1e6) / 1e6; after y_i[i] = runge(x_i[i]);. Then the output of the program is 5197172.558199, matching the web site’s inaccurate result.
The web site is wrong; the result of the code in the question is better.

Loop to get sum of factorial / exponent

Hello I wrote a loop to get the sum of the factorial of given value n divided by n with increasing exponent. To describe it better it looks like this:
But for some reason my loop is always returning the value 1 whenever I input a number.
Here's my loop:
int nVal, i, j, k, nProduct = 1, nSum = 0, nFactorial = 1;
float fResult;
for (i = 1; i <= nVal; i++)
{
for (j = 1; j <= nVal; j++)
{
nFactorial *= j;
nSum += nFactorial;
}
for (k = 1; k <= nVal; k++)
{
nProduct *= k;
}
fResult += (nSum * 1.0) / (nProduct * 1.0);
}
Any fixes I can try?

OP's code is incorrect in the numerator and denominator calculation. Also, the integer math readily overflows.
To better handle large n, form each term based on the prior term with floating point math.
double sum_fact_expo(int n) {
double sum = 0.0;
double ratio = 1.0;
for (int i = 1; i <= n; i++) {
ratio *= 1.0*i/n;
sum += ratio;
}
return sum;
}

How big will n get? This can dictate what the type for nProduct and nFactorial should be (e.g. int or long long or __int128 or double).
The nested/inner loops are wrong. You only want/need the for loop for i.
And you'll [possibly/probably] overflow/underflow if you wait to calculate the ratio. So, do this at each iteration.
Your factorial calculation line was okay. But, the nProduct calculation was incorrect. The multiplier has to be n [and not k].
You don't initialize fResult, so it starts with an indeterminate value (and reading it is undefined behavior). This would be flagged if you compiled with -Wall [and -O2] to enable warnings.
Remember that:
k! = (k - 1)! * k
And, that:
n**k = n**(k - 1) * n
So, we can build up nFactorial subterms and nProduct subterms iteratively, in a single loop.
Here's what I think your code should be like:
// pick _one_ of these:
#if 0
typedef int acc_t;
#endif
#if 0
typedef long long acc_t;
#endif
#if 0
typedef __int128 acc_t;
#endif
#if 1
typedef double acc_t;
#endif
double
calc1(int nVal)
{
int i;
acc_t nProduct = 1;
acc_t nFactorial = 1;
double ratio;
double fResult = 0;
for (i = 1; i <= nVal; ++i) {
nFactorial *= i;
nProduct *= nVal;
ratio = (double) nFactorial / (double) nProduct;
fResult += ratio;
}
return fResult;
}
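On the question of how big n can get before an integer accumulator overflows, a small helper can answer it for the factorial part. This is only a sketch; the name max_factorial_arg is hypothetical and not part of the answer above.

```c
#include <stdint.h>

/* Largest k such that k! <= limit (assumes limit >= 1).
   The test f <= limit / (k + 1) is equivalent to f * (k + 1) <= limit,
   but cannot overflow itself. */
int max_factorial_arg(uint64_t limit) {
    uint64_t f = 1; /* holds k! */
    int k = 1;
    while (f <= limit / (uint64_t)(k + 1)) {
        k++;
        f *= (uint64_t)k;
    }
    return k;
}
```

With limit set to INT32_MAX this returns 12, and with INT64_MAX it returns 20, so an int accumulator for nFactorial is only safe for very small n. The power n**k in nProduct has its own, separate limit.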

#include <stdio.h>
int main(void) {
int nVal, i, j, k, nProduct = 1, nFactorial = 1;
float fResult = 0;
scanf("%d", &nVal);
for (i = 1; i <= nVal; i++)
{
for (j = 1; j <= i; j++)
{
//calculate 1! 2! 3! ... n!
//actually calculate i!
nFactorial *= j;
}
for (k = 1; k <= i; k++)
{
//calculate n^1 n^2 n^3 ... n^n
//actually calculate n^i
nProduct *= nVal;
}
fResult += (nFactorial * 1.0) / (nProduct * 1.0);
nProduct = 1;
nFactorial = 1;
}
printf("%f\n", fResult);
return 0;
}

How to matrix inversion For 1 Dimension with c code?

Hello, I tried to use Gauss-Jordan elimination on a matrix stored in a 1D array, but it didn't work.
I want to find the inverse of a matrix stored in a 1D array. I found the determinant, but I don't know how to compute the inverse of this matrix.
Our matrices:
double A[] = {6, 6 ,2, 4, 9 ,7, 4, 3 ,3};
double B[] = {6, 6 ,2, 4, 9 ,7, 4, 3 ,3};
double Final[9];
Function to calculate determinant:
int Inverse(double *A, double *C, int N){
int n = N;
int i, j, k;
float a[10][10] = { 0.0 };
double C[9] = { 0.0 };
float pivot = 0.0;
float factor = 0.0;
double sum = 0.0;
for (k = 1; k <= n - 1; k++)
{
if (a[k][k] == 0.0)
{
printf("error");
}
else
{
pivot = a[k][k];
for (j = k; j <= n + 1; j++)
a[k][j] = a[k][j] / pivot;
for (i = k + 1; i <= n; i++)
{
factor = a[i][k];
for (j = k; j <= n + 1; j++)
{
a[i][j] = a[i][j] - factor * a[k][j];
}
}
}
if (a[n][n] == 0)
printf("error");
else
{
C[n] = a[n][n + 1] / a[n][n];
for (i = n - 1; i >= 1; i--)
{
sum = 0.0;
for (j = i + 1; j <= n; j++)
sum = sum + a[i][j] * C[j];
C[i] = (a[i][n + 1] - sum) / a[i][i];
}
}
}
for (i = 1; i <= n; i++)
{
printf("\n\tx[%1d]=%10.4f", i, C[i]);
}
system("PAUSE");
return 0;
}
Although I tried very hard, I couldn't find the inverse in C for a matrix stored in a 1-dimensional array. The output is always 0. Can you help me find where I could be making a mistake? Thank you.

It appears you are using C as an output parameter (to store the inverse); however, you also declare a local variable of the same name in the function. This causes the local variable to shadow (i.e.: hide) the output parameter; thus, changes you make to the C in the function do not affect the C the calling function sees.
To fix this issue, you need to remove the line double C[9] = { 0.0 }; from your function.
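For reference, here is a minimal sketch of an inverse routine that writes straight into the caller's output array, with no local array of the same name. It is not the question's algorithm: it uses Gauss-Jordan elimination with partial pivoting, assumes the n x n matrix is stored row-major in a flat array, and the name invert_flat is hypothetical.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Inverts the n x n row-major matrix A into C (both flat arrays of n*n doubles).
   Returns 0 on success, -1 if A is singular or allocation fails. */
int invert_flat(const double *A, double *C, int n) {
    double *w = malloc(sizeof *w * (size_t)n * (size_t)n);
    if (w == NULL)
        return -1;
    memcpy(w, A, sizeof *w * (size_t)n * (size_t)n); /* work on a copy of A */
    for (int i = 0; i < n; i++)                      /* C starts as identity */
        for (int j = 0; j < n; j++)
            C[i * n + j] = (i == j) ? 1.0 : 0.0;

    for (int k = 0; k < n; k++) {
        /* partial pivoting: pick the row with the largest entry in column k */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (fabs(w[i * n + k]) > fabs(w[p * n + k]))
                p = i;
        if (w[p * n + k] == 0.0) {
            free(w);
            return -1; /* singular */
        }
        if (p != k) {
            for (int j = 0; j < n; j++) {
                double t = w[k * n + j]; w[k * n + j] = w[p * n + j]; w[p * n + j] = t;
                t = C[k * n + j]; C[k * n + j] = C[p * n + j]; C[p * n + j] = t;
            }
        }
        /* scale the pivot row so the pivot becomes 1 */
        double pivot = w[k * n + k];
        for (int j = 0; j < n; j++) {
            w[k * n + j] /= pivot;
            C[k * n + j] /= pivot;
        }
        /* eliminate column k from every other row */
        for (int i = 0; i < n; i++) {
            if (i == k)
                continue;
            double factor = w[i * n + k];
            for (int j = 0; j < n; j++) {
                w[i * n + j] -= factor * w[k * n + j];
                C[i * n + j] -= factor * C[k * n + j];
            }
        }
    }
    free(w);
    return 0;
}
```

Calling invert_flat(A, Final, 3) with the arrays from the question would leave the inverse in Final, because every write goes through the pointer the caller passed in.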

Why is the numerical solution coming same as analytical solution in C language?

I have coded a 1-dimensional CFD problem, but my numerical solution comes out the same as the analytical solution (up to 6 decimal places).
I am using TDMA method for numerical solution and for the analytical solution I am directly substituting the x value in the function T(x).
Analytical solution T(x) comes out to be T(x) = -(x^2)/2 +11/21(x);
E. g. 4 grid points then ;
x0 = 0.000000, x1 = 0.333333 , x2 = 0.666666 , x3 = 0.999999 .
T(x0) = 0.000000 , T(x1) = 0.119048 , T(x2) = 0.126984 , T(x3) = 0.023810.
And for numerical solution I have used TDMA technique, please refer the code below.
Enter n = 4 for the results.
#include<stdio.h>
void temp_matrix(int n, double *a, double *b, double *c, double *d, double *T);
int main() {
int Bi = 20.0;
int n;
printf("%s ", "Enter the Number of total Grid Points");
scanf("%d", &n);
float t = (n - 1);
double dx = 1.0 / t;
int i;
printf("\n");
double q; // analytical solution below
double z[n];
for (i = 0; i <= n - 1; i++) {
q = (dx) * i;
z[i] = -(q * q) / 2 + q * (11.0 / 21);
printf("\nT analytical %lf ", z[i]);
}
double b[n - 1];
b[n - 2] = -2.0 * Bi * dx - 2.0;
for (i = 0; i <= n - 3; i++) {
b[i] = -2.0;
}
double a[n - 1];
a[n - 2] = 2.0;
a[0] = 0;
for (i = 1; i < n - 2; i++) {
a[i] = 1.0;
}
double c[n - 1];
for (i = 0; i <= n - 2; i++) {
c[i] = 1.0;
}
double d[n - 1];
for (i = 0; i <= n - 2; i++) {
d[i] = -(dx * dx);
}
double T[n];
temp_matrix(n, a, b, c, d, T);
return 0;
}
void temp_matrix(int n, double *a, double *b, double *c, double *d, double *T) {
int i;
double beta[n - 1];
double gama[n - 1];
beta[0] = b[0];
gama[0] = d[0] / beta[0];
for (i = 1; i <= n - 2; i++) {
beta[i] = b[i] - a[i] * (c[i - 1] / beta[i - 1]);
gama[i] = (d[i] - a[i] * gama[i - 1]) / beta[i];
}
int loop;
for (loop = 0; loop < n - 1; loop++)
for (loop = 0; loop < n - 1; loop++)
T[0] = 0;
T[n - 1] = gama[n - 2];
for (i = n - 2; i >= 1; i--) {
T[i] = gama[i - 1] - (c[i - 1] * (T[i + 1])) / beta[i - 1];
}
printf("\n");
for (i = 0; i < n; i++) {
printf("\nT numerical %lf", T[i]);
}
}

Why is the numerical solution coming same as analytical solution in C language?
They differ, by about 3 bits.
Print with enough precision to see the difference.
Using the code below, we see a difference in the last hex digits of the significand of T[3]: ...620 (analytical) vs ...619 (numerical). This is a difference of only about 1 part in 10^15.
#include<float.h>
printf("T analytical %.*e\t%a\n", DBL_DECIMAL_DIG - 1, z[i], z[i]);
printf("T numerical %.*e\t%a\n", DBL_DECIMAL_DIG - 1, T[i], T[i]);
C allows double math to be performed at long double precision when FLT_EVAL_METHOD == 2, in which case the analytical and numerical results may come out the same. Your results may differ from mine due to that as well as other subtle FP nuances.
printf("FLT_EVAL_METHOD %d\n", FLT_EVAL_METHOD);
Output
T analytical 0.0000000000000000e+00 0x0p+0
T analytical 1.1904761904761907e-01 0x1.e79e79e79e7ap-4
T analytical 1.2698412698412700e-01 0x1.0410410410411p-3
T analytical 2.3809523809523836e-02 0x1.861861861862p-6
T numerical 0.0000000000000000e+00 0x0p+0
T numerical 1.1904761904761904e-01 0x1.e79e79e79e79ep-4
T numerical 1.2698412698412698e-01 0x1.041041041041p-3
T numerical 2.3809523809523812e-02 0x1.8618618618619p-6
FLT_EVAL_METHOD 0
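The effect of the print precision can also be shown without running the whole solver. The sketch below formats two doubles both ways and compares the strings; the helper names same_at_default_precision and same_at_full_precision are hypothetical, and the fallback value 17 for DBL_DECIMAL_DIG is an assumption for pre-C11 headers.

```c
#include <float.h>
#include <stdio.h>
#include <string.h>

#ifndef DBL_DECIMAL_DIG
#define DBL_DECIMAL_DIG 17 /* assumed fallback for headers older than C11 */
#endif

/* Returns 1 if a and b print identically with "%f" (6 decimal places). */
int same_at_default_precision(double a, double b) {
    char sa[512], sb[512];
    snprintf(sa, sizeof sa, "%f", a);
    snprintf(sb, sizeof sb, "%f", b);
    return strcmp(sa, sb) == 0;
}

/* Returns 1 if a and b print identically with enough digits to
   distinguish any two different doubles. */
int same_at_full_precision(double a, double b) {
    char sa[512], sb[512];
    snprintf(sa, sizeof sa, "%.*e", DBL_DECIMAL_DIG - 1, a);
    snprintf(sb, sizeof sb, "%.*e", DBL_DECIMAL_DIG - 1, b);
    return strcmp(sa, sb) == 0;
}
```

For the T[1] pair from the output above (0x1.e79e79e79e7ap-4 and 0x1.e79e79e79e79ep-4), the first helper reports a match and the second does not, which is the behavior the question ran into.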

Improve performance of a construction of p-values matrix for a permutation test

I used an R code which implements a permutation test for the distributional comparison between two populations of functions. We have p univariate p-values.
The bottleneck is the construction of a matrix which contains all the possible CONTIGUOUS p-values.
The last row of the matrix of p-values contain all the univariate p-values.
The penultimate row contains all the bivariate p-values in this order:
p_val_c(1,2), p_val_c(2,3), ..., p_val_c(p, 1)
...
The elements of the first row are coincident and the value associated is the p-value of the global test p_val_c(1,...,p)=p_val_c(2,...,p,1)=...=pval(p,1,...,p-1).
For computational reasons, I have decided to implement this component in c and use it in R with .C.
Here is the code. The only important part is the definition of the function Build_pval_asymm_matrix.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>
void Build_pval_asymm_matrix(int * p, int * B, double * pval,
double * L,
double * pval_asymm_matrix);
// Function used for the sorting of vector T_temp with qsort
int cmp(const void *x, const void *y);
int main() {
int B = 1000; // number Conditional Monte Carlo (CMC) runs
int p = 100; // number univariate tests
// Generate fictitiously data univariate p-values pval and matrix L.
// The j-th column of L is the empirical survival
// function of the statistics test associated to the j-th coefficient
// of the basis expansion. The dimension of L is B * p.
// Generate pval
double pval[p];
memset(pval, 0, sizeof(pval)); // initialize all elements to 0
for (int i = 0; i < p; i++) {
pval[i] = (double)rand() / (double)RAND_MAX;
}
// Construct L
double L[B * p];
// Initialize to 0 the elements of L
memset(L, 0, sizeof(L));
// Array used to construct the columns of L
double temp_array[B];
memset(temp_array, 0, sizeof(temp_array));
for(int i = 0; i < B; i++) {
temp_array[i] = (double) (i + 1) / (double) B;
}
for (int iter_coeff=0; iter_coeff < p; iter_coeff++) {
// Shuffle temp_array
if (B > 1) {
for (int k = 0; k < B - 1; k++)
{
int j = rand() % B;
double t = temp_array[j];
temp_array[j] = temp_array[k];
temp_array[k] = t;
}
}
for (int i=0; i<B; i++) {
L[iter_coeff + p * i] = temp_array[i];
}
}
double pval_asymm_matrix[p * p];
memset(pval_asymm_matrix, 0, sizeof(pval_asymm_matrix));
// Construct the asymmetric matrix of p-values
clock_t start, end;
double cpu_time_used;
start = clock();
Build_pval_asymm_matrix(&p, &B, pval, L, pval_asymm_matrix);
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
printf("TOTAL CPU time used: %f\n", cpu_time_used);
return 0;
}
void Build_pval_asymm_matrix(int * p, int * B, double * pval,
double * L,
double * pval_asymm_matrix) {
int nbasis = *p, iter_CMC = *B;
// Scalar output fisher combining function applied on univariate
// p-values
double T0_temp = 0;
// Vector output fisher combining function applied on a set of
//columns of L
double T_temp[iter_CMC];
memset(T_temp, 0, sizeof(T_temp));
// Counter for elements of T_temp greater than or equal to T0_temp
int count = 0;
// Indexes for columns of L
int inf = 0, sup = 0;
// The last row of matrice_pval_asymm contains the univariate p-values
for(int i = 0; i < nbasis; i++) {
pval_asymm_matrix[i + nbasis * (nbasis - 1)] = pval[i];
}
// Construct the rows from bottom to up
for (int row = nbasis - 2; row >= 0; row--) {
for (int col = 0; col <= row; col++) {
T0_temp = 0;
memset(T_temp, 0, sizeof(T_temp));
inf = col;
sup = (nbasis - row) + col - 1;
// Combining function Fisher applied on
// p-values pval[inf:sup]
for (int k = inf; k <= sup; k++) {
T0_temp += log(pval[k]);
}
T0_temp *= -2;
// Combining function Fisher applied
// on columns inf:sup of matrix L
for (int k = 0; k < iter_CMC; k++) {
for (int l = inf; l <= sup; l++) {
T_temp[k] += log(L[l + nbasis * k]);
}
T_temp[k] *= -2;
}
// Sort the vector T_temp
qsort(T_temp, iter_CMC, sizeof(double), cmp);
// Count the number of elements of T_temp less than T0_temp
int h = 0;
while (h < iter_CMC && T_temp[h] < T0_temp) {
h++;
}
// Number of elements of T_temp greater than or equal to T0_temp
count = iter_CMC - h;
pval_asymm_matrix[col + nbasis * row] = (double) count / (double)iter_CMC;
}
// auxiliary variable for columns of L inf:nbasis-1 and 1:sup
int aux_first = 0, aux_second = 0;
int num_col_needed = 0;
for (int col = row + 1; col < nbasis; col++) {
T0_temp = 0;
memset(T_temp, 0, sizeof(T_temp));
inf = col;
sup = ((nbasis - row) + col) % nbasis - 1;
// Useful indexes
num_col_needed = nbasis - inf + sup + 1;
int index_needed[num_col_needed];
memset(index_needed, -1, num_col_needed * sizeof(int));
aux_first = inf;
for (int i = 0; i < nbasis - inf; i++) {
index_needed[i] = aux_first;
aux_first++;
}
aux_second = 0;
for (int j = 0; j < sup + 1; j++) {
index_needed[j + nbasis - inf] = aux_second;
aux_second++;
}
// Combining function Fisher applied on p-values
// pval[inf:p-1] and pval[0:sup]
for (int k = 0; k < num_col_needed; k++) {
T0_temp += log(pval[index_needed[k]]);
}
T0_temp *= -2;
// Combining function Fisher applied on columns inf:p-1 and 0:sup
// of matrix L
for (int k = 0; k < iter_CMC; k++) {
for (int l = 0; l < num_col_needed; l++) {
T_temp[k] += log(L[index_needed[l] + nbasis * k]);
}
T_temp[k] *= -2;
}
// Sort the vector T_temp
qsort(T_temp, iter_CMC, sizeof(double), cmp);
// Count the number of elements of T_temp less than T0_temp
int h = 0;
while (h < iter_CMC && T_temp[h] < T0_temp) {
h++;
}
// Number of elements of T_temp greater than or equal to T0_temp
count = iter_CMC - h;
pval_asymm_matrix[col + nbasis * row] = (double) count / (double)iter_CMC;
} // end for over col from row + 1 to nbasis - 1
} // end for over rows of asymm p-values matrix except the last row
}
int cmp(const void *x, const void *y)
{
double xx = *(double*)x, yy = *(double*)y;
if (xx < yy) return -1;
if (xx > yy) return 1;
return 0;
}
Here are the execution times in seconds measured in R:
time_original_function
user system elapsed
79.726 1.980 112.817
time_function_double_for
user system elapsed
79.013 1.666 89.411
time_c_function
user system elapsed
47.920 0.024 56.096
The first measure was obtained using an equivalent R function with duplication of the vector pval and matrix L.
What I wanted to ask is some suggestions in order to decrease the execution time with the C function for simulation purposes. The last time I used c was five years ago and consequently there is room for improvement. For instance I sort the vector T_temp with qsort in order to compute in linear time with a while the number of elements of T_temp greater than or equal to T0_temp. Maybe this task could be done in a more efficient way. Thanks in advance!!

I reduced the input size p to 50 to avoid waiting on it (I don't have such a fast machine); keeping p as is and reducing B to 100 has a similar effect. Profiling it showed that ~7.5 out of the ~8 seconds used to compute this was spent in the log function.
qsort doesn't even show up as a real hotspot. This test seems to headbutt the machine more in terms of micro-efficiency than anything else.
So unless your compiler has a vastly faster implementation of log than I do, my first suggestion is to find a fast log implementation if you can afford some accuracy loss (there are ones out there that can compute log over an order of magnitude faster with precision loss in the range of ~3% or so).
If you cannot have precision loss and accuracy is critical, then I'd suggest trying to memoize the values you use for log if you can and store them into a lookup table.
Update
I tried the latter approach.
// Create a memoized table of log values.
double log_cache[B * p];
for (int j=0, num=B*p; j < num; ++j)
log_cache[j] = log(L[j]);
Using malloc might be better here, as we're pushing rather large data to the stack and could risk overflows.
Then pass it into Build_pval_asymm_matrix.
Replace these:
T_temp[k] += log(L[l + nbasis * k]);
...
T_temp[k] += log(L[index_needed[l] + nbasis * k]);
With these:
T_temp[k] += log_cache[l + nbasis * k];
...
T_temp[k] += log_cache[index_needed[l] + nbasis * k];
This improved the time for me from ~8 seconds to ~5.3 seconds, but we've exchanged the computational overhead of log for memory overhead, which isn't that much better. (Such a trade rarely pays off, but calling log on double-precision floats is apparently expensive enough to make it worthwhile here.) If you want more speed, and more is very possible, the next iteration involves looking into cache efficiency.
For this kind of huge matrix stuff, focusing on memory layouts and access patterns can work wonders.