CHOLMOD sparse matrix Cholesky decomposition: incorrect factor?

I have been using CHOLMOD to factorise the matrix A and solve the system Ax = b, where A is the Hessian matrix (printed below) and b = [1, 1, 1] is created by the cholmod_ones function.
Unfortunately, the solution x is incorrect (it should be [1.5, 2.0, 1.5]). To confirm, I multiplied A and x back together, and I don't get [1, 1, 1]. I don't quite understand what I am doing wrong.
Additionally, I've looked at the factor, and the values of its elements don't make sense either.
Output
Hessian:
2.000 -1.000 0.000
-1.000 2.000 -1.000
0.000 -1.000 2.000
Solution:
2.500 0.000 0.000
3.500 0.000 0.000
2.500 0.000 0.000
B vector:
1.500 0.000 0.000
2.000 0.000 0.000
1.500 0.000 0.000
Code
iterate_hessian() is an external function that returns doubles which are read into the CHOLMOD hessian matrix.
The entry point for the code is cholesky_determinant which is called with an argument which gives the dimension of the (square) matrix.
#include <cholmod.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Function prototype that gives the next value of the Hessian
double iterate_hessian();
cholmod_sparse *cholmod_hessian(double *hessian, size_t dimension, cholmod_common *common) {
// This function copies the Hessian matrix from OPTIM into a CHOLMOD sparse matrix.
// Allocate a CHOLMOD triplet matrix with room for every entry
cholmod_triplet *triplet_hessian;
triplet_hessian = cholmod_allocate_triplet(dimension, dimension, dimension*dimension, 0 /* stype 0: unsymmetric */, CHOLMOD_REAL, common);
// Loop through values of hessian and assign their row/column index and values to triplet_hessian.
size_t loop;
for (loop = 0; loop < (dimension * dimension); loop++) {
if (hessian[loop] == 0) {
continue;
}
((int*)triplet_hessian->i)[triplet_hessian->nnz] = loop / dimension;
((int*)triplet_hessian->j)[triplet_hessian->nnz] = loop % dimension;
((double*)triplet_hessian->x)[triplet_hessian->nnz] = hessian[loop];
triplet_hessian->nnz++;
}
// Convert the triplet to a sparse matrix, free the triplet and return.
cholmod_sparse *sparse_hessian;
sparse_hessian = cholmod_triplet_to_sparse(triplet_hessian, (dimension * dimension), common);
cholmod_free_triplet(&triplet_hessian, common);
return sparse_hessian;
}
void print_matrix(cholmod_dense *matrix, size_t dimension) {
// matrix->x is a void pointer to the column-major values; cast it to
// double * to index it (no separate allocation is needed)
double *y = (double *) matrix->x;
// Loop variables
size_t i, j;
// Row
for(i = 0; i < dimension; i++) {
// Column
for(j = 0; j < dimension; j++) {
printf("% 8.3f ", y[i + j * dimension]);
}
printf("\n");
}
}
cholmod_dense *factorized(cholmod_sparse *sparse_hessian, cholmod_common *common) {
cholmod_factor *factor;
factor = cholmod_analyze(sparse_hessian, common);
cholmod_factorize(sparse_hessian, factor, common);
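cholmod_dense *b, *x;
b = cholmod_ones(sparse_hessian->nrow, 1, sparse_hessian->xtype, common);
// CHOLMOD_A solves the full system, including the fill-reducing permutation
// chosen by cholmod_analyze (CHOLMOD_LDLt alone would ignore it).
x = cholmod_solve(CHOLMOD_A, factor, b, common);
cholmod_free_dense(&b, common);
cholmod_free_factor(&factor, common);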
// Return the solution, x
return x;
}
double cholesky_determinant(int *dimension) {
// Declare variables
double determinant;
cholmod_sparse *A;
cholmod_dense *B, *Y;
cholmod_common common;
// Start CHOLMOD
cholmod_start (&common);
// Allocate storage for the hessian (we want to copy it)
double *hessian = malloc(*dimension * *dimension * sizeof(*hessian));
// Get the hessian from OPTIM
int i = 0;
for (i = 0; i < (*dimension * *dimension); i++) {
hessian[i] = iterate_hessian();
}
A = cholmod_hessian(hessian, *dimension, &common);
printf("Hessian:\n");
cholmod_dense *A_dense = cholmod_sparse_to_dense(A, &common);
print_matrix(A_dense, *dimension);
cholmod_free_dense(&A_dense, &common);
B = factorized(A, &common);
printf("Solution:\n");
print_matrix(B, *dimension);
double alpha[] = {1, 0};
double beta[] = {0, 0};
Y = cholmod_allocate_dense(*dimension, 1, *dimension, CHOLMOD_REAL, &common);
cholmod_sdmult(A, 0, alpha, beta, B, Y, &common);
printf("B vector:\n");
print_matrix(Y, *dimension);
determinant = 0.0;
// Free up memory and finish CHOLMOD
cholmod_free_sparse (&A, &common);
cholmod_free_dense (&B, &common);
cholmod_free_dense (&Y, &common);
free(hessian);
cholmod_finish (&common);
return determinant;
}

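It turns out that I hadn't set the stype for my sparse matrix properly. The stype tells CHOLMOD how to treat the matrix's symmetry (0 means unsymmetric; a positive or negative value means symmetric, with only the upper or lower triangle used), and therefore determines what cholmod_analyze and cholmod_factorize operate on. With stype = 0, it was in fact factorising and solving for AA' rather than A.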


Logistic regression code stops working above ~43,500 generated observations

I'm having some difficulty troubleshooting code I wrote in C to perform logistic regression. While it seems to work on smaller, semi-randomized datasets, it stops working (i.e. it no longer assigns proper probabilities of belonging to class 1) at around the point where I pass 43,500 observations (determined by tweaking the number of observations created). When creating the 150 features used in the code, I create the first two as a function of the number of observations, so I'm not sure whether that's the issue here, though I am using double precision. Maybe there's an overflow somewhere in the code?
The below code should be self-contained; it generates m=50,000 observations with n=150 features. Setting m below 43,500 should return "Percent class 1: 0.250000", setting to 44,000 or above will return "Percent class 1: 0.000000", regardless of what max_iter (number of times we sample m observations) is set to.
The first feature is set to 1.0 divided by the total number of observations, if class 0 (first 75% of observations), or the index of the observation divided by the total number of observations otherwise.
The second feature is just index divided by total number of observations.
All other features are random.
The logistic regression is intended to use stochastic gradient descent, randomly selecting an observation index, computing the gradient of the loss with the predicted y using current weights, and updating weights with the gradient and learning rate (eta).
Using the same initialization with Python and NumPy, I still get the proper results, even above 50,000 observations.
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>
// Compute z = w * x + b
double dlc( int n, double *X, double *coef, double intercept )
{
double y_pred = intercept;
for (int i = 0; i < n; i++)
{
y_pred += X[i] * coef[i];
}
return y_pred;
}
// Compute y_hat = 1 / (1 + e^(-z))
double sigmoid( int n, double alpha, double *X, double *coef, double beta, double intercept )
{
double y_pred;
y_pred = dlc(n, X, coef, intercept);
y_pred = 1.0 / (1.0 + exp(-y_pred));
return y_pred;
}
// Stochastic gradient descent
void sgd( int m, int n, double *X, double *y, double *coef, double *intercept, double eta, int max_iter, int fit_intercept, int random_seed )
{
double *gradient_coef, *X_i;
double y_i, y_pred, resid;
int idx;
double gradient_intercept = 0.0, alpha = 1.0, beta = 1.0;
X_i = (double *) malloc (n * sizeof(double));
gradient_coef = (double *) malloc (n * sizeof(double));
for ( int i = 0; i < n; i++ )
{
coef[i] = 0.0;
gradient_coef[i] = 0.0;
}
*intercept = 0.0;
srand(random_seed);
for ( int epoch = 0; epoch < max_iter; epoch++ )
{
for ( int run = 0; run < m; run++ )
{
// Randomly sample an observation
idx = rand() % m;
for ( int i = 0; i < n; i++ )
{
X_i[i] = X[n*idx+i];
}
y_i = y[idx];
// Compute y_hat
y_pred = sigmoid( n, alpha, X_i, coef, beta, *intercept );
resid = -(y_i - y_pred);
// Compute gradients and adjust weights
for (int i = 0; i < n; i++)
{
gradient_coef[i] = X_i[i] * resid;
coef[i] -= eta * gradient_coef[i];
}
if ( fit_intercept == 1 )
{
*intercept -= eta * resid;
}
}
}
free(X_i);
free(gradient_coef);
}
int main(void)
{
double *X, *y, *coef, *y_pred;
double intercept;
double eta = 0.05;
double alpha = 1.0, beta = 1.0;
long m = 50000;
long n = 150;
int max_iter = 20;
long class_0 = (long)(3.0 / 4.0 * (double)m);
double pct_class_1 = 0.0;
clock_t test_start;
clock_t test_end;
double test_time;
printf("Constructing variables...\n");
X = (double *) malloc (m * n * sizeof(double));
y = (double *) malloc (m * sizeof(double));
y_pred = (double *) malloc (m * sizeof(double));
coef = (double *) malloc (n * sizeof(double));
// Initialize classes
for (int i = 0; i < m; i++)
{
if (i < class_0)
{
y[i] = 0.0;
}
else
{
y[i] = 1.0;
}
}
// Initialize observation features
for (int i = 0; i < m; i++)
{
if (i < class_0)
{
X[n*i] = 1.0 / (double)m;
}
else
{
X[n*i] = (double)i / (double)m;
}
X[n*i + 1] = (double)i / (double)m;
for (int j = 2; j < n; j++)
{
X[n*i + j] = (double)(rand() % 100) / 100.0;
}
}
// Fit weights
printf("Running SGD...\n");
test_start = clock();
sgd( m, n, X, y, coef, &intercept, eta, max_iter, 1, 42 );
test_end = clock();
test_time = (double)(test_end - test_start) / CLOCKS_PER_SEC;
printf("Time taken: %f\n", test_time);
// Compute y_hat and share of observations predicted as class 1
printf("Making predictions...\n");
for ( int i = 0; i < m; i++ )
{
y_pred[i] = sigmoid( n, alpha, &X[i*n], coef, beta, intercept );
}
printf("Printing results...\n");
for ( int i = 0; i < m; i++ )
{
//printf("%f\n", y_pred[i]);
if (y_pred[i] > 0.5)
{
pct_class_1 += 1.0;
}
// Troubleshooting print
if (i < 10 || i > m - 10)
{
printf("%g\n", y_pred[i]);
}
}
printf("Percent class 1: %f\n", pct_class_1 / (double)m);
free(X);
free(y);
free(y_pred);
free(coef);
return 0;
}
For reference, here is my (presumably) equivalent Python code, which returns the correct percent of identified classes at more than 50,000 observations:
import numpy as np
import time
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
class LogisticRegressor:
    def __init__(self, eta, init_runs, fit_intercept=True):
        self.eta = eta
        self.init_runs = init_runs
        self.fit_intercept = fit_intercept
    def fit(self, x, y):
        m, n = x.shape
        self.coef = np.zeros((n, 1))
        self.intercept = np.zeros((1, 1))
        for epoch in range(self.init_runs):
            for run in range(m):
                idx = np.random.randint(0, m)
                x_i = x[idx:idx+1, :]
                y_i = y[idx]
                y_pred_i = sigmoid(x_i.dot(self.coef) + self.intercept)
                gradient_w = -(x_i.T * (y_i - y_pred_i))
                self.coef -= self.eta * gradient_w
                if self.fit_intercept:
                    gradient_b = -(y_i - y_pred_i)
                    self.intercept -= self.eta * gradient_b
    def predict_proba(self, x):
        m, n = x.shape
        y_pred = np.ones((m, 2))
        y_pred[:,1:2] = sigmoid(x.dot(self.coef) + self.intercept)
        y_pred[:,0:1] -= y_pred[:,1:2]
        return y_pred
    def predict(self, x):
        return np.round(sigmoid(x.dot(self.coef) + self.intercept))
m = 50000
n = 150
class1 = int(3.0 / 4.0 * m)
X = np.random.rand(m, n)
y = np.zeros((m, 1))
for obs in range(m):
    if obs < class1:
        continue
    else:
        y[obs,0] = 1
for obs in range(m):
    if obs < class1:
        X[obs, 0] = 1.0 / float(m)
    else:
        X[obs, 0] = float(obs) / float(m)
    X[obs, 1] = float(obs) / float(m)
logit = LogisticRegressor(0.05, 20)
start_time = time.time()
logit.fit(X, y)
end_time = time.time()
print(round(end_time - start_time, 2))
y_pred = logit.predict(X)
print("Percent:", y_pred.sum() / len(y_pred))
The issue is here:
// Randomly sample an observation
idx = rand() % m;
... in light of the fact that the OP's RAND_MAX is 32767. This is exacerbated by the fact that all of the class 1 observations are at the end.
All samples will be drawn from the first 32768 observations, so when the total number of observations is greater than that, the proportion of class 1 observations among those that can be sampled is less than 0.25. At 43691 total observations, there are no class 1 observations among those that can be sampled, so the model only ever sees class 0.
As a secondary issue, rand() % m does not yield a wholly uniform distribution if m does not evenly divide RAND_MAX + 1, though the effect of this issue will be much more subtle.
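To make both points concrete, here is a small sketch (the helper modulo_counts is hypothetical) that feeds every possible rand() result 0..RAND_MAX through "% m" exactly once and counts how often each index comes out. With RAND_MAX = 32767 and m = 50000, indices 32768 and above are never produced; with m = 3, index 2 comes out once less often than indices 0 and 1 (10922 vs 10923 times).

```c
/* For a generator that returns each value 0..range-1 exactly once (an
 * idealised rand() with RAND_MAX == range - 1), counts how often each index
 * 0..m-1 is produced by "value % m". counts must have room for m entries. */
void modulo_counts(unsigned long range, unsigned long m, unsigned long counts[])
{
    for (unsigned long r = 0; r < m; r++)
        counts[r] = 0;
    for (unsigned long v = 0; v < range; v++)
        counts[v % m]++;
}
```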
Bottom line: you need a better random number generator.
At minimum, you could consider combining the bits from two calls to rand() to yield an integer with sufficient range, but you might want to consider getting a third-party generator. There are several available.
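A minimal sketch of combining two rand() calls, assuming RAND_MAX + 1 is a power of two of at least 2^15 (true both for the OP's 32767 and for 2147483647); rand30 is a hypothetical name. Each call contributes its low 15 bits, so the result covers 0 to 2^30 - 1.

```c
#include <stdlib.h>

/* Combines the low 15 bits of two rand() calls into one value in
 * [0, 2^30 - 1]. Assumes RAND_MAX + 1 is a power of two >= 2^15
 * (true for RAND_MAX == 32767 and RAND_MAX == 2147483647), so that the
 * low 15 bits of each call are uniformly distributed. */
unsigned long rand30(void)
{
    unsigned long hi = (unsigned long) rand() & 0x7FFFUL;
    unsigned long lo = (unsigned long) rand() & 0x7FFFUL;
    return (hi << 15) | lo;
}
```

In sgd() it could be used as idx = (int)(rand30() % (unsigned long)m);. The remaining modulo bias is then on the order of m / 2^30, which is negligible for m = 50,000 but not zero.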
Note: OP reports "m=50,000 observations with n=150 features.", so perhaps this is not the issue for OP, but I'll leave this answer up for reference when OP tries larger tasks.
A potential issue:
long overflow
m * n * sizeof(double) risks overflow when long is 32-bit and m*n > LONG_MAX (that is, above about 46,341 each if m and n are equal).
OP does report
A first step is to perform the multiplication using size_t math where we gain at least 1 more bit in the calculation.
// m * n * sizeof(double)
sizeof(double) * m * n
Yet unless OP's size_t is more than 32-bit, we still have trouble.
In any case, I recommend using size_t for array sizing and indexing.
Check allocations for failure too.
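A sketch of the size_t sizing and allocation check described above (alloc_doubles_2d is a hypothetical helper): the overflow test is done by division before multiplying, so the test itself cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates an m x n array of doubles, doing the size computation in
 * size_t and refusing (returning NULL) if m * n * sizeof(double) would
 * overflow size_t. Also returns NULL if malloc itself fails. */
double *alloc_doubles_2d(size_t m, size_t n)
{
    if (n != 0 && m > SIZE_MAX / sizeof(double) / n)
        return NULL;               /* product would not fit in size_t */
    return malloc(sizeof(double) * m * n);
}
```

In main() this would replace the malloc for X, e.g. X = alloc_doubles_2d((size_t)m, (size_t)n);, followed by a NULL check that reports the error and exits.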
Since RAND_MAX may be too small and array indexing should be done using size_t math, consider a helper function to generate a random index over the entire size_t range.
// idx = rand() % m;
size_t idx = rand_size_t() % (size_t)m;
If stuck with the standard rand(), below is a helper function to extend its range as needed.
It uses the really nifty IMAX_BITS(m) macro.
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
// https://stackoverflow.com/a/4589384/2410359
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
// Test that RAND_MAX is a power of 2 minus 1
_Static_assert((RAND_MAX & 1) && ((RAND_MAX/2 + 1) & (RAND_MAX/2)) == 0, "RAND_MAX is not a Mersenne number");
#define RAND_MAX_WIDTH (IMAX_BITS(RAND_MAX))
#define SIZE_MAX_WIDTH (IMAX_BITS(SIZE_MAX))
size_t rand_size_t(void) {
size_t index = (size_t) rand();
for (unsigned i = RAND_MAX_WIDTH; i < SIZE_MAX_WIDTH; i += RAND_MAX_WIDTH) {
index <<= RAND_MAX_WIDTH;
index ^= (size_t) rand();
}
return index;
}
Further considerations can replace the rand_size_t() % (size_t)m with a more uniform distribution.
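One way to remove the modulo bias is rejection sampling. The sketch below assumes a generator that is uniform over the whole size_t range, such as rand_size_t() above; reduce_unbiased and uniform_below are hypothetical names. Values in the short, incomplete block at the top of the range are rejected and redrawn, so every index in [0, m) is backed by the same number of source values.

```c
#include <stddef.h>
#include <stdint.h>

/* Rejection step for an unbiased index in [0, m), m > 0.
 * r is assumed to be uniform over the whole size_t range (for example the
 * output of rand_size_t()). Returns 1 and stores r % m in *out if r is
 * accepted, or 0 if r must be discarded and a new r drawn. */
int reduce_unbiased(size_t r, size_t m, size_t *out)
{
    /* (SIZE_MAX + 1) % m, computed without overflowing size_t */
    size_t rem = (SIZE_MAX % m + 1) % m;
    /* Accepting only r <= SIZE_MAX - rem leaves a count of values that is
     * a multiple of m, so every residue is equally likely. */
    if (r > SIZE_MAX - rem)
        return 0;
    *out = r % m;
    return 1;
}

/* Draws until a value is accepted; gen is any full-range size_t generator. */
size_t uniform_below(size_t m, size_t (*gen)(void))
{
    size_t idx;
    while (!reduce_unbiased(gen(), m, &idx))
        ;
    return idx;
}
```

With it, the sampling line would become idx = uniform_below((size_t)m, rand_size_t);. At most m - 1 of the possible size_t values are rejected, so a redraw is extremely rare.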
As has been determined elsewhere, the problem is due to the implementation's RAND_MAX value being too small.
Assuming 32-bit ints, a slightly better PRNG function can be implemented in the code, such as this C implementation of the minstd_rand engine from C++ (std::minstd_rand):
#define MINSTD_RAND_MAX 2147483646
// Code assumes `int` is at least 32 bits wide.
static unsigned int minstd_seed = 1;
static void minstd_srand(unsigned int seed)
{
seed %= 2147483647;
// zero seed is bad!
minstd_seed = seed ? seed : 1;
}
static int minstd_rand(void)
{
minstd_seed = (unsigned long long)minstd_seed * 48271 % 2147483647;
return (int)minstd_seed;
}
Another problem is that expressions of the form rand() % m produce a biased result when m does not divide (unsigned int)RAND_MAX + 1. Here is an unbiased function that returns a random integer from 0 to le inclusive, making use of the minstd_rand() function defined earlier:
static int minstd_rand_max(int le)
{
int r;
if (le < 0)
{
r = le;
}
else if (le >= MINSTD_RAND_MAX)
{
r = minstd_rand();
}
else
{
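// Largest rm <= MINSTD_RAND_MAX such that rm + 1 is a multiple of le + 1;
// (MINSTD_RAND_MAX - le) % (le + 1) equals (MINSTD_RAND_MAX + 1) % (le + 1).
int rm = MINSTD_RAND_MAX - (MINSTD_RAND_MAX - le) % (le + 1);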
while ((r = minstd_rand()) > rm)
{
}
r /= (rm / (le + 1) + 1);
}
return r;
}
(Actually, it does still have a very small bias because minstd_rand() will never return 0.)
For example, replace rand() % 100 with minstd_rand_max(99), and replace rand() % m with minstd_rand_max(m - 1). Also replace srand(random_seed) with minstd_srand(random_seed).

FFTW guru interface

I don't understand the guru interface of FFTW. Let me explain how I thought it worked, based on the manual and the question "How to use fftw Guru interface", and maybe someone can clear up my misunderstanding.
fftw_plan fftw_plan_guru64_dft(
int rank, const fftw_iodim64 *dims,
int howmany_rank, const fftw_iodim64 *howmany_dims,
fftw_complex *in, fftw_complex *out,
int sign, unsigned flags);
Suppose we want to calculate the DFT of interleaved multidimensional arrays, such as the six 2x2 arrays (each with a different colour) in this picture.
[Figure: interleaved DFTs]
Because the DFTs have stride 3 in the vertical direction and stride 2 in the horizontal direction, I thought we would need rank = 2 and dims = {(2, 3, 3), (2, 2, 2)}, where each triple is (n, is, os). The starting points form a 3 x 2 subarray, so I thought howmany_rank = 2, howmany_dims = {(3, 1, 1), (2, 1, 1)}.
However, this is not actually what FFTW does. I made a smaller example that is easy to calculate by hand, consisting of 4 DFTs of size 2x1 (indicated by colours). Each DFT has input (+-1, 0), whose output should be (+-1, +-1), but that is not what FFTW calculates.
[Figure: small example]
Here is the code I used to calculate the DFT.
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
#include <math.h>
#include <fftw3.h>
int main()
{
fftw_complex* X = fftw_malloc(8 * sizeof(fftw_complex));
fftw_iodim* sizes = malloc(2 * sizeof(fftw_iodim));
fftw_iodim* startingPoints = malloc(2 * sizeof(fftw_iodim));
sizes[0].n = 2; sizes[0].is = 2; sizes[0].os = 2;
sizes[1].n = 1; sizes[1].is = 2; sizes[1].os = 2;
startingPoints[0].n = 2; startingPoints[0].is = 1; startingPoints[0].os = 1;
startingPoints[1].n = 2; startingPoints[1].is = 1; startingPoints[1].os = 1;
fftw_plan plan = fftw_plan_guru_dft(2, sizes, 2, startingPoints, X, X, FFTW_FORWARD, FFTW_ESTIMATE);
X[0] = 1.0; X[1] = -1.0;
X[2] = 1.0; X[3] = -1.0;
X[4] = 0.0; X[5] = 0.0;
X[6] = 0.0; X[7] = 0.0;
fftw_execute(plan);
printf("\nOutput in row-major order:\n");
for (int i = 0; i < 8; i++) {
printf("%lf + %lfi, ", creal(X[i]), cimag(X[i]));
}
printf("\n");
fftw_destroy_plan(plan);
fftw_free(X);
free(sizes);
free(startingPoints);
return 0;
}
Strides, even for the major axes, are counted in elements ("units", i.e. doubles or fftw_complex values), not in numbers of rows: https://www.fftw.org/fftw3_doc/Guru-vector-and-transform-sizes.html#Guru-vector-and-transform-sizes
My guess is that strides along a major axis have to be multiplied by the distance between consecutive rows, also in elements. So for the arrays in the picture, the is and os strides of the vertical dimension should be 4*3 == 12.

Initial value problem for a system of ODEs solver C program

So I wanted to implement the path of the Moon around the Earth with a C program.
My problem is that I know the Moon's velocity and position at both apogee and perigee.
So I started to solve it from Apogee, but I cannot figure out how I could add the second velocity and position as "initial value" for it. I tried it with an if but I don't see any difference between the results. Any help is appreciated!
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
typedef void (*ode)(double* p, double t, double* k, double* dk);
void euler(ode f, double *p, double t, double* k, double h, int n, int N)
{
double kn[N];
double dk[N];
double Rp = - 3.633 * pow(10,8); // x position at Perigee
for(int i = 0; i < n; i++)
{
f(p, 0, k, dk);
for (int j = 0; j < N; j++)
{
if (k[0] == Rp) // this is the "if" I mentioned in my comment
// x coordinate at Perigee
{
k[1] = 0; // y coordinate at Perigee
k[2] = 0; // x velocity component at Perigee
k[3] = 1076; // y velocity component at Perigee
}
kn[j] = k[j] + h * dk[j];
printf("%f ", kn[j]);
k[j] = kn[j];
}
printf("\n");
}
}
void gravity_equation(double* p, double t, double* k, double* dk)
{
// Earth is at the (0, 0)
double G = p[0]; // Gravitational constant
double m = p[1]; // Earth mass
double x = k[0]; // x coordinate at Apogee
double y = k[1]; // y coordinate at Apogee
double Vx = k[2]; // x velocity component at Apogee
double Vy = k[3]; // y velocity component at Apogee
dk[0] = Vx;
dk[1] = Vy;
dk[2] = (- G * m * x) / pow(sqrt((x * x)+(y * y)),3);
dk[3] = (- G * m * y) / pow(sqrt((x * x)+(y * y)),3);
}
void run_gravity_equation()
{
int N = 4; // how many equations there are
double initial_values[N];
initial_values[0] = 4.055*pow(10,8); // x position at Apogee
initial_values[1] = 0; // y position at Apogee
initial_values[2] = 0; // x velocity component at Apogee
initial_values[3] = (-1) * 964; // y velocity component at Apogee
int p = 2; // how many parameters there are
double parameters[p];
parameters[0] = 6.67384 * pow(10, -11); // Gravitational constant
parameters[1] = 5.9736 * pow(10, 24); // Earth mass
double h = 3600; // step size
int n = 3000; // the number of steps
euler(&gravity_equation, parameters, 0, initial_values, h, n, N);
}
int main()
{
run_gravity_equation();
return 0;
}
Your interface is
euler(odefun, params, t0, y0, h, n, N)
where
N = dimension of state space
n = number of steps to perform
h = step size
t0, y0 = initial time and value
The intended behaviour of this procedure seems to be that the updated values are returned in the array y0. There is no reason to insert a hack that forces the state to take on particular values: the initial condition is passed as an argument, as you already do in run_gravity_equation(). The integration routine should remain agnostic of the details of the physical model.
It is extremely improbable that k[0] == Rp will ever hold, since this compares floating-point values for exact equality. What you can do instead is check for sign changes in Vx, that is, k[2], to find points or segments of extremal x coordinate.
Interpreting your description more closely, what you want to do is solve a boundary value problem where x(0)=4.055e8, x'(0)=0, y'(0)=-964 and x(T)=-3.633e8, x'(T)=0. This involves the advanced task of solving a boundary value problem with single or multiple shooting, with the additional complication that the upper boundary T is itself unknown.
You might want to use the Kepler laws to get further insight into the parameters of this problem, so that you can solve it with just a forward integration. The Kepler ellipse of the first Kepler law has the formula (scaled for apogee at phi=0 and perigee at phi=pi)
r = R/(1-E*cos(phi))
so that
R/(1-E)=4.055e8 and R/(1+E)=3.633e8,
which gives
R=3.633*(1+E)=4.055*(1-E)
==> E = (4.055-3.633)/(4.055+3.633) = 0.054891,
R = 3.633e8*(1+0.05489) = 3.8324e8
Further, the angular velocity is given by the second Kepler law
phi'*r^2 = const. = sqrt(R*G*m)
which gives tangential velocities at Apogee (r=R/(1-E))
y'(0)=phi'*r = sqrt(R*G*m)*(1-E)/R = 963.9438
and Perigee (r=R/(1+E))
-y'(T)=phi'*r = sqrt(R*G*m)*(1+E)/R = 1075.9130
which indeed reproduces the constants you used in your code.
The area of the Kepler ellipse is pi/4 times the product of smallest and largest diameter. The smallest diameter can be found at cos(phi)=E, the largest is the sum of apogee and perigee radius, so that the area is
pi*R/sqrt(1-E^2)*(R/(1+E)+R/(1-E))/2= pi*R^2/(1-E^2)^1.5
At the same time, it is the integral of 0.5*phi'*r^2 over the full period 2*T, and thus equal to
sqrt(R*G*m)*T
which is the third Kepler law. This allows us to compute the half-period as
T = pi/sqrt(G*m)*(R/(1-E^2))^1.5 = 1185821
With h = 3600 the half point should be reached between n=329 and n=330 (n=329.395). Integration with scipy.integrate.odeint vs. Euler steps gives the following table for h=3600:
n   [x[n], y[n]] for odeint/lsode        [x[n], y[n]] for Euler
328 [ -4.05469444e+08, 4.83941626e+06] [ -4.28090166e+08, 3.81898023e+07]
329 [ -4.05497554e+08, 1.36933874e+06] [ -4.28507841e+08, 3.48454695e+07]
330 [ -4.05494242e+08, -2.10084488e+06] [ -4.28897657e+08, 3.14986514e+07]
The same for h=36, where the half point falls between n=32939 and n=32940:
n     [x[n], y[n]] for odeint/lsode        [x[n], y[n]] for Euler
32938 [ -4.05499997e+08, 5.06668940e+04] [ -4.05754415e+08, 3.93845978e+05]
32939 [ -4.05500000e+08, 1.59649309e+04] [ -4.05754462e+08, 3.59155385e+05]
32940 [ -4.05500000e+08, -1.87370323e+04] [ -4.05754505e+08, 3.24464789e+05]
32941 [ -4.05499996e+08, -5.34389954e+04] [ -4.05754545e+08, 2.89774191e+05]
which is a little closer for the Euler method, but not much better.

Eigenvalue calculation using TQLI algorithm fails with segmentation fault

I am trying to calculate eigenvalues using the TQLI algorithm that I got from the website of the CACS of the University of Southern California. My test script looks like this:
#include <stdio.h>
int main()
{
int i;
i = rand();
printf("My random number: %d\n", i);
float d[4] = {
{1, 2, 3, 4}
};
float e[4] = {
{0, 0, 0, 0}
};
float z[4][4] = {
{1.0, 0.0, 0.0, 0.0} ,
{0.0, 1.0, 0.0, 0.0} ,
{0.0, 0.0, 1.0, 0.0},
{0.0, 0.0, 0.0, 1.0}
};
double *zptr;
zptr = &z[0][0];
printf("Element [2][1] of identity matrix: %f\n", z[2][1]);
printf("Element [2][2] of identity matrix: %f\n", z[2][2]);
tqli(d, e, 4, zptr);
printf("First eigenvalue: %f\n", d[0]);
return 0;
}
When I run this program I get a segmentation fault. Where in my code does this segmentation fault occur? I believe the code from USC is bug-free, so I am pretty sure the mistake must be in my call of the function. However, I can't see where I made a mistake in setting up the arrays, as I followed the instructions.
The segmentation fault comes from accessing memory beyond the bounds of the supplied arrays; tqli requires specific data preparation.
1) The eigen code from CACS is Fortran-based and counts indexes from 1.
2) tqli expects a pointer to pointers (double **) for its matrix, and double (not float) vectors:
void tqli(double d[], double e[], int n, double **z)
d and e should be declared as double, and z must be a double ** rather than a pointer to the first element of a 2-D array.
3) The program needs to be modified with respect to the data preparation for the above function.
Helper one-based vectors have to be created to supply properly formatted data to tqli:
double z[NP][NP] = { {2, 0, 0}, {0, 4, 0}, {0, 0, 2} } ;
double **a;
double *d,*e,*f;
d=dvector(1,NP); // 1-index based vector
e=dvector(1,NP);
f=dvector(1,NP);
a=dmatrix(1,NP,1,NP); // 1-index based matrix
for (i=1;i<=NP;i++) // load data from the zero-based z into the one-based a
for (j=1;j<=NP;j++) a[i][j]=z[i-1][j-1];
A complete test program is supplied below. It uses the eigen code from CACS:
/*******************************************************************************
Eigenvalue solvers, tred2 and tqli, from "Numerical Recipes in C" (Cambridge
Univ. Press) by W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery
*******************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define NR_END 1
#define SIGN(a,b) ((b) >= 0.0 ? fabs(a) : -fabs(a))
double **dmatrix(int nrl, int nrh, int ncl, int nch)
/* allocate a double matrix with subscript range m[nrl..nrh][ncl..nch] */
{
int i,nrow=nrh-nrl+1,ncol=nch-ncl+1;
double **m;
/* allocate pointers to rows */
m=(double **) malloc((size_t)((nrow+NR_END)*sizeof(double*)));
m += NR_END;
m -= nrl;
/* allocate rows and set pointers to them */
m[nrl]=(double *) malloc((size_t)((nrow*ncol+NR_END)*sizeof(double)));
m[nrl] += NR_END;
m[nrl] -= ncl;
for(i=nrl+1;i<=nrh;i++) m[i]=m[i-1]+ncol;
/* return pointer to array of pointers to rows */
return m;
}
double *dvector(int nl, int nh)
/* allocate a double vector with subscript range v[nl..nh] */
{
double *v;
v=(double *)malloc((size_t) ((nh-nl+1+NR_END)*sizeof(double)));
return v-nl+NR_END;
}
/******************************************************************************/
void tred2(double **a, int n, double d[], double e[])
/*******************************************************************************
Householder reduction of a real, symmetric matrix a[1..n][1..n].
On output, a is replaced by the orthogonal matrix Q effecting the
transformation. d[1..n] returns the diagonal elements of the tridiagonal matrix,
and e[1..n] the off-diagonal elements, with e[1]=0. Several statements, as noted
in comments, can be omitted if only eigenvalues are to be found, in which case a
contains no useful information on output. Otherwise they are to be included.
*******************************************************************************/
{
int l,k,j,i;
double scale,hh,h,g,f;
for (i=n;i>=2;i--) {
l=i-1;
h=scale=0.0;
if (l > 1) {
for (k=1;k<=l;k++)
scale += fabs(a[i][k]);
if (scale == 0.0) /* Skip transformation. */
e[i]=a[i][l];
else {
for (k=1;k<=l;k++) {
a[i][k] /= scale; /* Use scaled a's for transformation. */
h += a[i][k]*a[i][k]; /* Form sigma in h. */
}
f=a[i][l];
g=(f >= 0.0 ? -sqrt(h) : sqrt(h));
e[i]=scale*g;
h -= f*g; /* Now h is equation (11.2.4). */
a[i][l]=f-g; /* Store u in the ith row of a. */
f=0.0;
for (j=1;j<=l;j++) {
/* Next statement can be omitted if eigenvectors not wanted */
a[j][i]=a[i][j]/h; /* Store u/H in ith column of a. */
g=0.0; /* Form an element of A.u in g. */
for (k=1;k<=j;k++)
g += a[j][k]*a[i][k];
for (k=j+1;k<=l;k++)
g += a[k][j]*a[i][k];
e[j]=g/h; /* Form element of p in temporarily unused element of e. */
f += e[j]*a[i][j];
}
hh=f/(h+h); /* Form K, equation (11.2.11). */
for (j=1;j<=l;j++) { /* Form q and store in e overwriting p. */
f=a[i][j];
e[j]=g=e[j]-hh*f;
for (k=1;k<=j;k++) /* Reduce a, equation (11.2.13). */
a[j][k] -= (f*e[k]+g*a[i][k]);
}
}
} else
e[i]=a[i][l];
d[i]=h;
}
/* Next statement can be omitted if eigenvectors not wanted */
d[1]=0.0;
e[1]=0.0;
/* Contents of this loop can be omitted if eigenvectors not
wanted except for statement d[i]=a[i][i]; */
for (i=1;i<=n;i++) { /* Begin accumulation of transformation matrices. */
l=i-1;
if (d[i]) { /* This block skipped when i=1. */
for (j=1;j<=l;j++) {
g=0.0;
for (k=1;k<=l;k++) /* Use u and u/H stored in a to form P.Q. */
g += a[i][k]*a[k][j];
for (k=1;k<=l;k++)
a[k][j] -= g*a[k][i];
}
}
d[i]=a[i][i]; /* This statement remains. */
a[i][i]=1.0; /* Reset row and column of a to identity matrix for next iteration. */
for (j=1;j<=l;j++) a[j][i]=a[i][j]=0.0;
}
}
/******************************************************************************/
void tqli(double d[], double e[], int n, double **z)
/*******************************************************************************
QL algorithm with implicit shifts, to determine the eigenvalues and eigenvectors
of a real, symmetric, tridiagonal matrix, or of a real, symmetric matrix
previously reduced by tred2 sec. 11.2. On input, d[1..n] contains the diagonal
elements of the tridiagonal matrix. On output, it returns the eigenvalues. The
vector e[1..n] inputs the subdiagonal elements of the tridiagonal matrix, with
e[1] arbitrary. On output e is destroyed. When finding only the eigenvalues,
several lines may be omitted, as noted in the comments. If the eigenvectors of
a tridiagonal matrix are desired, the matrix z[1..n][1..n] is input as the
identity matrix. If the eigenvectors of a matrix that has been reduced by tred2
are required, then z is input as the matrix output by tred2. In either case,
the kth column of z returns the normalized eigenvector corresponding to d[k].
*******************************************************************************/
{
double pythag(double a, double b);
int m,l,iter,i,k;
double s,r,p,g,f,dd,c,b;
for (i=2;i<=n;i++) e[i-1]=e[i]; /* Convenient to renumber the elements of e. */
e[n]=0.0;
for (l=1;l<=n;l++) {
iter=0;
do {
for (m=l;m<=n-1;m++) { /* Look for a single small subdiagonal element to split the matrix. */
dd=fabs(d[m])+fabs(d[m+1]);
if ((double)(fabs(e[m])+dd) == dd) break;
}
if (m != l) {
if (iter++ == 30) printf("Too many iterations in tqli");
g=(d[l+1]-d[l])/(2.0*e[l]); /* Form shift. */
r=pythag(g,1.0);
g=d[m]-d[l]+e[l]/(g+SIGN(r,g)); /* This is dm - ks. */
s=c=1.0;
p=0.0;
for (i=m-1;i>=l;i--) { /* A plane rotation as in the original QL, followed by Givens */
f=s*e[i]; /* rotations to restore tridiagonal form. */
b=c*e[i];
e[i+1]=(r=pythag(f,g));
if (r == 0.0) { /* Recover from underflow. */
d[i+1] -= p;
e[m]=0.0;
break;
}
s=f/r;
c=g/r;
g=d[i+1]-p;
r=(d[i]-g)*s+2.0*c*b;
d[i+1]=g+(p=s*r);
g=c*r-b;
/* Next loop can be omitted if eigenvectors not wanted */
for (k=1;k<=n;k++) { /* Form eigenvectors. */
f=z[k][i+1];
z[k][i+1]=s*z[k][i]+c*f;
z[k][i]=c*z[k][i]-s*f;
}
}
if (r == 0.0 && i >= l) continue;
d[l] -= p;
e[l]=g;
e[m]=0.0;
}
} while (m != l);
}
}
/******************************************************************************/
double pythag(double a, double b)
/*******************************************************************************
Computes (a^2 + b^2)^(1/2) without destructive underflow or overflow.
*******************************************************************************/
{
double absa,absb;
absa=fabs(a);
absb=fabs(b);
if (absa > absb) return absa*sqrt(1.0+(absb/absa)*(absb/absa));
else return (absb == 0.0 ? 0.0 : absb*sqrt(1.0+(absa/absb)*(absa/absb)));
}
#define NP 3
#define TINY 1.0e-6
int main()
{
int i,j,k;
double ze[NP][NP] = { {2, 0, 0}, {0, 4, 0}, {0, 0, 2} } ;
double **a;
double *d,*e,*f;
d=dvector(1,NP);
e=dvector(1,NP);
f=dvector(1,NP);
a=dmatrix(1,NP,1,NP);
for (i=1;i<=NP;i++)
for (j=1;j<=NP;j++) a[i][j]=ze[i-1][j-1];
tred2(a,NP,d,e);
tqli(d,e,NP,a);
printf("\nEigenvectors for a real symmetric matrix:\n");
for (i=1;i<=NP;i++) {
for (j=1;j<=NP;j++) {
f[j]=0.0;
for (k=1;k<=NP;k++)
f[j] += (ze[j-1][k-1]*a[k][i]);
}
printf("%s %3d %s %10.6f\n","\neigenvalue",i," =",d[i]);
printf("%11s %14s %9s\n","vector","mtrx*vect.","ratio");
for (j=1;j<=NP;j++) {
if (fabs(a[j][i]) < TINY)
printf("%12.6f %12.6f %12s\n",
a[j][i],f[j],"div. by 0");
else
printf("%12.6f %12.6f %12.6f\n",
a[j][i],f[j],f[j]/a[j][i]);
}
}
//free_dmatrix(a,1,NP,1,NP);
//free_dvector(f,1,NP);
//free_dvector(e,1,NP);
//free_dvector(d,1,NP);
return 0;
}
Output:
Eigenvectors for a real symmetric matrix:
eigenvalue 1 = 2.000000
vector mtrx*vect. ratio
1.000000 2.000000 2.000000
0.000000 0.000000 div. by 0
0.000000 0.000000 div. by 0
eigenvalue 2 = 4.000000
vector mtrx*vect. ratio
0.000000 0.000000 div. by 0
1.000000 4.000000 4.000000
0.000000 0.000000 div. by 0
eigenvalue 3 = 2.000000
vector mtrx*vect. ratio
0.000000 0.000000 div. by 0
0.000000 0.000000 div. by 0
1.000000 2.000000 2.000000
I hope this finally helps to clarify the confusion regarding the data preparation for tqli.
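The test program above leaves the free_dmatrix/free_dvector calls commented out because those helpers are not part of the listing. As a sketch of an alternative (hypothetical helpers, not the CACS or Numerical Recipes code), one-based storage can also be obtained by allocating one extra row and column and leaving index 0 unused, which makes freeing straightforward. The resulting double ** and double * can be passed to tred2 and tqli the same way.

```c
#include <stdlib.h>

/* Allocates an (n+1) x (n+1) zero-filled matrix so that a[1..n][1..n] is
 * valid; row and column 0 are simply unused. Returns NULL on failure. */
double **alloc_matrix1(int n)
{
    double **a = malloc((size_t)(n + 1) * sizeof *a);
    if (a == NULL)
        return NULL;
    a[0] = calloc((size_t)(n + 1) * (size_t)(n + 1), sizeof **a);
    if (a[0] == NULL) {
        free(a);
        return NULL;
    }
    for (int i = 1; i <= n; i++)
        a[i] = a[0] + (size_t)i * (size_t)(n + 1);
    return a;
}

/* Frees a matrix obtained from alloc_matrix1. */
void free_matrix1(double **a)
{
    if (a != NULL) {
        free(a[0]);
        free(a);
    }
}

/* Zero-filled vector with valid indices v[1..n] (v[0] unused);
 * release it with free(). */
double *alloc_vector1(int n)
{
    return calloc((size_t)(n + 1), sizeof(double));
}
```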

How to implement natural logarithm with continued fraction in C?

Here I have a little problem. I need to implement the natural logarithm using this continued fraction:
ln((1+z)/(1-z)) = 2z / (1 - z^2 / (3 - 4z^2 / (5 - 9z^2 / (7 - ...))))
This is what I have, but it doesn't work. Frankly, I really don't understand how it should work; I tried to code it from some poor instructions. n is the number of iterations (parts of the fraction). I think it somehow leads to recursion, but I don't know how.
Thanks for any help.
double contFragLog(double z, int n)
{
double cf = 2 * z;
double a, b;
for(int i = n; i >= 1; i--)
{
a = sq(i - 2) * sq(z);
b = i + i - 2;
cf = a / (b - cf);
}
return (1 + cf) / (1 - cf);
}
The central loop is messed up, so I reworked it. Recursion is not needed either: just compute the deepest term first and work your way out.
#include <math.h>
#include <stdio.h>
double contFragLog(double z, int n) {
double zz = z*z;
double cf = 1.0; // Important this is not 0
for (int i = n; i >= 1; i--) {
cf = (2*i -1) - i*i*zz/cf;
}
return 2*z/cf;
}
void testln(double z) {
double y = log((1+z)/(1-z));
double y2 = contFragLog(z, 8);
printf("%e %e %e\n", z, y, y2);
}
int main() {
testln(0.2);
testln(0.5);
testln(0.8);
return 0;
}
Output
2.000000e-01 4.054651e-01 4.054651e-01
5.000000e-01 1.098612e+00 1.098612e+00
8.000000e-01 2.197225e+00 2.196987e+00
[Edit]
As prompted by @MicroVirus, I found double cf = 1.88*n - 0.95; to work better than double cf = 1.0;. As more terms are used, the seed value makes less difference, yet a good initial cf requires fewer terms for a good answer, especially for |z| near 0.5. More work could be done here, as I only studied 0 < z <= 0.5. @MicroVirus's suggestion of 2*n+1 may be close to mine due to an off-by-one in what n is.
This is based on reverse computing and noting the value of CF[n] as n increased. I was surprised the "seed" value did not appear to be some nice integer equation.
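The seeding experiment can be reproduced with a small variant of contFragLog that takes the innermost starting value as a parameter (contFragLogSeed is a hypothetical name). With n = 8 and z = 0.8, a seed of 1.0 reproduces the 2.196987 shown above, while the 1.88*n - 0.95 seed lands much closer to log(9) = 2.197225.

```c
#include <math.h>

/* Evaluates ln((1+z)/(1-z)) from n levels of the continued fraction,
 * starting the innermost term from the supplied seed (an estimate of the
 * truncated tail). Compare the result against log((1+z)/(1-z)). */
double contFragLogSeed(double z, int n, double seed)
{
    double zz = z * z;
    double cf = seed;
    for (int i = n; i >= 1; i--)
        cf = (2 * i - 1) - i * i * zz / cf;
    return 2 * z / cf;
}
```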
Here's a solution to the problem that does use recursion (if anyone is interested):
#include <math.h>
#include <stdio.h>
/* `i` is the iteration of the recursion and `n` is
just for testing when we should end. 'zz' is z^2 */
double recursion (double zz, int i, int n) {
if (!n)
return 1;
return 2 * i - 1 - i * i * zz / recursion (zz, i + 1, --n);
}
double contFragLog (double z, int n) {
return 2 * z / recursion (z * z, 1, n);
}
void testln(double z) {
double y = log((1+z)/(1-z));
double y2 = contFragLog(z, 8);
printf("%e %e %e\n", z, y, y2);
}
int main() {
testln(0.2);
testln(0.5);
testln(0.8);
return 0;
}
The output is identical to the solution above:
2.000000e-01 4.054651e-01 4.054651e-01
5.000000e-01 1.098612e+00 1.098612e+00
8.000000e-01 2.197225e+00 2.196987e+00
