Symmetric Matrix Inversion in C using CBLAS/LAPACK

I am writing an algorithm in C that requires matrix and vector multiplications. I have a matrix Q (W x W) which is created by multiplying the transpose of a vector J (1 x W) with itself and adding the identity matrix I scaled by a scalar a:
Q = (J^T) * J + a*I.
I then have to multiply the inverse of Q with a vector G to get the vector M:
M = (Q^(-1)) * G.
I am using cblas and clapack to develop my algorithm. When matrix Q is populated with random numbers (type float) and inverted using the routines sgetrf_ and sgetri_, the calculated inverse is correct.
But when matrix Q is symmetric, which is the case when you multiply (J^T) * J, the calculated inverse is wrong!
I am aware of the row-major (in C) and column-major (in FORTRAN) layouts of arrays when calling LAPACK routines from C, but for a symmetric matrix this should not be a problem, as A^T = A.
I have attached my C function for matrix inversion below.
I am sure there is a better way to solve this. Can anyone help me with this?
A solution using cblas would be great...
Thanks.
#include <stdio.h>

/* LAPACK routines (FORTRAN calling convention) */
extern void sgetrf_(int*, int*, float*, int*, int*, int*);
extern void sgetri_(int*, float*, int*, int*, float*, int*, int*);

void PrintMatrix(float* Matrix, int row, int colm);

void InverseMatrix_R(float *Matrix, int W)
{
    int LDA = W;
    int IPIV[W];
    int ERR_INFO;
    int LWORK = W * W;
    float Workspace[LWORK];
    // Compute the LU factorization of the W x W matrix
    sgetrf_(&W, &W, Matrix, &LDA, IPIV, &ERR_INFO);
    // Generate the inverse of the matrix from its LU decomposition
    sgetri_(&W, Matrix, &LDA, IPIV, Workspace, &LWORK, &ERR_INFO);
    // Display the inverted matrix
    PrintMatrix(Matrix, W, W);
}
void PrintMatrix(float* Matrix, int row, int colm)
{
    int i, k;
    for (i = 0; i < row; i++)
    {
        for (k = 0; k < colm; k++)
        {
            printf("%g, ", Matrix[i*colm + k]);
        }
        printf("\n");
    }
}

I don't know BLAS or LAPACK, so I have no idea what may cause this behaviour.
But, for matrices of the given form, calculating the inverse is quite easy. The important fact for this is
(J^T*J)^2 = (J^T*J)*(J^T*J) = J^T*(J*J^T)*J = <J|J> * (J^T*J)
where <u|v> denotes the inner product (if the components are real - the canonical bilinear form for complex components, but then you'd probably consider not the transpose but the conjugate transpose, and you'd be back at the inner product).
Generalising,
(J^T*J)^n = (<J|J>)^(n-1) * (J^T*J), for n >= 1.
Let us denote the symmetric square matrix (J^T*J) by S and the scalar <J|J> by q. Then, for general a != 0 of sufficiently large absolute value (|a| > q):
(a*I + S)^(-1) = 1/a * (I + a^(-1)*S)^(-1)
               = 1/a * (I + ∑_{k>0} (-1)^k * a^(-k) * S^k)
               = 1/a * (I + (∑_{k>0} (-1)^k * a^(-k) * q^(k-1)) * S)
               = 1/a * (I - 1/(a+q) * S)
               = 1/a * I - 1/(a*(a+q)) * S
That formula holds (by analyticity) for all a except a = 0 and a = -q, as can be verified by calculating
(a*I + S) * (1/a*I - 1/(a*(a+q))*S) = I + 1/a*S - 1/(a+q)*S - 1/(a*(a+q))*S^2
                                    = I + 1/a*S - 1/(a+q)*S - q/(a*(a+q))*S
                                    = I + ((a+q) - a - q)/(a*(a+q))*S
                                    = I
using S^2 = q*S.
That calculation is also much simpler and more efficient than first finding the LU decomposition.
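Here is a minimal cblas sketch of that closed form, applied directly to the product M = Q^(-1)*G that the question actually needs, so Q is never formed or factorized (the function name ApplyInverse_R is mine; with S = J^T*J and q = <J|J>, the formula gives M = G/a - (<J|G>/(a*(a+q))) * J^T):

#include <cblas.h>

/* M = (a*I + J^T*J)^(-1) * G, using the closed form derived above.
   J and G are length-W arrays; M is the length-W output. */
void ApplyInverse_R(const float* J, const float* G, float* M, int W, float a)
{
    float q  = cblas_sdot(W, J, 1, J, 1);   /* q  = <J|J> */
    float jg = cblas_sdot(W, J, 1, G, 1);   /* jg = <J|G> */
    cblas_scopy(W, G, 1, M, 1);             /* M = G      */
    cblas_sscal(W, 1.0f / a, M, 1);         /* M = G/a    */
    /* M -= (jg / (a*(a+q))) * J^T, an axpy with the entries of J */
    cblas_saxpy(W, -jg / (a * (a + q)), J, 1, M, 1);
}

This runs in O(W) instead of the O(W^3) cost of factorizing Q.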

You may want to try Armadillo, which is an easy-to-use C++ wrapper for LAPACK. It provides several inverse-related functions (a short sketch follows the list):
inv(), general inverse, with an optional speedup for symmetric positive definite matrices
pinv(), pseudo-inverse
solve(), solve a system of linear equations (that can be over- or under-determined), without doing the actual inverse
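For instance, a minimal sketch of the original Q, G, M computation in Armadillo (variable names are mine; inv_sympd() applies because Q = J^T*J + a*I is symmetric positive definite for a > 0, and solve() avoids the explicit inverse altogether):

#include <armadillo>

int main()
{
    const int W = 4;
    const float a = 2.0f;
    arma::frowvec J(W, arma::fill::randu);   // J is 1 x W
    arma::fvec    G(W, arma::fill::randu);
    arma::fmat Q = J.t() * J + a * arma::eye<arma::fmat>(W, W);
    arma::fvec M1 = arma::inv_sympd(Q) * G;  // explicit symmetric inverse
    arma::fvec M2 = arma::solve(Q, G);       // preferred: solve Q*M = G
    M1.print("M via inv_sympd:");
    M2.print("M via solve:");
    return 0;
}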

Example of a 3x3 matrix inversion; see sgetri.f for more details. Check info after each call (it is 0 on success).
//__CLPK_integer is a typedef of int
//__CLPK_real is a typedef of float
__CLPK_integer ipiv[3];
{
    //Compute the LU (lower/upper) factorization of the matrix
    __CLPK_integer m=3;
    __CLPK_integer n=3;
    __CLPK_real *a=(float *)this->m1;
    __CLPK_integer lda=3;
    __CLPK_integer info;
    sgetrf_(&m, &n, a, &lda, ipiv, &info);
}
{
    //Compute the inverse of the matrix from its LU factorization
    __CLPK_integer n=3;
    __CLPK_real *a=(float *)this->m1;
    __CLPK_integer lda=3;
    __CLPK_real work[3];
    __CLPK_integer lwork=3;
    __CLPK_integer info;
    sgetri_(&n, a, &lda, ipiv, work, &lwork, &info);
}
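Note that inverting Q explicitly is rarely necessary here: since the asker's Q = J^T*J + a*I is symmetric positive definite for a > 0, M can be obtained by solving Q*M = G directly. A minimal sketch using LAPACK's sposv_ (the wrapper name SolveSPD_R is my own):

#include <stdio.h>

extern void sposv_(char* uplo, int* n, int* nrhs, float* a, int* lda,
                   float* b, int* ldb, int* info);

/* Solves Q * M = G for M; Q must be symmetric positive definite.
   On entry B holds G; on exit B holds M. Q is overwritten with its
   Cholesky factor. Since Q is symmetric, row- vs column-major is moot. */
void SolveSPD_R(float* Q, float* B, int W)
{
    char uplo = 'L';   /* use the lower triangle of Q */
    int nrhs = 1;      /* one right-hand side, the vector G */
    int info;
    sposv_(&uplo, &W, &nrhs, Q, &W, B, &W, &info);
    if (info != 0)
        printf("sposv_ failed: info = %d\n", info);
}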

Related

"no operator found" when asigning Sparse matrices results to sparse matrices

I do have a function that implements a minimization algorithm. I didn't include all the vars, just the matrices to illustrate the types:
typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::VectorXd Vec;
int lm_solver(void (*f_dz)(Vec* x_, int m, Vec* dz_, SpMat* W_),
              void (*f_H)(Vec* x_, SpMat* jac_, int n_, int m_),
              Vec* x, int nx, int mm, int nnz,
              double tol=1e-9, int max_iter = 100)
{
    SpMat A(mm, nx);
    SpMat H1(mm, nx);
    SpMat H2(mm, nx);
    SpMat H(mm, nx);
    SpMat W(mm, mm);
    Vec rhs(nx);
    Vec dz(nx);
    Vec dx(nx);
    Vec a(1);
    Vec b(1);
    double f, f_prev, lbmda, rho, nu, tau;
    bool updateH, converged;
    int iter_;

    // reserve matrices memory
    H.reserve(nnz);
    W.reserve(mm);

    while (!converged && iter_ < max_iter) {
        // get the system matrices
        if (updateH) {  // if the Jacobian computation is not locked...
            f_dz(x, mm, &dz, &W);  // residual increment (z-h(x)): fill dz and W
            f_H(x, &H, nx, mm);    // Jacobian matrix creation or update: fill H
            // start forming the auxiliary matrices of A
            H1 = H.transpose() * W;
            H2 = H1 * H;
        }
        // set the first value of lbmda
        if (iter_ == 1)
            lbmda = tau * H2.diagonal().maxCoeff();
        // form the system matrix A = H^t*W*H + lambda*I
        A = H2 + lbmda * Idn;
        // form the right-hand side: H^t*W*dz
        rhs = H1 * dz;
        // solve the increment: dx = solve(A, rhs);
        solver.compute(A);
        dx = solver.solve(rhs);
        // calculate the objective function (least squares)
        a = 0.5 * dz * W * dz;  // vector x matrix x vector -> vector of 1 element
        f = a.coeffRef(0);
        // calculate the gain ratio
        b = 0.5 * dx * (lbmda * dx - rhs);  // vector x matrix x vector -> vector of 1 element
        rho = (f_prev - f) / b.coeffRef(0);
    }
    return 0;
}
The process does the following:
Declare the sparse matrices (SpMat)
Reserve memory for the matrices
Call external functions to fill H, dz and W
Do matrix multiplications and store the results in intermediate matrices that are sparse too
This function is the only function in a .h file that is compiled into a static library (.lib).
When I compile the static library alone, it compiles flawlessly.
However, when I use the library project from another project, I get the following error:
error: C2679: binary '=' : no operator found which takes a right-hand operand of type 'const Eigen::CwiseBinaryOp' (or there is no acceptable conversion)
\eigen\src/Core/Matrix.h(206): could be 'Eigen::Matrix<_Scalar,_Rows,_Cols> &Eigen::Matrix<_Scalar,_Rows,_Cols>::operator =(const Eigen::Matrix<_Scalar,_Rows,_Cols> &)'
with
[
_Scalar=double,
_Rows=-1,
_Cols=1
]
d:\proyectos\proyectos_i+d\ingrid\eigen\eigen_3_3_3\eigen\src/Core/Matrix.h(281): or 'Eigen::Matrix<_Scalar,_Rows,_Cols> &Eigen::Matrix<_Scalar,_Rows,_Cols>::operator =(Eigen::Matrix<_Scalar,_Rows,_Cols> &&)'
with
[
_Scalar=double,
_Rows=-1,
_Cols=1
]
while trying to match the argument list '(Vec, const Eigen::CwiseBinaryOp)'
This error flags the lines:
H1 = H.transpose() * W;
H2 = H1 * H;
rhs = H1 * dz;
b = 0.5 * dx * (lbmda * dx - rhs);
a = 0.5 * dz * W * dz;
I understand from this that I cannot store the result of sparse matrices multiplications in a new sparse matrix. I don't know the solution to this.
(I'm using Eigen 3.3.3)
I can't tell exactly which lines cause your error, but it looks like it is caused by the calculation of a and b. You can't multiply a column vector by another column vector without transposing one of them, e.g.
b = 0.5 * dx.transpose() * (lbmda * dx - rhs);
However, this is actually a dot product, so you should just write
double b = 0.5 * dx.dot(lbmda * dx - rhs);
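To make the distinction concrete, here is a minimal self-contained sketch (sizes and values are made up for illustration):

#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <iostream>

int main()
{
    typedef Eigen::SparseMatrix<double> SpMat;
    typedef Eigen::VectorXd Vec;

    const int m = 3;
    Vec dz = Vec::Random(m);
    SpMat W(m, m);
    W.setIdentity();

    Vec Wdz = W * dz;              // sparse * dense -> dense vector
    // Vec a = 0.5 * dz * W * dz;  // error: col-vector times col-vector
    double f = 0.5 * dz.dot(Wdz);  // OK: the dot product is a scalar
    std::cout << f << std::endl;
    return 0;
}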
The problem was that I wrote all the functions in the .h file.
By putting the bodies of the functions in the .cpp file, all went fine.
This dichotomy of .h and .cpp is what annoys me the most about C++.
Anyway, for future reference.

Linear indexing of Matlab matrices in MEX file

I have an NxN symmetric matrix F of the following form
F_11 F_12 F_13 ... F_1N
F_21 ...
F_31
.
.
.
F_N1 F_N2 F_N3 ... F_NN
with each submatrix F_IJ of size m x m.
This matrix is created in Matlab and will be used in a C program, so the values are stored in a vector column-wise. (E.g. the vector will be of the form (F_11_11, F_11_21, F_11_31, ..., F_11_m1, F_21_11, ..., F_NN_(m-1)m, F_NN_mm).)
My question is the following: For readability I would like to define in C a way to access the values of F, given the indices (I,J) of the location of the first submatrix, and the indices (i,j) of the location of value in the submatrix. How can I link the linear indexing of the matrix to the (I,J,i,j) indices?
I assume all indices to be zero-based, as usual in C/C++. If you want to use Matlab-style one-based indices, subtract one from each index.
I didn't check it, but I guess your index should be:
int idx = I*m + J*N*m*m + i + j*N*m;
You can write a function that calculates the index. Note that in C, indices start at 0. Since the values are stored column-wise over the whole (N*m) x (N*m) matrix (not block by block), the global row and column have to be formed first:
size_t index_of_2d(size_t x, size_t y, size_t n) {
    return x + y*n; // column-major index of (row x, col y) in an n-row matrix
}
size_t index_of_4d(size_t I, size_t J, size_t N, size_t i, size_t j, size_t m) {
    // global row I*m+i, global column J*m+j, in a matrix with N*m rows
    return index_of_2d(I*m + i, J*m + j, N*m);
}
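A quick self-check of those helpers (a made-up example with N = 2 and m = 2, filling the full matrix column-wise exactly as described in the question; it assumes the two functions above are in scope):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    enum { N = 2, m = 2, n = N * m };
    double F[n * n];
    // fill column-wise: element (row r, col c) gets value 100*r + c
    for (size_t c = 0; c < n; c++)
        for (size_t r = 0; r < n; r++)
            F[r + c * n] = 100.0 * r + c;
    // block (I,J) = (1,0), inner (i,j) = (0,1) -> global (row 2, col 1)
    size_t idx = index_of_4d(1, 0, N, 0, 1, m);
    printf("F[%zu] = %g (expected 201)\n", idx, F[idx]);
    return 0;
}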

Matrix operations using code vectorization

I have written a function to transpose a 4x4 matrix, but I do not know how to extend the code to an m x n matrix.
Where can I find some sample code on matrix operations with SSE (product, transpose, inverse, etc.)?
This is the 4x4 transpose code:
#include <xmmintrin.h>

void transpose(float* src, int n) {
    // rows are loaded pairwise with loadl/loadh; zero-initialize to avoid
    // reading uninitialized __m128 values in the first loads
    __m128 row0, row2;
    __m128 row1 = _mm_setzero_ps();
    __m128 row3 = _mm_setzero_ps();
    __m128 tmp1 = _mm_setzero_ps();
    tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)),    (__m64*)(src+ 4));
    row1 = _mm_loadh_pi(_mm_loadl_pi(row1, (__m64*)(src+ 8)), (__m64*)(src+12));
    row0 = _mm_shuffle_ps(tmp1, row1, 0x88);
    row1 = _mm_shuffle_ps(row1, tmp1, 0xDD);
    tmp1 = _mm_movelh_ps(tmp1, row1);
    row1 = _mm_movehl_ps(tmp1, row1);
    tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src+ 2)), (__m64*)(src+ 6));
    row3 = _mm_loadh_pi(_mm_loadl_pi(row3, (__m64*)(src+10)), (__m64*)(src+14));
    row2 = _mm_shuffle_ps(tmp1, row3, 0x88);
    row3 = _mm_shuffle_ps(row3, tmp1, 0xDD);
    tmp1 = _mm_movelh_ps(tmp1, row3);
    row3 = _mm_movehl_ps(tmp1, row3);
    _mm_store_ps(src,    row0);  // stores require 16-byte aligned src
    _mm_store_ps(src+ 4, row1);
    _mm_store_ps(src+ 8, row2);
    _mm_store_ps(src+12, row3);
}
I'm not sure how to do an in-place transpose for arbitrary matrices efficiently using SIMD, but I do know how to do it out-of-place. Let me describe both.
In-place transpose
For in-place transpose you should see Agner Fog's Optimizing software in C++ manual, section 9.10 "Cache contentions in large data structures", example 9.5a. For certain matrix sizes you will see a large drop in performance due to cache aliasing; see table 9.1 for examples, and the question "Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?". Agner gives a way to fix this using loop tiling (similar to what Paul R described) in example 9.5b.
Out-of-place transpose
See my answer (the one with the most votes) to "What is the fastest way to transpose a matrix in C++?". I have not looked into this in ages, but let me just repeat my code here:
inline void transpose4x4_SSE(float *A, float *B, const int lda, const int ldb) {
    __m128 row1 = _mm_load_ps(&A[0*lda]);
    __m128 row2 = _mm_load_ps(&A[1*lda]);
    __m128 row3 = _mm_load_ps(&A[2*lda]);
    __m128 row4 = _mm_load_ps(&A[3*lda]);
    _MM_TRANSPOSE4_PS(row1, row2, row3, row4);
    _mm_store_ps(&B[0*ldb], row1);
    _mm_store_ps(&B[1*ldb], row2);
    _mm_store_ps(&B[2*ldb], row3);
    _mm_store_ps(&B[3*ldb], row4);
}
inline void transpose_block_SSE4x4(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
    #pragma omp parallel for
    for(int i=0; i<n; i+=block_size) {
        for(int j=0; j<m; j+=block_size) {
            int max_i2 = i+block_size < n ? i + block_size : n;
            int max_j2 = j+block_size < m ? j + block_size : m;
            for(int i2=i; i2<max_i2; i2+=4) {
                for(int j2=j; j2<max_j2; j2+=4) {
                    transpose4x4_SSE(&A[i2*lda + j2], &B[j2*ldb + i2], lda, ldb);
                }
            }
        }
    }
}
Here is one general approach you can use for transposing an NxN matrix using tiling. You could even use your existing 4x4 transpose and work with a 4x4 tile size:
for each 4x4 block in the matrix with top left indices r, c
    if block is on diagonal (i.e. if r == c)
        get block a = 4x4 block at r, c
        transpose block a
        store block a at r, c
    else if block is above diagonal (i.e. if r < c)
        get block a = 4x4 block at r, c
        get block b = 4x4 block at c, r
        transpose block a
        transpose block b
        store transposed block a at c, r
        store transposed block b at r, c
    else // block is below diagonal
        do nothing
    endif
endfor
Obviously N needs to be a multiple of 4 for this to work, otherwise you will need to do some additional housekeeping.
As mentioned above in the comments, an MxN in-place transpose is hard to do - you need to either use an additional temporary matrix (which effectively makes it a not-in-place transpose) or use the method described here, but this will be much harder to vectorize with SIMD.
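As a concrete illustration, here is a minimal sketch of that tiling pseudocode in terms of the transpose4x4_SSE routine shown earlier (the wrapper is my own; it assumes N is a multiple of 4 and the matrix is 16-byte aligned, since the loads and stores use aligned SSE instructions):

#include <xmmintrin.h>

// In-place transpose of an N x N float matrix, tiled in 4x4 blocks.
// Relies on transpose4x4_SSE(A, B, lda, ldb) defined above.
void transpose_inplace_NxN(float* mat, int N) {
    for (int r = 0; r < N; r += 4) {
        // diagonal block: transpose through an aligned temporary
        alignas(16) float tmp[16];
        transpose4x4_SSE(&mat[r*N + r], tmp, N, 4);
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                mat[(r+i)*N + (r+j)] = tmp[i*4 + j];
        // off-diagonal blocks: transpose the (r,c) and (c,r) blocks
        // and store each one at the other's position
        for (int c = r + 4; c < N; c += 4) {
            alignas(16) float a[16], b[16];
            transpose4x4_SSE(&mat[r*N + c], a, N, 4);
            transpose4x4_SSE(&mat[c*N + r], b, N, 4);
            for (int i = 0; i < 4; ++i)
                for (int j = 0; j < 4; ++j) {
                    mat[(c+i)*N + (r+j)] = a[i*4 + j];
                    mat[(r+i)*N + (c+j)] = b[i*4 + j];
                }
        }
    }
}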

Covariance matrix with GSL

I am trying to calculate the Mahalanobis distance between two vectors a and b. Eventually, I will be using this as a distance measure in statistical algorithms. I am using gsl to implement them. The formula for the mahalanobis distance is sqrt((a-b)'c^-1(a-b)), where c is the covariance matrix. According to this gsl documentation, it takes in two data sets and returns one covariance value. I am not sure how to calculate the covariance matrix using that.
Any help is appreciated.
Thanks.
I think you need to understand the calculation of a covariance matrix first; second, here's some sample code to get you started:
gsl_vector_view a, b;
size_t i, j;
// A holds the data (one variable per column); C is the size2 x size2 output
for (i = 0; i < A->size2; i++) {
    for (j = i; j < A->size2; j++) {
        a = gsl_matrix_column(A, i);
        b = gsl_matrix_column(A, j);
        double cov = gsl_stats_covariance(a.vector.data, a.vector.stride,
                                          b.vector.data, b.vector.stride,
                                          a.vector.size);
        gsl_matrix_set(C, i, j, cov);
        gsl_matrix_set(C, j, i, cov); // covariance matrices are symmetric
    }
}
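To get from the covariance matrix C to the Mahalanobis distance itself, you can avoid the explicit inverse by solving C*y = (a-b) with a Cholesky factorization. A minimal sketch (the function name is mine; it assumes C is positive definite, and the matrix argument is overwritten with its Cholesky factor, so pass a copy if you still need C):

#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
#include <math.h>

/* sqrt((a-b)^T C^-1 (a-b)); chol is overwritten by the Cholesky factor */
double mahalanobis(const gsl_vector* a, const gsl_vector* b, gsl_matrix* chol)
{
    gsl_vector* diff = gsl_vector_alloc(a->size);
    gsl_vector* y    = gsl_vector_alloc(a->size);
    double d2;

    gsl_vector_memcpy(diff, a);
    gsl_vector_sub(diff, b);                   /* diff = a - b   */
    gsl_linalg_cholesky_decomp(chol);          /* C = L L^T      */
    gsl_linalg_cholesky_solve(chol, diff, y);  /* y = C^-1 diff  */
    gsl_blas_ddot(diff, y, &d2);               /* d2 = diff^T y  */

    gsl_vector_free(diff);
    gsl_vector_free(y);
    return sqrt(d2);
}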

Finding the squares in a plane given n points

Given n points in a plane, how many squares can be formed?
I tried this by calculating the distances between each pair of points, sorting them, and looking for squares among points with four or more equal distances, after verifying the points and slopes.
But this looks like an approach with very high complexity. Any other ideas?
I thought dynamic programming for checking for line segments of equal distances might work, but I could not get the idea quite right.
Any better ideas?
P.S.: The squares can be arranged in any manner: they can overlap, have a common side, one square inside another...
If possible, please give sample code to perform the above.
Let d[i][j] = the distance between points i and j. We are interested in a function count(i, j) that returns, as fast as possible, the number of squares that we can draw by using points i and j.
Basically, count(i, j) will have to find two points x and y such that d[i][j] = d[x][y] and check if these 4 points really define a square.
You can use a hash table to solve the problem in O(n^2) on average. Let H[x] = list of all points (p, q) that have d[p][q] = x.
Now, for each pair of points (i, j), count(i, j) will have to iterate H[ d[i][j] ] and count the points in that list that form a square with points i and j.
This should run very fast in practice, and I don't think it can ever get worse than O(n^3) (I'm not even sure it can ever get that bad).
This problem can be solved in O(n^1.5) time with O(n) space.
The basic idea is to group the points by X or Y coordinate, being careful to avoid making groups that are too large. The details are in the paper Finding squares and rectangles in sets of points. The paper also covers lots of other cases (allowing rotated squares, allowing rectangles, and working in higher dimensions).
I've paraphrased their 2d axis-aligned square finding algorithm below. Note that I changed their tree set to a hash set, which is why the time bound I gave is not O(n^1.5 log(n)):
Make a hash set of all the points. Something you can use to quickly check if a point is present.
Group the points by their X coordinate. Break any groups with more than sqrt(n) points apart, and re-group those now-free points by their Y coordinate. This guarantees the groups have at most sqrt(n) points and guarantees that for each square there's a group that has two of the square's corner points.
For every group g, for every pair of points p,q in g, check whether the other two points of the two possible squares containing p and q are present. Keep track of how many you find. Watch out for duplicates (are the two opposite points also in a group?).
Why does it work? Well, the only tricky thing is the regrouping. If either the left or right columns of a square are in groups that are not too large, the square will get found when that column group gets iterated. Otherwise both its top-left and top-right corners get regrouped, placed into the same row group, and the square will be found when that row group gets iterated.
I have an O(N^2) time, O(N) space solution:
Assume the given points are an array of Point objects, each with x and y.
First iterate through the array and add each item to a HashSet: this de-duplicates and gives us O(1) access time. The whole process takes O(N) time.
Using math: say vertices A, B, C, D can form a square and the diagonal AC is known; then the corresponding B and D are unique. We can write a function to calculate them. This step is O(1) time.
Now write a for-i loop with a for-j inner loop. Say input[i] and input[j] form a diagonal; compute the corresponding anti-diagonal points and check whether they exist in the set. If they do, increment the counter. This process takes O(N^2) time.
My code in C#:
public int SquareCount(Point[] input)
{
    int count = 0;
    HashSet<Point> set = new HashSet<Point>();
    foreach (var point in input)
        set.Add(point);
    for (int i = 0; i < input.Length; i++)
    {
        for (int j = 0; j < input.Length; j++)
        {
            if (i == j)
                continue;
            // For each pair (input[i], input[j]) taken as a diagonal,
            // check if the other two vertices b and d exist in the set.
            Point[] DiagVertex = GetRestPoints(input[i], input[j]);
            if (set.Contains(DiagVertex[0]) && set.Contains(DiagVertex[1]))
            {
                count++;
            }
        }
    }
    // each square is found once per ordered diagonal, i.e. 4 times
    return count / 4;
}
public Point[] GetRestPoints(Point a, Point c)
{
    Point[] res = new Point[2];
    int midX = (a.x + c.x) / 2;  // note: integer division assumes the
    int midY = (a.y + c.y) / 2;  // diagonal midpoint has integer coordinates
    int Ax = a.x - midX;
    int Ay = a.y - midY;
    int bX = midX - Ay;
    int bY = midY + Ax;
    Point b = new Point(bX, bY);
    int cX = (c.x - midX);
    int cY = (c.y - midY);
    int dX = midX - cY;
    int dY = midY + cX;
    Point d = new Point(dX, dY);
    res[0] = b;
    res[1] = d;
    return res;
}
It looks like O(n^3) to me. A simple algo might be something like:
for each pair of points
    for each of 3 possible squares which might be formed from these two points
        test remaining points to see if they coincide with the other two vertices
Runtime: O(n log(n)^2), Space: Θ(n), where n is the number of points.
For each point p:
    Add it to the two arrays sorted by x- and y-coordinate respectively.
    For every pair of points that collide with p on the x- or y-axis:
        If there exists another point on the opposite side of p, increment the square count by one.
The intuition is to count how many squares each new point creates. Every square is created upon the insertion of its fourth point. A new point creates a new square if it has colliding points on the concerned axes and the "fourth" point that completes the square exists on the opposite side. This exhausts all possible distinct squares.
The insertion into the arrays can be done with binary search, and checking for the opposite point can be done with a hash table keyed on the points' coordinates.
This algorithm is optimal for sparse points, since there will be very few colliding points to check. It is worst for dense, square-rich point sets, for the opposite reason.
This algorithm can be further optimized by tracking whether points in one axis array have a collision in the complementary axis.
Just a thought: if a vertex A is one corner of a square, then there must be vertices B, C, D at the other corners with AB = AD and AC = sqrt(2)AB and AC must bisect BD. Assuming every vertex has unique coordinates, I think you can solve this in O(n^2) with a hash table keying on (distance, angle).
This is just an example implementation in Java - any comments welcome.
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;

public class SweepingLine {

    public static void main(String[] args) {
        Point[] points = {
            new Point(1,1),
            new Point(1,4),
            new Point(4,1),
            new Point(4,4),
            new Point(7,1),
            new Point(7,4)
        };
        int max = Arrays.stream(points).mapToInt(p -> p.x).max().orElseThrow(NoSuchElementException::new);
        int count = countSquares(points, max);
        System.out.println(String.format("Found %d squares in %d x %d plane", count, max, max));
    }

    private static int countSquares(Point[] points, int max) {
        int count = 0;
        Map<Integer, List<Integer>> map = new HashMap<>();
        for (int x=0; x<max; x++) {
            for (int y=0; y<max; y++) {
                for (Point p: points) {
                    if (p.x == x && p.y == y) {
                        List<Integer> ys = map.computeIfAbsent(x, _u -> new ArrayList<Integer>());
                        ys.add(y);
                        Integer ley = null;
                        for (Integer ey: ys) {
                            if (ley != null) {
                                int d = ey - ley;
                                // a square needs BOTH right-hand corners,
                                // (x+d, ley) and (x+d, ey)
                                boolean lower = false, upper = false;
                                for (Point p2: points) {
                                    if (p2.x == x + d && p2.y == ey)  upper = true;
                                    if (p2.x == x + d && p2.y == ley) lower = true;
                                }
                                if (lower && upper) {
                                    count++;
                                }
                            }
                            ley = ey;
                        }
                    }
                }
            }
        }
        return count;
    }

    private static class Point {
        public final int x;
        public final int y;
        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }
}
Here is a complete implementation of finding the diagonal points in C++!
Given points a and c, return b and d, which lie on the opposite diagonal.
If b or d is not an integer point, discard them (optional).
To find all squares generated by n points, you can check out this C++ implementation.
Idea credited to Kevman. Hope it can help!
#include <cmath>
#include <vector>
using std::vector;

vector<vector<int>> createDiag(vector<int>& a, vector<int>& c){
    double midX = (a[0] + c[0])/2.0;
    double midY = (a[1] + c[1])/2.0;
    double bx = midX - (a[1] - midY);
    double by = midY + (a[0] - midX);
    double dx = midX - (c[1] - midY);
    double dy = midY + (c[0] - midX);
    // discard the non-integer points
    double intpart;
    if(modf(bx, &intpart) != 0 or modf(by, &intpart) != 0 or
       modf(dx, &intpart) != 0 or modf(dy, &intpart) != 0){
        return {{}};
    }
    return {{(int)bx, (int)by}, {(int)dx, (int)dy}};
}
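For a quick sanity check (my own example): with a = (1,1) and c = (4,4), the function returns b = (4,1) and d = (1,4):

#include <cstdio>

int main(){
    vector<int> a = {1, 1}, c = {4, 4};
    vector<vector<int>> bd = createDiag(a, c);
    if(bd.size() == 2)  // a single empty inner vector means "discarded"
        printf("b = (%d,%d), d = (%d,%d)\n", bd[0][0], bd[0][1], bd[1][0], bd[1][1]);
    return 0;
}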
