Solve a banded matrix system of equations - C

I need to solve a 2D Poisson equation, that is, a system of equations of the form AX=B where A is an n-by-n matrix and B is an n-by-1 vector. Since A is a discretization matrix for the 2D Poisson problem, I know that only 5 of its diagonals are nonzero. LAPACK doesn't provide functions for this particular problem, but it has functions for solving banded matrix systems of equations, namely DGBTRF (for the LU factorization) and DGBTRS. Now, the 5 diagonals are: the main diagonal, the first diagonals above and below the main one, and the two diagonals m positions above and below the main diagonal. After reading the LAPACK documentation about band storage, I learned that I have to create a (3*m+1)-by-n matrix to store A in band storage format; let's call this matrix AB. Now the questions:
1) What is the difference between dgbtrs and dgbtrs_? Intel MKL provides both, but I can't understand why.
2) dgbtrf requires the band storage matrix to be an array. Should I linearize AB by rows or by columns?
3) Is this the correct way to call the two functions?
int n, m;
double *AB, *B;
/*... fill n, m, AB, B with appropriate numbers */
int nrows = 3 * m + 1, info, nrhs = 1;
int *pivots = malloc(n * sizeof(int)); /* must be allocated before dgbtrf_ */
dgbtrf_(&n, &n, &m, &m, AB, &nrows, pivots, &info);   /* band LU factorization */
char trans = 'N';
dgbtrs_(&trans, &n, &m, &m, &nrhs, AB, &nrows, pivots, B, &n, &info);  /* solve using the factors */

Intel MKL also provides DGBTRS and DGBTRS_. Those are Fortran administrativa that you should not care about; just call dgbtrs. (The reason is that on some architectures Fortran routine names have an underscore appended, on others not, and names may be either upper or lower case -- Intel MKL #defines the right one to dgbtrs.)
LAPACK routines expect column-major matrices (i.e. Fortran style): store the columns one after the other. The band storage scheme you must use is not hard: http://www.netlib.org/lapack/lug/node124.html.
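For concreteness, here is a minimal sketch of how the 5-diagonal Poisson matrix could be packed into that column-major band storage, assuming kl = ku = m and the standard 5-point stencil values (4 on the main diagonal, -1 on the others; the grid-row boundary tests are illustrative):

/* ldab = 2*kl + ku + 1 = 3*m + 1 rows; the first kl rows are left free
 * for fill-in during the LU factorization. Entry A(i,j) (0-based) is
 * stored at AB[j*ldab + (kl + ku + i - j)]. */
int kl = m, ku = m, ldab = 2*kl + ku + 1;
double *AB = calloc((size_t)ldab * n, sizeof(double));
for (int j = 0; j < n; j++) {
    AB[j*ldab + kl + ku] = 4.0;                              /* main diagonal */
    if (j % m != 0)        AB[j*ldab + kl + ku - 1] = -1.0;  /* 1st superdiagonal */
    if (j < n-1 && (j+1) % m != 0)
                           AB[j*ldab + kl + ku + 1] = -1.0;  /* 1st subdiagonal */
    if (j >= m)            AB[j*ldab + kl + ku - m] = -1.0;  /* m-th superdiagonal */
    if (j < n - m)         AB[j*ldab + kl + ku + m] = -1.0;  /* m-th subdiagonal */
}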
It seems good to me, but please try it on small problems first (always a good idea, by the way). Also make sure you handle a non-zero info (this is how errors are reported).
Better style is to use MKL_INT instead of plain int; it is a typedef to the right integer type, which may differ across architectures.
Also make sure to allocate memory for pivots before calling dgbtrf.

This might be off topic, but for the Poisson equation an FFT-based solution is much faster. Just take the 2D FFT of your potential field, divide it by -(k^2 + lambda^2), then take the inverse FFT. lambda is a small number to avoid divergence for k = 0. The 5-diagonal system is a band-limited approximation of the Poisson equation, which approximates the differential operator by finite differences.
http://en.wikipedia.org/wiki/Screened_Poisson_equation
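A minimal sketch of that recipe, assuming FFTW is available and a periodic nx-by-ny grid with unit spacing (the function and its parameter names are illustrative):

#include <fftw3.h>
#include <math.h>

/* Solve lap(u) - lambda^2 * u = rho on a periodic grid via the FFT. */
void poisson_fft(double *rho, double *u, int nx, int ny, double lambda)
{
    int nc = nx * (ny/2 + 1);              /* size of the r2c spectrum */
    fftw_complex *rho_k = fftw_malloc(sizeof(fftw_complex) * nc);
    fftw_plan fwd = fftw_plan_dft_r2c_2d(nx, ny, rho, rho_k, FFTW_ESTIMATE);
    fftw_plan bwd = fftw_plan_dft_c2r_2d(nx, ny, rho_k, u, FFTW_ESTIMATE);

    fftw_execute(fwd);                     /* rho -> rho_k */
    for (int i = 0; i < nx; i++) {
        double kx = 2.0 * M_PI * (i <= nx/2 ? i : i - nx) / nx;
        for (int j = 0; j < ny/2 + 1; j++) {
            double ky = 2.0 * M_PI * j / ny;
            int idx = i * (ny/2 + 1) + j;
            double denom = -(kx*kx + ky*ky + lambda*lambda);
            rho_k[idx][0] /= denom * nx * ny;   /* nx*ny undoes FFTW's scaling */
            rho_k[idx][1] /= denom * nx * ny;
        }
    }
    fftw_execute(bwd);                     /* rho_k -> u */

    fftw_destroy_plan(fwd);
    fftw_destroy_plan(bwd);
    fftw_free(rho_k);
}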

Sparse matrix multiplication using MKL libraries

I was looking for a way to perform a symmetric sparse matrix - dense matrix multiplication:
X = A*B
where the sparse matrix A was previously stored in CSR3 format (upper triangular), while the matrix B is a dense non-symmetric matrix. Is there a routine inside the MKL libraries to do it? Or do they all need the full sparse matrix in CSR format (to get the handle from) instead of the triangular one? (I built the triangular matrix because I need to use it in MKL Pardiso.) I know about the mkl_sparse_d_mv(...) routine, but I couldn't find a way to get a sparse matrix handle from a symmetric sparse matrix previously stored as an upper triangular matrix in CSR format.
Thank you in advance,
Daniele
Could you try mkl_sparse_?_mm, where ? is one of the s, d, c, z data types?
This routine performs a matrix-matrix operation:
Y := alpha*op(A)*X + beta*Y
where alpha and beta are scalars, A is a sparse matrix, op is a matrix modifier for matrix A, and X and Y are dense matrices.
In most cases you can easily feed a CSR3-stored matrix into sparse_d_create_csr by passing appropriately offset pointers to your row index for pointerB and pointerE.
You can then tell mkl_sparse_d_mm the sparse matrix is triangular and you'd like it to be filled (I have never done this and can't promise that it works).
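A rough, untested sketch of that suggestion (row_ptr, col_idx and vals are placeholder names for the three CSR3 arrays of the n-by-n upper triangle; B and X are row-major dense matrices with k columns):

#include <mkl_spblas.h>

sparse_matrix_t A;
/* CSR3 keeps a single row-pointer array, so pointerB is row_ptr and
 * pointerE is row_ptr + 1, i.e. the same array offset by one entry. */
mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, n, n,
                        row_ptr, row_ptr + 1, col_idx, vals);

struct matrix_descr descr;
descr.type = SPARSE_MATRIX_TYPE_SYMMETRIC;  /* stored triangle stands for a symmetric matrix */
descr.mode = SPARSE_FILL_MODE_UPPER;        /* the stored triangle is the upper one */
descr.diag = SPARSE_DIAG_NON_UNIT;

/* X := 1.0 * A * B + 0.0 * X */
mkl_sparse_d_mm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr,
                SPARSE_LAYOUT_ROW_MAJOR, B, k, k, 0.0, X, k);
mkl_sparse_destroy(A);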
OK, I can now say the routine needs the full matrix in CSR format; the matrix struct description only tells the routine to take one triangle (upper/lower) from the full input CSR matrix, but it still needs all of it.

Assemble an Eigen3 sparse matrix from smaller sparse matrices

I am assembling the Jacobian of a coupled multiphysics system. The Jacobian consists of a block matrix on the diagonal for each system and off-diagonal blocks for the coupling.
I find it best to assemble the blocks separately and then sum over them with projection matrices to get the complete Jacobian.
Pseudo-code (where J[i] are the diagonal blocks, C[ij] the couplings, and P are the projections into the complete matrix):
// diagonal blocks
J.setZero();
for (int i = 0; i < N; ++i) {
    J += P[i] * J[i] * P[i].transpose();
}
// off-diagonal elements
for (int i = 0; i < N; ++i) {
    for (int j = i+1; j < N; ++j) {
        J += P[i] * C[ij] * P[j].transpose();
        J += P[j] * C[ji] * P[i].transpose();
    }
}
This costs a lot of performance, around 20% of the whole program, which is too much for mere assembly. I have to recompute the Jacobian every time step since the system is highly nonlinear.
Valgrind indicates that the resource-consuming method is Eigen::internal::assign_sparse_to_sparse, and within it the call to Eigen::SparseMatrix<>::InsertBackByOuterInner.
Is there a more efficient way to assemble such a matrix?
(I also had to use P*(J*P.transpose()) instead of P*J*P.transpose() to make the program compile; maybe there is already something wrong there.)
P.S.: NDEBUG and optimizations are turned on.
Edit: by storing P.transpose() in an extra matrix, I get slightly better performance, but the summation still accounts for 15% of the program.
Your code will be much faster if you work in place. First, estimate the number of non-zeros per column in the final matrix and reserve space (if not already done):
int nnz_per_col = ...;
J.reserve(VectorXi::Constant(n_cols, nnz_per_col));
If the number of nnz per column is highly non-uniform, then you can also compute it per column:
VectorXi nnz_per_col(n_cols);
for each j
nnz_per_col(j) = ...;
J.reserve(nnz_per_col);
Then manually insert elements:
for each block B[k]
    for each element i,j
        J.coeffRef(foo(i), foo(j)) += B[k](i,j);
where foo implements the appropriate mapping of indices.
And for the next iteration, no need to reserve, but you need to set coefficient values to zero while preserving the structure:
J.coeffs().setZero();

C Code Wavelet Transform and Explanation

I am trying to implement a wavelet transform in C, and I have never done it before. I have read a bit about wavelets, and I understand the 'growing subspaces' idea and how Mallat's one-sided filter bank is essentially the same idea.
However, I am stuck on how to actually implement Mallat's fast wavelet transform. This is what I understand so far:
The high-pass filter, h(t), gives you the detail coefficients. For a given scale j, it is a reflected, dilated, and normed version of the mother wavelet W(t).
g(t) is then the low-pass filter that makes up the difference. It is supposed to be the quadrature mirror of h(t).
To get the detail coefficients, or the approximation coefficients, for the jth level, you convolve your signal block with h(t) or g(t) respectively and downsample by 2^j (i.e. take every 2^j-th value).
However these are my questions:
How can I find g(t) when I know h(t)?
How can I compute the inverse of this transform?
Do you have any C code that I can reference? (Yes, I found the one on Wikipedia, but it doesn't help.)
What I would like some code to show is:
A. Here is the filter
B. Here is the transform (very explicitly)
C. Here is the inverse transform (again, for dummies)
Thanks for your patience, but there doesn't seem to be a step 1 - step 2 - step 3 guide out there with explicit examples (that aren't Haar, because all its coefficients are 1s and that makes things confusing).
The Mallat recipe for the FWT is really simple. If you look at the MATLAB code, e.g. the script by Jeffrey Kantor, all the steps are obvious.
In C it is a bit more work, but that is mainly because you need to take care of your own declarations and allocations.
Firstly, about your summary:
usually the filter h is a lowpass filter, representing the scaling function (the father wavelet)
likewise, g is usually the highpass filter representing the wavelet (the mother)
you cannot perform a J-level decomposition in one filtering-plus-downsampling step. At each level, you create an approximation signal c by filtering with h and downsampling, and a detail signal d by filtering with g and downsampling, and you repeat this at the next level (using the current c)
About your questions:
for a filter h of an orthogonal wavelet basis, [h_1 h_2 .. h_m h_n], the QMF is [h_n -h_m .. h_2 -h_1], where n is an even number and m == n-1 (see the short sketch after this list)
the inverse transform does the opposite of the FWT: at each level it upsamples the detail d and approximation c, convolves d with g and c with h, and adds the signals together -- see the corresponding MATLAB script.
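In C, that alternating flip might be sketched as follows (assuming an even filter length f and a g array of the same length as h):

/* Build the quadrature mirror filter g from an orthogonal scaling filter h
 * of even length f: g[k] = (-1)^k * h[f-1-k], the flip described above. */
for (int k = 0; k < f; k++)
    g[k] = (k % 2 == 0 ? 1.0 : -1.0) * h[f - 1 - k];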
Using this information, and given a signal x of len points of type double, a scaling filter h and a wavelet filter g of f coefficients each (also of type double), and a decomposition level lev, this piece of code implements the Mallat FWT:
/* y is the output array (len doubles, provided by the caller) */
double *t = calloc(len + f - 1, sizeof(double));   /* workspace, zero padding at the end */
memcpy(t, x, len * sizeof(double));
for (int i = 0; i < lev; i++) {
    memset(y, 0, len * sizeof(double));
    int len2 = len / 2;
    for (int j = 0; j < len2; j++)
        for (int k = 0; k < f; k++) {
            y[j]        += t[2*j + k] * h[k];   /* approximation c */
            y[j + len2] += t[2*j + k] * g[k];   /* detail d */
        }
    len = len2;
    memcpy(t, y, len * sizeof(double));         /* next level works on c */
}
free(t);
It uses one extra array: a 'workspace' t to copy the approximation c (the input signal x to start with) for the next iteration.
See this example C program, which you can compile with gcc -std=c99 -fpermissive main.cpp and run with ./a.out.
The inverse should also be something along these lines. Good luck!
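For instance, one level of the reconstruction might be sketched as follows (an illustrative, untested function; it assumes an orthogonal filter bank and uses circular boundary handling for simplicity):

/* One level of the inverse transform: c and d are the approximation and
 * detail signals of length len2; out receives the reconstruction of length
 * 2*len2. This is the adjoint of the filter-plus-downsample step above:
 * upsample by 2, filter with h and g, and add. */
void idwt_level(const double *c, const double *d, double *out,
                int len2, const double *h, const double *g, int f)
{
    int len = 2 * len2;
    for (int i = 0; i < len; i++)
        out[i] = 0.0;
    for (int j = 0; j < len2; j++)
        for (int k = 0; k < f; k++)
            out[(2*j + k) % len] += c[j] * h[k] + d[j] * g[k];
}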
The only thing that is missing is some padding for the filter operation. The lines

y[j] += t[2*j+k] * h[k];
y[j+len2] += t[2*j+k] * g[k];

run past the end of the signal in the t array during the first iteration (into the zero padding) and past the approximation part of the array during the following iterations. One must add (f-1) elements at the beginning of the t array:
double *t = calloc(len + f - 1, sizeof(double));
memcpy(&t[f-1], x, len * sizeof(double));   /* (f-1) zeros in front of the signal */
for (int i = 0; i < lev; i++) {
    memset(t, 0, (f-1) * sizeof(double));   /* re-zero the leading pad */
    memset(y, 0, len * sizeof(double));
    int len2 = len / 2;
    for (int j = 0; j < len2; j++)
        for (int k = 0; k < f; k++) {
            y[j]        += t[2*j + k] * h[k];
            y[j + len2] += t[2*j + k] * g[k];
        }
    len = len2;
    memcpy(&t[f-1], y, len * sizeof(double));
}
free(t);

What does the parameter superb in LAPACKE_dgesvd(...) mean?

Asking questions like this one gives me a bad conscience... nevertheless, I find it surprisingly difficult to google. I am experimenting with
lapack_int LAPACKE_dgesvd(
int matrix_order, char jobu, char jobvt,
lapack_int m, lapack_int n, double* a,
lapack_int lda, double* s, double* u, lapack_int ldu,
double* vt, lapack_int ldvt, double* superb);
which promises a singular value decomposition. Having already stopped fearing Fortran, I found a gold mine of information here: http://www.netlib.no/netlib/lapack/double/dgesvd.f
Actually, that link's target explains all parameters except the LAPACKE-specific double* superb (well, and the matrix_order parameter, but in Fortran everything is column-major).
Next, at http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/lapacke_dgesvd_row.c.htm I found a program which seems to hint that this is some kind of worker cache.
However, if that were true, what is the reason for LAPACKE_dgesvd_work(..)?
In addition I have a second question: in the example they use min(M,N)-1 as the size of superb. Why?
According to http://www.netlib.no/netlib/lapack/double/dgesvd.f, about the parameter WORK of the Fortran version:
WORK (workspace/output) DOUBLE PRECISION array, dimension (MAX(1,LWORK))
On exit, if INFO = 0, WORK(1) returns the optimal LWORK; if INFO > 0, WORK(2:MIN(M,N)) contains the unconverged superdiagonal elements of an upper bidiagonal matrix B whose diagonal is in S (not necessarily sorted). B satisfies A = U * B * VT, so it has the same singular values as A, and singular vectors related by U and VT.
There is a chance that superb is the superdiagonal of this upper bidiagonal matrix B, which has the same singular values as A. This also explains the length min(n,m)-1.
A look at lapack-3.5.0/lapacke/src/lapacke_dgesvd.c, downloaded from http://www.netlib.org/lapack/, confirms it.
The source code also shows that the high-level function lapacke_dgesvd() calls the middle-level interface lapacke_dgesvd_work(). If you use the high-level interface, you don't have to care about the optimal size of WORK: it will be computed, and WORK allocated, inside lapacke_dgesvd().
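For concreteness, a minimal illustrative call of the high-level interface, with superb sized min(m,n)-1 as discussed (the matrix values here are arbitrary):

#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    double a[6] = { 1, 2, 3, 4, 5, 6 };   /* 2x3 matrix, row-major */
    lapack_int m = 2, n = 3;
    double s[2], u[4], vt[9], superb[1];  /* superb: min(m,n)-1 = 1 element */
    lapack_int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A',
                                     m, n, a, n, s, u, m, vt, n, superb);
    if (info > 0)   /* superb now holds the unconverged superdiagonal elements */
        printf("%d superdiagonal element(s) did not converge\n", (int)info);
    else
        printf("singular values: %f %f\n", s[0], s[1]);
    return 0;
}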
I wonder if there is any gain in using the middle-level interface instead... Maybe when this function is called many times on small matrices of the same size...
Bye,
Francis

C Programming: Sum of the third upper anti-diagonal of a square matrix

I'm doing a short course in C programming, and I have been so busy lately with my other classes and helping my brother prepare for his wedding (as I'm his best man) that I have fallen behind and need help. Any help with this short assignment would be much appreciated, as I'm not familiar at all with matrices and it's due in a few days.
The assignment is to sum the third upper anti-diagonal of a square matrix.
I have been given this information:
The matrix should be a square, integer matrix of size N. In this assignment the matrix will be stored in a 1D block of memory. You will have to convert between the conceptual 2D matrix addressing and 1D block addressing with pointer arithmetic.
Note on random numbers:
The rand() function returns the next integer in a sequence of pseudo-random integers in the range [0, RAND_MAX]. RAND_MAX is a very large number and varies from system to system. To get an integer in the range [min, max]:
(int)((min) + rand()/(RAND_MAX + 1.0) * ((max) - (min) + 1))
srand(SEED) is used to set the seed for rand(). If srand() is called with the same seed value, the sequence of pseudo-random numbers is repeated. To get different random numbers each time a program runs, use time(NULL) as the seed. rand() is declared in stdlib.h, which needs to be included.
The program should be structured as follows.
#define N 32 // Matrix size
#define MYSEED 1234 // Last 4 digits of your student number.
int *initialise( ) // Allocate memory for the matrix using malloc
// Initialise the matrix with random integers x, 1≤ x ≤ 9
// Use 'MYSEED' as the seed in the random generator.
// Return a pointer to the matrix
void print_matrix(int *) // Print matrix on screen
int compute_diagonal(int *) // Compute your required calculation using pointer arithmetic.
// (square bracket [ ] array indexes shouldn't be used)
// Return the solution.
void finalise(int *) //free the allocated memory.
int main() // The main function should print the solution to screen.
Without doing your homework assignment for you, here's a tip:
Make functions that abstract storing and retrieving values in the matrix. One of the signatures should look a lot like this:
int retrieve(int *matrix, int row, int col);
OK, since this is homework and you still have a few days, I will not give you an exact answer here. But I will give you some thoughts with which it should be pretty easy to arrive at your answer.
Indexing a matrix the 1D way: you are not allowed to use matrix[x][y] here, but only a one-dimensional array. So just take a minute and think about how the index (x,y) can be computed within a 1D array. Keep in mind that C stores elements row-wise (i.e. the elements matrix[0][0], matrix[0][1], matrix[0][2] refer to matrix[0], matrix[1], matrix[2]). It is a simple formula in terms of x, y and N.
Filling the matrix randomly is easy: the function to create a random int is already given, so just walk along the matrix and fill every element.
Adding the third upper anti-diagonal: this isn't really a programming question. Just sketch a small matrix on a piece of paper and see which elements you have to add. Look at their indices, then combine your newly gained knowledge with the result from my point 1 and you will know what to add up.
Edit: since you are not allowed to use the bracket operator, keep in mind that matrix[5] is the same as *(matrix + 5).
I think it's fair to tell you this ;)
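Putting the two hints together, the helper suggested in the first answer might be sketched like this (an illustrative sketch; walking the anti-diagonal and summing is still up to you):

/* Row-major access with pointer arithmetic only (no [] operator).
 * N is the matrix size #define'd in the assignment skeleton. */
int retrieve(int *matrix, int row, int col)
{
    return *(matrix + row * N + col);
}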
