MKL Matrix Transpose - C

I have very large rectangular and square matrices, both float and complex. Is there an in-place MKL transpose routine? There is mkl_?imatcopy in MKL; please help me with an example.
I have tried this, but it did not transpose the matrix:
size_t nEle = noOfCols * noOfRows;
float *data = (float*)calloc(nEle,sizeof(float));
initalizeData(data,noOfCols,noOfRows);
printdata(data,noOfCols,noOfRows);
printf("After transpose \n\n");
mkl_simatcopy('R','T',noOfCols,noOfRows,1,data,noOfPix,noOfCols);
//writeDataFile((char *)data,"AfterTranspose.img",nEle*sizeof(float));
printdata(data,noOfCols,noOfRows);

You may look at the existing in-place transposition routines for real and complex float datatypes. The MKL package contains such examples: cimatcopy.c, dimatcopy.c, simatcopy.c, zimatcopy.c. Please refer to the mklroot/examples/transc/source directory.

Related

sparse matrix multiplication using MKL libraries

I was looking to find a way to perform a symmetric sparse matrix - matrix multiplication:
X = A B
where the sparse matrix A was previously stored in CSR3 format (upper triangular), while B is a dense non-symmetric matrix. Is there a routine inside the MKL libraries to do it, or do they all need the full sparse matrix in CSR format (to get the handle from) instead of the triangular one? (I built the triangular matrix because I need it for MKL Pardiso.) I know about the mkl_sparse_d_mv(...) routine, but I couldn't find a way to get a sparse matrix handle from a symmetric sparse matrix stored as an upper triangular matrix in CSR format.
thank you in advance,
Daniele
Could you try mkl_sparse_?_mm, where ? is one of the s, d, c and z data types?
This routine performs a matrix-matrix operation:
Y := alpha*op(A)*X + beta*Y
where alpha and beta are scalars, A is a sparse matrix, op is a matrix modifier for matrix A, and X and Y are dense matrices.
In most cases you can easily feed a CSR3-stored matrix into mkl_sparse_d_create_csr by passing appropriately offset pointers into your row index array for pointerB and pointerE.
You can then tell mkl_sparse_d_mm the sparse matrix is triangular and you'd like it to be filled (I have never done this and can't promise that it works).
OK, I can now say that the routine needs the full matrix in CSR format. The matrix descriptor only tells the routine to take one triangle (upper/lower) from the full input CSR matrix, but it still needs all of it.

Eigen QR decomposition results differ between two methods

I am trying to use QR decomposition with Eigen, but the results obtained from the following two methods differ. Please help me find the error!
Thanks.
// Initialize the sparse matrix
A.setFromTriplets(triplets.begin(), triplets.end());
A.makeCompressed();
//Dense matrix method
MatrixXd MatrixA = A;
HouseholderQR<MatrixXd> qr(MatrixA);
MatrixXd Rr = qr.matrixQR().triangularView<Upper>();
//Sparse matrix method
SparseQR<SparseMatrix<double>, COLAMDOrdering<int>> qr;
qr.compute(A);
SparseMatrix<double, RowMajor> Rr = qr.matrixR();
This is because SparseQR performs column reordering to both reduce fill-in and achieve a nearly rank-revealing decomposition, similar to ColPivHouseholderQR. More precisely, HouseholderQR computes: A = Q*R, whereas SparseQR computes: A*P = Q*R. So it is expected that the two R triangular factors are different.

Elementwise product between a vector and a matrix using GNU Blas subroutines

I am working in C, using the GNU Scientific Library (GSL). Essentially, I need to do the equivalent of the following MATLAB code:
x=x.*(A*x);
where x is a gsl_vector, and A is a gsl_matrix.
I managed to do (A*x) with the following command:
gsl_blas_dgemv(CblasNoTrans, 1.0, A, x, 1.0, res);
where res is another gsl_vector, which stores the result. If the matrix A has size m-by-m and the vector x has size m-by-1, then the vector res will have size m-by-1.
Now what remains is the elementwise product of the vectors x and res (the result should be a vector). Unfortunately, I am stuck on this and cannot find the function that does it.
If anyone can help me with that, I would be very grateful. In addition, does anyone know of better GSL documentation than https://www.gnu.org/software/gsl/manual/html_node/GSL-BLAS-Interface.html#GSL-BLAS-Interface, which so far is confusing me?
Finally, would I lose performance if I did this step with a simple for loop (the vector size is around 11000 and this step will be repeated 500-5000 times)?
for (i = 0; i < m; i++)
gsl_vector_set(res, i, gsl_vector_get(x, i) * gsl_vector_get(res, i));
Thanks!
The function you want is:
gsl_vector_mul(res, x)
I have used Intel's MKL, and I like the documentation on their website for these BLAS routines.
The for loop is fine if GSL is well designed; for example, gsl_vector_set() and gsl_vector_get() can be inlined. You could compare its running time with gsl_blas_daxpy: if the timings are similar, the for loop is well optimized.
On the other hand, you may want to try Eigen, a much better matrix library, with which you can implement your operation with code similar to this:
x = x.array() * (A * x).array();

Solve a banded matrix system of equations

I need to solve a 2D Poisson equation, that is, a system of equations of the form AX=B where A is an n-by-n matrix and B is an n-by-1 vector. Since A is a discretization matrix for the 2D Poisson problem, I know that only 5 diagonals are nonzero. LAPACK doesn't provide functions for this particular problem, but it has functions for solving banded systems of equations, namely DGBTRF (for the LU factorization) and DGBTRS. Now, the 5 diagonals are: the main diagonal, the first diagonals above and below it, and the diagonals m positions above and below the main diagonal. After reading the LAPACK documentation about band storage, I learned that I have to create a (3*m+1)-by-n matrix to store A in band storage format; let's call this matrix AB. Now the questions:
1) What is the difference between dgbtrs and dgbtrs_? Intel MKL provides both, but I can't understand why.
2) dgbtrf requires the band storage matrix to be an array. Should I linearize AB by rows or by columns?
3) Is this the correct way to call the two functions?
int n, m;
double *AB;
/*... fill n, m, AB, with appropriate numbers */
int *pivots;
int nrows = 3 * m + 1, info, rhs = 1;
dgbtrf_(&n, &n, &m, &m, AB, &nrows, pivots, &info);
char trans = 'N';
dgbtrs_(&trans, &n, &m, &m, &rhs, AB, &nrows, pivots, B, &n, &info);
MKL also provides DGBTRS and DGBTRS_. Those are Fortran administrivia that you should not care about; just call dgbtrs. (The reason is that on some architectures Fortran routine names have an underscore appended and on others not, and names may be either upper or lower case; Intel MKL #defines the right one to dgbtrs.)
LAPACK routines expect column-major matrices (i.e. Fortran style): store the columns one after another. The banded storage you must use is not hard: http://www.netlib.org/lapack/lug/node124.html.
Your call looks good to me, but please try it on small problems first (always a good idea, by the way). Also make sure you handle a non-zero info (this is how errors are reported).
Better style is to use MKL_INT instead of plain int; it is a typedef for the right integer type, which may differ across architectures.
Also make sure to allocate memory for pivots before calling dgbtrf.
This might be off topic, but for the Poisson equation an FFT-based solution is much faster: do a 2D FFT of your potential field, divide by -(k^2+lambda^2), then do an inverse FFT. Here lambda is a small number to avoid divergence at k=0. The 5-diagonal system is a band-limited approximation of the Poisson equation, which approximates the differential operator by finite differences.
http://en.wikipedia.org/wiki/Screened_Poisson_equation

OpenFOAM, PETSc or other sparse matrix multiplication source code

Could someone tell me where I can find the source code for matrix multiplication as implemented by OpenFOAM, PETSc, or something similar? It can't be a trivial algorithm.
I have found the OpenFOAM and PETSc homepages, but I can't find the multiply methods or their source code in the docs.
PETSc implements matrix multiplication for many formats; look at MatMult_SeqAIJ for the most basic implementation. For a sparse matrix stored in compressed sparse row (CSR) form with row starts ai, column indices aj, and entries aa, multiplication consists of the following simple kernel:
for (i = 0; i < m; i++) {
    y[i] = 0;
    for (j = ai[i]; j < ai[i+1]; j++)
        y[i] += aa[j] * x[aj[j]];
}