BLAS : Matrix product in C? - c

I would like to perform some fast matrix operations in C using BLAS (I have no option to choose another library; it is the only one available in my project).
I do the following operations:
Invert a square matrix,
Make a matrix product A*B where A is the computed inverse matrix and B a vector,
Sum two (very long) vectors.
I have heard that these kinds of operations are possible with BLAS and are very fast. But I have searched and found nothing (in terms of C code, I mean) that would help me understand and apply them.

The BLAS library was written originally in Fortran. The interface to C is called CBLAS and has all functions prefixed by cblas_.
Unfortunately, with BLAS you can directly address only the last two points:
sgemv (single precision) or dgemv (double precision) performs matrix-vector multiplication
saxpy (single precision) or daxpy (double precision) performs general vector-vector addition
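Concretely, dgemv computes y <- alpha*A*x + beta*y and daxpy computes y <- alpha*x + y. Since a self-contained snippet cannot assume a CBLAS header is available, the sketch below uses plain-C stand-ins (ref_dgemv and ref_daxpy are invented names) that implement the same semantics; the real CBLAS calls are shown in the note afterwards.

```c
#include <stddef.h>

/* Semantics of cblas_dgemv with CblasRowMajor, CblasNoTrans:
 *   y <- alpha*A*x + beta*y, where A is m x n, stored row-major. */
static void ref_dgemv(size_t m, size_t n, double alpha,
                      const double *a, const double *x,
                      double beta, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += a[i * n + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* Semantics of cblas_daxpy: y <- alpha*x + y. */
static void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

With CBLAS, the same two steps would be cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, 1.0, a, n, x, 1, 0.0, y, 1) and cblas_daxpy(n, 1.0, x, 1, y, 1).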
BLAS does not deal with the more complex operation of inverting a matrix. For that there is the LAPACK library, which builds on BLAS and provides linear algebra operations. General matrix inversion in LAPACK is done with sgetri (single precision) or dgetri (double precision), but there are other inversion routines that handle specific cases like symmetric matrices. If you are inverting the matrix only to multiply it later by a vector, that is essentially solving a system of linear equations, and for that there are sgesv (single precision) and dgesv (double precision).
You can invert a matrix using BLAS operations only by essentially (re-)implementing one of the LAPACK routines.
Refer to one of the many BLAS/LAPACK implementations for more details and examples, e.g. Intel MKL or ATLAS.

Do you really need to compute the full inverse? This is very rarely needed, very expensive, and error prone.
It is much more common to compute the inverse multiplied by a vector or matrix. This is very common, fairly cheap, and not error prone. You do not need to compute the inverse to multiply it by a vector.
If you want to compute Z = X^(-1)*Y then you should look at the LAPACK driver routines. Usually Y in this case has only a few columns. If you really need all of X^(-1), you can set Y to be the full identity matrix.
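To make "solve instead of invert" concrete, here is a hedged plain-C sketch (the name gauss_solve is invented for illustration) of the factor-and-solve strategy that LAPACK's dgesv driver implements with a pivoted LU factorization: it produces the solution of A*x = b without ever forming A^(-1).

```c
#include <math.h>
#include <stddef.h>

/* Solve A*x = b (A is n x n, row-major) by Gaussian elimination with
 * partial pivoting, the same factor-and-solve strategy behind LAPACK's
 * dgesv, with no explicit inverse.  A and b are overwritten.
 * Returns 0 on success, -1 if a pivot is exactly zero. */
static int gauss_solve(size_t n, double *a, double *b, double *x)
{
    for (size_t k = 0; k < n; k++) {
        size_t p = k;                       /* pick largest pivot in column k */
        for (size_t i = k + 1; i < n; i++)
            if (fabs(a[i * n + k]) > fabs(a[p * n + k])) p = i;
        if (a[p * n + k] == 0.0) return -1;
        if (p != k) {                       /* swap rows k and p */
            for (size_t j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (size_t i = k + 1; i < n; i++) {  /* eliminate below the pivot */
            double f = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; j++) a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];
        }
    }
    for (size_t i = n; i-- > 0; ) {         /* back substitution */
        double acc = b[i];
        for (size_t j = i + 1; j < n; j++) acc -= a[i * n + j] * x[j];
        x[i] = acc / a[i * n + i];
    }
    return 0;
}
```

With LAPACK, the equivalent call is dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info) on a column-major A, with b overwritten by the solution.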

Technically, you can do what you are asking, but normally it is more stable to do:
triangular factorization, e.g. LU factorization or Cholesky factorization
use a triangular solver on the factorized matrix
BLAS is quite capable of doing this. Technically, it's in 'LAPACK', but most/many BLAS implementations include LAPACK, eg OpenBLAS and Intel's MKL both do.
the function to do the LU factorization is dgetrf ("(d)ouble, (ge)neral matrix, (tr)iangular (f)actorization") http://www.netlib.org/lapack/double/dgetrf.f
then the solver is dtrsm ("(d)ouble (tr)iangular (s)olve (m)atrix") http://www.netlib.org/blas/dtrsm.f, applied twice: once for the L factor and once for U (or use LAPACK's dgetrs, which applies the pivots and both triangular solves in one call)
To call these from C, note that:
the function names should be in lowercase, with _ postfixed, i.e. dgetrf_ and dtrsm_
all the parameters should be pointers, eg int *m and double *a

Related

LAPACK from Pytrilinos for least-squares

I'm trying to solve a large sparse 30,000x1,000 matrix system using a LAPACK solver in the Trilinos package. My goal is to minimise computation time; however, this LAPACK solver only takes square matrices, so I'm manually converting my non-square matrix A into a square one by multiplying by its transpose, essentially solving the system like so:
(A^T*A)x = A^T*b
What is making my solve slow are the matrix multiplications by A^T.
Any ideas on how to fix this?
You can directly compute the QR-decomposition (wikipedia-link) and use that to solve your least squares problem.
This way you don't have to compute the matrix product A^T A.
There are multiple LAPACK routines for the computation of QR-decompositions, e.g. gels is a driver routine for general matrices (intel-link).
I must say that I don't know if Trilinos supplies that routine but it is a standard LAPACK routine.

Inverting a matrix of any size

I'm using the GNU Scientific Library in implementing a calculator which needs to be able to raise matrices to powers. Unfortunately, there doesn't seem to be a function available in the GSL even for multiplying matrices (the gsl_matrix_mul_elements() function only performs element-wise multiplication, not true matrix multiplication), and by extension, no raising to powers.
I want to be able to raise to negative powers, which requires the ability to take the inverse. From my searching around, I have not been able to find any sound code for calculating the inverses of arbitrary matrices (only ones with defined dimensions), and the guides I found for doing it by hand use clever "on-paper tricks" that don't really work in code.
Is there a common algorithm that can be used to compute the inverse of a matrix of any size (failing of course when the inverse cannot be calculated)?
As mentioned in the comments, powers of a matrix can be computed for square matrices with integer exponents. The n-th power of A is A^n = A*A*...*A, where A appears n times. If B is the inverse of A, then the -n-th power of A is A^(-n) = (A^(-1))^n = B^n = B*B*...*B.
So in order to compute the n-th power of A I can suggest the following approach using GSL:
gsl_matrix_set_identity(An);                          /* initialize An as I */
for (i = 0; i < n; i++) {                             /* An <- An * A */
    gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, An, A, 0.0, tmp);
    gsl_matrix_memcpy(An, tmp);                       /* dgemm cannot write in place */
}
For computing the matrix B you can use the following routines:
gsl_linalg_LU_decomp(A, p, &signum);   /* LU decomposition (A is overwritten) */
gsl_linalg_LU_invert(A, p, B);         /* compute the inverse from the decomposition */
(Note that for real matrices the routine is gsl_linalg_LU_invert; gsl_linalg_complex_LU_invert is the complex-matrix version.)
I recommend reading up on the SVD, which the GSL implements. If your matrix is invertible, then computing the inverse via the SVD is a reasonable, though somewhat slow, way to go. If your matrix is not invertible, the SVD allows you to compute the next best thing, the generalised inverse (pseudoinverse).
In matrix calculations the errors inherent in floating point arithmetic can accumulate alarmingly. One example is the Hilbert matrix, an innocent looking thing with a remarkably large condition number, even for quite moderate dimension. A good test of an inversion routine is to see how big a Hilbert matrix it can invert, and how close the computed inverse times the matrix is to the identity.
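That test is easy to run. Below is a hedged plain-C sketch (gj_invert and hilbert_residual are invented names; a real implementation would call the library's LU routines instead): build H[i][j] = 1/(i+j+1), invert it, and measure the worst entry of H*H^(-1) - I. The residual degrades dramatically as the dimension grows.

```c
#include <math.h>
#include <stddef.h>

/* Invert an n x n row-major matrix by Gauss-Jordan elimination with
 * partial pivoting.  a is destroyed; the inverse lands in inv.
 * Returns 0 on success, -1 on an exactly-zero pivot. */
static int gj_invert(size_t n, double *a, double *inv)
{
    for (size_t i = 0; i < n * n; i++) inv[i] = 0.0;
    for (size_t i = 0; i < n; i++) inv[i * n + i] = 1.0;
    for (size_t k = 0; k < n; k++) {
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (fabs(a[i * n + k]) > fabs(a[p * n + k])) p = i;
        if (a[p * n + k] == 0.0) return -1;
        for (size_t j = 0; j < n; j++) {       /* swap rows k and p */
            double t = a[k*n+j]; a[k*n+j] = a[p*n+j]; a[p*n+j] = t;
            t = inv[k*n+j]; inv[k*n+j] = inv[p*n+j]; inv[p*n+j] = t;
        }
        double piv = a[k * n + k];
        for (size_t j = 0; j < n; j++) { a[k*n+j] /= piv; inv[k*n+j] /= piv; }
        for (size_t i = 0; i < n; i++) {       /* clear column k elsewhere */
            if (i == k) continue;
            double f = a[i * n + k];
            for (size_t j = 0; j < n; j++) {
                a[i*n+j]   -= f * a[k*n+j];
                inv[i*n+j] -= f * inv[k*n+j];
            }
        }
    }
    return 0;
}

/* Max-entry error of H*inv(H) - I for the n x n Hilbert matrix (n <= 16). */
static double hilbert_residual(size_t n)
{
    double h[256], work[256], inv[256];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            h[i * n + j] = work[i * n + j] = 1.0 / (double)(i + j + 1);
    if (gj_invert(n, work, inv) != 0) return INFINITY;
    double worst = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++) acc += h[i*n+k] * inv[k*n+j];
            double err = fabs(acc - (i == j ? 1.0 : 0.0));
            if (err > worst) worst = err;
        }
    return worst;
}
```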

Computing the determinant of a C array

I have written a program that generates a few N x N matrices for which I would like to compute the determinant. Related to this I have two questions:
Which library is best for doing so? I would like the fastest possible library, since I have millions of such matrices.
Are there any specifics I should take care of when casting the result to an integer? All matrices that I will generate have integer determinants and I would like to make sure that no rounding error skews the correct value of the determinant.
Edit. If possible provide an example of computing the determinant for the recommended library.
As for Matrix Libraries, it looks like that question is answered here:
Recommendations for a small c-based vector and matrix library
As for casting to an integer: if the determinant is not an integer, then you shouldn't be casting it to an integer, you should be using round, floor, or ceil to convert it in an acceptable way. These will probably give you integral values, but you will still need to cast them; however, you will now be able to do so without fear of losing any information.
You can work wonders with matrices using BLAS and LAPACK. They are actually written in Fortran, and using them from C takes a little adaptation, but all in all they can crunch numbers at tremendous speed.
http://www.netlib.org/lapack/lug/node11.html
You have GSL, but the choice really depends on your matrices. Are they dense or sparse? Is N big or small? For small N you may find that coding the determinant yourself using Cramer's rule or Gaussian elimination is faster, since most high-performance libraries focus on big matrices, and their optimisations may introduce overhead on simple problems.
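For dense matrices of moderate size, the standard method is an LU factorization with partial pivoting: the determinant is the product of U's diagonal, times the sign of the row permutation. This is exactly what you would get from LAPACK's dgetrf; the hedged plain-C sketch below (lu_det is an invented name) shows the idea.

```c
#include <math.h>
#include <stddef.h>

/* Determinant of an n x n row-major matrix via LU factorization with
 * partial pivoting (the method behind LAPACK's dgetrf).
 * The matrix is overwritten. */
static double lu_det(size_t n, double *a)
{
    double det = 1.0;
    for (size_t k = 0; k < n; k++) {
        size_t p = k;                       /* find the largest pivot */
        for (size_t i = k + 1; i < n; i++)
            if (fabs(a[i * n + k]) > fabs(a[p * n + k])) p = i;
        if (a[p * n + k] == 0.0) return 0.0;   /* singular matrix */
        if (p != k) {                       /* a row swap flips the sign */
            det = -det;
            for (size_t j = 0; j < n; j++) {
                double t = a[k*n+j]; a[k*n+j] = a[p*n+j]; a[p*n+j] = t;
            }
        }
        det *= a[k * n + k];                /* accumulate U's diagonal */
        for (size_t i = k + 1; i < n; i++) {
            double f = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; j++) a[i*n+j] -= f * a[k*n+j];
        }
    }
    return det;
}
```

Since the questioner's determinants are known to be integers, the floating-point result should then be rounded (e.g. with llround) rather than cast, as the earlier answer notes.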

Matrix operations in CUDA

What is the best way to organize matrix operations in CUDA (in terms of performance)?
For example, I want to calculate C * C^(-1) * B^T + C, C and B are matrices.
Should I write separate functions for multiplication, transposition and so on or write one function for the whole expression?
Which way is the fastest?
I'd recommend you use the CUBLAS library. It's normally much faster and more reliable than anything you could write on your own. In addition, its API is similar to the BLAS library, which is the standard library for numerical linear algebra.
I think the answer depends heavily on the size of your matrices.
If you can fit a matrix in shared memory, I would probably use a single block to compute that and have all inside a single kernel (probably bigger, where this computation is only a part of it). Hopefully, if you have more matrices, and you need to compute the above equation several times, you can do it in parallel, utilising all GPU computing power.
However, if your matrices are much bigger, you will want more blocks to compute that (check matrix multiplication example in CUDA manual). You need a guarantee that multiplication is finished by all blocks before you proceed with the next part of your equation, and if so, you will need a kernel call for each of your operations.

Eigenvector (Spectral) Decomposition

I am trying to find a program in C code that will allow me to compute an eigenvalue (spectral) decomposition for a square matrix. I am specifically trying to find code where the highest eigenvalue (and therefore its associated eigenvector) is located in the first column.
The reason I need the output to be in this order is because I am trying to compute eigenvector centrality and therefore I only really need to calculate the eigenvector associated with the highest eigenvalue. Thanks in advance!
In any case I would recommend using a dedicated linear algebra package like LAPACK (Fortran, but callable from C) or CLAPACK. Both are free and offer routines for almost any eigenvalue problem. If the matrix is large, it might be preferable to exploit its sparseness, e.g. by using ARPACK. All of these libraries tend to sort the eigenvectors according to the eigenvalues when they can (i.e. for real or purely imaginary eigenvalues).
See the book "Numerical recipes in C"
And the #1 google hit (search: eigenvalue decomposition code C#)
http://crsouza.blogspot.com/2010/06/generalized-eigenvalue-decomposition-in.html
does not help?
