I have written a program that generates a number of N x N matrices whose determinants I would like to compute. I have two related questions:
What library is the best for doing so? I would like the fastest possible library since I have millions of such matrices.
Are there any specifics I should take care of when casting the result to an integer? All matrices that I will generate have integer determinants and I would like to make sure that no rounding error skews the correct value of the determinant.
Edit: if possible, please provide an example of computing the determinant with the recommended library.
As for Matrix Libraries, it looks like that question is answered here:
Recommendations for a small c-based vector and matrix library
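As for casting to an integer: the computed determinant is a floating-point value and will often not be exactly an integer, even when the true determinant is. You shouldn't cast it directly, because a cast truncates toward zero. Use round, floor, or ceil to convert it in a way you choose; round is the usual choice for a value that should be integral. These functions return integral values that still have floating-point type, so you will still need to cast them; however, you can now do so without losing any information, as long as the value fits in the target integer type.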
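To make the rounding advice concrete, here is a minimal sketch in C. The helper name `det_to_integer` and the tolerance parameter `tol` are made up for illustration; they are not part of any library. It rounds to the nearest integer and also refuses to convert when the floating-point value is suspiciously far from an integer, which is an extra safety check on top of what the answer describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Every integer of magnitude below 2^53 is exactly representable
 * in an IEEE-754 double. */
#define EXACT_INT_LIMIT 9007199254740992.0

/* Hypothetical helper: round det to the nearest integer and store it in *out.
 * Returns false (leaving *out untouched) if det is NaN, infinite, too large
 * to be trusted as an exact integer, or further than tol from an integer. */
bool det_to_integer(double det, double tol, long long *out)
{
    assert(out != NULL);

    /* Written so that NaN fails this test as well. */
    if (!(det > -EXACT_INT_LIMIT && det < EXACT_INT_LIMIT))
        return false;

    /* Round half away from zero. This is the same idea as llround() from
     * <math.h>, spelled out so the sketch does not depend on libm. */
    long long nearest = (long long)(det + (det >= 0.0 ? 0.5 : -0.5));

    /* Reject values that are not close to any integer. */
    double diff = det - (double)nearest;
    if (diff < 0.0)
        diff = -diff;
    if (diff > tol)
        return false;

    *out = nearest;
    return true;
}
```

For comparison, a plain cast such as `(long long)6.999999999` yields 6, while the helper above yields 7 for the same input with `tol = 1e-6`. What tolerance is appropriate depends on N and on the size of the matrix entries, so treat the value as something to tune.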
You can work wonders with matrices using BLAS and LAPACK. They are written in Fortran, so calling them from C takes a little extra work, but all in all they crunch numbers at tremendous speed.
http://www.netlib.org/lapack/lug/node11.html
GSL is another option, but the choice really depends on your matrices. Are the matrices dense or sparse? Is N big or small? For small N you may find that coding the determinant yourself, using an explicit cofactor expansion or Gaussian elimination, is faster, since most high-performance libraries focus on big matrices and their optimisations may introduce overhead on simple problems.
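For the "code it yourself" route, a hand-rolled determinant by Gaussian elimination with partial pivoting might look roughly like the sketch below. The function name `det_gauss` and the row-major storage layout are assumptions made for this example. It works in double precision, so the result still needs to be rounded if you expect an integer.

```c
#include <assert.h>
#include <stddef.h>

/* Determinant of an n x n matrix stored row-major in a[0 .. n*n-1],
 * computed by Gaussian elimination with partial pivoting.
 * The contents of a are overwritten. Cost is O(n^3). */
double det_gauss(double *a, size_t n)
{
    assert(a != NULL && n > 0);
    double det = 1.0;

    for (size_t k = 0; k < n; ++k) {
        /* Pick the row at or below the diagonal with the largest
         * magnitude in column k. */
        size_t p = k;
        double best = a[k * n + k] < 0.0 ? -a[k * n + k] : a[k * n + k];
        for (size_t i = k + 1; i < n; ++i) {
            double v = a[i * n + k] < 0.0 ? -a[i * n + k] : a[i * n + k];
            if (v > best) {
                best = v;
                p = i;
            }
        }
        if (best == 0.0)
            return 0.0; /* no usable pivot: the matrix is singular */

        /* Swapping two rows flips the sign of the determinant. */
        if (p != k) {
            for (size_t j = k; j < n; ++j) {
                double tmp = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = tmp;
            }
            det = -det;
        }

        /* The determinant is the signed product of the pivots. */
        det *= a[k * n + k];

        /* Eliminate column k from the rows below the pivot. */
        for (size_t i = k + 1; i < n; ++i) {
            double f = a[i * n + k] / a[k * n + k];
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= f * a[k * n + j];
        }
    }
    return det;
}
```

Whether this beats a library call for your N is something to measure rather than assume.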
Related
I have two n x k complex matrices A and B. I need to calculate C = A B^H. C is guaranteed to be real, symmetric and all elements will be non-negative. These matrices will be pretty large so I am going to be using either CBLAS or MKL in C for this purpose. However, I cannot find a way to tell BLAS that C is symmetric, at least based on the quick reference document. This is important as it would halve the computation time. The xHER2K routine will give me a B A^H term which I don't want. Please suggest a way to proceed in this matter.
I am looking to find the fastest code/algorithm/package for obtaining the singular value decomposition (SVD) of a real, square bidiagonal matrix. The matrices I am working with are fairly small - typically somewhere around 15x15 in size. However, I am performing this SVD thousands (maybe millions) of times, so the computational cost becomes quite large.
I am writing all of my code in C. I assume the fastest codes for performing the SVD will probably come from LAPACK. I have been looking into LAPACK and it seems like I have quite a few different options for performing the SVD of real, bidiagonal matrices:
dbdsqr - uses a zero-shift QR algorithm to get the SVD of a bidiagonal matrix
dbdsdc - uses a divide and conquer algorithm to get the SVD of a bidiagonal matrix
dgesvd - not sure exactly how this one works, but it can handle any arbitrary matrix, so I assume I am better off using dbdsqr or dbdsdc
dgesdd - not quite sure how this one works either, but it can also handle any arbitrary matrix, so I assume I am better off using dbdsqr or dbdsdc
dstemr - estimates eigenvectors/eigenvalues for a tridiagonal matrix; I can use it to estimate the left singular vectors/values by finding the eigenvectors/values of A*A'; I can then estimate the right singular vectors/values by finding the eigenvectors/values of A'*A
dstemr - perhaps there is an even faster way to use dstemr to get the SVD... please enlighten me if you know of a way
I have no idea which of these methods is fastest. Is there an even faster way to get the SVD of a real, bidiagonal matrix that I haven't listed above?
Ideally I am looking for a small example C code to demonstrate the fastest possible SVD for a relatively small real, bidiagonal matrix.
I have used the Jacobi method to find all eigenvalues and eigenvectors in C code. The complexity of the Jacobi method is O(n^3), but the dimension of my matrix is huge (17814 x 17814), so it takes a lot of time. I would like to know a better algorithm for solving this problem. If you want, I can attach my C code.
The algorithm suggested in the comments is not necessarily the best one.
As you can see here, the Jacobi method can be vastly faster when using special techniques.
On top of that, Jacobi is quite easy to run in parallel, and it's much faster for sparse matrices than for dense matrices so you can take advantage of that as well, depending on your architecture and the type of matrix you have.
I'd say that the best thing is to test a few different methods and see in practice where you can get the best results.
An O(n^2.376) algorithm is not necessarily faster in practice than an O(n^3) one; it depends on the constant factors.
What is the best way to organize matrix operations in CUDA (in terms of performance)?
For example, I want to calculate C * C^(-1) * B^T + C, where C and B are matrices.
Should I write separate functions for multiplication, transposition and so on or write one function for the whole expression?
Which way is the fastest?
I'd recommend using the CUBLAS library. It's normally much faster and more reliable than anything you could write on your own. In addition, its API is similar to that of the BLAS library, which is the standard library for numerical linear algebra.
I think the answer depends heavily on the size of your matrices.
If you can fit a matrix in shared memory, I would probably use a single block to compute the expression and keep everything inside a single kernel (probably a bigger one, where this computation is only a part of it). If you have more matrices and need to evaluate the expression several times, you can hopefully do that in parallel, utilising all of the GPU's computing power.
However, if your matrices are much bigger, you will want more blocks to compute that (check matrix multiplication example in CUDA manual). You need a guarantee that multiplication is finished by all blocks before you proceed with the next part of your equation, and if so, you will need a kernel call for each of your operations.
Does anyone know a good one? I'm looking for multiplication of matrices, transpose, invert, converting from 4x4 to top left corner 3x3 etc.
Like you say, rolling your own is easy enough. The inverse is tricky to get efficient unless you read this:
http://www.geometrictools.com/Documentation/LaplaceExpansionTheorem.pdf
I could send you my code, but it's a 4x4-only c++ class and does not take that paper into account yet, among other things that probably don't fit your needs.
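For the easy parts (multiply, transpose, taking the top-left 3x3 of a 4x4), a roll-your-own version in plain C can be as small as the sketch below. The type and function names (`Mat4`, `Mat3`, `mat4_mul`, `mat4_transpose`, `mat4_upper_left3`) and the row-major convention are assumptions for this example only. Inversion is left out on purpose, since that is the part where the paper linked above matters.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major storage: m[row][col]. */
typedef struct { double m[4][4]; } Mat4;
typedef struct { double m[3][3]; } Mat3;

/* Returns the matrix product a * b. */
Mat4 mat4_mul(const Mat4 *a, const Mat4 *b)
{
    assert(a != NULL && b != NULL);
    Mat4 r;
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < 4; ++k)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    }
    return r;
}

/* Returns the transpose of a. */
Mat4 mat4_transpose(const Mat4 *a)
{
    assert(a != NULL);
    Mat4 r;
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            r.m[j][i] = a->m[i][j];
    return r;
}

/* Copies the top-left 3x3 corner of a (for a typical affine transform,
 * the rotation/scale part without the translation). */
Mat3 mat4_upper_left3(const Mat4 *a)
{
    assert(a != NULL);
    Mat3 r;
    for (size_t i = 0; i < 3; ++i)
        for (size_t j = 0; j < 3; ++j)
            r.m[i][j] = a->m[i][j];
    return r;
}
```

Returning small structs by value keeps the call sites simple; if profiling shows the copies matter, output parameters are the obvious alternative.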
Try BLAS or LAPACK.
Intel's Math Kernel Library (MKL), or Numerical Recipes in C by Press, Flannery, Teukolsky and Vetterling.