Eigen Sparse Matrix

I am trying to multiply two large sparse matrices of size 300k x 1000k and 1000k x 300k using Eigen. The matrices are highly sparse, with roughly 0.01% non-zero entries, but there is no block or other structure to their sparsity.
It turns out that Eigen chokes and ends up taking 55-60 GB of memory. In fact, it produces a dense final matrix, which explains the memory usage.
I have tried multiplying matrices of similar sizes when one of the matrices is diagonal, and the multiplication works fine, using ~2-3 GB of memory.
Any thoughts on what's going wrong?

Even though your matrices are sparse, the result can be far denser. With 0.01% non-zero entries and an inner dimension of a million, each output entry is a dot product over 10^6 terms, of which on average about 10^6 * (10^-4)^2 = 0.01 hit a matching pair of non-zeros, so roughly 1% of the 300k x 300k output entries are non-zero: about a hundred times denser than the inputs. You can try to remove the smallest entries with (A*B).prune(ref,eps);, where ref is a reference value for what counts as non-zero and eps is a tolerance. All entries smaller than ref*eps will be removed during the computation of the product, reducing both the memory usage and the size of the result. An even better option would be to find a way to avoid performing this product at all.
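The question is about Eigen (C++), but the fill-in effect and the pruning idea can be reproduced with scipy.sparse at a reduced scale. The sizes below are a tenth of those in the question, and the eps value is illustrative:

    # Sketch (scipy.sparse standing in for Eigen): the product of two very
    # sparse matrices fills in heavily; pruning small entries afterwards
    # mimics Eigen's (A*B).prune(ref, eps).
    import numpy as np
    import scipy.sparse as sp

    rng = np.random.default_rng(0)

    def sprand(m, n, nnz):
        # random sparse matrix built from nnz random (i, j, value) triples;
        # rare duplicate positions simply sum, which is fine for a demo
        i = rng.integers(0, m, size=nnz)
        j = rng.integers(0, n, size=nnz)
        return sp.coo_matrix((rng.random(nnz), (i, j)), shape=(m, n)).tocsr()

    m, k, n = 30_000, 100_000, 30_000      # 1/10 the scale of the question
    A = sprand(m, k, int(m * k * 1e-4))    # ~0.01% non-zero, as in the question
    B = sprand(k, n, int(k * n * 1e-4))

    C = A @ B
    print("input density :", A.nnz / (m * k))
    print("output density:", C.nnz / (m * n))  # about k * density^2, much denser

    # Analogue of (A*B).prune(ref, eps): drop entries with magnitude < ref*eps.
    ref, eps = abs(C).max(), 1e-3
    C.data[np.abs(C.data) < ref * eps] = 0.0
    C.eliminate_zeros()
    print("nnz after pruning:", C.nnz)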

Related

Julia: all eigenvalues of large sparse matrix

I have a large sparse matrix, for example, a 128000×128000 SparseMatrixCSC{Complex{Float64},Int64} with 1376000 stored entries.
How can I quickly get all eigenvalues of the sparse matrix? Is it possible at all?
I tried eigs on the 128000×128000 matrix with 1376000 stored entries, but the kernel died.
I am using a MacBook Pro with 16 GB of memory and Julia 1.3.1 in a Jupyter notebook.
As far as I'm aware (and I would love to be proven wrong) there is no efficient way to get all the eigenvalues of a general sparse matrix.
The main algorithm for computing all the eigenvalues of a matrix is the QR algorithm. Its first step is to reduce the matrix to Hessenberg form, so that each subsequent QR factorisation takes O(n^2) time (using O(n) Givens rotations) instead of O(n^3). The problem is that reducing a matrix to Hessenberg form destroys the sparsity, and you just end up with a dense matrix.
There are also other methods for computing the eigenvalues of a matrix, like the (inverse) power iteration, that only require matrix-vector products and linear solves, but these give you only the largest or smallest eigenvalues, and they become expensive when you want to compute all the eigenvalues (they require storing the eigenvectors for the "deflation").
So much for the general case. If your matrix has some special structure, there may be better alternatives. For example, if your matrix is symmetric, then its Hessenberg form is tridiagonal and you can compute all the eigenvalues pretty fast, as sketched below.
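The question uses Julia, but the symmetric/tridiagonal point can be sketched with SciPy just as well; the 1D Laplacian below is a stand-in example, not the asker's matrix:

    # Sketch: all eigenvalues of a symmetric *tridiagonal* matrix (the
    # Hessenberg form of a symmetric matrix) are cheap, needing only O(n)
    # storage for the two diagonals instead of O(n^2) for a dense matrix.
    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    n = 10_000                    # stand-in size; the idea scales much further
    d = 2.0 * np.ones(n)          # main diagonal (1D Laplacian example)
    e = -1.0 * np.ones(n - 1)     # off-diagonal
    w = eigh_tridiagonal(d, e, eigvals_only=True)
    print(w[0], w[-1])            # smallest and largest of all n eigenvalues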
TL;DR: Is it possible? In general, no.
P.S.: I tried to keep this short, but if you're interested I can give more detail on any part of the answer.

counterintuitive speed difference between LM and shift-invert modes in scipy.sparse.linalg.eigsh?

I'm trying to find the several smallest (as in most negative, not smallest in magnitude) eigenvalues of a list of sparse Hermitian matrices in Python using scipy.sparse.linalg.eigsh. The matrices are ~1000x1000, and the list length is ~500-2000. In addition, I know upper and lower bounds on the eigenvalues of all the matrices; call them eig_UB and eig_LB, respectively.
I've tried two methods:
1. Using shift-invert mode with sigma=eig_LB.
2. Subtracting eig_UB from the diagonal of each matrix (thus shifting the smallest eigenvalues to be the largest-magnitude ones), diagonalizing the resulting matrices with default eigsh settings (no shift-invert mode and which='LM'), and then adding eig_UB back to the resulting eigenvalues.
Both methods work and their results agree, but method 1 is around 2-2.5x faster. This seems counterintuitive, since (at least as I understand the eigsh documentation) shift-invert mode subtracts sigma from the diagonal, inverts the matrix, and then finds eigenvalues, whereas default mode directly finds the largest magnitude eigenvalues. Does anyone know what could explain the difference in performance?
One other piece of information: I've checked, and the matrices that result from shift-inverting (that is, (M-sigma*identity)^(-1) if M is the original matrix) are no longer sparse, which seems like it should make finding their eigenvalues take even longer.
This is probably the resolution. As pointed out in https://arxiv.org/pdf/1504.06768.pdf, you don't actually need to invert the shifted sparse matrix and then repeatedly apply a dense inverse in a Lanczos-type method: you just need to repeatedly solve the linear system (M-sigma*identity)*v(n+1)=v(n) to generate the sequence of vectors {v(n)}, and each such solve is fast for a sparse matrix once an LU decomposition has been computed. On top of that, the shift-invert transformation maps the eigenvalues near sigma to large, well-separated ones, so the Lanczos iteration needs far fewer steps to converge, which plausibly explains why method 1 wins despite the extra factorization.
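A minimal SciPy sketch of the two methods side by side; the matrix and the bounds eig_LB and eig_UB are made up for illustration:

    # Sketch: both ways of getting the most negative eigenvalues with eigsh.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    n = 1000
    M = sp.random(n, n, density=0.01, format="csr", random_state=0)
    M = (M + M.T) / 2                    # symmetrize: a real symmetric stand-in

    eig_LB, eig_UB = -50.0, 50.0         # hypothetical known spectral bounds

    # Method 1: shift-invert about the lower bound. In shift-invert mode,
    # which='LM' returns the eigenvalues of M closest to sigma, i.e. the most
    # negative ones here. eigsh factorizes (M - sigma*I) once with a sparse
    # LU and back-substitutes each iteration; no dense inverse is formed.
    vals_si = eigsh(M, k=5, sigma=eig_LB, which='LM', return_eigenvectors=False)

    # Method 2: shift the diagonal by the upper bound so the smallest
    # eigenvalues become the largest in magnitude, then run plain Lanczos.
    shifted = M - eig_UB * sp.identity(n, format="csr")
    vals_lm = eigsh(shifted, k=5, which='LM', return_eigenvectors=False) + eig_UB

    print(np.sort(vals_si))
    print(np.sort(vals_lm))              # the two methods should agree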

How to save memory when solving a symmetric (or upper triangular) matrix?

I need to solve a system of linear algebraic equations A.X = B.
The matrix A is double precision, about 33000x33000 in size, and I get an error when I try to allocate it:
Cannot allocate array - overflow on array size calculation.
Since I am using LAPACK dposv with the Intel MKL library, I was wondering if there is a way to pass a smaller matrix to the library function, since only half of the matrix is needed for the solve.
The dposv function only needs an upper or lower triangular matrix for A. Here are more details about dposv.
Update: Note that the A matrix is N x N, yet dposv takes lda: INTEGER, the leading dimension of a, with lda ≥ max(1, n). So maybe there is a way to pass A as a 1D array?
As the error says (Cannot allocate array - overflow on array size calculation), your problem seems to lie somewhere else: namely in the limit of the integer type used to compute the array size internally. I am afraid you might not be able to fix that even by adding more memory. You will need to check how the library you are using manages memory (possibly MKL, but I don't use MKL so I cannot help there) or choose another one.
Explanation: some functions use a 4-byte integer to compute the allocation size. That gives a limit of 2^32 bytes, or 4 GB, which is well below your array: 33000 x 33000 x 8 bytes is about 8.7 GB. That assumes an unsigned integer; with a signed integer the limit is 2 GB.
Hints if you have limited memory:
If you do not have enough memory (about 4 GB for the triangular half of the matrix alone) and you do not know any structure of the matrix, then forget about special solvers and solve the problem yourself. Solving a system with an upper triangular matrix is a backward substitution: starting from the last row of the solution, you need only one row of the matrix to compute each component.
Find a way to load your matrix row by row, starting with the last row; a sketch follows below.
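A minimal sketch of that streaming backward substitution; load_row is a hypothetical loader that would read one row of U from disk:

    # Solve U x = b for upper triangular U, touching one row of U at a time,
    # so the full n x n matrix never has to be resident in memory.
    import numpy as np

    def solve_upper_streaming(load_row, b):
        n = len(b)
        x = np.array(b, dtype=np.float64)
        for i in range(n - 1, -1, -1):    # last row first
            row = load_row(i)             # only row i of U is in memory
            x[i] = (x[i] - row[i + 1:] @ x[i + 1:]) / row[i]
        return x

    # Toy usage: rows served from an in-memory matrix stand in for disk reads.
    U = np.triu(np.random.rand(5, 5) + 5 * np.eye(5))
    b = np.random.rand(5)
    x = solve_upper_streaming(lambda i: U[i], b)
    print(np.allclose(U @ x, b))          # True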
Thanks to mecej4
There are several options for passing a huge matrix while using less memory:
Using functions that support packed matrix storage schemes, e.g. ?ppsv, which takes the triangular half of A in packed form (the layout is sketched after this list).
Using PARDISO.
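A NumPy sketch of the packed layout those routines expect; the indexing follows LAPACK's column-major upper-packed convention, and the 33000 figure is from the question:

    # Pack the upper triangle of a symmetric matrix into a 1D array of
    # length n*(n+1)/2: A[i, j] -> ap[i + j*(j+1)/2] for i <= j.
    import numpy as np

    n = 4
    A = np.arange(1.0, n * n + 1).reshape(n, n)
    A = np.triu(A) + np.triu(A, 1).T      # small symmetric example

    ap = np.empty(n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            ap[i + j * (j + 1) // 2] = A[i, j]

    # ap holds the upper triangle in about half the memory of the full
    # matrix; for n = 33000 that is ~4.4 GB instead of ~8.7 GB in doubles.
    print(ap)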

Matrix multiplication optimization: Loop tiling

I'm trying to optimize the multiplication of two 1024x1024 matrices by tiling the loops. I found that block sizes of 128 and 64 gave me by far the best results, but I only obtained those numbers by guess-and-check, and when I tried the same block sizes on 2000x2000 matrices the results were far from ideal. Could anyone point me toward a way to reason about which tile sizes are best when optimizing matrix multiplication?
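No answer is recorded here, but the usual starting point is to size the tiles so the working set fits in cache: three T x T tiles (one each of A, B, C) need 3*T^2*8 bytes in double precision, so a 256 KB L2 suggests T around 100, consistent with the 64 and 128 found by trial. A rough sketch of the blocking structure follows; the cache figure is an assumption, and NumPy is used only to show the loop structure (in C the same loop order drives cache behaviour directly):

    # Blocked (tiled) matrix multiply: each partial product touches only
    # three T x T tiles, so choosing 3*T^2*8 bytes <= cache size keeps the
    # working set cache-resident.
    import numpy as np

    def tiled_matmul(A, B, T=128):
        n, k = A.shape
        _, m = B.shape
        C = np.zeros((n, m))
        for i in range(0, n, T):
            for j in range(0, m, T):
                for p in range(0, k, T):
                    C[i:i+T, j:j+T] += A[i:i+T, p:p+T] @ B[p:p+T, j:j+T]
        return C

    A = np.random.rand(1024, 1024)
    B = np.random.rand(1024, 1024)
    # NumPy slicing clamps at the edges, so sizes like 2000 that are not
    # multiples of T are handled without special-case code.
    print(np.allclose(tiled_matmul(A, B), A @ B))   # True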

Sparse matrix-matrix multiplication

I'm currently working with sparse matrices, and I have to compare the computation time of sparse matrix-matrix multiplication against full matrix-matrix multiplication. The issue is that my sparse computation is far slower than the full one.
I'm compressing my matrices with Compressed Row Storage, and multiplying two matrices is very time-consuming (a quadruple for loop), so I'm wondering if there is a compression format better suited to matrix-matrix operations (CRS is very handy for matrix-vector computation).
Thanks in advance!
It's usually referred to as Compressed Sparse Row (CSR), not CRS. Its transpose, Compressed Sparse Column (CSC), is also commonly used, including by the CSparse package, which ends up being the backend of quite a few systems, including MATLAB and SciPy (I think).
There is also a less common Doubly Compressed Sparse Column (DCSC) format, used by the Combinatorial BLAS. It compresses the column index again and is useful when the matrix is hypersparse, i.e. most columns are empty, something that happens with 2D matrix decompositions.
That said, yes, there is more overhead than with a dense layout. However, your operations are now dominated by the number of non-zeros rather than by the dimensions, so your FLOP rate may be lower but you still get the answer sooner; a well-implemented CSR product is nothing like a quadruple loop over the dimensions (see the sketch below).
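A quick scipy.sparse sketch of that point; the sizes and densities are made up:

    # With a library CSR implementation (SciPy wraps one written in C++),
    # the sparse-sparse product cost scales with the non-zeros, not n^3.
    import time
    import numpy as np
    import scipy.sparse as sp

    n = 4000
    A = sp.random(n, n, density=1e-3, format="csr", random_state=0)
    B = sp.random(n, n, density=1e-3, format="csr", random_state=1)

    t0 = time.perf_counter(); C = A @ B; t1 = time.perf_counter()

    Ad, Bd = A.toarray(), B.toarray()
    t2 = time.perf_counter(); Cd = Ad @ Bd; t3 = time.perf_counter()

    print(f"sparse product: {t1 - t0:.4f} s, nnz(C) = {C.nnz}")
    print(f"dense  product: {t3 - t2:.4f} s")
    print("results match:", np.allclose(C.toarray(), Cd))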
For a discussion of how to achieve high performance with sparse matrix-matrix products, you might look at the paper "Efficient Sparse Matrix-Matrix Products Using Colorings": http://www.mcs.anl.gov/papers/P5007-0813_1.pdf
