Sparse matrices, sparse accumulator and multiplication - sparse-matrix

I'm trying to implement the algorithm for multiplying two sparse matrices from this paper: https://crd.lbl.gov/assets/pubs_presos/spgemmicpp08.pdf (the first algorithm - 1D algorithm).
What bothers me is that I'm not sure what SPA (sparse accumulator) really is. I've done some research and what I've concluded is that SPA represents a 𝐬𝐢𝐧𝐠𝐥𝐞 row/column of a sparse matrix (I'm mostly not sure about that part) and that it consists of a dense vector with nonzero values, a list of indices of nonzero elements (why list?) and a bool dense vector consisting of "occupied" flags (𝑇𝑟𝑢𝑒 on 𝑖-th index if an element in the active row/column on that position is not zero). Some also keep the number of nonzero inputs.
Am I correct? If so, I have some questions. If this structure has a dense boolean vector and we must keep the values, isn't it easier to simply fill one dense vector and ignore that it's sparse? I'm sure that there are reasons why this is more efficient (memory and time), but I don't see why.
Also, as I've already asked, why is everything a vector except the list of indices? Why isn't that also a vector?
Thanks in advance!

Many sparse matrix algorithms use a dense working vector to allow random access to the currently "active" column or row of a matrix.
The sparse MATLAB implementation formalizes this idea by defining an
abstract data type called the sparse accumulator, or SPA. The SPA consists of a dense vector of real (or complex) values, a dense vector of true/false "occupied" flags, and an unordered list of the indices whose occupied flags are true.
The SPA represents a column vector whose "unoccupied" positions are zero and
whose "occupied" positions have values (zero or nonzero) specified by the dense real or complex vector. It allows random access to a single element in constant time, as well as sequencing through the occupied positions in constant time per element.
Check section 3.1.3 at https://epubs.siam.org/doi/pdf/10.1137/0613024

Related

Mathematical representation of a multidimensional array

This might me a ridiculous question.
I created mathematical model using Python and I know that I started this from the end, but I need write mathematical equations for the documentation.
The equation has multidimensional array in it.
So my question is how to present multidimensional array in mathematical way?
If the number of dimensions in your array is one, you can represent it as a vector, or perhaps a tuple. But this almost certainly is not what you mean by "multidimensional."
If the number of dimensions is two, you can use a matrix.
If the number of dimensions is greater than two, you can use a tensor. Here is a Wikipedia link explaining a little how tensors and multidimensional arrays are related. A search will give you many more such pages. Tensors include vectors and matrices, so this is the most general solution, though vectors and matrices are much more well known.

How to efficiently store large sparse array

I am tracking particles into a 3D lattice. Each lattice element is labeled with an index corresponding to an unrolled 3D array
S = x + WIDTH * (y + DEPTH * z)
I am interested in the transition form cell S1 to cell S2. The resulting transition matrix M(S1,S2) is sparsely populated, because particles can reach only near by cells. Unfortunately using the indexing of an unrolled 3D array cells that are geometrically near might have big difference in their indexes. For instance, cells that are siting on top of each other (say at z and z+1) will have their indexes shifted by WIDTH*DEPTH. Therefore if I try accumulating the resulting 2D matrix M(S1,S2) , S1 and S2 will be very different, even dough the cells are adjacent. This is a significant problem, because I can't use the usual sparse matrix storage.
At the beginning I tried storing the matrix in coordinate format:
I , J VALUE
Unfortunately I need to loop the entire index set to find the proper S1,S2 and store the accumulated M(S1,S2).
Unusually sparse matrices have some underlying structure and therefore the indexing is quite straightforward. In this case however, I have some troubles figuring out how to index my cells.
I would appreciate your help
Thank you in advance,
There are several approaches. Which is best depends on operations that need to be performed on the matrix.
A good general purpose one is to use a hash table where the key is the index tuple, in your case (i,j).
If neighboring (in the Euclidean sense) matrix elements must be discoverable, then an alternate strategy is a balanced tree with a Morton Order key. The Morton order value of a key (i,j) is just the integers i and j with their bits interleaved. You should quickly see that index tuples close to each other in the index 2-space are also close in linear Morton order.
Of course if you are building the matrix all at once, after which it's immutable, then you can build the key-value pairs in an array rather than a hash table or balanced tree, sort them (lexicographically for (i,j) pairs and linearly for Morton keys) and then do reads with simple binary search.

How to indicate specific slice of a 3D array in MATLAB using GPUs?

I have a 4x4x1250 matrix in MATLAB. I want to find a way to move through the 4x4 matrices slice by slice in order to find the condition of the 4x4 matrices individually.
I don't want to do it in a loop because I want to do this on the GPU and would like it to be indexed.
I saw "squeeze", but I don't think it works for 3D arrays...
I kind of want to use arrayfun, but I don't know how to indicate the specific dimension that I'm interested in.
Any ideas?
Edit: I thought the details I gave are sufficient, nevertheless:
I have a matrix A, size 4x4x1250.
I am interested in the conditions of the 1250 4x4 matrices that make up A. So lets say B = A(:,:,1).
I want to calculate cond(B), but in reality I want 1250 of these calculations.
If I do arrayfun, I don't know how to specify the specific dimension of A along which to slice.
ARRAYFUN disregards the shape of the input, and operates in a purely element-wise fashion. There's also PAGEFUN on the GPU which operates on pages of an array - however, PAGEFUN only really offers an advantage if you're using one of the functions explicitly supported - otherwise it operates in an element-wise fashion.

Sparse matrix conversion in C

I'm trying to develop a program in C to convert a sparse matrix file into a dense matrix. From what I've read, the best approach would be the use of linked lists but I have no experience with them and haven't found a good online resource explaining the subject. I'm not looking for a quick solution but rather a website or text source that can explain how the process works so I can apply it to this project. What resources I have seen, suggest using three arrays to handle the values in the matrix (The row, column, and individual value) and two arrays for the vector (one for the row, the other for the column). Thanks!
The file format you've specified is for a dense matrix. A 10x10 matrix with 100 elements is dense. A sparse matrix has fewer than n*m elements and all "missing" elements are assumed to be 0. The point of doing it this way is so that matrices that are almost all zero (which happens in a lot of applications) will use less space. But using a sparse matrix format to store a dense matrix will use far more space than just a plain array.
One common sparse matrix file format is called MatrixMarket and it looks very similar to what you described. The first line has three values, # of rows, # of columns, # of nonzero elements (called nnz). Then you have nnz lines of the actual elements in a triplet: (row #) (column #) (value)
If your sparse matrix is in a similar format then you don't need any sparse matrix in memory. Just scan the values and fill in your dense array directly.
If you do want to have a sparse matrix in memory then there are several options for how to store it. Triplets is the easiest, and it's just an in-memory version of the MatrixMarket file. 3 arrays, or 1 array of structs.
The most common structure for linear algebra operations is Compressed Sparse Columns (CSC) or Compressed Sparse Rows (CSR). I'll let you look that up, but if you want a C implementation to play with you should look at Tim Davis' CSparse. This is also how MatLAB stores sparse matrices, Tim was one of the people who wrote that part of MatLAB.
It sounds like a linked list may not be what you're looking for, but this site offers a pretty comprehensive tutorial on the subject. It may help shed some light on whether or not it would be appropriate for your problem... Good luck!

How to implement a huge matrix in C

I'm writing a program for a numerical simulation in C. Part of the simulation are spatially fixed nodes that have some float value to each other node. It is like a directed graph. However, if two nodes are too far away, (farther than some cut-off length a) this value is 0.
To represent all these "correlations" or float values, I tried to use a 2D array, but since I have 100.000 and more nodes, that would correspond to 40GB memory or so.
Now, I am trying to think of different solustions for that problem. I don't want to save all these values on the harddisk. I also don't want to calculate them on the fly. One idea was some sort of sparse matrix, like the one one can use in Matlab.
Do you have any other ideas, how to store these values?
I am new to C, so please don't expect too much experience.
Thanks and best regards,
Jan Oliver
How many nodes, on average, are within the cutoff distance for a given node determines your memory requirement and tells you whether you need to page to disk. The solution taking the least memory is probably a hash table that maps a pair of nodes to a distance. Since the distance is the same each way, you only need to enter it into the hash table once for the pair -- put the two node numbers in numerical order and then combine them to form a hash key. You could use the Posix hsearch/hcreate/hdestroy functions for the hash table, although they are less than ideal.
A sparse matrix approach sounds ideal for this. The Wikipedia article on sparse matrices discusses several approaches to implementation.
A sparse adjacency matrix is one idea, or you could use an adjacency list, allowing your to only store the edges which are closer than your cutoff value.
You could also hold a list for each node, which contains the other nodes this node is related to. You would then have an overall number of list entries of 2*k, where k is the number of non-zero values in the virtual matrix.
Implementing the whole system as a combination of hashes/sets/maps is still expected to be acceptable with regard to speed/performance compared to a "real" matrix allowing random access.
edit: This solution is one possible form of an implementation of a sparse matrix. (See also Jim Balter's note below. Thank you, Jim.)
You should indeed use sparse matrices if possible. In scipy, we have support for sparse matrices, so that you can play in python, although to be honest sparse support still has rough edges.
If you have access to matlab, it will definitely be better ATM.
Without using sparse matrix, you could think about using memap-based arrays so that you don't need 40 Gb of RAM, but it will still be slow, and only really make sense if you have a low degree of sparsity (say if 10-20 % of your 100000x100000 matrix has items in it, then full arrays will actually be faster and maybe even take less space than sparse matrices).

Resources