Creating and using a sparse matrix in Accord.NET

I would appreciate any hint on how to create a sparse matrix in Accord.NET or in C#, without writing a library from scratch. The problem is that I want to make a 30k-by-30k matrix representing the adjacency matrix of a graph, which will almost always be sparse. Beyond roughly 15,000 rows or columns, the following code generates an error:
var A = new double[n, n];
error: Array dimensions exceeded supported range.
P.S. I am aware of the Sparse class, but as its description makes clear, that class can only be used to make sparse vectors, not matrices.
If it is possible to make a sparse matrix, then the next immediate question is can it be treated like a normal matrix in Linear Algebraic operations, e.g. in subtracting two matrices, or finding a sub-matrix using:
Accord.Math.Matrix.Get(A, IVI, BVI, B);

30k x 30k isn't very much. Try unchecking "Prefer 32 bit" under "Build" in your project properties; then you can also try adding this code to your method:
Process.GetCurrentProcess().MaxWorkingSet = new IntPtr(262144000);
Process.GetCurrentProcess().MinWorkingSet = new IntPtr(209715200);
It may help if the problem is getting enough memory to allocate the array.
Anyway, you can use MathNet for this, downloadable via NuGet. It has exactly the sparse matrix you want, in MathNet.Numerics.LinearAlgebra.Double:
MathNet.Numerics.LinearAlgebra.Double.SparseMatrix M = new MathNet.Numerics.LinearAlgebra.Double.SparseMatrix(30000);
Math.Net looks slow to me, though; Accord is better in some operations, and I don't know how to make a sparse matrix in Accord, which I would rather use than MathNet.
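To the follow-up question about linear-algebra operations: MathNet's SparseMatrix supports the usual operators, and SubMatrix is roughly the counterpart of Accord's Matrix.Get. A minimal sketch, assuming MathNet.Numerics from NuGet (the dimensions and indices are placeholders):

using MathNet.Numerics.LinearAlgebra.Double;

// Two sparse 30000 x 30000 matrices; only assigned entries consume memory.
var A = new SparseMatrix(30000);
var B = new SparseMatrix(30000);
A[0, 1] = 1.0;   // mark an edge in the adjacency matrix
B[0, 1] = 0.5;

var D = A - B;                          // element-wise subtraction
var S = A.SubMatrix(0, 100, 0, 100);    // 100 x 100 leading block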

Related

Is there a way to perform 2D convolutions with strides using the Accelerate library in Swift?

I am trying to perform a specific downsampling process. It is described by the following pseudocode.
//Let V be an input image with dimension of M by N (row by column)
//Let U be the destination image of size floor((M+1)/2) by floor((N+1)/2)
//The floor function is to emphasize the rounding for the even dimensions
//U and V are part of a wrapper class of Pixel_FFFF vImageBuffer
for i in 0 ..< U.size.rows {
    for j in 0 ..< U.size.columns {
        U[i, j] = V[(i * 2), (j * 2)]
    }
}
The process simply takes the pixel value at every other location along both dimensions. The resulting image is approximately half the width and height of the original.
Called once, the process is relatively fast on its own. However, it becomes a bottleneck when it is called numerous times inside a bigger algorithm, so I am trying to optimize it. Since I use Accelerate in my app, I would like to adapt this process in a similar spirit.
Attempts
First, this process can be easily done by a 2D convolution using the 1x1 kernel [1] with a stride [2,2]. Hence, I considered the function vImageConvolve_ARGBFFFF. However, I couldn't find a way to specify the stride. This function would be the best solution, since it takes care of the image Pixel_FFFF structure.
Second, I noticed that this is merely transferring data from one array to another, so I thought the vDSP_vgathr function would be a good solution. However, I hit a wall: vectorizing a vImageBuffer yields the interleaved layout A,R,G,B,A,R,G,B,..., where each term is 4 bytes. vDSP_vgathr transfers 4 bytes at a time to the destination array using a specified indexing vector. I could use a linear indexing formula to build such a vector, but considering both even and odd dimensions, generating the indexing vector would be as inefficient as the original solution: it would require loops.
Also, neither of the vDSP 2D convolution functions fit the solution.
Are there any other functions in Accelerate that I might have overlooked? I saw that there's a stride option in the vDSP 1D convolution functions. Does someone know an efficient way to translate a strided 2D convolution into 1D convolutions?
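One way to illustrate the strided-transfer idea: since each kept pixel is 4 consecutive floats followed by 4 skipped floats, each channel of each output row can be moved with one strided BLAS copy. A minimal sketch, assuming raw Float pointers and per-row strides (in Float elements) pulled out of the vImage buffers; the function name and parameters are hypothetical:

import Accelerate

func downsampleByTwo(src: UnsafePointer<Float>, srcRowFloats: Int,
                     dst: UnsafeMutablePointer<Float>, dstRowFloats: Int,
                     dstRows: Int, dstCols: Int) {
    for i in 0 ..< dstRows {
        let srcRow = src + (2 * i) * srcRowFloats   // every other source row
        let dstRow = dst + i * dstRowFloats
        for c in 0 ..< 4 {                          // A, R, G, B channels
            cblas_scopy(Int32(dstCols),
                        srcRow + c, 8,              // read every other pixel
                        dstRow + c, 4)              // write consecutive pixels
        }
    }
}

Whether this beats the plain nested loop depends on the image size; cblas_scopy still walks memory with a stride, so it mainly removes per-element Swift overhead.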

How to indicate specific slice of a 3D array in MATLAB using GPUs?

I have a 4x4x1250 array in MATLAB. I want to find a way to move through it slice by slice in order to find the condition number of each of the 4x4 matrices individually.
I don't want to do it in a loop because I want to do this on the GPU and would like it to be indexed.
I saw "squeeze", but I don't think it works for 3D arrays...
I kind of want to use arrayfun, but I don't know how to indicate the specific dimension that I'm interested in.
Any ideas?
Edit: I thought the details I gave were sufficient; nevertheless:
I have an array A of size 4x4x1250.
I am interested in the condition numbers of the 1250 4x4 matrices that make up A. So let's say B = A(:,:,1).
I want to calculate cond(B), but in reality I want 1250 of these calculations.
If I use arrayfun, I don't know how to specify the dimension of A along which to slice.
ARRAYFUN disregards the shape of the input and operates in a purely element-wise fashion. There's also PAGEFUN on the GPU, which operates on pages of an array; however, PAGEFUN only really offers an advantage if you're using one of the functions it explicitly supports. Otherwise it too operates in an element-wise fashion.
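One possible workaround, sketched below: on releases that have pagesvd (R2021b and later), the per-slice condition numbers follow from the singular values without an explicit loop, since cond is the ratio of the largest to the smallest singular value. Whether pagesvd accepts gpuArray inputs depends on the release, so treat that part as an assumption to verify:

A = rand(4, 4, 1250);                  % example data; try gpuArray(A) if supported
s = pagesvd(A);                        % 4x1x1250, descending order per page
c = squeeze(s(1,1,:) ./ s(end,1,:));   % cond = sigma_max / sigma_min, 1250x1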

Sparse matrix conversion in C

I'm trying to develop a program in C to convert a sparse matrix file into a dense matrix. From what I've read, the best approach would be to use linked lists, but I have no experience with them and haven't found a good online resource explaining the subject. I'm not looking for a quick solution but rather a website or text that explains how the process works so I can apply it to this project. The resources I have seen suggest using three arrays to handle the values in the matrix (the row, column, and individual value) and two arrays for the vector (one for the row, the other for the column). Thanks!
The file format you've specified is for a dense matrix. A 10x10 matrix with 100 elements is dense. A sparse matrix has fewer than n*m elements and all "missing" elements are assumed to be 0. The point of doing it this way is so that matrices that are almost all zero (which happens in a lot of applications) will use less space. But using a sparse matrix format to store a dense matrix will use far more space than just a plain array.
One common sparse matrix file format is called MatrixMarket, and it looks very similar to what you described. The first line has three values: # of rows, # of columns, and # of nonzero elements (called nnz). Then you have nnz lines of the actual elements as triplets: (row #) (column #) (value)
If your sparse matrix is in a similar format then you don't need any sparse matrix in memory. Just scan the values and fill in your dense array directly.
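A minimal sketch of that direct approach, assuming the triplet layout just described (a header of rows/cols/nnz, then one 1-based "row col value" entry per line) arrives on standard input:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int rows, cols, nnz;
    if (scanf("%d %d %d", &rows, &cols, &nnz) != 3) return 1;

    /* calloc zero-fills, which handles all the "missing" elements. */
    double *dense = calloc((size_t)rows * cols, sizeof *dense);
    if (!dense) return 1;

    for (int k = 0; k < nnz; k++) {
        int i, j;
        double v;
        if (scanf("%d %d %lf", &i, &j, &v) != 3) return 1;
        dense[(size_t)(i - 1) * cols + (j - 1)] = v;  /* 1-based -> 0-based */
    }

    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            printf("%g%c", dense[(size_t)i * cols + j], j + 1 == cols ? '\n' : ' ');

    free(dense);
    return 0;
}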
If you do want to have a sparse matrix in memory then there are several options for how to store it. Triplets is the easiest, and it's just an in-memory version of the MatrixMarket file. 3 arrays, or 1 array of structs.
The most common structures for linear algebra operations are Compressed Sparse Column (CSC) and Compressed Sparse Row (CSR). I'll let you look those up, but if you want a C implementation to play with, you should look at Tim Davis' CSparse. This is also how MATLAB stores sparse matrices; Tim was one of the people who wrote that part of MATLAB.
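For reference, a sketch of what a CSR layout looks like in C; the field names are illustrative, not CSparse's:

/* The column indices and values of row i live in positions
 * row_ptr[i] .. row_ptr[i+1]-1 of col_idx and val, so row_ptr
 * has rows + 1 entries and row_ptr[rows] == nnz. */
typedef struct {
    int rows, cols, nnz;
    int *row_ptr;   /* length rows + 1 */
    int *col_idx;   /* length nnz */
    double *val;    /* length nnz */
} csr_matrix;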
It sounds like a linked list may not be what you're looking for, but this site offers a pretty comprehensive tutorial on the subject. It may help shed some light on whether or not it would be appropriate for your problem... Good luck!

How to implement a huge matrix in C

I'm writing a program for a numerical simulation in C. Part of the simulation consists of spatially fixed nodes, each of which has some float value with respect to every other node. It is like a directed graph. However, if two nodes are too far apart (farther than some cut-off length a), this value is 0.
To represent all these "correlations" or float values, I tried to use a 2D array, but since I have 100,000 or more nodes, that would correspond to 40 GB of memory or so.
Now I am trying to think of different solutions to this problem. I don't want to save all these values to the hard disk, and I don't want to calculate them on the fly. One idea was some sort of sparse matrix, like the one available in MATLAB.
Do you have any other ideas, how to store these values?
I am new to C, so please don't expect too much experience.
Thanks and best regards,
Jan Oliver
How many nodes, on average, are within the cutoff distance of a given node determines your memory requirement and tells you whether you need to page to disk. The solution taking the least memory is probably a hash table that maps a pair of nodes to a distance. Since the distance is the same each way, you only need to enter it into the hash table once per pair: put the two node numbers in numerical order and then combine them to form a hash key. You could use the POSIX hsearch/hcreate/hdestroy functions for the hash table, although they are less than ideal.
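A sketch of that key construction (a hypothetical helper; it assumes node indices fit in 32 bits):

#include <stdint.h>

/* Order the pair so (a, b) and (b, a) produce the same 64-bit key. */
static uint64_t pair_key(uint32_t a, uint32_t b) {
    if (a > b) { uint32_t t = a; a = b; b = t; }
    return ((uint64_t)a << 32) | b;
}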
A sparse matrix approach sounds ideal for this. The Wikipedia article on sparse matrices discusses several approaches to implementation.
A sparse adjacency matrix is one idea, or you could use an adjacency list, allowing you to store only the edges that are closer than your cutoff value.
You could also keep a list for each node containing the other nodes it is related to. You would then have 2*k list entries overall, where k is the number of non-zero values in the virtual matrix.
Implementing the whole system as a combination of hashes/sets/maps is still expected to be acceptable in speed compared to a "real" matrix allowing random access.
Edit: This solution is one possible implementation of a sparse matrix. (See also Jim Balter's note below. Thank you, Jim.)
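A sketch of that per-node list layout in C (type and field names are illustrative):

/* Each edge appears in both endpoints' lists, which is where the
 * 2*k total entries mentioned above come from. */
typedef struct edge {
    int neighbor;         /* index of the other node */
    float value;          /* the stored correlation */
    struct edge *next;
} edge;

typedef struct {
    int n_nodes;
    edge **heads;         /* heads[i]: head of node i's list */
} graph;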
You should indeed use sparse matrices if possible. In scipy we have support for sparse matrices, so you can play in Python, although to be honest the sparse support still has rough edges.
If you have access to MATLAB, it will definitely be better at the moment.
Without using a sparse matrix, you could think about using memmap-based arrays so that you don't need 40 GB of RAM. It will still be slow, though, and only really makes sense if you have a low degree of sparsity (say, if 10-20% of your 100000x100000 matrix has items in it, then full arrays will actually be faster and maybe even take less space than sparse matrices).
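A minimal scipy sketch of this suggestion (the dimensions and the single entry are placeholders):

import numpy as np
from scipy import sparse

n = 100000
A = sparse.lil_matrix((n, n), dtype=np.float32)  # cheap incremental construction
A[0, 42] = 1.5      # set only the pairs within the cutoff
A = A.tocsr()       # CSR: compact storage, fast arithmetic and row slicing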

Matrices and databases

I went through the topic and found this link quite useful and simple at the same time:
Storing matrices in a relational database
But can you please let me know whether the way mentioned there, storing the matrix
A B C D
E F G H
I J K L
as the flat list
[A B C D E F G H I J K L]
is the best, simplest, or even a reliable way of storing the matrix elements in the database? Moreover, I need to multiply two matrices and make the operation dynamic. So will storing the data this way create any problems for the task?
In PostgreSQL you can actually have multidimensional arrays, define your own types, and define your own functions on those types. For instance, one could simply do:
CREATE TABLE tictactoe (
    squares integer[3][3]
);
See the PostgreSQL manual for info on how to create your own types.
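A quick sketch of using such a column, with PostgreSQL's array-literal syntax and the table above (the values are placeholders):

INSERT INTO tictactoe (squares)
VALUES ('{{1,0,0},{0,1,0},{0,0,1}}');

SELECT squares[2][2] FROM tictactoe;  -- array subscripts are 1-based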
I think it pretty much depends on how you want to use the matrices in your application.
Is the DB only for persistence for the same application, speed is important, and sizes cannot be known in advance? Make your own serialization scheme, and save the binary blob.
Is the DB for sharing in between applications, with the size not known in advance? Use the comma delimited list.
Are you concerned with data integrity, type safety, and would like to query individual cells? Then use the (row, col, cell value) schema.
Do you know that your matrices are of fixed size and relatively small, for example 4x4 transformation matrices, and will have a 1-to-1 relationship to whatever element you have in the DB? Then you could actually have 16 columns in your table, laid out in line.
Think about your use cases, and experiment!
I'll start by saying both approaches are valid, but the second one is not sufficient as written by you. You have to have some other information, like the length of the rows or the (row, col) indexes of each element, to store a matrix as a 1D array. This is commonly done for sparse matrices, where there are lots of zeros surrounding values clustered on either side of the diagonal.
Persisting the matrix in a database and operating on it in memory are two separate things.
Tasks like multiplication require (row, col) indexes. Storing the matrix as a 2D array means you'll have them, so no other info is needed. The 1D array needs this info too, so you'll have to supply it.
The advantage swings to the 1D array for sparse matrices: you don't have to store the zero values outside the bandwidth, but operations like addition and multiplication become more complex to code.
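For the multiplication requirement, the (row, col, cell value) schema mentioned above has a nice property: the product can be expressed directly in SQL, because absent rows act as implicit zeros. A sketch, with hypothetical tables a and b each holding one matrix as (i, j, val) triplets:

SELECT a.i, b.j, SUM(a.val * b.val) AS val
FROM   a
JOIN   b ON a.j = b.i
GROUP  BY a.i, b.j;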
