Matrices and databases - database

I went through the topic and found out this link quite useful and simple at the same time.
Storing matrices in a relational database
But can you please let me know if the way mentioned as
A B C D
E F G H
I J K L
[A B C D E F G H I J K L]
is the best and simple or even reliable way of storing the matrix elements in the database. Moreover I need to multiply two matrices and make the operation dynamic. So will the storage of data this create any problems for the task?

In postgresql you can actually have multidimensional arrays, define your own types and define your own functions on those types. For instance one could simply do:
CREATE TABLE tictactoe (
squares integer[3][3]
);
See The PostgreSQL manual for info on how to create your own types.

I think it pretty much depends on how you want to use the matrices in your application.
Is the DB only for persistence for the same application, speed is important, and sizes cannot be known in advance? Make your own serialization scheme, and save the binary blob.
Is the DB for sharing in between applications, with the size not known in advance? Use the comma delimited list.
Are you concerned with data integrity, type safety, and would like to query individual cells? Then use the (row, col, cell value) schema.
Do you know that your matrices are of fixed size and relatively small, for example 4X4 transformation matrices, and will have a 1 to 1 relationship to whatever element you have in the DB? Then you could actually have 16 rows in your table, layed out in line.
Think about your use cases, and experiment!

is the best and simple or even reliable way of storing the matrix elements in the database. Moreover I need to multiply two matrices and make the operation dynamic. So will the storage of data this create any problems for the task?
I'll start by saying both approaches are valid, but the second one is not sufficient as written by you. You have to have some other information, like the length of the rows or the (row, col) indexes of each element to store a matrix as a 1D array. This is commonly done for sparse matricies, where there are lots of zeros surrounding values clustered on either side of the diagonal.
Persisting the matrix in a database and operating on it in memory are two separate things.
Tasks like multiplying require (row, col) indexes. Storing the matrix as a 2D array means that you'll have them, so no other info is needed. The 1D array needs this info too, so you'll have to supply it.
The advantage swings to the 1D array for sparse matricies. You don't have to store zero values outside the bandwidth in that case, but your operations like addition and multiplication become more complex to code.

Related

Large matrices with a certain structure: how can one define where memory allocation is not needed?

Is there a way to create a 3D array for which only certain elements are defined, while the rest does not take up memory?
Context: I am running Monte-Carlo simulations in which I want to solve 10^5 matrices. All of these matrices have a majority of elements that are zero, for which I wouldn't need to use 8 bytes of memory per element. These elements are the same for all matrices. For simplicity, I have combined all of these matrices into a 3D array, but if my matrices start to become too large, I encounter memory issues (since at matrix dimensions of 100*100*100000, the array already takes up 8 GB of memory).
One workaround would be to store every matrix element with its 10^6 iterations in a vector, that way, no additional information needs to be stored. The inconvenience is that then I would need to work with more than 50 different vectors, and I prefer working with arrays.
Is there any way to tell R that some matrix elements don't need information?
I have been thinking that defining a new class could help for this, but since I have just discovered classes, I am not sure what all the options are. Do you think this could be a good approach? Are there specific things I should keep in mind?
I also know that there are packages made to deal with memory problems, but that did not seem like the quickest solution in terms of human and computation effort for this specific problem.

How to indicate specific slice of a 3D array in MATLAB using GPUs?

I have a 4x4x1250 matrix in MATLAB. I want to find a way to move through the 4x4 matrices slice by slice in order to find the condition of the 4x4 matrices individually.
I don't want to do it in a loop because I want to do this on the GPU and would like it to be indexed.
I saw "squeeze", but I don't think it works for 3D arrays...
I kind of want to use arrayfun, but I don't know how to indicate the specific dimension that I'm interested in.
Any ideas?
Edit: I thought the details I gave are sufficient, nevertheless:
I have a matrix A, size 4x4x1250.
I am interested in the conditions of the 1250 4x4 matrices that make up A. So lets say B = A(:,:,1).
I want to calculate cond(B), but in reality I want 1250 of these calculations.
If I do arrayfun, I don't know how to specify the specific dimension of A along which to slice.
ARRAYFUN disregards the shape of the input, and operates in a purely element-wise fashion. There's also PAGEFUN on the GPU which operates on pages of an array - however, PAGEFUN only really offers an advantage if you're using one of the functions explicitly supported - otherwise it operates in an element-wise fashion.

Array ordering in Julia

Is there a way to work with C-ordered or non-contiguous arrays natively in Julia?
For example, when using NumPy, C-ordered arrays are the default, but I can initialize a Fortran ordered array and do computations with that as well.
One easy way to do this was to take the Transpose of a matrix.
I can also work with non-contiguous arrays that are made via slicing.
I have looked through the documentation, etc. and can't find a way to make, declare, or work with a C-ordered array in Julia.
The transpose appears to return a copy.
Does Julia allow a user to work with C-ordered and non-contiguous arrays?
Is there currently any way to get a transpose or a slice without taking a copy?
Edit: I have found how to do slicing.
Currently it is available as a different type called a SubArray.
As an example, I could do the following to get the first row of a 100x100 array A
sub(A, 1, 1:100)
It looks like there are plans to improve this, as can be seen in https://github.com/JuliaLang/julia/issues/5513
This still leaves open the question of C-ordered arrays.
Is there an interface for C-ordered arrays?
Is there a way to do a transpose via a view instead of a copy?
Naturally, there's nothing that prevents you from working with row-major arrays as a chunk of memory, and certain packages (like Images.jl) support arbitrary ordering of arbitrary-dimensional arrays.
Presumably the main issue you're wondering about is linear algebra. Currently I don't know of anything out-of-the-box, but note that matrix multiplication in Julia is implemented through a series of functions with names like A_mul_B, At_mul_B, Ac_mul_Bc, etc, where t means transpose and c means conjugate. The parser replaces expressions like A'*b with Ac_mul_B(A, b) without actually taking the transpose.
Consequently, you could implement a RowMajorMatrix <: AbstractArray type yourself, and set up special multiplication rules:
A_mul_B(A::RowMajorMatrix, B::RowMajorMatrix) = At_mul_Bt(A, B)
A_mul_B(A::RowMajorMatrix, B::AbstractArray) = At_mul_B(A, B)
A_mul_B(A::AbstractArray, B::RowMajorMatrix) = A_mul_Bt(A, B)
etc. In addition to these two-argument versions, there are 3-argument versions (like A_mul_B!) that store the result in a pre-allocated output; you'd need to implement those, too. Finally, you'd also have to set up appropriate show methods (to display them appropriately), size methods, etc.
Finally, Julia's transpose function has been implemented in a cache-friendly manner, so it's quite a lot faster than the naive
for j = 1:n, i = 1:m
At[j,i] = A[i,j]
end
Consequently there are occasions where it's not worth worrying about creating custom implementations of algorithms, and you can just call transpose.
If you implement something like this, I'd encourage you to contribute it as a package, as it's likely that others may be interested.

Sparse matrix conversion in C

I'm trying to develop a program in C to convert a sparse matrix file into a dense matrix. From what I've read, the best approach would be the use of linked lists but I have no experience with them and haven't found a good online resource explaining the subject. I'm not looking for a quick solution but rather a website or text source that can explain how the process works so I can apply it to this project. What resources I have seen, suggest using three arrays to handle the values in the matrix (The row, column, and individual value) and two arrays for the vector (one for the row, the other for the column). Thanks!
The file format you've specified is for a dense matrix. A 10x10 matrix with 100 elements is dense. A sparse matrix has fewer than n*m elements and all "missing" elements are assumed to be 0. The point of doing it this way is so that matrices that are almost all zero (which happens in a lot of applications) will use less space. But using a sparse matrix format to store a dense matrix will use far more space than just a plain array.
One common sparse matrix file format is called MatrixMarket and it looks very similar to what you described. The first line has three values, # of rows, # of columns, # of nonzero elements (called nnz). Then you have nnz lines of the actual elements in a triplet: (row #) (column #) (value)
If your sparse matrix is in a similar format then you don't need any sparse matrix in memory. Just scan the values and fill in your dense array directly.
If you do want to have a sparse matrix in memory then there are several options for how to store it. Triplets is the easiest, and it's just an in-memory version of the MatrixMarket file. 3 arrays, or 1 array of structs.
The most common structure for linear algebra operations is Compressed Sparse Columns (CSC) or Compressed Sparse Rows (CSR). I'll let you look that up, but if you want a C implementation to play with you should look at Tim Davis' CSparse. This is also how MatLAB stores sparse matrices, Tim was one of the people who wrote that part of MatLAB.
It sounds like a linked list may not be what you're looking for, but this site offers a pretty comprehensive tutorial on the subject. It may help shed some light on whether or not it would be appropriate for your problem... Good luck!

How to implement a huge matrix in C

I'm writing a program for a numerical simulation in C. Part of the simulation are spatially fixed nodes that have some float value to each other node. It is like a directed graph. However, if two nodes are too far away, (farther than some cut-off length a) this value is 0.
To represent all these "correlations" or float values, I tried to use a 2D array, but since I have 100.000 and more nodes, that would correspond to 40GB memory or so.
Now, I am trying to think of different solustions for that problem. I don't want to save all these values on the harddisk. I also don't want to calculate them on the fly. One idea was some sort of sparse matrix, like the one one can use in Matlab.
Do you have any other ideas, how to store these values?
I am new to C, so please don't expect too much experience.
Thanks and best regards,
Jan Oliver
How many nodes, on average, are within the cutoff distance for a given node determines your memory requirement and tells you whether you need to page to disk. The solution taking the least memory is probably a hash table that maps a pair of nodes to a distance. Since the distance is the same each way, you only need to enter it into the hash table once for the pair -- put the two node numbers in numerical order and then combine them to form a hash key. You could use the Posix hsearch/hcreate/hdestroy functions for the hash table, although they are less than ideal.
A sparse matrix approach sounds ideal for this. The Wikipedia article on sparse matrices discusses several approaches to implementation.
A sparse adjacency matrix is one idea, or you could use an adjacency list, allowing your to only store the edges which are closer than your cutoff value.
You could also hold a list for each node, which contains the other nodes this node is related to. You would then have an overall number of list entries of 2*k, where k is the number of non-zero values in the virtual matrix.
Implementing the whole system as a combination of hashes/sets/maps is still expected to be acceptable with regard to speed/performance compared to a "real" matrix allowing random access.
edit: This solution is one possible form of an implementation of a sparse matrix. (See also Jim Balter's note below. Thank you, Jim.)
You should indeed use sparse matrices if possible. In scipy, we have support for sparse matrices, so that you can play in python, although to be honest sparse support still has rough edges.
If you have access to matlab, it will definitely be better ATM.
Without using sparse matrix, you could think about using memap-based arrays so that you don't need 40 Gb of RAM, but it will still be slow, and only really make sense if you have a low degree of sparsity (say if 10-20 % of your 100000x100000 matrix has items in it, then full arrays will actually be faster and maybe even take less space than sparse matrices).

Resources