I have a list corresponding to the matrix of a .png image, without the transparency (alpha) information. The dimensions are 128 x 128 x 3. So I have a 128 x 128 matrix coding the red hue of each pixel, a 128 x 128 matrix coding the greens, and a 128 x 128 matrix for the blues. There is no transparency in the image, but to write this .png, I think I need to append a 128 x 128 matrix of 1's to my list, so as to get a 128 x 128 x 4 array.
How can I append this matrix of ones to my list?
I have a list named compressed with these dimensions (128 x 128 x 3), and I've tried multiple single- and double-bracketed ways to include something like matrix(rep(1, 128^2), nrow = 128), without success.
The idea is to eventually save it as:
require(png)
writePNG(compressed, "compressed.picture")
without doing away with color.
Here's an example, depending on whether you currently have an array or a list of matrices. Looking at ?writePNG, it seems you need an n x n x 4 array as the final product, so I included a conversion from list to array.
Plenty of other solutions are offered e.g. here.
n <- 4
# generate matrices
r <- matrix(runif(n^2), n, n)
g <- matrix(runif(n^2), n, n)
b <- matrix(runif(n^2), n, n)
a <- matrix(1, n, n)
# list or array format for the data you have
li <- list(r, g, b)
ar <- array(c(r, g, b), dim = c(n, n, 3))
# appending the fourth matrix
li[[4]] <- a
ar1 <- array(c(ar, a), dim = c(n, n, 4))
# alternatively for array
library(abind)
ar2 <- abind(ar, a, along = 3)
# if you have a list and need an array
ar3 <- sapply(li, identity, simplify = "array")
I implemented a finite differences algorithm to solve a PDE.
The grid is a structured 2D domain of size [Nx, Nz], solved Nt times.
I pre-allocate the object containing all solutions:
sol = zeros(Nx, Nz, Nt, 'single') ;
This very easily becomes too large, and I get an 'out of memory' error.
Unfortunately sparse doesn't work for N-dimensional arrays.
For the sake of the question the actual values are not important; suffice it to say that RAM usage grows very quickly as the grid spacing decreases and the simulation time increases.
I am aware that I do not need to store every time instant to advance the solution; storing the previous two time steps would be sufficient. However, for post-processing reasons I need access to the solution at all time steps (or at least at a submultiple of the total number). It might help to add that, even after solving, the grid remains predominantly populated by zeros.
Am I fighting a lost battle or is there a more efficient way to proceed (other type of objects, vectorization...)?
Thank you.
You could store the array in sparse, linear form; that is, a column vector with length equal to the product of dimensions:
sol = sparse([], [], [], Nx*Nz*Nt, 1); % sparse column vector containing zeros
Then, instead of indexing normally,
sol(x, z, t),
you need to translate the indices x, z, t into the corresponding linear index:
For scalar indices you use
sol(x + Nx*(z-1) + Nx*Nz*(t-1))
You can define a helper function for convenience:
ind = @(sol, x, z, t) sol(x + Nx*(z-1) + Nx*Nz*(t-1));
so the indexing becomes more readable:
ind(sol, x, z, t)
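For readers outside MATLAB, the scalar index arithmetic can be sanity-checked in any language. Here is a small, purely illustrative Python sketch (the names vals, flat, and lin are made up) that flattens a 3-D array in column-major order and confirms the 1-based formula above:

```python
# Column-major (MATLAB-style) linear indexing sanity check, 1-based indices.
Nx, Nz, Nt = 3, 4, 5

# Fill a dict keyed by (x, z, t) with distinct values.
vals = {(x, z, t): (x, z, t)
        for t in range(1, Nt + 1)
        for z in range(1, Nz + 1)
        for x in range(1, Nx + 1)}

# Column-major flattening: x varies fastest, then z, then t.
flat = [vals[(x, z, t)]
        for t in range(1, Nt + 1)
        for z in range(1, Nz + 1)
        for x in range(1, Nx + 1)]

def lin(x, z, t):
    # 1-based linear index, as in the formula above.
    return x + Nx * (z - 1) + Nx * Nz * (t - 1)

# Every (x, z, t) maps to the right position in the flat vector.
assert all(flat[lin(x, z, t) - 1] == vals[(x, z, t)] for (x, z, t) in vals)
```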
For general (array) indices you need to reshape the indices along different dimensions so that implicit expansion produces the appropriate linear index:
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1))
which of course could also be encapsulated into a function.
Check that the conversion to linear indexing works (general case, using non-sparse array to compare with normal indexing):
Nx = 15; Nz = 18; Nt = 11;
sol = randi(9, Nx, Nz, Nt);
x = [5 6; 7 8]; z = 7; t = [4 9 1];
isequal(sol(x, z, t), ...
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1)))
gives
ans =
logical
1
You can create a cell array of sparse matrices to store the results. However, if computations are faster on a full matrix than on a sparse one, you can work with a full matrix for the current time step, then convert it to a sparse matrix and place it in the cell array.
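The same storage idea can be sketched outside MATLAB. In this illustrative Python sketch (all names are made up), a dict of nonzero entries stands in for a sparse matrix, and one dict per time step stands in for the cell array:

```python
# Keep each time step as its own sparse structure (a dict of nonzero
# entries), mirroring a cell array of sparse matrices.
Nx, Nz, Nt = 100, 80, 50

# One dict {(x, z): value} per time step; zeros are simply absent.
sol = [dict() for _ in range(Nt)]

# Work on a dense 2-D slice for the current step, then "sparsify" it.
dense = [[0.0] * Nz for _ in range(Nx)]
dense[3][7] = 1.5
dense[10][2] = -0.25

t = 0
sol[t] = {(x, z): v
          for x, row in enumerate(dense)
          for z, v in enumerate(row) if v != 0.0}

assert sol[0][(3, 7)] == 1.5 and len(sol[0]) == 2
```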
I have a row vector x in Matlab which contains 164372 components. I now want to group these elements in another vector y, which has to contain 52 components. The first element of the vector y must be the average of the first 164372 / 52 = 3161 elements of the vector x, the second element of y must be the average of the next 3161 elements of x, etc. This continues until I have taken all of the 52 averages of the elements in the vector x and placed them in y.
How can I implement this in Matlab? Is there some built-in function that lets me sum elements from a certain index to another index?
Thank you kindly for any help!
With reshape and mean:
x = rand(1,164372); % example data
N = 52; % number of blocks. numel(x) is assumed to be divisible by N
result = mean(reshape(x, numel(x)/N, []), 1)
What this does is: reshape the vector into a matrix with numel(x)/N = 3161 rows and N = 52 columns, in the usual column-major order, and then compute the mean of each column.
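The same block-averaging computation, sketched in Python for readers without MATLAB (names are made up; plain lists stand in for vectors):

```python
# Average consecutive, equal-length blocks of a vector.
x = list(range(1, 13))   # 12 elements: 1, 2, ..., 12
N = 3                    # number of blocks
m = len(x) // N          # block length (assumed to divide len(x))

result = [sum(x[i * m:(i + 1) * m]) / m for i in range(N)]
assert result == [2.5, 6.5, 10.5]
```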
I have a binary 3D array of the size 1024 by 1024 by 1024. I want to use a function (convhull), which has the following input:
X is of size mpts-by-ndim, where mpts is the number of points and ndim is the dimension of the space where the points reside, 2 ≦ ndim ≦ 3
How can I reshape my array into the array X which is required by this function?
Maybe "reshape" isn't the best word, because using the "reshape" function isn't enough.
What convhull is looking for is a list of subscripts of nonzero elements in your array. Given a 3D array M:
[X,Y,Z] = ind2sub(size(M), find(M));
Then you use these in convhull:
convhull(X, Y, Z);
The lone X parameter you mention in your question is just these three column vectors concatenated:
X = [X Y Z];
convhull(X);
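For readers without MATLAB, here is an illustrative Python sketch of the find/ind2sub step, i.e. collecting the subscripts of the nonzero entries of a 3-D binary array (the names M and pts are made up, and 0-based indices are used):

```python
# Extract coordinates of nonzero entries from a 3-D binary array;
# this is the mpts-by-3 point list that convhull expects.
M = [[[0, 1], [0, 0]],
     [[1, 0], [0, 1]]]   # a 2x2x2 binary array as nested lists

pts = [(x, y, z)
       for x, plane in enumerate(M)
       for y, row in enumerate(plane)
       for z, v in enumerate(row) if v]

assert pts == [(0, 0, 1), (1, 0, 0), (1, 1, 1)]
```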
Given 100 points in the coordinate system, I have to determine whether there exists a right-angled triangle among those vertices.
Is there a way to detect a right-angled triangle among those vertices without checking all triples of vertices and applying the Pythagorean theorem to each?
Can there be a better algorithm for this?
Thanks for any help. :)
Here's an O(n^2 log n)-time algorithm for two dimensions only. I'll describe what goes wrong in higher dimensions.
Let S be the set of points, which have integer coordinates. For each point o in S, construct the set of nonzero vectors V(o) = {p - o | p in S - {o}} and test whether V(o) contains two orthogonal vectors in linear time as follows.
Method 1: canonize each vector (x, y) to (x/gcd(x, y), y/gcd(x, y)), where |gcd(x, y)| is the largest integer that divides both x and y, and where gcd(x, y) is negative if y is negative, positive if y is positive, and |x| if y is zero. (This is very similar to putting a fraction in lowest terms.) The key fact about two dimensions is that, for each nonzero vector, there exists exactly one canonical vector orthogonal to that vector, specifically, the canonization of (-y, x). Insert the canonization of each vector in V(o) into a set data structure and then, for each vector in V(o), look up its canonical orthogonal mate in that data structure. I'm assuming that the gcd and/or set operations take time O(log n).
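Method 1 can be sketched as follows in Python (illustrative only; canon and has_right_angle_at are made-up names, math.gcd supplies the gcd, and a built-in set stands in for the set data structure):

```python
from math import gcd

def canon(x, y):
    # Canonical representative of the direction of (x, y): divide by the
    # gcd, signed so that y > 0, or y == 0 and x > 0.
    g = gcd(x, y)
    if y < 0 or (y == 0 and x < 0):
        g = -g
    return (x // g, y // g)

def has_right_angle_at(o, pts):
    # Is there a right angle with vertex o formed by two other points?
    vecs = [(px - o[0], py - o[1]) for (px, py) in pts if (px, py) != o]
    seen = {canon(x, y) for (x, y) in vecs}
    # The unique canonical direction orthogonal to (x, y) is canon(-y, x).
    return any(canon(-y, x) in seen for (x, y) in vecs)

pts = [(0, 0), (2, 0), (0, 3), (5, 5)]
assert has_right_angle_at((0, 0), pts)        # (2,0) is orthogonal to (0,3)
assert not has_right_angle_at((5, 5), pts)
```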
Method 2: define a comparator on vectors as follows. Given vectors (a, b), (c, d), write (a, b) < (c, d) if and only if
s1 s2 (a d - b c) < 0,
where
s1 = -1 if b < 0 or (b == 0 and a < 0)
1 otherwise
s2 = -1 if d < 0 or (d == 0 and c < 0)
1 otherwise.
Sort the vectors using this comparator. (This is very similar to comparing the fraction a/b with c/d.) For each vector (x, y) in V(o), binary search for its orthogonal mate (-y, x).
In three dimensions, the set of vectors orthogonal to the unit vector along the z-axis is the entire x-y-plane, and the equivalent of canonization fails to map all vectors in this plane to one orthogonal mate.
I want to find an as-fast-as-possible way of multiplying two small boolean matrices, where small means 8x8, 9x9, ..., 16x16. This routine will be used a lot, so it needs to be very efficient; please don't suggest that the straightforward solution should be fast enough.
For the special cases 8x8 and 16x16 I already have fairly efficient implementations, based on the solution found here, where we treat the entire matrix as a uint64_t or a uint64_t[4], respectively. On my machine this is roughly 70-80 times faster than the straightforward implementation.
However, in the case of 8 < k < 16, I don't really know how I can leverage any reasonable representation in order to enable such clever tricks as above.
So basically, I'm open to any suggestions using any kind of representation (of the matrices) and function signature. You may assume that this targets either a 32-bit or 64-bit architecture (pick whichever best suits your suggestion).
Given two 4x4 matrices a = 0010,0100,1111,0001 and b = 1100,0001,0100,0100 (one row per 4-bit group), one could first calculate the transpose b' = 1000,1011,0000,0100.
Then each entry of the resulting matrix is M(i,j) = (row i of a) . (column j of b) mod 2 == popcount(a[i] & b'[j]) & 1; // i.e. the parity of the AND
From that one can notice that the complexity grows only as n^2, as long as the bit vector of a row fits in a computer word.
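The transpose-and-popcount scheme can be sketched in Python on the 4x4 example above (illustrative only; rows are stored MSB-first as 4-bit integers, and all names are made up):

```python
# Bit-row GF(2) product via transpose + popcount, on the 4x4 example above.
a = [0b0010, 0b0100, 0b1111, 0b0001]
b = [0b1100, 0b0001, 0b0100, 0b0100]
n = 4

# Transpose b: row j of bt collects column j of b.
bt = [sum(((b[i] >> (n - 1 - j)) & 1) << (n - 1 - i) for i in range(n))
      for j in range(n)]
assert bt == [0b1000, 0b1011, 0b0000, 0b0100]   # matches b' above

# M(i,j) = parity of popcount(a[i] & bt[j]).
M = [[bin(a[i] & bt[j]).count("1") & 1 for j in range(n)] for i in range(n)]

# Cross-check against the naive mod-2 product.
abits = [[(a[i] >> (n - 1 - j)) & 1 for j in range(n)] for i in range(n)]
bbits = [[(b[i] >> (n - 1 - j)) & 1 for j in range(n)] for i in range(n)]
naive = [[sum(abits[i][k] * bbits[k][j] for k in range(n)) % 2
          for j in range(n)] for i in range(n)]
assert M == naive
```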
This can be sped up for 8x8 matrices at least, provided that some special permutation and bit-selection operations are available. One can iterate exactly N times with N x N bits in a vector (so 16x16 is pretty much the limit).
Each step accumulates the result, i.e. Result(n+1) = Result(n) XOR (A(n) .& B(n)), where Result(0) = 0, A(n) is A <<< n ('<<<' denoting a columnwise rotation of elements), and B(n) copies the diagonal elements of matrix B across whole rows:

        a b c          a e i          d h c          g b f
    B = d e f   B(0) = a e i   B(1) = d h c   B(2) = g b f
        g h i          a e i          d h c          g b f
After thinking about it a bit further, a better option is to ^^^ (rotate row-wise) matrix B and select A(n) == column-copied diagonals from A:

        a b c          a a a          b b b          c c c
    A = d e f   A(0) = e e e   A(1) = f f f   A(2) = d d d
        g h i          i i i          g g g          h h h
EDIT: For the benefit of later readers, here is a full solution for W <= 16-bit matrix multiplication in portable C.
#include <stdint.h>

#define W 16  /* matrix dimension; set 8 < W <= 16 as needed */

void matrix_mul_gf2(uint16_t *a, uint16_t *b, uint16_t *c)
{
    // these arrays can be read in two successive xmm registers or in a single ymm
    uint16_t D[16];         // temporary
    uint16_t C[16] = {0};   // result
    uint16_t B[16];
    uint16_t A[16];
    int i, j;
    uint16_t top_row;

    // Preprocess B (while reading from input)
    // -- "un-tilt" the diagonal to bit position 0x8000
    for (i = 0; i < W; i++) B[i] = (b[i] << i) | (b[i] >> (W - i));
    for (i = 0; i < W; i++) A[i] = a[i];  // just read in matrix 'a'

    // Loop W times
    // Can be parallelized 4x with MMX, 8x with XMM and 16x with YMM instructions
    for (j = 0; j < W; j++) {
        for (i = 0; i < W; i++) D[i] = ((int16_t)B[i]) >> 15;  // broadcast the top bit to the whole row
        for (i = 0; i < W; i++) B[i] <<= 1;                    // prepare B for the next round
        for (i = 0; i < W; i++) C[i] ^= A[i] & D[i];           // add the partial product
        // rotate the rows of A by one
        top_row = A[0];
        for (i = 0; i < W - 1; i++) A[i] = A[i + 1];
        A[W - 1] = top_row;
    }
    for (i = 0; i < W; i++) c[i] = C[i];  // return the result
}
How about padding it out to the next "clever" (e.g. 8 or 16) size, with all '1' on the diagonal?
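A quick illustration of why padding with 1's on the diagonal is safe: the top-left block of the padded product equals the original product. A Python sketch (all names made up; plain nested lists and a naive boolean product, purely for demonstration):

```python
def pad(M, size):
    # Embed an n x n matrix in a size x size matrix with 1s on the
    # rest of the diagonal (block-identity form).
    n = len(M)
    return [[M[i][j] if i < n and j < n else int(i == j)
             for j in range(size)] for i in range(size)]

def boolmul(A, B):
    # Naive boolean matrix product.
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

A = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
B = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

full = boolmul(pad(A, 8), pad(B, 8))
ref  = boolmul(A, B)

# The top-left 3x3 block of the padded product is the original product.
assert [row[:3] for row in full[:3]] == ref
```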
Depending on your application, storing both the matrix and its transpose together might help. You will save a lot of time that otherwise would be used to transpose during matrix multiplications, at the expense of some memory and some more operations.
There is a faster method for multiplying 8x8 matrices using 64-bit multiplication along with some simple bit trickery, which works for either GF(2) or boolean algebra.
Assuming the three matrices are packed in 8 consecutive rows of 8 bits each inside a 64-bit int, we can use multiplication to scatter the bits and do the job in just one loop:
uint64_t mul8x8 (uint64_t A, uint64_t B) {
    const uint64_t ROW = 0x00000000000000FF;
    const uint64_t COL = 0x0101010101010101;
    uint64_t C = 0;
    for (int i = 0; i < 8; ++i) {
        uint64_t p = COL & (A >> i);      // column i of A, one bit per byte
        uint64_t r = ROW & (B >> i * 8);  // row i of B, in the low byte
        C |= (p * r);                     // use ^ for GF(2) instead
    }
    return C;
}
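As a sanity check, the same trick can be ported to Python (arbitrary-precision ints stand in for uint64_t; all names are made up) and compared against the naive boolean product:

```python
import random

ROW = 0x00000000000000FF
COL = 0x0101010101010101
MASK = (1 << 64) - 1

def mul8x8(A, B):
    # Boolean 8x8 product; row i of a matrix is byte i, column j is bit j.
    C = 0
    for i in range(8):
        p = COL & (A >> i)          # column i of A, one bit per byte
        r = ROW & (B >> (i * 8))    # row i of B, in the low byte
        C |= (p * r) & MASK         # scatter row i into the selected rows
    return C

def bit(M, i, j):
    # Entry (i, j): bit j of byte i.
    return (M >> (i * 8 + j)) & 1

random.seed(1)
A = random.getrandbits(64)
B = random.getrandbits(64)

C = mul8x8(A, B)
naive = [[int(any(bit(A, i, k) and bit(B, k, j) for k in range(8)))
          for j in range(8)] for i in range(8)]
assert all(bit(C, i, j) == naive[i][j] for i in range(8) for j in range(8))
```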
The code for 16x16 is straightforward if you can afford blocking the rows for improved efficiency.
This trick is also used extensively in high-performance linear algebra libraries, and consists of partitioning the matrix into N/M x N/M blocks of MxM submatrices, with M = 2^m chosen to maximize locality in cache. The usual way to deal with N % M != 0 is to pad the rows and columns with 0s, so that the same algorithm can be used for all block multiplications.
We can apply the same idea to boolean matrices of variable dimension 8 <= N <= 16, as long as we can afford to have the matrices represented internally in a row-blocking format. We just assume the matrix is 16x16 and the last 16-N rows and columns are filled with 0s:
void mul16x16 (uint64_t C[2][2], const uint64_t A[2][2], const uint64_t B[2][2]) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            C[i][j] = mul8x8(A[i][0], B[0][j])
                    | mul8x8(A[i][1], B[1][j]);  // once again, use ^ instead for GF(2)
}
Notice we have done a 16x16 matrix multiplication with just 8 calls to mul8x8, i.e. 8 x 8 = 64 integer products, plus some bit operations.
mul8x8 can also be improved considerably with modern SSE/AVX vector instructions. In theory it is possible to perform all 8 products in parallel with one AVX-512 instruction (we still need to scatter the data to the ZMM register first) and then reduce horizontally using log2(8) = 3 instructions.