I have an n x p matrix that looks like this:
n = 100
p = 10
x <- matrix(sample(c(0,1), size = p*n, replace = TRUE), n, p)
I want to create an n x p x p array A whose kth item along the 1st dimension is a p x p diagonal matrix containing the elements of x[k,]. What is the most efficient way to do this in R? I'm looking for a way that uses outer (or some other vectorized approach) rather than one of the apply functions.
Solution using lapply:
A <- aperm(simplify2array(lapply(1:nrow(x), function(i) diag(x[i,]))), c(3,2,1))
I'm looking for something more efficient than this.
Thanks.
As a starting point, here is a humble for loop that fills a pre-allocated array.
# pre-allocate matrix of desired size
myArray <- array(0, dim=c(ncol(x), ncol(x), nrow(x)))
# fill in array
for(i in seq_len(nrow(x))) myArray[,,i] <- diag(x[i,])
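It should run relatively fast. On my machine, for a 1000 x 100 matrix, the lapply method took 0.87 seconds, while the for loop (including the array pre-allocation) took 0.25 seconds to transform the matrix into the desired array, so the for loop was about 3.5 times faster. Note that myArray is p x p x n, with the kth diagonal matrix in myArray[,,k]; aperm(myArray, c(3,1,2)) would reorder it to n x p x p if that layout is required.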
Transpose your original matrix
Note also that row operations on R matrices tend to be slower than column operations, because matrices are stored in memory by column. If you transpose your matrix first and loop over its columns instead, the time for the same data (now a 100 x 1000 matrix) drops to 0.14 seconds, about half that of the first for loop and roughly 6 times faster than the lapply method.
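For comparison, the "write all the diagonals in one indexed assignment" idea can be sketched in NumPy. This is only an illustration in a different language, not a translation of the R code above; the names rng, idx and A_loop are made up for the example. R supports a similar pattern through assignment with an index matrix, but that is not shown or timed here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
x = rng.integers(0, 2, size=(n, p))   # n x p matrix of 0/1 values

# Pre-allocate the n x p x p result, then write every diagonal at once:
# A[k, j, j] = x[k, j] for all k and j.
A = np.zeros((n, p, p), dtype=x.dtype)
idx = np.arange(p)
A[:, idx, idx] = x

# Reference: one np.diag call per row, as in the loop-based versions.
A_loop = np.stack([np.diag(row) for row in x])
```

Whether this beats a loop depends on the sizes involved, so it is worth timing on your own data.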
Say A is a 3x4x5 array. I am given a vector a, say of dimension 2 and b of dimension 2. If I do A(a,b,:) it will give 5 matrices of dimensions 2x2. I instead want the piecewise vectors (without writing a for loop).
So, I want the two vectors of A which are given by (a's first element and b's first element) and (a's second element and b's second element)
How do I do this without a for loop? If A were two dimensions I could do this using sub2ind. I don't know how to access the entire vectors.
You can use sub2ind to find the linear index to the first element of each output vector: ind = sub2ind(size(A),a,b). To get the whole vectors, you can't do A(ind,:), because the : has to be the 3rd dimension. However, what you can do is reshape A to be 2D, collapsing the first two dimensions into one. We have a linear index to the vectors we want, that will correctly index the first dimension of this reshaped A:
% input:
A = rand(3,4,5);
a = [2,3];
b = [1,2];
% expected:
B = [squeeze(A(a(1),b(1),:)).';squeeze(A(a(2),b(2),:)).']
% solution:
ind = sub2ind(size(A),a,b);
C = reshape(A,[],size(A,3));
C = C(ind,:)
assert(isequal(B,C))
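The same reshape-then-index idea can be sketched in NumPy, assuming zero-based indices and NumPy's default row-major layout (so np.ravel_multi_index stands in for sub2ind). The variable names mirror the MATLAB snippet; D is an extra shown only because NumPy's paired integer indexing reaches the same result directly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4, 5))
a = np.array([1, 2])   # zero-based counterparts of a = [2,3]
b = np.array([0, 1])   # zero-based counterparts of b = [1,2]

# Expected result, built one pair at a time.
B = np.stack([A[a[0], b[0], :], A[a[1], b[1], :]])

# Linear index into the collapsed (rows*cols) axis, then index the 2-D view.
ind = np.ravel_multi_index((a, b), A.shape[:2])
C = A.reshape(-1, A.shape[2])[ind, :]

# NumPy's paired integer indexing gives the same rows without the reshape.
D = A[a, b, :]
```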
You can change a and b to be 3d arrays just like A and then the sub2ind should be able to index the whole matrix. Like this:
Edit: Someone pointed out a bug. I have changed it so that a correction gets added. The problem was that ind1, which should have had the index number for each desired element of A was only indexing the first "plane" of A. The fix is that for each additional "plane" in the z direction, the total number of elements in A in the previous "planes" must be added to the index.
A=rand(3,4,5);
a=[2,3];
b=[1,2];
a=repmat(a,1,1,size(A,3));
b=repmat(b,1,1,size(A,3));
ind1=sub2ind(size(A),a,b);
correction=(size(A,1)*size(A,2))*(0:size(A,3)-1);
correction=permute(correction,[3 1 2]);
ind1=ind1+repmat(correction,1,2,1);
out=A(ind1)
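The per-plane correction is just the column-major linear index formula, i + rows*j + rows*cols*k for zero-based (i, j, k). A small NumPy sketch of that arithmetic follows; it uses order="F" to imitate MATLAB's storage order, and the names base, correction and ind are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4, 5))
a = np.array([1, 2])   # zero-based row indices
b = np.array([0, 1])   # zero-based column indices
rows, cols, planes = A.shape

# Column-major linear index of each (a, b) pair within the first plane.
base = a + rows * b                            # shape (2,)
# Each further plane starts rows*cols elements later.
correction = rows * cols * np.arange(planes)   # shape (5,)
ind = base[:, None] + correction[None, :]      # shape (2, 5)

# Index the column-major flattening of A with the corrected indices.
out = A.ravel(order="F")[ind]
```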
To use vcat(a,b) and hcat(a,b), one must match the number of columns or number of rows in the matrices a and b.
When constructing a matrix using vcat(a, b) or hcat(a, b) in a loop, one needs an initial matrix a (like a starting statement). Although all the sub-matrices are created in the same manner, I might need to construct this initial matrix a outside of the loop.
For example, if the loop condition is for i in 1:w, then I would need to pre-create a using i = 1, then start the loop with for i in 2:w.
If there is a nested loop, then my method is very awkward. I have thought of the following methods, but they don't seem to really work:
Use a dummy a, and delete a after the loop. From this question, we cannot delete a row in a matrix. If we use another variable to refer to the useful rows and columns, we might waste some memory allocation.
Use reshape() to make an empty dummy a. It works for 1 dimension, but not multiple dimensions.
julia> a = reshape([], 2, 0)
2×0 Array{Any,2}
julia> b = hcat(a, [3, 3])
2×1 Array{Any,2}:
3
3
julia> a = reshape([], 2, 2)
ERROR: DimensionMismatch("new dimensions (2,2) must be consistent with array size 0")
in reshape(::Array{Any,1}, ::Tuple{Int64,Int64}) at ./array.jl:113
in reshape(::Array{Any,1}, ::Int64, ::Int64, ::Vararg{Int64,N}) at ./reshapedarray.jl:39
So my question is: how can I work with vcat() and hcat() in a loop without this awkward initial matrix?
Edit:
Here is the problem I got stuck in:
There are many grayscale images. Each one is represented as a 20 by 20 Float64 array. One function foo(n) randomly picks n of those matrices and combines them into a big square.
If n has an integer square root, then foo(n) returns a sqrt(n) * 20 by sqrt(n) * 20 matrix.
If n does not have an integer square root, then foo(n) returns a ceil(sqrt(n)) * 20 by ceil(sqrt(n)) * 20 matrix. At the end of the big square image, foo(n) fills in ceil(sqrt(n)) ^ 2 - n extra black images (each one is represented as zeros(20,20)).
My current algorithm for foo(n) is to use a nested loop. In the inner loop, hcat() builds a layer (consisting of ceil(sqrt(n)) images). In the outer loop, vcat() combines those layers.
Then dealing with hcat() and vcat() in a loop becomes complicated.
So would:
pickimage() = randn(20,20)
n = 16
m = ceil(Int, sqrt(n))
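out = zeros(20m, 20m)  # pre-allocate the full mosaic
k = 0
for i in 0:m-1
    for j in 0:m-1
        # .+ shifts the 1:20 range to tile (i, j); plain + on a range fails on Julia 1.x
        out[20i .+ (1:20), 20j .+ (1:20)] .= ((k += 1) <= n) ? pickimage() : zeros(20,20)
    end
end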
be a relevant solution?
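The pre-allocate-and-place idea can also be written without any explicit loop. Here is a rough NumPy sketch, assuming the picked images are already available as one (n, 20, 20) stack; make_mosaic and its tile parameter are hypothetical names, and the random stack merely stands in for the real images.

```python
import numpy as np

def make_mosaic(images, tile=20):
    """Arrange a stack of (tile x tile) images into a square grid, row by row."""
    n = len(images)
    m = int(np.ceil(np.sqrt(n)))            # tiles per side
    tiles = np.zeros((m * m, tile, tile))   # unused slots stay black (zeros)
    tiles[:n] = images
    # (m, m, tile, tile) -> (m, tile, m, tile) -> (m*tile, m*tile)
    grid = tiles.reshape(m, m, tile, tile).transpose(0, 2, 1, 3)
    return grid.reshape(m * tile, m * tile)

rng = np.random.default_rng(0)
images = rng.standard_normal((10, 20, 20))  # stand-in for 10 picked images
out = make_mosaic(images)
```

With n = 10 this gives an 80 by 80 result whose last six tile slots are black.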
Given some function f that accepts 1D array and gives 2D array, is it possible to apply it efficiently for each row of the NxM array A?
More specifically, I want to apply np.triu to each row of the NxM array A and then concatenate all the results. I can achieve this by
B = np.dstack(map(np.triu, A))
which gives an MxMxN array. However, this is not very efficient for large N. Unfortunately, the function np.apply_along_axis cannot be employed here because f changes the dimension.
Knowing the power of NumPy for efficient broadcasting, I am almost sure that there exists a better solution for my problem.
Here's a vectorized approach using broadcasting -
Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
Runtime test and output verification -
In [319]: A = np.random.randint(0,20,(400,100))
In [320]: %timeit np.dstack(map(np.triu, A))
10 loops, best of 3: 69.9 ms per loop
In [321]: %timeit A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
10 loops, best of 3: 24.8 ms per loop
In [322]: B = np.dstack(map(np.triu, A))
In [323]: Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
In [324]: np.allclose(B,Bout)
Out[324]: True
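To see why the broadcast works, it may help to look at the shapes on a small input. The sketch below uses made-up sizes N = 6 and M = 4, and builds the reference by tiling each row into an M x M matrix before calling np.triu, which is assumed here to reproduce what np.triu does to a 1-D row in the session above. It also avoids passing a bare map object to np.dstack, which Python 3 versions of NumPy do not accept.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4
A = rng.integers(0, 20, size=(N, M))

# Upper-triangular mask (diagonal included), shape (M, M, 1).
mask = np.tri(M, dtype=bool).T[..., None]
# A.T has shape (M, N); broadcasting against (M, M, 1) gives (M, M, N) with
# Bout[i, j, n] = A[n, j] where j >= i, and 0 elsewhere.
Bout = A.T * mask

# Loop-based reference: repeat each row M times, keep the upper triangle.
B = np.dstack([np.triu(np.tile(row, (M, 1))) for row in A])
```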
I'm working on a fishery stock assessment model and want to speed it up by removing a loop (actually two loops of the same form).
I have an array, A, dim(A)=[L,L,Y], and a matrix, M, dim(M)=[L,Y].
These are used to make a matrix, mat, dim(mat)=[L,Y], by calculating matrix products. My loop looks like:
for (i in 1:Y) {
  mat[, i] <- (A[, , i] %*% M[, i])[, 1]
}
Can anyone help me out? I really need a speed gain.
Also, (don't know if it'll make a difference but) each A[,,i] matrix is lower triangular.
I'm pretty sure this will give you the results you want. Since there is no reproducible example, I can't be absolutely sure; I had to trace some of the linear algebra logic to see what you are trying to accomplish.
library(plyr) # We need this to split the array into a list of 9 matrices
B = lapply(alply(A, 3), function(x) (x%*%M)) # Perform 9 linear algebra multiplications
sapply(1:9, function(i) (B[[i]])[,i]) # Extract the 9 columns you actually want.
I used the following test data:
A = array(rnorm(225), dim = c(5,5,9))
M = matrix(rnorm(45), nrow = 5, ncol = 9)
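The operation in the question is a batch of matrix-vector products, mat[l, y] = sum over j of A[l, j, y] * M[j, y]. As a point of comparison only, here is how that sum can be written as a single call in NumPy using np.einsum; the sizes match the test data above, and mat_loop is an illustrative name for the loop-based reference. This does not exploit the lower-triangular structure mentioned in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
L, Y = 5, 9
A = rng.standard_normal((L, L, Y))
M = rng.standard_normal((L, Y))

# mat[l, y] = sum_j A[l, j, y] * M[j, y], i.e. A[:, :, y] @ M[:, y] for every y.
mat = np.einsum("ljy,jy->ly", A, M)

# Loop-based reference matching the original for loop.
mat_loop = np.empty((L, Y))
for i in range(Y):
    mat_loop[:, i] = A[:, :, i] @ M[:, i]
```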
Searching around here, one finds many questions about how to convert cell arrays of doubles into one big matrix.
In my application I have a two-dimensional cell array (let's call it celldata, of size m times n) of double matrices that all have the same size (let's say a times b).
I want to convert that data structure into one big 4D double array (m times n times a times b).
At the moment I do that by
reshape(cat(3,celldata{:}),m,n,a,b)
but maybe there are other methods doing that directly? Maybe with a call like
cat([3 4],celldata{:,:})
or similar.
I think
cell2mat(permute(celldata, [3 4 1 2]))
will do the trick. However,
%// create some bogus data
m = 1.1e2;
n = 1.2e2;
a = 1.3e2;
b = 1.4e2;
celldata = cellfun(@(~) randi(10, a,b, 'uint8'), cell(m,n), 'UniformOutput', false);
%// new method
tic
cell2mat(permute(celldata, [3 4 1 2]));
toc
%// your current method
tic
reshape(cat(3,celldata{:}),m,n,a,b);
toc
Results:
Elapsed time is 1.745495 seconds. % cell2mat/permute
Elapsed time is 0.305368 seconds. % reshape/cat
cell2mat is a MATLAB m-file (with necessary inefficiencies in the loop due to compatibility issues), while reshape and cat are built-ins. This is where that difference comes from.
I'd stick with your current method :)
Now, I'm asking you why you'd want to do this conversion in the first place. Is it an indexing problem? Because
celldata{x,y}(w,z)
already reaches each element directly, so you don't have to do the conversion just to be able to index like
converted_celldata(x,y,w,z)
I don't see other reasons, because matrix/vector operations don't work anyway on 4D arrays...
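One thing that may be worth checking before relying on converted_celldata(x,y,w,z) is the element layout after the reshape. The NumPy sketch below imitates column-major storage with order="F"; blocks[i][j] is a stand-in for celldata{i,j}, and all names and sizes are made up. Under those assumptions, reshaping the a x b x (m*n) stack straight to (m, n, a, b) keeps every value but does not put blocks[i][j][w, z] at position [i, j, w, z], whereas reshaping to (a, b, m, n) and then permuting the axes does.

```python
import numpy as np

m, n, a, b = 2, 3, 4, 5
# blocks[i][j] plays the role of celldata{i,j}; values are all distinct.
flat = np.arange(m * n * a * b).reshape(m, n, a, b)
blocks = [[flat[i, j] for j in range(n)] for i in range(m)]

# Emulate cat(3, celldata{:}): cells are visited in column-major order.
stacked = np.stack([blocks[i][j] for j in range(n) for i in range(m)], axis=2)

# Column-major reshape straight to (m, n, a, b): same data, different layout.
direct = stacked.reshape((m, n, a, b), order="F")

# Reshape to (a, b, m, n) first, then move the cell axes to the front.
permuted = stacked.reshape((a, b, m, n), order="F").transpose(2, 3, 0, 1)
```

Here permuted[i, j, w, z] equals blocks[i][j][w, z] for every index, while direct does not match in general.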