Suppose I have an array of dimension (i,j,k) and I want to loop over dimension k and, for each matrix (i,j), calculate the row max and the corresponding row indices. How can I avoid a double apply? The problem with the double apply is that it is slow, because it does not handle the creation of the result matrices efficiently. For example:
Array <- rnorm(10000000)
dim(Array) <- c(1, 10000000,1)
system.time(apply(Array, 3, function(x)apply(x,1,function(y)c(min(y), max(y), which.min(y), which.max(y)))))
system.time(apply(Array, 3, rowRanges))
system.time(apply(Array, 3, function(y)c(min(y), max(y), which.min(y), which.max(y))))
How could I avoid the double apply, and is it possible to calculate the positions of the min and max in a more efficient way? In this example the array is a vector, but in real life it is a real array.
The example was meant to show that the apply function causes overhead. The range function only calculates the min and the max, not the indices. Here is the real example that I want to optimise (note that the first dimension, 10 in the example, is around 1500 in real life, but then the array is around 16 GB):
Array <- rnorm(35000*36*10)
dim(Array) <- c(10, 35000,36)
test1 <- apply(Array, 3, function(x)apply(x,1,function(y)c(min(y), max(y), which.min(y), which.max(y))))
dim(test1) <- c(4, 10,36)
test2 <- apply(Array, 3, rowRanges)
dim(test2) <- c(10,2,36)
test2 <- aperm(test2, c(2,1,3))
sum(!test2 == test1[1:2,,])
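For comparison only, here is a minimal NumPy sketch of the same per-row reduction done without nested loops. It is not an R solution; the array shape is a smaller stand-in for the one in the question, and the variable names are made up for illustration. Reducing along axis 1 plays the role of the inner apply over rows.

```python
import numpy as np

rng = np.random.default_rng(0)
# (i, j, k) array; j is smaller than in the question to keep the sketch fast
arr = rng.standard_normal((10, 350, 36))

# Reduce along axis 1 (the j axis): one value per (i, k) pair
row_min = arr.min(axis=1)
row_max = arr.max(axis=1)
row_argmin = arr.argmin(axis=1)  # 0-based positions along j
row_argmax = arr.argmax(axis=1)

# Stack into a (4, i, k) array, mirroring the layout of test1 in the question
result = np.stack([row_min, row_max, row_argmin, row_argmax])
print(result.shape)
```

Note that the positions are 0-based here, whereas which.min and which.max in R are 1-based.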
I have two 2D arrays, one M is 2000x3 and the other N is 20x3 (sets of x,y,z coords). I would like to subtract N from M to produce a 3D array 2000x20x3. Currently I get a ValueError: operands could not be broadcast together with shapes (2000,3) (20,3)
A simpler example as a working exercise:
M = np.array([[1,1,1],[2,1,1],[3,1,1],[4,1,1],[1,2,1],[2,2,1],[3,2,1],[4,2,1]])
N = np.array([[0,0,0],[1,0,0]])
M.shape = (8,3)
N.shape = (2,3)
I wish to do A=M-N to produce an 8x2x3 array where, for each row of M, there is one set of differences in the x,y,z coordinates per row of N.
In other words:
A = array([[[1,1,1],[0,1,1]],[[2,1,1],[1,1,1]],[[3,1,1],[2,1,1]],[[4,1,1],[3,1,1]],[[1,2,1],[0,2,1]]...])
Is this possible, and if so how? Preferably without the use of any for loops
Use broadcasting:
A = M[:,None]-N
A.shape
# (8, 2, 3)
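To see why this works, here is a self-contained sketch using the example arrays from the question. Indexing with None inserts a length-1 axis, so the shapes (8, 1, 3) and (2, 3) line up from the right and broadcast to (8, 2, 3).

```python
import numpy as np

M = np.array([[1, 1, 1], [2, 1, 1], [3, 1, 1], [4, 1, 1],
              [1, 2, 1], [2, 2, 1], [3, 2, 1], [4, 2, 1]])
N = np.array([[0, 0, 0], [1, 0, 0]])

# M[:, None, :] has shape (8, 1, 3); N has shape (2, 3).
# Trailing axes are aligned, and the length-1 axis is stretched to length 2.
A = M[:, None, :] - N

print(A.shape)        # (8, 2, 3)
print(A[0].tolist())  # [[1, 1, 1], [0, 1, 1]]
```

A[i, j] holds M[i] - N[j], which matches the expected output listed in the question.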
My dataframe has column names of outstanding balance from Balance, Balance1, Balance2,...,Balance36.
I want to add a column for the delta between each month, i.e. Delta2 = Balance2 - Balance1
How can I simplify my method below?
dataset$delta1 = apply(dataset[, c("Balance1","Balance")], 1, function(x){x[2]-x[1]})
dataset$delta2 = apply(dataset[, c("Balance2","Balance1")], 1, function(x){x[2]-x[1]})
...
dataset$delta35 = apply(dataset[, c("Balance35","Balance34")], 1, function(x){x[2]-x[1]})
dataset$delta36 = apply(dataset[, c("Balance36","Balance35")], 1, function(x){x[2]-x[1]})
It boils down to a one-liner. First, name your dataset something short, df is the usual name. Then, use direct subtraction; there's zero need to call apply() to subtract one column from another:
df$delta1 <- df[,"Balance1"] - df[,"Balance"]
df$delta2 <- df[,"Balance2"] - df[,"Balance1"]
...
df$delta35 <- df[,"Balance35"] - df[,"Balance34"]
df$delta36 <- df[,"Balance36"] - df[,"Balance35"]
But since the whole computation has a regular structure, we're really only talking about generating an Nx36 array of differences, so use numeric column indices. There are 37 "Balance*" columns (Balance, Balance1, ..., Balance36), so say their column indices are 50:86 and your delta_cols are 100:135, or whatever. Then the indices of the columns being subtracted are balance_lhs <- 50:85 and the indices of the columns they are subtracted from are 51:86, or just balance_lhs + 1 (remember that most operators like addition vectorize in R).
So your Nx36 array can be generated by just the one-liner:
df[,delta_cols] <- df[,(balance_lhs+1)] - df[,balance_lhs]
And you can compute delta_cols <- which(colnames(df) %in% paste0("delta", 1:36)) programmatically, to avoid magic-number column indices in your code.
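The shifted-index subtraction is not specific to R. As an illustration only, here is the same idea as a NumPy sketch, assuming the 37 balance columns have been pulled into a plain 2D array (the name balances and the random data are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(2017)
# 100 rows, 37 columns standing in for Balance, Balance1, ..., Balance36
balances = rng.random((100, 37))

# Column t+1 minus column t, for every t at once
deltas = balances[:, 1:] - balances[:, :-1]

# np.diff along axis 1 computes the same differences
same = np.array_equal(deltas, np.diff(balances, axis=1))
print(deltas.shape, same)
```

Column 0 of deltas corresponds to delta1, column 35 to delta36.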
Use lapply to calculate delta for all 36 comparisons in one line.
# Sample data (37 columns, labelled Balance, Balance1, ...)
set.seed(2017);
df <- as.data.frame(matrix(runif(37 * 100), ncol = 37));
colnames(df) <- paste("Balance", c("", seq(1:36)), sep = "");
# List of difference vectors (36 of them, labelled delta1, ...)
lst <- lapply(2:ncol(df), function(i) df[, i] - df[, i - 1]);
names(lst) <- paste("delta", seq(1:36), sep = "");
# Combine with original dataframe
df <- cbind.data.frame(
df,
as.data.frame(lst));
Say Y is a 7-dimensional array, and I need an efficient way to maximize it over the last 3 dimensions that will work on a GPU.
As a result I need a 4-dimensional array with maximal values of Y and three 4-dimensional arrays with the indices of these values in the last three dimensions.
I can do
[Y7, X7] = max(Y , [], 7);
[Y6, X6] = max(Y7, [], 6);
[Y5, X5] = max(Y6, [], 5);
Then I have already found the values (Y5) and the indices along the 5th dimension (X5). But I still need indices along the 6th and 7th dimensions.
Here's a way to do it. Let N denote the number of dimensions along which to maximize.
1. Reshape Y to collapse the last N dimensions into one.
2. Maximize along the collapsed dimension. This gives argmax as a linear index over those dimensions.
3. Unroll the linear index into N subindices, one for each dimension.
The following code works for any number of dimensions (not necessarily 7 and 3 as in your example). To achieve that, it handles the size of Y generically and uses a comma-separated list obtained from a cell array to get N outputs from ind2sub.
Y = rand(2,3,2,3,2,3,2); % example 7-dimensional array
N = 3; % last dimensions along which to maximize
D = ndims(Y);
sz = size(Y);
[~, ind] = max(reshape(Y, [sz(1:D-N) prod(sz(D-N+1:end))]), [], D-N+1);
sub = cell(1,N);
[sub{:}] = ind2sub(sz(D-N+1:D), ind);
As a check, after running the above code, observe for example Y(2,3,1,2,:) (shown as a row vector for convenience):
>> reshape(Y(2,3,1,2,:), 1, [])
ans =
0.5621 0.4352 0.3672 0.9011 0.0332 0.5044 0.3416 0.6996 0.0610 0.2638 0.5586 0.3766
The maximum is seen to be 0.9011, which occurs at the 4th position (where "position" is defined along the N=3 collapsed dimensions). In fact,
>> ind(2,3,1,2)
ans =
4
>> Y(2,3,1,2,ind(2,3,1,2))
ans =
0.9011
or, in terms of the N=3 subindices,
>> Y(2,3,1,2,sub{1}(2,3,1,2),sub{2}(2,3,1,2),sub{3}(2,3,1,2))
ans =
0.9011
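For readers who want to try the same three steps outside MATLAB, here is a rough NumPy equivalent (a sketch, not a translation of the GPU code). The names lead, tail and flat are made up for this example. NumPy is row-major and 0-based, so the linear indices differ from MATLAB's, but np.unravel_index is consistent with the default reshape order.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((2, 3, 2, 3, 2, 3, 2))  # example 7-dimensional array
N = 3                                  # number of trailing dimensions to maximize over

lead, tail = Y.shape[:-N], Y.shape[-N:]

# Step 1: collapse the last N dimensions into one
flat = Y.reshape(lead + (-1,))

# Step 2: maximize along the collapsed dimension
ind = flat.argmax(axis=-1)   # linear index, shape == lead
Ymax = flat.max(axis=-1)

# Step 3: unroll the linear index into N subindices (tuple of N arrays of shape lead)
sub = np.unravel_index(ind, tail)

print(Ymax.shape, len(sub), sub[0].shape)
```

Indexing Y with the leading coordinates followed by the entries of sub at those coordinates recovers the corresponding entry of Ymax.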
I have an array with dimensions (i,j,k) and a matrix with dimensions (j,q). How can I multiply each slice (,,k) by that matrix? An example makes this clearer:
A <- array(c(rep(1,20), rep(2,20), rep(3,20)),dim = c(10,2,3))
B <- matrix(c(1:10), nrow = 2)
# multiply each A[,,i]%*%B
C <- array(NA, dim=c(nrow(A), ncol(B), 3))
C[] <- apply(A, 3, function(x) x%*%B)
I can get the result this way, but I am looking for a more efficient way, for example with the ATensor package. I hope someone can help me with this problem.
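To make the intended contraction explicit, here is the same computation written as a NumPy sketch rather than R. It is shown only to illustrate the index pattern C[i,q,k] = sum over j of A[i,j,k] * B[j,q]; the arrays are rebuilt to match the R example, with B filled column by column as matrix() does.

```python
import numpy as np

# A[:, :, k] is filled with the value k + 1, matching the R example
A = np.broadcast_to(np.arange(1, 4.0), (10, 2, 3)).copy()
# order="F" fills column by column, like R's matrix(1:10, nrow = 2)
B = np.arange(1, 11).reshape((2, 5), order="F")

# Contract over the shared j axis for every slice k at once
C = np.einsum("ijk,jq->iqk", A, B)

print(C.shape)  # (10, 5, 3)
```

Each slice C[:, :, k] equals the matrix product of A[:, :, k] and B.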
Given a vector A defined in Matlab by:
A = [ 0
0
1
0
0 ];
we can extract its dimensions using:
size(A);
Apparently, we can achieve the same thing in Julia using:
size(A)
The difference is that in Matlab we can extract the dimensions into a vector by using:
[n, m] = size(A);
irrespective of whether A is one- or two-dimensional, while in Julia size(A) will return only one dimension if A has only one dimension.
How can I do the same thing in Julia as in Matlab, namely extract the dimensions of A into a vector [n m] even if A is a vector? Please take into account that the number of dimensions of A might vary, i.e. it could sometimes have 1 and sometimes 2 dimensions.
A = zeros(3,5)
sz = size(A)
returns a tuple (3,5). You can refer to specific elements like sz[1]. Alternatively,
m,n = size(A,1), size(A,2)
This works even if A is a column vector (i.e., one-dimensional), returning a value of 1 for n.
This will achieve what you're expecting:
n, m = size(A); #or
(n, m) = size(A);
If size(A) is a one-element tuple, m will not be assigned, while n will receive length(A). Just be sure to catch that error, otherwise your code may stop if it is running from a script.