Product of two 3D arrays and a 2D matrix - arrays

I'm trying to find a much more efficient way to compute the following matrix in R:
Let A and C be two 3D arrays of dimension (n, n, m) and B a matrix of dimension (m, m). Then M is the (n, n) matrix such that:
M_ij = SUM_kl A_ijk * B_kl * C_ijl
for (i in seq(n)) {
  for (j in seq(n)) {
    M[i, j] <- A[i, j, ] %*% B %*% C[i, j, ]
  }
}
It is possible to write this with the tensorA package, using i and j as parallel dimensions, but I'd rather stay with base R objects:
einstein.tensor(A %e% log(B), C, by = c("i", "j"))
Thanks!

I don't know if this would be faster, but it would avoid one level of looping:
for (i in seq(n))
  M[i, ] <- diag(A[i, , ] %*% B %*% t(C[i, , ]))
It gives the same answer as yours in this example:
n <- 2
m <- 3
A <- array(1:(n^2*m), c(n, n, m))
C <- A + 1
B <- matrix(1:(m^2), m, m)
M <- matrix(NA, n, n)
for (i in seq(n))
  M[i, ] <- diag(A[i, , ] %*% B %*% t(C[i, , ]))
M
# [,1] [,2]
# [1,] 1854 3216
# [2,] 2490 4032
Edited to add: Based on https://stackoverflow.com/a/42569902/2554330, here's a slightly faster version:
for (i in seq(n))
  M[i, ] <- rowSums((A[i, , ] %*% B) * C[i, , ])
I did some timing with n <- 200 and m <- 300: this version was the fastest at 3.1 s, versus 4.7 s for my original solution and 17.4 s for the loop in the question.
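The loop over i can also be removed entirely by flattening the first two dimensions, since each M_ij only involves the slice at position (i, j). This is a sketch in the same spirit as the rowSums version above (Am and Cm are my names for the flattened copies):

```r
# Same example data as above
n <- 2; m <- 3
A <- array(1:(n^2 * m), c(n, n, m))
C <- A + 1
B <- matrix(1:(m^2), m, m)

# Flatten (n, n, m) to (n*n, m): row (i, j) of Am holds A[i, j, ]
Am <- matrix(A, n * n, m)
Cm <- matrix(C, n * n, m)

# M_ij = sum_l (sum_k A_ijk B_kl) C_ijl, computed for all (i, j) at once
M <- matrix(rowSums((Am %*% B) * Cm), n, n)
M
#      [,1] [,2]
# [1,] 1854 3216
# [2,] 2490 4032
```

This does a single (n*n, m) by (m, m) multiplication instead of n small ones, so it should scale better for large n.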

subtract the different dimensional arrays in R

Let A and B be arrays of dimensions [2,3,4] and [100,2], respectively. Note that 2 is the common dimension.
My desired answer is an array C of dimension [100,2,3,4] such that
C[h,i,j,k] = A[i,j,k] - B[h,i]
for all h,i,j,k.
Or
C[h,i,j,k] = A[i,j,k] + B[h,i]
for all h,i,j,k.
The latter case is easier to check using the following example arrays:
A <- array(NA, c(2, 3, 4))
for (i in 1:2) {
  for (j in 1:3) {
    for (k in 1:4) {
      A[i, j, k] <- i*1000000 + j*100000 + k*10000
    }
  }
}
B <- array(NA, c(100, 2))
for (h in 1:100) {
  for (i in 1:2) {
    B[h, i] <- h*10 + i
  }
}
How about this:
C <- array(NA, c(dim(B)[1], dim(A)))
# Approach 1
for (h in 1:dim(B)[1])
  for (i in 1:dim(A)[1])
    C[h, i, , ] <- A[i, , ] - B[h, i]
# Approach 2
for (h in 1:dim(B)[1])
  C[h, , , ] <- sweep(A, 1, B[h, ], "-")
To check that the answer is correct, pick some values for h, i, j, k:
i <- 1; j <- 2; k <- 3; h <- 50
C[h, i, j, k]
#[1] 1229499
A[i,j,k] - B[h,i]
#[1] 1229499
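Both approaches still loop over h. Here's a fully vectorized sketch, using array() recycling to replicate B over the trailing j, k dimensions and aperm to replicate A over a new leading h dimension (Arep and Brep are my names):

```r
# Same example data as above
A <- array(NA, c(2, 3, 4))
for (i in 1:2) for (j in 1:3) for (k in 1:4)
  A[i, j, k] <- i*1000000 + j*100000 + k*10000
B <- array(NA, c(100, 2))
for (h in 1:100) for (i in 1:2) B[h, i] <- h*10 + i

nh <- nrow(B)
# Recycle A along a new trailing h dimension, then move h to the front
Arep <- aperm(array(A, c(dim(A), nh)), c(4, 1, 2, 3))
# Recycling fills B[h, i] identically for every (j, k) slice, because
# dim(B) matches the first two target dimensions exactly
Brep <- array(B, c(dim(B), dim(A)[2:3]))

C <- Arep - Brep
C[50, 1, 2, 3]   # A[1,2,3] - B[50,1]
# [1] 1229499
```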

R: Efficient way to return sums of products (compatible with JAGS)

I am attempting to calculate cell probabilities in an array that result from sums of products. This is part of a model I will run in JAGS within R. In this example, an object may be in one of 3 states. At each time interval it may remain in the same state or pass to one of the other states. With the example code below the starting state is 1 and the final state is 2. The array X is a state*state transition matrix (e.g. X[1,2] is the probability of an object in state 1 at time t being in state 2 at time t+1). Note that these are just made up numbers to test that test1 and test2 (see below) give the same result, since I know for certain that the hand calculation (test1) is correct.
X <- array((0),dim=c(3,3))
X[1,1] <- 0.1
X[1,2] <- 0.2
X[1,3] <- 0.3
X[2,1] <- 0.4
X[2,2] <- 0.5
X[2,3] <- 0.6
X[3,1] <- 0.7
X[3,2] <- 0.8
X[3,3] <- 0.9
The example is simplified. I will ultimately have a separate state*state transition matrix for each time interval because transition is time dependent. I have also removed the probability of observation from the example for simplicity. With probability of observation included (as it will need to be in the end), each cell describes the probability of observing an object in the state and time of that cell for the first time. Each element of X consists of a survival parameter (for state i at time t) and transition parameter (for state i at t-1 to state j at time t). I will derive these components separately within the model and I don't think that part will be too difficult.
Back to the example as given below. I have calculated by hand (test1) the probability for state 2 at sample occasion 4 when the release state (the time interval before sample occasion 1) is 1. This calculation consists of the sums of products of all possible ways of getting to state 2 from state 1 over the time intervals.
test1 <-
(X[1,1]*X[1,1]*X[1,1]*X[1,2])+
(X[1,1]*X[1,1]*X[1,2]*X[2,2])+
(X[1,1]*X[1,1]*X[1,3]*X[3,2])+
(X[1,1]*X[1,2]*X[2,1]*X[1,2])+
(X[1,1]*X[1,2]*X[2,2]*X[2,2])+
(X[1,1]*X[1,2]*X[2,3]*X[3,2])+
(X[1,1]*X[1,3]*X[3,1]*X[1,2])+
(X[1,1]*X[1,3]*X[3,2]*X[2,2])+
(X[1,1]*X[1,3]*X[3,3]*X[3,2])+
(X[1,2]*X[2,1]*X[1,1]*X[1,2])+
(X[1,2]*X[2,1]*X[1,2]*X[2,2])+
(X[1,2]*X[2,1]*X[1,3]*X[3,2])+
(X[1,2]*X[2,2]*X[2,1]*X[1,2])+
(X[1,2]*X[2,2]*X[2,2]*X[2,2])+
(X[1,2]*X[2,2]*X[2,3]*X[3,2])+
(X[1,2]*X[2,3]*X[3,1]*X[1,2])+
(X[1,2]*X[2,3]*X[3,2]*X[2,2])+
(X[1,2]*X[2,3]*X[3,3]*X[3,2])+
(X[1,3]*X[3,1]*X[1,1]*X[1,2])+
(X[1,3]*X[3,1]*X[1,2]*X[2,2])+
(X[1,3]*X[3,1]*X[1,3]*X[3,2])+
(X[1,3]*X[3,2]*X[2,1]*X[1,2])+
(X[1,3]*X[3,2]*X[2,2]*X[2,2])+
(X[1,3]*X[3,2]*X[2,3]*X[3,2])+
(X[1,3]*X[3,3]*X[3,1]*X[1,2])+
(X[1,3]*X[3,3]*X[3,2]*X[2,2])+
(X[1,3]*X[3,3]*X[3,3]*X[3,2])
test1 # = 0.9288
I have written a for loop that simplifies this calculation (test2) for each release state and recapture state. The problem is that over many more time intervals, my method will end up calculating the sums of products over an array with very many dimensions and I'm worried it will be very computationally expensive within JAGS.
test2 <- array((0),dim=c(3,3,3,3,3))
for (m in 1:3) {
  for (i in 1:3) {
    for (j in 1:3) {
      for (k in 1:3) {
        for (s in 1:3) {
          test2[m,i,j,k,s] <- X[m,i]*X[i,j]*X[j,k]*X[k,s]
        }
      }
    }
  }
}
sumtest2 <- sum(test2[1,,,,2]) # probability of state 2 at observation
# time 4 after release in state 1 at
# release time 1.
test2
sumtest2 # = 0.9288 (the same as test1 and therefore the correct result)
My question is what might be a more efficient way to achieve the same thing?
Thank you.
--------------------------------
Thanks to Dex for the very useful answer. I have a follow-up question regarding time-dependent transition matrices. The code above works great, but I've not been able to use it for varying numbers of iterations where the transition matrix differs at each iteration.
For each release occasion t there are T-t recapture occasions (where T is the total number of study years). I would like to matrix-multiply the matrices from the time of first possible recapture up to the last recapture time.
This is part of the long version of what I would like to achieve:
# This long version does work, with lines for release occasion 1
# and recapture occasions 1:4 only:
pr2 <- array((0), dim=c(3,3,4,4))
pr2[,,1,1] <- diag(1, 3, 3) %*% (X[,,1])
pr2[,,1,2] <- diag(1, 3, 3) %*% (X[,,1] %*% X[,,2])
pr2[,,1,3] <- diag(1, 3, 3) %*% (X[,,1] %*% X[,,2] %*% X[,,3])
pr2[,,1,4] <- diag(1, 3, 3) %*% (X[,,1] %*% X[,,2] %*% X[,,3] %*% X[,,4])
I have tried using an extra index for time and then using a for loop with matrix multiplication:
# With time dependence:
X <- array((0),dim=c(3,3,4))
X[1,1,1] <- 0.1
X[1,2,1] <- 0.2
X[1,3,1] <- 0.7
X[2,1,1] <- 0.25
X[2,2,1] <- 0.35
X[2,3,1] <- 0.4
X[3,1,1] <- 0.15
X[3,2,1] <- 0.25
X[3,3,1] <- 0.6
X[,,2] <- X[,,1]+0.05
X[,,3] <- X[,,1]-0.05
X[,,4] <- X[,,1]+0.1
# not working:
library(expm)
for (t in 1:4) {
  for (j in t:4) {
    pr1[,,t,j] <- diag(1, 3, 3) %*% (X[,,j] %^% j)
  }
}
# This also doesn't work:
library(expm)
for (t in 1:4) {
  for (j in t:4) {
    pr1[,,t,j] <- diag(1, 3, 3) %*% (X[,,j] %^% 1:j)
  }
}
I have also tried a list approach:
# List approach
# With time dependence:
X1 <- array((0),dim=c(3,3))
X1[1,1] <- 0.1
X1[1,2] <- 0.2
X1[1,3] <- 0.7
X1[2,1] <- 0.25
X1[2,2] <- 0.35
X1[2,3] <- 0.4
X1[3,1] <- 0.15
X1[3,2] <- 0.25
X1[3,3] <- 0.6
X2 <- X1+0.05
X3 <- X1-0.05
X4 <- X1+0.1
XT <- list(X1,X2,X3,X4)
pr1 <- array((0), dim=c(3,3,4,4))
# This also doesn't work:
library(expm)
for (t in 1:4) {
  for (j in t:4) {
    pr1[,,t,j] <- diag(1, 3, 3) %*% (XT[[1:j]] %^% j)
  }
}
The following code shows a long way of calculating the probabilities in the time-dependent array, and it gives the correct values. Writing this out in full (with 23 release occasions and 23 recapture occasions, for 3 release/recapture states) is possible but seems inefficient, especially if the analysis needs to be adapted later. This shows only the first four recapture occasions for individuals first released at the first release occasion. Each cell gives the probability of an individual being recaptured at that occasion for the first time. Recapture probability is given by p, so the probability of non-recapture is 1-p. In my analysis, T and p are parameters to be estimated; their values are defined here to make sure the mechanics work as intended.
# Make a 3D transition matrix T where z-dimension is time
T_states <- 3
T_terms <- 4
Tz <- 4
T <- array(runif(T_states * T_states * Tz)/1.5, dim = c(T_states, T_states, Tz))
# Normalise by rows - this step ensures rows sum to 1
for (z in seq(Tz)) {
  T[, , z] <- T[, , z] / apply(T[, , z], 1, sum)
}
# state and time dependent recapture probability array [,recapture state,occasion]
# the first dimension is needed to repeat p across release states
p <- array((0),dim=c(3,3,4))
p[,1,1] <- 0.5
p[,1,2] <- 0.55
p[,1,3] <- 0.6
p[,1,4] <- 0.65
p[,2,1] <- 0.6
p[,2,2] <- 0.65
p[,2,3] <- 0.7
p[,2,4] <- 0.75
p[,3,1] <- 0.7
p[,3,2] <- 0.75
p[,3,3] <- 0.8
p[,3,4] <- 0.85
# Put them together (with 1-p for earlier recapture occasions):
pr2 <- array((0), dim=c(3,3,4,4))
pr2[,,1,1] <- diag(1, 3, 3) %*% (T[,,1]*p[,,1])
pr2[,,1,2] <- diag(1, 3, 3) %*% ((T[,,1]*(1-p[,,1])) %*% (T[,,2]*p[,,2]))
pr2[,,1,3] <- diag(1, 3, 3) %*% ((T[,,1]*(1-p[,,1])) %*% (T[,,2]*(1-p[,,2])) %*% (T[,,3]*p[,,3]))
pr2[,,1,4] <- diag(1, 3, 3) %*% ((T[,,1]*(1-p[,,1])) %*% (T[,,2]*(1-p[,,2])) %*% (T[,,3]*(1-p[,,3])) %*% (T[,,4]*p[,,4]))
pr2[,,1,]
I haven't been able to find a way of writing more efficient code for this that gives the same values (plenty of ways that give incorrect values).
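For what it's worth, one way to generalize the long-hand construction above is to carry a running product of the "not recaptured at occasion j" matrices. This is only a sketch (acc and pr are my names; T and p are rebuilt here, following the same pattern as above, so the block is self-contained):

```r
# Time-dependent transition array T (rows normalised to sum to 1)
set.seed(1)
T_states <- 3; Tz <- 4
T <- array(runif(T_states * T_states * Tz)/1.5,
           dim = c(T_states, T_states, Tz))
for (z in seq(Tz)) T[, , z] <- T[, , z] / rowSums(T[, , z])

# Recapture probabilities p[, j, t] following the pattern in the example
p <- array(0, dim = c(3, 3, 4))
for (t in 1:4) for (j in 1:3) p[, j, t] <- 0.4 + 0.1*j + 0.05*(t - 1)

pr <- array(0, dim = c(3, 3, 4, 4))
for (t in 1:4) {
  acc <- diag(3)                                   # running product so far
  for (j in t:4) {
    pr[, , t, j] <- acc %*% (T[, , j] * p[, , j])  # recaptured first at j
    acc <- acc %*% (T[, , j] * (1 - p[, , j]))     # not recaptured at j
  }
}
```

For t = 1 this reproduces the four long-hand lines for pr2[,,1,1] through pr2[,,1,4].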
You can solve this with matrix multiplication. Transition matrices can be matrix-multiplied together to give the transition matrix over multiple iterations. For four iterations we multiply four copies:
(X %*% X %*% X %*% X)
You can also use the %^% operator from the expm package to do matrix exponentiation and save having to write it out.
To find the probability that an object starting in state 1 ends up in state 2 after four iterations, we can represent the starting state as a 1x3 matrix and multiply it by the transition matrix four times.
matrix(c(1, 0, 0), nrow = 1, ncol = 3) %*% (X %*% X %*% X %*% X)
This gives the probabilities of it being in each of the three states:
[,1] [,2] [,3]
[1,] 0.756 0.9288 1.1016
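If you'd rather not depend on expm, the four-fold product can also be written in base R with Reduce; this sketch checks it against the explicit product (X here is the transition matrix from the question):

```r
# Transition matrix from the question: rows 0.1-0.3, 0.4-0.6, 0.7-0.9
X <- matrix(seq(0.1, 0.9, by = 0.1), 3, 3, byrow = TRUE)

# Base-R equivalent of X %^% 4 from the expm package
X4 <- Reduce(`%*%`, rep(list(X), 4))

start <- matrix(c(1, 0, 0), nrow = 1, ncol = 3)
start %*% X4
#       [,1]   [,2]   [,3]
# [1,] 0.756 0.9288 1.1016
```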
Timing
I made another example based on your code, with a transition matrix that has sensible probabilities. Matrix multiplication is about 100 times faster than the for loop and gives the same answer.
library("microbenchmark")
X <- array((0),dim=c(3,3))
X[1,1] <- 0.1
X[1,2] <- 0.2
X[1,3] <- 0.7
X[2,1] <- 0.25
X[2,2] <- 0.35
X[2,3] <- 0.4
X[3,1] <- 0.15
X[3,2] <- 0.25
X[3,3] <- 0.6
microbenchmark(
  {
    test2 <- array((0), dim = c(3, 3, 3, 3, 3))
    for (m in 1:3) {
      for (i in 1:3) {
        for (j in 1:3) {
          for (k in 1:3) {
            for (s in 1:3) {
              test2[m,i,j,k,s] <- X[m,i]*X[i,j]*X[j,k]*X[k,s]
            }
          }
        }
      }
    }
    sumtest2 <- sum(test2[1,,,,2])
  },
  matrix(c(1, 0, 0), nrow = 1, ncol = 3) %*% X %*% X %*% X %*% X
)

R: Convert 3darray[i, j, ] to columns of df, fast and readable

I'm working with 3-dimensional arrays and want the slices along the third dimension, for each position in the first two dimensions, as columns in a data frame.
I also want my code to be readable for people who don't use R regularly.
Looping over the first two dimensions is very readable but slow (30 s for the example below), while the permute-flatten-reshape approach is faster (14 s) but not so readable.
Any suggestions for a nice solution?
Reproducible example here:
# Create data
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
# Idea 1: Loop
df <- data.frame(var1 = rep(0, d3))
i <- 1
system.time(
  for (c in 1:d2) {
    for (r in 1:d1) {
      i <- i + 1
      df[[i]] <- data[r, c, ]
    }
  })
# Idea 2: Permute dimension of array first
df2 <- data.frame(var1 = rep(0, d3))
system.time({
  data.perm <- aperm(data, c(3, 1, 2))
  df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)
})
identical(df, df2)
I would suggest a much simpler approach:
t(apply(data, 3, c))
I hope it suits your expectations of being fast and readable.
fast, as demonstrated in the timings below.
readable because it's a basic apply statement. All that is being done is using c to convert the matrix at each third-dimension index into a single vector, which then simplifies to a two-dimensional array. The result just needs to be transposed.
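Since the question asked for a data frame, here's a sketch of wrapping the result (the V1, V2, ... column names are whatever as.data.frame generates by default):

```r
set.seed(1)
d1 <- 3; d2 <- 2; d3 <- 4   # small dimensions just for illustration
data <- array(rnorm(d1 * d2 * d3), dim = c(d1, d2, d3))

# One row per index of the third dimension,
# one column per (row, col) position in the first two
df <- as.data.frame(t(apply(data, 3, c)))
dim(df)
# [1] 4 6
```

Column q corresponds to data[r, c, ] with r and c taken in column-major order, so df[[1]] is data[1, 1, ].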
Here's your sample data:
set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
Here are a few functions to compare:
funam <- function() t(apply(data, 3, c))
funrl <- function() {
  myl <- vector("list", d3)
  i <- 1
  for (c in 1:d2) {
    for (r in 1:d1) {
      i <- i + 1
      myl[[i]] <- data[r, c, ]
    }
  }
  do.call(cbind, myl)
}
funop <- function() {
  df <- data.frame(var1 = rep(0, d3))
  i <- 1
  for (c in 1:d2) {
    for (r in 1:d1) {
      i <- i + 1
      df[[i]] <- data[r, c, ]
    }
  }
  df[-1]
}
Here are the results of the timing:
system.time(am <- funam())
# user system elapsed
# 0.000 0.000 0.062
system.time(rl <- funrl())
# user system elapsed
# 3.980 0.000 1.375
system.time(op <- funop())
# user system elapsed
# 21.496 0.000 21.355
... and a comparison for equality:
all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
Here's an idea. A recommended read is The R Inferno by Patrick Burns (pun intended?).
myl <- vector("list", d3) # create an empty list
i <- 1
system.time(
  for (c in 1:d2) {
    for (r in 1:d1) {
      i <- i + 1
      myl[[i]] <- data[r, c, ]
    }
  })
#  user system elapsed
#   1.8    0.0     1.8
# bind each list element into a matrix, column-wise
do.call("cbind", myl)[1:5, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3394909 0.1266012 -0.4240452 0.2277654 -2.04943585
[2,] 1.6788653 -2.9381127 0.5781967 -0.7248759 -0.19482647
[3,] -0.6002371 -0.3132874 1.0895175 -0.2766891 -0.02109013
[4,] 0.5215603 -0.2805730 -1.0325867 -1.5373842 -0.14034565
[5,] 0.6063638 1.6027835 0.5711185 0.5410889 -1.77109124

matrix addition from an array r

I have an array with 272 matrices, each 2 by 2. I now want to sum these matrices using matrix addition, so the result should be a single 2 by 2 matrix. Here is some code I have used.
y <- as.matrix(faithful)
B <- matrix(c(0,0,0,0),nrow = 2)
sigma <- function(n = 272, u_new) {
  vec <- replicate(272, B)
  for (i in 1:n) {
    w <- (y - u_new)[i, ]
    x <- ptilde1[i] * (w %*% t(w))
    vec[,,i][1,1] <- x[1,1]
    vec[,,i][1,2] <- x[1,2]
    vec[,,i][2,1] <- x[2,1]
    vec[,,i][2,2] <- x[2,2]
  }
  vec
}
Here vec is the array with 272 matrices. Thank you in advance.
Here is code which loops 272 times and adds a matrix to a list on each iteration.
B <- matrix(c(0,0,0,0),nrow = 2)
list <- list(B)
for (i in 2:272) {
  list[[i]] <- B
}
To add them all together, you can use the Reduce() function:
sum <- Reduce('+', list)
> sum
[,1] [,2]
[1,] 0 0
[2,] 0 0
This is a contrived example because all the matrices are the zero matrix. I will leave it to you as a homework assignment to use the matrices you actually want to sum together.
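If the 272 matrices already live in a 3D array (as vec does in the question), the sum can also be taken directly, without building a list. A small sketch with a 2 x 2 x 3 toy array:

```r
arr <- array(1:12, c(2, 2, 3))

# Sum over the third dimension, keeping the first two
by_apply   <- apply(arr, c(1, 2), sum)
# Same result, usually faster: rowSums over the first 2 dimensions
by_rowsums <- rowSums(arr, dims = 2)

by_rowsums
#      [,1] [,2]
# [1,]   15   21
# [2,]   18   24
```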

multiply each matrix slice in a 3d array by a single matrix in R

Suppose I have a 1 x 13 matrix
mat <- matrix(1, 1, 13)
I also have an array that is 13 x 1000 x 10:
dfarray <- array(1:(13*1000*10), dim = c(13, 1000, 10))
Without looping, I want to return the result of this loop:
dfarray2=array(NA,dim=c(1,1000,10))
for (i in 1:10) {
  dfarray2[,,i] <- mat %*% dfarray[,,i]
}
One possibility: reshape dfarray into an ordinary matrix, multiply, and reshape back into a 3D array.
mat <- matrix(1, 1, 13)
dim(dfarray) <- c(13, 1000*10)
dfarray1 <- mat %*% dfarray
dim(dfarray1) <- c(1, 1000, 10)
all(dfarray1==dfarray2)
[1] TRUE
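An alternative sketch, trading a little speed for readability: apply over the third dimension and reshape. (This still loops internally over the slices, unlike the single big multiplication above.)

```r
mat <- matrix(1, 1, 13)
dfarray <- array(1:(13 * 1000 * 10), dim = c(13, 1000, 10))

# mat %*% slice for each of the 10 slices; apply simplifies the results
# to a 1000 x 10 matrix, which we reshape back to 1 x 1000 x 10
dfarray2b <- array(apply(dfarray, 3, function(s) mat %*% s),
                   dim = c(1, 1000, 10))

dfarray2b[1, 1, 1]   # sum of dfarray[, 1, 1], i.e. sum(1:13)
# [1] 91
```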