Efficiently compute the row sums of a 3d array in R - arrays

Consider the array a:
> a <- array(c(1:9, 1:9), c(3,3,2))
> a
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
, , 2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
How do we efficiently compute the row sums of the matrices indexed by the third dimension, such that the result is the following?
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
The column sums are easy via the 'dims' argument of colSums():
> colSums(a, dims = 1)
but I cannot find a way to use rowSums() on the array to achieve the desired result, as it has a different interpretation of 'dims' to that of colSums().
It is simple to compute the desired row sums using:
> apply(a, 3, rowSums)
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
but that is just hiding the loop. Are there other efficient, truly vectorised, ways of computing the required row sums?

@Fojtasek's answer, which mentioned splitting up the array, reminded me of the aperm() function, which allows one to permute the dimensions of an array. As colSums() works, we can swap the first two dimensions using aperm() and run colSums() on the output.
> colSums(aperm(a, c(2,1,3)))
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
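As a small sketch of why this works (base R only; the variable names are illustrative): colSums() with its default dims = 1 sums over the first dimension, so moving the column dimension to the front with aperm() makes colSums() add up across the original columns, which is the row sum within each slice.

```r
# Small non-symmetric test array so that a wrong permutation would show up
a <- array(1:24, c(2, 3, 4))

# aperm(a, c(2, 1, 3))[j, i, k] equals a[i, j, k], so the old columns
# become the first dimension, which is the one colSums() sums over
rs_aperm <- colSums(aperm(a, c(2, 1, 3)))

# Reference result from the apply() approach in the question
rs_apply <- apply(a, 3, rowSums)

all.equal(rs_aperm, rs_apply)
dim(rs_aperm)  # rows x slices, here 2 x 4
```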
Some comparison timings of this and the other suggested R-based answers:
> b <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rs1 <- apply(b, 3, rowSums))
user system elapsed
1.831 0.394 2.232
> system.time(rs2 <- rowSums3d(b))
user system elapsed
1.134 0.183 1.320
> system.time(rs3 <- sapply(1:dim(b)[3], function(i) rowSums(b[,,i])))
user system elapsed
1.556 0.073 1.636
> system.time(rs4 <- colSums(aperm(b, c(2,1,3))))
user system elapsed
0.860 0.103 0.966
So on my system the aperm() solution appears marginally faster:
> sessionInfo()
R version 2.12.1 Patched (2011-02-06 r54249)
Platform: x86_64-unknown-linux-gnu (64-bit)
However, rowSums3d() doesn't give the same answers as the other solutions:
> all.equal(rs1, rs2)
[1] "Mean relative difference: 0.01999992"
> all.equal(rs1, rs3)
[1] TRUE
> all.equal(rs1, rs4)
[1] TRUE

You could chop up the array into two dimensions, compute row sums on that, and then put the output back together the way you want it. Like so:
rowSums3d <- function(a){
  m <- matrix(a, ncol = ncol(a))
  rs <- rowSums(m)
  matrix(rs, ncol = 2)
}
> a <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rowSums3d(a))
user system elapsed
1.73 0.17 1.96
> system.time(apply(a, 3, rowSums))
user system elapsed
3.09 0.46 3.74
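One caveat worth checking: matrix(a, ncol = ncol(a)) fills column by column from the flattened array, so in general the rows of m do not line up with the rows of each slice, and the hard-coded ncol = 2 assumes exactly two slices. A minimal sketch of a variant that permutes before reshaping (rowSums3d_perm is a hypothetical name, not part of the original answer) could look like this:

```r
# Hypothetical corrected variant; the name rowSums3d_perm is illustrative
rowSums3d_perm <- function(a) {
  d <- dim(a)
  # Put the column dimension last, so each row of the reshaped matrix
  # corresponds to one (row, slice) pair of the original array
  m <- matrix(aperm(a, c(1, 3, 2)), ncol = d[2])
  # Fold the sums back into a rows x slices matrix
  matrix(rowSums(m), nrow = d[1], ncol = d[3])
}

a <- array(1:18, c(3, 3, 2))
all.equal(rowSums3d_perm(a), apply(a, 3, rowSums))
```

Because this variant also calls aperm(), there is no reason to expect it to be faster than calling colSums() on the permuted array directly; that is an expectation rather than a measured result.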

I don't know what the most efficient way of doing this is, but sapply() seems to do well:
a <- array(c(1:9, 1:9), c(3,3,2))
x1 <- sapply(1:dim(a)[3], function(i) rowSums(a[,,i]))
x1
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
x2 <- apply(a, 3, rowSums)
all.equal(x1, x2)
[1] TRUE
Which gives a speed improvement as follows:
> a <- array(c(1:250000, 1:250000),c(5000,5000,2))
> summary(replicate(10, system.time(rowSums3d(a))[3]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.784 2.799 2.810 2.814 2.821 2.862
> summary(replicate(10, system.time(apply(a, 3, rowSums))[3]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.730 2.755 2.766 2.776 2.788 2.839
> summary(replicate(10, system.time( sapply(1:dim(a)[3], function(i) rowSums(a[,,i])) )[3]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.840 1.852 1.867 1.872 1.893 1.914
Timings were done on:
# Ubuntu 10.10
# Kernel Linux 2.6.35-27-generic
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

If you have a multi-core system you could write a simple C function and make use of the OpenMP parallel threading library. I've done something similar for a problem of mine and I get an 8-fold increase on an 8-core system. The code will still work on a single-processor system and even compile on a system without OpenMP, perhaps with a smattering of #ifdef _OPENMP here and there.
Of course, it's only worth doing if you know that's what's taking most of the time. Do profile your code before optimising.

Related

Indexing a matrix by column/row when it might become length 1 [duplicate]

> a<-matrix(c(1:9),3,3)
> a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> a[3,]*a[,3] # I expect 1x1 matrix as result of this.
[1] 21 48 81
> class(a)
[1] "matrix"
> class(a[3,])
[1] "integer"
In R, a matrix with a single row or column is changed to a vector when it is subset. Can I avoid this?
I would like to keep a 1-D matrix as a matrix. Actually, I need to pass many kinds of matrices to RcppArmadillo, even zero-dimensional ones. The automatic conversion from matrix to vector is my problem.
This is an R FAQ. You need to do a[3,,drop = FALSE].
You're confusing element-by-element multiplication and matrix multiplication (see ?"*"). You want %*%:
> a[3,]%*%a[,3]
[,1]
[1,] 150
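Putting the two points together, a short sketch (the names row3, col3 and res are illustrative) shows that drop = FALSE keeps the dimensions and that %*% on the resulting matrices returns a 1 x 1 matrix:

```r
a <- matrix(1:9, 3, 3)

row3 <- a[3, , drop = FALSE]  # 1 x 3 matrix rather than a vector
col3 <- a[, 3, drop = FALSE]  # 3 x 1 matrix

dim(row3)
dim(col3)

# Matrix product of a 1 x 3 and a 3 x 1 matrix is a 1 x 1 matrix
res <- row3 %*% col3
dim(res)
res[1, 1]  # 3*7 + 6*8 + 9*9 = 150
```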

R - Indexing Array using Array

Let's assume I have an array x with dim(x) <- c(3,3,3). I also have a df or matrix with two columns containing the index combinations that I need.
When I pass x[df[[1]],df[[2]],] I get a VERY large array, which I then need to go through and take the diagonal of using the apply function. This is very memory- and time-inefficient. Is there some sort of shortcut (without using for loops) to index an array so that it returns the vector of values that the df asks for?
Trivial Example:
a <- array(1:27,dim = c(3,3,3))
df <- data.frame(c(1,2,2,1,3,2),c(2,3,2,1,3,2))
In this example, I would want to pass something like "a[df[[1]],df[[2]],]"
and get something like this (or transposed):
. [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 4 8 5 1 9 5
[2,] 13 17 14 10 18 14
[3,] 22 26 23 19 27 23
When I pass that expression now, I get a 3-d array of dim = c(6,6,3) as opposed to the more helpful dim = c(6,3). I can easily take apply(result, 3,diag) to get what I want, but when df>>6 it takes up a lot of space (like 750GB of space), throws warnings and errors, and stops execution before beginning.
This works
temp <- array(1:27, dim=c(3,3,3))
df <- data.frame(a=c(1,2,3), b=c(1,2,3), c=c(1,2,3))
temp[cbind(df[[1]], df[[2]], df[[3]])]
[1] 1 14 27
This is sometimes referred to as matrix indexing.
To query by two of the dimensions and leave the third one open, you might just use regular matrix subsetting. For example, to select the first and second row and second column for each of the "z" dimension matrices, you could use something like temp[1:2, 2,] or, from your dataset:
temp[1:2, 2,]
[,1] [,2] [,3]
[1,] 4 13 22
[2,] 5 14 23
temp[df[[1]][1:2], df[[2]][2], ]
[,1] [,2] [,3]
[1,] 4 13 22
[2,] 5 14 23
Which are of course identical.
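For the case in the question, where only the first two indices come from df and the third dimension should be kept whole, one possible sketch is to expand the index pairs across every slice and use matrix indexing once. The column names i and j and the helper variables n, k, idx and res are assumptions made for illustration.

```r
a <- array(1:27, dim = c(3, 3, 3))
df <- data.frame(i = c(1, 2, 2, 1, 3, 2), j = c(2, 3, 2, 1, 3, 2))

n <- nrow(df)      # number of (row, column) pairs requested
k <- dim(a)[3]     # number of slices along the third dimension

# One index row per (pair, slice) combination, pairs varying fastest
idx <- cbind(rep(df$i, times = k),
             rep(df$j, times = k),
             rep(seq_len(k), each = n))

# n x k matrix: res[r, s] is a[df$i[r], df$j[r], s]
res <- matrix(a[idx], nrow = n, ncol = k)
t(res)  # transposed to match the layout shown in the question
```

The index matrix has n * k rows, so memory grows linearly with the number of pairs instead of quadratically as with a[df[[1]], df[[2]], ].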

R subsetting and assigning in a multidimensional array

I am working in R with a 3D array. I am trying to use it as a set of 2D matrices for different time instants.
I have found a behavior that I really don't understand, and I would like to know why it is happening. I have tried to find an explanation here and in other places, but I am still in doubt.
I have my 3D array like this:
array3D=array(1:45,c(5,3,3))
As I expected, I can access an individual 2D matrix:
array3D[1,,]
[,1] [,2] [,3]
[1,] 1 16 31
[2,] 6 21 36
[3,] 11 26 41
However, when trying to access two 2D matrices, I don't get what I expect:
array3D[1:2,,]
, , 1
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
, , 2
[,1] [,2] [,3]
[1,] 16 21 26
[2,] 17 22 27
, , 3
[,1] [,2] [,3]
[1,] 31 36 41
[2,] 32 37 42
I have found that I can solve this using aperm(array3D[1:2,,]), but I don't understand what it is doing.
The other problem is when I try to do an assignment; I don't understand why this doesn't work:
array3D[1:2,,]=matrix(9:1,3,3)
array3D[1,,]
[,1] [,2] [,3]
[1,] 9 3 6
[2,] 7 1 4
[3,] 5 8 2
I think that I can solve this with a loop or maybe with aaply as I read here, but I think that if I want to work with 3D arrays it is really important to understand what is happening. If someone can point me in the right direction I will be really happy.
I have tried to find the answer here and by reading http://adv-r.had.co.nz/ but so far no luck.
Update
I have found that everything works if instead of using the first index I use the last one, but I still don't understand why.
Is this something inherent to R?
Is it possible to use the first one in some other way?
array3D=array(1:45,c(3,3,5))
array3D[,,1:2]=matrix(9:1,3,3)
array3D[,,2]
[,1] [,2] [,3]
[1,] 9 6 3
[2,] 8 5 2
[3,] 7 4 1
I think it's not quite clear what you want to achieve, but here are some examples:
On your first point, you can select two of the three three-by-three matrices in the z-direction by doing:
array3D[,,1:2]
And accordingly, you can replace with an array of appropriate size:
array3D[,,1:2] <- array(18:1,c(3,3,2))
About your question on why you have to use the third index: think about it like the z-direction in a 3D coordinate system. The rows would be the x-direction (vertical) and the columns the y-direction (horizontal). When indexing array3D[1:2,,] you selected the first two rows, while keeping everything in the y and z directions.
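If the time index really has to be the first dimension, one way to make the assignment behave is to build a replacement whose dimensions match the selected block, c(2, 3, 3). R stores arrays with the first index varying fastest, so a plain 3 x 3 matrix is recycled across the block in that order, which is what scrambles the result in the question. The following is a sketch under that reading; the names m and repl are illustrative.

```r
array3D <- array(1:45, c(5, 3, 3))
m <- matrix(9:1, 3, 3)

# array(m, c(3, 3, 2)) repeats m in two slices along the last dimension;
# aperm(..., c(3, 1, 2)) then moves that slice index to the front,
# giving a 2 x 3 x 3 block with repl[s, , ] equal to m for s = 1, 2
repl <- aperm(array(m, c(3, 3, 2)), c(3, 1, 2))

array3D[1:2, , ] <- repl

array3D[1, , ]  # now equal to m
array3D[2, , ]  # also equal to m
```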

Fast way to calculate product over one dimension of a 3D array in R

I have a three-dimensional array (e.g. dimensions = 4000 x 4000 x 2). Now I'd like to calculate the product over the third dimension to obtain a two-dimensional array (dimensions = 4000 x 4000) as a result.
I tried to calculate the product using prod() within the apply() function; however, this is quite time consuming. Thus, I am wondering if there is a faster and more efficient way to do such calculations.
The apply() approach:
A <- array(runif(4000*4000*2),dim=c(4000,4000,2))
system.time(apply(A, c(1,2), prod))
Here is a smaller example with array B:
B <- array(c(1,2,1,2,3,4,3,4),dim=c(2,2,2))
with the results B_res:
B_res <- array(c(3,8,3,8),dim=c(2,2))
Update:
As mentioned by @42-, this could be done by element-wise (manual) multiplication, like: B_res <- B[,,1]*B[,,2]. However, the size of the third dimension might range from 2 to x, so manually coding B[,,1]*B[,,2]... *B[,,x] might not be feasible. Here, calculating the product in a loop might be one possible solution:
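array_prod <- function(C){
  # Start from the first slice and multiply in the remaining ones
  C_res <- C[,,1]
  # seq_len(...)[-1] is empty when there is only one slice, unlike 2:1
  for(i in seq_len(dim(C)[3])[-1]){
    C_res <- C_res*C[,,i]
  }
  return(C_res)
}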
Here is a comparison of the three approaches (apply, manual element-wise and loop multiplication):
A <- array(runif(400*400*10),dim=c(400,400,10))
system.time(apply(A, c(1,2), prod)); system.time(A[,,1]*A[,,2]*A[,,3]*A[,,4]*A[,,5]*A[,,6]*A[,,7]*A[,,8]*A[,,9]*A[,,10]); system.time(array_prod(A))
user system elapsed
0.492 0.021 0.512
user system elapsed
0.031 0.000 0.032
user system elapsed
0.032 0.001 0.032
...which shows that the apply function is significantly slower than the other two approaches, which are about equally fast.
This demonstrates that elementwise array multiplication is accomplished using what in R is called a vectorised approach, by leaving the first two dimensions empty and using the * operator. You can also put TRUE to signify all instances of a particular dimension:
A <- array( 1:(4*4*2),dim=c(4,4,2))
apply(A, c(1,2), prod)
#============
[,1] [,2] [,3] [,4]
[1,] 17 105 225 377
[2,] 36 132 260 420
[3,] 57 161 297 465
[4,] 80 192 336 512
#=============
A[ , , 1]*A[ , , 2]
[,1] [,2] [,3] [,4]
[1,] 17 105 225 377
[2,] 36 132 260 420
[3,] 57 161 297 465
[4,] 80 192 336 512
And this shows the 100-fold improvement in performance (although I tired of waiting for the 4000x4000 version of apply to run, so I only show the results of the vectorised approach on that example):
> A <- array(runif(400*400*2),dim=c(400,400,2))
> system.time(apply(A, c(1,2), prod)); system.time(A[,,1]*A[,,2])
user system elapsed
0.448 0.018 0.452 # the apply timings
user system elapsed
0.005 0.000 0.004 # the vectorised operation
> A <- array(runif(4000*4000*2),dim=c(4000,4000,2))
> system.time(A[,,1]*A[,,2])
user system elapsed
0.525 0.096 0.604
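When the size of the third dimension is not known in advance, the same element-wise multiplication can be written without an explicit loop using Reduce(). This is a sketch rather than a benchmarked recommendation, and slice_prod is a hypothetical helper name.

```r
# Hypothetical helper: element-wise product over the third dimension
slice_prod <- function(A) {
  k <- dim(A)[3]
  # Split the array into a list of 2D slices, then multiply them pairwise
  slices <- lapply(seq_len(k), function(i) A[, , i])
  Reduce(`*`, slices)
}

B <- array(c(1, 2, 1, 2, 3, 4, 3, 4), dim = c(2, 2, 2))
slice_prod(B)

# Agrees with the apply() approach on a small random array
A <- array(runif(4 * 5 * 3), dim = c(4, 5, 3))
all.equal(slice_prod(A), apply(A, c(1, 2), prod))
```

Each step of Reduce() is still a single vectorised multiplication of two matrices, so the work done in R-level iteration scales with the number of slices, not with the number of cells.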

Extract the anti-diagonals from an array

I want to extract the anti-diagonals of an array
m=array(1:18,c(3,3,2))
My best shot
k=dim(m)[3]
mn=matrix(nrow = k, ncol = 3)
for (i in 1:k){
mn=diag(m[,,i][3:1,1:3])
}
This returns 12 14 16, the anti-diagonal of the second matrix in the array. I want to achieve this
[1] 3 5 7
[2] 12 14 16
I want the “anti-diags” as arrays
Manually diag(m[,,1][3:1,1:3]) and diag(m[,,2][3:1,1:3]) works fine, but the array I’m working with is dim(c(3,3,22)), so I thought "loop!"
MQ: How to extract the anti-diagonals from an array using the loop? (better and elegant solutions are more than welcome)
This should work:
mn <- array(NA, dim=dim(m))
for (i in 1:dim(m)[3]){
  mn[,,i]=diag(m[,,i][cbind(3:1,1:3)])
}
It was unclear whether you want the "anti-diag" to become the new diag, but that is what your code suggested as the intent. The form matrix[cbind(vec1,vec2)] pulls the (R,C) referenced elements from the matrix.
If you do not want them as arrays then this is an alternate result:
mn <- array(NA, dim=c(2,3))
for (i in 1:dim(m)[3]){
  mn[i,]=m[,,i][cbind(3:1,1:3)]
}
mn
[,1] [,2] [,3]
[1,] 3 5 7
[2,] 12 14 16
This is a loopless way of getting the same values:
m[cbind( rep(3:1,2), rep(1:3,2), rep(1:2,each=3)) ]
[1] 3 5 7 12 14 16
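The same matrix-indexing idea can be wrapped up for any number of slices, for example the 3 x 3 x 22 array mentioned in the question. The sketch below assumes square slices, and anti_diag is a hypothetical helper name.

```r
# Hypothetical helper: anti-diagonal of every n x n slice, one row per slice
anti_diag <- function(m) {
  n <- dim(m)[1]   # assumes dim(m)[1] == dim(m)[2]
  k <- dim(m)[3]
  # Rows run n..1 while columns run 1..n, repeated for each slice
  idx <- cbind(rep(rev(seq_len(n)), times = k),
               rep(seq_len(n), times = k),
               rep(seq_len(k), each = n))
  # m[idx] lists slice 1 first, then slice 2, so fill the result by row
  matrix(m[idx], nrow = k, byrow = TRUE)
}

m <- array(1:18, c(3, 3, 2))
anti_diag(m)
```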
You could use lapply across the third dimension and extract the anti-diagonal by first rotating the matrix (see this great answer), that is, reversing the order of the rows and transposing, and then taking the diagonal of that. Basically like this...
out <- lapply( 1:dim(m)[3] , function(x) diag( t( apply( m[,,x] , 2 , rev ) ) ) )
[[1]]
[1] 3 5 7
[[2]]
[1] 12 14 16
If you need them glued together as an array then use do.call...
do.call( rbind , out )
[,1] [,2] [,3]
[1,] 3 5 7
[2,] 12 14 16
In this particular case, a for loop will be much quicker (benchmark it) and you should use @DWin's answer.
It occurs to me that we can simplify this a bit and avoid using lists and a bad use of lapply (one that assumes that m is available outside the scope of lapply), because we can also simply apply across the third dimension of your matrices. So we can apply once to rotate the matrices, then take the diag of each rotated matrix like so...
rotM <- apply( m , 2:3 , rev )
out <- t( apply( rotM , 3 , diag ) )
[,1] [,2] [,3]
[1,] 3 5 7
[2,] 12 14 16
