Quickest way to find closest elements in an array in R

I would like to find the fastest way in R to identify the indexes of the elements in the Ytimes array that are closest to given Xtimes values.
So far I have been using a simple for loop, but there must be a better way to do it:
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)
YmatchIndex <- array(0, length(Xtimes))
for (i in 1:length(Xtimes)) {
  YmatchIndex[i] <- which.min(abs(Ytimes - Xtimes[i]))
}
print(Ytimes[YmatchIndex])

Obligatory Rcpp solution. Takes advantage of the fact that your vectors are sorted and don't contain duplicates to turn an O(n^2) into an O(n). May or may not be practical for your application ;)
C++:
#include <Rcpp.h>
#include <cmath>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector closest_pts(NumericVector Xtimes, NumericVector Ytimes) {
  int xsize = Xtimes.size();
  int ysize = Ytimes.size();
  int y_ind = 0;
  double minval = R_PosInf;
  IntegerVector output(xsize);

  for (int x_ind = 0; x_ind < xsize; x_ind++) {
    // Walk forward while the distance keeps shrinking; the bounds check
    // prevents reading past the end of Ytimes when the last match sits
    // at the end of the vector.
    while (y_ind < ysize && std::abs(Ytimes[y_ind] - Xtimes[x_ind]) < minval) {
      minval = std::abs(Ytimes[y_ind] - Xtimes[x_ind]);
      y_ind++;
    }
    // y_ind is now one past the closest element, i.e. its 1-based index
    output[x_ind] = y_ind;
    minval = R_PosInf;
  }
  return output;
}
R:
microbenchmark::microbenchmark(
  for_loop = {
    for (i in 1:length(Xtimes)) {
      which.min(abs(Ytimes - Xtimes[i]))
    }
  },
  apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
  fndIntvl = {
    Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
    Ytimes[ findInterval(Xtimes, Y2) ]
  },
  rcpp = closest_pts(Xtimes, Ytimes),
  times = 100
)
Unit: microseconds
     expr      min      lq     mean   median       uq      max neval cld
 for_loop 3321.840 3422.51 3584.452 3492.308 3624.748 10458.52   100   b
    apply   68.365   73.04  106.909   84.406   93.097  2345.26   100  a
 fndIntvl   31.623   37.09   50.168   42.019   64.595   105.14   100  a
     rcpp    2.431    3.37    5.647    4.301    8.259    10.76   100  a
identical(closest_pts(Xtimes, Ytimes), findInterval(Xtimes, Y2))
# TRUE

R is vectorized, so skip the for loop. This saves time in scripting and computation. Simply replace the for loop with an apply function. Since we're returning a 1D vector, we use sapply.
YmatchIndex <- sapply(Xtimes, function(x){which.min(abs(Ytimes - x))})
Proof that apply is faster:
library(microbenchmark)
library(ggplot2)
# set up data
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)
# time it
mbm <- microbenchmark(
for_loop = for (i in 1:length(Xtimes)) {
YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
},
apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
times = 100
)
# plot
autoplot(mbm)
See ?apply for more.
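As a small aside (my addition, not from the original answer): if you want the same one-liner with type checking, vapply is a drop-in alternative to sapply. A minimal sketch using the question's data:

```r
Xtimes <- c(1, 5, 8, 10, 15, 19, 23, 34, 45, 51, 55, 57, 78, 120)
Ytimes <- seq(0, 120, length.out = 1000)

# vapply works like sapply but verifies each result is a single integer,
# so a malformed element fails fast instead of silently returning a list
YmatchIndex <- vapply(Xtimes, function(x) which.min(abs(Ytimes - x)), integer(1))
```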

We can use findInterval to do this efficiently. (cut will also work, with a little more work).
First, let's offset the Ytimes values so that we find the nearest element rather than the next-lesser one. I'll demonstrate on fake data first:
y <- c(1,3,5,10,20)
y2 <- c(-Inf, y + c(diff(y)/2, Inf))
cbind(y, y2[-1])
# y
# [1,] 1 2.0
# [2,] 3 4.0
# [3,] 5 7.5
# [4,] 10 15.0
# [5,] 20 Inf
findInterval(c(1, 1.9, 2.1, 8), y2)
# [1] 1 1 2 4
The second column (prepended with a -Inf) gives us the breaks. Notice that each break is half-way between the corresponding value and its follower.
Okay, let's apply this to your vectors:
Y2 <- Ytimes + c(diff(Ytimes)/2, Inf)
head(cbind(Ytimes, Y2))
# Ytimes Y2
# [1,] 0.0000000 0.06006006
# [2,] 0.1201201 0.18018018
# [3,] 0.2402402 0.30030030
# [4,] 0.3603604 0.42042042
# [5,] 0.4804805 0.54054054
# [6,] 0.6006006 0.66066066
Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
cbind(Xtimes, Y2[ findInterval(Xtimes, Y2) ])
# Xtimes
# [1,] 1 0.9009009
# [2,] 5 4.9849850
# [3,] 8 7.9879880
# [4,] 10 9.9099099
# [5,] 15 14.9549550
# [6,] 19 18.9189189
# [7,] 23 22.8828829
# [8,] 34 33.9339339
# [9,] 45 44.9849850
# [10,] 51 50.9909910
# [11,] 55 54.9549550
# [12,] 57 56.9969970
# [13,] 78 77.8978979
# [14,] 120 119.9399399
(I'm using cbind just for side-by-side demonstration, not that it's necessary.)
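Note that if you want the indices themselves (what the original loop stored in YmatchIndex) rather than the matched values, the findInterval result is already that index vector. A quick check against the which.min approach, using the question's data:

```r
Xtimes <- c(1, 5, 8, 10, 15, 19, 23, 34, 45, 51, 55, 57, 78, 120)
Ytimes <- seq(0, 120, length.out = 1000)

# breaks halfway between consecutive Ytimes values, with -Inf in front
Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
YmatchIndex <- findInterval(Xtimes, Y2)

# agrees with the original loop's which.min() indices
identical(YmatchIndex, sapply(Xtimes, function(x) which.min(abs(Ytimes - x))))
```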
Benchmark:
mbm <- microbenchmark::microbenchmark(
for_loop = {
YmatchIndex <- array(0,length(Xtimes))
for (i in 1:length(Xtimes)) {
YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
}
},
apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
fndIntvl = {
Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
Ytimes[ findInterval(Xtimes, Y2) ]
},
times = 100
)
mbm
# Unit: microseconds
# expr min lq mean median uq max neval
# for_loop 2210.5 2346.8 2823.678 2444.80 3029.45 7800.7 100
# apply 48.8 58.7 100.455 65.55 91.50 2568.7 100
# fndIntvl 18.3 23.4 34.059 29.80 40.30 83.4 100
ggplot2::autoplot(mbm)

Related

Arranging a 3 dimensional contingency table in R in order to run a Cochran-Mantel-Haenszel analysis?

I am attempting to run a Mantel-Haenszel analysis in R to determine whether or not a comparison of proportions test is still significant when accounting for a 'diagnosis' ratio within groups. This test is available in the stats package.
library(stats)
mantelhaen.test(x)
Having done some reading, I've found that this test can perform an odds ratio test on a contingency table that is n x n x k, as opposed to simply n x n. However, I am having trouble arranging my data in the proper way, as I am fairly new to R. I have created some example data...
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
ex.data <- data.frame(ex.label,ex.diag,ex.status)
Which looks like this...
ex.label ex.diag ex.status
1 A X +
2 A X +
3 A Z -
4 A Y +
5 A Y -
6 A Y -
7 A X -
8 B Y +
9 B Z +
10 B Z -
I was originally able to use a simple N-1 chi-square to run a comparison of proportions test of + to - for only the A and B, but now I want to be able to account for the ex.diag as well. I'll show a graph here for what I wanted to be looking at, which is basically to compare the significance of the ratio in each column. I was able to do this, but I now want to be able to account for ex.diag.
I tried to use the ftable() function to arrange my data in a way that would work.
ex.ftable <- ftable(ex.data)
Which looks like this...
ex.status - +
ex.label ex.diag
A X 1 2
Y 2 1
Z 1 0
B X 0 0
Y 0 1
Z 1 1
However, when I run mantelhaen.test(ex.ftable), I get the error 'x' must be a 3-dimensional array. How can I arrange my data in such a way that I can actually run this test?
In mantelhaen.test the last dimension of the 3-dimensional contingency table x needs to be the stratification variable (ex.diag). This matrix can be generated as follows:
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
# Now ex.diag is in the first column
ex.data <- data.frame(ex.diag, ex.label, ex.status)
# The flat table
( ex.ftable <- ftable(ex.data) )
# ex.status - +
# ex.diag ex.label
# X A 1 2
# B 0 0
# Y A 2 1
# B 0 1
# Z A 1 0
# B 1 1
The 3D matrix can be generated using aperm.
# Transform the ftable into a 2 x 2 x 3 array
# First dimension: ex.label
# Second dimension: ex.status
# Third dimension: ex.diag
( mtx3D <- aperm(array(t(as.matrix(ex.ftable)),c(2,2,3)),c(2,1,3)) )
# , , 1
#
# [,1] [,2]
# [1,] 1 2
# [2,] 0 0
#
# , , 2
#
# [,1] [,2]
# [1,] 2 1
# [2,] 0 1
#
# , , 3
#
# [,1] [,2]
# [1,] 1 0
# [2,] 1 1
Now the Cochran-Mantel-Haenszel chi-squared test can be performed.
# Cochran-Mantel-Haenszel chi-squared test of the null that
# two nominal variables are conditionally independent in each stratum
#
mantelhaen.test(mtx3D, exact=FALSE)
The result of the test is:
Mantel-Haenszel chi-squared test with continuity correction
data: mtx3D
Mantel-Haenszel X-squared = 0.23529, df = 1, p-value = 0.6276
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
NaN NaN
sample estimates:
common odds ratio
Inf
Given the low number of cases, it is preferable to compute an exact conditional test (option exact=TRUE).
mantelhaen.test(mtx3D, exact=T)
# Exact conditional test of independence in 2 x 2 x k tables
#
# data: mtx3D
# S = 4, p-value = 0.5
# alternative hypothesis: true common odds ratio is not equal to 1
# 95 percent confidence interval:
# 0.1340796 Inf
# sample estimates:
# common odds ratio
# Inf
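As an aside (my addition, not part of the original answer): base R's table() can build the required n x n x k contingency array in one step, with the stratification variable last, which avoids the ftable/aperm gymnastics:

```r
ex.label  <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag   <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")

# the last argument becomes the last (stratification) dimension,
# exactly the layout mantelhaen.test() expects
mtx3D <- table(ex.label, ex.status, ex.diag)
dim(mtx3D)  # 2 2 3
mantelhaen.test(mtx3D, exact = TRUE)
```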

R: "Error in matrix[i, 1] : incorrect number of dimensions"

With DV and FF as vectors of length 47:
analyse <- function(DV,FF)
{
correct <- rep(0,47)
matrix <- array(rep(0,47*3), dim=c(47,3))
for(i in 1:47)
{
if(DV[i] > 50) {if(FF[i] > 50) {correct[i] <- i}
}
else
{
if(FF[i] < 0){correct[i] <- i}}
}
for(i in 1:47)
{
if((correct[i] == 0) == FALSE)
{
matrix[i,1] <- DV[i]
matrix[i,2] <- FF[i]
matrix[i,3] <- matrix[i,1] - matrix[i,2]
}
}
for(i in 47:1)
{
if(matrix[i,1]==0) {matrix<-matrix[-i]
}
}
return(matrix)
}
I do not understand why I am getting this error:
Error in matrix[i, 1] : incorrect number of dimensions
Thanks in advance
[edit] sample data:
DV <- c(56.2, 59.2, 50.9, 46.9, 50.7, 47.3, 53.6, 57.8, 42.7, 45.0, 47.3, 44.1, 51.5, 50.0, 50.3, 50.4, 51.7, 47.8, 46.8, 40.0, 45.5, 57.4, 51.6, 36.1, 34.8, 41.2, 59.1, 62.5, 55.0, 53.8, 52.4, 44.5, 42.2, 50.1, 61.3, 49.6, 38.2, 51.1, 44.7, 40.8, 46.1, 53.5, 54.7, 50.3, 48.8, 53.7, 52.0)
FF <- c(49.95662, 51.93295, 53.02263, 50.00784, 48.55493, 49.93520, 48.70022, 50.98856, 52.51411, 47.02938, 47.86480, 48.70022, 47.53790, 50.22578, 49.68094, 49.78991, 49.82623, 50.29842, 48.88184, 48.51861, 46.04866, 48.04641, 52.36882, 50.26210, 44.63208, 44.15988, 46.48454, 52.98631, 54.22128, 51.49707, 51.06120, 50.55268, 47.68319, 46.84776, 49.71726, 53.78541, 49.53565, 45.39485, 50.08049, 47.75583, 46.33925, 48.26435, 50.95223, 51.38811, 49.78991, 49.24506, 51.02488)
[edit] result:
The goal of the function is to obtain a matrix which contains:
- every pair DV[i] and FF[i] whose values are not both higher (or both lower) than 50;
- their difference as a third column.
example:
DV[1] = 55
FF[1] = 45
DV > 50 and FF < 50, so I report them in the matrix:
DV[1] -> matrix [1,1]
FF[1] -> matrix[1,2]
Third column being their difference:
matrix[1,3] <- matrix[1,1] - matrix[1,2].
With DV[2] = 55 and FF[2] = 55, analyse() does nothing because they're both higher than 50.
You can replace the final for loop with a vectorized solution:
analyse <- function(DV,FF)
{
correct <- rep(0,47)
matrix <- array(rep(0,47*3),dim=c(47,3))
for(i in 1:47)
{
if( DV[i] > 50 ) {
if( FF[i] > 50) {
correct[i] <- i
}
}
else {
if( FF[i] < 0) {
correct[i] <- i}
}
}
for(i in 1:47)
{
if( (correct[i] == 0) == FALSE)
{
matrix[i,1] <- DV[i]
matrix[i,2] <- FF[i]
matrix[i,3] <- matrix[i,1] - matrix[i,2]
}
}
matrix <- matrix[ matrix[,1] != 0, ]
return(matrix)
}
analyse(DV, FF)
# [,1] [,2] [,3]
# [1,] 59.2 51.93295 7.26705
# [2,] 50.9 53.02263 -2.12263
# [3,] 57.8 50.98856 6.81144
# [4,] 51.6 52.36882 -0.76882
# [5,] 62.5 52.98631 9.51369
# [6,] 55.0 54.22128 0.77872
# [7,] 53.8 51.49707 2.30293
# [8,] 52.4 51.06120 1.33880
# [9,] 54.7 50.95223 3.74777
# [10,] 50.3 51.38811 -1.08811
# [11,] 52.0 51.02488 0.97512
But as you mentioned, this is inefficient. There is no need for loops. The below function provides identical output.
analyse2 <- function(DV, FF) {
indx <- (DV > 50 & FF > 50) | FF < 0
dif <- DV[indx] - FF[indx]
matrix(c(DV[indx], FF[indx], dif), ncol=3)
}
analyse2(DV, FF)
# [,1] [,2] [,3]
# [1,] 59.2 51.93295 7.26705
# [2,] 50.9 53.02263 -2.12263
# [3,] 57.8 50.98856 6.81144
# [4,] 51.6 52.36882 -0.76882
# [5,] 62.5 52.98631 9.51369
# [6,] 55.0 54.22128 0.77872
# [7,] 53.8 51.49707 2.30293
# [8,] 52.4 51.06120 1.33880
# [9,] 54.7 50.95223 3.74777
# [10,] 50.3 51.38811 -1.08811
# [11,] 52.0 51.02488 0.97512
all.equal(analyse(DV, FF), analyse2(DV, FF))
[1] TRUE
edit
Based on your description, you want the values less than 50 to be reported to the matrix, not the values greater than 50 as you made in your original function. This is an edited function with an exclamation point added to the second line.
analyse2 <- function(DV, FF) {
indx <- (!DV > 50 & FF > 50) | FF < 0
dif <- DV[indx] - FF[indx]
matrix(c(DV[indx], FF[indx], dif), ncol=3)
}
analyse2(DV, FF)
# [,1] [,2] [,3]
# [1,] 46.9 50.00784 -3.10784
# [2,] 42.7 52.51411 -9.81411
# [3,] 50.0 50.22578 -0.22578
# [4,] 47.8 50.29842 -2.49842
# [5,] 36.1 50.26210 -14.16210
# [6,] 44.5 50.55268 -6.05268
# [7,] 49.6 53.78541 -4.18541
# [8,] 44.7 50.08049 -5.38049
edit 2
With either both greater or lesser than 50.
analyse3 <- function(DV, FF) {
indx <- !( (DV > 50 & FF > 50) | (DV < 50 & FF < 50) )
dif <- DV[indx] - FF[indx]
matrix(c(DV[indx], FF[indx], dif), ncol=3)
}
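For what it's worth (my addition, not from the original answer), the "opposite sides of 50" condition in analyse3 can also be expressed with xor, which reads closer to the problem statement. Note the edge case: a value exactly equal to 50 counts as "not greater than 50" here.

```r
# illustrative data, made up for this sketch
DV <- c(56, 45, 52, 48)
FF <- c(48, 52, 55, 44)

analyse4 <- function(DV, FF) {
  # TRUE exactly when DV and FF fall on opposite sides of 50
  indx <- xor(DV > 50, FF > 50)
  matrix(c(DV[indx], FF[indx], DV[indx] - FF[indx]), ncol = 3)
}

analyse4(DV, FF)
# keeps (56, 48) and (45, 52); drops the pairs on the same side of 50
```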

List of n first Neighbors from a 3d Array R

Lets say we have a 3d array:
my.array <- array(1:27, dim=c(3,3,3))
I would like to create a list of the n first neighbors.
Example: Lets get my.array[2,2,2]=14, so the first neighbors of 14 is:
list[14] = [1 to 27] - 14
I also would like to do the same for second, third, n closest neighbors using R, C or Matlab.
Thanks
Based on the comments, I assume you are defining "first nearest neighbor" as all cells with a euclidean distance of 1 or less (excluding self), "second nearest neighbors" as those with 2 or less, etc. Given your assertion in a comment on @evan058's answer that "for (1,1,1) the first level neighbors is 2,4,5,10,11,13", I'm interpreting this to include the immediate diagonals (with a distance of 1.414) but not further diagonals (in your example, 14 would be a further diagonal with a distance of 1.732).
This function accepts either a pre-defined array (ary) or the dimensions to make one (dims).
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 1)
# dim1 dim2 dim3
# [1,] 2 1 1
# [2,] 1 2 1
# [3,] 1 1 2
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 1,
return_indices = FALSE)
# [1] 2 4 10
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 2,
return_indices = FALSE)
# [1] 2 3 4 5 7 10 11 13 14 19
nearestNeighbors(ary = array(27:1, dim = c(3,3,3)), elem = c(1,1,1), dist = 2)
# dim1 dim2 dim3
# [1,] 2 1 1
# [2,] 3 1 1
# [3,] 1 2 1
# [4,] 2 2 1
# [5,] 1 3 1
# [6,] 1 1 2
# [7,] 2 1 2
# [8,] 1 2 2
# [9,] 2 2 2
# [10,] 1 1 3
nearestNeighbors(ary = array(27:1, dim = c(3,3,3)), elem = c(1,1,1), dist = 2,
return_indices = FALSE)
# [1] 26 25 24 23 21 18 17 15 14 9
The function:
#' Find nearest neighbors.
#'
#' @param ary array
#' @param elem integer vector indicating the indices on the array from
#'   which all nearest neighbors will be found; must be the same
#'   length as \code{dims} (or \code{dim(ary)}). Only one of
#'   \code{ary} and \code{dims} needs to be provided.
#' @param dist numeric, the max distance from \code{elem}, not
#'   including the 'self' point.
#' @param dims integer vector indicating the dimensions of the array.
#'   Only one of \code{ary} and \code{dims} needs to be provided.
#' @param return_indices logical, whether to return a matrix of
#'   indices (as many columns as dimensions) or the values from
#'   \code{ary} of the nearest neighbors
#' @return either a matrix of indices (one column per dimension) if
#'   \code{return_indices == TRUE}, or the appropriate values in
#'   \code{ary} otherwise.
nearestNeighbors <- function(ary, elem, dist, dims, return_indices = TRUE) {
if (missing(dims)) dims <- dim(ary)
tmpary <- array(1:prod(dims), dim = dims)
if (missing(ary)) ary <- tmpary
if (length(elem) != length(dims))
stop("'elem' needs to have the same length as the dimensions of 'ary'")
# work on a subset of the whole matrix
usedims <- mapply(function(el, d) {
seq(max(1, el - dist), min(d, el + dist))
}, elem, dims, SIMPLIFY=FALSE)
df <- as.matrix(do.call('expand.grid', usedims))
# now, df is only as big as we need to possibly satisfy `dist`
ndist <- sqrt(apply(df, 1, function(x) sum((x - elem)^2)))
ret <- df[which(ndist > 0 & ndist <= dist),,drop = FALSE]
if (return_indices) {
return(ret)
} else {
return(ary[ret])
}
}
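As a quick sanity check (my addition, not part of the answer), a brute-force version that scores every cell in the array should agree with the function above; for a 3x3x3 array and dist = 2 it finds the same 10 neighbors of (1,1,1) shown earlier:

```r
# brute force: enumerate every cell, keep those with
# 0 < euclidean distance <= dist from elem
bruteNeighbors <- function(dims, elem, dist) {
  grid <- as.matrix(expand.grid(lapply(dims, seq_len)))
  d <- sqrt(colSums((t(grid) - elem)^2))
  grid[d > 0 & d <= dist, , drop = FALSE]
}

nrow(bruteNeighbors(c(3, 3, 3), c(1, 1, 1), 2))  # 10, matching the answer
```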
Edit: changed the code for a "slight" speed improvement: using a 256x256x256 array and a distance of 2 previously took ~90 seconds on my machine. Now it takes less than 1 second. Even a distance of 5 (same array) takes less than a second. Not fully tested, please verify it is correct.
Edit: Removed the extra { on the fiftieth line of the function.
I think something along these lines will do the trick:
library(plyr)  # for adply() and rbind.fill()
nClosest <- function(pts, pt, n)
{
# Get the target value
val <- pts[pt[1], pt[2], pt[3]]
# Turn the matrix into a DF
ptsDF <- adply(pts, 1:3)
# Create Dist column for distance to val
ptsDF$Dist <- abs(ptsDF$V1 - val)
# Order by the distance to val
ptsDF <- ptsDF[with(ptsDF, order(Dist)),]
# Split into groups:
sp <- split(ptsDF, ptsDF$Dist)
# Get max index
topInd = min(n+1, length(sp))
# Agg the split dfs into a single df
rbind.fill(sp[2:topInd])
}
Output:
> nClosest(my.array, c(1,2,2), 3)
X1 X2 X3 V1 Dist
1 3 1 2 12 1
2 2 2 2 14 1
3 2 1 2 11 2
4 3 2 2 15 2
5 1 1 2 10 3
6 1 3 2 16 3

R : How to fill an array with this peculiar calculation?

I would like to fill an array D with a loop, and only with a loop (please), where my data are structured in this particular way:
A <- data.frame(matrix(nrow=12,ncol=10))
c_2 <- c(0.003,0.004)
an <- sapply(c_2,function(x) x*c(1:12))
B <-array(an,c(12,1,2))
set.seed(1)
C<- rnorm(10,0.6,0.1)
D <- array(NA,c(12,1,20))
f_12 <- exp(c(0:11)/12)
for (k in 1:length(A)){
for (i in 1:dim(B)[3]){
for (z in 1:length(C)){
M_nat <- C[z]
A[,z] <- f_12*M_nat
ris_1 <- A[,k]
cost_1 <- B[,,i]
prov_1 <- cost_1*ris_1
D[,,k*i] <- prov_1
}
}
}
My expected result is an array D where each [,,z] slice is a result computed from B[,,1] and B[,,2] for each column of A (the A in the loop).
With the above code, the result is an array where the first ten z slices are full, and after that some have values and others are NA. Where did I go wrong?
outer(1:10, 1:2, "*") tells you which indices you can fill with D[,,k*i]:
# [,1] [,2]
# [1,] 1 2
# [2,] 2 4
# [3,] 3 6
# [4,] 4 8
# [5,] 5 10
# [6,] 6 12
# [7,] 7 14
# [8,] 8 16
# [9,] 9 18
#[10,] 10 20
These are the ones not possible:
(1:20)[!(1:20 %in% outer(1:10, 1:2, "*"))]
#[1] 11 13 15 17 19
And indeed, those elements are not filled in D. Note that you filled some elements more than once.
You could use (i-1) * 10 + k instead of k*i.
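A quick check that the corrected index hits every slice exactly once, using the same outer() trick as above:

```r
# with k = 1..10 columns and i = 1..2 slices, (i - 1) * 10 + k
# enumerates each of 1:20 exactly once
idx <- outer(1:10, 1:2, function(k, i) (i - 1) * 10 + k)
all(sort(c(idx)) == 1:20)  # TRUE
```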

R: Convert 3darray[i, j, ] to columns of df, fast and readable

I'm working with 3-dimensional arrays and want to have slices along the
third dimension for each position in the first two dimensions as columns in a data frame.
I also want my code to be readable for people who don't use R regularly.
Looping over the first two dimensions is very readable but slow (30 secs for the example below), while the permute-flatten-shape-to-matrix approach
is faster (14 secs) but not so readable.
Any suggestions for a nice solution?
Reproducible example here:
# Create data
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
# Idea 1: Loop
df <- data.frame(var1 = rep(0, d3))
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
})
# Idea 2: Permute dimension of array first
df2 <- data.frame(var1 = rep(0, d3))
system.time({
data.perm <- aperm(data, c(3, 1, 2))
df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)}
)
identical(df, df2)
I would suggest a much more simple approach:
t(apply(data, 3, c))
I hope it suits your expectations of being fast and readable.
fast, as demonstrated in the timings below.
readable because it's a basic apply statement. All that is being done is using c to convert the matrix in each third dimension to a single vector in each third dimension, which then simplifies to a two-dimensional array. The result just needs to be transposed....
Here's your sample data:
set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
Here are a few functions to compare:
funam <- function() t(apply(data, 3, c))
funrl <- function() {
myl <- vector("list", d3)
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
}
do.call(cbind, myl)
}
funop <- function() {
df <- data.frame(var1 = rep(0, d3))
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
}
df[-1]
}
Here are the results of the timing:
system.time(am <- funam())
# user system elapsed
# 0.000 0.000 0.062
system.time(rl <- funrl())
# user system elapsed
# 3.980 0.000 1.375
system.time(op <- funop())
# user system elapsed
# 21.496 0.000 21.355
... and a comparison for equality:
all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
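Since the question ultimately wants a data frame, the one-liner can be wrapped in as.data.frame. A minimal sketch on a small array (dimensions are mine, chosen so the result is easy to inspect):

```r
set.seed(1)
d1 <- 2; d2 <- 3; d3 <- 4
data <- array(rnorm(d1 * d2 * d3), dim = c(d1, d2, d3))

# each column holds data[r, c, ] for one (r, c) position,
# with r varying fastest (column-major order)
df <- as.data.frame(t(apply(data, 3, c)))
dim(df)  # 4 6
```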
Here's an idea. Recommended read would be The R Inferno by Patrick Burns (pun intended?).
myl <- vector("list", d3) # create an empty list
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
})
user system elapsed
1.8 0.0 1.8
# bind each list element into a matrix, column-wise
do.call("cbind", myl)[1:5, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3394909 0.1266012 -0.4240452 0.2277654 -2.04943585
[2,] 1.6788653 -2.9381127 0.5781967 -0.7248759 -0.19482647
[3,] -0.6002371 -0.3132874 1.0895175 -0.2766891 -0.02109013
[4,] 0.5215603 -0.2805730 -1.0325867 -1.5373842 -0.14034565
[5,] 0.6063638 1.6027835 0.5711185 0.5410889 -1.77109124
