retain array class when operation results in 2-dimensional matrix

I have an array that can have one or more pages or sheets (my names for the third dimension). I am attempting to perform operations on the array. When there is only one sheet or page the result of the operation is a matrix. I would like the result to be an array. Is there a way to retain the class array even when the result of the operation has only 1 sheet or page?
Here is an example. I would like my.var.2 and my.var.3 to be arrays. The variable my.pages is set to 1 here, which seems to be causing the problem. However, my.pages can be >1. If my.pages <- 2 then my.var.2 and my.var.3 are arrays.
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.1
my.var.2 <- 2 * my.var.1[,-my.columns,]
my.var.3 <- 10 * my.var.1[,-1,]
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array(my.var.2)
my.var.3 <- as.array(my.var.3)
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array( 2 * my.var.1[,-my.columns,])
my.var.3 <- as.array(10 * my.var.1[,-1,] )
class(my.var.2)
class(my.var.3)
The switch to matrix causes problems when I try to use my.var.1 and my.var.2 in nested for-loops.
The following if statement seems to solve the problem, but also seems a little clunky. Is there a more elegant solution?
if(my.pages == 1) {my.var.2 <- array(my.var.2, c(my.rows,(my.columns-1),my.pages))}

From help("["):
Usage:
x[i, j, ... , drop = TRUE]
...
drop: For matrices and arrays. If 'TRUE' the result is coerced to
the lowest possible dimension (see the examples). This only
works for extracting elements, not for the replacement. See
'drop' for further details.
Your code, revisited:
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.2 <- 2 * my.var.1[,-my.columns,,drop=FALSE]
my.var.3 <- 10 * my.var.1[,-1,,drop=FALSE]
class(my.var.2)
## [1] "array"
class(my.var.3)
## [1] "array"
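To see what drop = FALSE changes, here is a minimal sketch on a small toy array. The names a, dropped and kept and the dimensions are made up for illustration. It also shows a page loop that works whether there is one page or several.

```r
# Toy array with a single page: 3 rows, 4 columns, 1 page
a <- array(1:12, dim = c(3, 4, 1))

dropped <- a[, -1, ]                # default drop = TRUE
kept    <- a[, -1, , drop = FALSE]  # keep all three dimensions

dim(dropped)  # [1] 3 3
dim(kept)     # [1] 3 3 1

# Looping over pages works for any number of pages, including one
for (k in seq_len(dim(kept)[3])) {
  page <- kept[, , k]   # the 3 x 3 matrix for page k
  print(dim(page))
}
```

Because kept always has three dimensions, code that indexes with [i, j, k] does not need a special case for my.pages == 1.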

Related

R Accessing vector inside list inside Array

I have a very large array (1955x2417x1) in R where each position stores a list of two vectors (named "max" and "min") of length 5.
I would like to find a simple way to create a multidimensional array (dim 1955x2417x5) where each position holds a single value from the vector "max".
I have looked at answers such as array of lists in r
but so far without success.
I know I can access the list in each position of the array using
myarray[posX, PosY][[1]][["max"]]
but how to apply that to the whole Array?
So far I have tried
newArray <- array( unlist(myarray[][[1]][["max"]]), c(1955, 2417, 5))
and
NewArray <-parApply(cl, myarray, c(1:2), function(x) {
a=x[[1]][["max"]]
} )
but the results are not right.
Do you have any suggestions?

Let
e <- list(min = 1:3, max = 4:6)
arr <- array(list(e)[rep(1, 8)], c(2, 4))
dim(arr)
# [1] 2 4
Then one option is
res <- apply(arr, 1:2, function(x) x[[1]][["max"]])
dim(res)
# [1] 3 2 4
and, if the order of dimensions matters,
dim(aperm(res, c(2, 3, 1)))
# [1] 2 4 3
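As a quick check that the values end up where expected, here is a small sketch with distinct values in each cell. The toy data (cells, with vectors of length 3 in a 2x4 layout) is an assumption for illustration, not the asker's real array.

```r
# Each cell holds list(min = ..., max = ...); values differ per cell
cells <- lapply(1:8, function(k) list(min = k + 0:2, max = 10 * k + 0:2))
arr <- array(cells, dim = c(2, 4))

# Extract "max" from every cell; the vector index comes first in the result
res <- apply(arr, 1:2, function(x) x[[1]][["max"]])

# Move the vector index to the last dimension
out <- aperm(res, c(2, 3, 1))
dim(out)  # [1] 2 4 3

# The vector stored at row 2, column 3 is now spread along the third dimension
out[2, 3, ]
arr[[2, 3]][["max"]]
```

The last two lines should print the same vector, which confirms that aperm put the dimensions in the intended order.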

Optimizing function speed on 3D array

I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions of the original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?

You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
Alternatively, you can use just the second line: the non-numeric strings are then converted to NA by coercion, with a warning.
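Put together on the example data from the question, the vectorized approach can be sketched as follows. The result is stored in a copy named out (a name chosen here for illustration) so the original character array stays untouched.

```r
# Example data from the question, built in one call
ex.array <- array(c("N", "A", "", "1", "0", "N", "A", "", "1",
                    "0", "N", "A", "", "1", "0", "N", "A", "",
                    "1", "0", "N", "A", "", "1", "0", "N", "A"),
                  dim = c(3, 3, 3))

out <- ex.array
out[out %in% c("", "N", "A")] <- NA   # mark non-numeric codes as missing
storage.mode(out) <- "integer"        # convert in place, dimensions are kept

dim(out)       # [1] 3 3 3
is.integer(out)
out[, , 1]
```

Both steps operate on the whole array at once, so there is no per-cell function call as there is with aaply.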

Outputting multiple arrays of data in R

I have a code that loops through multiple subjects and outputs the run lengths of consecutive 1's in various arrays. The output is something like this:
Variable1RunLengths 2 3 14 12 7 8
Variable2RunLengths 4 9 8 12 4 7 3
And it does this for multiple subjects. I know how to output a single variable to a data frame, but I am having trouble outputting the arrays of data I'm calculating with this code. Any suggestions?
GetRL<-function(df) {
subjects <- unique(df.all$Subject)
numsubjects <- length(subjects)
runLengths.df <- data.frame()
for (i in 1:numsubjects) {
subj <- subjects[i]##names loop variable
subdf <- df.all[which(df.all$Subject == subj),] ##pulls all data for current subject
## pulls vectors within current subject for each task
patrmdf <- subdf$Patient_Room
compdf <- subdf$comp
pertoperdf <- subdf$pertoper
paperdf <- subdf$paper
##calculates runs of ones for each task, pulls lengths or all values = 1
patrmall <- rle(patrmdf)
patrmruns <- patrmall$lengths[patrmall$values == 1]
patrmslength <- length(patrmruns)
compall <- rle(compdf)
compruns <- compall$lengths[compall$values == 1]
complength <- length(compruns)
pertoperall <- rle(pertoperdf)
pertoperruns <- pertoperall$lengths[pertoperall$values == 1]
pertoperlength <- length(pertoperruns)
paperall <- rle(paperdf)
paperruns <- paperall$lengths[paperall$values == 1]
paperlength <- length(paperruns)
##outputs vectors and variables
runLengths.df <- subj
runLengths.df<- patrmruns
runLengths.df<- compruns
runLengths.df<- pertoperruns
runLengths.df <- paperruns
}
return(runLengths.df)
}

A data frame is a poor choice of data structure for this, because you have arrays that can be different sizes. I would try a list of lists. Outside the loop, you would initialize
runLengths<-list()
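Then at the bottom of the loop, you would do the following (using [[ ]] so that the value of subj, rather than the literal name "subj", becomes the element name)
runLengths[[as.character(subj)]] <- list(patrm=patrmruns,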
comp=compruns,
pertoper=pertoperruns,
paper=paperruns)
Then, for example, to recover the comp run lengths for subject XYZ you would write
runLengths$XYZ$comp
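A compact, self-contained sketch of the list-of-lists idea follows. The data frame contents, the subject names S1 and S2, and the helper ones_runs are made up for illustration; only two of the task columns are included.

```r
# Hypothetical data: two subjects, two 0/1 task indicators
df.all <- data.frame(
  Subject = rep(c("S1", "S2"), each = 8),
  comp    = c(1, 1, 0, 1, 1, 1, 0, 0,   0, 1, 0, 0, 1, 1, 1, 1),
  paper   = c(0, 0, 1, 1, 0, 1, 0, 1,   1, 1, 1, 0, 0, 0, 1, 0)
)

# Lengths of the runs of consecutive 1s in a vector
ones_runs <- function(x) {
  r <- rle(x)
  r$lengths[r$values == 1]
}

runLengths <- list()
for (subj in as.character(unique(df.all$Subject))) {
  subdf <- df.all[df.all$Subject == subj, ]
  # [[subj]] names the element after the subject's value
  runLengths[[subj]] <- list(comp  = ones_runs(subdf$comp),
                             paper = ones_runs(subdf$paper))
}

runLengths$S1$comp   # [1] 2 3
runLengths$S2$paper  # [1] 3 1
```

Each subject's vectors can have different lengths, which is what a data frame cannot hold directly.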

lapply and rbind not properly appending the results

SimNo <- 10
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
When I run the commands inside the loop one at a time, leaving out the last one, I get a different determinant each time. But when I run the whole loop at once, I get the last determinant repeated for every simulation.
Please help me understand how to handle situations like this.
Is there a shorter, more efficient way to write this code so that each individual variable can still be accessed?

Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
X <- replicate(6, rnorm(1000, 0, 1))
sx <- scale(X) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked like with your for loop. You needed to create a vector for storing your outputs at each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans[i] <- ifelse(det1<1,det1,0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop through that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
sx <- scale(x) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
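As a side note on what is being computed: scaling the columns and dividing by sqrt(n - 1) makes t(sx) %*% sx the sample correlation matrix of X. The sketch below checks this for a single simulated matrix; the seed and the names R1 and R2 are arbitrary choices for illustration.

```r
set.seed(1)
X  <- replicate(6, rnorm(1000, 0, 1))   # 1000 x 6 matrix
sx <- scale(X) / sqrt(nrow(X) - 1)

# crossprod(sx) computes t(sx) %*% sx
R1 <- crossprod(sx)
R2 <- cor(X)

all.equal(R1, R2, check.attributes = FALSE)
det(R1) <= 1
```

The determinant of a correlation matrix cannot exceed 1, so the condition det1 < 1 is expected to hold up to floating-point rounding, and ifelse(det1 < 1, det1, 0) will usually just return det1.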

SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
matdet[i] <- ifelse(det1<1,det1,0)  ## store one value per iteration
}
matdet

How to read multiple files into a multi-dimensional array

I want to make a 3-dimensional array.
Here is what I tried:
z<-c(160,720,420)
first_data_set <-array(dim = length(file_1), dimnames = z)
The data that I am reading has only one level (only x and y).
There are other data sets in the same format, and I need to put them in the same array as the first one, so that once I finish reading, all of them are in the same array without anything being overwritten.
So I think the array has to have 3 dimensions; otherwise I cannot keep all the data that I read in the loop.

Say that you have two matrices of size 3x4:
m1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
m2 <- matrix(rnorm(12), nrow = 3, ncol = 4)
If you want to place them in an array, first make an array of NA's:
A <- array(as.numeric(NA), dim = c(3,4,2))
Then populate the layers with data:
A[,,1] <- m1
A[,,2] <- m2
As suggested by @Justin, you could also just put the matrices together in a list:
A2 <- list()
A2[['m1']] <- m1
A2[['m2']] <- m2
To read matrices from files: using a list makes it easier to get these matrices from files in a directory, without having to specify the dimensions in advance. Assume you want all files with extension csv:
myfiles <- dir(pattern = "\\.csv$")  # pattern is a regular expression
for (i in seq_along(myfiles)){       # safe even when no files match
A2[[myfiles[i]]] <- read.table(myfiles[i], sep = ',')
}
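If all files hold matrices of the same size, the list can be stacked into a 3-dimensional array afterwards. The following sketch writes three small CSV files to a temporary directory first so that it runs on its own; the file names, sizes and the names tmp, mats and A are assumptions for illustration.

```r
# Create a few example files in a temporary directory
tmp <- tempfile("mats")
dir.create(tmp)
for (k in 1:3) {
  write.table(matrix(k * (1:12), nrow = 3, ncol = 4),
              file.path(tmp, sprintf("m%d.csv", k)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Read every CSV file into a list of matrices
myfiles <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
mats <- lapply(myfiles, function(f) as.matrix(read.table(f, sep = ",")))

# Stack the matrices along a third dimension, one layer per file
A <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))
dim(A)  # [1] 3 4 3

unlink(tmp, recursive = TRUE)  # clean up the temporary files
```

Layer k of A then holds the contents of the k-th file, in the order returned by list.files.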