lapply and rbind not properly appending the results - arrays

SimNo <- 10
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
When I run every command inside the loop except the last one, I get a different determinant each time, but when I run the whole loop at once, the last determinant is repeated for all rows.
Please help me understand how to handle situations like this.
Is there a shorter, more efficient way to write this code, such that each individual variable can still be accessed?

Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
X <- replicate(6, rnorm(1000, 0, 1))
sx <- scale(X) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked like with your for loop. You need to create a vector to store the output of each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans[i] <- ifelse(det1<1,det1,0)
}
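As for why the original loop printed one repeated value: the last line inside the loop rebuilds detans from scratch on every pass, filling all SimNo rows with the current det1, so only the final pass survives. A minimal sketch of that behaviour, where runif(1) is just a stand-in for the determinant (an assumption made to keep the example short):

```r
set.seed(42)
SimNo <- 3
for (i in 1:SimNo) {
  det1 <- runif(1)  # stand-in for the determinant of this iteration
  # detans is overwritten here, with SimNo copies of the current det1
  detans <- do.call(rbind, lapply(1:SimNo, function(x) det1))
}
detans  # SimNo identical rows, all equal to the last det1
```

Indexing the result with detans[i], as in the loop above, keeps one value per iteration instead.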
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop through that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
sx <- scale(x) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
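If you want the simulation to be repeatable, a possible variant is sketched below. The set.seed call, the use of vapply instead of sapply, and crossprod(sx) as a shorthand for t(sx) %*% sx are my own choices here, not something your original code requires:

```r
set.seed(123)  # arbitrary seed, only so reruns give the same draws
SimNo <- 10
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- vapply(X, function(x) {
  sx <- scale(x) / sqrt(999)
  det(crossprod(sx))  # same as det(t(sx) %*% sx)
}, numeric(1))        # vapply checks that each result is a single number
detans <- ifelse(det1 < 1, det1, 0)
dim(X[[2]])  # the 1000 x 6 matrix used in the second simulation
```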

SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
matdet[i] <- ifelse(det1<1,det1,0)
}
matdet

Related

Is there any function that calculate correlation between a set of matrices included in an array in R?

I have a list that includes 20 matrices. I want to calculate Pearson's correlation between all pairs of matrices, but I cannot find any code or function for this. Could you please give me some tips?
something like:
a=matrix(1:8100, ncol = 90)
b=matrix(8100:16199, ncol = 90)
c=matrix(sample(16200:24299),ncol = 90)
z=list(a,b,c)
I found this:
https://rdrr.io/cran/lineup/man/corbetw2mat.html and tried it:
library(lineup)
corbetw2mat(z[a], z[b], what = "all")
I've got the following error:
Error in corbetw2mat(z[a], z[b], what = "all") :
(list) object cannot be coerced to type 'double'
I want a list like this for the result:
a & b
correlations
a & c
correlations
b & c
correlations
Thanks
I will create a smaller data set to illustrate the solution below.
To get pairwise combinations the best option is to compute a matrix of combinations with combn and then loop through it, in this case a lapply loop.
set.seed(1234) # Make the results reproducible
a <- matrix(1:9, ncol = 3)
b <- matrix(rnorm(9), ncol = 3)
c <- matrix(sample(1:9), ncol = 3)
sample_list <- list(a, b, c)
cmb <- combn(3, 2)
res <- lapply(seq.int(ncol(cmb)), function(i) {
cor(sample_list[[ cmb[1, i] ]], sample_list[[ cmb[2, i] ]])
})
The results are in the list res.
Note that sample is a base R function, so I changed the name to sample_list.
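To get labels such as "a & b" on the results, one option is to name the list elements and let combn work on the names. This is only a sketch of that idea; the element names a, b, c and the " & " separator are my assumptions about the layout you want:

```r
set.seed(1234)
sample_list <- list(
  a = matrix(1:9, ncol = 3),
  b = matrix(rnorm(9), ncol = 3),
  c = matrix(sample(1:9), ncol = 3)
)
cmb <- combn(names(sample_list), 2)  # each column is one pair of names
res <- lapply(seq_len(ncol(cmb)), function(i) {
  cor(sample_list[[ cmb[1, i] ]], sample_list[[ cmb[2, i] ]])
})
names(res) <- apply(cmb, 2, paste, collapse = " & ")
names(res)
```

Each element of res is then the column-by-column correlation matrix for that pair, and can be accessed by its label, e.g. res[["a & c"]].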

Optimizing function speed on 3D array

I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions of the original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?
You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
Alternatively, you can use only the second line: strings that are not valid integers become NA by coercion (R warns about this).
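As a quick check, here is the same approach on a copy of the example array. The copy named num.array is mine, and the array is rebuilt with rep() only to keep the sketch short; it has the same contents as ex.array in the question:

```r
# Same 3x3x3 character array as ex.array in the question
ex.array <- array(rep(c("N", "A", "", "1", "0"), length.out = 27),
                  dim = c(3, 3, 3))
num.array <- ex.array                           # keep the original intact
num.array[num.array %in% c("", "N", "A")] <- NA # one vectorized replacement
storage.mode(num.array) <- "integer"            # dimensions are preserved
dim(num.array)
num.array[, , 1]
```

Because the whole array is handled in one vectorized step, there is no per-cell function call, which is where aaply with three margins spends its time.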

R routine always samples from last row of array instead of random rows

I have been debugging the following routine for some time.
A problem that came to my attention is that sampling is always done on the last row of my array every time I run the simulation. I want it to select rows at random each time the code is run.
Here's what I have:
N <- 10
Hstar <- 5
perms <- 10 ### How many permutations are we considering
specs <- 1:N
Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(perms, N))
haps <- as.character(1:Hstar)
Assign probabilities
probs <- rep(1/Hstar, Hstar)
Generate permutations
for(i in 1:perms){
pop[i, ] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(perms, N))
for(j in specs){
for(i in 1:perms){
ind.index <- sample(specs, size = j, replace = FALSE) ## which individuals will we sample
hap.plot <- pop[i, ind.index] ## pull those individuals from a permutation
HAC.mat[i, j] <- length(unique(hap.plot)) ## how many haplotypes did we get for a given sampling intensity (j) from each permutation (i)
}
}
When I look at ind.index and hap.plot, I notice that values from haps are always taken from the last row in the pop variable, and I can't quite understand why this is occurring. I would like it to randomly sample from a given row in pop.
Any help is greatly appreciated.
I have found a workaround that looks like it works.
hap.plot <- pop[sample(nrow(pop), size = 1, replace = TRUE), ]
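One possible reading of the original symptom, offered as an assumption rather than a diagnosis: ind.index and hap.plot are overwritten on every pass, so inspecting them after the loops have finished only ever shows the final iteration, where i equals perms. A minimal sketch of that effect (the rows.used vector is a made-up name, added only to record what each pass did):

```r
set.seed(1)
perms <- 10
N <- 10
pop <- matrix(sample(as.character(1:5), perms * N, replace = TRUE),
              nrow = perms)
rows.used <- integer(perms)
for (i in 1:perms) {
  hap.plot <- pop[i, ]  # overwritten on every pass
  rows.used[i] <- i     # record which row this pass actually used
}
identical(hap.plot, pop[perms, ])  # after the loop, only the last row is left
rows.used                          # but every row was visited in turn
```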

loop through column and add other row

EDIT: I've made some progress. So I read up on subsets, and was able to break down my dataframe under a certain condition. Let's say titleCSV[3] consists of file names ("file1", "file2", "file3", etc) and titleCSV[13] contains values (-18, -8, -2, etc). Code below:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(x.RMS in titleRMS[2]){
x.RMS <- gsub("[A-Za-z]","",r)
x.RMS <- gsub(" ","",r)
x.RMS = abs(as.numeric(r))
}
x.titleRMSJudge <- data.frame(titleRMS[1], x.RMS)
x.titleRMSResult <- subset(x.titleRMSJudge, r < 12)
My question now is, what's the best way to print each row of the first column of x.titleRMSResult with a message saying that it's loud? Thanks, guys!
BTW, here is the dput of my titleRMS:
dput(titleRMS)
structure(list(FILE.NAME = c("00-Introduction.mp3", "01-Chapter_01.mp3",
"02-Chapter_02.mp3", "03-Chapter_03.mp3", "04-Chapter_04.mp3",
"05-Chapter_05.mp3", "06-Chapter_06.mp3", "07-Chapter_07.mp3",
"08-Chapter_08.mp3", "09-Chapter_09.mp3", "10-Chapter_10.mp3",
"11-Chapter_11.mp3", "12-Chapter_12.mp3", "Bonus_content.mp3",
"End.mp3"), AVG.RMS..dB. = c(-14, -10.74, -9.97, -10.53, -10.94,
-12.14, -11, -9.19, -10.42, -11.51, -14, -10.96, -11.71, -11,
-16)), .Names = c("FILE.NAME", "AVG.RMS..dB."), row.names = c(NA,
-15L), class = "data.frame")
ORIGINAL POST BELOW
Newb here! Coding in R. I am trying to analyze a CSV file. One column has 10 rows with different file names, while the other has 10 rows with different values. I want to loop over the second column and, if a value is greater/less than a certain threshold, print the associated file name along with a message. I don't know how to run both columns through a loop together so that the proper file name prints with the proper value/message. I wrote a loop that ends up checking each value once for every row in the other column. At the moment, all 10 rows meet the criteria for the message I want to print, so I've been getting 100 messages!
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(title in titleRMS[1]){
title <- gsub(" ","",title)
}
for(r in titleRMS[2]){
r <- gsub("[A-Za-z]","",r)
r <- gsub(" ","",r)
r = abs(as.numeric(r))
for(t in title){
for(f in r){
if (f < 18 & f > 0) {
message(t, "is Loud!")
}
}
}
}
And this line of code only prints the first file name for each message:
for(r in titleRMS[2]){
r <- gsub("[A-Za-z]","",r)
r <- gsub(" ","",r)
r = abs(as.numeric(r))
for(f in r){
if (f < 18 & f > 0) {
message(t, "is Loud!")
}
}
}
Can someone throw me some tips or even re-write what I wrote to show me how to get what I need? Thanks, guys!
I've figured out my own issue. Here is what I wrote to come to the conclusion I wanted:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
filesHighRMS <- vector()
x.titleRMSJudge <- data.frame(titleCSV[3], titleCSV[13])
x.titleRMSResult <- subset(x.titleRMSJudge, titleCSV[13] > -12 & titleCSV[15] > -1)
for(i in x.titleRMSResult[,1]){
filesHighRMS <- append(filesHighRMS, i, 999)
}
emailHighRMS <- paste(filesHighRMS, collapse=", ")
blurbHighRMS <- paste("" ,nrow(x.titleRMSResult), " file(s) (" ,emailHighRMS, ") have a high RMS and are too loud.")
Being new to code, I bet there is a simpler way, I'm just glad I was able to work this out on my own. :-)
You're making things hard on yourself. You don't need regex for this, and you probably don't need a loop, at least not through your data frame. Definitely you don't need nested loops.
I think this will do what you say you want...
indicesToMessage <- titleRMS[, 2] > 0 & titleRMS[, 2] < 18
myMessages <- paste(titleRMS[indicesToMessage, 1], "is Loud!")
for (i in seq_along(myMessages)) {
message(myMessages[i])
}
A more R-like way (read: without an explicit loop) to do the last line is like this:
invisible(lapply(myMessages, message))
The invisible is needed because lapply returns a list with one element per message, and since message() returns NULL (its only job is the side effect of printing to the console), that list of NULLs would otherwise be printed. invisible just suppresses that printing.
Edits: Negative data
Since your data is negative, I assume you actually want messages when the absolute value abs() is between 0 and 18. This works for that case.
indicesToMessage <- abs(titleRMS[, 2]) > 0 & abs(titleRMS[, 2]) < 18
myMessages <- paste(titleRMS[indicesToMessage, 1], "is Loud!")
invisible(lapply(myMessages, message))
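For a version you can paste and run, here is a sketch on four rows taken from your dput output. The -12 dB cutoff is borrowed from your own answer and may not be the threshold you finally want; the loud variable is just a name I picked:

```r
titleRMS <- data.frame(
  FILE.NAME = c("00-Introduction.mp3", "01-Chapter_01.mp3",
                "02-Chapter_02.mp3", "End.mp3"),
  AVG.RMS..dB. = c(-14, -10.74, -9.97, -16),
  stringsAsFactors = FALSE
)
loud <- titleRMS$AVG.RMS..dB. > -12  # one comparison for the whole column
myMessages <- paste(titleRMS$FILE.NAME[loud], "is Loud!")
invisible(lapply(myMessages, message))
```

Because the logical vector loud lines up row by row with the data frame, each file name stays paired with its own value and no nested loop is needed.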

retain array class when operation results in 2-dimensional matrix

I have an array that can have one or more pages or sheets (my names for the third dimension). I am attempting to perform operations on the array. When there is only one sheet or page the result of the operation is a matrix. I would like the result to be an array. Is there a way to retain the class array even when the result of the operation has only 1 sheet or page?
Here is an example. I would like my.var.2 and my.var.3 to be arrays. The variable my.pages is set to 1 here, which seems to be causing the problem. However, my.pages can be >1. If my.pages <- 2 then my.var.2 and my.var.3 are arrays.
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.1
my.var.2 <- 2 * my.var.1[,-my.columns,]
my.var.3 <- 10 * my.var.1[,-1,]
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array(my.var.2)
my.var.3 <- as.array(my.var.3)
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array( 2 * my.var.1[,-my.columns,])
my.var.3 <- as.array(10 * my.var.1[,-1,] )
class(my.var.2)
class(my.var.3)
The switch to matrix causes problems when I try to use my.var.1 and my.var.2 in nested for-loops.
The following if statement seems to solve the problem, but also seems a little clunky. Is there a more elegant solution?
if(my.pages == 1) {my.var.2 <- array(my.var.2, c(my.rows,(my.columns-1),my.pages))}
From help("["):
Usage:
x[i, j, ... , drop = TRUE]
...
drop: For matrices and arrays. If 'TRUE' the result is coerced to
the lowest possible dimension (see the examples). This only
works for extracting elements, not for the replacement. See
'drop' for further details.
Your code, revisited:
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.2 <- 2 * my.var.1[,-my.columns,,drop=FALSE]
my.var.3 <- 10 * my.var.1[,-1,,drop=FALSE]
class(my.var.2)
## [1] "array"
class(my.var.3)
## [1] "array"
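Checking dim() shows the difference more directly than class(). A small sketch with a deterministic array (the names dropped and kept are mine):

```r
my.var.1 <- array(1:40, c(10, 4, 1))       # one page only
dropped <- my.var.1[, -4, ]                # default drop = TRUE
kept <- my.var.1[, -4, , drop = FALSE]     # third dimension retained
dim(dropped)
dim(kept)
```

With drop = FALSE the result keeps all three dimensions even when there is a single page, so code that indexes with three subscripts keeps working.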

Resources