Storing output of simulation in an array (R)

I was trying to store my simulation output inside an array. I have written the following code:
nsim=50
res=array(0,c(nsim,20,20))
for(i in 1:nsim) {
cat("simul=",i,"\n")
simulated = NULL
stik.simulated = NULL
simulated = rpp(....)
stik.simulated = STIKhat(....)
# from stik.simulated we will get $khat and $Ktheo and
# the dimension of stik.simulated$Khat-stik.simulated$Ktheo is 20 x 20
res[i,,] = stik.simulated$Khat - stik.simulated$Ktheo
}
But whenever the function is trying to store the output inside an array, I get the following error:
simul= 1
Xrange is 20 40
Yrange is -20 20
Doing quartic kernel
Error in res[, , i] = stik.simulated$Khat - stik.simulated$Ktheo :
subscript out of bounds
I would appreciate your help. Thanks.

I think you need to organize your code to avoid such errors. I assume you are using the package stpp.
First, create a function that generates the matrix for a single iteration, and test that function with several input values.
stick_diff <- function(u,v){
u <- seq(0,u,by=1)
v <- seq(0,v,by=1)
simulated <- rpp(...) ## call here rpp with right parameters
stik <- STIKhat(xyt=simulated$xyt,
dist=u, times=v, ...)
stik$Khat-stik$Ktheo ## !! here if u != v you will have recycling!
}
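Once you are sure the function works, call it in the loop with the right dimensions.
nsim <- 50
u <- 20
d <- dim(stick_diff(u, u)) ## check the matrix size once instead of assuming it
res <- array(0, c(nsim, d))
for(i in 1:nsim)
res[i,,] <- stick_diff(u, u) ## stick_diff takes two arguments (u, v)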
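The pattern above (one function per iteration, then a preallocated array sized to match) can be tried without stpp. This is a minimal sketch, not the original simulation: `sim_diff` is a hypothetical stand-in for the `rpp()`/`STIKhat()` step and returns random numbers. It illustrates that `seq(0, u, by = 1)` has `u + 1` values, so reading the dimensions from a first result is safer than hard-coding them.

```r
set.seed(42)
nsim <- 5

# Hypothetical stand-in for the rpp()/STIKhat() step: returns a matrix
# whose size follows the lengths of the distance and time grids.
sim_diff <- function(u, v) {
  dist  <- seq(0, u, by = 1)   # u + 1 values
  times <- seq(0, v, by = 1)   # v + 1 values
  matrix(rnorm(length(dist) * length(times)),
         nrow = length(dist), ncol = length(times))
}

# Run once, then size the result array from what actually came back.
first <- sim_diff(20, 20)
res <- array(NA_real_, dim = c(nsim, dim(first)))
res[1, , ] <- first

for (i in seq_len(nsim)[-1]) {
  out <- sim_diff(20, 20)
  # Fail early with a clear message if a result does not fit the slot.
  stopifnot(identical(dim(out), dim(res)[-1]))
  res[i, , ] <- out
}

dim(res)
```

A "subscript out of bounds" error on the assignment usually means the index order or the array dimensions do not match the matrix being stored, which the `stopifnot()` check makes explicit.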

Related

Optimizing function speed on 3D array

I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions of the original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?
You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
In fact, the second line alone is enough: strings that are not numbers are turned into NA by coercion (R may warn about this).
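As a quick check, the vectorized approach can be run on the 3x3x3 example from the question. This sketch works on a copy (`num.array`, a name introduced here for illustration) so the original character array stays available for comparison.

```r
# Same contents as ex.array in the question, filled column by column.
ex.array <- array(c("N", "A", "",  "1", "0", "N", "A", "",  "1",
                    "0", "N", "A", "",  "1", "0", "N", "A", "",
                    "1", "0", "N", "A", "",  "1", "0", "N", "A"),
                  dim = c(3, 3, 3))

num.array <- ex.array
# One logical mask over the whole array replaces the per-cell function call.
num.array[num.array %in% c("", "N", "A")] <- NA
# Convert the storage type in place; dimensions are preserved.
storage.mode(num.array) <- "integer"

str(num.array)
```

Because the whole array is processed in a few vectorized calls rather than one function call per cell, this should scale far better to a 480x866x366 array than `aaply` over all three margins.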

R routine always samples from last row of array instead of random rows

I have been debugging the following routine for some time.
A problem that came to my attention is that sampling is always done on the last row of my array every time I run the simulation. I want it to select rows at random each time the code is run.
Here's what I have:
N <- 10
Hstar <- 5
perms <- 10 ### How many permutations are we considering
specs <- 1:N
## Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(perms, N))
haps <- as.character(1:Hstar)
## Assign probabilities
probs <- rep(1/Hstar, Hstar)
## Generate permutations
for(i in 1:perms){
pop[i, ] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
## Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(perms, N))
for(j in specs){
for(i in 1:perms){
ind.index <- sample(specs, size = j, replace = FALSE) ## which individuals will we sample
hap.plot <- pop[i, ind.index] ## pull those individuals from a permutation
## how many haplotypes did we get for a given sampling intensity (j) from each permutation (i)
HAC.mat[i, j] <- length(unique(hap.plot))
}
}
When I look at ind.index and hap.plot, I notice that values from haps are always taken from the last row in the pop variable, and I can't quite understand why this is occurring. I would like it to randomly sample from a given row in pop.
Any help is greatly appreciated.
I have found a workaround that looks like it works.
hap.plot <- pop[sample(nrow(pop), size = 1, replace = TRUE), ]
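One thing worth noting: after a loop finishes, `ind.index` and `hap.plot` only hold the values from the final iteration (`i = perms`), which is why inspecting them afterwards always shows the last row. The sketch below is one way to combine the random-row workaround with the column subsampling from the original loop. Combining them is an assumption about the intent, and `rows.used` is a hypothetical helper added only to record which row was drawn.

```r
set.seed(123)
N <- 10
Hstar <- 5
perms <- 10
haps <- as.character(1:Hstar)

# perms x N matrix of haplotype labels, one permutation per row.
pop <- t(replicate(perms, sample(haps, size = N, replace = TRUE)))

HAC.mat   <- array(dim = c(perms, N))
rows.used <- array(dim = c(perms, N))  # hypothetical: which row was sampled

for (j in 1:N) {
  for (i in 1:perms) {
    r <- sample(nrow(pop), size = 1)                   # random row of pop
    ind.index <- sample(N, size = j, replace = FALSE)  # j individuals
    hap.plot <- pop[r, ind.index]
    HAC.mat[i, j]   <- length(unique(hap.plot))
    rows.used[i, j] <- r
  }
}

table(rows.used)  # rows are drawn across the whole of pop, not just the last
```

Recording the per-iteration choices in a container such as `rows.used` makes it possible to inspect every iteration, not only the final one.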

Need a bit of guidance in mathematical morphology operation "dilation" code in R

I am trying to apply a mathematical morphology operation to a binary image. The operation I am trying to apply is dilation. I am posting the code below.
# Dilation of A with a square with size a. The reference is (1,1)
#
mm_dilationsqr <- function(A,a){
C <- A
if(a<=1) return(C)
for(i in 1:a)
for(j in 1:a)
{
B <- mm_translation(A,1-i,1-j)
C <- mm_union(C,B)
}
C <- mm_translation(C,a/2,a/2)
return(C)
}
The above is the dilation function. It calls the translation and union functions, which are posted below.
# Translation of set A by x,y.
# Warning: no periodicity, watch the borders!
#
mm_translation <- function(A,x,y){
C <- mm_zero()
if((x>=-M+1) & (x<=M) & (y>=-N+1) & (y<=N))
{
for(i in 1:M)
for(j in 1:N)
{
if((i+x>=0) & (i+x<=M) & (j+y>=0) & (j+y<=N))
C[i+x,j+y] <- A[i,j]
}
}
return(C)
}
Union function below:
# Union (OR) of sets A and B
#
mm_union <- function(A,B){
C<-as.integer(A|B)
dim(C) <- c(M,N)
return(C)
}
Now, when I try to process the image in array form using the mm_dilationsqr function, I get this error: Error in mm_union(C, B) : binary operation on non-conformable arrays.
My array dimensions are
dim(TA)
[1] 745 691
When I use a subset of the TA array above, with the dimensions given below, the code works.
dim(A)
[1] 21 21
So I want to know how I can improve this so that it can process an image with the dimensions of TA.
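One possible cause, assuming `M`, `N` and `mm_zero()` are globals that were set up for the 21x21 test image, is that the helper functions keep producing 21x21 matrices while `C` has the dimensions of `TA`, so `A|B` fails. The sketch below is not the original code: `mm_translate` and `mm_or` are hypothetical variants that read the dimensions from their arguments instead of from globals.

```r
# Translate binary matrix A by (x, y). Cells shifted outside the image are
# dropped (no periodicity). Dimensions come from A itself.
mm_translate <- function(A, x, y) {
  M <- nrow(A)
  N <- ncol(A)
  C <- matrix(0L, M, N)
  rows <- seq_len(M)
  cols <- seq_len(N)
  keep_r <- rows[rows + x >= 1 & rows + x <= M]
  keep_c <- cols[cols + y >= 1 & cols + y <= N]
  if (length(keep_r) > 0 && length(keep_c) > 0) {
    C[keep_r + x, keep_c + y] <- A[keep_r, keep_c]
  }
  C
}

# Union (OR) of two binary matrices of the same size.
mm_or <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  C <- as.integer(A | B)
  dim(C) <- dim(A)
  C
}

# Small non-square example: one set pixel, shifted by (1, 2).
A <- matrix(0L, 5, 7)
A[2, 3] <- 1L
B <- mm_translate(A, 1, 2)
U <- mm_or(A, B)
which(U == 1L, arr.ind = TRUE)
```

Because nothing depends on global `M` and `N`, the same functions should accept a 745x691 array as readily as a 21x21 one.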

Outputting multiple arrays of data in R

I have a code that loops through multiple subjects and outputs the run lengths of consecutive 1's in various arrays. The output is something like this:
Variable1RunLengths 2 3 14 12 7 8
Variable2RunLengths 4 9 8 12 4 7 3
And it does this for multiple subjects. I know how to output a single variable to a data frame, but I am having trouble outputting the arrays of data I'm calculating with this code. Any suggestions?
GetRL<-function(df) {
subjects <- unique(df.all$Subject)
numsubjects <- length(subjects)
runLengths.df <- data.frame()
for (i in 1:numsubjects) {
subj <- subjects[i]##names loop variable
subdf <- df.all[which(df.all$Subject == subj),] ##pulls all data for current subject
## pulls vectors within current subject for each task
patrmdf <- subdf$Patient_Room
compdf <- subdf$comp
pertoperdf <- subdf$pertoper
paperdf <- subdf$paper
##calculates runs of ones for each task, pulls lengths or all values = 1
patrmall <- rle(patrmdf)
patrmruns <- patrmall$lengths[patrmall$values == 1]
patrmslength <- length(patrmruns)
compall <- rle(compdf)
compruns <- compall$lengths[compall$values == 1]
complength <- length(compruns)
pertoperall <- rle(pertoperdf)
pertoperruns <- pertoperall$lengths[pertoperall$values == 1]
pertoperlength <- length(pertoperruns)
paperall <- rle(paperdf)
paperruns <- paperall$lengths[paperall$values == 1]
paperlength <- length(paperruns)
##outputs vectors and variables
runLengths.df <- subj
runLengths.df<- patrmruns
runLengths.df<- compruns
runLengths.df<- pertoperruns
runLengths.df <- paperruns
}
return(runLengths.df)
}
A data frame is a poor choice of data structure for this, because you have arrays that can be different sizes. I would try a list of lists. Outside the loop, you would initialize
runLengths<-list()
Then at the bottom of the loop, you would do
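## use [[ ]] so the element is named after the value of subj;
## runLengths$subj would create an element literally called "subj"
runLengths[[as.character(subj)]]<-list(patrm=patrmruns,
comp=compruns,
pertoper=pertoperruns,
paper=paperruns)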
Then, for example, to recover the comp run lengths for subject XYZ you would write
runLengths$XYZ$comp
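Here is a small runnable sketch of the list-of-lists idea. The data frame is made up for illustration (two subjects, two of the task columns), and `ones_runs` is a hypothetical helper that wraps the `rle()` step. Indexing with `[[subj]]` stores each result under the subject's own name.

```r
# Made-up data: two subjects, eight observations each.
df.all <- data.frame(
  Subject = rep(c("S1", "S2"), each = 8),
  comp    = c(1, 1, 0, 1, 1, 1, 0, 1,   0, 1, 1, 1, 1, 0, 0, 1),
  paper   = c(0, 0, 1, 1, 0, 1, 0, 0,   1, 1, 1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Lengths of the runs of consecutive 1s in a vector.
ones_runs <- function(x) {
  r <- rle(x)
  r$lengths[r$values == 1]
}

runLengths <- list()
for (subj in unique(df.all$Subject)) {
  subdf <- df.all[df.all$Subject == subj, ]
  # Vectors of different lengths sit side by side without padding.
  runLengths[[subj]] <- list(comp  = ones_runs(subdf$comp),
                             paper = ones_runs(subdf$paper))
}

runLengths$S1$comp
str(runLengths)
```

Unlike a data frame, the nested list does not require every subject and task to have the same number of runs.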

lapply and rbind not properly appending the results

SimNo <- 10
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
When I run the commands inside the loop one at a time (everything except the last line), I get a different determinant each time, but when I run the whole loop at once, the last determinant is repeated for every entry.
Please help me understand how to handle situations like this.
Is there a shorter, more efficient way to write this code in which each individual variable can still be accessed?
Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
X <- replicate(6, rnorm(1000, 0, 1))
sx <- scale(X) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked like with your for loop. You needed to create a vector for storing your outputs at each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans[i] <- ifelse(det1<1,det1,0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop through that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
sx <- scale(x) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
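The list-based version can be sanity-checked with a fixed seed. In this sketch, `n` and the use of `vapply` and `crossprod` are choices made here for illustration, not part of the original answer. Since `scale(x) / sqrt(n - 1)` has the sample correlation matrix as its cross-product, each determinant should agree with `det(cor(x))`.

```r
set.seed(1)
SimNo <- 10
n <- 1000

# One n x 6 matrix of standard normal draws per simulation, kept in a list.
X <- replicate(SimNo, matrix(rnorm(n * 6), nrow = n, ncol = 6),
               simplify = FALSE)

# vapply guarantees exactly one numeric value per simulation.
det1 <- vapply(X, function(x) {
  sx <- scale(x) / sqrt(n - 1)
  det(crossprod(sx))   # same as det(t(sx) %*% sx)
}, numeric(1))

detans <- ifelse(det1 < 1, det1, 0)

# Each simulation keeps its own matrix and its own determinant.
all.equal(det1[2], det(cor(X[[2]])))
length(unique(det1))
```

Because every draw happens inside `replicate`, each simulation gets fresh random numbers, so the determinants differ from one another instead of repeating the last value.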
SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
matdet[i] <- ifelse(det1<1,det1,0) ## store one value per iteration; the lapply over 1:SimNo only repeated det1
}
matdet
