Making an array out of a large number of matrices in R - arrays

I'm trying to work with arrays, but I can't seem to make one that works for my data. I have 14 matrices that I would like to put in an array, but I can't figure out how to do it without manually writing c(m1, m2, m3, ...) to include all of them.
This is what I tried:
m_list <- mget(paste0("well_", 0:13)) ###to make a list of all my matrices
a <- array(c(m_list),
dim = c(7338, 15, 14))
But when I look at the array I created, something is not right with it. When I ask for a single value, like this:
print(a[1,4,2])
I get entire columns instead.
I assume the error is in the list of matrices. Please help.

An answer to your question is that you should use do.call(c, m_list) instead of c(m_list). (Take a couple of small matrices and try to see what c(m_list) and c(m1, m2) return.)
You might also want to consider whether working with an array is really better than working with a list and, more importantly, how you could avoid having multiple separate matrices in the first place by reading or defining them directly as a list or an array.
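The difference between the two calls can be seen on a pair of small matrices. The following is a minimal sketch; m1 and m2 are illustrative stand-ins for the well_0 ... well_13 matrices, not data from the question.

```r
# Two small matrices standing in for well_0, well_1, ... (illustrative data only)
m1 <- matrix(1:6, nrow = 2)
m2 <- matrix(7:12, nrow = 2)
m_list <- list(m1, m2)

# c() applied to a list just returns the list: still 2 elements, each a matrix
wrapped <- c(m_list)
length(wrapped)

# do.call(c, m_list) is the same call as c(m1, m2): one flat vector of all values
flat <- do.call(c, m_list)
length(flat)

# The flat vector fills the array column by column, one matrix after another
a <- array(flat, dim = c(dim(m1), length(m_list)))
a[1, 3, 2]  # a single value, equal to m2[1, 3]
```

Because array() receives a list in the first case, each cell of the resulting array holds a whole matrix rather than a number, which would explain why indexing one cell printed much more than one value.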

You can simply use unlist inside your array function call instead of c.
a = array(unlist(m_list), dim = c(dim(m_list[[1]]), length(m_list)))
Some reproducible data:
m1 = matrix(1:5, 5, 5)
m2 = matrix(5:1, 5, 5)
m_list = list(m1, m2)
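Putting the one-liner and the reproducible data together gives the sketch below. The dimension check and the all_match comparison are additions for illustration, on the assumption that all matrices share the same shape.

```r
m1 <- matrix(1:5, 5, 5)
m2 <- matrix(5:1, 5, 5)
m_list <- list(m1, m2)

# Stacking only makes sense if every matrix has the same dimensions
stopifnot(length(unique(lapply(m_list, dim))) == 1)

# unlist() flattens the list into one vector; array() reshapes it
a <- array(unlist(m_list), dim = c(dim(m_list[[1]]), length(m_list)))
dim(a)

# Row 1, column 4 of the second matrix: a single value
a[1, 4, 2]

# Each slice a[, , k] should reproduce the k-th matrix
all_match <- all(vapply(seq_along(m_list),
                        function(k) identical(a[, , k], m_list[[k]]),
                        logical(1)))
```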

Related

Parallelize nested for-loop on 3 dimensional array in R

Using R on a Windows machine, I am currently running a nested loop on a 3D array (720x360x1368) which cycles through d1 and d2 to apply a function over d3 and assemble the output to a new array of similar dimensionality.
In the following reproducible example, I have reduced the dimensions by a factor of 10 to make execution faster.
library(SPEI)
old.array = array(abs(rnorm(50)), dim=c(72,36,136))
new.array = array(dim=c(72,36,136))
for (i in 1:72) {
for (j in 1:36) {
new.listoflists <- spi(ts(old.array[i,j,], freq=12, start=c(1901,1)), 1, na.rm = T)
new.array[i,j,] = new.listoflists$fitted
}
}
where spi() is a function from the SPEI package returning a list of lists, of which one particular element, $fitted (of length 1368), is used in each loop iteration to construct the new array.
While this loop works flawlessly, it takes quite a long time to compute. I have read that foreach can be used to parallelize for loops.
However, I do not understand how the nesting and the assembling of the new array can be achieved such that the dimnames of the old and the new array are consistent.
(In the end, what I want to be able to do is transform both the old and the new array into a "flat" long panel data frame using as.data.frame.table() and merge them along their three dimensions.)
Any help on how I can achieve the desired output using parallel computing will be highly appreciated!
Cheers
CubicTom
It would have been easier with a reproducible example, but here is what I came up with.
First, create the cluster to use:
library(doSNOW)  # also attaches foreach and snow
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
Then you create the loop and close the cluster:
zz <- foreach(i = 1:720, .combine = c) %:%
foreach(j = 1:360, .combine = c ) %dopar% {
new.listoflists <- FUN(old.array[i,j,])
new.array[i,j,] <- new.listoflists$list
}
stopCluster(cl)
This will create a list zz containing every iteration of new.array[i,j,]; you can then bind them together with:
new.obj <- plyr::ldply(zz, data.frame)
Hope this helps you!
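Another way to keep the dimnames of the old and new arrays aligned is to apply a function over the third dimension with parApply() from the base parallel package and then restore the original dimension order. This is only a sketch under stated assumptions: per_cell is a hypothetical placeholder for spi(...)$fitted, restore_shape is a helper invented here, and the array is much smaller than the one in the question.

```r
library(parallel)  # ships with R

# Small stand-in for the 72 x 36 x 136 array, with dimnames to carry through
old.array <- array(abs(rnorm(4 * 3 * 24)), dim = c(4, 3, 24),
                   dimnames = list(lon = paste0("x", 1:4),
                                   lat = paste0("y", 1:3),
                                   time = paste0("t", 1:24)))

# Hypothetical placeholder for spi(...)$fitted: any function returning
# one value per time step would slot in here
per_cell <- function(x) (x - mean(x)) / sd(x)

# apply() over dims 1 and 2 puts the time dimension first, so move it
# back to the end and copy the original dimnames across
restore_shape <- function(res, template) {
  out <- aperm(res, c(2, 3, 1))
  dimnames(out) <- dimnames(template)
  out
}

cl <- makeCluster(2)  # worker count is an arbitrary choice for this sketch
new.array <- restore_shape(parApply(cl, old.array, c(1, 2), per_cell), old.array)
stopCluster(cl)

# Serial reference, to confirm the parallel result lines up cell by cell
serial.array <- restore_shape(apply(old.array, c(1, 2), per_cell), old.array)
all.equal(new.array, serial.array)

# Long "panel" form: one row per lon/lat/time combination
long <- as.data.frame.table(new.array, responseName = "fitted")
```

With a real package function such as spi(), the workers would presumably also need the package loaded first, for example with clusterEvalQ(cl, library(SPEI)). Because both arrays share dimnames, their long forms can be merged on the three dimension columns.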
I used fewer dimensions than in your question because I wanted to check that the behavior was correct.
Here I use mapply, which takes multiple arguments and returns a list of the results. I then wrap it in matrix() to get the dimensions you hoped for.
Note that i is repeated using times and j is repeated using each. This is critical because matrix() fills column by column: it runs down the rows of one column, then wraps to the next column once the number of rows is reached.
new.array = array(1:(5*10*4), dim=c(5,10,4))
# FUN: function which returns lists of
FUN <- function(x){
list(lapply(x, rep, times=3))
}
# result of the computation
result <- matrix(
mapply(
function(i,j,...){
FUN(new.array[i,j,])
}
,i = rep(1:nrow(new.array),times=ncol(new.array))
,j = rep(1:ncol(new.array),each=nrow(new.array))
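,MoreArgs = list(new.array = new.array)  # pass the whole array once; as a plain argument mapply would iterate over its elements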
)
,nrow=nrow(new.array)
,ncol=ncol(new.array)
)

R populate multidimensional array

Hi I am stuck with one of these simple but time-consuming errors:
How can I populate an array with loops? I know I am on a C approach here
and R isn't C.
Data <-[SOMETHING HERE]
One <-200
Two <-100
array222 <- array(0,length(SomeLength))
for (i in 1:One)
{
for (j in 1:Two)
{
array222[i][j] = sample(Data,1)
}
I want to populate the array with random samples from another dataset but all
I get is this:
Warning in array222[i][j] = sample(Data, 1) :
number of items to replace is not a multiple of replacement length
First of all, you wouldn't use loops to do this in R. You'd just do
array222 <- matrix(sample(Data, One*Two, replace=TRUE), nrow=One, ncol=Two)
But going back to your code, you fail to properly initialize your array222 variable: it needs to be created with the proper dimensions. The matrix() syntax is probably easier for a 2-D array, but you could also use array(0, dim=c(One,Two)).
Additionally, the proper way to index a two-dimensional array is
array222[i,j] #NOT array222[i][j]
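Both routes can be compared side by side. In this sketch the contents of Data are made up, since the question does not show them; the loop is included only to show the corrected initialisation and indexing. As far as I can tell, the warning in the question arises because array222[i] extracts a single element, and [j] then indexes into that one-element vector rather than into a second dimension.

```r
set.seed(42)  # only so the sketch is repeatable
Data <- c(2.5, 3.1, 4.7, 5.0, 6.2)  # hypothetical source data
One <- 200
Two <- 100

# Vectorised: draw all One * Two samples at once and shape them into a matrix
array222 <- matrix(sample(Data, One * Two, replace = TRUE), nrow = One, ncol = Two)

# Loop version: initialise with both dimensions, then index with [i, j]
looped <- array(0, dim = c(One, Two))
for (i in seq_len(One)) {
  for (j in seq_len(Two)) {
    looped[i, j] <- sample(Data, 1)
  }
}

dim(array222)
dim(looped)
```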

How can I store a list of ggplots to use in multiplot without overwriting previous plots?

I want to plot some heatmaps of covariance/correlation matrices in a multiplot using an object created from another function (the cd parameter below). The covariance matrices are stored in an array of 3 dimensions, so that cd$covmat[,,i] calls the ith covariance matrix.
Originally I had some issues with this with having the same plot replicated. However, I discovered I had an environment issue. I've tried resolving this several ways, with the code below being the most recent, but I can't figure out why it's not reading it properly.
Is there a particular reason for this? I've tried including and excluding the environment parameter (which I hopefully shouldn't need), and I've tried directly using cd$covmat[,,i] in the aes() call.
drawCovs<-function(cd,ncols){
require(ggplot2)
coords=expand.grid(x=1:cd$q,y=1:cd$q)
climits = c(-1,1)*max(cd$covmat)
cd$levels=c(cd$levels,"Total")
covtext=ifelse(!(cd$use.cor),'Covariance','Correlation')
plots=list()
cmat=list()
for (i in 1:(nlevels+1)){
cmat[[i]]<-cd$covmat[,,i]
.e<-environment
plots[[i]]<-ggplot(environment=.e)+geom_tile(aes(x=coords$x,y=coords$y,
fill=as.numeric(cmat[[i]]),color='white'))+
scale_fill_gradient(covtext,low='darkblue',high='red',limits=climits)+ylab('')
+xlab('')+guides(color='none')+scale_x_discrete(labels=cd$varnames,
limits=1:cd$q, expand=c(0,0))+scale_y_discrete(labels=cd$varnames,
limits=1:cd$q, expand=c(0,0))+theme(axis.text.x = element_text(angle = 90,
hjust = 1))+labs(title=paste0(covtext,"s of data, ",cd$levels[i]))
}
multiplot(plotlist=plots,cols=ncols)
}
If you end up trying to fix things with direct calls to environments, you are probably overcomplicating your code. Here's a simple snippet that may serve as a core for your function:
drawCovs <- function(cd, ncols) {
require(ggplot2)
require(reshape2)
plots=list()
cmat=list()
for (i in seq_along(cd$covmat)) {
cmat[[i]] <- cd$covmat[[i]]
# melt only the i-th matrix so that each plot gets its own data
plots[[i]] <- ggplot(melt(cmat[[i]]), aes(x=Var1, y=Var2, fill=value)) +
geom_tile(color='white')
}
multiplot(plotlist=plots,cols=ncols)
}
cd <- list()
cd$covmat <- list(matrix(runif(25), 5), matrix(runif(25), 5))
drawCovs(cd, 1)
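For covariance matrices held in a 3-dimensional array, as in the question, the same idea (one self-contained data frame per plot, so nothing depends on the loop variable when the plot is drawn) might look like the sketch below. The random covmat array and the column names row, col and value are assumptions for illustration, and the plotting step only runs if ggplot2 is installed.

```r
# Illustrative 3 x 3 x 2 array standing in for cd$covmat
covmat <- array(runif(3 * 3 * 2, -1, 1), dim = c(3, 3, 2))

# One long data frame per slice: each plot gets its own copy of the data
slice_frames <- lapply(seq_len(dim(covmat)[3]), function(i) {
  df <- as.data.frame.table(covmat[, , i], responseName = "value")
  names(df)[1:2] <- c("row", "col")
  df
})

# Build the plots from the per-slice data frames
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plots <- lapply(slice_frames, function(df) {
    ggplot2::ggplot(df, ggplot2::aes(x = col, y = row, fill = value)) +
      ggplot2::geom_tile(color = "white")
  })
}
```

The resulting plots list can then be handed to multiplot(plotlist = plots, cols = ncols) as in the original function.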

Store array while executing for loop

I have a function which is executed 200 times like this
for (l in 1:200) {
fun.ction(paramter1=g, paramter2=h)$element->u[z,,]
}
u is an array:
u<-array(NA, dim=c(2000,150,7))
I know it should have the right format. The element returned by fun.ction() is also an array with the same dimensions. Is there some way to fill the array u in each of the 200 runs with the array returned by fun.ction()$element? I tried indexing via a list (u[[z]]). That saves the array, but as a list, so I can't access the elements afterwards, which I need to do. I appreciate any help.
I am not sure what it is that you want, but if you just want to store 200 arrays of dimensions (2000,150,7) you can just make another array with a fourth dimension of 200.
storage.array <- array(dim=c(2000,150,7, 200))
And then store your (2000, 150, 7) arrays in the fourth dimension:
for (i in 1:200) {
storage.array[,,,i] <- fun.ction(paramter1=g, paramter2=h)$element
}
Then you can access the ith array with:
storage.array[,,,i]
But I guess that will be too big an array for R to handle comfortably; at least it is on my computer. With 2000 x 150 x 7 x 200 = 420 million values stored as doubles, it needs roughly 3.4 GB of memory.
An example that you can easily reproduce with smaller arrays:
storage.array <- array(dim=c(20,2,7, 200))
fun.ction <- function(parameter1, parameter2){
array(rnorm(140, parameter1, parameter2), dim=c(20,2,7))
}
for (i in 1:200){
storage.array[,,,i]<- fun.ction(10, 10)
}
But as Roland and Thomas have said, you should make your code reproducible and define correctly what you want, so it is easier to answer without trying to guess what your problem is.
Best regards
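If a list feels more natural for collecting the runs, the elements stay accessible with double-bracket indexing, and the list can be stacked into a 4-D array afterwards. This is a small sketch: the fun.ction below is a made-up stand-in that returns a list with an element component, and n_runs is reduced to 5.

```r
n_runs <- 5  # 200 in the question; reduced here

# Made-up stand-in returning a list with an $element array, as in the question
fun.ction <- function(parameter1, parameter2) {
  list(element = array(rnorm(20 * 2 * 7, parameter1, parameter2), dim = c(20, 2, 7)))
}

# Pre-allocate the list, then store one array per run
results <- vector("list", n_runs)
for (i in seq_len(n_runs)) {
  results[[i]] <- fun.ction(10, 10)$element
}

# [[i]] returns the stored array itself, so ordinary indexing works on it
results[[3]][1, 2, 7]

# Optional: stack the runs along a fourth dimension
stacked <- array(unlist(results), dim = c(dim(results[[1]]), n_runs))
same_run <- identical(stacked[, , , 3], results[[3]])
```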

Dynamically creating and naming an array

Consider the following code snippet
for i = 1:100
Yi= x(i:i + 3); % i in Yi is not an index but subscript,
% x is some array having sufficient values
i = i + 3
end
Basically, I want the subscript to change from 1 to 2, 3, ..., 100 each time the for loop runs, so that after 100 iterations I have 100 arrays, Y1 to Y100.
What could be the simplest way to implement this in MATLAB?
UPDATE
This is to be run 15 times
Y1 = 64;
fft_x = 2 * abs(Y1(5));
For simplicity I have taken constant inputs.
Now I am trying to use cell based on Marc's answer:
Y1 = cell(15,1);
fft_x = cell(15,1);
for i = 1:15
Y1{i,1} = 64;
fft_x{i,1} = 2 * abs(Y1(5));
end
I think I need to do some changes in abs(). Please suggest.
MATLAB has no direct syntax for variably-named variables (creating them requires eval or assignin). The common solution is to use a cell array for Y:
Y=cell(100,1);
for i =1:100
Y{i,1}= x(i:i+3);
i=i+3;
end
Note that the line i=i+3 inside the for-loop has no effect. You can just remove it.
Y=cell(100,1);
for i =1:100
Y{i,1}= x(i:i+3);
end
It is possible to make variably-named variables in MATLAB. If you really want this, do something like the following:
for i = 1:4:100
eval(['Y', num2str((i+3)/4), '=x(i:i+3);']);
end
How you organize your indexing depends on what you plan to do with x of course...
Yes, you can dynamically name variables. However, it's almost never a good idea, and there are much better, safer and faster alternatives, e.g. cell arrays as demonstrated by @Marc Claesen.
Look at the assignin function (and the related eval). You could do what you asked for with:
for i = 1:100
assignin('caller',['Y' int2str(i)],rand(1,i))
end
Another related function is genvarname. Don't use these unless you really need them.
