Parallelize nested for-loop on 3 dimensional array in R

Using R on a Windows machine, I am currently running a nested loop on a 3D array (720x360x1368) which cycles through d1 and d2 to apply a function over d3 and assemble the output to a new array of similar dimensionality.
In the following reproducible example, I have reduced the dimensions by a factor of 10 to make execution faster.
library(SPEI)
old.array = array(abs(rnorm(50)), dim=c(72,36,136))
new.array = array(dim=c(72,36,136))
for (i in 1:72) {
  for (j in 1:36) {
    # monthly time series for cell (i, j), starting January 1901
    new.listoflists <- spi(ts(old.array[i,j,], frequency=12, start=c(1901,1)), 1, na.rm = TRUE)
    new.array[i,j,] = new.listoflists$fitted
  }
}
where spi() is a function from the SPEI package returning a list of lists, of which one particular list, $fitted, of length 1368, is used from each loop increment to construct the new array.
While this loop works flawlessly, it takes quite a long time to compute. I have read that foreach can be used to parallelize for loops.
However, I do not understand how the nesting and the assembling of the new array can be achieved such that the dimnames of the old and the new array are consistent.
(In the end, what I want to be able to do is transform both the old and the new array into a "flat" long panel data frame using as.data.frame.table() and merge them along their three dimensions.)
Any help on how I can achieve the desired output using parallel computing will be highly appreciated!
Cheers
CubicTom

A reproducible example would have helped; here is what I came up with.
First, create the cluster to use:
library(doSNOW)  # also attaches foreach and snow
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
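Then create the loop and close the cluster. FUN and $list are placeholders for your function and the list element you need:
zz <- foreach(i = 1:720, .combine = c) %:%
  foreach(j = 1:360, .combine = c) %dopar% {
    new.listoflists <- FUN(old.array[i,j,])
    new.listoflists$list  # return the result of this (i, j) task
  }
stopCluster(cl)
This creates zz, which holds the result of every (i, j) iteration concatenated in loop order (j varies faster than i). The loop returns the value rather than assigning to new.array[i,j,], because an assignment inside %dopar% only changes the worker's copy of the array. You can then bind the results together with:
new.obj <- plyr::ldply(zz, data.frame)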
Hope this helps you!
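For the concrete case in the question, the same idea can be sketched with only the base parallel package, which also works on Windows. This is a minimal sketch under stated assumptions, not the SPEI workflow itself: standardise() is a hypothetical stand-in for spi(...)$fitted (any function that returns a vector as long as its input), the array is a small toy with made-up dimnames, and the worker count is arbitrary. Each (i, j) series becomes one row of a matrix, the rows are processed on the workers, and the results are reshaped so that the dimnames of the old and the new array match.

```r
library(parallel)

set.seed(1)
d <- c(6, 4, 24)  # toy dimensions; the question uses 720 x 360 x 1368
old.array <- array(abs(rnorm(prod(d))), dim = d,
                   dimnames = list(lon  = paste0("lon", 1:6),
                                   lat  = paste0("lat", 1:4),
                                   time = paste0("t", 1:24)))

# Hypothetical stand-in for spi(...)$fitted: output has the input's length
standardise <- function(x) (x - mean(x)) / sd(x)

# Row r = i + (j - 1) * d[1] holds the series old.array[i, j, ]
series <- matrix(old.array, nrow = d[1] * d[2], ncol = d[3])
rows <- lapply(seq_len(nrow(series)), function(r) series[r, ])

cl <- makeCluster(2)  # number of workers is an assumption
fitted <- parLapply(cl, rows, standardise)
stopCluster(cl)

# Stack the results in the same row order, then restore shape and dimnames
new.array <- array(do.call(rbind, fitted), dim = d,
                   dimnames = dimnames(old.array))
```

Because both arrays now share the same dimnames, as.data.frame.table() can be applied to each of them and the two long tables merged on the lon, lat and time columns, as the question intends.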

I did not use dimensions as large as those in your question because I wanted to ensure the behavior was correct.
Here I use mapply, which takes multiple arguments. The result is a list of the results, which I then wrapped with matrix() to get the dimensions you hoped for.
Please note that i is repeated using times and j is repeated using each. This is critical because matrix() fills column by column: it runs down the rows of the first column and wraps to the next column once the number of rows is reached.
new.array = array(1:(5*10*4), dim=c(5,10,4))
# FUN: stand-in function which returns a list of lists
FUN <- function(x){
  list(lapply(x, rep, times=3))
}
# result of the computation
result <- matrix(
  mapply(
    function(i, j, new.array){
      FUN(new.array[i,j,])
    }
    ,i = rep(1:nrow(new.array), times=ncol(new.array))
    ,j = rep(1:ncol(new.array), each=nrow(new.array))
    ,MoreArgs = list(new.array=new.array)  # passed whole, not iterated over
  )
  ,nrow=nrow(new.array)
  ,ncol=ncol(new.array)
)

Related

matlab: logically comparing two cell arrays

I have an excel file from which I obtained two string arrays, Titles of dimension 6264x1 and another Names of dimension 45696x1. I want to create an output matrix of size 6264x45696 containing in the elements a 1 or 0, a 1 if Titles contains Names.
I think I want something along the lines of:
for (j in Names)
for (k in Titles)
if (Names[j] is in Titles[k])
write to excel
end
end
end
But I don't know what functions I should use to achieve what I have in the picture. Here is what I have come up with:
[~,Title] = xlsread('exp1.xlsx',1,'A3:A6266','basic');
[~,Name] = xlsread('exp1.xlsx',2,'B3:B45698','basic');
A = cellstr(Title);
GN = cellstr(Name);
BinaryMatrix = false(45696,6264);
for i=1:1:45696
for j=1:1:6264
if (~isempty(ismember(A,GN)))
BinaryMatrix(i,j)= true;
end
end
end
The problem with this code is that it never finishes running, although MATLAB does not flag anything in it.
You can use the third output of unique to get a number for each string element, and then use bsxfun to compare the numbers.
GN = cellstr(Name);
A = cellstr(Title);
B = [ GN(:); A(:)];
[~,~,u]= unique(B);
BinaryMatrix = bsxfun(@eq, u(1:numel(GN)),u(numel(GN)+1:end).');
ismember can handle cell arrays of character vectors. Its second output tells you the information you need, from which you can build the result using sparse (it could also be done by preallocating and using sub2ind):
[~, m] = ismember(Titles, Names);
BinaryMatrix = full(sparse(nonzeros(m), find(m), true, numel(Names), numel(Titles)));

Making an array out of big number of matrices in r

I'm trying to work with arrays, but I can't seem to make one that works for my data. I have 14 matrices I would like to put in an array, but I can't figure out how to do it without manually writing c(m1,m2,m3...) to include all of them.
This is what I tried:
m_list <- mget(paste0("well_", 0:13)) ###to make a list of all my matrices
a <- array(c(m_list),
dim = c(7338, 15, 14))
But when I look at the array I created, something is not right with it, because when I ask for a single value like this:
print(a[1,4,2])
but I get entire columns.
I assume the error is in the list of matrices. Please help
An answer to your question is that you should use do.call(c, m_list) instead of c(m_list). (Take a couple of small matrices and try to see what c(m_list) and c(m1, m2) return.)
Also you might want to think some more whether working with an array is better than working with a list and, more importantly, how you could avoid having multiple matrices in the first place and instead to directly read/define them as a list or an array.
You can simply use unlist inside your array function call instead of c.
a = array(unlist(m_list), dim = c(dim(m_list[[1]]), length(m_list)))
Some reproducible data:
m1 = matrix(1:5, 5, 5)
m2 = matrix(5:1, 5, 5)
m_list = list(m1, m2)
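To see the difference the answers describe, a short sketch may help. It uses two small matrices whose entries all differ by position (unlike the ones above), so that a single lookup is easy to verify. c(m_list) leaves the list untouched, whereas do.call(c, m_list) and unlist(m_list) both flatten the matrices into one vector that array() can reshape.

```r
m1 <- matrix(1:25, 5, 5)
m2 <- matrix(25:1, 5, 5)
m_list <- list(m1, m2)

length(c(m_list))           # 2: still a list holding two matrices
length(do.call(c, m_list))  # 50: one flat vector of all values

# Build the array from the flattened values; dims are rows x cols x matrices
a <- array(unlist(m_list), dim = c(dim(m_list[[1]]), length(m_list)))
a[1, 4, 2]                  # a single value, equal to m2[1, 4]
```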

In MATLAB: How should nested fields of a struct be converted to a cell array?

In MATLAB, I would like to extract a nested field for each index of a 1 x n struct (a nonscalar struct) and receive the output as a 1 x n cell array. As a simple example, suppose I start with the following struct s:
s(1).f1.fa = 'foo';
s(2).f1.fa = 'yedd';
s(1).f1.fb = 'raf';
s(2).f1.fb = 'da';
s(1).f2 = 'bok';
s(2).f2 = 'kemb';
I can produce my desired 1 x 2 cell array c using a for-loop:
n = length(s);
c = cell(1,n);
for k = 1:n
c{k} = s(k).f1.fa;
end
If I wanted to do analogously for a non-nested field, for example f2, then I could "vectorize" the operation (see this question), writing simply:
c = {s.f2};
However the same approach does not appear to work for nested fields. What then are possible ways to vectorize the above for-loop?
You cannot really vectorize it. The problem is that Matlab does not allow most forms of nested indexing, including []..
The most concise / readable option would be to concatenate s.f1 results in a structure array using [...], and then index into the new structure array with a separate call:
x = [s.f1]; c = {x.fa};
If you have the Mapping Toolbox, you could use extractfield to perform the second indexing in one expression:
c = extractfield([s.f1], 'fa');
Alternatively you could write a one-liner using arrayfun - here's a couple of options:
c = arrayfun(@(x) x.f1.fa, s, 'uni', false);
c = arrayfun(@(x) x.fa, [s.f1], 'uni', false);
Note that arrayfun and similar functions are generally slower than explicit for loops. So if the performance is critical, time / profile your code, before making a decision to get rid of the loop.

R populate multidimensional array

Hi I am stuck with one of these simple but time-consuming errors:
How can I populate an array with loops? I know I am on a C approach here
and R isn't C.
Data <-[SOMETHING HERE]
One <-200
Two <-100
array222 <- array(0,length(SomeLength))
for (i in 1:One)
{
for (j in 1:Two)
{
array222[i][j] = sample(Data,1)
}
I want to populate the array with random samples from another dataset but all
I get is this:
Warning in array222[i][j] = sample(Data, 1) :
number of items to replace is not a multiple of replacement length
First of all, you wouldn't use loops to do this in R. You'd just do
array222 <- matrix(sample(Data, One*Two, replace=TRUE), nrow=One, ncol=Two)
But going back to your code, you fail to properly initialize your array222 variable. The matrix() syntax is probably easier for a 2-D array, but you could also use array(0, dim=c(One,Two)). You need to create it with the proper dimensions.
Additionally, the proper way to index a multi-dimensional array is
array222[i,j] #NOT array222[i][j]
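Both fixes can be checked side by side in a small sketch. Data here is a hypothetical stand-in vector, since the original data set is not shown; the loop version is included only to illustrate correct [i, j] indexing on a properly dimensioned array.

```r
set.seed(42)
Data <- 1:1000  # hypothetical stand-in for the real data
One <- 200
Two <- 100

# Vectorised: draw all One * Two samples in a single call
array222 <- matrix(sample(Data, One * Two, replace = TRUE),
                   nrow = One, ncol = Two)

# Loop version: initialise with both dimensions, index with [i, j]
looped <- array(0, dim = c(One, Two))
for (i in seq_len(One)) {
  for (j in seq_len(Two)) {
    looped[i, j] <- sample(Data, 1)
  }
}
dim(array222)  # 200 100
```

The warning in the question arises because array222[i] selects a single element, and [j] then indexes into that length-one vector, so for j > 1 R tries to put several values back into one slot.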

Parallel `for` loop with an array as output

How can I run a for loop in parallel (so I can use all the processors on my windows machine) with the result being a 3 dimension array? The code I have now takes about an hour to run and is something like:
guad = array(NA,c(1680,170,15))
for (r in 1:15)
{
name = paste("P:/......",r,".csv",sep="")
pp = read.table(name,sep=",",header=T)
#lots of stuff to calculate x (which is a matrix)
guad[,,r]= x #
}
I have been looking at related questions and thought I could use foreach but I couldn't find a way to combine the matrices into an array.
I am new to parallel programming so any help will be very much appreciated!
You could do that with foreach using the abind function. Here's an example using the doParallel package as the parallel backend which is fairly portable:
library(doParallel)
library(abind)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)
acomb <- function(...) abind(..., along=3)
guad <- foreach(r=1:4, .combine='acomb', .multicombine=TRUE) %dopar% {
x <- matrix(rnorm(16), 4) # compute x somehow
x # return x as the task result
}
This uses a combine function called acomb that uses the abind function from the abind package to combine the matrices generated by the cluster workers into a 3 dimensional array.
In this case, you can also combine the results using cbind and then modify the dim attribute afterwards to convert the resulting matrix into a 3 dimensional array:
guad <- foreach(r=1:4, .combine='cbind') %dopar% {
x <- matrix(rnorm(16), 4) # compute x somehow
x # return x as the task result
}
dim(guad) <- c(4,4,4)
abind is useful since it can combine matrices and arrays in a variety of ways. Also, be aware that resetting the dim attribute may cause the matrix to be duplicated, which could be a problem for large arrays.
Note that it's a good idea to shut down the cluster at the end of the script using stopCluster(cl).
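If adding the abind dependency is not desired, a similar result can be sketched with the base parallel package alone. This is an illustrative variant rather than the approach above: compute_slice() is a hypothetical stand-in for reading one file and computing x, and the worker count is arbitrary. parLapply returns the matrices as a list in task order, which is then reshaped into a 3 dimensional array.

```r
library(parallel)

# Hypothetical stand-in for "read file r and compute the matrix x"
compute_slice <- function(r) matrix(r * (1:12), nrow = 4, ncol = 3)

cl <- makeCluster(2)  # number of workers is an assumption
slices <- parLapply(cl, 1:5, compute_slice)  # list of 4 x 3 matrices
stopCluster(cl)

# Stack the matrices along a third dimension, one slice per task
guad <- array(unlist(slices), dim = c(dim(slices[[1]]), length(slices)))
dim(guad)  # 4 3 5
```

Collecting a list first and reshaping once keeps the combine step out of the parallel loop, at the cost of holding the list and the final array in memory at the same time.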
