R parallel code for two arrays - arrays

I'm trying to process two 3d arrays using parallel computing in R. I have a function that takes two vectors as input, so I need to loop through the rows and columns of my arrays. Doing this in serial code is simply too slow and R gets stuck as the arrays are large.
I've not found a solution for doing this with parallel functions and would appreciate any suggestions. I've tried parApply but do not know how to incorporate a second input, and mcmapply but it is hard to use over rows/cols. Ideally the output should also be an array of the same dimension.
Below is a reproducible example of what I'm trying to do in serial code. Any help on how this could be written in parallel code would be much appreciated!
fun <- function(a,b)
{
a*b
}
input1 <- array(data=1:1000, dim=c(10,10,10))
input2 <- array(data=2:1001, dim=c(10,10,10))
result <- array(data=NA, dim=c(10,10,10))
for(i in 1:nrow(mat1))
{ for(j in 1:ncol(mat1)) {
result[,i,j] <- fun(input1[,i,j], input2[,i,j])
}}

Here's one way to do it with the foreach package:
library(doSNOW)
library(abind)
cl <- makeSOCKcluster(parallel::detectCores())
registerDoSNOW(cl)
fun <- function(a,b) a*b
input1 <- array(rnorm(60), dim=c(4,5,3))
input2 <- array(rnorm(60), dim=c(4,5,3))
rdim <- dim(input1)[1:2]
comb <- function(...) abind(..., along=3)
result <-
foreach(i1=iapply(input1, 3), i2=iapply(input2, 3),
.multicombine=TRUE, .combine='comb') %dopar% {
r <- array(data=NA, dim=rdim)
for (i in 1:ncol(i1)) {
r[,i] <- fun(i1[,i], i2[,i])
}
r
}
The "iapply" function from the iterators package is used to split the 3D input arrays into 2D matrices. The result matrices are combined into a 3D array using the "abind" function from the abind package.
Note that I'm specifically using the "doSNOW" parallel backend because it sends data from the two "iapply" iterators to the workers and processes the results on-the-fly. This reduces the memory needed by the master process. The "doParallel" backend can't work on-the-fly because the "parallel" package doesn't export the necessary functions.

Related

looping multiple vectors in a for loop

I'm programming an objloader and this is a small part of its code.I want to be able to loop through different vectors in a single for loop.The for loop doesn't work here but is it possible to implement this concept somehow? make the (for ((GLdouble* val : container)&&(GLdouble* val2:NContainer)) condition work somehow?
aclass e;
std::vector<GLdouble*> container = e.function();
Nclass n;
std::vector<GLdouble*> Ncontainer = n.function();
for ((GLdouble* val : container)&&(GLdouble* val2:NContainer))
{
glVertex3dv(val);
glNormal3dv(val2);
}

Stata: Reshape command number of j dimensions

After reshaping a dataset from long to wide using
reshape wide v1 v2 v3, i(i1 i2) j(jdimens)
I need to run a loop exactly max(jdimens) times. Example: Assume that the above code creates the new variables jdimens1 jdimens2 and jdimens3. Then I would like to have the loop run three times.
Any ideas how this can be neatly done?
You can count the variables:
foreach i of varlist jdimens* {
di "iteration `i'"
}
reshape also leaves some characteristics behind that you can use if you don't want to specify names:
local myvars: char _dta[ReS_Xij_wide1]
foreach i of local myvars {
di "iteration `i'"
}

Sparse Matrix Cholesky decomposition Rcpp Eigen

I'm dealing with a big sparse matrix (10k x 10k variance/covariance matrix, so symmetric and positive definite) in R. I need a fast way to find the Cholesky decomposition of that matrix. I understand that using SparseLU from RcppEigen package can be a solution but I can't figure out how it works.
In the next script, I put the example of my function in R. It loads a SparseMatrix SS, of which I need the Cholesky decomposition. I'd like to have the same output as chol(SS), with typical R function chol.
cholScript<-'using Eigen::Map;
using Eigen::SparseMatrix;
const SparseMatrix<double> Sigma(as<SparseMatrix<double> >(Sigma));
using namespace Rcpp;
// compute C, the Cholesky decomposition of Sigma
return wrap(C);'
cxxfunction(signature(Sigma = "dgCMatrix"), cholScript, plugin = "RcppEigen")
Thank you very much.
I think that this can be a solution:
CholeskyCppSparse<-'using Eigen::Map;
using Eigen::SparseMatrix;
using Eigen::LLT;
const SparseMatrix<double> SS(as<SparseMatrix<double> >(Sigma));
typedef Eigen::SimplicialLLT<SparseMatrix<double> > SpChol;
const SpChol Ch(SS);
return wrap(Ch.matrixL());'
CholSparse <- cxxfunction(signature(Sigma = "dgCMatrix"), CholeskyCppSparse, plugin = "RcppEigen")
Change matrixL for matrixU if you want the upper triangular part of the decomposition (as for chol() function in R).

RNGscope segmentation fault

I've been coding some simulations using inline/RcppArmadillo and stumbled upon a problem with RNGScope. Is this a bug or am I doing something really dumb?? I've emptied the function out to make it readable (see below).
Cheers,
Ed
library(inline)
code_cpp <- '
using namespace arma;
// From R
uvec x0 = as<uvec>(x0_r);
vec time_vec = as<vec>(time_vec_r);
// Declare variables
umat simulation_data=zeros<umat>(x0.n_elem, time_vec.n_elem);
RNGScope scope;
return wrap(simulation_data);
'
gillespie_sim <- cxxfunction(body = code_cpp,
sig = signature(x0_r = "integer", time_vec_r= "numeric"),
plugin = "RcppArmadillo")
x0 <- c(1,0,0,0,0,0)
time_vec <- 1:800
set.seed(23)
for(i in 1:100000) out2 <- gillespie_sim(x0_r=x0,time_vec_r=time_vec)
The error I get
R(43305) malloc: * error for object 0x108c30a00: incorrect checksum for freed object - object was probably modified after being freed.
* set a breakpoint in malloc_error_break to debug
Abort trap: 6
Hm, I see two issues:
a) You use umat, but we have no unsigned int in R, so this will get lots of very inefficient copies. I changed it to mat, but imat should work too.
b) You loop a lot with for(i in 1:100000). We have seen similar issue with "gazillions" of object creations. We are not sure where the bug is.
With a smaller N it does not seem to happen (as often). We'll take a look and see if RNGScope has anything to do with it -- but that is a very simple object.
Thanks for the bug report. Consider using rcpp-devel next time, please.
Edit: Also note that when using Rcpp vectors, the error does not seem to manifest itself. So you could use the two-step method of first initializing Rcpp objects and then initializing Armadillo objects from it -- the fastLm.r file in the package has an example of that.
suppressMessages(library(Rcpp))
suppressMessages(library(inline))
useRcpp <- function() {
code_cpp <- '
// From R
NumericVector x0(x0_r);
NumericVector time_vec(time_vec_r);
// Declare variables
NumericMatrix simulation_data(x0.size(), time_vec.size());
RNGScope scope;
return simulation_data;
'
cxxfunction(body = code_cpp,
sig = signature(x0_r = "integer", time_vec_r= "numeric"),
plugin = "Rcpp")
}
gillespie_sim <- useRcpp()
x0 <- c(1,0,0,0,0,0)
time_vec <- 1:800
set.seed(23)
for(i in 1:100000) out2 <- gillespie_sim(x0_r=x0,time_vec_r=time_vec)
cat("Done\n")

R - Loop in matrix

I have two variables, the first is 1D flow vector containing 230 data and the second is 2D temperature matrix (230*44219).
I am trying to find the correlation matrix between each flow value and corresponding 44219 temperature. This is my code below.
Houlgrave_flow_1981_2000 = window(Houlgrave_flow_average, start = as.Date("1981-11-15"),end = as.Date("2000-12-15"))
> str(Houlgrave_flow_1981_2000)
‘zoo’ series from 1981-11-15 to 2000-12-15
Data: num [1:230] 0.085689 0.021437 0.000705 0 0.006969 ...
Index: Date[1:230], format: "1981-11-15" "1981-12-15" "1982-01-15" "1982-02-15" ...
Hulgrave_SST_1981_2000=X_sst[1:230,]
> str(Hulgrave_SST_1981_2000)
num [1:230, 1:44219] -0.0733 0.432 0.2783 -0.1989 0.1028 ...
sf_Houlgrave_SF_SST = NULL
sst_Houlgrave_SF_SST = NULL
cor_Houlgrave_SF_SST = NULL
for (i in 1:230) {
for(j in 1:44219){
sf_Houlgrave_SF_SST[i] = Houlgrave_flow_1981_2000[i]
sst_Houlgrave_SF_SST[i,j] = Hulgrave_SST_1981_2000[i,j]
cor_Houlgrave_SF_SST[i,j] = cor(sf_Houlgrave_SF_SST[i],Hulgrave_SST_1981_2000[i,j])
}
}
The error message always says:
Error in sst_Houlgrave_SF_SST[i, j] = Hulgrave_SST_1981_2000[i, j] :
incorrect number of subscripts on matrix
Thank you for your help.
try this:
# prepare empty matrix of correct size
cor_Houlgrave_SF_SST <- matrix(nrow=dim(Hulgrave_SST_1981_2000)[1],
ncol=dim(Hulgrave_SST_1981_2000)[2])
# Good practice to not specify "230" or "44219" directly, instead
for (i in 1:dim(Hulgrave_SST_1981_2000)[1]) {
for(j in 1:dim(Hulgrave_SST_1981_2000)[2]){
cor_Houlgrave_SF_SST[i,j] <- cor(sf_Houlgrave_SF_SST[i],Hulgrave_SST_1981_2000[i,j])
}
}
The two redefinitions inside your loop were superfluous I believe. The main problem with your code was not defining the matrix - i.e. the cor variable did not have 2 dimensions, hence the error.
It is apparently also good practice to define empty matrices for results in for-loops by explicitly giving them correct dimensions in advance - is meant to make the code more efficient.

Resources