I am writing Rcpp code for Gibbs sampling. Inside the code, I first want to create a 3-dimensional array with the number of rows equal to the number of iterations (500), the number of columns equal to the number of parameters (4), and the number of slices equal to the number of chains (3). I wrote it this way:
#include <RcppArmadillo.h>
#include <math.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace std;
using namespace arma;
//Gibbs sampling code starts here
Rcpp::List mcmc(const int iter,const int chains, const NumericVector data){
arma::cube posteriorC = arma::zeros(iter, 5, chains);
// rest of the code
List out(Rcpp::List::create(Rcpp::Named("posteriorC") =posteriorC));
return out;
}
While compiling, it does not show any error. But when I run the code with:
res<- mcmc(iter=500,chains=2,data)
it shows the error:
Error: Cube::operator(): index out of bounds
I want to know if there is any mistake in how I create the 3D array. Please note that I want to get estimates of 5 parameters of my model.
You need to specify the template argument for arma::zeros to correctly fill an arma::cube, cf. arma::zeros<template>:
Generate a vector, matrix or cube with the elements set to zero
Usage:
vector_type v = zeros<vector_type>( n_elem )
matrix_type X = zeros<matrix_type>( n_rows, n_cols )
matrix_type Y = zeros<matrix_type>( size(X) )
cube_type Q = zeros<cube_type>( n_rows, n_cols, n_slices )
cube_type R = zeros<cube_type>( size(Q) )
Thus, in your case it would be:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List mcmc(const int iter, const int chains,
const Rcpp::NumericVector data){
arma::cube posteriorC = arma::zeros<arma::cube>(iter, 5, chains);
// --------------------------------- ^^^^^^^^
// Not Shown
Rcpp::List out = Rcpp::List::create(Rcpp::Named("posteriorC") =posteriorC);
return out;
}
Two final notes:
Your description asks for 4 columns to store 4 parameters, but you also say that you need to estimate 5 parameters (and the code allocates 5 columns). Make sure the number of columns matches the number of parameters you actually store; otherwise, saving into the arma::cube slices will go out of bounds.
The way the Rcpp::List out is being created is unusual. In general, the clearest way to create the list is: Rcpp::List out = Rcpp::List::create(Rcpp::Named("Blah") = Blah);
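To see why the column count matters, here is a minimal sketch in plain C (not Armadillo itself) of how a cube's elements map onto a flat buffer, assuming Armadillo's layout of column-major slices stored one after another. The helper name cube_index is hypothetical; it returns -1 in the cases where a bounds-checked cube access would report "index out of bounds".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat offset of (row, col, slice) in a cube whose
 * slices are stored one after another, each in column-major order.
 * Returns -1 when any index falls outside the cube. */
long cube_index(size_t n_rows, size_t n_cols, size_t n_slices,
                size_t row, size_t col, size_t slice)
{
    if (row >= n_rows || col >= n_cols || slice >= n_slices)
        return -1;
    return (long)(row + col * n_rows + slice * n_rows * n_cols);
}
```

With 500 iterations, 4 columns and 3 chains, cube_index(500, 4, 3, 0, 4, 0) is rejected, because a fifth parameter needs column index 4; with 5 columns the same access is valid.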
I have a large module that uses a very large input buffer, consisting of many structures which, in turn, contain other structures and in the end each structure has several variables.
Out of these hundreds of input variables, my module (standalone C entity) uses only a fraction.
I would like to know if there is a way to make a list that contains only the variables used in my module (ideally with each variable's type and links to the structure(s) that contain it).
I tried Doxygen (1.8.5), but I could only generate a doc listing all input variables.
[Later EDIT]
I add an example code and the desired outcome:
#include <stdio.h>
typedef struct subS1{
unsigned char bIn1;
unsigned char bIn2;
} subS1;
typedef struct S1{
struct subS1 stMySubStruct1;
struct subS1 stMySubStruct2;
struct subS1 stMySubStruct3;
} MyInputStruct_t;
void Foo1(MyInputStruct_t *Input);
void Foo2(MyInputStruct_t *Input);
MyInputStruct_t stMyInputStruct = {{1, 2}, {0, 0}, {9, 6}}; // large input buffer
int main() {
Foo1(&stMyInputStruct); // call to my Module 'main' function
return 0;
}
void Foo1(MyInputStruct_t *Input)
{
if(Input->stMySubStruct1.bIn1 == 1)
{
printf("bIn1 = %d\n", Input->stMySubStruct1.bIn1); // stMySubStruct1.bIn1 is used (read or write)
}
Foo2(Input);
return;
}
void Foo2(MyInputStruct_t *Input)
{
if(Input->stMySubStruct3.bIn2 == 0)
{
printf("bIn2 = %d\n", Input->stMySubStruct3.bIn2); // stMySubStruct3.bIn2 is used (read or write)
}
return;
}
The desired list, containing just the inputs used by Foo1(), e.g.:
stMyInputStruct.stMySubStruct1.bIn1 -> is used in Foo1()
stMyInputStruct.stMySubStruct1.bIn2 -> is NOT used
...
stMyInputStruct.stMySubStruct3.bIn2 -> is used in Foo2()
This is just a five-minute hack to demonstrate what I mean, so take it with a grain of salt and for what it is.
So first I downloaded pycparser from https://github.com/eliben/pycparser/
Then I edited the C generator from https://github.com/eliben/pycparser/blob/master/pycparser/c_generator.py
... adding two lines to the constructor (two new variables, struct_refs and struct_ref):
class CGenerator(object):
""" Uses the same visitor pattern as c_ast.NodeVisitor, but modified to
return a value from each visit method, using string accumulation in
generic_visit.
"""
def __init__(self, reduce_parentheses=False):
""" Constructs C-code generator
reduce_parentheses:
if True, eliminates needless parentheses on binary operators
"""
# Statements start with indentation of self.indent_level spaces, using
# the _make_indent method.
self.indent_level = 0
self.reduce_parentheses = reduce_parentheses
# newly added variables here
self.struct_refs = set()
self.struct_ref = None
Then I edited two visitor functions to make them save information about the struct references they parse:
def visit_ID(self, n):
if self.struct_ref:
self.struct_refs.add(self.struct_ref + "->" + n.name)
return n.name
def visit_StructRef(self, n):
sref = self._parenthesize_unless_simple(n.name)
self.struct_ref = sref
self.struct_refs.add(sref)
res = sref + n.type + self.visit(n.field)
self.struct_ref = None
return res
Running this modified script over your example code collects this information:
>>> cgen.struct_refs
{'Input',
'Input->stMySubStruct1',
'Input->stMySubStruct1->bIn1',
'Input->stMySubStruct3',
'Input->stMySubStruct3->bIn2'}
So with a bit more work, it should be able to do the job more generally.
This of course breaks down in the face of memcpy, struct member access through pointers, etc.
You can also try exploiting the structure of your code. E.g., if you always call your input struct "Input", things get easier.
I would like to write something like foo4, similar to foo3 in the Eigen::Ref documentation:
#include <Eigen/Dense>
using namespace Eigen;
void foo3(Ref<VectorXf, 0, Eigen::InnerStride<> >){};
void foo4(Ref<Vector3f, 0, Eigen::InnerStride<> >){};
int main()
{
Eigen::Matrix3f fmat = Eigen::Matrix3f::Identity();
Eigen::MatrixXf dmat = Eigen::Matrix3f::Identity();
foo3(dmat.row(1)); // OK
foo3(fmat.row(1)); // Error : YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES
foo4(fmat.row(1)); // Error : YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES
}
I'm using Eigen version 3.3.7
You are getting size-mismatch errors, because you are trying to pass row-vectors where column vectors are expected.
There are two solutions:
Change the function to accept row-vectors:
void foo3(Ref<RowVectorXf, 0, Eigen::InnerStride<> >){};
void foo4(Ref<RowVector3f, 0, Eigen::InnerStride<> >){};
Explicitly transpose the vector you pass to the function:
foo3(fmat.row(1).transpose());
foo4(fmat.row(1).transpose());
Note that there are some cases where Eigen implicitly transposes row-vectors to column-vectors (like the following example). But generally, I would not rely on that and always explicitly transpose vectors to match the orientation:
Eigen::MatrixXd A(rows,cols);
Eigen::VectorXd v1 = A.row(0); // this works
Eigen::VectorXd v2 = A.row(0).transpose(); // more verbose, but what actually happens
I would like to use a Lua script to do some mathematical precalculations in my application, since I don't want to hardcode them. I use Lua as a DLL (dynamically linked library). The calling program is not written in a C-based language.
The application handles a pretty big array, normally (25k-65k) x 8 double values.
My goal is to:
put this array into the Lua script using a global variable
read this array back from the Lua script
I would like to do this in less than 100 ms.
Currently I have tested with a 28000 x 6 array, but it takes 5 seconds.
I am using the lua_gettable function and iterating across the array, which involves a huge number of stack writes and reads.
My question: is there any other solution for this? I checked the API, but maybe I missed some function. Is there any way to ask Lua to put a subset of the array onto the stack? And, of course, the opposite way.
Thank you so much for any help and suggestion!
As suggested by DarkWiiPlayer, I believe the best way to achieve this at a reasonable speed would be to use Lua's userdata. I wrote an example using a class that holds a double matrix with [65536][65536][8] dimensions, as you said yours would be:
class MatrixHolder {
public:
double matrix[65536][65536][8];
};
Then, I created a function to create a new MatrixHolder and another one to perform an operation on one position of the matrix (passing i, j and k as parameters).
static int newMatrixHolder(lua_State *lua) {
MatrixHolder* object;
size_t nbytes = sizeof(MatrixHolder);
object = static_cast<MatrixHolder*>(lua_newuserdata(lua, nbytes));
return 1;
}
static int performOperation(lua_State *lua) {
MatrixHolder* object = static_cast<MatrixHolder*>(lua_touserdata(lua, 1));
int i = luaL_checkinteger(lua, -3);
int j = luaL_checkinteger(lua, -2);
int k = luaL_checkinteger(lua, -1);
object->matrix[i][j][k] += 1.0;
lua_pushnumber(lua, object->matrix[i][j][k]); // push the double as a Lua number rather than truncating it to an integer
return 1;
}
static const struct luaL_Reg matrixHolderLib [] = {
{"new", newMatrixHolder},
{"performOperation", performOperation},
{NULL, NULL} // - signals the end of the registry
};
On my computer, the following Lua scripts executed in the times shown:
m = matrixHolder.new()
i = matrixHolder.performOperation(m, 1,1,1);
j = matrixHolder.performOperation(m, 1,2,1);
i = matrixHolder.performOperation(m, 1,1,1);
~845 microseconds
for i = 1, 1000
do
m = matrixHolder.new()
i = matrixHolder.performOperation(m, 1,1,1);
j = matrixHolder.performOperation(m, 1,2,1);
i = matrixHolder.performOperation(m, 1,1,1);
end
~617 milliseconds
I'm unsure if it will serve your purpose, but it already seems much faster than the 5 seconds you mentioned. For comparison, my computer has a 2.3 GHz 8-core Intel Core i9 with 16 GB RAM.
#include <stdio.h>
#include <math.h>
int iii;
double E_[100], t_[100], length, lowest_temp, acceptance;
int main(int argc, char *argv[]) {
length=(double)20;
lowest_temp=(double)0.20; /* setting lowest temperature */
acceptance=(double)0.50; /* Acceptance i--> j */
E_[10]=(double)1.3; /*starting E */
t_[0]=lowest_temp;
for(iii>=11;iii<=length-1;iii++){/*CB2*/
E_[10]=1.3;
E_[iii]=E_[iii-1]+(E_[iii-1]/10);
/*Can change 10 to other percentage or random number */
printf("E_[%i]=%f\n",iii,E_[iii]);
}/*CB2*/
for(iii<=9;iii>=0;iii--){/*CB3*/
E_[10]=1.3;
E_[iii]=E_[iii+1]-(E_[iii+1]/10);
printf("E_[%i]=%f\n",iii,E_[iii]);
}/*CB3*/
for(iii=0;iii<=length-1;iii++){/*CB4*/
t_[iii+1]=-((int)1/(t_[iii]))-(log(acceptance)/(E_[iii]+E_[iii+1]));
printf("t_[%i]=%f\n",iii,t_[iii]);
}/*CB4*/
When I run the code I get the following print out:
E_[0]=0.000000
E_[1]=0.000000
E_[2]=0.000000
E_[3]=0.000000
E_[4]=0.000000
E_[5]=0.000000
E_[6]=0.000000
E_[7]=0.000000
E_[8]=0.000000
E_[9]=0.000000
E_[10]=0.000000
E_[11]=1.430000
E_[12]=1.573000
E_[13]=1.730300
E_[14]=1.903330
E_[15]=2.093663
E_[16]=2.303029
E_[17]=2.533332
E_[18]=2.786665
E_[19]=3.065332
E_[20]=0.000000
E_[19]=0.000000
E_[18]=0.000000
E_[17]=0.000000
E_[16]=0.000000
E_[15]=0.000000
E_[14]=0.000000
E_[13]=0.000000
E_[12]=0.000000
E_[11]=0.000000
E_[10]=0.000000
E_[9]=1.170000
E_[8]=1.053000
E_[7]=0.947700
E_[6]=0.852930
E_[5]=0.767637
E_[4]=0.690873
E_[3]=0.621786
E_[2]=0.559607
E_[1]=0.503647
E_[0]=0.453282
t_[0]=0.200000
t_[1]=-4.275654
t_[2]=0.885794
t_[3]=-0.542211
t_[4]=2.372348
t_[5]=0.053720
t_[6]=-18.187354
t_[7]=0.439930
t_[8]=-1.926635
t_[9]=0.830847
t_[10]=-0.922965
t_[11]=1.616655
t_[12]=inf
t_[13]=inf
t_[14]=inf
t_[15]=inf
t_[16]=inf
t_[17]=inf
t_[18]=inf
t_[19]=inf
My objective is to populate the t_[iii] array using the computed E_[iii] array. To generate the E_[iii] array, I take a pre-existing value from elsewhere in the code (E_[10] = 1.3 in this case) and then fill in either side of it: values greater than E_[10] for iii > 10, and values less than E_[10] for iii < 10.
Arbitrarily, I set adjacent E values to differ by ten percent. With this I generate the correct values on either side of E_[10]; however, I cannot generate them within the same loop, and the unused iii indices appear to be reset to zero, as can be seen from the printout.
I would very much appreciate any help in this issue!
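One problem: the for-loop initialisation. At CB2 and CB3, iii>=11 and iii<=9 are comparisons, not assignments, so they don't initialise iii. CB2 therefore starts from whatever iii happens to hold (0 for a global), and CB3 starts from the value CB2 left behind (20). You need iii=11 and iii=9 instead.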
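A minimal sketch of the two E_ loops with proper initialisation, wrapped in a hypothetical helper fill_energies (the function name and its parameters are not from the original code, and the t_ loop is left out):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: set E[center] = start, then fill outward so that
 * each neighbour differs from the previous one by ten percent. */
void fill_energies(double *E, int n, int center, double start)
{
    assert(E != NULL && center >= 0 && center < n);
    E[center] = start;

    /* Upward: note the assignment i = center + 1, not a comparison. */
    for (int i = center + 1; i < n; i++)
        E[i] = E[i - 1] + E[i - 1] / 10.0;

    /* Downward: again initialised with an assignment. */
    for (int i = center - 1; i >= 0; i--)
        E[i] = E[i + 1] - E[i + 1] / 10.0;
}
```

Calling fill_energies(E_, 20, 10, 1.3) fills E_[0] through E_[19] in one pass, so E_[10] no longer has to be reset inside the loops.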
I would like to be able to use my own memory allocation function for certain data structures (real valued vectors and arrays) in R. The reason for this is that I need my data to be 64bit aligned and I would like to use the numa library for having control over which memory node is used (I'm working on compute nodes with four 12-core AMD Opteron 6174 CPUs).
Now I have two functions for allocating and freeing memory: numa_alloc_onnode and numa_free (courtesy of this thread). I'm using R version 3.1.1, so I have access to the function allocVector3 (src/main/memory.c), which seems to me as the intended way of adding a custom memory allocator. I also found the struct R_allocator in src/include/R_ext
However it is not clear to me how to put these pieces together. Let's say, in R, I want the result res of an evaluation such as
res <- Y - mean(Y)
to be saved in a memory area allocated with my own function, how would I do this? Can I integrate allocVector3 directly at the R level? I assume I have to go through the R-C interface. As far as I know, I cannot just return a pointer to the allocated area, but have to pass the result as an argument. So in R I call something like
n <- length(Y)
res <- numeric(length=1)
.Call("R_allocate_using_myalloc", n, res)
res <- Y - mean(Y)
and in C
#include <R.h>
#include <Rinternals.h>
#include <numa.h>
SEXP R_allocate_using_myalloc(SEXP R_n, SEXP R_res){
PROTECT(R_n = coerceVector(R_n, INTSXP));
PROTECT(R_res = coerceVector(R_res, REALSXP));
int *restrict n = INTEGER(R_n);
R_allocator_t myAllocator;
myAllocator.mem_alloc = numa_alloc_onnode;
myAllocator.mem_free = numa_free;
myAllocator.res = NULL;
myAllocator.data = ???;
R_res = allocVector3(REALSXP, n, myAllocator);
UNPROTECT(2);
}
Unfortunately I cannot get beyond a "variable has incomplete type 'R_allocator_t'" compilation error (I had to remove the .data line since I have no clue as to what I should put there). Does any of the above code make sense? Is there an easier way of achieving what I want? It seems a bit odd to have to allocate a small vector in R and then change its location in C just to be able to both control the memory allocation and have the vector available in R...
I'm trying to avoid using Rcpp, as I'm modifying a fairly large package and do not want to convert all C calls and thought that mixing different C interfaces could perform sub-optimally.
Any help is greatly appreciated.
I made some progress in solving my problem and I would like to share in case anyone else encounters a similar situation. Thanks to Kevin for his comment. I was missing the include statement he mentions. Unfortunately this was only one among many problems.
dyn.load("myAlloc.so")
size <- 3e9
myBigmat <- .Call("myAllocC", size)
print(object.size(myBigmat), units = "auto")
rm(myBigmat)
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rallocators.h>
#include <numa.h>
typedef struct allocator_data {
size_t size;
} allocator_data;
void* my_alloc(R_allocator_t *allocator, size_t size) {
((allocator_data*)allocator->data)->size = size;
return (void*) numa_alloc_local(size);
}
void my_free(R_allocator_t *allocator, void * addr) {
size_t size = ((allocator_data*)allocator->data)->size;
numa_free(addr, size);
}
SEXP myAllocC(SEXP a) {
allocator_data* my_allocator_data = malloc(sizeof(allocator_data));
my_allocator_data->size = 0;
R_allocator_t* my_allocator = malloc(sizeof(R_allocator_t));
my_allocator->mem_alloc = &my_alloc;
my_allocator->mem_free = &my_free;
my_allocator->res = NULL;
my_allocator->data = my_allocator_data;
R_xlen_t n = asReal(a);
SEXP result = PROTECT(allocVector3(REALSXP, n, my_allocator));
UNPROTECT(1);
return result;
}
To compile the C code, I use R CMD SHLIB -std=c99 -L/usr/lib64 -lnuma myAlloc.c. As far as I can tell, this works fine. If anyone has improvements/corrections to offer, I'd be happy to include them.
One requirement from the original question that remains unresolved is the alignment issue. The block of memory returned by numa_alloc_local is correctly aligned, but other fields of the new VECTOR_SEXPREC (e.g. the sxpinfo_struct header) push back the start of the data array. Is it somehow possible to align this starting point (the address returned by REAL())?
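One possible approach, sketched here under assumptions rather than tested against R: if the allocator knows how many bytes end up in front of the data, it can over-allocate and return a deliberately offset address, so that the address plus the header lands on an aligned boundary. In the sketch below, malloc stands in for numa_alloc_local, the function name is hypothetical, and header is a parameter because the real offset is an R internal that would have to be determined (and may differ between R versions).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `header + payload` bytes and
 * return a pointer p such that (p + header) is a multiple of `align`
 * (which must be a power of two). *base receives the pointer that must
 * later be released; malloc is a stand-in for the NUMA allocator. */
void *alloc_with_aligned_payload(size_t payload, size_t header,
                                 size_t align, void **base)
{
    assert(base != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    unsigned char *raw = malloc(header + payload + align);
    if (raw == NULL)
        return NULL;

    /* First aligned address at or after raw + header. */
    uintptr_t data = ((uintptr_t)raw + header + align - 1)
                     & ~(uintptr_t)(align - 1);

    *base = raw;
    return (void *)(data - header);   /* header ends exactly at `data` */
}
```

In my_alloc this would mean storing base (and the padded size) in allocator_data so that my_free can release the original block.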
R has, in memory.c:
main/memory.c
84:#include <R_ext/Rallocators.h> /* for R_allocator_t structure */
so I think you need to include that header as well to get the custom allocator (Rinternals.h merely declares it, without defining the struct or including that header).