Parallelisation in R: Using parLapply with pointer to C object

I'm trying to parallelise an R function that conducts some arithmetic in C.
A C object is constructed once from an R dataset using some function, which I'll call InitializeCObject, that returns a pointer to the object. I want to create an instance of this object on each worker that I can reuse many times.
This is where I've got so far:
library(parallel)
nCores <- 2
cluster <- makeCluster(nCores)
on.exit(stopCluster(cluster))
clusterEvalQ(cluster, {library(pkgName); NULL})
The simplest solution is to make a new C object on each call:
x <- list(val1, val2) # list of length `nCores`
parLapply(cluster, x, function (x_i) pkgName::MakeObjectAndCalc(x_i, dataset))
But the time spent initialising the C object on every single call outweighs the benefit of parallelisation.
I've tried creating nCores C objects and exporting all of them to each worker, then making worker n use local object n:
cPointer <- lapply(seq_len(nCores), function(xx) InitializeCObject(dataset))
on.exit(DestroyCObject(cPointer), add=TRUE)
clusterExport(cluster, 'cPointer')
parLapply(cluster, seq_len(nCores), function (i) Calculate(x[[i]], cPointer[[i]]))
But this doesn't work; the objects on the workers seem not to be initialized.
So I tried creating a separate C object locally on each worker:
clusterExport(cluster, 'dataset')
clusterEvalQ(cluster, {
  localPointer <- InitializeCObject(dataset)
  LocalCalc <- function (x) Calculate(x, localPointer)
  on.exit(DestroyCObject(localPointer))
})
parLapply(cluster, x, LocalCalc)
But this causes the workers to crash. Any suggestions as to how I might move forwards would be appreciated.
edit: minimal C example
Here's my attempt to provide a minimal example of the associated C code. I'm far from fluent in C structures but hopefully this code is sufficient to demonstrate my problem.
#include <stdlib.h>

// Define object structure
typedef struct CObject_t {
    int data;
} CObject_t, *CObject;

// Allocate memory for a new (empty) object and return a pointer to it
CObject new_object_t(void) {
    CObject new = (CObject)calloc(1, sizeof(CObject_t));
    return new;
}

// Copy the dataset into the object
void initialize_object(const int *dataset, CObject cObj) {
    cObj->data = *dataset;
}

// Use the stored data in a calculation and return the result
int use_object_to_calculate(int *x, CObject cObj) {
    *x = *x + cObj->data;
    return *x;
}
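For reference, here is a minimal sketch (not taken from the real package) of how a pointer like this is typically handed back to R, assuming pkgName uses R's external-pointer mechanism; the wrapper name C_InitializeCObject and the finalizer are illustrative assumptions.
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>

/* Hypothetical wrapper (assumes the CObject definitions above are in scope):
   the C object lives outside R's heap, and R only holds its address. */
static void cobject_finalizer(SEXP extPtr) {
    CObject obj = (CObject) R_ExternalPtrAddr(extPtr);
    if (obj != NULL) {
        free(obj);                  /* stands in for DestroyCObject */
        R_ClearExternalPtr(extPtr);
    }
}

SEXP C_InitializeCObject(SEXP dataset) {
    CObject obj = new_object_t();
    initialize_object(INTEGER(dataset), obj);
    SEXP extPtr = PROTECT(R_MakeExternalPtr(obj, R_NilValue, R_NilValue));
    R_RegisterCFinalizerEx(extPtr, cobject_finalizer, TRUE);
    UNPROTECT(1);
    return extPtr;                  /* what R sees as "the pointer" */
}
If the package works along these lines, it would also fit the symptom above: an external pointer is only an address in the master process, and when it is serialised to a worker (which is what clusterExport does) the address field arrives as NULL, so the worker-side copies look uninitialised.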

Related

How to make a list with just the used inputs for a C module

I have a large module that uses a very large input buffer, consisting of many structures which, in turn, contain other structures and in the end each structure has several variables.
Out of these hundreds of input variables, my module (standalone C entity) uses only a fraction.
I would like to know if there is a way to make a list containing only the variables used in my module (ideally including the variable type and links to the structure(s) that contain it).
I tried Doxygen (1.8.5), but it could only generate documentation listing all input variables.
[Later EDIT]
I've added example code and the desired outcome:
#include <stdio.h>

typedef struct subS1 {
    unsigned char bIn1;
    unsigned char bIn2;
} subS1;

typedef struct S1 {
    struct subS1 stMySubStruct1;
    struct subS1 stMySubStruct2;
    struct subS1 stMySubStruct3;
} MyInputStruct_t;

void Foo1(MyInputStruct_t *Input);
void Foo2(MyInputStruct_t *Input);

MyInputStruct_t stMyInputStruct = {{1, 2}, {0, 0}, {9, 6}}; // large input buffer

int main() {
    Foo1(&stMyInputStruct); // call to my Module 'main' function
    return 0;
}

void Foo1(MyInputStruct_t *Input)
{
    if (Input->stMySubStruct1.bIn1 == 1)
    {
        printf("bIn1 = %d\n", Input->stMySubStruct1.bIn1); // stMySubStruct1.bIn1 is used (read or write)
    }
    Foo2(Input);
    return;
}

void Foo2(MyInputStruct_t *Input)
{
    if (Input->stMySubStruct3.bIn2 == 0)
    {
        printf("bIn2 = %d\n", Input->stMySubStruct3.bIn2); // stMySubStruct3.bIn2 is used (read or write)
    }
    return;
}
The list with just the used inputs for Foo1(), e.g.:
stMyInputStruct.stMySubStruct1.bIn1 -> is used in Foo1()
stMyInputStruct.stMySubStruct1.bIn2 -> is NOT used
...
stMyInputStruct.stMySubStruct3.bIn2 -> is used in Foo2()
This is just a five-minute hack to demonstrate what I mean, so take it with a grain of salt and for what it is.
So first I downloaded pycparser from https://github.com/eliben/pycparser/
Then I edited the C generator from https://github.com/eliben/pycparser/blob/master/pycparser/c_generator.py
... adding two lines to the constructor code (two new variables, struct_refs and struct_ref):
class CGenerator(object):
    """ Uses the same visitor pattern as c_ast.NodeVisitor, but modified to
        return a value from each visit method, using string accumulation in
        generic_visit.
    """
    def __init__(self, reduce_parentheses=False):
        """ Constructs C-code generator
            reduce_parentheses:
                if True, eliminates needless parentheses on binary operators
        """
        # Statements start with indentation of self.indent_level spaces, using
        # the _make_indent method.
        self.indent_level = 0
        self.reduce_parentheses = reduce_parentheses
        # newly added variables here
        self.struct_refs = set()
        self.struct_ref = None
Then I edited two visitor functions to make them save information about the struct references they parse:
    def visit_ID(self, n):
        if self.struct_ref:
            self.struct_refs.add(self.struct_ref + "->" + n.name)
        return n.name

    def visit_StructRef(self, n):
        sref = self._parenthesize_unless_simple(n.name)
        self.struct_ref = sref
        self.struct_refs.add(sref)
        res = sref + n.type + self.visit(n.field)
        self.struct_ref = None
        return res
Running this modified piece of Python script over your example code collects this information:
>>> cgen.struct_refs
{'Input',
'Input->stMySubStruct1',
'Input->stMySubStruct1->bIn1',
'Input->stMySubStruct3',
'Input->stMySubStruct3->bIn2'}
So with a bit more work, it should be able to do the job more generally.
This of course breaks apart in the face of memcpy, struct member access through pointers, etc.
You can also try exploiting structure in your code. E.g. if you always call your input struct "Input", things get easier.

Increase performance in Lua: get_table is slow

I would like to use a Lua script to do some mathematical precalculations in my application that I don't want to hardcode. I use Lua as a linked DLL library. The caller program's language is not a C-based language.
The application handles a pretty big array, normally a (25k-65k) * 8 array of doubles.
My targets are:
put this array into the Lua script as a global variable
read this array back from the Lua script
I would like this to take less than 100 ms.
Currently I have tested with a 28000 x 6 array, but it takes 5 seconds.
I am using the lua_gettable function and iterating across the array, which means a huge number of stack writes and reads.
My question: is there any other solution for this? I checked the API, but maybe I missed some function. Is there any way to ask Lua to put a subset of the array onto the stack, and of course the opposite way?
Thank you for any help and suggestions!
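For reference, a minimal sketch of the kind of per-element loop described above, reading one row of a nested global Lua table into a C buffer; the table name "data" and the row-of-subtables layout are assumptions, not taken from the question, and lua_rawgeti is used here instead of lua_gettable, though the stack traffic is the same:
#include <lua.h>
#include <lauxlib.h>

/* Reads data[row][1..ncols] into out[]; every element costs a stack push,
   a conversion and a pop, which is where the time goes for 28000 x 6 values. */
static void read_row(lua_State *L, int row, int ncols, double *out) {
    lua_getglobal(L, "data");            /* push the global table  */
    lua_rawgeti(L, -1, row);             /* push data[row]         */
    for (int col = 1; col <= ncols; col++) {
        lua_rawgeti(L, -1, col);         /* push data[row][col]    */
        out[col - 1] = lua_tonumber(L, -1);
        lua_pop(L, 1);                   /* pop the number         */
    }
    lua_pop(L, 2);                       /* pop data[row] and data */
}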
As suggested by DarkWiiPlayer, I believe the best way to achieve this at a reasonable speed would be to use Lua's userdata. I did an example using a class holding a double matrix with [65536][8] dimensions, as you said yours would be:
class MatrixHolder {
public:
    double matrix[65536][8];
};
Then, I created a method to create a new MatrixHolder and another one to perform an operation on one of the positions of the matrix (passing i and j as parameters).
static int newMatrixHolder(lua_State *lua) {
    MatrixHolder* object;
    size_t nbytes = sizeof(MatrixHolder);
    object = static_cast<MatrixHolder*>(lua_newuserdata(lua, nbytes));
    return 1;
}

static int performOperation(lua_State *lua) {
    MatrixHolder* object = static_cast<MatrixHolder*>(lua_touserdata(lua, 1));
    int i = luaL_checkinteger(lua, -2);
    int j = luaL_checkinteger(lua, -1);
    object->matrix[i][j] += 1.0;
    lua_pushnumber(lua, object->matrix[i][j]);
    return 1;
}
static const struct luaL_Reg matrixHolderLib [] = {
{"new", newMatrixHolder},
{"performOperation", performOperation},
{NULL, NULL} // - signals the end of the registry
};
On my computer, it executed the given Lua scripts in the following times:
m = matrixHolder.new()
i = matrixHolder.performOperation(m, 1, 1);
j = matrixHolder.performOperation(m, 1, 2);
i = matrixHolder.performOperation(m, 1, 1);
~845 microseconds
for i = 1, 1000
do
    m = matrixHolder.new()
    i = matrixHolder.performOperation(m, 1, 1);
    j = matrixHolder.performOperation(m, 1, 2);
    i = matrixHolder.performOperation(m, 1, 1);
end
~617 milliseconds
I'm unsure if it will serve your purpose, but it already seems way faster than the 5 seconds you mentioned. For comparison, my computer is a 2.3 GHz 8-core Intel Core i9 with 16 GB of RAM.

OCaml C interop: passing a struct

I hit a weird case when trying to call C from OCaml.
This is the c side of things:
typedef struct {
    TSNode node;
} AstNode;

CAMLprim value caml_ts_document_root_node(value document) {
    CAMLparam1(document);
    TSNode root_node = ts_document_root_node(document);
    AstNode elNode;
    elNode.node = root_node;
    CAMLreturn(&elNode);
}

CAMLprim value caml_ts_node_string(value node) {
    CAMLparam1(node);
    CAMLlocal1(mls);
    AstNode* n = (AstNode*) node;
    char *s = ts_node_string(n->node);
    mls = caml_copy_string(s);
    CAMLreturn(mls);
}
On the OCaml side:
type ts_point
type ts_document
type ts_node
external ts_node_string : ts_node -> string = "caml_ts_node_string"
external ts_document_root_node : ts_document -> ts_node = "caml_ts_document_root_node"
As you can see in the code, in caml_ts_document_root_node I wrap the TSNode returned by ts_document_root_node(document) in an extra struct, AstNode.
When I write the following implementation however:
CAMLprim value caml_ts_document_root_node(value document) {
    CAMLparam1(document);
    TSNode root_node = ts_document_root_node(document);
    CAMLreturn(&root_node);
}
My code segfaults when calling caml_ts_node_string on the node returned by caml_ts_document_root_node.
Does anyone have any hints on why the segfault appears when I don't wrap the TSNode in an extra struct when interoperating with OCaml?
That's definitely not the right usage of the foreign interface! You can't just take a C value and cast it to an OCaml value. OCaml values are specially encoded, even integers, and have a different representation than C values.
If you want to encode a C value as an OCaml value, you should use custom values.
First of all, you need to implement the interface of a custom value; fortunately, you can rely on the defaults for that:
#include <caml/mlvalues.h>
#include <caml/custom.h>

static struct custom_operations ast_ops = {
    "ast_node",
    custom_finalize_default,
    custom_compare_default,
    custom_hash_default,
    custom_serialize_default,
    custom_deserialize_default,
    custom_compare_ext_default
};
Next, you need to learn how to allocate custom blocks. For example, the following call will allocate the new AstNode in the OCaml heap:
res = caml_alloc_custom(&ast_ops, sizeof(AstNode), 0, 1);
To access the value itself, you need to use the Data_custom_val macro, e.g.,
if (res) {
    AstNode *node = (AstNode *) Data_custom_val(res);
    TSNode *tsnode = &node->node;
}
The complete example of a correct (I hope) implementation of your first function is below:
CAMLprim value caml_ts_document_root_node(value document) {
    CAMLparam1(document);
    CAMLlocal1(res);
    res = caml_alloc_custom(&ast_ops, sizeof(AstNode), 0, 1);
    if (res) {
        AstNode *ast = (AstNode *) Data_custom_val(res);
        ast->node = ts_document_root_node(document);
    }
    CAMLreturn(res);
}
As you may see, this is not trivial and rather low-level, though nothing really magical, especially after you've read the corresponding parts of the OCaml documentation. However, it is much easier to use the Ctypes library, which hides most of those complexities and allows you to call C functions directly from OCaml.
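For completeness, a sketch of what the matching accessor might look like once the node is stored in a custom block. It assumes the AstNode typedef and the tree-sitter declarations from the question are in scope; whether the string returned by ts_node_string must be freed is an assumption about that library.
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/custom.h>
#include <stdlib.h>

CAMLprim value caml_ts_node_string(value node) {
    CAMLparam1(node);
    CAMLlocal1(mls);
    /* unwrap the custom block instead of casting the value directly */
    AstNode *n = (AstNode *) Data_custom_val(node);
    char *s = ts_node_string(n->node);
    mls = caml_copy_string(s);
    free(s);  /* assumption: ts_node_string returns heap-allocated storage */
    CAMLreturn(mls);
}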
This seems to be unrelated to the OCaml interop part; you are returning the address of a local variable in this function:
CAMLprim value caml_ts_document_root_node(value document) {
// ...
AstNode elNode;
// ...
CAMLreturn(&elNode);
}
When it returns, the (stack) memory it refers to is invalid (in the sense that it will be reused at the next function call).

Passing a struct to a function in C

I have initialised 3 instances of a cache I have defined using typedef. I have done some processing on them in a series of if statements in the following way:
cache cache1;
cache cache2;
cache cache3;
int a;

void main(...) {
    if (a == 0) {
        cache1.attribute = 5;
    }
    else if (a == 1) {
        cache2.attribute = 1;
    }
    else if (a == 2) {
        cache3.attribute = 2;
    }
}
However now I need to make the design modular in the following way:
cache cache1;
cache cache2;
cache cache3;

void cache_operator(cache user_cache, int a) {
    user_cache.attribute = a;
}

void main(...) {
    if (a == 0) {
        cache_operator(cache1, 5);
    }
    else if (a == 1) {
        cache_operator(cache2, 1);
    }
    ...
I am having trouble with passing the cache to the function. I'm used to Java programming and I'm not very familiar with C pointers. However, if I pass the cache itself as shown above, I am passing a copy of the cache on the stack, which then produces results different from the original code. How do I properly transform the first design into the second design when it comes to passing the appropriate cache to the function and making sure it is accessed properly?
In C, if you want to modify the original data rather than a copy created inside the function, you have to pass a pointer to that data to the function.
A pointer in C is much like an object reference in Java.
Here is how you do it:
void cache_operator(cache *user_cache, int a)
{
    user_cache->attribute = a;
}
Here is how you call the function:
cache_operator(&cache1,5);
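Putting it together, a minimal compilable sketch; the cache type here is a stand-in with a single int attribute field, since the real typedef isn't shown in the question:
#include <stdio.h>

/* Hypothetical cache type: the question doesn't show the real definition. */
typedef struct {
    int attribute;
} cache;

/* Takes a pointer, so the caller's cache is modified rather than a copy. */
void cache_operator(cache *user_cache, int a) {
    user_cache->attribute = a;
}

int main(void) {
    cache cache1 = {0};
    cache_operator(&cache1, 5);        /* pass the address of cache1 */
    printf("%d\n", cache1.attribute);  /* prints 5 */
    return 0;
}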
I also started with Java. I don't know why some universities nowadays use Java as a beginning language... It is quite strange, since Java is a high-level language that abstracts away low-level details, whereas C is a rather low-level language. In the past, this was never the case.

Using a custom memory allocation function in R

I would like to be able to use my own memory allocation function for certain data structures (real valued vectors and arrays) in R. The reason for this is that I need my data to be 64bit aligned and I would like to use the numa library for having control over which memory node is used (I'm working on compute nodes with four 12-core AMD Opteron 6174 CPUs).
Now I have two functions for allocating and freeing memory: numa_alloc_onnode and numa_free (courtesy of this thread). I'm using R version 3.1.1, so I have access to the function allocVector3 (src/main/memory.c), which seems to me to be the intended way of adding a custom memory allocator. I also found the struct R_allocator in src/include/R_ext.
However it is not clear to me how to put these pieces together. Let's say, in R, I want the result res of an evaluation such as
res <- Y - mean(Y)
to be saved in a memory area allocated with my own function, how would I do this? Can I integrate allocVector3 directly at the R level? I assume I have to go through the R-C interface. As far as I know, I cannot just return a pointer to the allocated area, but have to pass the result as an argument. So in R I call something like
n <- length(Y)
res <- numeric(length=1)
.Call("R_allocate_using_myalloc", n, res)
res <- Y - mean(Y)
and in C
#include <R.h>
#include <Rinternals.h>
#include <numa.h>

SEXP R_allocate_using_myalloc(SEXP R_n, SEXP R_res){
    PROTECT(R_n = coerceVector(R_n, INTSXP));
    PROTECT(R_res = coerceVector(R_res, REALSXP));
    int *restrict n = INTEGER(R_n);

    R_allocator_t myAllocator;
    myAllocator.mem_alloc = numa_alloc_onnode;
    myAllocator.mem_free = numa_free;
    myAllocator.res = NULL;
    myAllocator.data = ???;

    R_res = allocVector3(REALSXP, n, myAllocator);
    UNPROTECT(2);
}
Unfortunately I cannot get beyond a "variable has incomplete type 'R_allocator_t'" compilation error (I had to remove the .data line since I have no clue what I should put there). Does any of the above code make sense? Is there an easier way of achieving what I want? It seems a bit odd to have to allocate a small vector in R and then change its location in C just to be able to both control the memory allocation and have the vector available in R...
I'm trying to avoid using Rcpp, as I'm modifying a fairly large package and do not want to convert all the C calls, and I thought that mixing different C interfaces could perform sub-optimally.
Any help is greatly appreciated.
I made some progress in solving my problem and I would like to share it in case anyone else encounters a similar situation. Thanks to Kevin for his comment; I was missing the include statement he mentions. Unfortunately this was only one among many problems. Here is the R test code, followed by the C code (myAlloc.c):
dyn.load("myAlloc.so")
size <- 3e9
myBigmat <- .Call("myAllocC", size)
print(object.size(myBigmat), units = "auto")
rm(myBigmat)
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rallocators.h>
#include <numa.h>

typedef struct allocator_data {
    size_t size;
} allocator_data;

void* my_alloc(R_allocator_t *allocator, size_t size) {
    ((allocator_data*)allocator->data)->size = size;
    return (void*) numa_alloc_local(size);
}

void my_free(R_allocator_t *allocator, void *addr) {
    size_t size = ((allocator_data*)allocator->data)->size;
    numa_free(addr, size);
}

SEXP myAllocC(SEXP a) {
    allocator_data* my_allocator_data = malloc(sizeof(allocator_data));
    my_allocator_data->size = 0;

    R_allocator_t* my_allocator = malloc(sizeof(R_allocator_t));
    my_allocator->mem_alloc = &my_alloc;
    my_allocator->mem_free = &my_free;
    my_allocator->res = NULL;
    my_allocator->data = my_allocator_data;

    R_xlen_t n = asReal(a);
    SEXP result = PROTECT(allocVector3(REALSXP, n, my_allocator));
    UNPROTECT(1);
    return result;
}
For compiling the C code, I use R CMD SHLIB -std=c99 -L/usr/lib64 -lnuma myAlloc.c. As far as I can tell, this works fine. If anyone has improvements/corrections to offer, I'd be happy to include them.
One requirement from the original question that remains unresolved is the alignment issue. The block of memory returned by numa_alloc_local is correctly aligned, but other fields of the new VECTOR_SEXPREC (e.g. the sxpinfo_struct header) push back the start of the data array. Is it somehow possible to align this starting point (the address returned by REAL())?
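As a small aid for investigating this (an addition, not part of the original post), one can check from C where the data pointer actually lands relative to a 64-byte boundary; the entry point name is illustrative:
#include <stdint.h>
#include <R.h>
#include <Rinternals.h>

/* Returns REAL(x)'s offset from 64-byte alignment (0 means aligned). */
SEXP C_check_alignment(SEXP x) {
    uintptr_t addr = (uintptr_t) REAL(x);
    return ScalarInteger((int) (addr % 64));
}
Called as .Call("C_check_alignment", myBigmat), this makes it easy to see how far the header pushes the data past the aligned block.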
R has, in memory.c:
main/memory.c
84:#include <R_ext/Rallocators.h> /* for R_allocator_t structure */
so I think you need to include that header as well to get the custom allocator (Rinternals.h merely declares it, without defining the struct or including that header).
