Using a custom memory allocation function in R

Using a custom memory allocation function in R - c

I would like to be able to use my own memory allocation function for certain data structures (real valued vectors and arrays) in R. The reason for this is that I need my data to be 64bit aligned and I would like to use the numa library for having control over which memory node is used (I'm working on compute nodes with four 12-core AMD Opteron 6174 CPUs).
Now I have two functions for allocating and freeing memory: numa_alloc_onnode and numa_free (courtesy of this thread). I'm using R version 3.1.1, so I have access to the function allocVector3 (src/main/memory.c), which seems to me as the intended way of adding a custom memory allocator. I also found the struct R_allocator in src/include/R_ext
However it is not clear to me how to put these pieces together. Let's say, in R, I want the result res of an evaluation such as
res <- Y - mean(Y)
to be saved in a memory area allocated with my own function, how would I do this? Can I integrate allocVector3 directly at the R level? I assume I have to go through the R-C interface. As far as I know, I cannot just return a pointer to the allocated area, but have to pass the result as an argument. So in R I call something like
n <- length(Y)
res <- numeric(length=1)
.Call("R_allocate_using_myalloc", n, res)
res <- Y - mean(Y)
and in C
#include <R.h>
#include <Rinternals.h>
#include <numa.h>
SEXP R_allocate_using_myalloc(SEXP R_n, SEXP R_res){
PROTECT(R_n = coerceVector(R_n, INTSXP));
PROTECT(R_res = coerceVector(R_res, REALSXP));
int *restrict n = INTEGER(R_n);
R_allocator_t myAllocator;
myAllocator.mem_alloc = numa_alloc_onnode;
myAllocator.mem_free = numa_free;
myAllocator.res = NULL;
myAllocator.data = ???;
R_res = allocVector3(REALSXP, n, myAllocator);
UNPROTECT(2);
}
Unfortunately I cannot get beyond a variable has incomplete type 'R_allocator_t' compilation error (I had to remove the .data line since I have no clue as to what I should put there). Does any of the above code make sense? Is there an easier way of achieving what I want to? It seems a bit odd to have to allocate a small vector in R and the change its location in C just to be able to both control the memory allocation and have the vector available in R...
I'm trying to avoid using Rcpp, as I'm modifying a fairly large package and do not want to convert all C calls and thought that mixing different C interfaces could perform sub-optimally.
Any help is greatly appreciated.

I made some progress in solving my problem and I would like to share in case anyone else encounters a similar situation. Thanks to Kevin for his comment. I was missing the include statement he mentions. Unfortunately this was only one among many problems.
dyn.load("myAlloc.so")
size <- 3e9
myBigmat <- .Call("myAllocC", size)
print(object.size(myBigmat), units = "auto")
rm(myBigmat)
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rallocators.h>
#include <numa.h>
typedef struct allocator_data {
size_t size;
} allocator_data;
void* my_alloc(R_allocator_t *allocator, size_t size) {
((allocator_data*)allocator->data)->size = size;
return (void*) numa_alloc_local(size);
}
void my_free(R_allocator_t *allocator, void * addr) {
size_t size = ((allocator_data*)allocator->data)->size;
numa_free(addr, size);
}
SEXP myAllocC(SEXP a) {
allocator_data* my_allocator_data = malloc(sizeof(allocator_data));
my_allocator_data->size = 0;
R_allocator_t* my_allocator = malloc(sizeof(R_allocator_t));
my_allocator->mem_alloc = &my_alloc;
my_allocator->mem_free = &my_free;
my_allocator->res = NULL;
my_allocator->data = my_allocator_data;
R_xlen_t n = asReal(a);
SEXP result = PROTECT(allocVector3(REALSXP, n, my_allocator));
UNPROTECT(1);
return result;
}
For compiling the c code, I use R CMD SHLIB -std=c99 -L/usr/lib64 -lnuma myAlloc.c. As far as I can tell, this works fine. If anyone has improvements/corrections to offer, I'd be happy to include them.
One requirement from the original question that remains unresolved is the alignment issue. The block of memory returned by numa_alloc_local is correctly aligned, but other fields of the new VECTOR_SEXPREC (eg. the sxpinfo_struct header) push back the start of the data array. Is it somehow possible to align this starting point (the address returned by REAL())?

R has, in memory.c:
main/memory.c
84:#include <R_ext/Rallocators.h> /* for R_allocator_t structure */
so I think you need to include that header as well to get the custom allocator (RInternals.h merely declares it, without defining the struct or including that header)

Related

Why calling `free(malloc(8))`?

The Objective-C runtime's hashtable2.mm file contains the following code:
static void bootstrap (void) {
free(malloc(8));
prototypes = ALLOCTABLE (DEFAULT_ZONE);
prototypes->prototype = &protoPrototype;
prototypes->count = 1;
prototypes->nbBuckets = 1; /* has to be 1 so that the right bucket is 0 */
prototypes->buckets = ALLOCBUCKETS(DEFAULT_ZONE, 1);
prototypes->info = NULL;
((HashBucket *) prototypes->buckets)[0].count = 1;
((HashBucket *) prototypes->buckets)[0].elements.one = &protoPrototype;
};
Why does it allocate and release the 8-bytes space immediately?
Another source of confusion is this method from objc-os.h:
static __inline void *malloc_zone_malloc(malloc_zone_t z, size_t size) { return malloc(size); }
While it uses only one parameter, does the signature ask for two?

For the first question I can only assume. My bet it was done to avoid/reduce memory churn, or segment the memory for some other reason. You can briefly find where it's discussed in the Changelog of bmalloc (which is not quite relevant, but i could not find a better reference):
2017-06-02 Geoffrey Garen <ggaren#apple.com>
...
Updated for new APIs. Note that we cache one free chunk per page
class. This avoids churn in the large allocator when you
free(malloc(X))
It's unclear however, if the memory churn is caused by this technique or it was supposed to address it.
For the second question, Objective-C runtime used to work with "zones" in order to destroy all allocated variables by just destroying the said zone, but it proved being error prone and later it was agreed to not use it anymore. The API, however still uses it for historical reasons (backward compatibility, i assume), but says that zones are ignored:
Zones are ignored on iOS and 64-bit runtime on OS X. You should not use zones in current development.

How to solve "bad pointer in write barrier" panic in cgo when C library uses opaque struct pointers

I'm currently writing a Go wrapper around a C library. That C library uses opaque struct pointers to hide information across the interface. However, the underlying implementation stores size_t values in there. This leads to runtime errors in the resulting program.
A minimum working example to reproduce the problem looks like this:
main.go:
package main
/*
#include "stddef.h"
// Create an opaque type to hide the details of the underlying data structure.
typedef struct HandlePrivate *Handle;
// In reality, the implementation uses a type derived from size_t for the Handle.
Handle getInvalidPointer() {
size_t actualHandle = 1;
return (Handle) actualHandle;
}
*/
import "C"
// Create a temporary slice containing invalid pointers.
// The idea is that the local variable slice can be garbage collected at the end of the function call.
// When the slice is scanned for linked objects, the GC comes across the invalid pointers.
func getTempSlice() {
slice := make([]C.Handle, 1000000)
for i, _ := range slice {
slice[i] = C.getInvalidPointer()
}
}
func main() {
getTempSlice()
}
Running this program will lead to the following error
runtime: writebarrierptr *0xc42006c000 = 0x1
fatal error: bad pointer in write barrier
[...stack trace omitted...]
Note that the errors disappear when the GC is disabled by setting the environment variable GOGC=off.
My question is which is the best way to solve or work around this problem. The library stores integer values in pointers for the sake of information hiding and this seems to confuse the GC. For obvious reasons I don't want to start messing with the library itself but rather absorb this behaviour in my wrapping layer.
My environment is Ubuntu 16.04, with gcc 5.4.0 and Go 1.9.2.
Documentation of cgo

I can reproduce the error for go1.8.5 and go1.9.2. I cannot reproduce the error for tip: devel +f01b928 Sat Nov 11 06:17:48 2017 +0000 (effectively go1.10alpha).
// Create a temporary slice containing invalid pointers.
// The idea is that the local variable slice can be garbage collected at the end of the function call.
// When the slice is scanned for linked objects, the GC comes across the invalid pointers.
A Go mantra is do not ignore errors. However, you seem to assume that that the GC will gracefully ignore errors. The GC should complain loudly (go1.8.5 and go1.9.2). At worst, with undefined behavior that may vary from release to release, the GC may appear to ignore errors (go devel).
The Go compiler sees a pointer and the Go runtime GC expects a valid pointer.
// go tool cgo
// type _Ctype_Handle *_Ctype_struct_HandlePrivate
// var handle _Ctype_Handle
var handle C.Handle
// main._Ctype_Handle <nil> 0x0
fmt.Fprintf(os.Stderr, "%[1]T %[1]v %[1]p\n", handle)
slice := make([]C.Handle, 1000000)
for i, _ := range slice {
slice[i] = C.getInvalidPointer()
}
Use type uintptr. For example,
package main
import "unsafe"
/*
#include "stddef.h"
// Create an opaque type to hide the details of the underlying data structure.
typedef struct HandlePrivate *Handle;
// In reality, the implementation uses a type derived from size_t for the Handle.
Handle getInvalidPointer() {
size_t actualHandle = 1;
return (Handle) actualHandle;
}
*/
import "C"
// Create a temporary slice of C pointers as Go integer type uintptr.
func getTempSlice() {
slice := make([]uintptr, 1000000)
for i, _ := range slice {
slice[i] = uintptr(unsafe.Pointer(C.getInvalidPointer()))
}
}
func main() {
getTempSlice()
}

Having trouble passing array of structs from user space to kernel space and then back again

I am trying to create a simple top utility xv6. To do this, I have created a system call that will allow me to to access the kernel space. I have followed many guides on how to create system calls and my code compiles without issues.
My problem exists when I try running top in qemu. I will get a trap 14 error whenever I try to access my struct array whether it be in kernel or user space.
To break things down a little, I have a top.c file:
...
int main(){
struct uproc table[MAX];//where MAX is defined as 10
if(!getprocs(MAX, table))
printf("YAY");
...
in sysproc.c:
...
int
sys_getprocs(int max, struct uproc table[]){
return gettable(table);
}
and then in procs.c
....
int gettable(struct uproc table[]){
struct uproc u;
struct proc *p;
int i = 0;
aquire(&ptable.lock);
for(p->state.proc; p < &ptable.proc[NPROC];p++){
if(//p->state is valid){
u.state = p->state;
...
table[i] = u;//where I get the trap 14 error
}
}
}
Again, my assumption is that when I pass table around from user to kernel it's getting corrupted, but with that said I'm not sure how I could properly pass it.

All xv6 sys_* functions are parameterless.
Read the argint, argstr, and argptr functions in order to figure our how to pass parameters to the kernel from user mode,

Error of mexfunction variables in an array of a structure variable

Recently, I tried to write mexfunctions using structure variables.
I watched the tutorial but got confused because of how the variable values are passed.
The following example (mexfunction_using_ex_wrong.m & mexfunction_using_ex_wrong.cpp) demonstrates how to fetch the variables passed from matlab in mexfunction.
However, in this case, the result is:
address i_c1=2067094464 i_c2=2067094464
i_c1=10 i_c2=10
address i_c1=1327990656 i_c2=2067100736
i_c1=2 i_c2=20
address i_c1=2067101056 i_c2=2067063424
i_c1=3 i_c2=30
As can be seen, the 1st element of the c1 & c2 array of a structure variable is accidentally the same.
But, in another example (mexfunction_using_ex_correct.m & mexfunction_using_ex_correct.cpp), the elements of array 1 (b1) and array 2(b2) of a structure variable are unrelated as I expect.
The result is:
address i_b1=1978456576 i_b2=1326968576
i_b1=1 i_b2=10
address i_b1=1978456584 i_b2=1326968584
i_b1=2 i_b2=20
address i_b1=1978456592 i_b2=1326968592
i_b1=3 i_b2=30
However, it's more common to use the 1st example in programming. so could anybody explain why in the 1st example the addresses of i_c1 & i_c2 are the same?
The following code is mexfunction_using_ex_wrong.m
clc
clear all
close all
mex mexfunction_using_ex_c_wrong.cpp;
a.b(1).c1=double(1);
a.b(2).c1=double(2);
a.b(3).c1=double(3);
a.b(1).c2=double(1);
a.b(2).c2=double(2);
a.b(3).c2=double(3);
mexfunction_using_ex_c_wrong(a);
The following code is mexfunction_using_ex_c_wrong.cpp
#include "mex.h"
void mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
int i, j, k;
double *i_c1;
double *i_c2;
// for struct variables(pointers) inside fcwcontext
mxArray *mx_b, *mx_c1, *mx_c2;
mx_b=mxGetField(prhs[0], 0, "b");
for(i = 0;i < 3;i=i+1)
{
mx_c1=mxGetField(mx_b, i, "c1");
mx_c2=mxGetField(mx_b, i, "c2");
i_c1=mxGetPr(mx_c1);
i_c2=mxGetPr(mx_c2);
*i_c2=(*i_c2)*10;
printf("address i_c1=%d i_c2=%d\n", i_c1, i_c2);
printf(" i_c1=%g i_c2=%g\n", *i_c1, *i_c2);
}
}
The following code is mexfunction_using_ex_c_correct.m
clc
clear all
close all
mex mexfunction_using_ex_correct.cpp;
a.b1(1)=double(1);
a.b1(2)=double(2);
a.b1(3)=double(3);
a.b2(1)=double(1);
a.b2(2)=double(2);
a.b2(3)=double(3);
mexfunction_using_ex_correct(a);
The following code is mexfunction_using_ex_c_correct.cpp
#include "mex.h"
void mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
int i, j, k;
double *i_b1;
double *i_b2;
mxArray *mx_b1, *mx_b2;
mx_b1=mxGetField(prhs[0], 0, "b1");
mx_b2=mxGetField(prhs[0], 0, "b2");
for(i = 0;i < 3;i=i+1)
{
i_b1=mxGetPr(mx_b1);
i_b2=mxGetPr(mx_b2);
i_b2[i]=i_b2[i]*10;
printf("address i_b1=%d i_b2=%d\n", &i_b1[i], &i_b2[i]);
printf(" i_b1=%g i_b2=%g\n", i_b1[i], i_b2[i]);
}
}

The addresses are not "accidentally the same" - they're intentionally the same, due to MATLAB's internal copy-on-write optimisations. If you look at the MEX documentation, you'll see warnings scattered around...
Do not modify any prhs values in your MEX-file. Changing the data in these read-only mxArrays can produce undesired side effects.
...in various forms...
Note Inputs to a MEX-file are constant read-only mxArrays. Do not modify the inputs. Using mxSetCell* or mxSetField* functions to modify the cells or fields of a MATLAB® argument causes unpredictable results.
...trying to make it very clear that you should absolutely not modify anything you recieve as an input. By calling mxGetPr() on input data and writing back to that pointer as you do with i_b2 and i_c2, you're getting right into that "unpredictable results" territory - if you look at a.b(1).c1 in the MATLAB workspace after the call, it'll really be 10 even though you "only" changed c2.
From MEX, you're looking at the raw data storage without any knowledge of, or access to, MATLAB's internal housekeeping, so the only safe way to modify anything is to use the mxCreate* or mxDuplicate* functions to get your own safe arrays you can then do whatever you want with, and pass back to MATLAB via plhs.
That said, I will admit to having abused in-place modification for a significant performance gain in one instance where I could guarantee my data was unique and unshared, but it's at best unsupported and at worst downright perilous.

get function address from name [.debug_info ??]

I was trying to write a small debug utility and for this I need to get the function/global variable address given its name. This is built-in debug utility, which means that the debug utility will run from within the code to be debugged or in plain words I cannot parse the executable file.
Now is there a well-known way to do that ? The plan I have is to make the .debug_* sections to to be loaded into to memory [which I plan to do by a cheap trick like this in ld script]
.data {
*(.data)
__sym_start = .;
(debug_);
__sym_end = .;
}
Now I have to parse the section to get the information I need, but I am not sure this is doable or is there issues with this - this is all just theory. But it also seems like too much of work :-) is there a simple way. Or if someone can tell upfront why my scheme will not work, it ill also be helpful.
Thanks in Advance,
Alex.

If you are running under a system with dlopen(3) and dlsym(3) (like Linux) you should be able to:
char thing_string[] = "thing_you_want_to_look_up";
void * handle = dlopen(NULL, RTLD_LAZY | RTLD_NOLOAD);
// you could do RTLD_NOW as well. shouldn't matter
if (!handle) {
fprintf(stderr, "Dynamic linking on main module : %s\n", dlerror() );
exit(1);
}
void * addr = dlsym(handle, thing_string);
fprintf(stderr, "%s is at %p\n", thing_string, addr);
I don't know the best way to do this for other systems, and this probably won't work for static variables and functions. C++ symbol names will be mangled, if you are interested in working with them.
To expand this to work for shared libraries you could probably get the names of the currently loaded libraries from /proc/self/maps and then pass the library file names into dlopen, though this could fail if the library has been renamed or deleted.
There are probably several other much better ways to go about this.
edit without using dlopen
/* name_addr.h */
struct name_addr {
const char * sym_name;
const void * sym_addr;
};
typedef struct name_addr name_addr_t;
void * sym_lookup(cost char * name);
extern const name_addr_t name_addr_table;
extern const unsigned name_addr_table_size;
/* name_addr_table.c */
#include "name_addr.h"
#define PREMEMBER( X ) extern const void * X
#define REMEMBER( X ) { .sym_name = #X , .sym_addr = (void *) X }
PREMEMBER(strcmp);
PREMEMBER(printf);
PREMEMBER(main);
PREMEMBER(memcmp);
PREMEMBER(bsearch);
PREMEMBER(sym_lookup);
/* ... */
const name_addr_t name_addr_table[] =
{
/* You could do a #include here that included the list, which would allow you
* to have an empty list by default without regenerating the entire file, as
* long as your compiler only warns about missing include targets.
*/
REMEMBER(strcmp),
REMEMBER(printf),
REMEMBER(main),
REMEMBER(memcmp),
REMEMBER(bsearch),
REMEMBER(sym_lookup);
/* ... */
};
const unsigned name_addr_table_size = sizeof(name_addr_table)/sizeof(name_addr_t);
/* name_addr_code.c */
#include "name_addr.h"
#include <string.h>
void * sym_lookup(cost char * name) {
unsigned to_go = name_addr_table_size;
const name_addr_t *na = name_addr_table;
while(to_to) {
if ( !strcmp(name, na->sym_name) ) {
return na->sym_addr;
}
na++;
to_do--;
}
/* set errno here if you are using errno */
return NULL; /* Or some other illegal value */
}
If you do it this way the linker will take care of filling in the addresses for you after everything has been laid out. If you include header files for all of the symbols that you are listing in your table then you will not get warnings when you compile the table file, but it will be much easier just to have them all be extern void * and let the compiler warn you about all of them (which it probably will, but not necessarily).
You will also probably want to sort your symbols by name such that you can use a binary search of the list rather than iterate through it.
You should note that if you have members in the table which are not otherwise referenced by the program (like if you had an entry for sqrt in the table, but didn't call it) the linker will then want (need) to link those functions into your image. This can make it blow up.
Also, if you were taking advantage of global optimizations having this table will likely make those less effective since the compiler will think that all of the functions listed could be accessed via pointer from this list and that it cannot see all of the call points.
Putting static functions in this list is not straight forward. You could do this by changing the table to dynamic and doing it at run time from a function in each module, or possibly by generating a new section in your object file that the table lives in. If you are using gcc:
#define SECTION_REMEMBER(X) \
static const name_addr_t _name_addr##X = \
{.sym_name= #X , .sym_addr = (void *) X } \
__attribute__(section("sym_lookup_table" ) )
And tack a list of these onto the end of each .c file with all of the symbols that you want to remember from that file. This will require linker work so that the linker will know what to do with these members, but then you can iterate over the list by looking at the begin and end of the section that it resides in (I don't know exactly how to do this, but I know it can be done and isn't TOO difficult). This will make having a sorted list more difficult, though. Also, I'm not entirely certain initializing the .sym_name to a string literal's address would not result in cramming the string into this section, but I don't think it would. If it did then this would break things.
You can still use objdump to get a list of the symbols that the object file (probably elf) contains, and then filter this for the symbols you are interested in, and then regenerate the table file the table's members listed.