Understanding how .Internal C functions are handled in R - c

I wonder if anyone can illustrate to me how R executes a C call from an R command typed at the console prompt. I am particularly confused by R's treatment of a) function arguments and b) the function call itself.
Let's take an example, in this case set.seed(). Wondering how it works I type the name in at the prompt, get the source (look here for more on that), see there is eventually a .Internal(set.seed(seed, i.knd, normal.kind), so dutifully look up the relevant function name in the .Internals section of /src/names.c, find it is called do_setseed and is in RNG.c which leads me to...
SEXP attribute_hidden do_setseed (SEXP call, SEXP op, SEXP args, SEXP env)
{
SEXP skind, nkind;
int seed;
checkArity(op, args);
if(!isNull(CAR(args))) {
seed = asInteger(CAR(args));
if (seed == NA_INTEGER)
error(_("supplied seed is not a valid integer"));
} else seed = TimeToSeed();
skind = CADR(args);
nkind = CADDR(args);
//...
//DO RNG here
//...
return R_NilValue;
}
What are CAR, CADR, CADDR? My research leads me to believe they are a Lisp influenced construct concerning lists but beyond that I do not understand what these functions do or why they are needed.
What does checkArity() do?
SEXP args seems self explanatory, but is this a list of the
arguments that is passed in the function call?
What does SEXP op represent? I take this to mean operator (like in binary functions such as +), but then what is the SEXP call for?
Is anyone able to flow through what happens when I type
set.seed(1)
at the R console prompt, up to the point at which skind and nkind are defined? I find I am not able to well understand the source code at this level and path from interpreter to C function.

CAR and CDR are how you access pairlist objects, as explained in section 2.1.11 of R Language Definition. CAR contains the first element, and CDR contains the remaining elements. An example is given in section 5.10.2 of Writing R Extensions:
#include <R.h>
#include <Rinternals.h>
SEXP convolveE(SEXP args)
{
int i, j, na, nb, nab;
double *xa, *xb, *xab;
SEXP a, b, ab;
a = PROTECT(coerceVector(CADR(args), REALSXP));
b = PROTECT(coerceVector(CADDR(args), REALSXP));
...
}
/* The macros: */
first = CADR(args);
second = CADDR(args);
third = CADDDR(args);
fourth = CAD4R(args);
/* provide convenient ways to access the first four arguments.
* More generally we can use the CDR and CAR macros as in: */
args = CDR(args); a = CAR(args);
args = CDR(args); b = CAR(args);
There's also a TAG macro to access the names given to the actual arguments.
checkArity ensures that the number of arguments passed to the function is correct. args are the actual arguments passed to the function. op is offset pointer "used for C functions that deal with more than one R function" (quoted from src/main/names.c, which also contains the table showing the offset and arity for each function).
For example, do_colsum handles col/rowSums and col/rowMeans.
/* Table of .Internal(.) and .Primitive(.) R functions
* ===== ========= ==========
* Each entry is a line with
*
* printname c-entry offset eval arity pp-kind precedence rightassoc
* --------- ------- ------ ---- ----- ------- ---------- ----------
{"colSums", do_colsum, 0, 11, 4, {PP_FUNCALL, PREC_FN, 0}},
{"colMeans", do_colsum, 1, 11, 4, {PP_FUNCALL, PREC_FN, 0}},
{"rowSums", do_colsum, 2, 11, 4, {PP_FUNCALL, PREC_FN, 0}},
{"rowMeans", do_colsum, 3, 11, 4, {PP_FUNCALL, PREC_FN, 0}},
Note that arity in the above table is 4 because (even though rowSums et al only have 3 arguments) do_colsum has 4, which you can see from the .Internal call in rowSums:
> rowSums
function (x, na.rm = FALSE, dims = 1L)
{
if (is.data.frame(x))
x <- as.matrix(x)
if (!is.array(x) || length(dn <- dim(x)) < 2L)
stop("'x' must be an array of at least two dimensions")
if (dims < 1L || dims > length(dn) - 1L)
stop("invalid 'dims'")
p <- prod(dn[-(1L:dims)])
dn <- dn[1L:dims]
z <- if (is.complex(x))
.Internal(rowSums(Re(x), prod(dn), p, na.rm)) + (0+1i) *
.Internal(rowSums(Im(x), prod(dn), p, na.rm))
else .Internal(rowSums(x, prod(dn), p, na.rm))
if (length(dn) > 1L) {
dim(z) <- dn
dimnames(z) <- dimnames(x)[1L:dims]
}
else names(z) <- dimnames(x)[[1L]]
z
}

The basic C-level pairlist extraction functions are CAR and CDR. (Pairlists are very similar to lists but are implemented as a linked-list and are used internally for argument lists). They have simple R equivalents: x[[1]] and x[-1]. R also provides lots of combinations of the two:
CAAR(x) = CAR(CAR(x)) which is equivalent to x[[1]][[1]]
CADR(x) = CAR(CDR(x)) which is equivalent to x[-1][[1]], i.e. x[[2]]
CADDR(x) = CAR(CDR(CDR(x)) is equivalent to x[-1][-1][[1]], i.e. x[[3]]
and so on
Accessing the nth element of a pairlist is an O(n) operation, unlike accessing the nth element of a list which is O(1). This is why there aren't nicer functions for accessing the nth element of a pairlist.
Internal/primitive functions don't do matching by name, they only use positional matching, which is why they can use this simple system for extracting the arguments.
Next you need to understand what the arguments to the C function are. I'm not sure where these are documented, so I might not be completely right about the structure, but I should be the general pieces:
call: the complete call, as might be captured by match.call()
op: the index of the .Internal function called from R. This is needed because there is a many-to-1 mapping from .Internal functions to C functions. (e.g. do_summary implements sum, mean, min, max and prod). The number is the third entry in names.c - it's always 0 for do_setseed and hence never used
args: a pair list of the arguments supplied to the function.
env: the environment from which the function was called.
checkArity is a macro which calls Rf_checkArityCall, which basically looks up the number of arguments (the fifth column in names.c is arity) and make sure the supplied number matches. You have to follow through quite a few macros and functions in C to see what's going on - it's very helpful to have a local copy of R-source that you can grep through.

Related

How to make a list with just the used inputs for a C module

I have a large module that uses a very large input buffer, consisting of many structures which, in turn, contain other structures and in the end each structure has several variables.
Out of these hundreds of input variables, my module (standalone C entity) uses only a fraction.
I would like to know if there is a way to make a list that will contain only the variables used in my module (would be perfect if it contains the variable type and links to structure/s that contains it).
I tried Doxygen (1.8.5) but I could generate a doc with all input variables, only.
[Later EDIT]
I add an example code and the desired outcome:
#include <stdio.h>
typedef struct subS1{
unsigned char bIn1;
unsigned char bIn2;
} subS1;
typedef struct S1{
struct subS1 stMySubStruct1;
struct subS1 stMySubStruct2;
struct subS1 stMySubStruct3;
} MyInputStruct_t;
void Foo1(MyInputStruct_t *Input);
void Foo2(MyInputStruct_t *Input);
MyInputStruct_t stMyInputStruct = {{1, 2}, {0, 0}, {9, 6}}; // large input buffer
int main() {
Foo1(&stMyInputStruct); // call to my Module 'main' function
return 0;
}
void Foo1(MyInputStruct_t *Input)
{
if(Input->stMySubStruct1.bIn1 == 1)
{
printf("bIn1 = %d\n", Input->stMySubStruct1.bIn1); // stMySubStruct1.bIn1 is used (read or write)
}
Foo2(Input);
return;
}
void Foo2(MyInputStruct_t *Input)
{
if(Input->stMySubStruct3.bIn2 == 0)
{
printf("bIn2 = %d\n", Input->stMySubStruct3.bIn2); // stMySubStruct3.bIn2 is used (read or write)
}
return;
}
The list with just the used inputs for Foo1(): e.g
stMyInputStruct.stMySubStruct1.bIn1 -> is used in Foo1()
stMyInputStruct.stMySubStruct1.bIn2 -> is NOT used
...
stMyInputStruct.stMySubStruct3.bIn2 -> is used in Foo2()
This is just a five-minute hack to demonstrate what I mean, so take it with a grain of salt and for what it is.
So first I downloaded pycparser from https://github.com/eliben/pycparser/
Then I edit the C-generator from https://github.com/eliben/pycparser/blob/master/pycparser/c_generator.py
... adding two lines to the constructor-code (adding two vars struct_refs + struct_ref):
class CGenerator(object):
""" Uses the same visitor pattern as c_ast.NodeVisitor, but modified to
return a value from each visit method, using string accumulation in
generic_visit.
"""
def __init__(self, reduce_parentheses=False):
""" Constructs C-code generator
reduce_parentheses:
if True, eliminates needless parentheses on binary operators
"""
# Statements start with indentation of self.indent_level spaces, using
# the _make_indent method.
self.indent_level = 0
self.reduce_parentheses = reduce_parentheses
# newly added variables here
self.struct_refs = set()
self.struct_ref = None
Then I edit two visitor-functions, to make them save information about the struct-references they parse:
def visit_ID(self, n):
if self.struct_ref:
self.struct_refs.add(self.struct_ref + "->" + n.name)
return n.name
def visit_StructRef(self, n):
sref = self._parenthesize_unless_simple(n.name)
self.struct_ref = sref
self.struct_refs.add(sref)
res = sref + n.type + self.visit(n.field)
self.struct_ref = None
return res
Running this modified piece of Python script over your example code, collects this information:
>>> cgen.struct_refs
{'Input',
'Input->stMySubStruct1',
'Input->stMySubStruct1->bIn1',
'Input->stMySubStruct3',
'Input->stMySubStruct3->bIn2'}
So with a bit more work, it should be able to do the job more generally.
This of course breaks apart in the face of memcpy, struct-member-access-through-pointers etc.
You can also try exploiting structure in your code as well. E.g. If you always call your input-struct "Input", things gets easier.

.C() returns me an empty list

I'm a beginner in R and I'm trying to load a .dll file, named dll.dll, that's written in C, into R. It seems to work, now I want to use the functions that are stored in the .dll file and I encounter problems.
I've searched for a solution or other method in manuals, here and on google. Would be very thankful if I could get a suggestion of what to use or any idea!
My code:
setwd("C:/Users/MyUser/R")
dyn.load("dll.dll")
is.loaded("DLL_FUNK")
# For some reason True with capital letters, not in lower case
output <- .C("DLL_FUNK", in9 = as.integer(7))
#output # R Crashes before I can write this.
# R Crashes
# In outdata.txt: "in-value= 139375128"
The function should return a number, 1955. But I can't seem to get to that value. What am I doing wrong?
Update with code (Fortran runned as C), this is the code in dll.dll:
subroutine dll_funk(in9)
implicit none
!+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
!*** Declarations: variables, functions
!+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
integer(4) :: in9
!integer :: in9
! Definitions of variables in the external function calls
!!dec$ attributes c,alias :'dll_funk' :: dll_funk
!dec$ attributes dllexport :: dll_funk
!dec$ attributes value :: in9
open(194,file='outdata.txt')
write(194,*) 'in-value=', in9
! in9 = 1955
close(194)
end subroutine
!end function
So now when it runs, R crashes but before it writes to my file (outdata.txt) but it't not my number, maybe some kind of address...
Another question, do you recommend me to run the code with .C and from C run the Fortran code or is it better to run it with .Fortran with only Fortran code?
It seems like .Fortran have problem handling strings, or that's what I understood from: Interface func .C and .Fortran
Why did not you pass any arguments to your C function dll_function? When you use .C(), you have to pass function arguments as a list. .C() will return modified list. So, If you pass in nothing, you get nothing.
What does your C function dll_function looks like? Note that:
dll_function must be a void C function, with no return values. If this function should return something, it must return by modifying function arguments;
all function arguments of dll_function must be pointers.
Follow-up
The dll_function is only to test if I can get access to it.
You can use is.loaded() after dyn.load() to test whether you have access to the C function:
dyn.load("dll.dll")
is.loaded("dll_function") ## TRUE
Note that, is.loaded takes C function name, while dyn.load() takes .dll name. In general you can have multiple functions in a single .dll file. You can use is.loaded() to check either of them, to test whether shared library has been loaded successfully.
So if I want it to return something, I should give it an argument (of same type?)?
Yes. The other answer here does give a toy example. You can have a look at this answer I made half a month ago. At the bottom there is a summary of variable type.
When using .C, the extra arguments passed to .C are copied and passed on as pointers to the called c-function. This function can then modify the data pointer to by the pointers. The return value of the function is ignored by .C. So, you c-function should look something like:
void dll_function(int* result) {
/* Do some complicated computation that results in 1955 */
(*result) = 1955;
}
And your call from R:
.C("dll_function", integer(1))
An example with input (this calculates the sum of an integer vector; this example assumes that there are no missing values in vector):
void dll_function2(int* result, int* vector, int* length) {
int sum = 0;
for (int i = 0; i < (*length); ++i, ++vector) {
sum += (*vector)
}
(*result) = sum;
}
Called from R:
x <- c(1000, 900, 55)
.C("dll_function2", integer(1), as.integer(x), length(x))[[1]]

SimGrid. How to write comparator?

I have a dynamic array of hosts:
xbt_dynar_t dynar_host = xbt_dynar_new(sizeof(MSG_host_t), NULL);
Each host contains information about its speed in flops.
I want to sort it by their host's speed. In documentation I found function xbt_dynar_sort. This function accepts two parameters: dynamic array itself and comparator int_f_cpvoid_cpvoid_t compar_fn.
Any advises or example how can this comparator be written?
This function only apply the standard qsort function to the data stored in the dynar, so you should also read the libc documentation, the man page or this tutorial for more info.
So you should write a function somehow similar to the following:
int mycmp(void *a,void*b)
{
MSG_host_t hostA = *(MSG_host_t*)a;
MSG_host_t hostB = *(MSG_host_t*)b;
double valA = MSG_host_get_speed(hostA);
double valB = MSG_host_get_speed(hostB)
return (valA > valB) - (valA < valB);
}
And then, call xbt_dynar_sort(dynar, mycmp) to sort your dynar.
Note that the actual comparison on the return line of the function is a bit complicated. This is a way to obey the function semantic (return -1 if A < B, 0 if A==B and 1 if A > B) in a way that is numerically stable. This is as advised in the relevant documentation of libc.

passing unevaluated expressions to C/C++

I'd like to pass a variable number of arguments from a function to C/C++, but would like to leave the arguments unevaluated and at the same time don't want to do any computations in R (aside from calling the C/C++ function), i.e. I don't want to call substitute in my R function. One option for this that I thought I could use is .External and doing smth like this:
R_fn = function(...) .External("cpp_fn", ...)
...
# and in C code:
SEXP cpp_fn (SEXP arglist) {
}
However .External is evaluating arguments in ..., so if I try something like
rm(x, y) # just making sure these don't exist
R_fn(x*y)
I get an error because R is trying to evaluate x*y before sending it to the function.
To contrast, the following works in R:
f = function(...) g(...)
g = function(x, ...) print(substitute(x))
f(x*y*z)
# x * y * z
What other options do I have? Clearly it's possible to do as R itself does it for a number of functions, e.g. substitute itself, but I don't understand how to do it. I added the rcpp tag because my eventual usage of this is going to be in Rcpp.
One possibility is to do what match.call does (thanks to Ricardo Saporta for pointing me in that direction). This requires copy-pasting a few definitions from R source code that I won't do here, but the basic idea is to get the calling function from R_GlobalContext and then extract the function arguments from there. The rough sketch is as follows:
R_fn = function(...) .Call("cpp_fn")
// and in C++ code
Language cpp_fn() {
SEXP sysp = ((RCNTXT*)R_GlobalContext)->sysparent;
RCNTXT *cptr = (RCNTXT*)R_GlobalContext;
while (cptr != NULL) {
if (cptr->callflag & CTXT_FUNCTION && cptr->cloenv == sysp)
break;
cptr = cptr->nextcontext;
}
cptr = cptr->nextcontext; // because this is called from .Call and not from R_fn
// and now cptr->promargs has the unevaluated arguments to do as one pleases
// e.g.
Language firstArg(R_PromiseExpr(CAR(cptr->promargs)));
return firstArg;
}

Using R random number generators in C [duplicate]

I would like to, within my own compiled C++ code, check to see if a library package is loaded in R (if not, load it), call a function from that library and get the results back to in my C++ code.
Could someone point me in the right direction? There seems to be a plethora of info on R and different ways of calling R from C++ and vis versa, but I have not come across exactly what I am wanting to do.
Thanks.
Dirk's probably right that RInside makes life easier. But for the die-hards... The essence comes from Writing R Extensions sections 8.1 and 8.2, and from the examples distributed with R. The material below covers constructing and evaluating the call; dealing with the return value is a different (and in some sense easier) topic.
Setup
Let's suppose a Linux / Mac platform. The first thing is that R must have been compiled to allow linking, either to a shared or static R library. I work with an svn copy of R's source, in the directory ~/src/R-devel. I switch to some other directory, call it ~/bin/R-devel, and then
~/src/R-devel/configure --enable-R-shlib
make -j
this generates ~/bin/R-devel/lib/libR.so; perhaps whatever distribution you're using already has this? The -j flag runs make in parallel, which greatly speeds the build.
Examples for embedding are in ~/src/R-devel/tests/Embedding, and they can be made with cd ~/bin/R-devel/tests/Embedding && make. Obviously, the source code for these examples is extremely instructive.
Code
To illustrate, create a file embed.cpp. Start by including the header that defines R data structures, and the R embedding interface; these are located in bin/R-devel/include, and serve as the primary documentation. We also have a prototype for the function that will do all the work
#include <Rembedded.h>
#include <Rdefines.h>
static void doSplinesExample();
The work flow is to start R, do the work, and end R:
int
main(int argc, char *argv[])
{
Rf_initEmbeddedR(argc, argv);
doSplinesExample();
Rf_endEmbeddedR(0);
return 0;
}
The examples under Embedding include one that calls library(splines), sets a named option, then runs a function example("ns"). Here's the routine that does this
static void
doSplinesExample()
{
SEXP e, result;
int errorOccurred;
// create and evaluate 'library(splines)'
PROTECT(e = lang2(install("library"), mkString("splines")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
if (errorOccurred) {
// handle error
}
UNPROTECT(1);
// 'options(FALSE)' ...
PROTECT(e = lang2(install("options"), ScalarLogical(0)));
// ... modified to 'options(example.ask=FALSE)' (this is obscure)
SET_TAG(CDR(e), install("example.ask"));
R_tryEval(e, R_GlobalEnv, NULL);
UNPROTECT(1);
// 'example("ns")'
PROTECT(e = lang2(install("example"), mkString("ns")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
UNPROTECT(1);
}
Compile and run
We're now ready to put everything together. The compiler needs to know where the headers and libraries are
g++ -I/home/user/bin/R-devel/include -L/home/user/bin/R-devel/lib -lR embed.cpp
The compiled application needs to be run in the correct environment, e.g., with R_HOME set correctly; this can be arranged easily (obviously a deployed app would want to take a more extensive approach) with
R CMD ./a.out
Depending on your ambitions, some parts of section 8 of Writing R Extensions are not relevant, e.g., callbacks are needed to implement a GUI on top of R, but not to evaluate simple code chunks.
Some detail
Running through that in a bit of detail... An SEXP (S-expression) is a data structure fundamental to R's representation of basic types (integer, logical, language calls, etc.). The line
PROTECT(e = lang2(install("library"), mkString("splines")));
makes a symbol library and a string "splines", and places them into a language construct consisting of two elements. This constructs an unevaluated language object, approximately equivalent to quote(library("splines")) in R. lang2 returns an SEXP that has been allocated from R's memory pool, and it needs to be PROTECTed from garbage collection. PROTECT adds the address pointed to by e to a protection stack, when the memory no longer needs to be protected, the address is popped from the stack (with UNPROTECT(1), a few lines down). The line
R_tryEval(e, R_GlobalEnv, &errorOccurred);
tries to evaluate e in R's global environment. errorOccurred is set to non-0 if an error occurs. R_tryEval returns an SEXP representing the result of the function, but we ignore it here. Because we no longer need the memory allocated to store library("splines"), we tell R that it is no longer PROTECT'ed.
The next chunk of code is similar, evaluating options(example.ask=FALSE), but the construction of the call is more complicated. The S-expression created by lang2 is a pair list, conceptually with a node, a left pointer (CAR) and a right pointer (CDR). The left pointer of e points to the symbol options. The right pointer of e points to another node in the pair list, whose left pointer is FALSE (the right pointer is R_NilValue, indicating the end of the language expression). Each node of a pair list can have a TAG, the meaning of which depends on the role played by the node. Here we attach an argument name.
SET_TAG(CDR(e), install("example.ask"));
The next line evaluates the expression that we have constructed (options(example.ask=FALSE)), using NULL to indicate that we'll ignore the success or failure of the function's evaluation. A different way of constructing and evaluating this call is illustrated in R-devel/tests/Embedding/RParseEval.c, adapted here as
PROTECT(tmp = mkString("options(example.ask=FALSE)"));
PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
R_tryEval(VECTOR_ELT(e, 0), R_GlobalEnv, NULL);
UNPROTECT(2);
but this doesn't seem like a good strategy in general, as it mixes R and C code and does not allow computed arguments to be used in R functions. Instead write and manage R code in R (e.g., creating a package with functions that perform complicated series of R manipulations) that your C code uses.
The final block of code above constructs and evaluates example("ns"). Rf_tryEval returns the result of the function call, so
SEXP result;
PROTECT(result = Rf_tryEval(e, R_GlobalEnv, &errorOccurred));
// ...
UNPROTECT(1);
would capture that for subsequent processing.
There is Rcpp which allows you to easily extend R with C++ code, and also have that C++ code call back to R. There are examples included in the package which show that.
But maybe what you really want is to keep your C++ program (i.e. you own main()) and call out to R? That can be done most easily with
RInside which allows you to very easily embed R inside your C++ application---and the test for library, load if needed and function call are then extremely easy to do, and the (more than a dozen) included examples show you how to. And Rcpp still helps you to get results back and forth.
Edit: As Martin was kind enough to show things the official way I cannot help and contrast it with one of the examples shipping with RInside. It is something I once wrote quickly to help someone who had asked on r-help about how to load (a portfolio optimisation) library and use it. It meets your requirements: load a library, accesses some data in pass a weights vector down from C++ to R, deploy R and get the result back.
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8; -*-
//
// Simple example for the repeated r-devel mails by Abhijit Bera
//
// Copyright (C) 2009 Dirk Eddelbuettel
// Copyright (C) 2010 - 2011 Dirk Eddelbuettel and Romain Francois
#include <RInside.h> // for the embedded R via RInside
int main(int argc, char *argv[]) {
try {
RInside R(argc, argv); // create an embedded R instance
std::string txt = "suppressMessages(library(fPortfolio))";
R.parseEvalQ(txt); // load library, no return value
txt = "M <- as.matrix(SWX.RET); print(head(M)); M";
// assign mat. M to NumericMatrix
Rcpp::NumericMatrix M = R.parseEval(txt);
std::cout << "M has "
<< M.nrow() << " rows and "
<< M.ncol() << " cols" << std::endl;
txt = "colnames(M)"; // assign columns names of M to ans and
// into string vector cnames
Rcpp::CharacterVector cnames = R.parseEval(txt);
for (int i=0; i<M.ncol(); i++) {
std::cout << "Column " << cnames[i]
<< " in row 42 has " << M(42,i) << std::endl;
}
} catch(std::exception& ex) {
std::cerr << "Exception caught: " << ex.what() << std::endl;
} catch(...) {
std::cerr << "Unknown exception caught" << std::endl;
}
exit(0);
}
This rinside_sample2.cpp, and there are lots more examples in the package. To build it, you just say 'make rinside_sample2' as the supplied Makefile is set up to find R, Rcpp and RInside.

Resources