Using R random number generators in C [duplicate] - c

I would like to, within my own compiled C++ code, check to see if a library package is loaded in R (if not, load it), call a function from that library and get the results back to in my C++ code.
Could someone point me in the right direction? There seems to be a plethora of info on R and different ways of calling R from C++ and vis versa, but I have not come across exactly what I am wanting to do.
Thanks.

Dirk's probably right that RInside makes life easier. But for the die-hards... The essence comes from Writing R Extensions sections 8.1 and 8.2, and from the examples distributed with R. The material below covers constructing and evaluating the call; dealing with the return value is a different (and in some sense easier) topic.
Setup
Let's suppose a Linux / Mac platform. The first thing is that R must have been compiled to allow linking, either to a shared or static R library. I work with an svn copy of R's source, in the directory ~/src/R-devel. I switch to some other directory, call it ~/bin/R-devel, and then
~/src/R-devel/configure --enable-R-shlib
make -j
this generates ~/bin/R-devel/lib/libR.so; perhaps whatever distribution you're using already has this? The -j flag runs make in parallel, which greatly speeds the build.
Examples for embedding are in ~/src/R-devel/tests/Embedding, and they can be made with cd ~/bin/R-devel/tests/Embedding && make. Obviously, the source code for these examples is extremely instructive.
Code
To illustrate, create a file embed.cpp. Start by including the header that defines R data structures, and the R embedding interface; these are located in bin/R-devel/include, and serve as the primary documentation. We also have a prototype for the function that will do all the work
#include <Rembedded.h>
#include <Rdefines.h>
static void doSplinesExample();
The work flow is to start R, do the work, and end R:
int
main(int argc, char *argv[])
{
Rf_initEmbeddedR(argc, argv);
doSplinesExample();
Rf_endEmbeddedR(0);
return 0;
}
The examples under Embedding include one that calls library(splines), sets a named option, then runs a function example("ns"). Here's the routine that does this
static void
doSplinesExample()
{
SEXP e, result;
int errorOccurred;
// create and evaluate 'library(splines)'
PROTECT(e = lang2(install("library"), mkString("splines")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
if (errorOccurred) {
// handle error
}
UNPROTECT(1);
// 'options(FALSE)' ...
PROTECT(e = lang2(install("options"), ScalarLogical(0)));
// ... modified to 'options(example.ask=FALSE)' (this is obscure)
SET_TAG(CDR(e), install("example.ask"));
R_tryEval(e, R_GlobalEnv, NULL);
UNPROTECT(1);
// 'example("ns")'
PROTECT(e = lang2(install("example"), mkString("ns")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
UNPROTECT(1);
}
Compile and run
We're now ready to put everything together. The compiler needs to know where the headers and libraries are
g++ -I/home/user/bin/R-devel/include -L/home/user/bin/R-devel/lib -lR embed.cpp
The compiled application needs to be run in the correct environment, e.g., with R_HOME set correctly; this can be arranged easily (obviously a deployed app would want to take a more extensive approach) with
R CMD ./a.out
Depending on your ambitions, some parts of section 8 of Writing R Extensions are not relevant, e.g., callbacks are needed to implement a GUI on top of R, but not to evaluate simple code chunks.
Some detail
Running through that in a bit of detail... An SEXP (S-expression) is a data structure fundamental to R's representation of basic types (integer, logical, language calls, etc.). The line
PROTECT(e = lang2(install("library"), mkString("splines")));
makes a symbol library and a string "splines", and places them into a language construct consisting of two elements. This constructs an unevaluated language object, approximately equivalent to quote(library("splines")) in R. lang2 returns an SEXP that has been allocated from R's memory pool, and it needs to be PROTECTed from garbage collection. PROTECT adds the address pointed to by e to a protection stack, when the memory no longer needs to be protected, the address is popped from the stack (with UNPROTECT(1), a few lines down). The line
R_tryEval(e, R_GlobalEnv, &errorOccurred);
tries to evaluate e in R's global environment. errorOccurred is set to non-0 if an error occurs. R_tryEval returns an SEXP representing the result of the function, but we ignore it here. Because we no longer need the memory allocated to store library("splines"), we tell R that it is no longer PROTECT'ed.
The next chunk of code is similar, evaluating options(example.ask=FALSE), but the construction of the call is more complicated. The S-expression created by lang2 is a pair list, conceptually with a node, a left pointer (CAR) and a right pointer (CDR). The left pointer of e points to the symbol options. The right pointer of e points to another node in the pair list, whose left pointer is FALSE (the right pointer is R_NilValue, indicating the end of the language expression). Each node of a pair list can have a TAG, the meaning of which depends on the role played by the node. Here we attach an argument name.
SET_TAG(CDR(e), install("example.ask"));
The next line evaluates the expression that we have constructed (options(example.ask=FALSE)), using NULL to indicate that we'll ignore the success or failure of the function's evaluation. A different way of constructing and evaluating this call is illustrated in R-devel/tests/Embedding/RParseEval.c, adapted here as
PROTECT(tmp = mkString("options(example.ask=FALSE)"));
PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
R_tryEval(VECTOR_ELT(e, 0), R_GlobalEnv, NULL);
UNPROTECT(2);
but this doesn't seem like a good strategy in general, as it mixes R and C code and does not allow computed arguments to be used in R functions. Instead write and manage R code in R (e.g., creating a package with functions that perform complicated series of R manipulations) that your C code uses.
The final block of code above constructs and evaluates example("ns"). Rf_tryEval returns the result of the function call, so
SEXP result;
PROTECT(result = Rf_tryEval(e, R_GlobalEnv, &errorOccurred));
// ...
UNPROTECT(1);
would capture that for subsequent processing.

There is Rcpp which allows you to easily extend R with C++ code, and also have that C++ code call back to R. There are examples included in the package which show that.
But maybe what you really want is to keep your C++ program (i.e. you own main()) and call out to R? That can be done most easily with
RInside which allows you to very easily embed R inside your C++ application---and the test for library, load if needed and function call are then extremely easy to do, and the (more than a dozen) included examples show you how to. And Rcpp still helps you to get results back and forth.
Edit: As Martin was kind enough to show things the official way I cannot help and contrast it with one of the examples shipping with RInside. It is something I once wrote quickly to help someone who had asked on r-help about how to load (a portfolio optimisation) library and use it. It meets your requirements: load a library, accesses some data in pass a weights vector down from C++ to R, deploy R and get the result back.
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8; -*-
//
// Simple example for the repeated r-devel mails by Abhijit Bera
//
// Copyright (C) 2009 Dirk Eddelbuettel
// Copyright (C) 2010 - 2011 Dirk Eddelbuettel and Romain Francois
#include <RInside.h> // for the embedded R via RInside
int main(int argc, char *argv[]) {
try {
RInside R(argc, argv); // create an embedded R instance
std::string txt = "suppressMessages(library(fPortfolio))";
R.parseEvalQ(txt); // load library, no return value
txt = "M <- as.matrix(SWX.RET); print(head(M)); M";
// assign mat. M to NumericMatrix
Rcpp::NumericMatrix M = R.parseEval(txt);
std::cout << "M has "
<< M.nrow() << " rows and "
<< M.ncol() << " cols" << std::endl;
txt = "colnames(M)"; // assign columns names of M to ans and
// into string vector cnames
Rcpp::CharacterVector cnames = R.parseEval(txt);
for (int i=0; i<M.ncol(); i++) {
std::cout << "Column " << cnames[i]
<< " in row 42 has " << M(42,i) << std::endl;
}
} catch(std::exception& ex) {
std::cerr << "Exception caught: " << ex.what() << std::endl;
} catch(...) {
std::cerr << "Unknown exception caught" << std::endl;
}
exit(0);
}
This rinside_sample2.cpp, and there are lots more examples in the package. To build it, you just say 'make rinside_sample2' as the supplied Makefile is set up to find R, Rcpp and RInside.

Related

How to 'tag' a location in a C source file for a later breakpoint definition?

Problem:
I want to be able to put different potentially unique or repeated "tags" across my C code, such that I can use them in gdb to create breakpoints.
Similar Work:
Breakpoints to line-numbers: The main difference with breakpoints on source lines, is that if the code previous to the tag is modified in such a way that it results in more or less lines, a reference to the tag would still be semantically correct, a reference to the source line would not.
Labels: I am coming from my previous question, How to tell gcc to keep my unused labels?, in which I preconceived the idea that the answer was to insert labels. Upon discussion with knowledgeable members of the platform, I was taught that label's names are not preserved after compilation. Labels not used within C are removed by the compiler.
Injecting asm labels: Related to the previous approach, if I inject asm code in the C source, certain problems arise, due to inline functions, compiler optimizations, and lack of scoping. This makes this approach not robust.
Define a dummy function: On this other question, Set GDB breakpoint in C file, there is an interesting approach, in which a "dummy" function can be placed in the code, and then add a breakpoint to the function call. The problem with this approach is that the definition of such function must be replicated for each different tag.
Is there a better solution to accomplish this? Or a different angle to attack the presented problem?
Using SDT (Statically Defined Tracing) probe points appears to satisfy all the requirements.
GDB documentation links to examples of how to define the probes.
Example use: (gdb) break -probe-stap my_probe (this should be documented in the GDB breakpoints section, but currently isn't).
You could create a dummy variable and set it to different values. Then you can use conditional watchpoints. Example:
#include <stdio.h>
static volatile int loc;
int main()
{
loc = 1;
puts("hello world");
loc = 2;
return 0;
}
(gdb) watch loc if loc == 2
Hardware watchpoint 1: loc
(gdb) r
Starting program: /tmp/a.out
hello world
Hardware watchpoint 1: loc
Old value = 1
New value = 2
main () at test.c:8
8 return 0;
You can of course wrap the assignment in a macro so you only get it in debug builds. Usual caveats apply: optimizations and inlining may be affected.
Use python to search a source file for some predefined labels, and set breakpoints there:
def break_on_labels(source, label):
"""add breakpoint on each SOURCE line containing LABEL"""
with open(source) as file:
l = 0
for line in file:
l = l + 1
if label in line:
gdb.Breakpoint(source=source, line=l)
main_file = gdb.lookup_global_symbol("main").symtab.fullname()
break_on_labels(main_file, "BREAK-HERE")
Example:
int main(void)
{
int a = 15;
a = a + 23; // BREAK-HERE
return a;
}
You could insert a #warning at each line where you want a breakpoint, then have a script to parse the file and line numbers from the compiler messages and write a .gdbinit file placing breakpoints at those locations.

why handle to an object frequently appears as pointer-to-pointer

What is the intention to set handle to an object as pointer-to pointer but not pointer? Like following code:
FT_Library library;
FT_Error error = FT_Init_FreeType( &library );
where
typedef struct FT_LibraryRec_ *FT_Library
so &library is a FT_LIBraryRec_ handle of type FT_LIBraryRec_**
It's a way to emulate pass by reference in C, which otherwise only have pass by value.
The 'C' library function FT_Init_FreeType has two outputs, the error code and/or the library handle (which is a pointer).
In C++ we'd more naturally either:
return an object which encapsulated the success or failure of the call and the library handle, or
return one output - the library handle, and throw an exception on failure.
C APIs are generally not implemented this way.
It is not unusual for a C Library function to return a success code, and to be passed the addresses of in/out variables to be conditionally mutated, as per the case above.
The approach hides implementation. It speeds up compilation of your code. It allows to upgrade data structures used by the library without breaking existing code that uses them. Finally, it makes sure the address of that object never changes, and that you don’t copy these objects.
Here’s how the version with a single pointer might be implemented:
struct FT_Struct
{
// Some fields/properties go here, e.g.
int field1;
char* field2;
}
FT_Error Init( FT_Struct* p )
{
p->field1 = 11;
p->field2 = malloc( 100 );
if( nullptr == p->field2 )
return E_OUTOFMEMORY;
return S_OK;
}
Or C++ equivalent, without any pointers:
class FT_Struct
{
int field1;
std::vector<char> field2;
public:
FT_Struct() :
field1( 11 )
{
field2.resize( 100 );
}
};
As a user of the library, you have to include struct/class FT_Struct definition. Libraries can be very complex so this will slow down compilation of your code.
If the library is dynamic i.e. *.dll on windows, *.so on linux or *.dylib on osx, you upgrade the library and if the new version changes memory layout of the struct/class, old applications will crash.
Because of the way C++ works, objects are passed by value, i.e. you normally expect them to be movable and copiable, which is not necessarily what library author wants to support.
Now consider the following function instead:
FT_Error Init( FT_Struct** pp )
{
try
{
*pp = new FT_Struct();
return S_OK;
}
catch( std::exception& ex )
{
return E_FAIL;
}
}
As a user of the library, you no longer need to know what’s inside FT_Struct or even what size it is. You don’t need to #include the implementation details, i.e. compilation will be faster.
This plays nicely with dynamic libraries, library author can change memory layout however they please, as long as the C API is stable, old apps will continue to work.
The API guarantees you won’t copy or move the values, you can’t copy structures of unknown lengths.

"not resolved from current namespace" error, when calling C routines from R

I am recently doing some computational testing with mgcv GAM. Some original functions are modified, and some are added. In order not to break compatibility, for every function I want to modify, I create a new version with a .zheyuan surfix in function name. For example, with Sl.fit function for doing penalized least squares fitting, I will have an Sl.fit.zheyuan. I would simply collect all R functions I write into a standalone R script "zheyuan.R". By adding this file into the R directory of mgcv_1.8-17 package source, and compiling this modified package into a local path, I could load it for testing purpose.
I have no problem in adding R routines, but not when adding C routines. No error occurs when installing the modified package, but when I call the R wrapper function of my added C routine, I would get the error as in my question title. If you are interested in my case, you may follow the following steps to reproduce such error.
Step 1: download the latest package source
Download the 1.8-17 version from the above link. Such link will die when new mgcv release is published on CRAN. But you can always go to mgcv CRAN page to download the latest release.
Let's untar the source. First, remove the file MD5, so that we don't get annoying MD5 warning when compiling the modified version. In the following, we would add new stuff into R directory, and src directory.
Step 2: create an R script
Consider the following R wrapper function:
RX <- function (R, X) {
X <- X + 0
.Call("C_mgcv_RX", R, X)
X
}
Create an R script "zheyuan.R" to place this function, and add it into mgcv/R.
Step 3: adding C routines
C routines regarding matrix computations are normally under src/mat.c. So let's append a new function to the end of this script:
void mgcv_RX (SEXP R, SEXP X) {
int nrowX = nrows(X);
int ncolX = ncols(X);
double one = 1.0;
F77_CALL(dtrmm)("l", "u", "n", "n", &nrowX, &ncolX, &one, REAL(R), &nrowX, REAL(X), &nrowX);
}
This is a simple routine multiplying an upper triangular matrix R, with a rectangular matrix X. The output matrix will overwrite X. Level-3 BLAS dtrmm will be called for this purpose. We don't need to worry about header files or runtime linking to BLAS library. Headers are availalbe in mat.c, and linking to BLAS is managed by R.
Step 4: register C routine
The above is insufficient. Each C routine in mgcv will appear in three places. For example, let's try searching a native C routine:
grep mgcv_RPPt mgcv/src/*
# mgcv/src/init.c: {"mgcv_RPPt",(DL_FUNC)&mgcv_RPPt,3},
# mgcv/src/mat.c:void mgcv_RPPt(SEXP a,SEXP r, SEXP NT) {
# mgcv/src/mgcv.h:void mgcv_RPPt(SEXP a,SEXP r, SEXP NT);
We also need to append a header file mgcv.h, and register this C routine in init.c.
Let's append
void mgcv_RX (SEXP R, SEXP X);
to the end of mgcv.h, and inside init.c, do:
R_CallMethodDef CallMethods[] = {
{"mgcv_pmmult2", (DL_FUNC) &mgcv_pmmult2,5},
{"mgcv_Rpiqr", (DL_FUNC) &mgcv_Rpiqr,5},
{"mgcv_tmm",(DL_FUNC)&mgcv_tmm,5},
{"mgcv_Rpbsi",(DL_FUNC)&mgcv_Rpbsi,2},
{"mgcv_RPPt",(DL_FUNC)&mgcv_RPPt,3},
{"mgcv_Rpchol",(DL_FUNC)&mgcv_Rpchol,4},
{"mgcv_Rpforwardsolve",(DL_FUNC)&mgcv_Rpforwardsolve,3},
{"mgcv_Rpcross",(DL_FUNC)&mgcv_Rpcross,3},
{"mgcv_RX",(DL_FUNC)&mgcv_RX,2}, // we add this line
{NULL, NULL, 0}
};
Step 5: complie and load
tar the modified mgcv folder to mgcv.tar.gz.
Open up a new, clean R session (possibly you need R --vanilla for start-up). Then specifying a local library path and run:
path <- getwd() ## let's just use current working directory
## make sure you move "mgcv.tar.gz" into current working path
install.packages("mgcv.tar.gz", repos = NULL, lib = path)
library(mgcv, lib.loc = path)
Step 6: test and get error
R <- matrix(runif(25), 5)
R[lower.tri(R)] <- 0
X <- matrix(runif(25), 5)
mgcv:::RX(R, X) ## function is not exported, so use `mgcv:::` to find it
# Error in .Call("C_mgcv_RX", R, X) :
# "C_mgcv_RX" not resolved from current namespace (mgcv)
Could anyone explain why and how to resolve this?
I have a temporary "fix" now. Instead of
.Call("C_mgcv_RX", R, X)
use either of the following:
.Call(mgcv:::"C_mgcv_RX", R, X)
.Call(getNativeSymbolInfo("mgcv_RX"), R, X)
I came about this because I suddenly realize that C routines can be extracted by :::, too. Since package compilation is successful, there is no way that mgcv::: can not find this C routine. And yes, it works.
To check that our defined C routine is available in the shared library loaded, try
is.loaded("mgcv_RX")
# TRUE
To list all registered C routines in the loaded shared library, use
getDLLRegisteredRoutines("mgcv")

Using a custom memory allocation function in R

I would like to be able to use my own memory allocation function for certain data structures (real valued vectors and arrays) in R. The reason for this is that I need my data to be 64bit aligned and I would like to use the numa library for having control over which memory node is used (I'm working on compute nodes with four 12-core AMD Opteron 6174 CPUs).
Now I have two functions for allocating and freeing memory: numa_alloc_onnode and numa_free (courtesy of this thread). I'm using R version 3.1.1, so I have access to the function allocVector3 (src/main/memory.c), which seems to me as the intended way of adding a custom memory allocator. I also found the struct R_allocator in src/include/R_ext
However it is not clear to me how to put these pieces together. Let's say, in R, I want the result res of an evaluation such as
res <- Y - mean(Y)
to be saved in a memory area allocated with my own function, how would I do this? Can I integrate allocVector3 directly at the R level? I assume I have to go through the R-C interface. As far as I know, I cannot just return a pointer to the allocated area, but have to pass the result as an argument. So in R I call something like
n <- length(Y)
res <- numeric(length=1)
.Call("R_allocate_using_myalloc", n, res)
res <- Y - mean(Y)
and in C
#include <R.h>
#include <Rinternals.h>
#include <numa.h>
SEXP R_allocate_using_myalloc(SEXP R_n, SEXP R_res){
PROTECT(R_n = coerceVector(R_n, INTSXP));
PROTECT(R_res = coerceVector(R_res, REALSXP));
int *restrict n = INTEGER(R_n);
R_allocator_t myAllocator;
myAllocator.mem_alloc = numa_alloc_onnode;
myAllocator.mem_free = numa_free;
myAllocator.res = NULL;
myAllocator.data = ???;
R_res = allocVector3(REALSXP, n, myAllocator);
UNPROTECT(2);
}
Unfortunately I cannot get beyond a variable has incomplete type 'R_allocator_t' compilation error (I had to remove the .data line since I have no clue as to what I should put there). Does any of the above code make sense? Is there an easier way of achieving what I want to? It seems a bit odd to have to allocate a small vector in R and the change its location in C just to be able to both control the memory allocation and have the vector available in R...
I'm trying to avoid using Rcpp, as I'm modifying a fairly large package and do not want to convert all C calls and thought that mixing different C interfaces could perform sub-optimally.
Any help is greatly appreciated.
I made some progress in solving my problem and I would like to share in case anyone else encounters a similar situation. Thanks to Kevin for his comment. I was missing the include statement he mentions. Unfortunately this was only one among many problems.
dyn.load("myAlloc.so")
size <- 3e9
myBigmat <- .Call("myAllocC", size)
print(object.size(myBigmat), units = "auto")
rm(myBigmat)
#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rallocators.h>
#include <numa.h>
typedef struct allocator_data {
size_t size;
} allocator_data;
void* my_alloc(R_allocator_t *allocator, size_t size) {
((allocator_data*)allocator->data)->size = size;
return (void*) numa_alloc_local(size);
}
void my_free(R_allocator_t *allocator, void * addr) {
size_t size = ((allocator_data*)allocator->data)->size;
numa_free(addr, size);
}
SEXP myAllocC(SEXP a) {
allocator_data* my_allocator_data = malloc(sizeof(allocator_data));
my_allocator_data->size = 0;
R_allocator_t* my_allocator = malloc(sizeof(R_allocator_t));
my_allocator->mem_alloc = &my_alloc;
my_allocator->mem_free = &my_free;
my_allocator->res = NULL;
my_allocator->data = my_allocator_data;
R_xlen_t n = asReal(a);
SEXP result = PROTECT(allocVector3(REALSXP, n, my_allocator));
UNPROTECT(1);
return result;
}
For compiling the c code, I use R CMD SHLIB -std=c99 -L/usr/lib64 -lnuma myAlloc.c. As far as I can tell, this works fine. If anyone has improvements/corrections to offer, I'd be happy to include them.
One requirement from the original question that remains unresolved is the alignment issue. The block of memory returned by numa_alloc_local is correctly aligned, but other fields of the new VECTOR_SEXPREC (eg. the sxpinfo_struct header) push back the start of the data array. Is it somehow possible to align this starting point (the address returned by REAL())?
R has, in memory.c:
main/memory.c
84:#include <R_ext/Rallocators.h> /* for R_allocator_t structure */
so I think you need to include that header as well to get the custom allocator (RInternals.h merely declares it, without defining the struct or including that header)

How to write unit tests in plain C?

I've started to dig into the GLib documentation and discovered that it also offers a unit testing framework.
But how could you do unit tests in a procedural language? Or does it require to program OO in C?
Unit testing only requires "cut-planes" or boundaries at which testing can be done. It is quite straightforward to test C functions which do not call other functions, or which call only other functions that are also tested. Some examples of this are functions which perform calculations or logic operations, and are functional in nature. Functional in the sense that the same input always results in the same output. Testing these functions can have a huge benefit, even though it is a small part of what is normally thought of as unit testing.
More sophisticated testing, such as the use of mocks or stubs is also possible, but it is not nearly as easy as it is in more dynamic languages, or even just object oriented languages such as C++. One way to approach this is to use #defines. One example of this is this article, Unit testing OpenGL applications, which shows how to mock out OpenGL calls. This allows you to test that valid sequences of OpenGL calls are made.
Another option is to take advantage of weak symbols. For example, all MPI API functions are weak symbols, so if you define the same symbol in your own application, your implementation overrides the weak implementation in the library. If the symbols in the library weren't weak, you would get duplicate symbol errors at link time. You can then implement what is effectively a mock of the entire MPI C API, which allows you to ensure that calls are matched up properly and that there aren't any extra calls that could cause deadlocks. It is also possible to load the library's weak symbols using dlopen() and dlsym(), and pass the call on if necessary. MPI actually provides the PMPI symbols, which are strong, so it is not necessary to use dlopen() and friends.
You can realize many of the benefits of unit testing for C. It is slightly harder, and it may not be possible to get the same level of coverage you might expect from something written in Ruby or Java, but it's definitely worth doing.
At the most basic level, unit tests are just bits of code that execute other bits of code and tell you if they worked as expected.
You could simply make a new console app, with a main() function, that executed a series of test functions. Each test would call a function in your app and return a 0 for success or another value for failure.
I'd give you some example code, but I'm really rusty with C. I'm sure there are some frameworks out there that would make this a little easier too.
You can use libtap which provides a number of functions which can provide diagnostics when a test fails. An example of its use:
#include <mystuff.h>
#include <tap.h>
int main () {
plan(3);
ok(foo(), "foo returns 1");
is(bar(), "bar", "bar returns the string bar");
cmp_ok(baz(), ">", foo(), "baz returns a higher number than foo");
done_testing;
}
Its similar to tap libraries in other languages.
Here's an example of how you would implement multiple tests in a single test program for a given function that might call a library function.
Suppose we want to test the following module:
#include <stdlib.h>
int my_div(int x, int y)
{
if (y==0) exit(2);
return x/y;
}
We then create the following test program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <setjmp.h>
// redefine assert to set a boolean flag
#ifdef assert
#undef assert
#endif
#define assert(x) (rslt = rslt && (x))
// the function to test
int my_div(int x, int y);
// main result return code used by redefined assert
static int rslt;
// variables controling stub functions
static int expected_code;
static int should_exit;
static jmp_buf jump_env;
// test suite main variables
static int done;
static int num_tests;
static int tests_passed;
// utility function
void TestStart(char *name)
{
num_tests++;
rslt = 1;
printf("-- Testing %s ... ",name);
}
// utility function
void TestEnd()
{
if (rslt) tests_passed++;
printf("%s\n", rslt ? "success" : "fail");
}
// stub function
void exit(int code)
{
if (!done)
{
assert(should_exit==1);
assert(expected_code==code);
longjmp(jump_env, 1);
}
else
{
_exit(code);
}
}
// test case
void test_normal()
{
int jmp_rval;
int r;
TestStart("test_normal");
should_exit = 0;
if (!(jmp_rval=setjmp(jump_env)))
{
r = my_div(12,3);
}
assert(jmp_rval==0);
assert(r==4);
TestEnd();
}
// test case
void test_div0()
{
int jmp_rval;
int r;
TestStart("test_div0");
should_exit = 1;
expected_code = 2;
if (!(jmp_rval=setjmp(jump_env)))
{
r = my_div(2,0);
}
assert(jmp_rval==1);
TestEnd();
}
int main()
{
num_tests = 0;
tests_passed = 0;
done = 0;
test_normal();
test_div0();
printf("Total tests passed: %d\n", tests_passed);
done = 1;
return !(tests_passed == num_tests);
}
By redefining assert to update a boolean variable, you can continue on if an assertion fails and run multiple tests, keeping track of how many succeeded and how many failed.
At the start of each test, set rslt (the variables used by the assert macro) to 1, and set any variables that control your stub functions. If one of your stubs gets called more than once, you can set up arrays of control variables so that the stubs can check for different conditions on different calls.
Since many library functions are weak symbols, they can be redefined in your test program so that they get called instead. Prior to calling the function to test, you can set a number of state variables to control the behavior of the stub function and check conditions on the function parameters.
In cases where you can't redefine like that, give the stub function a different name and redefine the symbol in the code to test. For example, if you want to stub fopen but find that it isn't a weak symbol, define your stub as my_fopen and compile the file to test with -Dfopen=my_fopen.
In this particular case, the function to be tested may call exit. This is tricky, since exit can't return to the function being tested. This is one of the rare times when it makes sense to use setjmp and longjmp. You use setjmp before entering the function to test, then in the stubbed exit you call longjmp to return directly back to your test case.
Also note that the redefined exit has a special variable that it checks to see if you actually want to exit the program and calls _exit to do so. If you don't do this, your test program may not quit cleanly.
This test suite also counts the number of attempted and failed tests and returns 0 if all tests passed and 1 otherwise. That way, make can check for test failures and act accordingly.
The above test code will output the following:
-- Testing test_normal ... success
-- Testing test_div0 ... success
Total tests passed: 2
And the return code will be 0.
There is nothing intrinsically object-oriented about testing small pieces of code in isolation. In procedural languages you test functions and collections thereof.
If you are desperate, and you'd have to be desperate, I banged together a little C preprocessor and gmake based framework. It started as a toy, and never really grew up, but I have used it to develop and test a couple of medium sized (10,000+ line) projects.
Dave's Unit Test is minimally intrusive yet it can do some tests I had originally thought would not be possible for a preprocessor based framework (you can demand that a certain stretch of code throw a segmentation fault under certain conditions, and it will test it for you).
It is also an example of why making heavy use of the preprocessor is hard to do safely.
The simplest way of doing a unit test is to build a simple driver code that gets linked with the other code, and call each function in each case...and assert the values of the results of the functions and build up bit by bit...that's how I do it anyway
int main(int argc, char **argv){
// call some function
int x = foo();
assert(x > 1);
// and so on....
}
Hope this helps.
With C it must go further than simply implementing a framework on top of existing code.
One thing I've always done is make a testing module (with a main) that you can run little tests from to test your code. This allows you to do very small increments between code and test cycles.
The bigger concern is writing your code to be testable. Focus on small, independent functions that do not rely on shared variables or state. Try writing in a "Functional" manner (without state), this will be easier to test. If you have a dependency that can't always be there or is slow (like a database), you may have to write an entire "mock" layer that can be substituted for your database during tests.
The principle unit testing goals still apply: ensure the code under test always resets to a given state, test constantly, etc...
When I wrote code in C (back before Windows) I had a batch file that would bring up an editor, then when I was done editing and exited, it would compile, link, execute tests and then bring up the editor with the build results, test results and the code in different windows. After my break (a minute to several hours depending on what was being compiled) I could just review results and go straight back to editing. I'm sure this process could be improved upon these days :)
I use assert. It's not really a framework though.
You can write a simple minimalistic test framework yourself:
// test_framework.h
#define BEGIN_TESTING int main(int argc, char **argv) {
#define END_TESTING return 0;}
#define TEST(TEST_NAME) if (run_test(TEST_NAME, argc, argv))
int run_test(const char* test_name, int argc, char **argv) {
// we run every test by default
if (argc == 1) { return 1; }
// else we run only the test specified as a command line argument
for (int i = 1; i < argc; i++) {
if (!strcmp(test_name, argv[i])) { return 0; }
}
return 0;
}
Now in the actual test file do this:
#include test_framework.h
BEGIN_TESTING
TEST("MyPassingTest") {
assert(1 == 1);
}
TEST("MyFailingTest") {
assert(1 == 2);
}
END_TESTING
If you want to run all tests, execute ./binary without command line arguments, if you want to run just a particular test, execute ./binary MyFailingTest

Resources