R from C -- Simplest Possible Helloworld - c

What is the simplest possible C function for starting the R interpreter, passing in a small expression (eg, 2+2), and getting out the result? I'm trying to compile with MingW on Windows.

You want to call R from C?
Look at section 8.1 in the Writing R Extensions manual. You should also look into the "tests" directory (download the source package, extract it, and you'll have the tests directory). A similar question was previously asked on R-Help, and here was the example:
#include <Rinternals.h>
#include <Rembedded.h>
SEXP hello() {
return mkString("Hello, world!\n");
}
int main(int argc, char **argv) {
SEXP x;
Rf_initEmbeddedR(argc, argv);
x = hello();
return x == NULL; /* i.e. 0 on success */
}
The simple example from the R manual is like so:
#include <Rembedded.h>
int main(int argc, char **argv)
{
/* do some setup */
Rf_initEmbeddedR(argc, argv);
/* do some more setup */
/* submit some code to R, which is done interactively via
run_Rmainloop();
A possible substitute for a pseudo-console is
R_ReplDLLinit();
while(R_ReplDLLdo1() > 0) {
add user actions here if desired
}
*/
Rf_endEmbeddedR(0);
/* final tidying up after R is shutdown */
return 0;
}
Incidentally, you might want to consider using RInside instead: Dirk provides a nice "hello world" example on the project homepage.
If you're interested in calling C from R, here's my original answer:
This isn't exactly "hello world", but here are some good resources:
Jay Emerson recently gave a talk on R package development at the New York useR group, and he provided some very nice examples of using C from within R. Have a look at the paper from this discussion on his website, starting on page 9. All the related source code is here: http://www.stat.yale.edu/~jay/Rmeetup/MyToolkitWithC/.
The course taught at Harvard by Gopi Goswami in 2005: C-C++-R (in Statistics). This includes extensive examples and source code.

Here you go. It's the main function, but you should be able to adapt it to a more general-purpose function. This example builds an R expression from C calls and also from a C string. You're on your own for compiling on Windows, but I've provided compile steps for Linux:
/* simple.c */
#include <Rinternals.h>
#include <Rembedded.h>
#include <R_ext/Parse.h>
int
main(int argc, char *argv[])
{
char *localArgs[] = {"R", "--no-save","--silent"};
SEXP e, tmp, ret;
ParseStatus status;
int i;
Rf_initEmbeddedR(3, localArgs);
/* EXAMPLE #1 */
/* Create the R expressions "rnorm(10)" with the R API.*/
PROTECT(e = allocVector(LANGSXP, 2));
tmp = findFun(install("rnorm"), R_GlobalEnv);
SETCAR(e, tmp);
SETCADR(e, ScalarInteger(10));
/* Call it, and store the result in ret */
PROTECT(ret = R_tryEval(e, R_GlobalEnv, NULL));
/* Print out ret */
printf("EXAMPLE #1 Output: ");
for (i=0; i<length(ret); i++){
printf("%f ",REAL(ret)[i]);
}
printf("\n");
UNPROTECT(2);
/* EXAMPLE 2*/
/* Parse and eval the R expression "rnorm(10)" from a string */
PROTECT(tmp = mkString("rnorm(10)"));
PROTECT(e = R_ParseVector(tmp, -1, &status, R_NilValue));
PROTECT(ret = R_tryEval(VECTOR_ELT(e,0), R_GlobalEnv, NULL));
/* And print. */
printf("EXAMPLE #2 Output: ");
for (i=0; i<length(ret); i++){
printf("%f ",REAL(ret)[i]);
}
printf("\n");
UNPROTECT(3);
Rf_endEmbeddedR(0);
return(0);
}
Compile steps:
$ gcc -I/usr/share/R/include/ -c -ggdb simple.c
$ gcc -o simple simple.o -L/usr/lib/R/lib -lR
$ LD_LIBRARY_PATH=/usr/lib/R/lib R_HOME=/usr/lib/R ./simple
EXAMPLE #1 Output: 0.164351 -0.052308 -1.102335 -0.924609 -0.649887 0.605908 0.130604 0.243198 -2.489826 1.353731
EXAMPLE #2 Output: -1.532387 -1.126142 -0.330926 0.672688 -1.150783 -0.848974 1.617413 -0.086969 -1.334659 -0.313699

I don't think any of the above has answered the question - which was to evaluate 2 + 2 ;). To use a string expression would be something like:
#include <Rinternals.h>
#include <R_ext/Parse.h>
#include <Rembedded.h>
int main(int argc, char **argv) {
SEXP x;
ParseStatus status;
const char* expr = "2 + 2";
Rf_initEmbeddedR(argc, argv);
x = R_ParseVector(mkString(expr), 1, &status, R_NilValue);
if (TYPEOF(x) == EXPRSXP) { /* parse returns an expr vector, you want the first */
x = eval(VECTOR_ELT(x, 0), R_GlobalEnv);
PrintValue(x);
}
Rf_endEmbeddedR(0);
return 0;
}
This lacks error checking, obviously, but works:
Z:\>gcc -o e.exe e.c -IC:/PROGRA~1/R/R-213~1.0/include -LC:/PROGRA~1/R/R-213~1.0/bin/i386 -lR
Z:\>R CMD e.exe
[1] 4
(To get the proper commands for your R installation, use R CMD SHLIB e.c, which shows you the relevant compiler flags.)
You can also construct the expression by hand if it's simple enough - e.g., for rnorm(10) you would use
SEXP rnorm = install("rnorm");
SEXP x = eval(lang2(rnorm, ScalarInteger(10)), R_GlobalEnv);

I think you can't do much better than the inline package (which supports C, C++ and Fortran):
library(inline)
fun <- cfunction(signature(x="ANY"),
body='printf("Hello, world\\n"); return R_NilValue;')
res <- fun(NULL)
which will print 'Hello, world' for you. And you don't even need to know where / how / when the compiler and linker are invoked. [ The R_NilValue is R's NULL version of a SEXP, and the .Call() signature used here requires that you return a SEXP -- see the 'Writing R Extensions' manual, which you can't really avoid here. ]
You will then take such code and wrap it in a package. We had great success with using inline for the Rcpp unit tests (over 200 and counting now) and some of the examples.
Oh, and this inline example will work on any OS. Even Windoze provided you have the R package building tool chain installed, in the PATH etc pp.
Edit: I misread the question. What you want is essentially what the littler front-end does (using pure C) and what the RInside classes factored-out for C++.
Jeff and I never bothered with porting littler to Windoze, but RInside did work there in the most recent release. So you should be able to poke around the build recipes and create a C-only variant of RInside so that you can feed expressions to an embedded R process. I suspect that you still want something like Rcpp for the glue, as it gets tedious otherwise.
Edit 2: And as Shane mentions, there are indeed a few examples in the R sources in tests/Embedding/ along with a Makefile.win. Maybe that is the simplest start if you're willing to learn about R internals.

Related

LLVM C-API Lifecycles of core objects

I've started playing with LLVM, making a pet language. I'm using the C-API. I have a parser and basic AST, but I am at a bit of a road block with LLVM.
The following is a minified version of my code to illustrate my current issue:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "llvm-c/Core.h"
#include "llvm-c/ExecutionEngine.h"
#include "llvm-c/Target.h"
#include "llvm-c/Analysis.h"
#include "llvm-c/BitWriter.h"
static LLVMModuleRef mod;
static LLVMBuilderRef builder;
static LLVMExecutionEngineRef engine;
typedef struct oper_t {
const char * name;
LLVMTypeRef args[2];
LLVMTypeRef ret;
LLVMValueRef val;
} oper_t;
#define NUM_OPER 2
static oper_t oper[NUM_OPER] = {
{ .name = "function1" },
{ .name = "function2" },
};
void codegen_init(const char * filename)
{
char *error;
mod = LLVMModuleCreateWithName(filename);
builder = LLVMCreateBuilder();
error = NULL;
LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);
if(error) printf("LLVM init Verify message \"%s\"\n", error);
LLVMDisposeMessage(error);
error = NULL;
LLVMLinkInMCJIT();
LLVMInitializeNativeTarget();
LLVMInitializeNativeAsmPrinter();
if (LLVMCreateExecutionEngineForModule(&engine, mod, &error) != 0)
{
fprintf(stderr, "LLVM failed to create execution engine\n");
abort();
}
if(error)
{
printf("LLVM Execution Engine message %s\n", error);
LLVMDisposeMessage(error);
exit(EXIT_FAILURE);
}
}
int runOper(oper_t * o, long a, long b)
{
LLVMValueRef v, l, r;
o->args[0] = LLVMInt32Type();
o->args[1] = LLVMInt32Type();
o->ret = LLVMFunctionType(LLVMInt32Type(), o->args, 2, 0);
o->val = LLVMAddFunction(mod, o->name, o->ret);
LLVMBasicBlockRef entry = LLVMAppendBasicBlock(o->val, "entry");
LLVMPositionBuilderAtEnd(builder, entry);
l = LLVMConstInt(LLVMInt32Type(), a, 0);
r = LLVMConstInt(LLVMInt32Type(), b, 0);
v = LLVMBuildAdd(builder, l, r, "add");
LLVMBuildRet(builder, v);
char *error = NULL;
LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);
if(error) printf("LLVM func Verify message \"%s\"\n", error);
LLVMDisposeMessage(error);
LLVMGenericValueRef g = LLVMRunFunction(engine, o->val, 0, NULL);
printf("LLVM func executed without crash\n");
LLVMDeleteFunction(o->val);
return (long)LLVMGenericValueToInt(g, 1);
}
int main(int argc, char const *argv[])
{
long val;
codegen_init("test");
val = runOper(&oper[0], 3, 4);
printf("3 + 4 is %ld\n", val);
val = runOper(&oper[1], 6, 7);
printf("6 + 7 is %ld\n", val);
}
I can compile this using the command:
gcc test.c `llvm-config --cflags --cppflags --ldflags --libs core executionengine mcjit interpreter analysis native bitwriter --system-libs` -o test.exe
Or alternatively I've also tried:
gcc `llvm-config --cflags --cppflags` -c test.c
g++ test.o `llvm-config --cxxflags --ldflags --libs core executionengine mcjit interpreter analysis native bitwriter --system-libs` -o test.exe
Either way I get this result:
$ ./test.exe
LLVM init Verify message ""
LLVM func Verify message ""
LLVM func executed without crash
3 + 4 is 7
LLVM func Verify message ""
Segmentation fault
I've also tried using clang just for good measure.
Clearly I am misusing the LLVM C-API. I'm struggling mostly to get some understanding of when the API functions are safe to call, and also when can I safely free/delete the memory referenced by LLVM. For instance the LLVMTypeRef args[2] parameter, I see in the LLVM C-API source code for LLVMFunctionType that it is creating an ArrayRef to the args parameter. This means I must hang onto the args parameter until LLVM is done with it. I can't really tell when that is exactly. (I plan to allocate this memory on the heap)
Stated simply, I'd like it if someone could not just explain what I am doing wrong in this example, but more fundamentally explain how I should figure out what I am doing wrong.
The LLVM C-API docs give a great breakdown of the functions available in the API, but I haven't found them to give much description of how the API functions should be called, i.e. what order is safe/expected.
I have also found this documentation to be helpful, as it can be easily searched for individual function prototypes. But again it gives no context or examples of how to use the C-API.
Finally I have to reference Paul Smith's Blog, it's a bit outdated now, but is definitely the reason I got this far.
P.S. I don't expect everything to be spelled out for me, I just want advice on how to self-learn LLVM.
The basic design is most easily understood in C++: If you pass a pointer to an object y as a constructor argument, ie. x=new Foo(…, y, …), then y has to live longer than x. This also applies to wrappers such as CallInst::Create() and ConstantInt::get(), both of which take pointers to objects and return constructed objects.
But there's more. Some objects assume ownership of the constructed objects, so that you aren't permitted to delete the constructed object at all. You are for example not allowed to delete what ConstantInt::get() returns. As a general rule, anything that's called create… in the C++ API returns something you may delete and anything called get… returns something that's owned by another LLVM object. I'm sure there are exceptions.
You may find it helpful to build a debug version of LLVM, unless you're much smarter than I. The extra assertions are great.

Use gcc to get all functions from *.c/*.h-file [duplicate]

I want to do this:
extract_prototypes file1.c file2.cpp file3.c
and have whatever script/program print a nice list of function prototypes for all functions defined in the given C / C++ files. It must handle multi-line declarations nicely.
Is there a program that can do this job? The simpler the better.
EDIT: after trying to compile two C programs, bonus points for something that uses {perl, python, ruby}.
I use ctags
# p = function declaration, f = function definition
ctags -x --c-kinds=fp /usr/include/hal/libhal.h
Also works with C++
ctags -x --c++-kinds=pf --language-force=c++ /usr/include/c++/4.4.1/bits/deque.tcc
Note, you may need to add include paths, do this using the -I /path/to/includes.
The tool cproto does what you want and lets you tune the output to your requirements.
Note: This tool also only works for C files.
I use ctags and jq
ctags --output-format=json --totals=no --extras=-F --fields=nP file1.c |
jq -sr 'sort_by(.line) | .[].pattern | ltrimstr("/^") | rtrimstr("$/") | . + ";"'
If you have universal-ctags (https://ctags.io), --_xformat option may be useful though you need sed and tr commands to get what you want.
$ cat input.c
struct object *new_object (struct
/* COMMENT */
param
/* IGNORE ME */
*p)
{
return NULL;
}
int main (void)
{
return 0;
}
$ ./ctags -o - --kinds-C=f --kinds-C++=f -x --_xformat='%{typeref} %{name} %{signature};' input.c | tr ':' ' ' | sed -e 's/^typename //'
struct object * new_object (struct param * p);
int main (void);
$
This is similar to the answer posted by Steve Ward but this one requires sed, and tr instead of jq.
http://cfunctions.sourceforge.net
(This only does C and a limited subset of C++. Disclaimer: this is my program.)
I used to use doxygen to generate documentation for my C++ code. I am not an expert, but I think you can use doxygen to generate some sort of index file of the function prototypes.
Here is a thread of someone asking a similar question
gccxml is interesting, but it prints an XML tree, so you need to extract the information about classes, functions, types, and even the specialized templates of classes and functions. gccxml uses GCC's parser, so you don't have to do the hardest part, which is parsing C++ files, and you can be sure the output reflects what is probably the best compiler available understands.
If you format your comments suitably, you could try Doxygen. In fact, if you've not tried it before I'd recommend giving it a go anyway - it will produce inheritance graphs as well as full member function lists and descriptions (from your comments).
In more modern versions of GCC, you can also use -aux-info to get this information when writing C code. See here.
Here's a sample of what the output looks like:
/* src/main.c:30:NC */ static void usage (const char *);
/* src/main.c:32:NF */ extern int main (int argc, char **argv); /* (argc, argv) int argc; char **argv; */
/* src/main.c:57:NF */ static void usage (const char *prog_name); /* (prog_name) const char *prog_name; */
gcc-xml might help, although as it is, it only does half the job you want: you'll need to do some processing of the XML output.
You can run the source file through this program:
/* cproto_parser.c */
#include <stdio.h>
int main (void)
{
int c;
int infundef = 0;
int nb = 0,
np = 0;
while((c=getc(stdin))!=EOF){
if(c=='{'){
if((np==0)&&(nb==0)){infundef=1;}
nb++;
}
if (infundef==0) {putc(c,stdout);}
if(c=='}'){
if((np==0)&&(nb==1)){infundef=0;}
nb--;
}
if(c=='('){np++;}
if(c==')'){np--;}
}
return 0;
}
Run the source through the preprocessor first to get rid of comments. If you have unmatched braces due to #ifdefs, you have to set defines or include files so that the braces balance.
e.g., cc cproto_parser.c -o cproto_parser; cc -E your_source_file.c|./cproto_parser

Get the FUSE version string

Is there a function that returns the FUSE version string?
fuse_common.h has int fuse_version(void), which returns the major version, multiplied by 10, plus the minor version; both of which are derived from #define values. (e.g., This returns 27 on my platform). What I'm looking for, however, is some char* fuse_version(void) that would return something like 2.7.3.
As you said yourself, the version is defined in fuse_common.h. If you don't want to use helper_version, as @Alexguitar said, you may just write a small program that does it -- but it seems that only the first two numbers (major and minor) are available:
#include <fuse/fuse.h>
#include <stdlib.h>
#include <stdio.h>
char* str_fuse_version(void) {
static char str[10] = {0,0,0,0,0,0,0,0,0,0};
if (str[0]==0) {
int v = fuse_version();
int a = v/10;
int b = v%10;
snprintf(str,10,"%d.%d",a,b);
}
return str;
}
int main () {
printf("%s\n", str_fuse_version());
exit(EXIT_SUCCESS);
}
Note: you should include fuse/fuse.h and not fuse_common.h; also, you may need to pass -D_FILE_OFFSET_BITS=64 when compiling.
$ gcc -Wall fuseversiontest.c -D_FILE_OFFSET_BITS=64 -lfuse
$ ./a.out
2.9
In the source code of fuse in include/config.h you have:
/* Define to the version of this package. */
#define PACKAGE_VERSION "2.9.4"
Additionally, there's a function in lib/helper.c that prints it.
static void helper_version(void)
{
fprintf(stderr, "FUSE library version: %s\n", PACKAGE_VERSION);
}
Edit:
I do realize that the package versioning strings are only for internal use, so you're probably stuck with the major and minor numbers exposed by fuse_common.h. You'll probably have to write a function like @Jay suggests.

Treat functions by name

Suppose you created a main() to deal with an exercise you gave your students.
Every student is supposed to write their own function, with the same API. A single file will then be created, containing all the functions and the main that calls them.
Let's say int studentname(int a, int b) is the function pattern.
One way I dealt with it was to use an array of function pointers, int (*func[MAX])(). But you need to fill in the array one by one: func[0]=studentname;.
I wonder, is there a way a function can be called by its name somehow?
Something like: int student1(int a , int b), student2(), etc.
And in main somehow we could just call sscanf(funcname,"student%d",i); funcname();.
Do you have any other idea? Maybe
int studentname(int a, int b, char *fname)
{
strcpy(fname, "studentname");
Anything creative will do! :)
Thanks!
Beco
PS. I tried just a vector of functions, but C won't allow me! :)
int func[2]()={{;},{;}};
This way I could just give each student a number, and voilà... But no way. It was funny though.
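C has no arrays of functions, but it does accept an array of function pointers with an initializer list, which gets close to the "one number (or name) per student" idea. What follows is only a minimal sketch: student1, student2, the student_entry table and the call_student helper are hypothetical names made up for illustration, and every student function is assumed to share the int (int, int) signature.

```c
#include <stddef.h>
#include <string.h>

/* Common signature every student function is assumed to follow. */
typedef int (*student_fn)(int, int);

/* Hypothetical student implementations, for illustration only. */
static int student1(int a, int b) { return a + b; }
static int student2(int a, int b) { return a * b; }

/* One table entry per student: the name as text plus the function pointer. */
struct student_entry {
    const char *name;
    student_fn  fn;
};

static const struct student_entry students[] = {
    { "student1", student1 },
    { "student2", student2 },
};

/* Looks up `name` in the table and calls the matching function.
 * Returns 1 and stores the result in *out on success,
 * or 0 (leaving *out untouched) if the name is unknown. */
int call_student(const char *name, int a, int b, int *out)
{
    size_t i;
    for (i = 0; i < sizeof students / sizeof students[0]; i++) {
        if (strcmp(students[i].name, name) == 0) {
            *out = students[i].fn(a, b);
            return 1;
        }
    }
    return 0;
}
```

With this table, call_student("student2", 3, 4, &r) stores 12 in r. The table still has to mention each function once, but that list could be generated by a script or a macro instead of being typed by hand.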
Edited: I'm using linux.
Edited 2: Thanks! I've accepted an answer that helped me, but I've also documented a complete example as an answer below.
Maybe a bit overcomplicating it, but spontaneous idea:
Compile all student source files into one shared library with the students' functions being exports.
Then enumerate all exposed functions, call and test them.
As an alternative:
Write a small tool that will compile all "student units" using a preprocessor define to replace a predefined function name with a unique name ("func1", "func2", etc.).
Then let the tool write a small unit calling all these functions while performing tests, etc.
And yet another idea:
Use C++ to write a special class template that's going to register derived classes in a object factory and just embed student's code using extern "C". Depending on the implementation this might look a bit confusing and overcomplicated though.
Then use the factory to create one instance of each and run the code.
Example for the approach with dlopen() and dlsym() (whether only one function per library or all - doesn't matter):
void *pluginlib = dlopen("student1.so", RTLD_NOW); // RTLD_NOW will load the file right away
if (!pluginlib)
; // failed to load
studentproc func = (studentproc)dlsym(pluginlib, "student1"); // this loads the function called "student1"
if (!func)
; // failed to resolve
func("hello world!"); // call the lib
dlclose(pluginlib); // unloads the dll (this will make all further calls invalid)
Similar to what @Jamey-Sharp proposed:
ask each student to provide .c file with entry function of a given name/signature
compile each .c into a shared library, named by the student name, or given whatever unique name. This step can be easily automated with make or simple script.
make a simple host application which enumerates all .so files in a given directory, and uses dlopen() and dlsym() to get to the entry point function.
now you can simply call each student's implementation.
BTW, that's how plug-ins are implemented usually, isn't it?
Edit: Here's a working proof of concept (and a proof that each student can use the same name for the entry point function).
Here's student1.c:
#include <stdio.h>
void student_task()
{
printf("Hello, I'm Student #1\n");
}
Here's student2.c:
#include <stdio.h>
void student_task()
{
printf("Hello, I'm Student #2\n");
}
And here's the main program, tester.c:
#include <stdio.h>
#include <dlfcn.h>
/* NOTE: Error handling intentionally skipped for brevity!
* It's not a production code!
*/
/* Type of the entry point function implemented by students */
typedef void (*entry_point_t)(void);
/* For each student we have to store... */
typedef struct student_lib_tag {
/* .. pointer to the entry point function, */
entry_point_t entry;
/* and a library handle, so we can play nice and close it eventually */
void* library_handle;
} student_solution_t;
void load(const char* lib_name, student_solution_t* solution)
{
/* Again - all error handling skipped, I only want to show the idea! */
/* Open the library. RTLD_LOCAL is quite important, it keeps the libs separated */
solution->library_handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
/* Now we ask for 'student_task' function. Every student uses the same name.
* strange void** is needed for C99, see dlsym() manual.
*/
*(void**) (&solution->entry) = dlsym(solution->library_handle, "student_task");
/* We have to keep the library open */
}
int main()
{
/* Two entries hardcoded - you need some code here that would scan
* the directory for .so files, allocate array dynamically and load
* them all.
*/
student_solution_t solutions[2];
/* Load both solutions */
load("./student1.so", &solutions[0]);
load("./student2.so", &solutions[1]);
/* Now we can call them both, despite the same name of the entry point function! */
(solutions[0].entry)();
(solutions[1].entry)();
/* Eventually it's safe to close the libs */
dlclose(solutions[0].library_handle);
dlclose(solutions[1].library_handle);
return 0;
}
Let's compile it all:
czajnik@czajnik:~/test$ gcc -shared -fPIC student1.c -o student1.so -Wall
czajnik@czajnik:~/test$ gcc -shared -fPIC student2.c -o student2.so -Wall
czajnik@czajnik:~/test$ gcc tester.c -g -O0 -o tester -ldl -Wall
And see it works:
czajnik@czajnik:~/test$ ./tester
Hello, I'm Student #1
Hello, I'm Student #2
I'd take a different approach:
Require every student to use the same function name, and place each student's code in a separate source file.
Write one more source file with a main that calls the standard name.
Produce a separate executable from linking main.c with student1.c, then main.c with student2.c, and so on. You might be able to use wildcards in a makefile or shell script to automate this.
That said, at least on Unix-like OSes, you can do what you asked for.
Call dlopen(NULL) to get a handle on the symbols in the main program.
Pass that handle and the function name you want to dlsym. Coerce the resulting pointer to a function pointer of the right type, and call it.
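The dlopen(NULL) steps above might look roughly like the sketch below. The helper name lookup_by_name and the int (*)(int) signature are assumptions, chosen only so that the sketch can be tried against the C library's abs(); for the student exercise the typedef would be int (*)(int, int) instead.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of the function we expect to find. int (*)(int) is an assumption
 * made so the sketch can be tried against abs() from the C library. */
typedef int (*int_fn)(int);

/* Hypothetical helper: resolves `name` among the symbols visible to the
 * running program and returns it as a function pointer, or NULL if absent. */
int_fn lookup_by_name(const char *name)
{
    int_fn fn = NULL;
    void *self = dlopen(NULL, RTLD_NOW);   /* handle for the main program */

    if (self == NULL)
        return NULL;

    /* Same void** idiom the dlsym() manual page uses for function pointers. */
    *(void **)(&fn) = dlsym(self, name);

    /* Releasing the handle does not unload the main program or its libraries. */
    dlclose(self);
    return fn;
}
```

For example, lookup_by_name("abs") returns a pointer that can be called as fn(-5). Note that functions defined in the executable itself are only found this way if they are exported to the dynamic symbol table, which with GCC on Linux means linking with -rdynamic; older glibc versions also need -ldl on the link line.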
Here is an ugly preprocessor hack:
#Makefile
FILE_NAME=student
${FILE_NAME}: main.c
	cc -Wall -DFILE_NAME=\"${FILE_NAME}.c\" -o $@ main.c -lm
Teacher's main.c:
#include <math.h>
#include <stdio.h>
#include FILE_NAME
char *my_name(void);
double my_sin(double val);
int main(void)
{
double dd;
dd = my_sin(3.1415923563);
printf("%s: %f\n", my_name(), dd);
return 0;
}
Student's .c File:
#include <math.h>
char * my_name(void);
double my_sin(double val);
char * my_name(void)
{
return "Wildplasser-1.0";
}
double my_sin(double val)
{
return sin (val);
}
The trick lies in the literal inclusion of the student's .c file.
To avoid this, you could also use a different make line, like:
	cc -Wall -o $@ ${FILE_NAME}.c main.c -lm
(and remove the ugly #include FILE_NAME, of course)
Thank you all. I've accepted an answer that gave me the inspiration to solve the question. Here, just to document it, is my complete solution:
File shamain.c
/* Uses shared library shalib.so
* Compile with:
* gcc shamain.c -o shamain -ldl -Wall
*/
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(void)
{
void *libstud;
int (*student[2])(int, int);
char fname[32];
int i,r;
libstud = dlopen("./shalib.so", RTLD_NOW);
if (!libstud)
{
fprintf(stderr, "error: %s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
for(i=0; i<2; i++)
{
sprintf(fname, "func%d", i);
*(void **) (&student[i]) = dlsym(libstud, fname); /* c99 crap */
//student[i] = (int (*)(int, int)) dlsym(libstud, fname); /* c89 format */
}
for(i=0; i<2; i++)
{
r=student[i](i, i);
printf("i=%d,r=%d\n", i, r);
}
return 0;
}
File shalib.c
/* Shared library.
* Compile with:
* gcc -shared -fPIC shalib.c -o shalib.so -Wall
*/
#include <stdio.h>
int func0(int one, int jadv)
{
printf("%d = Smith\n", one);
return 0;
}
int func1(int one, int jadv)
{
printf("%d = John\n", one);
return 0;
}
It is a while since I have used shared libraries, but I have a feeling you can extract named functions from a DLL/shlib. Could you create a DLL/shared library containing all of the implementations and then access them by name from the main?
Per @william-morris's suggestion, you might have luck using dlsym() to do a dynamic lookup of the functions. (dlsym() may or may not be the library call to use on your particular platform.)

Dynamic obfuscation by self-modifying code

Here is what I am trying to do:
assume you have two functions
void f1(int *v)
{
*v = 55;
}
void f2(int *v)
{
*v = 44;
}
char *template;
template = allocExecutablePages(...);
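char *allocExecutablePages (int pages)
{
/* valloc() returns page-aligned memory, which mprotect() requires */
template = (char *) valloc (getpagesize () * pages);
/* make every allocated page readable, writable and executable */
if (mprotect (template, getpagesize () * pages,
PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
return template;
}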
I would like to compare f1 and f2, i.e. get the assembly of both functions and compare them line by line to find out what is identical and what is not,
and then put those lines in my template.
Is there a way to do that in C?
Thanks
Update
Thanks for all your answers, but maybe I haven't explained my need correctly.
Basically I'm trying to write a little obfuscation method.
The idea consists in letting two or more functions share the same location in memory. A region of memory (which we will call a template) is set up containing some of the
machine code bytes from the functions, more specifically, the ones they all
have in common. Before a particular function is executed, an edit script is used
to patch the template with the necessary machine code bytes to create a
complete version of that function. When another function assigned to the same
template is about to be executed, the process repeats, this time with a
different edit script. To illustrate this, suppose you want to obfuscate a
program that contains two functions f1 and f2. The first one (f1) has the
following machine code bytes
Address Machine code
0 10
1 5
2 6
3 20
and the second one (f2) has
Address Machine code
0 10
1 9
2 3
3 20
At obfuscation time, one will replace f1 and f2 by the template
Address Machine code
0 10
1 ?
2 ?
3 20
and by the two edit scripts e1 = {1 becomes 5, 2 becomes 6} and e2 = {1
becomes 9, 2 becomes 3}.
#include <stdlib.h>
#include <string.h>
typedef unsigned int uint32;
typedef char * addr_t;
typedef struct {
uint32 offset;
char value;
} EDIT;
EDIT script1[200], script2[200];
char *template;
int template_len, script_len = 0;
typedef void(*FUN)(int *);
int val, state = 0;
void f1_stub ()
{
if (state != 1) {
patch (script1, script_len, template);
state = 1;
}
((FUN)template)(&val);
}
void f2_stub () {
if (state != 2) {
patch (script2, script_len, template);
state = 2;
}
((FUN)template)(&val);
}
int new_main (int argc, char **argv)
{
f1_stub ();
f2_stub ();
return 0;
}
void f1 (int *v) { *v = 99; }
void f2 (int *v) { *v = 42; }
int main (int argc, char **argv)
{
int f1SIZE, f2SIZE;
/* makeCodeWritable (...); */
/* template = allocExecutablePages(...); */
/* Computed at obfuscation time */
diff ((addr_t)f1, f1SIZE,
(addr_t)f2, f2SIZE,
script1, script2,
&script_len,
template,
&template_len);
/* We hide the proper code */
memset (f1, 0, f1SIZE);
memset (f2, 0, f2SIZE);
return new_main (argc, argv);
}
So now I need to write the diff function, which will take the addresses of my two functions and generate a template with the associated scripts.
That is why I would like to compare my two functions byte by byte.
Sorry that my first post was not very understandable!
Thank you
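The template-and-edit-script idea described in the question can be sketched on plain byte arrays. This is only an illustration under strong assumptions: both inputs have the same length, the bytes are ordinary data rather than real machine code, and the names edit_t, diff_bytes and patch_bytes are hypothetical. Obtaining the size of a compiled function and reading its code bytes is not portable C and is not attempted here.

```c
#include <stddef.h>

/* One edit: write `value` at byte `offset` of the template. */
typedef struct {
    size_t offset;
    unsigned char value;
} edit_t;

/* Compares two equal-length byte sequences a and b.
 * Bytes that match are copied to tmpl; positions that differ get a 0
 * placeholder in tmpl and one entry in each of script_a and script_b.
 * Both script arrays must have room for len entries.
 * Returns the number of edits written to each script. */
size_t diff_bytes(const unsigned char *a, const unsigned char *b, size_t len,
                  unsigned char *tmpl, edit_t *script_a, edit_t *script_b)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (a[i] == b[i]) {
            tmpl[i] = a[i];
        } else {
            tmpl[i] = 0;
            script_a[n].offset = i;
            script_a[n].value = a[i];
            script_b[n].offset = i;
            script_b[n].value = b[i];
            n++;
        }
    }
    return n;
}

/* Applies n edits to tmpl, turning it into one concrete byte sequence. */
void patch_bytes(const edit_t *script, size_t n, unsigned char *tmpl)
{
    size_t i;
    for (i = 0; i < n; i++)
        tmpl[script[i].offset] = script[i].value;
}
```

Fed with the bytes from the example tables, {10, 5, 6, 20} and {10, 9, 3, 20}, diff_bytes produces two edits per script (offsets 1 and 2), and patching the template with either script reproduces the corresponding original sequence.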
Do you want to do this at runtime or during authorship?
You can probably instruct your C compiler to produce assembly language output; for example, gcc has the -S option, which will produce output in file.s. Your compiler suite may also have a program like objdump, which can disassemble an object file or an entire executable. However, you generally want to leave optimizations up to a modern compiler rather than do them yourself.
At runtime the & operator can take the address of a function and you can read through it, though you have to be prepared for the possibility of encountering a branch instruction before anything interesting, so you actually have to programmatically "understand" at least a subset of the instruction set. What you will run into when reading function pointers will of course vary all over the place by machine, ABI, compiler, optimization flags, etc.
Put the functions into t1.c and t2.c and use gcc -S to generate assembly output:
gcc -S t1.c
gcc -S t2.c
Now compare t1.s and t2.s.
If you are using Visual Studio, go to
Project Properties -> Configuration -> C/C++ -> Output Files -> Assembler output
or use compiler switches /FA, /FAc, /FAs, /FAcs. Lower-case c means output machine code, s-source code side-by-side with assembly code. And don't forget to disable compiler optimizations.
Having read through some of the answers and the comments there, I'm not sure I fully understand your question, but maybe you're looking for a gcc invocation like the following:
gcc -S -xc - -o -
This tells gcc to input C code from stdin and output assembly to stdout.
If you use a vi-like editor, you can highlight the function body in visual mode and then run the command:
:'<,'>!gcc -S -xc - -o - 2> /dev/null
...and this will replace the function body with assembly (the "stderr > /dev/null" business is to skip errors about #include's).
You could otherwise use this invocation of gcc as part of a pipeline in a script.

Resources