I have this program that I want other processes to be able to call functions on (through Unix sockets). The message protocol is very simple: the function name, a function signature, and a buffer (char *) that holds the parameters.
When a module in my program wants to make a function accessible, it registers the name and signature with the library. The problem I'm facing is physically calling the function once the request comes in. I have looked at RPC and Java RMI-like libraries, but those require that I generate stubs to wrap calls. The system I am working on is very dynamic, and I also have to interface with other people's code that I can't modify.
So basically, a function might look like:
int somefunc(int someparam, double another)
{
return 1234;
}
Now I register it with the library:
// func ptr name signature
REG_FUNCTION(somefunc, "somefunc", "i:id");
When the request comes in, I do some error checking; once the request is valid, I want to call the function. So I have the variables:
void * funcptr = (the requested function);
char * sig = (the function signature);
char * params = (a buffer of function parameters);
//note that the params buffer can hold data types of arbitrary lengths
How can I call the function with the parameters in C?
Thanks!
I don't think this is completely solvable in general, using only C. You don't know the calling convention used by the target function, for instance. There is a risk that you end up "cheating" the compiler, or at least having to second-guess it. What if the compiler decided to build the registered function using arguments passed in registers, perhaps due to some optimization setting (or what if it was built with a different compiler?).
There's also no way in general to express in C that you want to call a function with a given set of arguments, and that the values for the arguments need to be unpacked from a buffer of random bytes.
You could do something horrible like this:
typedef enum { VOID_INT, VOID_INT_DOUBLE, VOID_DOUBLE_INT, ... } Signature;
void do_call(const void *userfunction, const void *args, Signature sig)
{
switch(sig)
{
case VOID_INT:
{
int x = *(const int *) args;
/* Converting an object pointer (void *) to a function pointer is not guaranteed by ISO C */
void (*f)(int) = (void (*)(int)) userfunction;
f(x);
break;
}
case VOID_INT_DOUBLE:
...
}
}
But it's quite clear that this doesn't live on the same continent as scalability. You could of course auto-generate this code, for some reasonable set of return types and argument types. You'd still be a bit out in the cold for casting function pointers to/from void *, but that might be acceptable. You'd still always run the risk of someone handing you a signature that you don't have pre-generated code for, though.
Check out libffi. This library allows you to call a function with a set of parameters specified at run time.
Related
In my C program, I need to pass a callback function to a third-party library. The library calls this callback with a few arguments. However, I need to expose one more variable to this callback. This variable is accessible in the scope of the function from within which I'm setting the callback.
In JS, I'd easily solve this with .bind(), more or less like this (pseudocode):
func my_callback(int a, int b) {
printf("A: %d\n", a);
printf("B: %d\n", b);
}
func new_instance() {
int a = 1;
// setup_callback expects (void(*)(int b)) as an argument
setup_callback(my_callback.bind(null, a));
}
There are two main restrictions:
Variable a cannot be global - it is declared and initialized inside the new_instance procedure.
It has to be a C solution (ideally ANSI), not C++.
After playing with it (function pointers) for a while, I don't seem to be any closer to a workable solution...
You are asking for a closure, and closures are not a feature of the C language. Since you're not in control of the callback-using API, there is no workaround to allow you to get an additional local variable through to the callback.
If you were in control of the API, then it would be possible to make the callback interface accept an additional parameter along with the function, to be stored alongside it and be passed to it as an argument when the function is called. Typically a parameter of type void * is used for such a purpose, as you can convey data of any type that way.
Even then, however, you need to be careful. You could provide the value of a local variable that way, but if you provided a pointer to a local variable then that would only be usable during the lifetime of that variable, which is not longer than the execution of the function containing it (and would be shorter under some circumstances).
C does not support that, period. What can be done instead is to store the "bound" variable in a scope accessible to the callback, and expose it to the code that would have created the closure, were closures available.
This has the drawback of storing state, so it can get messy when the data is shared, e.g. in multithreaded contexts.
Should it suffice? Depends on context, but bear in mind this is no function object by any means, it is merely a stored state.
Can it be improved? That depends, but if the b parameter that the API passes to the callback can be used to identify its caller, then perhaps a container of such callback states can be created to keep them separate from each other.
The dead-simple solution would be something like this. Again, I cannot tell whether this suffices or not.
/* callback.c */
#include "callback.h"
static int first_arg;
/* May or may not be static.
I chose static for simplicity...
*/
static void my_callback(int a, int b)
{
/* whatever */
}
void api_compliant_callback(int b)
{
my_callback(first_arg, b);
}
void set_first_arg(int val)
{
first_arg = val;
}
/* callback.h */
#ifndef CALLBACK_H
#define CALLBACK_H
void set_first_arg(int);
void api_compliant_callback(int);
#endif /* CALLBACK_H */
/* file where the callbacks are registered */
#include "callback.h"
void fkn()
{
/* whatever */
set_first_arg(42);
setup_callback(api_compliant_callback);
/* anything that goes after */
}
I previously asked a question about C functions which take an unspecified number of parameters e.g. void foo() { /* code here */ } and which can be called with an unspecified number of arguments of unspecified type.
When I asked whether it is possible for a function like void foo() { /* code here */ } to get the parameters with which it was called e.g. foo(42, "random") somebody said that:
The only thing you can do is use the calling conventions and knowledge of the architecture you are running on to get the parameters directly from the stack. (source)
My question is:
If I have this function
void foo()
{
// get the parameters here
}
And I call it: foo("dummy1", "dummy2") is it possible to get the 2 parameters inside the foo function directly from the stack?
If yes, how? Is it possible to have access to the full stack? For example if I call a function recursively, is it possible to have access to each function state somehow?
If not, what's the point of functions with an unspecified number of parameters? Is this a bug in the C programming language? In which cases would anyone want foo("dummy1", "dummy2") to compile and run fine for a function whose header is void foo()?
Lots of 'if's:
You stick to one version of a compiler.
One set of compiler options.
Somehow manage to convince your compiler to never pass arguments in registers.
Convince your compiler not to treat two calls to the same function with different arguments, such as f(5, "foo") and f(&i, 3.14), as an error. (This used to be a feature of, for example, the early DeSmet C compilers.)
Then the activation record of a function is predictable (ie you look at the generated assembly and assume it will always be the same): the return address will be there somewhere and the saved bp (base pointer, if your architecture has one), and the sequence of the arguments will be the same. So how would you know what actual parameters were passed? You will have to encode them (their size, offset), presumably in the first argument, sort of what printf does.
Recursion makes no difference: each instance of a recursive call has its own activation record (did I mention you also have to convince your compiler never to optimise tail calls?). But in C, unlike in Pascal, there is no link back to the caller's activation record (ie its local variables), since there are no nested function declarations. Getting access to the full stack, ie all the activation records before the current instance, is tedious, error-prone, and mostly of interest to writers of malicious code who would like to manipulate the return address.
So that's a lot of hassle and assumptions for essentially nothing.
Yes, you can access passed parameters directly via the stack. But no, you can't use an old-style function definition to create a function with a variable number and type of parameters. The following code shows how to access a parameter via the stack pointer. It is totally platform dependent (it uses GCC's local register variable syntax and the x86-64 rsp register), so I have no clue whether it is going to work on your machine, but you can get the idea:
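#include <stdio.h>
/* Needs C17 or earlier: C23 removed old-style definitions and made () mean (void) */
long foo();
int main(void)
{
printf("%ld\n", foo(7L)); /* 7L: the old-style definition expects a long */
return 0;
}
long foo(x)
long x;
{
/* GCC extension: bind a local variable to the rsp register (x86-64 only) */
register char *sp asm("rsp");
printf("rsp + 8 = %p, value = %lx\n", (void *)(sp + 8), *((long *)(sp + 8)));
return *((long *)(sp + 8)) + 12;
}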
Get the stack head pointer (the rsp register on my machine).
Add the offset of the passed parameter to rsp: you get a pointer to long x on the stack.
Dereference the pointer, add 12 (or do whatever you need), and return the value.
The offset is the issue, since it depends on the compiler, the OS, the optimisation level, and who knows what else.
For this example I simply checked it in a debugger, but if it is really important to you, I think you can come up with a "general" solution for your machine.
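If you declare void foo(void), then you will get a compilation error for foo("dummy1", "dummy2"). (With plain void foo(), such a call still compiles in C before C23, but the behavior is undefined if the definition takes no parameters; C23 and C++ treat void foo() as void foo(void) and reject the call.)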
You can declare a function that takes an unspecified number of arguments as follows (for example):
int func(char x,...);
As you can see, at least one named parameter must be declared (a requirement that C23 finally lifted). This is so that, inside the function, you can access all the arguments that follow the last named parameter.
Suppose you have the following call:
short y = 1000;
int sum = func(1,y,5000,"abc");
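Here is how you could implement func and access each of the unspecified arguments by walking the stack by hand. This is not portable: it assumes a 32-bit x86 calling convention in which every argument occupies an int-sized stack slot, and it breaks on ABIs that pass arguments in registers (such as x86-64):
int func(char x,...)
{
/* Non-portable: relies on the arguments sitting in consecutive int-sized stack slots */
short y = (short)((int*)&x+1)[0]; // y = 1000
int z = (int )((int*)&x+2)[0]; // z = 5000
char* s = (char*)((int*)&x+3)[0]; // s[0...2] = "abc"
return x+y+z+s[0]; // 1+1000+5000+'a' = 6098
}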
The problem here, as you can see, is that the type of each argument and the total number of arguments are unknown. So any call to func with an "inappropriate" list of arguments may (and probably will) result in undefined behavior, typically a crash at run time.
Hence, typically, the first argument is a string (const char*) that indicates the type of each of the following arguments, as well as the total number of arguments. In addition, <stdarg.h> provides the standard type va_list and the macros va_start, va_arg and va_end for extracting the unspecified arguments portably.
For example, here is how you can implement a function similar in behavior to printf:
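#include <stdarg.h>
#include <stdio.h>
extern FILE *global_fp; /* assumed to be opened elsewhere */
void log_printf(const char* data,...)
{
static char str[256] = {0};
va_list args;
va_start(args,data);
vsnprintf(str,sizeof(str),data,args);
va_end(args);
fputs(str, global_fp); /* never pass str as a format string: it may contain '%' */
fputs(str, stdout);
}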
P.S.: the example above is not thread-safe, and is only given here as an example...
A few years back I read a blog post (the link is now lost in oblivion), and I have also seen function pointers used heavily when developing firmware patches at my previous organisation.
But for security reasons, and because of the NDA I signed, I couldn't take a copy of the code (and I am proud of not doing so and following best practices).
I have seen the functions named according to a convention like:
filename_func_<version>_<minor_string>(arguments,other arguments)
A similar file is written as part of the patch and flashed into the ROM, and when the function is called, it takes the address of the new definition of the function from the patch location.
Any idea/detail on this?
The system has to be designed to make this work.
There are various aspects that must be coordinated:
There has to be a way to change the function pointers.
The code has to invoke the functions through the pointers.
There are multiple ways to do it in detail, but they end up being variants on a theme.
/* Declaration of function pointer used to invoke the function */
extern int (*filename_func_ptr)(int arg1, char const *arg2);
/* Two versions of the function */
extern int filename_func_1_1(int arg1, char const *arg2);
extern int filename_func_1_2(int arg1, char const *arg2);
/* Definition of function pointer - pointing to version 1.1 of the function */
int (*filename_func_ptr)(int arg1, char const *arg2) = filename_func_1_1;
/* Use of function pointer */
static void some_function(void)
{
printf("%d\n", (*filename_func_ptr)(1, "pi"));
}
Note that you might never have both filename_func_1_1() and filename_func_1_2() declared in the same file. The effect I'm describing is 'as if'.
After patching, by whatever means you choose, the result is as if you had this written:
int (*filename_func_ptr)(int arg1, char const *arg2) = filename_func_1_2;
You could also go into dynamic loading (dlopen() plus dlsym(), etc.) to get the new symbols. That would need a file name for the library to load dynamically, plus a way to convert the function names into pointers to the right functions.
Each patchable function needs to be invoked uniformly through the pointer. This is what allows you to replace the function. Consider whether a dynamic (shared) library is sufficient (because it will probably be simpler). If you're on firmware, you probably don't have that luxury.
You need to consider how you'll handle multiple functions with divergent interfaces; will you use single global variables as shown here, or some generic function pointer type which has to be cast appropriately at each invocation, or a union type, or a structure holding pointers, or ... There are a lot of ways of organizing things.
Is there any way to make this code shorter?
long call_f(int argc, long *argv) {
switch (argc) {
case 0:
return f();
break;
case 1:
return f(argv[0]);
break;
case 2:
return f(argv[0], argv[1]);
break;
case 3:
return f(argv[0], argv[1], argv[2]);
break;
case 4:
return f(argv[0], argv[1], argv[2], argv[3]);
break;
// ...
}
return -1;
}
No, there isn't any good way to do this. See here:
http://c-faq.com/varargs/handoff.html
You can write a macro with token pasting to hide this behavior but that macro will be no simpler than this code, thus it's only worth writing if you have multiple functions like f() where you would otherwise have to duplicate this case statement.
I don't know how you can make your code shorter but I saw this line in your code:
return f();
From the subsequent calls to f, it seems that f is a function that takes a variable number of arguments.
You can read in wikipedia that:
Variadic functions must have at least one named parameter, so, for instance,
char *wrong(...);
is not allowed in C.
Based on that, maybe the return f(); statement is causing you trouble?
There's actually a method to call a function at run time if you know its calling convention and which parameters it receives. This, however, lies outside the scope of the standard C/C++ language.
For x86 assembler:
Assuming the following:
You know how to prepare all the parameters for your function in a solid buffer, laid out exactly as they would be packed on the stack.
Your function doesn't take/return C++ objects by value.
You may use then the following function:
#include <windows.h> // PVOID
#include <malloc.h> // _alloca
int CallAnyFunc(PVOID pfn, PVOID pParams, size_t nSizeParams)
{
// Reserve the space on the stack
// This is equivalent (in some sense) to 'push' all the parameters into the stack.
// NOTE: Don't just subtract the stack pointer, better to call _alloca, because it also takes
// care of ensuring all the consumed memory pages are accessible
_alloca(nSizeParams);
// Obtain the stack top pointer
char* pStack;
_asm {
mov pStack, esp
};
// Copy all the parameters into the stack
// NOTE: Don't use the memcpy function. Because the call to it
// will overwrite the stack (which we're currently building)
for (size_t i = 0; i < nSizeParams; i++)
pStack[i] = ((char*) pParams)[i];
// Call your function
int retVal;
_asm {
call pfn
// Most of the calling conventions return the value of the function (if anything is returned)
// in EAX register
mov retVal, eax
};
return retVal;
}
You may need to adjust this function depending on the calling convention used (for example, whether the caller or the callee cleans up the stack).
I'll post here the same answer as I posted at the duplicated question, but you should take a look at the discussion there:
What is libffi?
Some programs may not know at the time of compilation what arguments are to be passed to a function. For instance, an interpreter may be told at run-time about the number and types of arguments used to call a given function. ‘libffi’ can be used in such programs to provide a bridge from the interpreter program to compiled code.
The ‘libffi’ library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.
FFI stands for Foreign Function Interface. A foreign function interface is the popular name for the interface that allows code written in one language to call code written in another language. The ‘libffi’ library really only provides the lowest, machine dependent layer of a fully featured foreign function interface. A layer must exist above ‘libffi’ that handles type conversions for values passed between the two languages.
‘libffi’ assumes that you have a pointer to the function you wish to call and that you know the number and types of arguments to pass it, as well as the return type of the function.
Historic background
libffi, originally developed by Anthony Green (SO user: anthony-green), was inspired by the Gencall library from Silicon Graphics. Gencall was developed by Gianni Mariani, then employed by SGI, for the purpose of allowing calls to functions by address and creating a call frame for the particular calling convention. Anthony Green refined the idea, extended it to other architectures and calling conventions, and open-sourced libffi.
Calling pow with libffi
#include <stdio.h>
#include <math.h>
#include <ffi.h>
int main()
{
ffi_cif call_interface;
ffi_type *ret_type;
ffi_type *arg_types[2];
/* pow signature */
ret_type = &ffi_type_double;
arg_types[0] = &ffi_type_double;
arg_types[1] = &ffi_type_double;
/* prepare pow function call interface */
if (ffi_prep_cif(&call_interface, FFI_DEFAULT_ABI, 2, ret_type, arg_types) == FFI_OK)
{
void *arg_values[2];
double x, y, z;
/* z stores the return */
z = 0;
/* arg_values elements point to actual arguments */
arg_values[0] = &x;
arg_values[1] = &y;
x = 2;
y = 3;
/* call pow */
ffi_call(&call_interface, FFI_FN(pow), &z, arg_values);
/* 2^3=8 */
printf("%.0f^%.0f=%.0f\n", x, y, z);
}
return 0;
}
I think I can assert that libffi is a portable way to do what I asked, contrary to Antti Haapala's assertion that there isn't such a way. If we can't call libffi a portable technology, given how widely it has been ported across compilers and architectures, and given that its interface complies with the C standard, then we can't call C, or anything else, portable either.
Information and history extracted from:
https://github.com/atgreen/libffi/blob/master/doc/libffi.info
http://en.wikipedia.org/wiki/Libffi
You can check out my answer to:
Best Way to Store a va_list for Later Use in C/C++
It seems to work, yet it scares people. It's not guaranteed to be cross-platform or portable, but it seems to be workable on at least a couple of platforms. ;)
Does f have to accept a variable number of pointers to long? Can you rewrite it to accept an array and a count?
I am trying to do something like the following
#include <dlfcn.h>
enum types {None, Bool, Short, Char, Integer, Double, Long, Ptr};
int main(int argc, char ** args) {
enum types params[10] = {0};
void* triangle = dlopen("./foo.so", RTLD_LAZY);
void * fun = dlsym(triangle, args[1]);
<<pseudo code>>
}
Where pseudo code is something like
fun = {}
for param in params:
if param == None:
fun += void
if param == Bool:
fun += Boolean
if param == Integer:
fun += int
...
returnVal = fun.pop()
funSignature = returnval + " " + funName + "(" + Riffle(fun, ",") + ")"
exec funSignature
Thank you
Actually, you can do nearly all you want. In C language (unlike C++, for example), the functions in shared objects are referenced merely by their names. So, to find--and, what is most important, to call--the proper function, you don't need its full signature. You only need its name! It's both an advantage and disadvantage --but that's the nature of a language you chose.
Let me demonstrate, how it works.
#include <dlfcn.h>
typedef void* (*arbitrary)();
// Do not confuse this with typedef void* (*arbitrary)(void); !!!
// (C23 makes () mean (void), so this trick needs C17 or earlier.)
int main()
{
arbitrary my_function;
// Introduce already loaded functions to runtime linker's space
void* handle = dlopen(0,RTLD_NOW|RTLD_GLOBAL);
// Load the function into our pointer, which doesn't know how many arguments there should be
*(void**)(&my_function) = dlsym(handle,"something");
// Call something via my_function
(void) my_function("I accept a string and an integer!\n",(int)(2*2));
return 0;
}
In fact, you can call any function that way. However, there's one drawback: you actually need to know the return type of your function at compile time. In C89, if you omit void* in that typedef, int is assumed as the return type (implicit int was removed in C99, so modern compilers warn about it or reject it). The thing is that the compiler needs to know the size and kind of the return type to retrieve the returned value properly.
You can work around it with tricks, for example by pre-declaring several function types with different sizes of return types in advance and then selecting which one you actually call. But the easier solution is to require that functions in your plugin always return void* or int, with the actual result being returned via pointers given as arguments.
What you must ensure is that you always call the function with the exact number and types of arguments it's supposed to accept. Pay closer attention to difference between different integer types (your best option would be to explicitly cast arguments to them).
Several commenters reported that the code above is not guaranteed to work for variadic functions (such as printf).
What dlsym() returns is normally a function pointer - disguised as a void *. (If you ask it for the name of a global variable, it will return you a pointer to that global variable, too.)
You then invoke that function just as you might using any other pointer to function:
int (*fun)(int, char *) = (int (*)(int, char *))dlsym(triangle, "function");
(*fun)(1, "abc"); /* Old school - pre-C89 standard, but explicit */
fun(1, "abc"); /* New school - C89/C99 standard, but implicit */
I'm old school; I prefer the explicit notation so that the reader knows that 'fun' is a pointer to a function without needing to see its declaration. With the new school notation, you have to remember to look for a variable 'fun' before trying to find a function called 'fun()'.
Note that you cannot build the function call dynamically as you are doing - or, not in general. To do that requires a lot more work. You have to know ahead of time what the function pointer expects in the way of arguments and what it returns and how to interpret it all.
Systems that manage more dynamic function calls, such as Perl, have special rules about how functions are called and arguments are passed and do not call (arguably cannot call) functions with arbitrary signatures. They can only call functions with signatures that are known about in advance. One mechanism (not used by Perl) is to push the arguments onto a stack, and then call a function that knows how to collect values off the stack. But even if that called function manipulates those values and then calls an arbitrary other function, that called function provides the correct calling sequence for the arbitrary other function.
Reflection in C is hard - very hard. It is not undoable - but it requires infrastructure to support it and discipline to use it, and it can only call functions that support the infrastructure's rules.
The Proper Solution
Assuming you're writing the shared libraries, the best solution I've found to this problem is to strictly define and control which functions are dynamically linked by:
Setting all symbols hidden
for example clang -dynamiclib Person.c -fvisibility=hidden -o libPerson.dylib when compiling with clang
Then using __attribute__((visibility("default"))) (plus extern "C" when compiling as C++) to selectively unhide and export functions
Profit! You know what the function's signature is. You wrote it!
I found this in Apple's Dynamic Library Design Guidelines. These docs also include other solutions to the problem; the one above was just my favorite.
The Answer to your Question
As stated in previous answers, C functions (and C++ functions declared extern "C") aren't mangled, so their symbols simply don't include the full function signature. If you're compiling C++ without extern "C", however, functions are mangled, so you could demangle them to get the full function signature (with a tool like demangler.com or a C++ library). See here for more details on what mangling is.
Generally speaking it's best to use the first option if you're trying to import functions with dlopen.