Implementing function delegates in C with unions and function pointers - c

I'd like to be able to generically pass a function to a function in C. I've used C for a few years, and I'm aware of the barriers to implementing proper closures and higher-order functions. It's almost insurmountable.
I scoured StackOverflow to see what other sources had to say on the matter:
higher-order-functions-in-c
anonymous-functions-using-gcc-statement-expressions
is-there-a-way-to-do-currying-in-c
functional-programming-currying-in-c-issue-with-types
emulating-partial-function-application-in-c
fake-anonymous-functions-in-c
functional-programming-in-c-with-macro-higher-order-function-generators
higher-order-functions-in-c-as-a-syntactic-sugar-with-minimal-effort
...and none had a silver-bullet generic answer, outside of either using varargs or assembly. I have no quarrel with assembly, but if I can efficiently implement a feature in the host language, I usually attempt to.
Since I can't have HOF easily...
I'd love higher-order functions, but I'll settle for delegates in a pinch. I suspect that with something like the code below I could get a workable delegate implementation in C.
An implementation like this comes to mind:
#include <stdint.h>

enum FUN_TYPES {
    GENERIC,
    VOID_FUN,
    INT_FUN,
    UINT32_FUN,
    FLOAT_FUN,
};
typedef struct delegate {
    uint32_t fun_type; /* one of enum FUN_TYPES: which union member is valid */
    union function {
        int (*int_fun)(int);
        uint32_t (*uint_fun)(uint32_t);
        float (*float_fun)(float);
        /* ... etc. until all basic types/structs in the
           program are accounted for. */
    } function;
} delegate;
Usage Example:
void mapint(struct delegate f, int arr[20]) {
    int i = 0;
    if(f.fun_type == INT_FUN) {
        for(; i < 20; i++) {
            arr[i] = f.function.int_fun(arr[i]);
        }
    }
}
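A minimal, self-contained sketch of the tagged-union idea above, reduced to two function types. The names `make_int_delegate`, `map_int` and `square` are hypothetical helpers introduced only for illustration. The tag is checked once before dispatch, and a mismatch is reported through the return value instead of calling through the wrong union member.

```c
#include <stddef.h>

enum fun_type { INT_FUN, FLOAT_FUN };

struct delegate {
    enum fun_type type;             /* tag: which union member is valid */
    union {
        int   (*int_fun)(int);
        float (*float_fun)(float);
    } function;
};

/* Hypothetical constructor: sets the tag and the pointer together. */
static struct delegate make_int_delegate(int (*fn)(int))
{
    struct delegate d;
    d.type = INT_FUN;
    d.function.int_fun = fn;
    return d;
}

/* Applies d to each element; returns 0 on success, -1 on a tag mismatch. */
static int map_int(struct delegate d, int *arr, size_t n)
{
    if (d.type != INT_FUN)
        return -1;
    for (size_t i = 0; i < n; i++)
        arr[i] = d.function.int_fun(arr[i]);
    return 0;
}

/* Example target function. */
static int square(int x) { return x * x; }
```

Setting the tag in a constructor keeps the tag and the pointer from drifting apart, which is the main hazard of filling in the struct by hand.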
Unfortunately, there are some obvious downsides to this approach to delegates:
No type checks, save those which you do yourself by checking the 'fun_type' field.
Type checks introduce extra conditionals into your code, making it messier and more branchy than before.
The number of (safe) possible permutations of the function is limited by the size of the 'fun_type' variable.
The enum and list of function pointer definitions would have to be machine generated. Anything else would border on insanity, save for trivial cases.
Going through ordinary C, sadly, is not as efficient as, say, a mov -> call sequence, which could probably be done in assembly (with some difficulty).
Does anyone know of a better way to do something like delegates in C?
Note: The more portable and efficient, the better
Also, Note: I've heard of Don Clugston's very fast delegates for C++. However, I'm not interested in C++ solutions--just C.

You could add a void* argument to all your functions to allow for bound arguments, delegation, and the like. Unfortunately, you'd need to write wrappers for anything that dealt with external functions and function pointers.
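As a rough sketch of that suggestion, assuming every delegate target is written (or wrapped) to take its bound state as a leading `void *`: the names `int_delegate`, `map_ints`, `add_n` and `abs_wrapper` are hypothetical and only illustrate the shape. The wrapper around `abs()` shows the extra work needed for functions you did not write.

```c
#include <stddef.h>
#include <stdlib.h>

/* Every delegate target takes its bound state as a void * first argument. */
typedef int (*int_fn)(void *ctx, int x);

struct int_delegate {
    int_fn fn;
    void  *ctx;   /* bound argument(s); NULL if the target needs none */
};

/* Target with one bound argument: ctx points at the int to add. */
static int add_n(void *ctx, int x)
{
    const int *n = ctx;
    return x + *n;
}

/* Wrapper that adapts an existing plain function, here abs(). */
static int abs_wrapper(void *ctx, int x)
{
    (void)ctx;    /* no bound state needed */
    return abs(x);
}

/* Generic consumer: knows nothing about what fn does or what ctx holds. */
static void map_ints(struct int_delegate d, int *arr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        arr[i] = d.fn(d.ctx, arr[i]);
}
```

The caller must keep whatever `ctx` points at alive for as long as the delegate is used, since nothing here copies or owns it.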

There are two questions where I have investigated techniques for something similar providing slightly different versions of the basic technique. The downside of this is that you lose compile time checks since the argument lists are built at run time.
The first is my answer to the question of Is there a way to do currying in C. This approach uses a proxy function to invoke a function pointer and the arguments for the function.
The second is my answer to the question C Pass arguments as void-pointer-list to imported function from LoadLibrary().
The basic idea is to have a memory area that is then used to build an argument list and to then push that memory area onto the stack as part of the call to the function. The result is that the called function sees the memory area as a list of parameters.
In C the key is to define a struct which contains an array which is then used as the memory area. When the called function is invoked, the entire struct is passed by value which means that the arguments set into the array are then pushed onto the stack so that the called function sees not a struct value but rather a list of arguments.
With the answer to the curry question, the memory area contains a function pointer as well as one or more arguments, a kind of closure. The memory area is then handed to a proxy function which actually invokes the function with the arguments in the closure.
This works on calling conventions where the caller pushes the arguments onto the stack, calls the function and, when the function returns, cleans up the stack itself because it knows what was actually pushed (for example cdecl on 32-bit x86). It is not portable to ABIs that pass arguments in registers.
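The stack-layout trick itself depends on the ABI, so the sketch below shows only the portable part of the idea: a memory area holding a function pointer plus an already-supplied argument, handed to a proxy that performs the real call. It is a simplification and not the code from the linked answers; `closure`, `curry`, `invoke` and `subtract` are hypothetical names.

```c
/* The "memory area": a function pointer plus the argument bound so far. */
struct closure {
    int (*fn)(int, int);
    int bound;            /* first argument, fixed when the closure is made */
};

/* Binds the first argument of a two-argument function. */
static struct closure curry(int (*fn)(int, int), int first)
{
    struct closure c = { fn, first };
    return c;
}

/* Proxy: supplies the remaining argument and makes the actual call. */
static int invoke(struct closure c, int second)
{
    return c.fn(c.bound, second);
}

/* Example target function. */
static int subtract(int a, int b) { return a - b; }
```

Because the argument types are fixed in the struct, this version keeps compile-time checking, at the cost of needing one closure type per signature.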

Related

C callbacks - optional argument?

Hey I have implemented some callbacks in my C program.
typedef void (*server_end_callback_t)(void *callbackArg);
then I have variable inside structure to store this callback
server->server_end_callback = on_server_end;
What I have noticed is that I can pass in an on_server_end callback implementation that omits void *callbackArg, and the code works correctly (no errors).
Is it correct to omit arguments such as void * when implementing callback functions whose prototypes take such arguments?
void on_server_end(void) {
// some code goes here
}
I believe it is undefined behavior from the C point of view, but it works because of the calling convention you are using.
For example, the System V AMD64 ABI states that the first six integer or pointer arguments are passed to the called function in CPU registers, not on the stack. So neither caller nor callee needs any clean-up for the first six arguments, and it works fine.
For more info please refer to Wikipedia.
The code works correctly because of the convention for passing arguments. The caller knows that the callee expects some arguments, exactly one in this case. So it prepares the argument(s) (either in a register or on the stack, depending on the ABI of your platform). Then the callee uses those parameters or not. After the callee returns, the caller cleans up the stack if necessary. That's the mystery.
However, you should not abuse this specific behaviour by passing an incompatible function. It is good practice to always compile your code with the options -W -Wall -Werror (clang/gcc and compatible). Enabling these options would give you a compilation error.
C allows a certain amount of playing fast and loose with function arguments. So
void (*fptr) ();
means "a pointer to a function which takes an unspecified number of arguments" (before C23; in C23 empty parentheses mean no arguments). However, this exists for backwards compatibility, and it's not wise to use it in new C code. The other way round
void callback(void *ptr)
{
    /* don't use the void * */
}
/* in another scope */
void (*fptr)() = callback;
(*fptr)(); /* call with no arguments */
also tends to work in practice, as long as you don't use the void *, but it is not guaranteed: calling a function with the wrong number of arguments is undefined behavior (on a modern machine the calling convention is to pass the first arguments in registers, so you just get a garbage register, and it will work). Again, it is a very bad idea to rely on it.
You can pass a void *, which you then cast to a structure of appropriate type containing as many arguments as you wish. That is a good idea and a sensible use of C's flexibility.
Is it correct to omit arguments such as void * when implementing callback functions whose prototypes take such arguments?
No, it is not. A function with a given declaration is not compatible with a function of a different declaration. This rule applies to pointers to functions too.
So if you have a function such as pthread_create(..., my_callback, ...); and it expects you to pass a function pointer of type void* (*) (void*), then you cannot pass a function pointer of a different format. This invokes undefined behavior and compilers may generate incorrect code.
That being said, function pointer compatibility is a common non-standard extension on many systems. If the calling convention of the system is specified in a way that the function format doesn't matter, and the specific compiler port supports it, then such code might work just fine.
Such code is however not portable and not standard. It is best to avoid it whenever possible.
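A portable alternative, sketched under the assumption that the callback simply has no use for its argument: keep the signature identical to the typedef and discard the parameter explicitly. The `struct server` layout, `callback_arg`, `server_finish` and `end_count` are hypothetical and only stand in for the asker's code.

```c
typedef void (*server_end_callback_t)(void *callbackArg);

/* Hypothetical server structure holding the callback and its argument. */
struct server {
    server_end_callback_t server_end_callback;
    void *callback_arg;
};

static int end_count = 0;   /* lets a caller observe that the callback ran */

/* Signature matches the typedef exactly; the argument is simply ignored. */
static void on_server_end(void *callbackArg)
{
    (void)callbackArg;      /* silences unused-parameter warnings */
    end_count++;
}

/* Invokes the callback, if one is registered, with the stored argument. */
static void server_finish(struct server *s)
{
    if (s->server_end_callback)
        s->server_end_callback(s->callback_arg);
}
```

The `(void)callbackArg;` cast costs nothing at run time and keeps the code valid under -Wall -Wextra -Werror.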

Error: Expected a “;” Visual Studio 2013 [duplicate]

This is not a lambda function question, I know that I can assign a lambda to a variable.
What's the point of allowing us to declare, but not define a function inside code?
For example:
#include <iostream>
int main()
{
// This is illegal
// int one(int bar) { return 13 + bar; }
// This is legal, but why would I want this?
int two(int bar);
// This gets the job done but man it's complicated
class three{
int m_iBar;
public:
three(int bar):m_iBar(13 + bar){}
operator int(){return m_iBar;}
};
std::cout << three(42) << '\n';
return 0;
}
So what I want to know is why would C++ allow two which seems useless, and three which seems far more complicated, but disallow one?
EDIT:
From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.
It is not obvious why one is not allowed; nested functions were proposed a long time ago in N0295 which says:
We discuss the introduction of nested functions into C++. Nested
functions are well understood and their introduction requires little
effort from either compiler vendors, programmers, or the committee.
Nested functions offer significant advantages, [...]
Obviously this proposal was rejected, but since we don't have meeting minutes available online for 1993 we don't have a possible source for the rationale for this rejection.
In fact this proposal is noted in Lambda expressions and closures for C++ as a possible alternative:
One article [Bre88] and proposal N0295 to the C++ committee [SH93]
suggest adding nested functions to C++. Nested functions are similar
to lambda expressions, but are defined as statements within a function
body, and the resulting closure cannot be used unless that function is
active. These proposals also do not include adding a new type for each
lambda expression, but instead implementing them more like normal
functions, including allowing a special kind of function pointer to
refer to them. Both of these proposals predate the addition of
templates to C++, and so do not mention the use of nested functions in
combination with generic algorithms. Also, these proposals have no way
to copy local variables into a closure, and so the nested functions
they produce are completely unusable outside their enclosing function
Considering we do now have lambdas we are unlikely to see nested functions since, as the paper outlines, they are alternatives for the same problem and nested functions have several limitations relative to lambdas.
As for this part of your question:
// This is legal, but why would I want this?
int two(int bar);
There are cases where this would be a useful way to call the function you want. The draft C++ standard section 3.4.1 [basic.lookup.unqual] gives us one interesting example:
namespace NS {
class T { };
void f(T);
void g(T, int);
}
NS::T parm;
void g(NS::T, float);
int main() {
f(parm); // OK: calls NS::f
extern void g(NS::T, float);
g(parm, 1); // OK: calls g(NS::T, float)
}
Well, the answer is "historical reasons". In C you could have function declarations at block scope, and the C++ designers did not see the benefit in removing that option.
An example usage would be:
#include <iostream>
int main()
{
int func();
func();
}
int func()
{
    std::cout << "Hello\n";
    return 0; // flowing off the end of a non-void function is undefined behaviour
}
IMO this is a bad idea because it is easy to make a mistake by providing a declaration that does not match the function's real definition, leading to undefined behaviour which will not be diagnosed by the compiler.
In the example you give, int two(int) is being declared as an external function, with that declaration only being valid within the scope of the main function.
That's reasonable if you only wish to make the name two available within main() so as to avoid polluting the global namespace within the current compilation unit.
Example in response to comments:
main.cpp:
int main() {
int foo();
return foo();
}
foo.cpp:
int foo() {
return 0;
}
No header files are needed. Compile and link with
c++ main.cpp foo.cpp
It'll compile and run, and the program will return 0 as expected.
You can do these things, largely because they're actually not all that difficult to do.
From the viewpoint of the compiler, having a function declaration inside another function is pretty trivial to implement. The compiler needs a mechanism to allow declarations inside of functions to handle other declarations (e.g., int x;) inside a function anyway.
It will typically have a general mechanism for parsing a declaration. For the guy writing the compiler, it doesn't really matter at all whether that mechanism is invoked when parsing code inside or outside of another function--it's just a declaration, so when it sees enough to know that what's there is a declaration, it invokes the part of the compiler that deals with declarations.
In fact, prohibiting these particular declarations inside a function would probably add extra complexity, because the compiler would then need an entirely gratuitous check to see if it's already looking at code inside a function definition and based on that decide whether to allow or prohibit this particular declaration.
That leaves the question of how a nested function is different. A nested function is different because of how it affects code generation. In languages that allow nested functions (e.g., Pascal) you normally expect that the code in the nested function has direct access to the variables of the function in which it's nested. For example:
int foo() {
int x;
int bar() {
x = 1; // Should assign to the `x` defined in `foo`.
}
}
Without local functions, the code to access local variables is fairly simple. In a typical implementation, when execution enters the function, some block of space for local variables is allocated on the stack. All the local variables are allocated in that single block, and each variable is treated as simply an offset from the beginning (or end) of the block. For example, let's consider a function something like this:
int f() {
int x;
int y;
x = 1;
y = x;
return y;
}
A compiler (assuming it didn't optimize away the extra code) might generate code for this roughly equivalent to this:
stack_pointer -= 2 * sizeof(int); // allocate space for local variables
x_offset = 0;
y_offset = sizeof(int);
stack_pointer[x_offset] = 1; // x = 1;
stack_pointer[y_offset] = stack_pointer[x_offset]; // y = x;
return_location = stack_pointer[y_offset]; // return y;
stack_pointer += 2 * sizeof(int);
In particular, it has one location pointing to the beginning of the block of local variables, and all access to the local variables is as offsets from that location.
With nested functions, that's no longer the case--instead, a function has access not only to its own local variables, but to the variables local to all the functions in which it's nested. Instead of just having one "stack_pointer" from which it computes an offset, it needs to walk back up the stack to find the stack_pointers local to the functions in which it's nested.
Now, in a trivial case that's not all that terrible either--if bar is nested inside of foo, then bar can just look up the stack at the previous stack pointer to access foo's variables. Right?
Wrong! Well, there are cases where this can be true, but it's not necessarily the case. In particular, bar could be recursive, in which case a given invocation of bar might have to look some nearly arbitrary number of levels back up the stack to find the variables of the surrounding function. Generally speaking, you need to do one of two things: either you put some extra data on the stack, so it can search back up the stack at run-time to find its surrounding function's stack frame, or else you effectively pass a pointer to the surrounding function's stack frame as a hidden parameter to the nested function. Oh, but there's not necessarily just one surrounding function either--if you can nest functions, you can probably nest them (more or less) arbitrarily deep, so you need to be ready to pass an arbitrary number of hidden parameters. That means you typically end up with something like a linked list of stack frames to surrounding functions, and access to variables of surrounding functions is done by walking that linked list to find its stack pointer, then accessing an offset from that stack pointer.
That, however, means that access to a "local" variable may not be a trivial matter. Finding the correct stack frame to access the variable can be non-trivial, so access to variables of surrounding functions is also (at least usually) slower than access to truly local variables. And, of course, the compiler has to generate code to find the right stack frames, access variables via any of an arbitrary number of stack frames, and so on.
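The hidden-parameter approach can be imitated by hand in standard C, which makes the extra bookkeeping visible. This is only a sketch of the idea, not what any particular compiler emits; `foo_frame` and the `depth` parameter are hypothetical and exist to show that a recursive inner function still reaches the same outer variables.

```c
/* Variables of foo that the "nested" function needs to reach. */
struct foo_frame {
    int x;
};

/* Stand-in for a nested bar(): the enclosing frame arrives as an explicit
   parameter, where a compiler with nested functions would pass it hidden. */
static void bar(struct foo_frame *outer, int depth)
{
    if (depth > 0) {
        bar(outer, depth - 1);  /* recursion: same outer frame at every level */
        return;
    }
    outer->x = 1;               /* assigns to the x that belongs to foo */
}

static int foo(void)
{
    struct foo_frame frame = { 0 };
    bar(&frame, 3);
    return frame.x;
}
```

With deeper nesting, each frame struct would also carry a pointer to the frame of its own enclosing function, giving the linked list described above.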
This is the complexity that C was avoiding by prohibiting nested functions. Now, it's certainly true that a current C++ compiler is a rather different sort of beast from a 1970s-vintage C compiler. With things like multiple, virtual inheritance, a C++ compiler has to deal with things of this same general nature in any case (i.e., finding the location of a base-class variable in such cases can be non-trivial as well). On a percentage basis, supporting nested functions wouldn't add much complexity to a current C++ compiler (and gcc already supports them as an extension when compiling C).
At the same time, it rarely adds much utility either. In particular, if you want to define something that acts like a function inside of a function, you can use a lambda expression. What this actually creates is an object (i.e., an instance of some class) that overloads the function call operator (operator()) but it still gives function-like capabilities. It makes capturing (or not) data from the surrounding context more explicit though, which allows it to use existing mechanisms rather than inventing a whole new mechanism and set of rules for its use.
Bottom line: even though it might initially seem like nested declarations are hard and nested functions are trivial, more or less the opposite is true: nested functions are actually much more complex to support than nested declarations.
The first one is a function definition, and it is not allowed. The obvious question is what use there would be in putting a definition of a function inside another function.
But the other two are just declarations. Imagine you need to use the int two(int bar); function inside main(), but it is defined below main(). The function declaration inside main() lets you call that function before its definition appears.
The same applies to the third. A class declared inside the function can be used inside that function without providing an appropriate header or reference.
int main()
{
// This is legal, but why would I want this?
int two(int bar);
//Call two
int x = two(7);
class three {
int m_iBar;
public:
three(int bar):m_iBar(13 + bar) {}
operator int() {return m_iBar;}
};
//Use class (three has no default constructor, so an argument is required)
three threeObj(42);
return 0;
}
This language feature was inherited from C, where it served some purpose in C's early days (function declaration scoping maybe?).
I don't know if this feature is used much by modern C programmers and I sincerely doubt it.
So, to sum up the answer:
there is no purpose for this feature in modern C++ (that I know of, at least), it is here because of C++-to-C backward compatibility (I suppose :) ).
Thanks to the comment below:
Function prototype is scoped to the function it is declared in, so one can have a tidier global namespace - by referring to external functions/symbols without #include.
Actually, there is one use case which is conceivably useful. If you want to make sure that a certain function is called (and your code compiles), no matter what the surrounding code declares, you can open your own block and declare the function prototype in it. (The inspiration is originally from Johannes Schaub, https://stackoverflow.com/a/929902/3150802, via TeKa, https://stackoverflow.com/a/8821992/3150802).
This may be particularly useful if you have to include headers which you don't control, or if you have a multi-line macro which may be used in unknown code.
The key is that a local declaration supersedes previous declarations in the innermost enclosing block. While that can introduce subtle bugs (and, I think, is forbidden in C#), it can be used consciously. Consider:
// somebody's header
void f();
// your code
{ int i;
int f(); // your different f()!
i = f();
// ...
}
Linking may be interesting because chances are the headers belong to a library, but I guess you can adjust the linker arguments so that f() is resolved to your function by the time that library is considered. Or you tell it to ignore duplicate symbols. Or you don't link against the library.
This is not an answer to the OP question, but rather a reply to several comments.
I disagree with these points in the comments and answers: 1 that nested declarations are allegedly harmless, and 2 that nested definitions are useless.
1 The prime counterexample for the alleged harmlessness of nested function declarations is the infamous Most Vexing Parse. IMO the spread of confusion caused by it is enough to warrant an extra rule forbidding nested declarations.
2 The 1st counterexample to the alleged uselessness of nested function definitions is the frequent need to perform the same operation in several places inside exactly one function. There is an obvious workaround for this:
private:
inline void bar(int abc)
{
// Do the repeating operation
}
public:
void foo()
{
int a, b, c;
bar(a);
bar(b);
bar(c);
}
However, this solution often enough contaminates the class definition with numerous private functions, each of which is used in exactly one caller. A nested function definition would be much cleaner.
Specifically answering this question:
From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.
Because consider this code:
int main()
{
int foo() {
// Do something
return 0;
}
return 0;
}
Questions for language designers:
1. Should foo() be available to other functions?
2. If so, what should be its name? int main(void)::foo()?
(Note that 2 would not be possible in C, the originator of C++)
If we want a local function, we already have a way - make it a static member of a locally-defined class. So should we add another syntactic method of achieving the same result? Why do that? Wouldn't it increase the maintenance burden of C++ compiler developers?
And so on...
Just wanted to point out that GCC allows you to define functions inside functions as an extension (nested functions, supported when compiling C). Read more about it here. Also, with the introduction of lambdas to C++, this question is a bit obsolete now.
The ability to declare function headers inside other functions, I found useful in the following case:
#include <cstdlib> // std::abs
void do_something(int&);
int main() {
int my_number = 10 * 10 * 10;
do_something(my_number);
return 0;
}
void do_something(int& num) {
void do_something_helper(int&); // declare helper here
do_something_helper(num);
// Do something else
}
void do_something_helper(int& num) {
num += std::abs(num - 1337);
}
What do we have here? Basically, you have a function that is supposed to be called from main, so what you do is that you forward declare it like normal. But then you realize, this function also needs another function to help it with what it's doing. So rather than declaring that helper function above main, you declare it inside the function that needs it and then it can be called from that function and that function only.
My point is, declaring function headers inside functions can be an indirect method of function encapsulation, which allows a function to hide some parts of what it's doing by delegating to some other function that only it is aware of, almost giving an illusion of a nested function.
Nested function declarations are allowed probably for
1. Forward references
2. To be able to declare a pointer to function(s) and pass around other function(s) in a limited scope.
Nested function definitions are not allowed probably due to issues like
1. Optimization
2. Recursion (enclosing and nested defined function(s))
3. Re-entrancy
4. Concurrency and other multithread access issues.
From my limited understanding :)

Run-time parameters in gcc (inverse va_args/varargs)

I'm trying to make some improvements to an interpreter for microcontrollers that I'm working on. For executing built-in functions I currently have something like this (albeit a bit faster):
function executeBuiltin(functionName, functionArgs) {
if (functionName=="foo") foo(getIntFromArg(functionArgs[0]));
if (functionName=="bar") bar(getIntFromArg(functionArgs[0]),getBoolFromArg(functionArgs[1]),getFloatFromArg(functionArgs[2]));
if (functionName=="baz") baz();
...
}
But it is for an embedded device (ARM) with very limited resources, and I need to cut down on the code size drastically. What I'd like to do is to have a general-purpose function for calling other functions with different arguments - something like this:
function executeBuiltin(functionName, functionArgs) {
functionData = fast_lookup(functionName);
call_with_args(functionData.functionPointer, functionData.functionArgumentTypes, functionArgs);
}
So I want to be able to call a standard C function and pass it whatever arguments it needs (which could all be of different types). For this, I need a call_with_args function.
I want to avoid re-writing every function to take argc+argv. Ideally each function that was called would be an entirely standard C function.
There's a discussion about this here - but has anything changed since 1993 when that post was written? Especially as I'm running on ARM where arguments are in registers rather than on the stack. Even if it's not in standard C, is there anything GCC specific that can be done?
UPDATE: It seems that despite behaviour being 'undefined' according to the spec, it looks like because of the way C calls work, you can pass more arguments to a function than it is expecting and everything will be fine, so you can unpack all the arguments into an array of uint32s, and can then just pass each uint32 to the function.
That makes writing 'nice' code for calls much easier, and it appears to work pretty well (on 32 bit platforms). The only problem seems to be when passing 64 bit numbers and compiling for 64bit x86 as it seems to do something particularly strange in that case.
Would it be possible to do at compile time with macros?
Something along the lines of:
https://www.redhat.com/archives/libvir-list/2014-March/msg00730.html
If runtime dispatch is required, perhaps GCC's __builtin_apply_args() could be leveraged.
From this document, section 5.5, Parameter Passing, it seems that parameters are passed both in registers and on the stack, as on most of today's platforms.
With "non-standard C" I am thinking of packing the parameters and calling the function as described in the documentation, using some asm(). However, you need minimal information about the signature of the function being called anyway (I mean, how many bits each argument occupies).
From this point of view I would prefer to prepare an array of function names, an array of function pointers and an array of enumerated function signatures (in the number of bits of each argument... you don't need to differentiate void* from char* for example), and then dispatch with a switch/case on the signature. So I have reported two answers here.
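A minimal sketch of the table-plus-switch approach, assuming all arguments have already been unpacked into 32-bit integers. Every name here (`signature`, `builtin`, `execute_builtin`, and the three example builtins) is hypothetical. There is one `case` per distinct argument layout rather than per function, which is where the code-size saving would come from.

```c
#include <stdint.h>
#include <string.h>

/* One tag per distinct argument layout, not per function. */
enum signature { SIG_VOID, SIG_INT, SIG_INT_INT };

struct builtin {
    const char    *name;
    void         (*fn)(void);   /* stored generically, cast back before use */
    enum signature sig;
};

static int32_t last;            /* records what the example builtins did */
static void baz(void)                 { last = -1; }
static void foo(int32_t a)            { last = a; }
static void bar(int32_t a, int32_t b) { last = a + b; }

static const struct builtin builtins[] = {
    { "baz", (void (*)(void))baz, SIG_VOID },
    { "foo", (void (*)(void))foo, SIG_INT },
    { "bar", (void (*)(void))bar, SIG_INT_INT },
};

/* Returns 0 if the builtin was found and called, -1 otherwise. */
static int execute_builtin(const char *name, const int32_t *args)
{
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        const struct builtin *b = &builtins[i];
        if (strcmp(b->name, name) != 0)
            continue;
        switch (b->sig) {
        case SIG_VOID:    b->fn(); break;
        case SIG_INT:     ((void (*)(int32_t))b->fn)(args[0]); break;
        case SIG_INT_INT: ((void (*)(int32_t, int32_t))b->fn)(args[0], args[1]); break;
        }
        return 0;
    }
    return -1;
}
```

Converting a function pointer to another function pointer type and back before calling is well defined in C, so each call goes through the function's true type and the builtins stay ordinary C functions.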
You can do a very simple serialization to pass arbitrary arguments. Create an array and memcpy sizeof(arg) bytes into it for each passed argument.
Or you can create structs for function arguments.
Every function takes a char* or a void*. Then you pass either a pointer to a struct with that functions parameters, or you define a set of macros or functions to encode and decode arbitrary data from an array and pass the pointer to that array.
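The memcpy-based serialization could look roughly like this. `PACK`, `UNPACK` and the example `bar` are hypothetical names, and the sketch assumes that the caller and the callee agree on argument order and types, since nothing in the buffer records them.

```c
#include <string.h>

/* Appends sizeof(arg) bytes of arg to buf at *pos and advances *pos. */
#define PACK(buf, pos, arg) \
    (memcpy((buf) + *(pos), &(arg), sizeof(arg)), *(pos) += sizeof(arg))

/* Reads one value of var's type from buf at *pos and advances *pos. */
#define UNPACK(buf, pos, var) \
    (memcpy(&(var), (buf) + *(pos), sizeof(var)), *(pos) += sizeof(var))

/* A builtin that decodes its own arguments from the byte buffer. */
static float bar(const unsigned char *args)
{
    size_t pos = 0;
    int    count;
    float  scale;
    UNPACK(args, &pos, count);
    UNPACK(args, &pos, scale);
    return (float)count * scale;
}
```

Using memcpy instead of pointer casts avoids alignment problems, which matters on ARM cores that fault on unaligned access.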

Is the only use of function pointers to implement callbacks?

I have come across the fact that function pointers can be used to implement callbacks. Is there any other usage of function pointers? Is there any other situation that function pointers proved to be useful?
How about sorting? Pass in a function pointer to compare any two elements.
How about filtering? Pass in a function pointer to decide whether an input element should be contained in the output of a filter.
How about transformations? Pass in a function pointer to convert an input element to an output element.
These are all collection-based uses, but they're very useful. (The broad equivalent of function pointers in .NET is delegates, and they're the basis of LINQ, which allows very simple querying, transformations, grouping etc.)
Anywhere you want to be able to abstract out the idea of "a single piece of behaviour", writing a generic function which doesn't need to know the details of that behaviour, a function pointer could be useful.
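In C, sorting and filtering with function pointers might look like the sketch below. `qsort` is the standard library function; `compare_ints`, `int_predicate`, `is_even` and `filter_ints` are hypothetical names used for illustration.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Predicate type for filtering. */
typedef int (*int_predicate)(int);

static int is_even(int x) { return x % 2 == 0; }

/* Copies elements accepted by keep() into out; returns how many were kept. */
static size_t filter_ints(const int *in, size_t n, int *out, int_predicate keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (keep(in[i]))
            out[kept++] = in[i];
    return kept;
}
```

Neither `qsort` nor `filter_ints` knows anything about the behaviour passed in, which is exactly the "single piece of behaviour" abstraction.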
In addition to what Jon wrote, function pointers in C can be used to implement OO programming style (e.g. polymorphism).
Function pointers (or their typed and more advanced equivalents) are a helpful feature when implementing inversion of control related patterns. All examples mentioned are applications of IoC principle (the sorting algorithm does not control the used predicate, the call to an object method is delayed until run-time etc)
Regards,
Paul
A function pointer is used in any situation where the function to be called is determined at runtime rather than compile-time. This includes callbacks, but may also be used as a switch-case alternative for example, and to adapt the behaviour of a function by passing a function pointer that defines that behaviour - this is how the standard library qsort() function works for example, enabling it to sort any kind of object.
I have used them in particular to implement a command line parser that evaluates C expressions entered as strings at run-time, and can include function calls. This uses a symbol table to lookup the pointer to the function so it can be called on demand from the operator.
All you might ever wish to know on the subject can be found at The Function Pointer Tutorials
In the end function pointers are just one of those rarely used tools you keep in your bag. If you understand them, when the situation arises where it may provide a solution, you will hopefully recognise it.
It is the only way you can implement Higher Order Functions in C.
As others have mentioned, I've found that one of the most significant uses of function pointers (other than for callbacks) is to enable the construction of generic data structures.
Say you want to construct a hashmap with arbitrary keys and values. One way to do that is declare both void *key and void *value and then pass in two function pointers during the initialization phase: int (*hashcode)(void*) and int (*equals)(void*, void*).
This gives you the ability to build a hashmap that can take basically anything that you can write the above two functions for. In my case, the key was a fixed size character buffer and the value was a pointer to a struct.
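A small sketch of such a map, assuming a fixed-size table with linear probing and no deletion. It is not the answerer's implementation: the names are hypothetical, and the callbacks use `unsigned` and `const void *` rather than the exact signatures quoted above.

```c
#include <stddef.h>
#include <string.h>

#define MAP_SLOTS 16

struct map {
    unsigned  (*hashcode)(const void *key);
    int       (*equals)(const void *a, const void *b);
    const void *keys[MAP_SLOTS];    /* NULL marks an empty slot */
    void       *values[MAP_SLOTS];
};

/* Linear probing; returns 0 on success, -1 if the table is full. */
static int map_put(struct map *m, const void *key, void *value)
{
    unsigned start = m->hashcode(key) % MAP_SLOTS;
    for (unsigned i = 0; i < MAP_SLOTS; i++) {
        unsigned slot = (start + i) % MAP_SLOTS;
        if (m->keys[slot] == NULL || m->equals(m->keys[slot], key)) {
            m->keys[slot] = key;
            m->values[slot] = value;
            return 0;
        }
    }
    return -1;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *map_get(const struct map *m, const void *key)
{
    unsigned start = m->hashcode(key) % MAP_SLOTS;
    for (unsigned i = 0; i < MAP_SLOTS; i++) {
        unsigned slot = (start + i) % MAP_SLOTS;
        if (m->keys[slot] == NULL)
            return NULL;
        if (m->equals(m->keys[slot], key))
            return m->values[slot];
    }
    return NULL;
}

/* Key-specific behaviour for NUL-terminated strings. */
static unsigned str_hash(const void *key)
{
    unsigned h = 5381;
    for (const unsigned char *p = key; *p; p++)
        h = h * 33 + *p;
    return h;
}

static int str_equals(const void *a, const void *b)
{
    return strcmp(a, b) == 0;
}
```

Supporting a different key type means supplying a different pair of functions; `map_put` and `map_get` stay unchanged.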
It is also used in the following
Making jump tables (like vector tables or ISRs)
Making the function abstract
Developing finite state machines (states, actions and triggered events can easily be implemented using function pointers, and the design tends to be simpler and more readable that way)
Event-driven frameworks (GUI toolkits; GTK is an example)
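As one illustration of the jump-table and state-machine items, here is a sketch of a transition table indexed by state and event. The states, events and handler names are all hypothetical.

```c
enum state { IDLE, RUNNING, STOPPED, STATE_COUNT };
enum event { EV_START, EV_STOP, EVENT_COUNT };

/* A handler performs the action and returns the next state. */
typedef enum state (*handler)(enum state current);

static int starts = 0;   /* counts how often the start action ran */

static enum state ignore(enum state current) { return current; }
static enum state start(enum state current)  { (void)current; starts++; return RUNNING; }
static enum state stop(enum state current)   { (void)current; return STOPPED; }

/* Jump table: one function pointer per (state, event) pair. */
static const handler transitions[STATE_COUNT][EVENT_COUNT] = {
    [IDLE]    = { [EV_START] = start,  [EV_STOP] = ignore },
    [RUNNING] = { [EV_START] = ignore, [EV_STOP] = stop   },
    [STOPPED] = { [EV_START] = ignore, [EV_STOP] = ignore },
};

/* Looks up and runs the handler; no switch or if chain is needed. */
static enum state dispatch(enum state current, enum event ev)
{
    return transitions[current][ev](current);
}
```

Adding a state or an event means adding a row or a column to the table, and `dispatch` stays untouched.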
Other than callbacks (great for abstraction), function pointers can be used to implement polymorphism in C. This is done extensively in the Linux kernel, and common C libraries such as glibc, GTK+ and GLib.
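A simplified sketch of that style of polymorphism, loosely modelled on the "ops table" pattern rather than copied from any of those projects. The `shape`, `rect` and `triangle` types are hypothetical.

```c
struct shape;

/* Table of operations shared by all objects of one concrete type. */
struct shape_ops {
    double (*area)(const struct shape *self);
};

/* "Base class": carries only the pointer to the operations table. */
struct shape {
    const struct shape_ops *ops;
};

/* "Derived classes": the base must be the first member so that a pointer
   to the base can be converted back to a pointer to the whole object. */
struct rect     { struct shape base; double w, h; };
struct triangle { struct shape base; double b, h; };

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static double triangle_area(const struct shape *self)
{
    const struct triangle *t = (const struct triangle *)self;
    return 0.5 * t->b * t->h;
}

static const struct shape_ops rect_ops     = { rect_area };
static const struct shape_ops triangle_ops = { triangle_area };

/* Dynamic dispatch: the caller never needs to know the concrete type. */
static double shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```

Code that holds only `struct shape *` pointers can then treat rectangles and triangles uniformly.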

How can I write a generic C function for calling a Win32 function?

To allow access to the Win32 API from a scripting language (written in C), I would like to write a function such as the following:
void Call(LPCSTR DllName, LPCSTR FunctionName,
LPSTR ReturnValue, USHORT ArgumentCount, LPSTR Arguments[])
which will call, generically, any Win32 API function.
(the LPSTR parameters are essentially being used as byte arrays - assume that they have been correctly sized to take the correct data type external to the function. Also I believe that some additional complexity is required to distinguish between pointer and non-pointer arguments but I'm ignoring that for the purposes of this question).
The problem I have is passing the arguments into the Win32 API functions. Because these are stdcall I can't use varargs so the implementation of 'Call' must know about the number of arguments in advance and hence it cannot be generic...
I think I can do this with assembly code (by looping over the arguments, pushing each to the stack) but is this possible in pure C?
Update: I've marked the 'No it is not possible' answer as accepted for now. I will of course change this if a C-based solution comes to light.
Update: ruby/dl looks like it may be implemented using a suitable mechanism. Any details on this would be appreciated.
First things first: You cannot pass a type as a parameter in C. The only option you are left with is macros.
This scheme works with a little modification (array of void * for arguments), provided you are doing a LoadLibrary/GetProcAddress to call Win32 functions. Having a function name string otherwise will be of no use. In C, the only way you call a function is via its name (an identifier) which in most cases decays to a pointer to the function. You also have to take care of casting the return value.
My best bet:
#include <windows.h>

// Define a function pointer type to be passed on to the Call macro.
#define Declare(ret, cc, fn_t, ...) typedef ret (cc *fn_t)(__VA_ARGS__)

// For the time being, this doesn't work with UNICODE turned on.
// Loads the DLL, looks up the function, calls it, and unloads the DLL.
// The return value of the call is discarded.
#define Call(dll, fn, fn_t, ...) do { \
        HMODULE lib = LoadLibraryA(dll); \
        if (lib) { \
            fn_t pfn = (fn_t)GetProcAddress(lib, fn); \
            if (pfn) { \
                (pfn)(__VA_ARGS__); \
            } \
            FreeLibrary(lib); \
        } \
    } while (0)

int main(void) {
    Declare(int, __stdcall, MessageBoxProc, HWND, LPCSTR, LPCSTR, UINT);
    Call("user32.dll", "MessageBoxA", MessageBoxProc,
         NULL, ((LPCSTR)"?"), ((LPCSTR)"Details"),
         (MB_ICONWARNING | MB_CANCELTRYCONTINUE | MB_DEFBUTTON2));
    return 0;
}
No, I don't think it's possible to do this without writing some assembly. The reason is that you need precise control over what is on the stack before you call the target function, and there's no real way to do that in pure C. It is, of course, simple to do in assembly.
Also, you're using LPSTR for all of these arguments, which is really just char *. But since not all of these arguments are strings, what you actually want to use for the return value and for Arguments[] is void * or LPVOID. This is the type you should use when you don't know the true type of the arguments, rather than casting them to char *.
The other posts are right about the almost certain need for assembly or other non-standard tricks to actually make the call, not to mention all of the details of the actual calling conventions.
Windows DLLs use at least two distinct calling conventions for functions: stdcall and cdecl. You would need to handle both, and might even need to figure out which to use.
One way to deal with this is to use an existing library to encapsulate many of the details. Amazingly, there is one: libffi. An example of its use in a scripting environment is the implementation of Lua Alien, a Lua module that allows interfaces to arbitrary DLLs to be created in pure Lua aside from Alien itself.
A lot of Win32 APIs take pointers to structs with specific layouts. Of these, a large subset follow a common pattern where the first DWORD has to be initialized to have the size of the struct before it is called. Sometimes they require a block of memory to be passed, into which they will write a struct, and the memory block must be of a size that is determined by first calling the same API with a NULL pointer and reading the return value to discover the correct size. Some APIs allocate a struct and return a pointer to it, such that the pointer must be deallocated with a second call.
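The "call once to learn the size, then call again with a buffer" pattern can be sketched in portable C as follows. query_name is a hypothetical stand-in written for this example, not a real Win32 function, and fetch_name shows the caller's side of the protocol.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API in the style described above. It always returns the
   number of bytes required (terminator included) and only writes to
   buffer when buffer is non-NULL and capacity is large enough. */
size_t query_name(char *buffer, size_t capacity)
{
    static const char name[] = "example-user";  /* stand-in data */

    if (buffer != NULL && capacity >= sizeof name)
        memcpy(buffer, name, sizeof name);
    return sizeof name;
}

/* Caller side: ask for the size, allocate, then call again.
   The caller owns the returned block and must free() it. */
char *fetch_name(void)
{
    size_t needed = query_name(NULL, 0);  /* first call: discover the size */
    char *buffer;

    assert(needed > 0);
    buffer = malloc(needed);
    if (buffer == NULL)
        return NULL;
    if (query_name(buffer, needed) != needed) {  /* second call: fetch the data */
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

A generic Call function that only forwards string arguments has no way to express this two-step protocol, which is the difficulty the paragraph above points at.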
I wouldn't be that surprised if the set of APIs that can be usefully called in one shot, with individual arguments convertible from a simple string representation, is quite small.
To make this idea generally applicable, we would have to go to quite an extreme:
typedef void DynamicFunction(size_t argumentCount, const wchar_t *arguments[],
size_t maxReturnValueSize, wchar_t *returnValue);
DynamicFunction *GenerateDynamicFunction(const wchar_t *code);
You would pass a simple snippet of code to GenerateDynamicFunction, and it would wrap that code in some standard boilerplate and then invoke a C compiler/linker (there are quite a few free options available) to make a DLL containing the function. It would then LoadLibrary that DLL, use GetProcAddress to find the function, and return it. This would be expensive, but you would do it once and cache the resulting DynamicFunction pointer for repeated use. You could do this dynamically by keeping the pointers in a hashtable, keyed by the code snippets themselves.
The boilerplate might be:
#include <windows.h>
// and anything else that might be handy
void DynamicFunctionWrapper(size_t argumentCount, const wchar_t *arguments[],
size_t maxReturnValueSize, wchar_t *returnValue)
{
// --- insert code snippet here
}
So an example usage of this system would be:
DynamicFunction *getUserName = GenerateDynamicFunction(
L"GetUserNameW(returnValue, (LPDWORD)(&maxReturnValueSize))");
wchar_t userName[100];
getUserName(0, NULL, sizeof(userName) / sizeof(wchar_t), userName);
You could enhance this by making GenerateDynamicFunction accept the argument count, so that it could generate a check at the start of the wrapper that the correct number of arguments has been passed. And if you put a hashtable in there to cache the function for each code snippet encountered, you could get close to your original example. The Call function would take a code snippet instead of just an API name, but would otherwise be the same. It would look up the code snippet in the hashtable and, if it is not present, call GenerateDynamicFunction and store the result in the hashtable for next time. It would then call the function. Example usage:
wchar_t userName[100];
Call("GetUserNameW(returnValue, (LPDWORD)(&maxReturnValueSize))",
0, NULL, sizeof(userName) / sizeof(wchar_t), userName);
Of course, there wouldn't be much point in doing any of this unless the idea was to open up some kind of general security hole, e.g. by exposing Call as a web service. The security implications also exist for your original idea, but they are less apparent simply because the approach you suggested wouldn't be that effective. The more generally powerful we make it, the more of a security problem it becomes.
Update based on comments:
The .NET framework has a feature called p/invoke, which exists precisely to solve your problem. So if you are doing this as a project to learn about stuff, you could look at p/invoke to get an idea of how complex it is. You could possibly target the .NET framework with your scripting language - instead of interpreting scripts in real time, or compiling them to your own bytecode, you could compile them to IL. Or you could host an existing scripting language from the many already available on .NET.
You could try something like this - it works well for win32 API functions:
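/* Relies on MSVC inline assembly, so it only builds for 32-bit x86.
   The callee is expected to clean up the stack, as stdcall functions do. */
int CallFunction(int functionPtr, int* stack, int size)
{
    if(!stack && size > 0)
        return 0;
    /* Push each argument word onto the stack. */
    for(int i = 0; i < size; i++) {
        int v = *stack;
        __asm {
            push v
        }
        stack++;
    }
    int r;
    FARPROC fp = (FARPROC) functionPtr;
    /* Call the target and fetch its return value from eax. */
    __asm {
        call fp
        mov dword ptr[r], eax
    }
    return r;
}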
The parameters in the "stack" argument should be in reverse order (as this is the order they are pushed onto the stack).
Having a function like that sounds like a bad idea, but you can try this:
int Call(LPCSTR DllName, LPCSTR FunctionName,
         USHORT ArgumentCount, int args[])
{
    /* Declared as returning int so that the results below can be returned. */
    int (STDCALL *foobar)() = lookupDLL(...);
    switch(ArgumentCount) {
        /* Note: If these give some compiler errors, you need to cast
           each one to a func ptr type with suitable number of arguments. */
        case 0: return foobar();
        case 1: return foobar(args[0]);
        ...
    }
}
On a 32-bit system, nearly all values fit into a 32-bit word, and shorter values are pushed onto the stack as 32-bit words for function call arguments. So you should be able to call virtually all Win32 API functions this way: just cast the arguments to int, and cast the return value from int to the appropriate type.
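The dispatch-on-argument-count idea can be sketched in portable C, using ordinary functions as stand-ins for API entry points. All names here (generic_fn, fn0 to fn3, call_with_args and the sample targets) are hypothetical. For Win32 the typedefs would also carry the calling convention, and the call is only well defined when the pointer is cast back to the target's real signature.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function pointer type used only for storage; it must be cast
   back to the real signature before the call. */
typedef void (*generic_fn)(void);

typedef int (*fn0)(void);
typedef int (*fn1)(int);
typedef int (*fn2)(int, int);
typedef int (*fn3)(int, int, int);

/* Call fn with argc int arguments taken from args.
   Stores the result in *result and returns 0,
   or returns -1 if argc is not supported by this sketch. */
int call_with_args(generic_fn fn, int argc, const int args[], int *result)
{
    assert(fn != NULL && result != NULL);
    switch (argc) {
    case 0: *result = ((fn0)fn)();                          return 0;
    case 1: *result = ((fn1)fn)(args[0]);                   return 0;
    case 2: *result = ((fn2)fn)(args[0], args[1]);          return 0;
    case 3: *result = ((fn3)fn)(args[0], args[1], args[2]); return 0;
    default: return -1;
    }
}

/* Stand-ins for API functions with different argument counts. */
int forty_two(void)   { return 42; }
int negate(int a)     { return -a; }
int add(int a, int b) { return a + b; }

int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
```

The switch has to list every supported argument count explicitly, which is exactly the limitation the original question was hoping to avoid.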
I'm not sure if it will be of interest to you, but an option would be to shell out to RunDll32.exe and have it execute the function call for you. RunDll32 has some limitations and I don't believe you can access the return value whatsoever but if you form the command line arguments properly it will call the function.
First, you should add the size of each argument as an extra parameter. Otherwise, you need to divine the size of each parameter for each function in order to push it onto the stack. That is possible for WinXX functions, since they have to be compatible with their documented parameters, but it is tedious.
Secondly, there isn't a "pure C" way to call a function without knowing the arguments except for a varargs function, and there is no constraint on the calling convention used by a function in a .DLL.
Actually, the second part is more important than the first.
In theory, you could set up a preprocessor macro/#include structure to generate all combinations of parameter types up to, say, 11 parameters, but that implies that you know ahead of time which types will be passed through your function Call. Which is kind of crazy if you ask me.
Although, if you really wanted to do this unsafely, you could pass down the C++ mangled name and use UnDecorateSymbolName to extract the types of the parameters. However, that won't work for functions exported with C linkage.

Resources