I previously asked a question about C functions which take an unspecified number of parameters e.g. void foo() { /* code here */ } and which can be called with an unspecified number of arguments of unspecified type.
When I asked whether it is possible for a function like void foo() { /* code here */ } to get the parameters with which it was called e.g. foo(42, "random") somebody said that:
The only you can do is to use the calling conventions and knowledge of the architecture you are running at and get parameters directly from the stack. source
My question is:
If I have this function
void foo()
{
// get the parameters here
};
And I call it: foo("dummy1", "dummy2") is it possible to get the 2 parameters inside the foo function directly from the stack?
If yes, how? Is it possible to have access to the full stack? For example if I call a function recursively, is it possible to have access to each function state somehow?
If not, what's the point with the functions with unspecified number of parameters? Is this a bug in the C programming language? In which cases would anyone want foo("dummy1", "dummy2") to compile and run fine for a function which header is void foo()?
Lots of 'if's:
You stick to one version of a compiler.
One set of compiler options.
Somehow manage to convince your compiler to never pass arguments in registers.
Convince your compiler not to treat two calls f(5, "foo") and f(&i, 3.14) with different arguments to the same function as error. (This used to be a feature of, for example, the early DeSmet C compilers).
Then the activation record of a function is predictable (ie you look at the generated assembly and assume it will always be the same): the return address will be there somewhere and the saved bp (base pointer, if your architecture has one), and the sequence of the arguments will be the same. So how would you know what actual parameters were passed? You will have to encode them (their size, offset), presumably in the first argument, sort of what printf does.
Recursion (ie being in a recursive call makes no difference) each instance has its activation record (did I say you have to convince your compiler never optimise tail calls?), but in C, unlike in Pascal, you don't have a link backwards to the caller's activation record (ie local variables) since there are no nested function declarations. Getting access to the full stack ie all the activation records before the current instance is pretty tedious, error prone and mostly interest to writers of malicious code who would like to manipulate the return address.
So that's a lot of hassle and assumptions for essentially nothing.
Yes you can access passed parameters directly via stack. But no, you can't use old-style function definition to create function with variable number and type of parameters. Following code shows how to access a param via stack pointer. It is totally platform dependent , so i have no clue if it going to work on your machine or not, but you can get the idea
long foo();
int main(void)
{
printf( "%lu",foo(7));
}
long foo(x)
long x;
{
register void* sp asm("rsp");
printf("rsp = %p rsp_ value = %lx\n",sp+8, *((long*)(sp + 8)));
return *((long*)(sp + 8)) + 12;
}
get stack head pointer (rsp register on my machine)
add the offset of passed parameter to rsp => you get pointer to long x on stack
dereference the pointer, add 12 (do whatever you need) and return the value.
The offset is the issue since it depends on compiler, OS, and who knows on what else.
For this example i simple checked checked it in debugger, but if it really important for you i think you can come with some "general" for your machine solution.
If you declare void foo(), then you will get a compilation error for foo("dummy1", "dummy2").
You can declare a function that takes an unspecified number of arguments as follows (for example):
int func(char x,...);
As you can see, at least one argument must be specified. This is so that inside the function, you will be able to access all the arguments that follow the last specified argument.
Suppose you have the following call:
short y = 1000;
int sum = func(1,y,5000,"abc");
Here is how you can implement func and access each of the unspecified arguments:
int func(char x,...)
{
short y = (short)((int*)&x+1)[0]; // y = 1000
int z = (int )((int*)&x+2)[0]; // z = 5000
char* s = (char*)((int*)&x+3)[0]; // s[0...2] = "abc"
return x+y+z+s[0]; // 1+1000+5000+'a' = 6098
}
The problem here, as you can see, is that the type of each argument and the total number of arguments are unknown. So any call to func with an "inappropriate" list of arguments, may (and probably will) result in a runtime exception.
Hence, typically, the first argument is a string (const char*) which indicates the type of each of the following arguments, as well as the total number of arguments. In addition, there are standard macros for extracting the unspecified arguments - va_start and va_end.
For example, here is how you can implement a function similar in behavior to printf:
void log_printf(const char* data,...)
{
static char str[256] = {0};
va_list args;
va_start(args,data);
vsnprintf(str,sizeof(str),data,args);
va_end(args);
fprintf(global_fp,str);
printf(str);
}
P.S.: the example above is not thread-safe, and is only given here as an example...
Related
This question is about using function pointers, which are not precisely compatible, but which I hope I can use nevertheless as long as my code relies only on the compatible parts. Let's start with some code to get the idea:
typedef void (*funcp)(int* val);
static void myFuncA(int* val) {
*val *= 2;
return;
}
static int myFuncB(int* val) {
*val *= 2;
return *val;
}
int main(void) {
funcp f = NULL;
int v = 2;
f = myFuncA;
f(&v);
// now v is 4
// explicit cast so the compiler will not complain
f = (funcp)myFuncB;
f(&v);
// now v is 8
return 0;
}
While the arguments of myFuncA and myFuncB are identical and fully compatible, the return values are not and are thus just ignored by the calling code. I tried the above code and it works correctly using GCC.
What I learned so far from here and here is that the functions are incompatible by definition of the standard and may cause undefined behavior. My intuition, however, tells me that my code example will still work correctly, since it does not rely in any way on the incompatible parts (the return value). However, in the answers to this question a possible corruption of the stack has been mentioned.
So my question is: Is my example above valid C code so it will always work as intended (without any side effects), or does it depend on the compiler?
EDIT:
I want to do this in order to use a "more powerful" function with a "les powerful" interface. In my example funcp is the interface but I would like to provide additional functionality like myFuncB for optional use.
Agreed it is undefined behaviour, don't do that!
Yes the code functions, i.e. it doesn't fall over, but the value you assign after returning void is undefined.
In a very old version of "C" the return type was unspecified and int and void functions could be 'safely' intermixed. The integer value being returned in the designated accumulator register. I remember writing code using this 'feature'!
For almost anything else you might return the results are likely to be fatal.
Going forward a few years, floating-point return values are often returned using the fp coprocessor (we are still in the 80s) register, so you can't mix int and float return types, because the state of the coprocessor would be confused if the caller does not strip off the value, or strips off a value that was never placed there and causes an fp exception. Worse, if you build with fp emulation, then the fp value may be returned on the stack as described next. Also on 32-bit builds it is possible to pass 64bit objects (on 16 bit builds you can have 32 bit objects) which would be returned either using multiple registers or on the stack. If they are on the stack and allocated the wrong size, then some local stomping will occur,
Now, c supports struct return types and return value copy optimisations. All bets are off if you don't match the types correctly.
Also some function models have the caller allocate stack space for the parameters for the call, but the function itself releases the stack. Disagreement between caller and implementation on on the number or types of parameters and return values would be fatal.
By default C function names are exported and linked undecorated - just the function name defines the symbol, so different modules of your program could have different views about function signatures, which conflict when you link, and potentially generate very interesting runtime errors.
In c++ the function names are highly decorated primarily to allow overloading, but also it helps to avoid signature mismatches. This helps with keeping arguments in step, but actually, ( as noted by #Jens ) the return type is not encoded into the decorated name, primarily because the return type isn't used (wasn't, but I think occasionally can now influence) for overload resolution.
Yes, this is an undefined behaviour and you should never rely on undefined behaviour if want to write portable code.
Function with different return values can have different calling conventions. Your example will probably work for small return types, but when returning large structs (e.g. larger than 32 bits) some compilers will generate code where struct is returned in a temporary memory area which should be cleaned up the the caller.
I have a pointer to function, assume any signature.
And I have 5 different functions with same signature.
At run time one of them gets assigned to the pointer, and that function is called.
Without inserting any print statement in those functions, how can I come to know the name of function which the pointer currently points to?
You will have to check which of your 5 functions your pointer points to:
if (func_ptr == my_function1) {
puts("func_ptr points to my_function1");
} else if (func_ptr == my_function2) {
puts("func_ptr points to my_function2");
} else if (func_ptr == my_function3) {
puts("func_ptr points to my_function3");
} ...
If this is a common pattern you need, then use a table of structs instead of a function pointer:
typedef void (*my_func)(int);
struct Function {
my_func func;
const char *func_name;
};
#define FUNC_ENTRY(function) {function, #function}
const Function func_table[] = {
FUNC_ENTRY(function1),
FUNC_ENTRY(function2),
FUNC_ENTRY(function3),
FUNC_ENTRY(function4),
FUNC_ENTRY(function5)
}
struct Function *func = &func_table[3]; //instead of func_ptr = function4;
printf("Calling function %s\n", func->func_name);
func ->func(44); //instead of func_ptr(44);
Generally, in C such things are not available to the programmer.
There might be system-specific ways of getting there by using debug symbols etc., but you probably don't want to depend on the presence of these for the program to function normally.
But, you can of course compare the value of the pointer to another value, e.g.
if (ptr_to_function == some_function)
printf("Function pointer now points to some_function!\n");
The function names will not be available at runtime.
C is not a reflective language.
Either maintain a table of function pointers keyed by their name, or supply a mode of calling each function that returns the name.
The debugger could tell you that (i.e. the name of a function, given its address).
The symbol table of an unstripped ELF executable could also help. See nm(1), objdump(1), readelf(1)
Another Linux GNU/libc specific approach could be to use at runtime the dladdr(3) function. Assuming your program is nicely and dynamically linked (e.g. with -rdynamic), it can find the symbol name and the shared object path given some address (of a globally named function).
Of course, if you have only five functions of a given signature, you could compare your address (to the five addresses of them).
Notice that some functions don't have any ((globally visible) names, e.g. static functions.
And some functions could be dlopen-ed and dlsym-ed (e.g. inside plugins). Or their code be synthetized at runtime by some JIT-ing framework (libjit, gccjit, LLVM, asmjit). And the optimizing compiler can (and does!) inline functions, clone them, tail-call them, etc.... so your question might not make any sense in general...
See also backtrace(3) & Ian Taylor's libbacktrace inside GCC.
But in general, your quest is impossible. If you really need such reflective information in a reliable way, manage it yourself (look into Pitrat's CAIA system as an example, or somehow my MELT system), perhaps by generating some code during the build.
To know where a function pointer points is something you'll have to keep track of with your program. Most common is to declare an array of function pointers and use an int variable as index of this array.
That being said, it is nowadays also possible to tell in runtime which function that is currently executed, by using the __func__ identifier:
#include <stdio.h>
typedef const char* func_t (void);
const char* foo (void)
{
// do foo stuff
return __func__;
}
const char* bar (void)
{
// do bar stuff
return __func__;
}
int main (void)
{
func_t* fptr;
fptr = foo;
printf("%s executed\n", fptr());
fptr = bar;
printf("%s executed\n", fptr());
return 0;
}
Output:
foo executed
bar executed
Not at all - the symbolic name of the function disappears after compilation. Unlike a reflective language, C isn't aware of how its syntax elements were named by the programmer; especially, there's no "function lookup" by name after compilation.
You can of course have a "database" (e.g. an array) of function pointers that you can compare your current pointer to.
This is utterly awful and non-portable, but assuming:
You're on Linux or some similar, ELF-based system.
You're using dynamic linking.
The function is in a shared library or you used -rdynamic when linking.
Probably a lot of other assumptions you shouldn't be making...
You can obtain the name of a function by passing its address to the nonstandard dladdr function.
set your linker to output a MAP file.
pause the program
inspect the address contained in the pointer.
look up the address in the MAP file to find out which function is being pointed to.
A pointer to a C function is an address, like any pointer. You can get the value from a debugger. You can cast the pointer to any integer type with enough bits to express it completely, and print it. Any compilation unit that can use the pointer, ie, has the function name in scope, can print the pointer values or compare them to a runtime variable, without touching anything inside the functions themselves.
So today I figured (for the first time admittedly) that int foo() is in fact different from int foo(void) in that the first one allows any number of inputs and the second one allows zero.
Does int foo() simply ignore any given inputs? If so, what's the point of allowing this form of function? If not, how can you access them and how is this different from having a variable argument list (e.g. something like int foo (int argc, ...))?
The key to understanding this is understanding the call stack. What happens when you call a function in C (with the standard x86 ABI — other platforms may vary) is that the caller pushes all of the arguments, in reverse order, onto the stack before calling the callee. Then, the callee can read as many of those arguments as it wants. If you use foo(void), obviously you won't read any. If you use foo(int), you'll be able to read one word into the stack below your current frame.
Using foo() (with no args specified) means that the compiler won't care to check the arguments you pass to foo. Anything goes; there's no contract. This can be useful if foo has two different implementations that take different types as arguments, and I want to be able to switch them out programmatically. (It's also sometimes useful when writing functions in assembler, where you can handle the stack and arguments yourself.) But remember that the compiler is not going to do any type-checking for you, so you have to be very careful.
This is different from foo(int, ...), since in that case, the function actually has a way to access all of the arguments (using varargs). There's no reason ever to actually define a function with foo(), since it's basically equivalent to foo(void). foo() is only useful for declarations.
First, take int foo(). This form is a function declaration that does not provide a prototype - that is, it doesn't specify the number and types of its arguments. It is an archaic form of function declaration - it's how functions were originally declared in pre-ANSI C. Such a function however does have a fixed number and type of arguments - it's just that the declaration doesn't tell the compiler what it is, so you are responsible for getting it right where you call the function. You declare such a function with arguments like so:
int foo(a, b)
int a;
char *b;
{
return a + strlen(b);
}
This form of function declaration is now obsolete.
int foo(void) is a modern declaration that provides a prototype; it declares a function that takes no arguments. int foo(int argc, ...) also provides a prototype; it declares a function that has one fixed integer argument and a variable argument list.
In standard conform C, you have to use int foo(void) if the function does not accept any parameters.
I guess it is compiler dependant what happens, when you pass arguments to a function with empty braces. But I don't think there is a way to access these parameters.
As for main, the only standard conform (pure c) ways to write them are either int main(void) or int main(int argc, char **argv) (or char *argv[] which is the same).
Well, in C++ the two forms are equivalent and they both declare a function that takes no arguments. If you try to call the function and pass in an argument, the compile will give an error.
On the other hand, C and Objective-C both treat the two forms differently. In these languages the first form declares a function that takes an unknown number of arguments, whereas the second form declares a function that takes no arguments at all. So, in C the following is valid code:
int foo() {
return 5;
}
int main() {
return foo(1, 2, 3);
}
The compiler doesn't complain, and the code runs fine (a C++ compiler would give an error on this same code).
Generally what you want in C and Objective-C is to use the second form and include the explicit void to indicate that the function should take no arguments. However, it's more common in C++ to use the first form because it's equivalent and shorter.
Is declaring an header file essential? This code:
main()
{
int i=100;
printf("%d\n",i);
}
seems to work, the output that I get is 100. Even without using stdio.h header file. How is this possible?
You don't have to include the header file. Its purpose is to let the compiler know all the information about stdio, but it's by no means necessary if your compiler is smart (or lazy).
You should include it because it's a good habit to get into - if you don't, then the compiler has no real way to know if you're breaking the rules, such as with:
int main (void) {
puts (7); // should be a string.
return 0;
}
which compiles without issue but rightly dumps core when running. Changing it to:
#include <stdio.h>
int main (void) {
puts (7);
return 0;
}
will result in the compiler warning you with something like:
qq.c:3: warning: passing argument 1 of ‘puts’ makes pointer
from integer without a cast
A decent compiler may warn you about this, such as gcc knowing about what printf is supposed to look like, even without the header:
qq.c:7: warning: incompatible implicit declaration of
built-in function ‘printf’
How is this possible? In short: three pieces of luck.
This is possible because some compilers will make assumptions about undeclared functions. Specifically, parameters are assumed to be int, and the return type also int. Since an int is often the same size as a char* (depending on the architecture), you can get away with passing ints and strings, as the correct size parameter will get pushed onto the stack.
In your example, since printf was not declared, it was assumed to take two int parameters, and you passed a char* and an int which is "compatible" in terms of the invocation. So the compiler shrugged and generated some code that should have been about right. (It really should have warned you about an undeclared function.)
So the first piece of luck was that the compiler's assumption was compatible with the real function.
Then at the linker stage, because printf is part of the C Standard Library, the compiler/linker will automatically include this in the link stage. Since the printf symbol was indeed in the C stdlib, the linker resolved the symbol and all was well. The linking was the second piece of luck, as a function anywhere other than the standard library will need its library linked in also.
Finally, at runtime we see your third piece of luck. The compiler made a blind assumption, the symbol happened to be linked in by default. But - at runtime you could have easily passed data in such a way as to crash your app. Fortunately the parameters matched up, and the right thing ended up occurring. This will certainly not always be the case, and I daresay the above would have probably failed on a 64-bit system.
So - to answer the original question, it really is essential to include header files, because if it works, it is only through blind luck!
As paxidiablo said its not necessary but this is only true for functions and variables but if your header file provides some types or macros (#define) that you use then you must include the header file to use them because they are needed before linking happens i.e during pre-processing or compiling
This is possible because when C compiler sees an undeclared function call (printf() in your case) it assumes that it has
int printf(...)
signature and tries to call it casting all the arguments to int type. Since "int" and "void *" types often have same size it works most of the time. But it is not wise to rely on such behavior.
C supprots three types of function argument forms:
Known fixed arguments: this is when you declare function with arguments: foo(int x, double y).
Unknown fixed arguments: this is when you declare it with empty parentheses: foo() (not be confused with foo(void): it is the first form without arguments), or not declare it at all.
Variable arguments: this is when you declare it with ellipsis: foo(int x, ...).
When you see standard function working then function definition (which is in form 1 or 3) is compatible with form 2 (using same calling convention). Many old std. library functions are so (as desugned to be), because they are there form early versions of C, where was no function declarations and they all was in form 2. Other function may be unintentionally be compatible with form 2, if they have arguments as declared in argument promotion rules for this form. But some may not be so.
But form 2 need programmer to pass arguments of same types everywhere, because compiler not able to check arguments with prototype and have to determine calling convention osing actual passed arguments.
For example, on MC68000 machine first two integer arguments for fixed arg functions (for both forms 1 and 2) will be passed in registers D0 and D1, first two pointers in A0 and A1, all others passed through stack. So, for example function fwrite(const void * ptr, size_t size, size_t count, FILE * stream); will get arguments as: ptr in A0, size in D0, count in D1 and stream in A1 (and return a result in D0). When you included stdio.h it will be so whatever you pass to it.
When you do not include stdio.h another thing happens. As you call fwrite with fwrite(data, sizeof(*data), 5, myfile) compiler looks on argruments and see that function is called as fwrite(*, int, int, *). So what it do? It pass first pointer in A0, first int in D0, second int in D1 and second pointer in A1, so it what we need.
But when you try to call it as fwrite(data, sizeof(*data), 5.0, myfile), with count is of double type, compiler will try to pass count through stack, as it is not integer. But function require is in D1. Shit happens: D1 contain some garbage and not count, so further behaviour is unpredictable. But than you use prototype defined in stdio.h all will be ok: compiler automatically convert this argument to int and pass it as needed. It is not abstract example as double in arument may be just result of computation involving floating point numbers and you may just miss this assuming result is int.
Another example is variable argument function (form 3) like printf(char *fmt, ...). For it calling convention require last named argument (fmt here) to be passed through stack regardess of its type. So, then you call printf("%d", 10) it will put pointer to "%d" and number 10 on stack and call function as need.
But when you do not include stdio.h comiler will not know that printf is vararg function and will suppose that printf("%d", 10) is calling to function with fixed arguments of type pointer and int. So MC68000 will place pointer to A0 and int to D0 instead of stack and result is again unpredictable.
There may be luck that arguments was previously on stack and occasionally read there and you get correct result... this time... but another time is will fail. Another luck is that compiler takes care if not declared function may be vararg (and somehow makes call compatible with both forms). Or all arguments in all forms are just passed through stack on your machine, so fixed, unknown and vararg forms are just called identically.
So: do not do this even you feel lucky and it works. Unknown fixed argument form is there just for compatibility with old code and is strictly discouraged to use.
Also note: C++ will not allow this at all, as it require function to be declared with known arguments.
I am trying to do something like the following
enum types {None, Bool, Short, Char, Integer, Double, Long, Ptr};
int main(int argc, char ** args) {
enum types params[10] = {0};
void* triangle = dlopen("./foo.so", RTLD_LAZY);
void * fun = dlsym(triangle, ars[1]);
<<pseudo code>>
}
Where pseudo code is something like
fun = {}
for param in params:
if param == None:
fun += void
if param == Bool:
fun += Boolean
if param == Integer:
fun += int
...
returnVal = fun.pop()
funSignature = returnval + " " + funName + "(" + Riffle(fun, ",") + ")"
exec funSignature
Thank you
Actually, you can do nearly all you want. In C language (unlike C++, for example), the functions in shared objects are referenced merely by their names. So, to find--and, what is most important, to call--the proper function, you don't need its full signature. You only need its name! It's both an advantage and disadvantage --but that's the nature of a language you chose.
Let me demonstrate, how it works.
#include <dlfcn.h>
typedef void* (*arbitrary)();
// do not mix this with typedef void* (*arbitrary)(void); !!!
int main()
{
arbitrary my_function;
// Introduce already loaded functions to runtime linker's space
void* handle = dlopen(0,RTLD_NOW|RTLD_GLOBAL);
// Load the function to our pointer, which doesn't know how many arguments there sould be
*(void**)(&my_function) = dlsym(handle,"something");
// Call something via my_function
(void) my_function("I accept a string and an integer!\n",(int)(2*2));
return 0;
}
In fact, you can call any function that way. However, there's one drawback. You actually need to know the return type of your function in compile time. By default, if you omit void* in that typedef, int is assumed as return type--and, yes, it's a correct C code. The thing is that the compiler needs to know the size of the return type to operate the stack properly.
You can workaround it by tricks, for example, by pre-declaring several function types with different sizes of return types in advance and then selecting which one you actually are going to call. But the easier solution is to require functions in your plugin to return void* or int always; the actual result being returned via pointers given as arguments.
What you must ensure is that you always call the function with the exact number and types of arguments it's supposed to accept. Pay closer attention to difference between different integer types (your best option would be to explicitly cast arguments to them).
Several commenters reported that the code above is not guaranteed to work for variadic functions (such as printf).
What dlsym() returns is normally a function pointer - disguised as a void *. (If you ask it for the name of a global variable, it will return you a pointer to that global variable, too.)
You then invoke that function just as you might using any other pointer to function:
int (*fun)(int, char *) = (int (*)(int, char *))dlsym(triangle, "function");
(*fun)(1, "abc"); # Old school - pre-C89 standard, but explicit
fun(1, "abc"); # New school - C89/C99 standard, but implicit
I'm old school; I prefer the explicit notation so that the reader knows that 'fun' is a pointer to a function without needing to see its declaration. With the new school notation, you have to remember to look for a variable 'fun' before trying to find a function called 'fun()'.
Note that you cannot build the function call dynamically as you are doing - or, not in general. To do that requires a lot more work. You have to know ahead of time what the function pointer expects in the way of arguments and what it returns and how to interpret it all.
Systems that manage more dynamic function calls, such as Perl, have special rules about how functions are called and arguments are passed and do not call (arguably cannot call) functions with arbitrary signatures. They can only call functions with signatures that are known about in advance. One mechanism (not used by Perl) is to push the arguments onto a stack, and then call a function that knows how to collect values off the stack. But even if that called function manipulates those values and then calls an arbitrary other function, that called function provides the correct calling sequence for the arbitrary other function.
Reflection in C is hard - very hard. It is not undoable - but it requires infrastructure to support it and discipline to use it, and it can only call functions that support the infrastructure's rules.
The Proper Solution
Assuming you're writing the shared libraries; the best solution I've found to this problem is strictly defining and controlling what functions are dynamically linked by:
Setting all symbols hidden
for example clang -dynamiclib Person.c -fvisibility=hidden -o libPerson.dylib when compiling with clang
Then using __attribute__((visibility("default"))) and extern "C" to selectively unhide and include functions
Profit! You know what the function's signature is. You wrote it!
I found this in Apple's Dynamic Library Design Guidelines. These docs also include other solutions to the problem above was just my favorite.
The Answer to your Question
As stated in previous answers, C and C++ functions with extern "C" in their definition aren't mangled so the function's symbols simply don't include the full function signature. If you're compiling with C++ without extern "C" however functions are mangled so you could demangle them to get the full function's signature (with a tool like demangler.com or a c++ library). See here for more details on what mangling is.
Generally speaking it's best to use the first option if you're trying to import functions with dlopen.