Related
I previously asked a question about C functions which take an unspecified number of parameters e.g. void foo() { /* code here */ } and which can be called with an unspecified number of arguments of unspecified type.
When I asked whether it is possible for a function like void foo() { /* code here */ } to get the parameters with which it was called e.g. foo(42, "random") somebody said that:
The only thing you can do is to use the calling conventions and knowledge of the architecture you are running on and get the parameters directly from the stack. source
My question is:
If I have this function
void foo()
{
// get the parameters here
}
And I call it: foo("dummy1", "dummy2") is it possible to get the 2 parameters inside the foo function directly from the stack?
If yes, how? Is it possible to have access to the full stack? For example if I call a function recursively, is it possible to have access to each function state somehow?
If not, what's the point of functions with an unspecified number of parameters? Is this a bug in the C programming language? In which cases would anyone want foo("dummy1", "dummy2") to compile and run fine for a function whose header is void foo()?
Lots of 'if's:
You stick to one version of a compiler.
One set of compiler options.
Somehow manage to convince your compiler to never pass arguments in registers.
Convince your compiler not to treat two calls f(5, "foo") and f(&i, 3.14) with different arguments to the same function as error. (This used to be a feature of, for example, the early DeSmet C compilers).
Then the activation record of a function is predictable (ie you look at the generated assembly and assume it will always be the same): the return address will be there somewhere and the saved bp (base pointer, if your architecture has one), and the sequence of the arguments will be the same. So how would you know what actual parameters were passed? You will have to encode them (their size, offset), presumably in the first argument, sort of what printf does.
Recursion makes no difference: each instance has its own activation record (did I say you have to convince your compiler never to optimise tail calls?), but in C, unlike in Pascal, you don't have a link back to the caller's activation record (i.e. its local variables), since there are no nested function declarations. Getting access to the full stack, i.e. all the activation records before the current instance, is pretty tedious, error-prone and mostly of interest to writers of malicious code who would like to manipulate the return address.
So that's a lot of hassle and assumptions for essentially nothing.
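For comparison, the portable way to "encode them in the first argument" is the stdarg.h machinery. A minimal sketch, assuming a hypothetical helper sum_ints (not from the question) whose first argument says how many int arguments follow:

```c
#include <stdarg.h>

/* Hypothetical helper: the first argument tells the callee how many
   int arguments follow, because the callee cannot discover that itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each extra argument must really be an int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) works without any knowledge of the stack layout, but the count is only a promise made by the caller: if it is wrong, the behaviour is undefined.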
Yes, you can access passed parameters directly via the stack. But no, you can't use an old-style function definition to create a function with a variable number and type of parameters. The following code shows how to access a parameter via the stack pointer. It is totally platform dependent, so I have no clue whether it is going to work on your machine, but you can get the idea.
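#include <stdio.h>
long foo();
int main(void)
{
    printf("%ld\n", foo(7L)); /* pass a long, matching the definition */
    return 0;
}
/* Old-style (K&R) definition; GCC-specific and x86-64 only. */
long foo(x)
long x;
{
    register void* sp asm("rsp"); /* GCC extension: bind sp to the rsp register */
    /* The offset 8 was found in a debugger for one particular build. */
    printf("rsp = %p rsp_ value = %lx\n", (void*)((char*)sp + 8), *((long*)((char*)sp + 8)));
    return *((long*)((char*)sp + 8)) + 12;
}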
get stack head pointer (rsp register on my machine)
add the offset of passed parameter to rsp => you get pointer to long x on stack
dereference the pointer, add 12 (do whatever you need) and return the value.
The offset is the issue since it depends on compiler, OS, and who knows on what else.
For this example I simply checked it in a debugger, but if it is really important to you, I think you can come up with a solution that is "general" for your machine.
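If you declare void foo(void), then you will get a compilation error for foo("dummy1", "dummy2"). (With empty parentheses, void foo(), compilers before C23 accept the call because the parameters are left unspecified; C23 treats foo() like foo(void).)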
You can declare a function that takes an unspecified number of arguments as follows (for example):
int func(char x,...);
As you can see, at least one argument must be specified. This is so that inside the function, you will be able to access all the arguments that follow the last specified argument.
Suppose you have the following call:
short y = 1000;
int sum = func(1,y,5000,"abc");
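Here is how you can implement func and access each of the unspecified arguments. This relies on every argument sitting on the stack in consecutive int-sized slots (as with the 32-bit x86 cdecl convention), so it is not portable: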
int func(char x,...)
{
short y = (short)((int*)&x+1)[0]; // y = 1000
int z = (int )((int*)&x+2)[0]; // z = 5000
char* s = (char*)((int*)&x+3)[0]; // s[0...2] = "abc"
return x+y+z+s[0]; // 1+1000+5000+'a' = 6098
}
The problem here, as you can see, is that the type of each argument and the total number of arguments are unknown. So any call to func with an "inappropriate" list of arguments, may (and probably will) result in a runtime exception.
Hence, typically, the first argument is a string (const char*) which indicates the type of each of the following arguments, as well as the total number of arguments. In addition, there are standard macros for extracting the unspecified arguments: va_start, va_arg and va_end (declared in <stdarg.h>).
For example, here is how you can implement a function similar in behavior to printf:
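#include <stdarg.h>
#include <stdio.h>

extern FILE* global_fp; /* log file, assumed to be opened elsewhere */

void log_printf(const char* data,...)
{
    static char str[256] = {0}; /* static buffer: not thread-safe */
    va_list args;
    va_start(args,data);
    vsnprintf(str,sizeof(str),data,args);
    va_end(args);
    fputs(str,global_fp); /* never pass formatted text as a format string */
    fputs(str,stdout);
}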
P.S.: the example above is not thread-safe, and is only given here as an example...
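The pointer arithmetic in func above can be replaced with va_arg, which works regardless of whether arguments travel in registers or on the stack. A minimal sketch, assuming a hypothetical name func_portable and an int first parameter (a char parameter before the ellipsis is not valid for va_start, because it is subject to promotion):

```c
#include <stdarg.h>

/* Hypothetical portable variant of func: same arithmetic, no stack tricks. */
int func_portable(int x, ...)
{
    va_list ap;

    va_start(ap, x);
    int y = va_arg(ap, int);        /* a short argument arrives promoted to int */
    int z = va_arg(ap, int);
    char *s = va_arg(ap, char *);   /* the caller must really pass a string here */
    va_end(ap);

    return x + y + z + s[0];
}
```

The callee still has to know in advance that the caller passes two integers followed by a string; va_arg only makes fetching them portable, it does not make it safe.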
As you can see from the code snippet below, I have declared one char variable and one int variable. When the code gets compiled, it must identify the data types of variables str and i.
Why do I need to tell again during scanning my variable that it's a string or integer variable by specifying %s or %d to scanf? Isn't the compiler mature enough to identify that when I declared my variables?
#include <stdio.h>
int main ()
{
char str [80];
int i;
printf ("Enter your family name: ");
scanf ("%s",str);
printf ("Enter your age: ");
scanf ("%d",&i);
return 0;
}
Because there's no portable way for variable-argument functions like scanf and printf to know the types of the variable arguments, or even how many arguments were passed.
See C FAQ: How can I discover how many arguments a function was actually called with?
This is the reason there must be at least one fixed argument to determine the number, and maybe the types, of the variable arguments. This argument (the standard calls it parmN, see C11 (ISO/IEC 9899:201x) §7.16 Variable arguments) plays this special role, and is passed to the macro va_start. In other words, before C23 you can't have a function with a prototype like this in standard C:
void foo(...);
The reason the compiler cannot provide the necessary information is simply that the compiler is not involved here. The prototype of these functions doesn't specify the types, because the functions accept arguments of varying types. So the actual data types are not determined at compile time, but at runtime.
The function then takes one argument after the other from the stack. These values don't have any type information associated with them, so the only way the function knows how to interpret the data is by using the information provided by the caller, which is the format string.
The functions themselves don't know which data types are passed in, nor do they know the number of arguments passed, so there is no way that printf can decide this on its own.
In C++ you can use function overloading, but this is an entirely different mechanism, because there the compiler chooses the appropriate function based on the data types and the available overloads.
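To illustrate this, a call to printf, when compiled for a stack-based calling convention such as 32-bit x86 cdecl, looks like this (arguments are pushed right to left):
push valueN
...
push value1
push format_string
call _printf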
And the prototype of printf is this:
int printf ( const char * format, ... );
So there is no type information carried over, except what is provided in the format string.
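To make the callee's side concrete, here is a minimal sketch of a formatter that walks its format string and fetches each argument according to what the string says. The name mini_format and the restriction to %d and %s are assumptions for illustration; a real printf handles far more.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical mini formatter: understands only %d and %s. */
int mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return -1;
    out[0] = '\0';
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < size; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            /* only the format string says that an int comes next */
            used += (size_t)snprintf(out + used, size - used, "%d", va_arg(ap, int));
            p++;
        } else if (p[0] == '%' && p[1] == 's') {
            /* ... or that a char * comes next */
            used += (size_t)snprintf(out + used, size - used, "%s", va_arg(ap, char *));
            p++;
        } else {
            out[used++] = *p;   /* ordinary character: copy it */
            out[used] = '\0';
        }
        if (used >= size)       /* output was truncated */
            used = size - 1;
    }
    va_end(ap);
    return (int)used;
}
```

If the format string says %s but the caller passed an int, the va_arg call reads the wrong type and the behaviour is undefined, which is exactly why the caller has to get the format right.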
printf is not an intrinsic function. It's not part of the C language per se. All the compiler does is generate code to call printf, passing whatever parameters. Now, because C does not provide reflection as a mechanism to figure out type information at run time, the programmer has to explicitly provide the needed info.
The compiler may be smart, but the functions printf and scanf are stupid: they do not know the type of the parameters you pass on each call. This is why you need to pass %s or %d every time.
The first parameter is a format string. If you're printing a decimal number, it may look like:
"%d" (decimal number)
"%5d" (decimal number padded to width 5 with spaces)
"%05d" (decimal number padded to width 5 with zeros)
"%+d" (decimal number, always with a sign)
"Value: %d\n" (some content before/after the number)
etc, see for example Format placeholders on Wikipedia to have an idea what format strings can contain.
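The placeholders listed above can be tried out with snprintf, which formats into a buffer instead of printing. A small sketch, with the function name format_examples and the buffer layout chosen only for illustration:

```c
#include <stdio.h>

/* Hypothetical demo: format the same int with the placeholders listed above. */
void format_examples(int value, char out[4][16])
{
    snprintf(out[0], sizeof out[0], "%d", value);    /* plain decimal */
    snprintf(out[1], sizeof out[1], "%5d", value);   /* width 5, padded with spaces */
    snprintf(out[2], sizeof out[2], "%05d", value);  /* width 5, padded with zeros */
    snprintf(out[3], sizeof out[3], "%+d", value);   /* always with a sign */
}
```

The value is the same in all four calls; only the format string decides how it is rendered.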
Also there can be more than one parameter here:
"%s - %d" (a string, then some content, then a number)
Isn't the compiler matured enough to identify that when I declared my variable?
No.
You're using a language specified decades ago. Don't expect modern design aesthetics from C, because it's not a modern language. Modern languages will tend to trade a small amount of efficiency in compilation, interpretation or execution for an improvement in usability or clarity. C hails from a time when computer processing time was expensive and in highly limited supply, and its design reflects this.
It's also why C and C++ remain the languages of choice when you really, really care about being fast, efficient or close to the metal.
scanf, whose prototype is int scanf ( const char * format, ... );, reads data according to the format parameter and stores it into the locations pointed to by the additional arguments.
This has nothing to do with the compiler; it is all about the interface defined for scanf. The format parameter is required to tell scanf how to parse the input and what type of object each additional pointer refers to.
GCC (and possibly other C compilers) keep track of argument types, at least in some situations. But the language is not designed that way.
The printf function is an ordinary function which accepts variable arguments. Variable arguments require some kind of run-time-type identification scheme, but in the C language, values do not carry any run time type information. (Of course, C programmers can create run-time-typing schemes using structures or bit manipulation tricks, but these are not integrated into the language.)
When we develop a function like this:
void foo(int a, int b, ...);
we can pass "any" number of additional arguments after the second one, and it is up to us to determine how many there are and what are their types using some sort of protocol which is outside of the function passing mechanism.
For instance if we call this function like this:
foo(1, 2, 3.0);
foo(1, 2, "abc");
there is no way that the callee can distinguish the cases. There are just some bits in a parameter passing area, and we have no idea whether they represent a pointer to character data or a floating point number.
The possibilities for communicating this type of information are numerous. For example in POSIX, the exec family of functions use variable arguments which have all the same type, char *, and a null pointer is used to indicate the end of the list:
#include <stdarg.h>
void my_exec(char *progname, ...)
{
va_list variable_args;
va_start (variable_args, progname);
for (;;) {
char *arg = va_arg(variable_args, char *);
if (arg == 0)
break;
/* process arg */
}
va_end(variable_args);
/*...*/
}
If the caller forgets to pass a null pointer terminator, the behavior will be undefined because the function will keep invoking va_arg after it has consumed all of the arguments. Our my_exec function has to be called like this:
my_exec("foo", "bar", "xyzzy", (char *) 0);
The cast on the 0 is required because there is no context for it to be interpreted as a null pointer constant: the compiler has no idea that the intended type for that argument is a pointer type. Furthermore (void *) 0 isn't correct because it will simply be passed as the void * type and not char *, though the two are almost certainly compatible at the binary level so it will work in practice. A common mistake with that type of exec function is this:
my_exec("foo", "bar", "xyzzy", NULL);
where the compiler's NULL happens to be defined as 0 without any (void *) cast.
Another possible scheme is to require the caller to pass down a number which indicates how many arguments there are. Of course, that number could be incorrect.
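A sketch of that count-based scheme, assuming a hypothetical function mean_of that expects count arguments of type double:

```c
#include <stdarg.h>

/* Hypothetical helper: count says how many double arguments follow. */
double mean_of(int count, ...)
{
    va_list ap;
    double sum = 0.0;

    if (count <= 0)
        return 0.0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);  /* float arguments arrive promoted to double */
    va_end(ap);
    return sum / count;
}
```

Note that mean_of(2, 1, 2) would be wrong even though the count is right: the literals 1 and 2 are passed as int, the function reads them as double, and nothing in the language catches the mismatch.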
In the case of printf, the format string describes the argument list. The function parses it and extracts the arguments accordingly.
As mentioned at the outset, some compilers, notably the GNU C Compiler, can parse format strings at compile time and perform static type checking against the number and types of arguments.
However, note that a format string can be other than a literal, and may be computed at run time, which is impervious to such type checking schemes. Fictitious example:
char *fmt_string = message_lookup(current_language, message_code);
/* no type checking from gcc in this case: fmt_string could have
four conversion specifiers, or ones not matching the types of
arg1, arg2, arg3, without generating any diagnostic. */
snprintf(buffer, sizeof buffer, fmt_string, arg1, arg2, arg3);
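For your own printf-like wrappers, GCC and Clang let you opt in to the same compile-time checking through the format function attribute. A minimal sketch, assuming a hypothetical wrapper format_message and a helper macro PRINTF_LIKE that expands to nothing on other compilers:

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
/* GCC/Clang extension: check arguments against a printf-style format. */
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical wrapper: argument 3 is the format, checking starts at argument 4. */
int format_message(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

With the attribute in place, a call with a literal format and mismatched arguments should draw a warning when format checking is enabled (for example with -Wall); a format computed at run time still goes unchecked.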
It is because this is the only way to tell functions like printf and scanf which type of value you are passing. For example:
#include <stdio.h>
int main(void)
{
int i=22;
printf("%c",i);
return 0;
}
This code will print the character whose code is 22 (a non-printing control character in ASCII), not the text 22, because you have told printf to treat the value as a char.
printf and scanf are I/O functions that are designed and defined to receive a control string and a list of arguments.
The functions do not know the types of the parameters passed to them, and the compiler can't pass this information to them either.
Because in the printf you're not specifying data type, you're specifying data format. This is an important distinction in any language, and it's doubly important in C.
When you scan in a string with %s, you're not saying "parse a string input for my string variable." You can't say that in C because C doesn't have a string type. The closest thing C has to a string variable is a fixed-size character array that happens to contain characters representing a string, with the end of the string indicated by a null character. So what you're really saying is "here's an array to hold the string, I promise it's big enough for the string input I want you to parse."
Primitive? Of course. C was invented over 40 years ago, when a typical machine had at most 64K of RAM. In such an environment, conserving RAM had a higher priority than sophisticated string manipulation.
Still, the %s scanner persists in more advanced programming environments, where there are string data types. Because it's about scanning, not typing.
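Since %s is only a promise that the array is big enough, it is safer to put the limit into the format itself. A minimal sketch using sscanf (so that it can be tried on a fixed string instead of keyboard input); the function name parse_person is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: name must point to at least 80 bytes. */
int parse_person(const char *line, char name[80], int *age)
{
    /* %79s stores at most 79 characters plus the terminating null;
       %d needs the address of an int. Returns 1 when both fields were read. */
    return sscanf(line, "%79s %d", name, age) == 2;
}
```

The width 79 has to be kept in step with the array size by hand, because the scanner has no way of seeing how large the destination really is.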
I am writing a generic test function that will accept a function address (read from a map file) and arguments as comma separated data as arguments from a socket.
I am able to implement it for known function pointers.
like
void iif(int a, int b, float f);
typedef void (*fn_t)(int a, int b, float f);
With above approach I would write function pointers for all types of function implementation in the code base. Is there any generic way to do this?
No, since the compiler needs to know how to represent the arguments. It can't know that for a function pointer type that excludes the information, and thus it can't generate the call.
Functions with a small number of parameters might pass them in CPU registers, "spilling over" to the stack when many parameters are called for, for instance.
You can use varargs to work around this; doing so essentially "locks down" the way the arguments are passed. Of course, it forces the called functions to deal with varargs, which is not very convenient.
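Another way to lock down the calling interface, without varargs, is to give every testable function a small adapter with one fixed signature that takes the arguments as text. This is only a sketch of that idea: add3, add3_adapter, the lookup table and call_by_name are all hypothetical names, and looking functions up by name stands in for whatever the map file provides.

```c
#include <stdlib.h>
#include <string.h>

/* Every adapter has the same signature, so one pointer type covers them all. */
typedef int (*adapter_t)(int argc, char **argv);

/* Hypothetical function under test. */
int add3(int a, int b, float f) { return a + b + (int)f; }

/* Adapter: converts the text arguments and makes an ordinary, fully typed call. */
int add3_adapter(int argc, char **argv)
{
    if (argc != 3)
        return -1;
    return add3(atoi(argv[0]), atoi(argv[1]), (float)atof(argv[2]));
}

struct entry { const char *name; adapter_t call; };

static const struct entry table[] = {
    { "add3", add3_adapter },
};

/* Look up an adapter by name and invoke it; returns -1 if the name is unknown. */
int call_by_name(const char *name, int argc, char **argv)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].call(argc, argv);
    return -1;
}
```

The cost is one adapter per function, but each real call is compiled with full type information, so the compiler knows exactly how to pass every argument.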
You can do the following.
fn_t fncptr;
fncptr= MapAddress + 0x(offset);
MapAddress is the address at which you mapped the file into memory. (You can cast to DWORD first, or DWORD_PTR on 64-bit builds, if the C++ compiler refuses to add an offset to a void pointer.) The offset is where the function's code is located in the file. But remember, on Windows the memory you call into must be executable, for example mapped with PAGE_EXECUTE_READWRITE. Then call it like this:
fncptr(arg1, arg2, arg3);
If the compiler rejects the first snippet, do this:
fn_t fncptr;
fncptr= (fn_t)((DWORD)MapAddress + 0x(offset));
Is declaring a header file essential? This code:
main()
{
int i=100;
printf("%d\n",i);
}
seems to work, the output that I get is 100. Even without using stdio.h header file. How is this possible?
You don't have to include the header file. Its purpose is to let the compiler know all the information about stdio, but it's by no means necessary if your compiler is smart (or lazy).
You should include it because it's a good habit to get into - if you don't, then the compiler has no real way to know if you're breaking the rules, such as with:
int main (void) {
puts (7); // should be a string.
return 0;
}
which compiles without issue but rightly dumps core when running. Changing it to:
#include <stdio.h>
int main (void) {
puts (7);
return 0;
}
will result in the compiler warning you with something like:
qq.c:3: warning: passing argument 1 of ‘puts’ makes pointer
from integer without a cast
A decent compiler may warn you about this, such as gcc knowing about what printf is supposed to look like, even without the header:
qq.c:7: warning: incompatible implicit declaration of
built-in function ‘printf’
How is this possible? In short: three pieces of luck.
This is possible because some compilers make assumptions about undeclared functions. Specifically, the function is assumed to return int, and its arguments are passed as given after the default argument promotions, with no checking. Since an int is often the same size as a char* (depending on the architecture), you can get away with passing ints and strings, as the correct size parameter will get pushed onto the stack.
In your example, since printf was not declared, the compiler simply passed a char* and an int the way it would for any unprototyped function, which happened to be "compatible" with how printf expects to be invoked. So the compiler shrugged and generated some code that should have been about right. (It really should have warned you about an undeclared function.)
So the first piece of luck was that the compiler's assumption was compatible with the real function.
Then at the linker stage, because printf is part of the C Standard Library, the compiler/linker will automatically include this in the link stage. Since the printf symbol was indeed in the C stdlib, the linker resolved the symbol and all was well. The linking was the second piece of luck, as a function anywhere other than the standard library will need its library linked in also.
Finally, at runtime we see your third piece of luck. The compiler made a blind assumption, the symbol happened to be linked in by default. But - at runtime you could have easily passed data in such a way as to crash your app. Fortunately the parameters matched up, and the right thing ended up occurring. This will certainly not always be the case, and I daresay the above would have probably failed on a 64-bit system.
So - to answer the original question, it really is essential to include header files, because if it works, it is only through blind luck!
As paxidiablo said, it's not necessary, but this is only true for functions and variables. If your header file provides types or macros (#define) that you use, then you must include the header file, because they are needed before linking happens, i.e. during preprocessing or compiling.
This is possible because when a C compiler (under pre-C99 rules) sees a call to an undeclared function (printf() in your case), it assumes the function has the
int printf()
signature, i.e. it returns int and takes unspecified arguments, and passes the arguments after the default argument promotions. Since the "int" and "void *" types often have the same size, it works most of the time. But it is not wise to rely on such behavior.
C supports three forms of function argument declarations:
Known fixed arguments: this is when you declare the function with its arguments: foo(int x, double y).
Unknown fixed arguments: this is when you declare it with empty parentheses: foo() (not to be confused with foo(void), which is the first form with no arguments), or do not declare it at all.
Variable arguments: this is when you declare it with an ellipsis: foo(int x, ...).
When you see a standard function working without its declaration, it is because the function definition (which is in form 1 or 3) happens to be compatible with form 2 (it uses the same calling convention). Many old standard library functions are like this by design, because they date from early versions of C, which had no function prototypes, so they were all in form 2. Other functions may be unintentionally compatible with form 2, if their parameter types match what the argument promotion rules produce for this form. But some may not be.
Form 2 also requires the programmer to pass arguments of the same types everywhere, because the compiler cannot check the arguments against a prototype and has to determine the calling convention from the arguments actually passed.
For example, on an MC68000 machine the first two integer arguments of fixed-argument functions (both forms 1 and 2) will be passed in registers D0 and D1, the first two pointers in A0 and A1, and all others on the stack. So, for example, the function fwrite(const void * ptr, size_t size, size_t count, FILE * stream); will get its arguments as: ptr in A0, size in D0, count in D1 and stream in A1 (and return a result in D0). When you include stdio.h, it will be so whatever you pass to it.
When you do not include stdio.h, another thing happens. When you call fwrite with fwrite(data, sizeof(*data), 5, myfile), the compiler looks at the arguments and sees that the function is called as fwrite(*, int, int, *). So what does it do? It passes the first pointer in A0, the first int in D0, the second int in D1 and the second pointer in A1, which is exactly what we need.
But when you try to call it as fwrite(data, sizeof(*data), 5.0, myfile), with count of type double, the compiler will try to pass count on the stack, as it is not an integer. But the function requires it in D1. The result: D1 contains some garbage instead of count, so further behaviour is unpredictable. But when you use the prototype defined in stdio.h, all will be OK: the compiler automatically converts this argument to the declared parameter type and passes it as needed. This is not an abstract example, as a double argument may simply be the result of a computation involving floating point numbers, and you may miss this by assuming the result is an int.
Another example is a variable-argument function (form 3) like printf(char *fmt, ...). For it, the calling convention requires the last named argument (fmt here) to be passed on the stack regardless of its type. So, when you call printf("%d", 10), it will put a pointer to "%d" and the number 10 on the stack and call the function as needed.
But when you do not include stdio.h, the compiler will not know that printf is a vararg function and will suppose that printf("%d", 10) is a call to a function with fixed arguments of type pointer and int. So on the MC68000 it will place the pointer in A0 and the int in D0 instead of on the stack, and the result is again unpredictable.
You may be lucky: the arguments may have been on the stack previously and happen to be read from there, so you get the correct result... this time... but another time it will fail. Another piece of luck is when the compiler allows for the possibility that an undeclared function is a vararg one (and somehow makes the call compatible with both forms). Or all arguments in all forms are simply passed on the stack on your machine, so the fixed, unknown and vararg forms are all called identically.
So: do not do this, even if you feel lucky and it works. The unknown fixed argument form is there only for compatibility with old code, and its use is strongly discouraged.
Also note: C++ will not allow this at all, as it requires functions to be declared with known arguments.
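The conversion that a prototype buys you can be seen in a few lines. A minimal sketch with made-up names (double_count, call_with_double), standing in for the fwrite case above:

```c
/* Prototype: the compiler now knows that count is a long. */
long double_count(long count);

long call_with_double(void)
{
    /* 5.0 is a double, but the prototype makes the compiler convert it to long
       before the call, so the callee receives what it expects. */
    return double_count(5.0);
}

long double_count(long count)
{
    return count * 2;
}
```

Without the prototype in scope, the same call would pass a double to a function expecting a long, which is undefined behaviour.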
I came across the following function signature and I wondered if this (the ellipsis, or "...") is some kind of polymorphism?
#include <fcntl.h>
int fcntl(int fd, int cmd, ... );
Thanks in advance.
It's a variable argument list.
That is a variadic function. See stdarg.h for more details.
The ... means that you can pass any number of arguments to this function, as other commenters have already mentioned. Since the optional arguments are not typed, the compiler cannot check the types and you can technically pass in any argument of any type.
So does this mean you can use this to implement some kind of polymorphic function? (I.e., a function that performs some operation based on the type of its arguments.)
No.
The reason you cannot do this, is because you cannot at runtime inspect the types of the arguments passed in. The function reading in the variable argument list is expected to already know the types of the optional arguments it is going to receive.
In the case of a function that really is supposed to be able to take any number of arguments of any type (e.g., printf), the types of the arguments are passed in via the format string. This means that the caller has to specify the types it is going to pass in at every invocation, removing the benefit of polymorphic functions (that the caller doesn't have to know the types either).
Compare:
// Ideal invocation
x = multiply(number_a, number_b)
y = multiply(matrix_a, matrix_b)
// Standard C invocation
x = multiply_number(number_a, number_b)
y = multiply_matrix(matrix_a, matrix_b)
// Simulated "polymorphism" with varargs
x = multiply(T_NUMBER, number_a, number_b)
y = multiply(T_MATRIX, matrix_a, matrix_b)
You have to specify the type before the varargs function can do the right thing, so this gains you nothing.
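For completeness, here is what the simulated variant could look like in real C. This is a sketch under assumptions: the tags T_INT and T_DOUBLE replace the T_NUMBER and T_MATRIX of the pseudocode, and the result is always returned as a double.

```c
#include <stdarg.h>

enum { T_INT, T_DOUBLE };   /* hypothetical type tags */

/* The tag tells the callee which types to fetch with va_arg. */
double multiply(int type, ...)
{
    va_list ap;
    double result = 0.0;

    va_start(ap, type);
    switch (type) {
    case T_INT: {
        int a = va_arg(ap, int);
        int b = va_arg(ap, int);
        result = (double)a * b;
        break;
    }
    case T_DOUBLE: {
        double a = va_arg(ap, double);
        double b = va_arg(ap, double);
        result = a * b;
        break;
    }
    }
    va_end(ap);
    return result;
}
```

A call like multiply(T_DOUBLE, 3, 4) compiles without complaint and then reads two ints as doubles, which illustrates why the tag buys you no safety over two separately named functions.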
No, that's the "ellipsis" you're seeing there, assuming you're referring to the ... part of the declaration.
Basically it says that this function takes an unknown number of arguments after the first two that are specified there.
The function has to be written in such a way that it knows what to expect, otherwise strange results will ensue.
For other functions that support this, look at the printf function and its variants.
Does C support polymorphism?
No, it doesn't.
However, there are several libraries, such as the Python C API, that implement a rough variant of polymorphism using structs and pointers. Beware that the compiler cannot perform appropriate type checking in most cases.
The technique is simple:
typedef struct {
char * (*to_string)();
} Type;
#define OBJ_HEADER Type *ob_type
typedef struct {
OBJ_HEADER;
} Object;
typedef struct {
OBJ_HEADER;
long ival;
} Integer;
typedef struct {
OBJ_HEADER;
char *name;
char *surname;
} Person;
Integer and Person get a Type object with appropriate function pointers (e.g. to functions like integer_to_string and person_to_string).
Now just declare a function accepting an Object *:
void print(Object *obj) {
    /* the field is named ob_type (see OBJ_HEADER); pass the object to its own method */
    printf("%s", obj->ob_type->to_string(obj));
}
now you can call this function with both an Integer and a Person:
Integer *i = make_int(10);
print((Object *) i);
Person *p = make_person("dfa");
print((Object *) p);
EDIT
alternatively you can declare i and p as Object *; of course make_int and make_person will allocate space for Integer and Person and do the appropriate cast:
Object *
make_int(long i) {
Integer *ob = malloc(sizeof(Integer));
ob->ob_type = &integer_type;
ob->ival = i;
return (Object *) ob;
}
NB: I cannot compile these examples right now, please double-check them.
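A compilable variation of the same idea is sketched below. It departs from the snippets above in two ways, both assumptions of this sketch: each concrete type embeds an Object as its first member instead of using the OBJ_HEADER macro (so the casts are guaranteed to be valid), and to_string writes into a caller-supplied buffer instead of returning a char *.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Object Object;

typedef struct {
    /* writes a text form of obj into buf */
    void (*to_string)(const Object *obj, char *buf, size_t size);
} Type;

struct Object { const Type *ob_type; };

typedef struct { Object base; long ival; } Integer;
typedef struct { Object base; const char *name; } Person;

static void integer_to_string(const Object *obj, char *buf, size_t size)
{
    snprintf(buf, size, "%ld", ((const Integer *)obj)->ival);
}

static void person_to_string(const Object *obj, char *buf, size_t size)
{
    snprintf(buf, size, "%s", ((const Person *)obj)->name);
}

static const Type integer_type = { integer_to_string };
static const Type person_type = { person_to_string };

Object *make_int(long i)
{
    Integer *ob = malloc(sizeof *ob);
    if (ob == NULL)
        return NULL;
    ob->base.ob_type = &integer_type;
    ob->ival = i;
    return &ob->base;
}

Object *make_person(const char *name)
{
    Person *ob = malloc(sizeof *ob);
    if (ob == NULL)
        return NULL;
    ob->base.ob_type = &person_type;
    ob->name = name;
    return &ob->base;
}

/* Dispatches through the function pointer stored in the object's Type. */
void object_to_string(const Object *obj, char *buf, size_t size)
{
    obj->ob_type->to_string(obj, buf, size);
}
```

The caller only ever sees Object *, and object_to_string picks the right implementation at run time; the objects are released with free as usual.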
I came across the following function signature and I wondered if this (the ellipsis, or "...") is some kind of polymorphism?
Yes, it is a primitive form of polymorphism. With only one function signature you are able to pass various structures. However, the compiler cannot help you with detecting type errors.
Adding to what's been said: C supports polymorphism through other means. For example, take the standard library qsort function which sorts data of arbitrary type.
It is able to do so by means of untyped (void) pointers to the data. It also needs to know the size of the data to sort (provided via sizeof) and the logic that compares the objects' order. This is accomplished by passing a function pointer to the qsort function.
This is a prime example of runtime polymorphism.
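A minimal sketch of such a qsort call, with the wrapper name sort_ints and the comparator compare_ints chosen for illustration:

```c
#include <stdlib.h>

/* Comparison callback: qsort hands over untyped pointers to two elements. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* negative, zero or positive, without overflow */
}

/* Hypothetical wrapper that sorts an int array in ascending order. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

qsort itself never learns that the elements are ints; the element size and the comparator are the only type knowledge it gets, and swapping in a different comparator changes its behaviour at run time.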
There are other ways to implement object-oriented behaviour (in particular, virtual function calls) by managing the virtual function tables manually. This can be done by storing function pointers in structures and passing them around. Many APIs do so, e.g. the WinAPI, which even uses advanced aspects of object orientation, e.g. base class call dispatch (DefWindowProc, to simulate calling the virtual method of the base class).
I assume you are referring to the ellipsis (...)? If so this indicates that 0 or more parameters will follow. It is called varargs, defined in stdarg.h
http://msdn.microsoft.com/en-us/library/kb57fad8.aspx
printf uses this functionality. Without it you wouldn't be able to keep adding parameters to the end of the function.
C supports a crude form of polymorphism, i.e. a type being able to appear and behave as another type. It works in a similar way to C++ under the hood (relying on the memory layout of the two types matching), but you have to help the compiler out by casting. E.g. you can define a struct:
typedef struct {
char forename[20];
char surname[20];
} Person;
And then another struct:
typedef struct {
char forename[20];
char surname[20];
float salary;
char managername[20];
} Employee;
Then
#include <string.h>

void setpersonname(Person *person); /* declare before use */

int main (int argc, char *argv[])
{
    Employee Ben;
    setpersonname((Person *) &Ben);
    return 0;
}
void setpersonname(Person *person)
{
    strcpy(person->forename,"Ben");
}
The above example shows Employee being used as a Person.
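A variation that does not depend on the two structs happening to have the same layout is to embed the Person as the first member of Employee, since C guarantees that a pointer to a struct can be converted to a pointer to its first member. A sketch, with the member name base and the function set_person_name made up for illustration:

```c
#include <string.h>

typedef struct {
    char forename[20];
    char surname[20];
} Person;

typedef struct {
    Person base;          /* must be the first member */
    float salary;
    char managername[20];
} Employee;

/* Works on any Person, including the one embedded in an Employee. */
void set_person_name(Person *person, const char *forename)
{
    strncpy(person->forename, forename, sizeof person->forename - 1);
    person->forename[sizeof person->forename - 1] = '\0';
}
```

Both set_person_name(&ben.base, "Ben") and set_person_name((Person *)&ben, "Ben") are then valid for an Employee ben.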
No, it is a function that is taking variable number of arguments.
That is not technically polymorphism. fcntl takes variable number of arguments & that is the reason for the ... similar to printf function.
C neither supports function overloading - which is a type of ad-hoc polymorphism based on compile-time types - nor multiple dispatch (ie overloading based on runtime types).
To simulate function overloading in C, you have to create multiple differently named functions. The functions' names often contain the type information, eg fputc() for characters and fputs() for strings.
Multiple dispatch can be implemented by using variadic functions. Again, it's the programmer's job to provide the type information, but this time via an extra argument, which will be evaluated at runtime, in contrast to the compile-time function name in the approach given above. The printf() family of functions might not be the best example for multiple dispatch, but I can't think of a better one right now.
Other approaches to multiple dispatch using pointers instead of variadic functions or wrapping values in structures to provide type annotations exist.
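For the compile-time case, it may be worth knowing that C11 added _Generic, which lets a macro pick one of the differently named functions from the static type of its argument. A minimal sketch; the describe_* functions and the describe macro are hypothetical names:

```c
/* Differently named functions, one per type, as described above. */
const char *describe_int(int value)            { (void)value; return "int"; }
const char *describe_double(double value)      { (void)value; return "double"; }
const char *describe_string(const char *value) { (void)value; return "string"; }

/* C11 type-generic macro: selects a function from the static type of x. */
#define describe(x) _Generic((x),          \
        int:          describe_int,         \
        double:       describe_double,      \
        char *:       describe_string,      \
        const char *: describe_string)(x)
```

The selection happens entirely at compile time, so this is closer to overloading than to dispatch on runtime types, and a type missing from the list is a compile error.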
The printf declaration in the standard library is
int printf(const char*, ...);
Think about that.
You can write code that supports Polymorphic behavior in C, but the ... (ellipsis) is not going to be much help. That is for variable arguments to a function.
If you want polymorphic behavior you can use, unions and structures to construct a data structure that has a "type" section and variable fields depending on type. You can also include tables of function pointers in the structures. Poof! You've invented C++.
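A minimal sketch of such a structure with a "type" section, using made-up names (Value, V_INT, V_DOUBLE, value_as_double):

```c
typedef enum { V_INT, V_DOUBLE } ValueType;   /* the "type" section */

typedef struct {
    ValueType type;
    union {                 /* variable fields, selected by type */
        long i;
        double d;
    } as;
} Value;

Value value_from_int(long i)      { Value v; v.type = V_INT;    v.as.i = i; return v; }
Value value_from_double(double d) { Value v; v.type = V_DOUBLE; v.as.d = d; return v; }

/* Behaviour depends on the runtime tag, not on the static type. */
double value_as_double(Value v)
{
    switch (v.type) {
    case V_INT:    return (double)v.as.i;
    case V_DOUBLE: return v.as.d;
    }
    return 0.0;
}
```

Unlike a bare variadic argument, the value carries its own tag, so the callee can check what it was actually given.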
Yes, C does support polymorphism.
Code that we write in C++ using virtual to implement polymorphism
is first converted to C code by some compilers (one can find details here).
It's well known that virtual functionality in C++ is implemented using function pointers.