I have a 3rd party function with signature:
int secretfoo(int numargs, ...);
I can call it directly, but what I really want is to wrap it in my own function that adds some extra arguments.
Assume the simple case of integers: I want a call secretfoo(2, 10, 20) to be translated like this: whenever I see the argument 10, I duplicate it and make the call secretfoo(3, 10, 10, 20). I want to do it in a wrapper:
int foowrapper(int numargs, ...);
This wrapper analyzes the arguments and calls secretfoo as described above.
Can this be done portably with va_list / va_arg etc.? Is there any other way?
There is no portable way to manipulate the arguments in a variable argument list directly, because it is highly platform dependent how such arguments are passed into the function. And on most hardware architectures, there is absolutely no way to insert additional arguments in the middle or the end of the list.
If there is a practical upper limit to the number of arguments, then it could be done by extracting all the arguments to foowrapper and 'manually' building the new argument list for the call to secretfoo.
The code would look something like this:
int foowrapper(int numarg, ...)
{
    va_list args;
    /* worst case: every argument is duplicated; +1 avoids a zero-length array */
    int newargs[numarg * 2 + 1];
    int numnewargs = 0;
    /* Extract the arguments */
    va_start(args, numarg);
    for (int i = 0; i < numarg; i++)
    {
        newargs[numnewargs++] = va_arg(args, int);
        /* duplicate value 10 as you encounter it */
        if (newargs[numnewargs - 1] == 10)
        {
            newargs[numnewargs++] = 10;
        }
    }
    va_end(args);
    /* Forward to the secretfoo function */
    switch (numnewargs)
    {
    case 0: return secretfoo(0);
    case 1: return secretfoo(1, newargs[0]);
    case 2: return secretfoo(2, newargs[0], newargs[1]);
    /* etc... */
    }
    return -1; /* more arguments than the switch handles */
}
I'm afraid it can't be done portably. stdarg.h "defines four macros" (latest C standard draft): va_start, va_end, va_arg and va_copy. None of these can be used to convert a va_list back to a variable number of values, other than one-by-one.
Your third party library should have supplied a function vsecretfoo(int, va_list), like the standard library does for these cases (vprintf, etc.).
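To make the vsecretfoo suggestion concrete, here is a minimal sketch of the pattern, assuming a library that sums its int arguments. The names vsum_ints, sum_ints and sum_ints_doubled are hypothetical and only stand in for vsecretfoo, secretfoo and foowrapper. Note that even with a v-variant, a wrapper can only forward the list unchanged; it still cannot insert values into it.

```c
#include <stdarg.h>

/* Hypothetical va_list-taking core: sums n int arguments. */
int vsum_ints(int n, va_list ap)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    return total;
}

/* Variadic front end: only converts "..." into a va_list. */
int sum_ints(int n, ...)
{
    va_list ap;
    va_start(ap, n);
    int total = vsum_ints(n, ap);
    va_end(ap);
    return total;
}

/* A wrapper can forward its own arguments because the v-variant exists. */
int sum_ints_doubled(int n, ...)
{
    va_list ap;
    va_start(ap, n);
    int total = vsum_ints(n, ap);
    va_end(ap);
    return 2 * total;
}
```

Called as sum_ints_doubled(2, 10, 20), the wrapper hands both values to vsum_ints and doubles the result.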
I am trying to understand how variable length arguments work in C.
Basically, when a variadic function (e.g. printf(const char *format, ...);) is called, where are the arguments copied (stack or registers)? And how does the called function get information about the arguments passed by the caller?
I highly appreciate any form of help.
Thanks in advance.
Variable argument lists are a standard feature of the C language, so they must be supported on every machine for which a C compiler exists.
That holds regardless of how parameters are passed: in registers, on the stack, or both.
What the implementation really needs is determinism. It does not matter whether parameters are passed on the stack, in registers, in both, or in some MCU-specific way; what matters is that the mechanism is well defined and always the same.
If this property holds, we can always walk the parameter list and access each parameter.
The method used to pass parameters on each machine or system is specified in its ABI (Application Binary Interface, see https://en.wikipedia.org/wiki/Application_binary_interface). By following those rules in reverse, you can always recover the parameters.
However, on the vast majority of systems the ABI rules alone are not sufficient to recover a parameter, for example when its size differs from the standard CPU register or stack slot size. In that case you need one more piece of information about the parameter you are looking for: the operand size.
Let us review variable parameter handling in C. First you declare a function with a single int parameter that holds the number of variable arguments, followed by three dots for the variable part:
int foo(int cnt, ...);
To access the variable arguments you normally use the definitions in the <stdarg.h> header, as follows:
int foo(int cnt, ...)
{
va_list ap; //pointer used to iterate through parameters
int i, val;
va_start(ap, cnt); //Initialize pointer to the last known parameter
for (i=0; i<cnt; i++)
{
val = va_arg(ap, int); //Retrieve next parameter using pointer and size
printf("%d ", val); // Print parameter, an integer
}
va_end(ap); //Release pointer. Normally do_nothing
putchar('\n');
}
On a stack-based machine (e.g. 32-bit x86), where the parameters are pushed sequentially, the code above works more or less like the following:
int foo(int cnt, ...)
{
char *ap; //pointer used to iterate through parameters
int i, val;
ap = (char *)&cnt; //Initialize pointer to the last known parameter
for (i=0; i<cnt; i++)
{
/*
* We are going to update pointer to next parameter on the stack.
* Note that here we simply add the size of int to the pointer because
* the stack word size is normally the same as the natural integer size
* for that machine. If we use a different type we **must** adjust the
* pointer to the correct stack boundary by rounding up to the next
* multiple of the stack word size.
*/
ap = (ap + sizeof(int));
val = *((int *)ap); //Retrieve next parameter using pointer and size
printf("%d ", val); // Print parameter, an integer
}
putchar('\n');
}
Please note that if we access types other than int, and/or types whose size differs from the native stack word size, the pointer must always advance by a multiple of the stack word size.
Now consider a machine that uses registers to pass parameters. For simplicity we assume that no operand is larger than a register and that registers are allocated sequentially (also note the pseudo-assembler instruction mov val, rx, which loads the variable val with the contents of register rx):
int foo(int cnt, ...)
{
int ap; //pointer used to iterate through parameters
int i, val;
/*
* Initialize the index to the last known parameter, in our
* case the first in the list (cnt itself arrives in r1)
*/
ap = 1;
for (i=0; i<cnt; i++)
{
/*
* Retrieve next parameter
* The code below obviously isn't real code, but should give the idea.
*/
ap++; //Next parameter
switch(ap)
{
case 1:
__asm mov val, r1; //Get value from register
break;
case 2:
__asm mov val, r2;
break;
case 3:
__asm mov val, r3;
break;
.....
case n:
__asm mov val, rn;
break;
}
printf("%d ", val); // Print parameter, an integer
}
putchar('\n');
}
Hope the concept is clear enough now.
Traditionally, the arguments were "always" pushed on the stack, regardless of other register-passing optimisations, and va_list was basically just a pointer into the stack identifying the next argument for va_arg. However, register passing is so favoured on newer processors and at common compiler optimisation settings that even variadic arguments are passed in registers.
With this, va_list becomes a small data structure (or a pointer to that data structure) which captures all those register arguments and also has a pointer into the stack, for the case where there are too many arguments to fit in registers. The va_arg macro first steps through the captured registers, then steps through the stack entries, so va_list also has a "current index".
Note that at least in the gcc implementation for x86-64, va_list is a hybrid object: declared in a function body it is an instance of the structure, but passed as an argument it becomes a pointer. This happens because va_list is defined there as a one-element array of the structure, and arrays decay to pointers when passed to functions. The effect resembles a C++ reference even though C has no references.
On some platforms va_list may also allocate dynamic memory, which is why you should always call va_end.
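Because a va_list carries a "current index", walking the arguments twice needs a second, independent copy made with va_copy, and each copy needs its own va_end. The following is a minimal sketch under that assumption; the function name variance is hypothetical, and callers are assumed to pass exactly n arguments of type double.

```c
#include <stdarg.h>

/* Population variance of n double arguments, computed in two passes. */
double variance(int n, ...)
{
    va_list first, second;
    double mean = 0.0, squares = 0.0;

    if (n <= 0)
        return 0.0;

    va_start(first, n);
    va_copy(second, first);   /* independent cursor for the second pass */

    /* Pass 1: mean */
    for (int i = 0; i < n; i++)
        mean += va_arg(first, double);
    mean /= n;
    va_end(first);

    /* Pass 2: squared deviations from the mean */
    for (int i = 0; i < n; i++) {
        double d = va_arg(second, double) - mean;
        squares += d * d;
    }
    va_end(second);           /* every va_start/va_copy gets a va_end */

    return squares / n;
}
```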
where the arguments are copied (stack/register?)?
It varies. On x64 normal conventions are used: the first few arguments (depending on type) probably go into registers, and other arguments go onto the stack. The C standard requires that the compiler support at least 127 arguments to a function, so it's inevitable that some of them are going to go on the stack.
how the called function gets the information about the arguments passed by calling function?
By using the initial arguments, such as the printf format string. The varargs support facilities in C don't allow the function to inspect the number and types of arguments, only to get them one at a time (and if they are read with the wrong type, or if more arguments are accessed than were passed, the result is undefined behavior).
Most implementations push the arguments on the stack; using registers won't work well on register-starved architectures, or in general when there are more arguments than registers.
And the called function doesn't know anything at all about the arguments, their count or their types. That's why e.g. printf and related functions use format specifiers. The called function will then interpret the next part of the stack according to that format specifier (using the va_arg "function").
If the type fetched by va_arg doesn't match the actual type of the argument, you will have undefined behavior.
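One common source of such mismatches is the default argument promotions: values passed through ... are promoted, so a float arrives as a double and a char or short arrives as an int. The sketch below illustrates reading them back with the promoted types; sum_reals and count_char are hypothetical names chosen for this example.

```c
#include <stdarg.h>

/* Sums n floating-point arguments. Callers may pass float or double;
   both arrive as double, so va_arg must ask for double. */
double sum_reals(int n, ...)
{
    va_list ap;
    double total = 0.0;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, double);   /* never va_arg(ap, float) */
    va_end(ap);
    return total;
}

/* Counts how many of the n character arguments equal target.
   Characters arrive as int, so they are fetched as int and narrowed. */
int count_char(char target, int n, ...)
{
    va_list ap;
    int hits = 0;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        if ((char)va_arg(ap, int) == target)
            hits++;
    va_end(ap);
    return hits;
}
```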
The method used to store the arguments is defined by the ABI document of each architecture. The following is extracted from one such document.
Reference Link: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf (page number 56).
The Register Save Area:
The prologue of a function taking a variable argument list and known to call the macro va_start is expected to save the argument registers to the register save area. Each argument register has a fixed offset in the register save area.
C has standard mechanisms to access those parameters. The macros are defined in stdarg.h:
http://www.cse.unt.edu/~donr/courses/4410/NOTES/stdarg/
Here is a very simple implementation of sniprintf:
int ts_formatstring(char *buf, size_t maxlen, const char *fmt, va_list va)
{
char *start_buf = buf;
maxlen--;
while(*fmt && maxlen)
{
/* Character needs formatting? */
if (*fmt == '%')
{
switch (*(++fmt))
{
case 'c':
*buf++ = va_arg(va, int);
maxlen--;
break;
case 'd':
case 'i':
{
signed int val = va_arg(va, signed int);
if (val < 0)
{
val *= -1;
*buf++ = '-';
maxlen--;
}
maxlen = ts_itoa(&buf, val, 10, maxlen);
}
break;
case 's':
{
char * arg = va_arg(va, char *);
while (*arg && maxlen)
{
*buf++ = *arg++;
maxlen--;
}
}
break;
case 'u':
maxlen = ts_itoa(&buf, va_arg(va, unsigned int), 10, maxlen);
break;
case 'x':
case 'X':
maxlen = ts_itoa(&buf, va_arg(va, int), 16, maxlen);
break;
case '%':
*buf++ = '%';
maxlen--;
break;
}
fmt++;
}
/* Else just copy */
else
{
*buf++ = *fmt++;
maxlen--;
}
}
*buf = 0;
return (int)(buf - start_buf);
}
int sniprintf(char *buf, size_t maxlen, const char *fmt, ...)
{
int length;
va_list va;
va_start(va, fmt);
length = ts_formatstring(buf, maxlen, fmt, va);
va_end(va);
return length;
}
It is from the Atollic studio tiny printf.
All the mechanisms (including passing the parameter list to another function) are shown here.
Backstory
I'm porting the QuickCheck unit test framework to C (see the working code at GitHub). The syntax will be:
for_all(property, gen1, gen2, gen3 ...);
Where property is a function to test, for example bool is_odd(int). gen1, gen2, etc. are functions that generate input values for property. Some generate integers, some generate chars, some generate strings, and so on.
for_all will accept a function with arbitrary inputs (any number of arguments, any types of arguments). for_all will run the generators, creating test values to pass to the property function. For example, the property is_odd is a function with type bool f(int). for_all will use the generators to create 100 test cases. If the property returns false for any of them, for_all will print the offending test case values. Otherwise, for_all will print "SUCCESS".
Thus for_all should use a va_list to access the generators. Once we call the generator functions, how do we pass their results to the property function?
Example
If is_odd has the type bool f(int), how would we implement a function apply() that has this syntax:
apply(is_odd, generated_values);
Secondary Issue
See SO.
How can we intelligently print the arbitrary values of a failing test case? A test case may be a single integer, or two characters, or a string, or some combination of the above. We won't know ahead of time whether to use:
printf("%d %d %d\n", some_int, some_int, some_int);
printf("%c\n", a_character);
printf("%s%s\n", a_string, a_struct_requiring_its_own_printf_function);
C is a statically typed language. It does not have the runtime reflection that other languages do, and it does not provide a way to build arbitrary function calls from types supplied at run time. You need some way of knowing the signature of is_odd: how many parameters it accepts and what their types are. A variadic function does not even know when it has reached the end of its ... argument list; you need an explicit terminator.
enum function_signature {
returns_bool_accepts_int,
returns_bool_accepts_float,
returns_bool_accepts_int_int,
};
typedef bool (*function_returning_bool_accepting_int)(int);
typedef int (*function_generates_int)();
void for_all(enum function_signature signature, ...)
{
    va_list ap;
    va_start(ap, signature);
    switch (signature)
    {
    case returns_bool_accepts_int:
        {
            function_returning_bool_accepting_int fn = va_arg(ap, function_returning_bool_accepting_int);
            function_generates_int generator;
            /* the generator list must end with a null function pointer */
            do {
                generator = va_arg(ap, function_generates_int);
                if (generator) fn(generator());
            } while (generator);
        }
        break;
    /* ... etc ... */
    }
    va_end(ap);
}
Your problem is that QuickCheck was designed to take advantage of JavaScript's highly dynamic programmability, something missing from C.
Update: if you allow arbitrary function signatures, then you need a way to make it static again, say, by making the caller provide the appropriate adapters.
typedef void (*function_pointer)();
typedef bool (*function_applicator)(function_pointer, function_pointer);
void for_all(function_applicator apply, ...)
{
    va_list ap;
    va_start(ap, apply);
    function_pointer target = va_arg(ap, function_pointer);
    function_pointer generator;
    do {
        generator = va_arg(ap, function_pointer);
        if (generator) apply(target, generator);
    } while (generator);
    va_end(ap);
}
// sample caller
typedef bool (*function_returning_bool_accepting_int)(int);
typedef int (*function_returning_int)();
bool apply_one_int(function_pointer target_, function_pointer generator_)
{
    function_returning_bool_accepting_int target = (function_returning_bool_accepting_int)target_;
    function_returning_int generator = (function_returning_int)generator_;
    return target(generator());
}
// inside the test code; every variadic argument is cast to the type va_arg reads
for_all(apply_one_int, (function_pointer)is_odd, (function_pointer)generated_values1, (function_pointer)generated_values2, (function_pointer)0);
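For the secondary issue (printing a failing test case whose types are not known in advance), one option consistent with the point about explicit type information is to make the caller pass a type tag before each value. This is only a sketch under that assumption: value_tag, its TAG_ constants and format_values are hypothetical names, not part of any existing QuickCheck port.

```c
#include <stdarg.h>
#include <stdio.h>

enum value_tag { TAG_INT, TAG_CHAR, TAG_STRING };

/* Writes count tagged values into buf, separated by spaces.
   The variadic arguments come in (tag, value) pairs. Returns buf. */
char *format_values(char *buf, size_t size, int count, ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return buf;
    buf[0] = '\0';

    va_start(ap, count);
    for (int i = 0; i < count && used < size; i++) {
        enum value_tag tag = (enum value_tag)va_arg(ap, int);
        const char *sep = (i == 0) ? "" : " ";
        int n = 0;

        switch (tag) {
        case TAG_INT:
            n = snprintf(buf + used, size - used, "%s%d", sep, va_arg(ap, int));
            break;
        case TAG_CHAR:   /* char is promoted to int when passed through "..." */
            n = snprintf(buf + used, size - used, "%s%c", sep, va_arg(ap, int));
            break;
        case TAG_STRING:
            n = snprintf(buf + used, size - used, "%s%s", sep, va_arg(ap, const char *));
            break;
        }
        if (n < 0)
            break;       /* encoding error: stop formatting */
        used += (size_t)n;
    }
    va_end(ap);
    return buf;
}
```

A failing case made of an int, a char and a string could then be reported with format_values(buf, sizeof buf, 3, TAG_INT, 42, TAG_CHAR, 'x', TAG_STRING, "hi").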
PHP has a func_get_args() for getting all function arguments, and JavaScript has the arguments object.
I've written a very simple max() in C:
int max(int a, int b) {
if (a > b) {
return a;
} else {
return b;
}
}
I'm pretty sure in most languages you can supply any number of arguments to their max() (or equivalent) built in. Can you do this in C?
I thought this question may have been what I wanted, but I don't think it is.
Please keep in mind I'm still learning too. :)
Many thanks.
You could write a variable-arguments function that takes the number of arguments, for example
#include <stdio.h>
#include <stdarg.h>
int sum(int numArgs, ...)
{
va_list args;
va_start(args, numArgs);
int ret = 0;
for(int i = 0; i < numArgs; ++i)
{
ret += va_arg(args, int);
}
va_end(args);
return ret;
}
int main()
{
printf("%d\n", sum(4, 1,3,3,7)); /* prints 14 */
}
The function assumes that each variable argument is an integer (see va_arg call).
Yes, C has the concept of variadic functions, which is similar to the way printf() allows a variable number of arguments.
A maximum function would look something like this:
#include <stdio.h>
#include <stdarg.h>
#include <limits.h>
static int myMax (int quant, ...) {
va_list vlst;
int i;
int num;
int max = INT_MIN;
va_start (vlst, quant);
for (i = 0; i < quant; i++) {
if (i == 0) {
max = va_arg (vlst, int);
} else {
num = va_arg (vlst, int);
if (num > max) {
max = num;
}
}
}
va_end (vlst);
return max;
}
int main (void) {
printf ("Maximum is %d\n", myMax (5, 97, 5, 22, 5, 6));
printf ("Maximum is %d\n", myMax (0));
return 0;
}
This outputs:
Maximum is 97
Maximum is -2147483648
Note the use of the quant variable. There are generally two ways to indicate the end of your arguments, either a count up front (the 5) or a sentinel value at the back.
An example of the latter would be a list of pointers, passing NULL as the last. Since this max function needs to be able to handle the entire range of integers, a sentinel solution is not viable.
The printf function uses the former approach but slightly differently. It doesn't have a specific count, rather it uses the % fields in the format string to figure out the other arguments.
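For comparison with the count-based myMax, here is a minimal sketch of the sentinel approach using a list of strings terminated by a null pointer. The function name total_length is hypothetical. The caller must cast the terminator to a pointer type, because a bare NULL or 0 is not guaranteed to be passed as a pointer through "...".

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Returns the combined length of all strings up to the NULL sentinel. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;

    va_start(ap, first);
    /* Stop at the sentinel; no argument count is needed. */
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);
    return total;
}
```

A call looks like total_length("ab", "cde", (const char *)NULL).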
In fact, these are two questions. First of all, C99 only requires that a C implementation handle at least:
127 parameters in one function definition
127 arguments in one function call
Now, to your real question, yes there are so-called variadic functions and macros in C99. The syntax for the declaration is with ... in the argument list. The implementation of variadic functions goes with macros from the stdarg.h header file.
Here is a link to a site that shows an example of using varargs in C: Writing a ``varargs'' Function
You can use the va_arg macro to retrieve the optional arguments you pass to a function. Using this you can pass 0 to n optional parameters, so you can support more than two arguments if you choose.
Another alternative is to pass in an array, as main() does. For example:
int myfunc(type* argarray, int argcount);
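Applied to the max() question, the array alternative might look like the sketch below. max_of and the MAX_OF convenience macro are hypothetical names; the macro assumes a C99 compiler, since it relies on compound literals and variadic macros to build the array and its length at the call site.

```c
#include <limits.h>
#include <stddef.h>

/* Returns the largest of count values, or INT_MIN when count is 0. */
int max_of(const int *values, size_t count)
{
    int best = INT_MIN;
    for (size_t i = 0; i < count; i++)
        if (values[i] > best)
            best = values[i];
    return best;
}

/* Hypothetical convenience macro: MAX_OF(3, 9, 2) builds a temporary
   array from its arguments and passes it with the element count. */
#define MAX_OF(...) \
    max_of((const int[]){__VA_ARGS__}, \
           sizeof((const int[]){__VA_ARGS__}) / sizeof(int))
```

Unlike the stdarg version, the callee here receives the element count from the compiler rather than trusting a hand-written one.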
Yes, you can declare a variadic function in C. The most commonly used one is probably printf, which has a declaration that looks like the following
int printf(const char *format, ...);
The ... is how it declares that it accepts a variable number of arguments.
To access those arguments it can use va_start, va_arg and the like, which are typically macros defined in stdarg.h.
It is probably also worth noting that you can often "confuse" such a function. For example, the following call to printf has undefined behavior; on a typical stack-based implementation it prints whatever happens to be where the first variadic argument would have been, which in practice is probably the saved stack base pointer.
printf("%d");
C can have functions receive an arbitrary number of parameters.
You already know one: printf()
printf("Hello World\n");
printf("%s\n", "Hello World");
printf("%d + %d is %d\n", 2, 2, 2+2);
There is no max function which accepts an arbitrary number of parameters, but it's a good exercise for you to write your own.
Use <stdarg.h> and the va_list, va_start, va_arg, and va_end identifiers defined in that header.
http://www.kernel.org/doc/man-pages/online/pages/man3/stdarg.3.html
I was wondering if there is any way to pass parameters dynamically to variadic functions. That is, if I have a function
int some_function (int a, int b, ...){/*blah*/}
and I am accepting a bunch of values from the user, I want some way of passing those values into the function:
some_function (a,b, val1,val2,...,valn)
I don't want to write different versions of all these functions, but I suspect there is no other option?
Variadic functions use a calling convention where the caller is responsible for popping the function parameters from the stack, so yes, it is possible to do this dynamically. It's not standardized in C, and normally would require some assembly to manually push the desired parameters, and invoke the variadic function correctly.
The cdecl calling convention requires that the arguments be pushed in the correct order, and after the call, the bytes pushed as arguments before the call are popped. In this way, the called function can receive an arbitrary number of parameters, as the caller will handle reverting the stack pointer to its pre-call state. The space occupied by the arguments before the ... is the safe lower bound for the number of bytes pushed. Additional variadic arguments are interpreted at runtime.
FFCALL is a library which provides wrappers for passing parameters dynamically to variadic functions. The group of functions you're interested in is avcall. Here's an example calling the functions you gave above:
#include <avcall.h>
av_alist argList;
int retVal;
av_start_int(argList, some_function, &retVal);
av_int(argList, a);
av_int(argList, b);
av_type(argList, val1);
...
av_type(argList, valn);
av_call(argList);
You might also find this link discussing generating wrappers around variadic functions in C, to be of interest in justifying why this isn't part of standard C.
A standard approach is to have each variadic function accompanied by a va_list-taking counterpart (as in printf and vprintf). The variadic version just converts ... to a va_list (using macros from stdarg.h) and calls its va_list-taking sister, which does actual work.
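As an illustration of that pairing with the standard library's own v-variant, the sketch below wraps vsnprintf. The function name format_error and the "error: " prefix are assumptions made for this example only.

```c
#include <stdarg.h>
#include <stdio.h>

/* Writes "error: " followed by the formatted message into buf.
   Returns the length the full message needs, as snprintf does,
   or a negative value on an encoding error. */
int format_error(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int prefix = snprintf(buf, size, "error: ");

    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;               /* failure, or no room for the body */

    va_start(ap, fmt);
    /* Forward the caller's arguments unchanged to the va_list-taking sister. */
    int body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    va_end(ap);

    return body < 0 ? body : prefix + body;
}
```

The variadic function does no formatting itself; all the real work happens in the function that accepts a va_list.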
It might be interesting to try just passing an array, and then use the vararg macros anyway. Depending on stack alignment, it might Just Work (tm).
This is probably not an optimal solution, I mainly posted it because I found the idea interesting.
After trying it out, this approach worked on my linux x86, but not on x86-64 - it can probably be improved. This method will depend on stack alignment, struct alignment and probably more.
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
void varprint(int count, ...)
{
va_list ap;
int32_t i;
va_start(ap, count);
while(count-- ) {
i = va_arg(ap, int32_t);
printf("Argument: %d\n", i);
}
va_end(ap);
}
struct intstack
{
int32_t pos[99];
};
int main(int argc, char** argv)
{
struct intstack *args = malloc(sizeof(struct intstack));
args->pos[0] = 1;
args->pos[1] = 2;
args->pos[2] = 3;
args->pos[3] = 4;
args->pos[4] = 5;
varprint(5, *args);
return 0;
}
Depending on what it is you're passing around, it could be a discriminated union you're after here (as hinted at in the comments). That would avoid the need for variadic functions or arrays of void*, and answers the question "how does some_function know what you actually passed it". You might have code something like this:
enum thing_code { INTEGER, DOUBLE, LONG };
struct thing
{
enum thing_code code;
union
{
int a;
double b;
long c;
};
};
void some_function(size_t n_things, struct thing *things)
{
    for (size_t i = 0; i < n_things; i++)
    {
        switch (things[i].code)
        {
        case INTEGER:
            /* ... */
            break;
        /* ... other cases ... */
        }
    }
}
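A complete version of the same idea might look like the following sketch, which adds up a mixed list of values. Everything here is hypothetical naming chosen for the example (struct value, the KIND_ constants, the make_ helpers and sum_values); it uses a named union member so that it also compiles as C99.

```c
#include <stddef.h>

enum value_kind { KIND_INT, KIND_DOUBLE, KIND_LONG };

struct value {
    enum value_kind kind;   /* says which union member is valid */
    union {
        int    i;
        double d;
        long   l;
    } as;
};

/* Small constructors so callers cannot set a mismatched tag. */
struct value make_int(int i)       { struct value v; v.kind = KIND_INT;    v.as.i = i; return v; }
struct value make_double(double d) { struct value v; v.kind = KIND_DOUBLE; v.as.d = d; return v; }
struct value make_long(long l)     { struct value v; v.kind = KIND_LONG;   v.as.l = l; return v; }

/* Sums n values of mixed type; the tag tells us how to read each one. */
double sum_values(size_t n, const struct value *values)
{
    double total = 0.0;
    for (size_t k = 0; k < n; k++) {
        switch (values[k].kind) {
        case KIND_INT:    total += values[k].as.i;         break;
        case KIND_DOUBLE: total += values[k].as.d;         break;
        case KIND_LONG:   total += (double)values[k].as.l; break;
        }
    }
    return total;
}
```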
You can take this a step further and avoid the switch by replacing the code with one or more pointers to functions that do something useful with each thing. For example, if what you wanted to do was to simply print out each thing, you could have this:
struct thing
{
    void (*print)(struct thing*);
    union
    {
        ...
    };
};
void some_function(size_t n_things, struct thing *things)
{
    for (size_t i = 0; i < n_things; i++)
    {
        /* things is an array of structs, so use '.' and pass the element's address */
        things[i].print(&things[i]);
        /* ... */
    }
}