If I have a function that takes a variable number of arguments,
void print_arg_addr (int n, ...)
I can use these three macros to parse the arguments:
va_start(ap,v)
va_arg(ap,t)
va_end(ap)
As I understand it,
va_start makes ap point to the second parameter,
va_arg moves ap to the next parameter,
and va_end sets ap to NULL.
I used the snippet below to check my understanding,
but it turns out that ap does not change.
I expected ap to increase by 4 each time.
void print_arg_addr (int n, ...)
{
int i;
int val;
va_list vl;
va_start(vl,n);
for (i=0;i<n;i++)
{
val=va_arg(vl,int);
printf ("ap:%p , %d\n",vl,val);
}
va_end(vl);
printf ("ap:%p \n",vl);
}
int main()
{
print_arg_addr(5,1,2,3,4,5);
}
output:
ap:0x7ffc62fb9890 , 1
ap:0x7ffc62fb9890 , 2
ap:0x7ffc62fb9890 , 3
ap:0x7ffc62fb9890 , 4
ap:0x7ffc62fb9890 , 5
ap:0x7ffc62fb9890
Thank you!
A va_list (like your vl) is an abstract data type that you are not allowed to pass to printf; %p expects a void * and a va_list is not one. Its implementation is private to your compiler (and processor architecture), and related to the ABI and calling conventions. Compile your code with all warnings and debug info: gcc -Wall -Wextra -g. You'll get warnings, and you have undefined behavior, so you should be very scared.
In other words, consider va_list, va_start, va_end (and all stdarg(3) ...) as some magic provided by the compiler. That is why they are part of the C11 specification (read n1570) and often implemented as compiler builtins.
If you need to understand the internals of va_list and friends (but you should not need that), dive inside your compiler (and study your ABI). Since GCC is free software of millions of source code lines, you could spend many years studying it. In your case, I don't think it is worth the effort.
You might also look at the generated assembler code (a .s file), using gcc -O -S -fverbose-asm.
Current calling conventions use processor registers. This is why understanding the details of variadic calls is complex. In the 1980s, arguments were pushed on the machine stack, and at that time va_start simply set the va_list to some pointer into the stack. Things are much more complex now, and you don't want to dive into that complexity.
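To make the "treat it as magic" advice concrete, here is a minimal sketch of a variadic function that only touches the va_list through the standard macros. The name sum_ints is hypothetical and not from the question; the point is that the values come back from va_arg, and the code never looks at the va_list object itself.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `n` int arguments using only the standard macros. */
int sum_ints(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);                /* ap now refers to the arguments after n */
    for (int i = 0; i < n; i++) {
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    }
    va_end(ap);                     /* ap must not be used after this */

    return total;
}
```

With this helper, sum_ints(5, 1, 2, 3, 4, 5) evaluates to 15, and nothing in the code depends on how va_list is represented.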
Related
Is it possible for the callee to iterate (and count) through the function call parameters by offsetting the stack base pointer (rbp) using the inline ASM (x86) without knowing the type or quantity of the arguments?
void foo(char *arg, ...);
I am using Intel compiler but its documentation states that it supports GCC style inline assembly. So GCC based example would be sufficient.
#include <stdio.h>
#include <inttypes.h>
int main(int argc, char **argv){
uint64_t n;
__asm__ __volatile__(
"movq %%rbp, %0\n\t"
: "=r"(n)
);
printf("rbp = 0x%" PRIx64 "\n", n);
return 0;
}
the code in this post
The only possible way for this to work is with a unique sentinel value (e.g. a NULL pointer) that marks the last argument.
This normally only works when all the args are pointers, e.g. as used by the POSIX execl(3) functions with signatures like
int execl(const char *pathname, const char *arg, ...
/* (char *) NULL */);
(Then you don't need inline asm; you can just use the C va_arg macro from stdarg.h.)
Also, rbp is useless; you don't know whether or not the function was compiled with optimization enabled so it might not be a frame pointer at all. If you want the frame address in GNU C, use __builtin_frame_address(0). (Look at the compiler-generated asm to see what it does. IIRC, it forces -fno-omit-frame-pointer for that function and just gives you the value of rbp)
And even getting a stack-frame address doesn't help you get register args (first 6 integer/pointer and first 8 FP args in x86-64 System V. Or first 4 total args in Windows x64.)
Declare your function as variadic and use va_arg to read all the variadic args as uint64_t so you can check them for zero, or whatever bit pattern you selected as your sentinel.
Obviously this requires the cooperation of the caller to pass a sentinel.
BTW, no calling convention called __cdecl exists for x86-64. Microsoft calls theirs x64 __fastcall or __vectorcall.
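As a rough sketch of the sentinel approach described above, assuming all arguments are char * strings in the style of execl: the function name count_args is hypothetical, and the caller is required to end the list with (char *)NULL.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts string arguments up to a (char *)NULL sentinel.
 * The caller must always pass the sentinel; without it the loop reads past
 * the real arguments, which is undefined behavior. */
size_t count_args(const char *first, ...)
{
    va_list ap;
    size_t count = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        count++;
    }
    va_end(ap);

    return count;
}
```

For example, count_args("a", "b", "c", (char *)NULL) gives 3. The cast on NULL matters because a bare NULL may be an int 0 on some implementations, which would not match the char * that va_arg reads.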
No, you can't. It's impossible in the general case.
In the specific case of compiling with debugging symbols on you have a shot at writing code to interpret your own symbols, but this compilation mode is frowned upon for release builds so writing this code is not recommended. It's also completely nonportable but you don't care about that constraint.
Is it safe and defined behaviour to read va_list like an array instead of using the va_arg function?
EX:
void func(int string_count, ...)
{
    va_list valist;
    va_start(valist, string_count);
    printf("First argument: %d\n", *((int*)valist));
    printf("Second argument: %d\n", *(((int*)valist)+1));
    va_end(valist);
}
Same question for assignment
EX:
void func(int string_count, ...)
{
    va_list valist;
    va_start(valist, string_count);
    printf("Third argument: %d\n", *(((int*)valist)+2));
    *((int*)valist+2)=33;
    printf("New third argument: %d\n", *(((int*)valist)+2));
    va_end(valist);
}
PS: This seems to work on GCC
No, it is not. You cannot assume anything, because the implementation varies across compilers and libraries.
The only portable way to access the values is by using the macros defined in stdarg.h for accessing the
ellipsis arguments. The size of the type is important, otherwise you end up reading garbage,
and if you read more bytes than have been passed, you have undefined behaviour.
So, to get a value, you have to use va_arg.
See: STDARG documentation
You cannot rely on a guess as to how va_list works, or on a particular
implementation. How va_list works depends on the ABI, the architecture, the
compiler, etc. If you want a more in-depth view of va_list, see
this answer.
edit
A couple of hours ago I wrote this answer explaining how to use the
va_*-macros. Take a look at that.
No, this is not safe and well-defined. The va_list structure could be anything (you assume it is a pointer to the first argument), and the arguments may or may not be stored contiguously in the "right order" in some memory area being pointed to.
Example of a va_list implementation that doesn't work with your code: in this setup some arguments are passed in registers instead of on the stack, but va_arg still has to find them.
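If array-style access (including assignment) is what you are really after, one portable option is to copy the arguments out with va_arg into an ordinary array first. This is only a sketch; the name collect_ints and its parameters are hypothetical and not from the question.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: copies up to `cap` of the `count` int arguments
 * into `out` and returns how many were stored. */
size_t collect_ints(int *out, size_t cap, int count, ...)
{
    va_list ap;
    size_t stored = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        int value = va_arg(ap, int);   /* the only portable way to read an argument */
        if (stored < cap) {
            out[stored++] = value;
        }
    }
    va_end(ap);

    return stored;
}
```

After collect_ints(buf, 3, 3, 10, 20, 30), buf is a normal int array, so buf[2] = 33 is well defined. It changes the copy, not the caller's argument, which is all the original snippet could have hoped for anyway.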
If an implementation's documentation specifies that va_list may be used in ways beyond those given in the Standard, you may use them in such fashion on that implementation. Attempting to use arguments in other ways may have unpredictable consequences even on platforms where the layout of parameters is specified. For example, on a platform where variadic arguments are pushed on the stack in reverse order, if one were to do something like:
int test(int x, ...)
{
if (!x)
return *(int*)(4+(uintptr_t)&x); // Address of first argument after x
... some other code using va_list.
}
int test2(void)
{
return test(0, someComplicatedComputation);
}
a compiler which is processing test2 might look at the definition of test,
notice that it (apparently) ignores its variadic arguments when the first
argument is zero, and thus conclude that it doesn't need to compute and
pass the result of someComplicatedComputation. Even if the documentation
for the platform documents the layout of variadic arguments, the fact that
the compiler can't see that they are accessed may cause it to conclude that
they are not.
There is a curious difference between assemblies of a small program, when compiled as a C-program or as a C++-program (for Linux x86-64).
The code in question:
int fun();
int main(){
return fun();
}
Compiling it as a C-program (with gcc -O2) yields:
main:
xorl %eax, %eax
jmp fun
But compiling it as a C++-program (with g++ -O2) yields:
main:
jmp _Z3funv
I find it puzzling, that the C-version initializes the return value of the main-function with 0 (xorl %eax, %eax).
Which feature of the C-language is responsible for this necessity?
Edit: It is true that, for int fun(void); there is no initialization of the eax-register.
If there is no prototype of fun at all, i.e.:
int main(){
return fun();
}
then the C-compiler zeros the eax-register once again.
In C int fun(); can take any number of arguments, so it may even be a varargs function. In C++ however it means it takes no arguments.
The x86-64 System V ABI requires that register AL contain an upper bound on the number of vector (SSE) registers used when calling a varargs function. You of course pass no arguments, so it is zeroed. For convenience the compiler decided to zero the whole of eax. Declare your prototype as int fun(void); and the xor shall disappear.
Apparently it is a defensive measure, designed for situations when prototype-less fun function happens to actually be a variadic function, as explained by @Jester's answer.
Note though that this explanation does not hold any water from the point of view of standard C language.
Since the beginning of standardized times (C89/90) C language explicitly required all variadic functions to be declared with prototype before the point of the call. Calling a non-prototyped variadic function triggers undefined behavior in standard C. So, formally, compilers do not have to accommodate the possibility of fun being variadic - if it is, the behavior would be undefined anyway.
Moreover, as @John Bollinger noted in the comments, according to the C standard, a non-prototype int fun() declaration actually precludes further variadic prototype declarations of fun. I.e. a variadic function cannot be legally pre-declared as a () function. That would be another reason why the above non-prototype declaration is sufficient for the compiler to assume that fun cannot possibly be variadic.
This could actually be a legacy feature, designed to support pre-standard C code, where pre-declaring variadic functions with prototype was not required.
I have a variable-argument function in C that looks roughly like this:
void log(const char * format, ...) {
va_list args;
va_start(args, format);
vfprintf( stderr, format, args );
va_end(args);
exit(1);
}
I was able to crash my app by calling it like this,
log("%s %d", 1);
because the function was missing an argument. Is there a way to determine an argument is missing at runtime?
No, there isn't. But when you compile your code with gcc, you should add the options -Wall -Wextra -Wformat -Os. This will enable lots of warnings, and when you annotate your function with __attribute__(__printf__, 2, 3) or something similar (I don't remember the exact syntax), a warning for exactly your case should appear.
See http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html for the exact syntax. It's really __attribute__((__format__(__printf__, 1, 2))).
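For illustration, here is a sketch of how the format attribute might be attached to a printf-style wrapper. The names PRINTF_LIKE and format_message are hypothetical; the attribute itself is the GCC one from the link above (Clang accepts it too), and the macro expands to nothing on other compilers. The wrapper writes into a buffer instead of stderr so it is easy to try out.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang understand this attribute; other compilers get an empty macro. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical wrapper: parameter 3 is the format string and the variadic
 * arguments start at parameter 4, hence PRINTF_LIKE(3, 4). */
int format_message(char *buf, size_t size, const char *format, ...) PRINTF_LIKE(3, 4);

int format_message(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vsnprintf(buf, size, format, args);   /* same checking rules as printf */
    va_end(args);

    return written;
}
```

With the attribute in place, a call such as format_message(buf, sizeof buf, "%s %d", 1) should draw a -Wformat warning at compile time. It is still only a compile-time check; nothing is verified at runtime.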
I don't believe there would be any standard mechanism for determining that at runtime. The parameters after the format specifier are simply values on the stack. For example, if a format specifier indicated a 4-byte integer was next, there would be no way of knowing if the next 4 bytes on the stack were an integer or just whatever happened to be on the stack from a previous call.
Nope there isn't, C will allow you to shoot yourself in the foot just like that.
Below is code which includes a variadic function and calls to the variadic function. I would expect that it would output each sequence of numbers appropriately. It does when compiled as a 32-bit executable, but not when compiled as a 64-bit executable.
#include <stdarg.h>
#include <stdio.h>
#ifdef _WIN32
#define SIZE_T_FMT "%Iu"
#else
#define SIZE_T_FMT "%zu"
#endif
static void dumpargs(size_t count, ...) {
size_t i;
va_list args;
printf("dumpargs: argument count: " SIZE_T_FMT "\n", count);
va_start(args, count);
for (i = 0; i < count; i++) {
size_t val = va_arg(args, size_t);
printf("Value=" SIZE_T_FMT "\n", val);
}
va_end(args);
}
int main(int argc, char** argv) {
(void)argc;
(void)argv;
dumpargs(1, 10);
dumpargs(2, 10, 20);
dumpargs(3, 10, 20, 30);
dumpargs(4, 10, 20, 30, 40);
dumpargs(5, 10, 20, 30, 40, 50);
return 0;
}
Here is the output when compiled for 64-bit:
dumpargs: argument count: 1
Value=10
dumpargs: argument count: 2
Value=10
Value=20
dumpargs: argument count: 3
Value=10
Value=20
Value=30
dumpargs: argument count: 4
Value=10
Value=20
Value=30
Value=14757395255531667496
dumpargs: argument count: 5
Value=10
Value=20
Value=30
Value=14757395255531667496
Value=14757395255531667506
Edit:
Please note that the reason the variadic function pulls size_t out is because the real-world use of this is for a variadic function that accepts a list of pointers and lengths. Naturally the length argument should be a size_t. And in some cases a caller might pass in a well-known length for something:
void myfunc(size_t pairs, ...) {
    size_t i;
    va_list args;
    va_start(args, pairs);
    for (i = 0; i < pairs; i++) {
        const void* ptr = va_arg(args, const void*);
        size_t len = va_arg(args, size_t);
        process(ptr, len);
    }
    va_end(args);
}
void user(void) {
myfunc(2, ptr1, ptr1_len, ptr2, 4);
}
Note that the 4 passed into myfunc might encounter the problem described above. And yes, really the caller should be using sizeof or the result of strlen or just plain put the number 4 into a size_t somewhere. But the point is that the compiler is not catching this (a common danger with variadic functions).
The right thing to do here is to eliminate the variadic function and replace it with a better mechanism that provides type safety. However, I would like to document this problem, and collect more detailed information as to exactly why this problem exists on this platform and manifests as it does.
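One possible shape for such a type-safe replacement, sketched under the assumption that the pointer/length pairs can be collected into an array: the names struct span and total_length are hypothetical, and the function only sums the lengths where the real code would call process.

```c
#include <stddef.h>

/* Hypothetical pair type: the length member is always a size_t,
 * whatever kind of integer expression the caller writes. */
struct span {
    const void *ptr;
    size_t len;
};

/* Hypothetical non-variadic replacement for myfunc: walks an array of pairs. */
size_t total_length(const struct span *spans, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        total += spans[i].len;   /* a real version would call process(ptr, len) here */
    }

    return total;
}
```

A caller would write something like const struct span spans[] = { { ptr1, ptr1_len }, { ptr2, 4 } }; and pass spans together with the element count. The plain 4 is converted to size_t by the initializer, so the compiler does the work that the variadic version silently skipped.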
So basically, if a function is variadic, it must conform to a certain calling convention (most importantly, the caller must clean up args, not the callee, since the callee has no idea how many args there will be).
The reason why it starts happening on the 4th is because of the calling convention used on x86-64. To my knowledge, both Visual C++ and gcc use registers for the first few parameters, and then after that use the stack.
I am guessing that this is the case even for variadic functions (which does strike me as odd since it would make the va_* macros more complicated).
On x86, the standard C calling convention is to always use the stack.
The problem is that you're using size_t to represent the type of the values. This is incorrect; the values are actually normal 32-bit values on Win64.
size_t should only be used for values which change size based on the 32- or 64-bitness of the platform (such as pointers). Change the code to use int or __int32 and this should fix your problem.
The reason this works fine on Win32 is that size_t is a different-sized type depending on the platform. On 32-bit Windows it is 32 bits and on 64-bit Windows it is 64 bits. So on 32-bit Windows it just happens to match the size of the data type you are using.
A variadic function is only weakly type checked. In particular, the function signature does not provide enough information for the compiler to know the type of each argument assumed by the function.
In this case, size_t is 32-bits on Win32 and 64-bits on Win64. It has to vary in size like that in order to perform its defined role. So for a variadic function to pull arguments out correctly which are of type size_t, the caller had to make certain that the compiler could tell that the argument was of that type at compile-time in the calling module.
Unfortunately 10 is a constant of type int. There is no defined suffix letter that marks a constant to be of type size_t. You could hide that fact inside a platform-specific macro, but that would be no clearer than writing (size_t)10 at the call site.
It appears to work partially because of the actual calling convention used in Win64. From the examples given, we can tell that the first four integral arguments to a function are passed in registers, and the rest on the stack. That allowed count and the first three variadic parameters to be read correctly.
However it only appears to work. You are actually standing squarely in Undefined Behavior territory, and "undefined" really does mean "undefined": anything can happen.
On other platforms, anything can happen too.
Because variadic functions are implicitly unsafe, a special burden is placed on the coder to make certain that the type of each argument known at compile time matches the type that argument will be assumed to have at run time.
In some cases where the interfaces are well known, it is possible to warn about type mismatch. For example, gcc can often recognize that the type of an argument to printf() doesn't match the format string, and issue a warning. But doing that in the general case for all variadic functions is hard.
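A small sketch of the call-site fix discussed above, assuming the function reads every variadic argument as size_t. The name sum_sizes is hypothetical; it mirrors dumpargs from the question but returns a sum so the result is easy to check.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums `count` variadic arguments.
 * Contract: every variadic argument must have type size_t at the call site. */
size_t sum_sizes(size_t count, ...)
{
    va_list args;
    size_t total = 0;

    va_start(args, count);
    for (size_t i = 0; i < count; i++) {
        total += va_arg(args, size_t);   /* only correct if the caller passed a size_t */
    }
    va_end(args);

    return total;
}
```

Called as sum_sizes(3, (size_t)10, (size_t)20, (size_t)30) the result is 60 on both 32-bit and 64-bit targets. Results of sizeof and strlen are already size_t and need no cast, whereas a bare 10 is an int and breaks the contract.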
The reason is that size_t is defined as a 32-bit value on 32-bit Windows, and a 64-bit value on 64-bit Windows. When the 4th argument is passed into the variadic function, the upper bits appear to be uninitialized. The 4th and 5th values that are pulled out are actually:
Value=0xcccccccc00000028
Value=0xcccccccc00000032
I can solve this problem with a simple cast on all the arguments, such as:
dumpargs(5, (size_t)10, (size_t)20, (size_t)30, (size_t)40, (size_t)50);
This does not answer all my questions, however; such as:
Why is it the 4th argument? Likely because the first 3 are in registers?
How does one avoid this situation in a type-safe portable manner?
Does this happen on other 64-bit platforms, using 64-bit values (ignoring that size_t might be 32-bit on some 64-bit platforms)?
Should I pull out the values as 32-bit values regardless of the target platform, and will that cause problems if a 64-bit value is pushed into the variadic function?
What do the standards say about this behavior?
Edit:
I really wanted to get a quote from The Standard, but it's something that's not hyperlink-able, and costs money to purchase and download. Therefore I believe quoting it would be a copyright violation.
Referencing the comp.lang.c FAQ, it's made clear that when writing a function that takes a variable number of arguments, there's nothing you can do for type safety. It's up to the caller to make sure that each argument either perfectly matches or is explicitly cast. There are no implicit conversions.
That much should be obvious to those who understand C and printf (note that gcc has a feature to check printf-style format strings), but what's not so obvious is that not only are the types not implicitly converted, but if the size of the type passed doesn't match what's extracted, you can have uninitialized data, or undefined behavior in general. The "slot" where an argument is placed might not be initialized to 0, and there might not be a "slot" at all: on some platforms you could pass a 64-bit value, and extract two 32-bit values inside the variadic function. It's undefined behavior.
If you are the one writing this function, it is your job to write the variadic function correctly and/or correctly document your function's calling conventions.
You already found that C plays fast-and-loose with types (see also signedness and promotion), so explicit casting is the most obvious solution. This is frequently seen with integer constants being explicitly defined with things like UL or ULL.
Most sanity checks on passed values will be application-specific or non-portable (e.g. pointer validity). You can use hacks like mandating that pre-defined sentinel value(s) be sent as well, but that's not infallible in all cases.
Best practice would be to document heavily, perform code reviews, and/or write unit tests with this bug in mind.