I have a variable-argument function in C that looks roughly like this:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

void log(const char * format, ...) {
    va_list args;
    va_start(args, format);
    vfprintf( stderr, format, args );
    va_end(args);
    exit(1);
}
I was able to crash my app by calling it like this:
log("%s %d", 1);
because the call was missing an argument. Is there a way to determine at runtime that an argument is missing?
No, there isn't. But when you compile your code with gcc, you should add the options -Wall -Wextra -Wformat -Os. This will enable lots of warnings, and when you annotate your function with __attribute__(__printf__, 2, 3) or something similar (I don't remember the exact syntax), a warning for exactly your case should appear.
See http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html for the exact syntax. It's really __attribute__((__format__(__printf__, 1, 2))): parameter 1 is the format string, and checking starts at parameter 2, the first variadic argument.
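For illustration, here is a sketch of how the attribute attaches to a log-style function. It assumes GCC or Clang (__attribute__ is a compiler extension, not standard C), and log_message is a hypothetical variant of log() that returns the character count instead of calling exit(1), so it can be exercised without ending the program.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical variant of log(): same format handling, but it returns the
   number of characters written instead of calling exit(1).
   format(printf, 1, 2): parameter 1 is a printf-style format string and
   argument checking starts at parameter 2 (the first variadic one). */
int log_message(const char *format, ...) __attribute__((format(printf, 1, 2)));

int log_message(const char *format, ...)
{
    va_list args;
    int written;

    assert(format != NULL);
    va_start(args, format);
    written = vfprintf(stderr, format, args);
    va_end(args);
    return written;
}
```

With -Wall, a call such as log_message("%s %d", 1); should then be flagged at compile time by -Wformat, while log_message("%s %d\n", "x", 1) prints "x 1" and returns 4.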
I don't believe there is any standard mechanism for determining that at runtime. The arguments after the format string are simply values on the stack (or, on many modern ABIs, in registers). For example, if a format specifier indicated that a 4-byte integer was next, there would be no way of knowing whether the next 4 bytes on the stack were an integer that was actually passed or just whatever happened to be left on the stack from a previous call.
Nope, there isn't. C will let you shoot yourself in the foot just like that.
Is it safe and defined behaviour to read va_list like an array instead of using the va_arg function?
Example:
#include <stdarg.h>
#include <stdio.h>

void func(int string_count, ...)
{
    va_list valist;
    va_start(valist, string_count);
    printf("First argument: %d\n", *((int*)valist));
    printf("Second argument: %d\n", *(((int*)valist)+1));
    va_end(valist);
}
Same question for assignment:
Example:
void func(int string_count, ...)
{
    va_list valist;
    va_start(valist, string_count);
    printf("Third argument: %d\n", *(((int*)valist)+2));
    *((int*)valist+2)=33;
    printf("New third argument: %d\n", *(((int*)valist)+2));
    va_end(valist);
}
PS: This seems to work on GCC.
No, it is not. You cannot assume anything, because the implementation varies across compilers and platforms.
The only portable way to access the values is to use the macros defined in stdarg.h for accessing the ellipsis arguments. The size of the type is important: otherwise you end up reading garbage, and if you read more bytes than have been passed, you have undefined behaviour.
So, to get a value, you have to use va_arg.
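A minimal sketch of the portable way to do what func() attempts, assuming the first parameter holds the number of int arguments that follow (print_ints is a hypothetical name; the int return value is only there to make the result checkable):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Portable counterpart of the question's func(): each value is fetched with
   va_arg, never by casting the va_list to int*. Assumes the caller passes
   exactly `count` arguments of type int after `count`. */
int print_ints(int count, ...)
{
    va_list valist;
    int total = 0;

    assert(count >= 0);
    va_start(valist, count);
    for (int i = 0; i < count; i++) {
        int value = va_arg(valist, int);   /* type must match what was passed */
        printf("Argument %d: %d\n", i + 1, value);
        total += value;
    }
    va_end(valist);
    return total;   /* sum of the arguments, returned only for checking */
}
```

For the assignment case there is no portable equivalent: va_arg gives you a value, not a location you may write to, so copy it into a local variable and modify that instead.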
See: STDARG documentation
You cannot rely on a guess about how va_list works, or on a particular implementation. How va_list works depends on the ABI, the architecture, the compiler, etc. If you want a more in-depth view of va_list, see this answer.
Edit:
A couple of hours ago I wrote this answer explaining how to use the
va_*-macros. Take a look at that.
No, this is not safe and well-defined. The va_list structure could be anything (you assume it is a pointer to the first argument), and the arguments may or may not be stored contiguously in the "right order" in some memory area being pointed to.
An example of a va_list implementation that doesn't work with your code is one where some arguments are passed in registers instead of on the stack, but va_arg still has to find them.
If an implementation's documentation specifies that va_list may be used in ways beyond those given in the Standard, you may use them in such fashion on that implementation. Attempting to use arguments in other ways may have unpredictable consequences even on platforms where the layout of parameters is specified. For example, on a platform where variadic arguments are pushed on the stack in reverse order, if one were to do something like:
#include <stdint.h>

int test(int x, ...)
{
    if (!x)
        return *(int*)(4+(uintptr_t)&x); // Address of first argument after x
    /* ... some other code using va_list. */
}
int test2(void)
{
    return test(0, someComplicatedComputation);
}
a compiler which is processing test2 might look at the definition of test,
notice that it (apparently) ignores its variadic arguments when the first
argument is zero, and thus conclude that it doesn't need to compute and
pass the result of someComplicatedComputation. Even if the documentation
for the platform documents the layout of variadic arguments, the fact that
the compiler can't see that they are accessed may cause it to conclude that
they are not.
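One way to keep the access visible to the compiler is to go through va_start/va_arg instead of doing pointer arithmetic on &x. The sketch below is hypothetical: the nonzero branch just returns x as a stand-in for the elided "other code", and it assumes the argument after x is an int.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical rewrite of test(): the first variadic argument is read with
   va_arg, so a compiler analysing callers such as test2() can see that it is
   used when x is zero. */
int test(int x, ...)
{
    va_list ap;
    int result;

    va_start(ap, x);
    if (!x)
        result = va_arg(ap, int);   /* first argument after x */
    else
        result = x;                 /* placeholder for the elided code */
    va_end(ap);
    return result;
}
```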
If I have a function receiving a variable number of arguments,
void print_arg_addr (int n, ...)
I can use these three macros to parse the arguments:
va_start(ap,v)
va_arg(ap,t)
va_end(ap)
In my understanding:
va_start makes ap point to the second parameter,
va_arg moves ap to the next parameter,
va_end makes ap point to NULL.
So I used the snippet below to check my understanding, but it turns out that ap does not change. I expected ap to increment by 4 each time.
#include <stdarg.h>
#include <stdio.h>

void print_arg_addr (int n, ...)
{
    int i;
    int val;
    va_list vl;
    va_start(vl,n);
    for (i=0;i<n;i++)
    {
        val=va_arg(vl,int);
        printf ("ap:%p , %d\n",vl,val);
    }
    va_end(vl);
    printf ("ap:%p \n",vl);
}
int main()
{
    print_arg_addr(5,1,2,3,4,5);
}
output:
ap:0x7ffc62fb9890 , 1
ap:0x7ffc62fb9890 , 2
ap:0x7ffc62fb9890 , 3
ap:0x7ffc62fb9890 , 4
ap:0x7ffc62fb9890 , 5
ap:0x7ffc62fb9890
Thank you!
A va_list (like your vl) is some abstract data type that you are not allowed to pass to printf. Its implementation is private to your compiler (and processor architecture), and related to the ABI and calling conventions. Compile your code with all warnings and debug info: gcc -Wall -Wextra -g. You'll get warnings, and you have undefined behavior so you should be very scared.
In other words, consider va_list, va_start, va_end (and all stdarg(3) ...) as some magic provided by the compiler. That is why they are part of the C11 specification (read n1570) and often implemented as compiler builtins.
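Treating va_list as compiler magic has a practical consequence: you cannot save a position in it by plain assignment. The portable tool is C99's va_copy. A sketch, using a hypothetical helper format_alloc() that walks the arguments twice, once to measure the output with vsnprintf(NULL, 0, ...) and once to write it:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly malloc'd string. The argument
   list is consumed twice, so a copy is made with va_copy; assigning one
   va_list to another is not portable. Returns NULL on error; caller frees. */
char *format_alloc(const char *format, ...)
{
    va_list args, measure;
    char *buf = NULL;
    int len;

    assert(format != NULL);
    va_start(args, format);
    va_copy(measure, args);
    len = vsnprintf(NULL, 0, format, measure);   /* length without the '\0' */
    va_end(measure);

    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL)
            vsnprintf(buf, (size_t)len + 1, format, args);
    }
    va_end(args);
    return buf;
}
```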
If you need to understand the internals of va_list and friends (but you should not need to), dive into your compiler (and study your ABI). Since GCC is free software with millions of lines of source code, you could spend many years studying it. In your case, I don't think it is worth the effort.
You might also look at the generated assembler code (a .s file) using gcc -O -S -fverbose-asm.
Current calling conventions use processor registers. This is why understanding the details of variadic calls is complex. In the 1980s, arguments were pushed on the machine stack, and at that time va_start returned some pointer into the stack. Things are much more complex now, and you don't want to dive into that complexity.
I've recently gotten back into working with C, and I decided to write a library as a wrapper for stdio.h. The goal is to do all the error checking possible so that the user won't have to do it themselves whenever they call a stdio function. This is partly for learning, and partly for real use (since I frequently use stdio).
When I write the following (in main), gcc gives an error at compile time, since there is supposed to be an integer as another argument but none was passed.
printf("Your integer: %d\n");
In case it's useful, here are my compiler flags:
-std=c11 -pedantic-errors -Wall -Wextra -Werror
Here's part of my current wrapper function. It works perfectly and checks for quite a few errors when passed valid/correct arguments:
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

uintmax_t gsh_printf(const char *format, ...)
{
    va_list arg;
    int cnt;
    va_start(arg, format);
    cnt = vprintf(format, arg);
    va_end(arg);
    // Analyze cnt and check for stream errors here
    return (uintmax_t)cnt;
}
But here's the problem, if I call:
gsh_printf("Your integer: %d\n");
It does not give an error, and it even runs! The usual output is something like:
Your integer: 1799713
Or some other number, implying that it's accessing memory not allocated to it, but it never gives a segmentation fault either.
So, why does it not give an error of any kind? And how can I write my own code so that there is a compile-time error, or at least run-time error after checking types, number of args, etc.?
Of course, any help is greatly appreciated, and if you need any more information, just let me know. Thank you!
With the fprintf and fscanf families of functions, if the argument corresponding to a conversion specification is missing, the call invokes undefined behavior.
With gcc, use the format (archetype, string-index, first-to-check) function attribute to request a diagnostic:
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

extern uintmax_t gsh_printf(const char *format, ...)
    __attribute__ ((format (printf, 1, 2)));

uintmax_t gsh_printf(const char *format, ...)
{
    va_list arg;
    int cnt;
    va_start(arg, format);
    cnt = vprintf(format, arg);
    va_end(arg);
    // Analyze cnt and check for stream errors here
    return (uintmax_t)cnt;
}
See documentation for an explanation of archetype, string-index and first-to-check:
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
For example, with the declaration above and -Wall (which enables -Wformat), for this statement:
gsh_printf("Your integer: %d\n");
you'll get this warning:
warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat]
With -Wall (which also enables -Wformat-extra-args) you will get a warning for extra arguments too:
gsh_printf("Your integer: %d\n", 0, 1);
gives
warning: too many arguments for format [-Wformat-extra-args]
In C there is no way to determine whether the number of arguments passed through a va_list is the number required. In the classic C calling convention (cdecl on 32-bit x86), the arguments are pushed onto the stack starting from the right-most argument. printf works by parsing the format string and popping a value from the stack whenever it needs one. That is why you get a random number when calling
gsh_printf("Your integer: %d\n");
You would need to know in advance how many arguments are supplied which cannot be done using va_list.
You might be able to get around this by using some kind of container (for example, an array together with its length) to hold all the arguments, and using the number of elements in the container to check whether there are enough.
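In C the "container" can simply be an array plus its length. The sketch below is one way to do that under stated assumptions: sum_array and the SUM macro are hypothetical names, the values are all int, and a C99 compound literal lets the compiler count the elements, so the count can never disagree with the arguments actually supplied.

```c
#include <assert.h>
#include <stddef.h>

/* Sum over an explicit container: a pointer plus an element count. */
int sum_array(const int *values, size_t count)
{
    int total = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Hypothetical convenience macro: wraps the arguments in a compound-literal
   array and lets sizeof compute the count. Needs at least one argument (an
   empty initializer list is not valid before C23). Each argument is
   evaluated once, because sizeof does not evaluate a non-VLA operand. */
#define SUM(...) \
    sum_array((const int[]){__VA_ARGS__}, \
              sizeof((const int[]){__VA_ARGS__}) / sizeof(int))
```

SUM(1, 2, 3) passes a three-element array with a count of 3. The trade-off is that every argument must have the same type, so this is not a drop-in replacement for printf-style formatting.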
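Also notice that 'arg' is only a handle to the argument list (in simple stack-based implementations, just a pointer to its start), not a record of how many arguments were passed. So when you pass it to vprintf, vprintf simply reads whatever the format string tells it to, whether or not the caller supplied it.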
I'm using Visual Studio 2012 to compile this sample code:
#include <stdarg.h>
#include <stdio.h>
const char * __cdecl foo(const char * format, const char * requiredArgument, ...)
{
va_list args;
va_start(args, format);
vprintf(format, args);
va_end(args);
return requiredArgument;
}
int main(int, char **)
{
foo("The %s is %d pixels wide and %d pixels high.", "box", 300, 200);
return 0;
}
The debug build of the program terminates normally after printing the message "The box is 300 pixels wide and 200 pixels high.".
The release build crashes with a segmentation fault.
My interpretation of this behavior (but I may be wrong about that, please correct me if so) is that I'm incorrectly specifying a function parameter other than the last non-variadic one in va_start, the only admissible form here being va_start(args, requiredArgument) rather than va_start(args, format) as I would like to have.
In other words, I'm misusing va_start in a way that makes the whole program flow unpredictable, so the segmentation fault is a perfectly legitimate outcome here.
If my assumptions are right, I have two questions now:
Why is it even required to specify the last formally declared function parameter in va_start, if choosing anything else is apparently illegal?
Why does the picky VC++ compiler not raise a warning for such an easy to detect and potentially critical pitfall?
Why is it even required to specify the last formally declared function parameter in va_start, if choosing anything else is apparently illegal?
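Because the macro needs to know where the named parameters end in order to locate the first variadic argument; in simple stack-based implementations it computes that from the address of the last named parameter.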
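A sketch of the fixed function, assuming the goal is to keep requiredArgument as a named parameter. va_start names the last named parameter, so the va_list starts at the first argument after requiredArgument, and the format must describe only those arguments. __cdecl is dropped because it is specific to Microsoft-style x86 compilers. For what it's worth, GCC and Clang do warn about the original mismatch by default (-Wvarargs).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* va_start gets requiredArgument, the LAST named parameter. The va_list
   therefore covers only the arguments after it, and `format` must describe
   just those; requiredArgument itself is not part of the va_list. */
const char *foo(const char *format, const char *requiredArgument, ...)
{
    va_list args;

    assert(format != NULL);
    va_start(args, requiredArgument);
    vprintf(format, args);
    va_end(args);
    return requiredArgument;
}
```

With this version, foo("%d pixels wide and %d pixels high.\n", "box", 300, 200) prints "300 pixels wide and 200 pixels high." and returns "box".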
Why does the picky VC++ compiler not raise a warning for such an easy to detect and potentially critical pitfall?
Because it's just not "intelligent" enough. Or its creators decided not to include this warning. Or maybe it could, but by default it's turned off and you can turn it on using some compiler flag.
Any C programmer who's been working for more than a week has encountered crashes that result from calling printf with more format specifiers than actual arguments, e.g.:
printf("Gonna %s and %s, %s!", "crash", "burn");
However, are there any similar bad things that can happen when you pass too many arguments to printf?
printf("Gonna %s and %s!", "crash", "burn", "dude");
My knowledge of x86/x64 assembly leads me to believe that this is harmless, though I'm not convinced that there's not some edge condition I'm missing, and I have no idea about other architectures. Is this condition guaranteed to be harmless, or is there a potentially crash-inducing pitfall here, too?
Online C Draft Standard (n1256), section 7.19.6.1, paragraph 2:
The fprintf function writes output to the stream pointed to by stream, under control of the string pointed to by format that specifies how subsequent arguments are
converted for output. If there are insufficient arguments for the format, the behavior is
undefined. If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. The fprintf function returns when
the end of the format string is encountered.
Behavior for all the other *printf() functions is the same with respect to excess arguments, except for the v*printf() variants such as vprintf(), which take a va_list instead of a variable argument list.
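The "evaluated (as always) but otherwise ignored" rule can be observed directly. In the sketch below, bump() and print_with_extra() are hypothetical helpers; the #pragma lines only silence the -Wformat-extra-args warning that GCC and Clang issue for this deliberate extra argument.

```c
#include <assert.h>
#include <stdio.h>

static int side_effects = 0;

/* Hypothetical helper whose call is observable. */
static int bump(void)
{
    return ++side_effects;
}

/* The format consumes one int; bump() is an excess argument. It is still
   evaluated before the call, and printf then ignores its value. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-extra-args"
int print_with_extra(void)
{
    return printf("%d\n", 42, bump());   /* prints "42", returns 3 */
}
#pragma GCC diagnostic pop
```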
You probably know the prototype for the printf function as something like this:
int printf(const char *format, ...);
A more complete version of that would actually be:
int __cdecl printf(const char *format, ...);
The __cdecl defines the "calling convention" which, among other things, describes how arguments are handled. In this case it means that arguments are pushed onto the stack and that the stack is cleaned up by the calling function.
One alternative to __cdecl is __stdcall; there are others. With __stdcall the convention is that arguments are pushed onto the stack and cleaned up by the called function. However, as far as I know, a __stdcall function cannot accept a variable number of arguments. That makes sense, since it wouldn't know how much stack to clean up.
The long and the short of it is that with __cdecl functions it's safe to pass as many arguments as you want, since the cleanup is performed by the calling code. If you were somehow to pass too many arguments to a __stdcall function, it would corrupt the stack. One example of where this could happen is if you had the wrong prototype.
More information on calling conventions can be found on Wikipedia.
All the arguments are pushed on the stack and removed when the stack frame is removed. This behaviour is independent of any specific processor (I only remember one mainframe, designed in the '70s, that had no stack). So yes, the second example won't fail.
printf is designed to accept any number of arguments. printf then reads the format specifier (first argument), and pulls arguments from the argument list as needed. This is why too few arguments crash: the code simply starts using non-existent arguments, accessing memory that doesn't exist, or some other bad thing. But with too many arguments, the extra arguments will simply be ignored. The format specifier will use fewer arguments than have been passed in.
Comment: both gcc and clang produce warnings:
$ clang main.c
main.c:4:29: warning: more '%' conversions than data arguments [-Wformat]
printf("Gonna %s and %s, %s!", "crash", "burn");
~^
main.c:5:47: warning: data argument not used by format string [-Wformat-extra-args]
printf("Gonna %s and %s!", "crash", "burn", "dude");
~~~~~~~~~~~~~~~~~~ ^
2 warnings generated.