I'm using Visual Studio 2012 to compile this sample code:
#include <stdarg.h>
#include <stdio.h>
const char * __cdecl foo(const char * format, const char * requiredArgument, ...)
{
va_list args;
va_start(args, format);
vprintf(format, args);
va_end(args);
return requiredArgument;
}
int main(int, char **)
{
foo("The %s is %d pixels wide and %d pixels high.", "box", 300, 200);
return 0;
}
The debug build of the program terminates normally after printing the message "The box is 300 pixels wide and 200 pixels high.".
The release build crashes with a segmentation fault.
My interpretation of this behavior (I may be wrong, so please correct me if so) is that I'm passing va_start a parameter other than the last named one; the only admissible form here is va_start(args, requiredArgument), not va_start(args, format) as I would like.
In other words, I'm misusing va_start in a way that makes the program's behavior undefined, so the segmentation fault is a legitimate outcome.
If my assumptions are right, I have two questions now:
Why is it even required to specify the last formally declared function parameter in va_start, if choosing anything else is apparently illegal?
Why does the picky VC++ compiler not raise a warning for such an easy to detect and potentially critical pitfall?
Why is it even required to specify the last formally declared function parameter in va_start, if choosing anything else is apparently illegal?
Because, in traditional implementations, the macro uses the address of the last named parameter to find where the variable arguments begin.
Why does the picky VC++ compiler not raise a warning for such an easy to detect and potentially critical pitfall?
Because it's just not "intelligent" enough. Or its creators decided not to include this warning. Or maybe it could, but by default it's turned off and you can turn it on using some compiler flag.
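As a minimal sketch of the conforming form (not the asker's original intent, since format can then no longer consume requiredArgument), va_start names the last named parameter and the format string describes only the variadic arguments:

```c
#include <stdarg.h>
#include <stdio.h>

/* Variant of foo() for illustration: va_start names the LAST named
   parameter, so args begins at the first argument after requiredArgument.
   The format string must therefore describe only the variadic arguments. */
const char *foo(const char *format, const char *requiredArgument, ...)
{
    va_list args;

    va_start(args, requiredArgument);
    vprintf(format, args);
    va_end(args);
    return requiredArgument;
}
```

Called as foo("%d pixels wide and %d pixels high.\n", "box", 300, 200), this prints the two numbers and returns "box", which the caller can print separately.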
I've recently gotten back into working with C, and I decided to write a library as a wrapper for stdio.h. The goal is to do all the error checking possible so that the user won't have to do it themselves whenever they call a stdio function. This is partly for learning, and partly for real use (since I frequently use stdio).
When I write the following (in main), gcc gives an error at compile time, since there is supposed to be an integer as another argument but none was passed.
printf("Your integer: %d\n");
In case it's useful, here are my compiler flags:
-std=c11 -pedantic-errors -Wall -Wextra -Werror
Here's part of my current wrapper function. It works perfectly and checks for quite a few errors when passed valid/correct arguments:
uintmax_t gsh_printf(const char *format, ...)
{
va_list arg;
int cnt;
va_start(arg, format);
cnt = vprintf(format, arg);
va_end(arg);
// Analyze cnt and check for stream errors here
return (uintmax_t)cnt;
}
But here's the problem, if I call:
gsh_printf("Your integer: %d\n");
It does not give an error, and it even runs! The usual output is something like:
Your integer: 1799713
Or some other number, implying that it's accessing memory not allocated to it, but it never gives a segmentation fault either.
So, why does it not give an error of any kind? And how can I write my own code so that there is a compile-time error, or at least run-time error after checking types, number of args, etc.?
Of course, any help is greatly appreciated, and if you need any more information, just let me know. Thank you!
With the fprintf and fscanf families of functions, if the argument corresponding to a conversion specification is missing, the function call invokes undefined behavior.
With gcc, use the format (archetype, string-index, first-to-check) function attribute to request a diagnostic:
extern uintmax_t gsh_printf(const char *format, ...)
__attribute__ ((format (printf, 1, 2)));
uintmax_t gsh_printf(const char *format, ...)
{
va_list arg;
int cnt;
va_start(arg, format);
cnt = vprintf(format, arg);
va_end(arg);
// Analyze cnt and check for stream errors here
return (uintmax_t)cnt;
}
See documentation for an explanation of archetype, string-index and first-to-check:
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
For example with the example above, with -Wall (because of -Wformat), with this statement:
gsh_printf("Your integer: %d\n");
you'll get this warning:
warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat]
With -Wall (because of -Wformat-extra-args) you will also get a warning for extra arguments:
gsh_printf("Your integer: %d\n", 0, 1);
gives
warning: too many arguments for format [-Wformat-extra-args]
In C, a function that uses va_list has no way to determine how many arguments were actually passed. In the traditional stack-based calling convention, the arguments are pushed onto the stack starting from the right-most one (many 64-bit ABIs pass the first few in registers instead). printf parses the format string and fetches the next value whenever a conversion needs one, whether or not the caller supplied it. That is why you get a random number when calling
gsh_printf("Your integer: %d\n");
You would need to know in advance how many arguments are supplied which cannot be done using va_list.
You might be able to get around this by using some kind of container class to hold all the arguments and use the number of elements in the container to check if there are enough.
Also note that 'args' is only a handle to the start of the variable argument list; it carries no count or type information. When you pass it to vprintf, vprintf reads arguments through it exactly as the format string dictates.
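One way to make the argument count known in advance, sketched here in plain C rather than with a container class, is to pass an array and its length instead of a variable argument list. The names sum_ints and SUM_INTS are hypothetical, and the macro assumes C99 compound literals and arguments that are all int:

```c
#include <stddef.h>

/* Takes an explicit count, so it never reads past what the caller supplied. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Builds a temporary int array from the arguments (a C99 compound literal)
   and lets sizeof compute the element count at compile time. */
#define SUM_INTS(...) \
    sum_ints((const int[]){__VA_ARGS__}, \
             sizeof((const int[]){__VA_ARGS__}) / sizeof(int))
```

With this shape, SUM_INTS(300, 200) passes a count of 2 automatically, and a non-int argument is converted or rejected by the compiler instead of being misread at run time.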
I have a variable-argument function in C that looks roughly like this:
void log(const char * format, ...) {
va_list args;
va_start(args, format);
vfprintf( stderr, format, args );
va_end(args);
exit(1);
}
I was able to crash my app by calling it like this:
log("%s %d", 1);
because the function was missing an argument. Is there a way to determine an argument is missing at runtime?
No, there isn't. But when you compile your code with gcc, you should add the options -Wall -Wextra -Wformat -Os. This will enable lots of warnings, and when you annotate your function with __attribute__(__printf__, 2, 3) or something similar (I don't remember the exact syntax), a warning for exactly your case should appear.
See http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html for the exact syntax. It's really __attribute__((__format__(__printf__, 1, 2))).
I don't believe there would be any standard mechanism for determining that at runtime. The parameters after the format specifier are simply values on the stack. For example, if a format specifier indicated a 4-byte integer was next, there would be no way of knowing if the next 4 bytes on the stack were an integer or just whatever happened to be on the stack from a previous call.
Nope there isn't, C will allow you to shoot yourself in the foot just like that.
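Since there is no run-time check, the practical option is the compile-time one mentioned above. The following is a rough sketch, assuming gcc or a compatible compiler; PRINTF_LIKE and log_error are hypothetical names, and exit(1) is left out so the function returns the count from vfprintf:

```c
#include <stdarg.h>
#include <stdio.h>

/* PRINTF_LIKE is a hypothetical helper macro: it expands to gcc's format
   attribute where available and to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 1 is the format string; checking starts at argument 2. */
int log_error(const char *format, ...) PRINTF_LIKE(1, 2);

int log_error(const char *format, ...)
{
    va_list args;
    int count;

    va_start(args, format);
    count = vfprintf(stderr, format, args);
    va_end(args);
    return count;
}
```

With the attribute in place and -Wall enabled, a call such as log_error("%s %d", 1) should be reported by -Wformat at compile time instead of crashing at run time.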
Consider the following test case:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdarg.h>
void test(char **outa, char **outb, const char* fstra, const char* fstrb, ...) {
va_list ap;
va_start(ap, fstrb);
vasprintf(outa, fstra, ap);
vasprintf(outb, fstrb, ap);
va_end(ap);
}
int main(void) {
char *a, *b;
test(&a, &b, "%s", " %s\n", "foo", "bar");
/* ... */
}
The intent here is that the test() function takes two format strings and a list of parameters for both of them. The first format string is supposed to 'eat' as many arguments it needs, and the remaining ones are supposed to be used for the second format string.
So, the expected result here would be foo & bar and that's what I get with glibc. But AFAICS the machine running codepad (guess some *BSD it is), gives foo & foo and my guess is that it uses va_copy() on the argument list.
I guess I'm hitting an undefined (and ugly) behavior here; so the question is: is there a way to achieve double-format-string printf() without reimplementing it from scratch? And is there a nice way to check that behavior using autoconf without using AC_RUN_IFELSE()?
I guess some quick method of scanning format-string for the number of arguments to be consumed could work here as well (+va_copy()).
When you call one of the v*printf functions, this uses va_arg which means the value of ap is indeterminate on return.
The relevant bit lies in section 7.19.6.8 The vfprintf function in C99, which references the footnote:
As the functions vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, and vsscanf invoke the va_arg macro, the value of arg after the return is indeterminate.
This has survived to the latest draft of C1x I have as well, so I suspect it's not going to change quickly.
There is no portable way to do what you're attempting using the higher-level v*printf functions although you could resort to using the lower level stuff.
The standard is very clear in that a called function using va_arg on a va_list variable renders it indeterminate in the caller. From C99 7.15 Variable Arguments <stdarg.h>:
The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap.
However, the value of ap when using va_arg on it within a single function is determinate (otherwise the whole variable arguments processing would fall apart). So you could write a single function which processed both format strings in turn, with these lower-level functions.
With the higher level stuff (as per the footnote), you are required to va_end/va_start to put the ap variable back in a determinate state and this will unfortunately reset to the start of the parameter list.
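To illustrate the point about a single function, here is a small sketch in which two consumers share one va_list inside the same function, so the second loop continues exactly where the first stopped. The function sum_two_groups is hypothetical and assumes every variadic argument is an int:

```c
#include <stdarg.h>

/* The first na variadic ints go to *sum_a, the next nb to *sum_b.
   Because va_arg is only ever invoked here, ap stays determinate. */
void sum_two_groups(long *sum_a, long *sum_b, int na, int nb, ...)
{
    va_list ap;
    int i;

    *sum_a = 0;
    *sum_b = 0;
    va_start(ap, nb);
    for (i = 0; i < na; i++)
        *sum_a += va_arg(ap, int);
    for (i = 0; i < nb; i++)
        *sum_b += va_arg(ap, int);
    va_end(ap);
}
```

A double-format-string printf built this way would need its own format parser driving va_arg, which is the "lower level" work referred to above.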
I'm not sure how much of a simplification your provided example code is but, if that's close to reality, you can achieve the same result by just combining the two format strings beforehand and passing the result to vprintf, something like:
void test(const char* fstra, const char* fstrb, ...) {
char big_honkin_buff[1024]; // Example, don't really do this.
va_list ap;
strcpy (big_honkin_buff, fstra);
strcat (big_honkin_buff, fstrb);
va_start(ap, fstrb); // va_start must name the last named parameter
vprintf(big_honkin_buff, ap);
va_end(ap);
}
As the other answer already states, passing ap to a v*() function leaves ap in an undetermined state. So, the solution is to not depend on this state. I suggest an alternative workaround.
First, initialize ap as normal. Then determine the length of the first formatted string using vsnprintf(NULL, 0, fstra, ap). Concatenate the format strings, reinitialize ap, and split the output using the predetermined length of the first formatted string.
It should look something like the following:
void test(const char* fstra, const char* fstrb, ...) {
char *format, *buf;
char *a, *b;
int a_len, buf_len;
va_list ap;
va_start(ap, fstrb);
a_len = vsnprintf(NULL, 0, fstra, ap);
va_end(ap);
asprintf(&format, "%s%s", fstra, fstrb);
va_start(ap, fstrb);
buf_len = vasprintf(&buf, format, ap);
va_end(ap);
free(format);
a = malloc(a_len + 1);
memcpy(a, buf, a_len);
a[a_len] = '\0';
b = malloc(buf_len - a_len + 1);
memcpy(b, buf + a_len, buf_len - a_len);
b[buf_len - a_len] = '\0';
free(buf);
}
As also discussed in the other answer, this approach does not separate positional printf-style placeholders ("%1$s. I repeat, %1$s."). So the documentation for the interface should clearly state that both format strings share the same positional placeholder namespace—and that if one of the format strings uses positional placeholders then both must.
To complete the other answers, which are correct, a word about what happens in common implementations.
In 32bit Linux (and I think Windows too), passing the same ap to two functions actually works.
This is because the va_list is just a pointer to the place on the stack where the parameters are. v*printf functions get it, but don't change it (they can't, it's passed by value).
In 64bit Linux (don't know about Windows), it doesn't work.
va_list is a struct, and v*printf gets a pointer to it (because actually it's an array of size 1 of structs). When arguments are consumed, the struct is modified. So another call to v*printf will get the parameters not from the start, but after the last one consumed.
Of course, this doesn't mean you should use a va_list twice in 32bit Linux. It's undefined behavior, which happens to work in some implementations. Don't rely on it.
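When the same arguments really do need to be read twice from the start, the portable tool is C99's va_copy. The sketch below uses it for the common measure-then-format pattern; format_alloc is a hypothetical helper, not part of any library:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd formatted string, or NULL on failure.
   The caller is responsible for calling free() on the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);                   /* independent copy, taken before ap is used */
    len = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: format */
    va_end(ap2);
    return buf;
}
```

Note that va_copy restarts from the copied position; it does not give the "continue where the first call stopped" behavior the question asks for.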
I understand that the difference between the printf, fprintf, sprintf etc functions and the vprintf, vfprintf, vsprintf etc functions has to do with how they deal with the function arguments. But how specifically? Is there really any reason to use one over the other? Should I just always use printf as that is a more common thing to see in C, or is there a legitimate reason to pick vprintf instead?
printf() and friends are for normal use. vprintf() and friends are for when you want to write your own printf()-like function. Say you want to write a function to print errors:
int error(char *fmt, ...)
{
int result;
va_list args;
va_start(args, fmt);
// what here?
va_end(args);
return result;
}
You'll notice that you can't pass args to printf(), since printf() takes many arguments, rather than one va_list argument. The vprintf() functions, however, do take a va_list argument instead of a variable number of arguments, so here is the completed version:
int error(char *fmt, ...)
{
int result;
va_list args;
va_start(args, fmt);
fputs("Error: ", stderr);
result = vfprintf(stderr, fmt, args);
va_end(args);
return result;
}
You never want to use vprintf() directly, but it's incredibly handy when you need to e.g. wrap printf(). For these cases, you will define the top-level function with variable arguments (...). Then you'll collect those into a va_list, do your processing, and finally call vprintf() on the va_list to get the printout happening.
The main difficulty with variadic arguments is not that there is a variable number of arguments but that there is no name associated with each argument. The va_start, va_arg macros parse the arguments in memory (in most C compilers they are on the stack) using the type information contained in the format string cf. Kernighan and Ritchie, second edition, section 7.3.
This example shows the elegance of Python. Since C/C++ cannot reconcile the difference between int error(char *fmt, ...) and int error(char *fmt, va_list ap), thus, for every function *printf, it has to create two versions, i.e., one taking in ..., the other taking in va_list, this essentially doubles the total number of functions. In Python, you can use *list() or **dict() to pass in a va_list as ....
Hopefully, future C/C++ can support this kind of argument processing scheme.
Below is code which includes a variadic function and calls to the variadic function. I would expect that it would output each sequence of numbers appropriately. It does when compiled as a 32-bit executable, but not when compiled as a 64-bit executable.
#include <stdarg.h>
#include <stdio.h>
#ifdef _WIN32
#define SIZE_T_FMT "%Iu"
#else
#define SIZE_T_FMT "%zu"
#endif
static void dumpargs(size_t count, ...) {
size_t i;
va_list args;
printf("dumpargs: argument count: " SIZE_T_FMT "\n", count);
va_start(args, count);
for (i = 0; i < count; i++) {
size_t val = va_arg(args, size_t);
printf("Value=" SIZE_T_FMT "\n", val);
}
va_end(args);
}
int main(int argc, char** argv) {
(void)argc;
(void)argv;
dumpargs(1, 10);
dumpargs(2, 10, 20);
dumpargs(3, 10, 20, 30);
dumpargs(4, 10, 20, 30, 40);
dumpargs(5, 10, 20, 30, 40, 50);
return 0;
}
Here is the output when compiled for 64-bit:
dumpargs: argument count: 1
Value=10
dumpargs: argument count: 2
Value=10
Value=20
dumpargs: argument count: 3
Value=10
Value=20
Value=30
dumpargs: argument count: 4
Value=10
Value=20
Value=30
Value=14757395255531667496
dumpargs: argument count: 5
Value=10
Value=20
Value=30
Value=14757395255531667496
Value=14757395255531667506
Edit:
Please note that the reason the variadic function pulls size_t out is because the real-world use of this is for a variadic function that accepts a list of pointers and lengths. Naturally the length argument should be a size_t. And in some cases a caller might pass in a well-known length for something:
void myfunc(size_t pairs, ...) {
size_t i;
va_list args;
va_start(args, pairs);
for (i = 0; i < pairs; i++) {
const void* ptr = va_arg(args, const void*);
size_t len = va_arg(args, size_t);
process(ptr, len);
}
va_end(args);
}
void user(void) {
myfunc(2, ptr1, ptr1_len, ptr2, 4);
}
Note that the 4 passed into myfunc might encounter the problem described above. And yes, really the caller should be using sizeof or the result of strlen or just plain put the number 4 into a size_t somewhere. But the point is that the compiler is not catching this (a common danger with variadic functions).
The right thing to do here is to eliminate the variadic function and replace it with a better mechanism that provides type safety. However, I would like to document this problem, and collect more detailed information as to exactly why this problem exists on this platform and manifests as it does.
So basically, if a function is variadic, it must conform to a certain calling convention (most importantly, the caller must clean up args, not the callee, since the callee has no idea how many args there will be).
The reason why it starts happening on the 4th is because of the calling convention used on x86-64. To my knowledge, both Visual C++ and gcc use registers for the first few parameters, and then after that use the stack.
I am guessing that this is the case even for variadic functions (which does strike me as odd since it would make the va_* macros more complicated).
On x86, the standard C calling convention is to always use the stack.
The problem is that you're using size_t to represent the type of the values. This is incorrect, the values are actually normal 32 bit values on Win64.
size_t should only be used for values which change size based on the 32 or 64 bit-ness of the platform (such as pointers). Change the code to use int or __int32 and this should fix your problem.
The reason this works fine on Win32 is that size_t is a different sized type depending on the platform. For 32 bit windows it will be 32 bits and on 64 bit windows it will be 64 bit. So on 32 bit windows it just happens to match the size of the data type you are using.
A variadic function is only weakly type checked. In particular, the function signature does not provide enough information for the compiler to know the type of each argument assumed by the function.
In this case, size_t is 32-bits on Win32 and 64-bits on Win64. It has to vary in size like that in order to perform its defined role. So for a variadic function to pull arguments out correctly which are of type size_t, the caller had to make certain that the compiler could tell that the argument was of that type at compile-time in the calling module.
Unfortunately 10 is a constant of type int. There is no defined suffix letter that marks a constant to be of type size_t. You could hide that fact inside a platform-specific macro, but that would be no clearer than writing (size_t)10 at the call site.
It appears to work partially because of the actual calling convention used in Win64. From the examples given, we can tell that the first four integral arguments to a function are passed in registers, and the rest on the stack. That allowed count and the first three variadic parameters to be read correctly.
However it only appears to work. You are actually standing squarely in Undefined Behavior territory, and "undefined" really does mean "undefined": anything can happen.
On other platforms, anything can happen too.
Because variadic functions are implicitly unsafe, a special burden is placed on the coder to make certain that the type of each argument known at compile time matches the type that argument will be assumed to have at run time.
In some cases where the interfaces are well known, it is possible to warn about type mismatch. For example, gcc can often recognize that the type of an argument to printf() doesn't match the format string, and issue a warning. But doing that in the general case for all variadic functions is hard.
The reason for this is because size_t is defined as a 32-bit value on 32-bit Windows, and a 64-bit value on 64-bit Windows. When the 4th argument is passed into the variadic function, the upper bits appear to be uninitialized. The 4th and 5th values that are pulled out are actually:
Value=0xcccccccc00000028
Value=0xcccccccc00000032
I can solve this problem with a simple cast on all the arguments, such as:
dumpargs(5, (size_t)10, (size_t)20, (size_t)30, (size_t)40, (size_t)50);
This does not answer all my questions, however; such as:
Why is it the 4th argument? Likely because the first 3 are in registers?
How does one avoid this situation in a type-safe portable manner?
Does this happen on other 64-bit platforms, using 64-bit values (ignoring that size_t might be 32-bit on some 64-bit platforms)?
Should I pull out the values as 32-bit values regardless of the target platform, and will that cause problems if a 64-bit value is pushed into the variadic function?
What do the standards say about this behavior?
Edit:
I really wanted to get a quote from The Standard, but it's something that's not hyperlink-able, and costs money to purchase and download. Therefore I believe quoting it would be a copyright violation.
Referencing the comp.lang.c FAQ, it's made clear that when writing a function that takes a variable number of arguments, there's nothing you can do for type safety. It's up to the caller to make sure that each argument either perfectly matches or is explicitly cast. There are no implicit conversions.
That much should be obvious to those who understand C and printf (note that gcc has a feature to check printf-style format strings), but what's not so obvious is that not only are the types not implicitly cast, but if the size of the types don't match what's extracted, you can have uninitialized data, or undefined behavior in general. The "slot" where an argument is placed might not be initialized to 0, and there might not be a "slot"--on some platforms you could pass a 64-bit value, and extract two 32-bit values inside the variadic function. It's undefined behavior.
If you are the one writing this function, it is your job to write the variadic function correctly and/or correctly document your function's calling conventions.
You already found that C plays fast-and-loose with types (see also signedness and promotion), so explicit casting is the most obvious solution. This is frequently seen with integer constants being explicitly defined with things like UL or ULL.
Most sanity checks on passed values will be application-specific or non-portable (e.g. pointer validity). You can use hacks like mandating that pre-defined sentinel value(s) be sent as well, but that's not infallible in all cases.
Best practice would be to document heavily, perform code reviews, and/or write unit tests with this bug in mind.
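For the pointer-and-length case from the question, one type-safe alternative to the variadic interface is an array of structs. This is only a sketch; struct span and total_length are hypothetical names standing in for the real processing:

```c
#include <stddef.h>

/* One (pointer, length) pair; len is always a size_t by construction. */
struct span {
    const void *ptr;
    size_t len;
};

/* Stand-in for the real per-pair processing: adds up all lengths. */
size_t total_length(const struct span *spans, size_t count)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += spans[i].len;
    return total;
}
```

A caller can write struct span s[] = {{"ab", 2}, {"wxyz", 4}}; and the plain int constants are converted to size_t by the initializer, so the mismatch described above cannot occur.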