In which cases va_list should be used - c

I made a small C library that implements graph theory algorithms and binds them for use in Python.
I sent it to a friend to check, and he told me that va_list is "dangerous" and must not be used in this kind of project.
So the question is: in which cases should va_list be used?

The main problem I see is that there's no guarantee that you really got the number of arguments that you were expecting, and no way to check for that. This makes errors undetectable, and undetectable errors are, obviously, the most dangerous kind. va_arg is also not type-safe, which means that if you pass a double and expect an unsigned long long, you'll get garbage instead of a good-looking integer, and no way to detect it at compile-time. (It becomes much more of a mess when the types don't even have the same size).
Depending on the data you deal with, this may be more or less of a problem. If you pass pointers, it becomes almost instantly fatal to omit an argument because your function will retrieve garbage instead, and this could (if the planets are properly aligned) become a vulnerability.
If you pass "regular" numeric data, it then depends on if the function is critical. In some cases you can easily detect an error looking at the function's output, and in some practical cases it really isn't that much of a problem if the function fails.
It all comes down to whether you're afraid of forgetting arguments yourself.
C++11 has a variadic template feature that allows you to treat an arbitrary number of parameters in a safe way. If the step from C to C++ isn't hurting too much, you could look into it.

In C++11, va_list should never be used, because the language provides a better alternative, variadic templates, which are type-safe whereas va_list is not.
In C, you can use va_list when you need a variadic function, but be careful, as it is not type-safe.
And yes, your friend is correct: va_list is dangerous. Avoid it as much as possible.
In C and C++03, the standard library function printf is implemented using va_list, that is why C++03 programmers usually avoid using this, for it is not typesafe.
But a variadic typesafe printf could be implemented in C++11, as: (taken from wiki)
void printf(const char *s)
{
while (*s) {
if (*s == '%' && *(++s) != '%')
throw std::runtime_error("invalid format string: missing arguments");
std::cout << *s++;
}
}
template<typename T, typename... Args>
void printf(const char *s, T value, Args... args)
{
while (*s) {
if (*s == '%' && *(++s) != '%') {
std::cout << value;
++s;
printf(s, args...);
return;
}
std::cout << *s++;
}
throw std::logic_error("extra arguments provided to printf");
}

va_list has some disadvantages that stem from the underspecification of the function arguments:
When calling such a function, the compiler doesn't know what types of arguments are expected, so the standard imposes the "default argument promotions" before the arguments are passed to the function. E.g. all integers that are narrower than int are promoted, and every float is promoted to double. In some corner cases you will not receive what you wanted in the called function.
In the called function you tell the compiler what type of argument you expect and how many of them. There is no guarantee that a caller gets it right.
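As a rough illustration of the promotion rule, here is a minimal sketch (the function name average is hypothetical, not something from the library in the question). Callers may pass float values, but the callee has to read them back as double, because that is the type they have after promotion:

```c
#include <stdarg.h>

/* Hypothetical example: averages `count` floating-point arguments. */
double average(int count, ...)
{
    va_list ap;
    double sum = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);  /* float arguments arrive as double */
    va_end(ap);

    return count > 0 ? sum / count : 0.0;
}
```

Reading the values with va_arg(ap, float) instead would be undefined behavior, and nothing in the signature tells the compiler or the caller about that.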
If you pass in the number of arguments anyway and they are all of the same known type, you could just pass them in a temporary array (a compound literal), written for C99:
void add_vertices(graph G, vertex v, size_t n, vertex neigh[n]);
You would call it like this:
add_vertices(G, v, nv, (vertex []){ 3, 5, 6, 7 });
If that calling convention looks too ugly to you, you could wrap it in a macro:
#define ADD_VERTICES(G, V, NV, ...) add_vertices((G), (V), (NV), (vertex []){ __VA_ARGS__ })
ADD_VERTICES(G, v, nv, 3, 5, 6, 7);
Here the ... indicates a similar concept for macros. But the result is much safer, since the compiler can type-check the arguments at compile time instead of deferring the check to run time.
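The macro idea can be taken one step further so that the caller does not pass a count at all. The following is only a sketch under assumptions: it uses plain int in place of the vertex type (which is not defined in this answer), and sum_ints and SUM_INTS are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints passed as an ordinary array. */
static int sum_ints(size_t n, const int values[])
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* The element count is derived from the compound literal itself, so it
   cannot disagree with the argument list. sizeof does not evaluate its
   operand, so each argument expression is evaluated only once. */
#define SUM_INTS(...) \
    sum_ints(sizeof((int[]){ __VA_ARGS__ }) / sizeof(int), \
             (int[]){ __VA_ARGS__ })
```

A call such as SUM_INTS(3, 5, 6, 7) then converts every argument to int at compile time, and an argument that cannot be converted (a pointer, say) draws a diagnostic instead of being silently misread.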

If you want to implement a function in C with variable argument count, you can use va_list. For example, printf uses va_list. Not sure why it can be dangerous.

Related

Despite having no datatypes for arguments and x,y are not global variables, How is this code not showing any errors and is working perfectly? [duplicate]

What is useful about this C syntax — using 'K&R' style function declarations?
int func (p, p2)
void* p;
int p2;
{
return 0;
}
I was able to write this in Visual Studio 2010 beta:
// yes, the arguments are flipped
void f()
{
void* v = 0;
func(5, v);
}
I don't understand. What's the point of this syntax? I can write:
int func (p, p2)
int p2;
{
return 0;
}
// and write
int func (p, p2)
{
return 0;
}
The only thing it seems to specify is how many parameters it uses and the return type. I guess parameters without types are kind of cool, but why allow it, and why allow the int paramName after the function declarator? It's weird.
Also is this still standard C?
The question you are asking is really two questions, not one. Most replies so far tried to cover the entire thing with a generic blanket "this is K&R style" answer, while in fact only a small part of it has anything to do with what is known as K&R style (unless you see the entire C language as "K&R-style" in one way or another :)
The first part is the strange syntax used in function definition
int func(p, p2)
void *p;
int p2; /* <- optional in C89/90, but not in C99 */
{
return 0;
}
This one is actually a K&R-style function definition. Other answers have covered this pretty well. And there's not much to it, actually. The syntax is deprecated, but still fully supported even in C99 (except for the "no implicit int" rule in C99, meaning that in C99 you can't omit the declaration of p2).
The second part has little to do with K&R-style. I refer to the fact that the function can be called with "swapped" arguments, i.e. no parameter type checking takes place in such a call. This has very little to do with K&R-style definition per se, but it has everything to do with your function having no prototype. You see, in C when you declare a function like this
int foo();
it actually declares a function foo that takes an unspecified number of parameters of unknown type. You can call it as
foo(2, 3);
and as
j = foo(p, -3, "hello world");
and so on (you get the idea).
Only the call with proper arguments will "work" (meaning that the others produce undefined behavior), but it is entirely up to you to ensure its correctness. The compiler is not required to diagnose the incorrect ones even if it somehow magically knows the correct parameter types and their total number.
Actually, this behavior is a feature of C language. A dangerous one, but a feature nevertheless. It allows you to do something like this
void foo(int i);
void bar(char *a, double b);
void baz(void);
int main()
{
void (*fn[])() = { foo, bar, baz };
fn[0](5);
fn[1]("abc", 1.0);
fn[2]();
}
i.e. mix different function types in a "polymorphic" array without any typecasts (variadic function types can't be used here though). Again, inherent dangers of this technique are quite obvious (I don't remember ever using it, but I can imagine where it can be useful), but that's C after all.
Finally, the bit that links the second part of the answer to the first. When you make a K&R-style function definition, it doesn't introduce a prototype for the function. As far as function type is concerned, your func definition declares func as
int func();
i.e. neither the types nor the total number of parameters are declared. In your original post you say "... it seems to specify is how many params it uses ...". Formally speaking, it doesn't! After your two-parameter K&R-style func definition you still can call func as
func(1, 2, 3, 4, "Hi!");
and there won't be any constraint violation in it. (Normally, a quality compiler will give you a warning).
Also, a sometimes overlooked fact is that
int f()
{
return 0;
}
is also a K&R-style function definition that does not introduce a prototype. To make it "modern" you'd have to put an explicit void in the parameter list
int f(void)
{
return 0;
}
Finally, contrary to a popular belief, both K&R-style function definitions and non-prototyped function declarations are fully supported in C99. The former has been deprecated since C89/90, if I remember correctly. C99 requires the function to be declared before the first use, but the declaration is not required to be a prototype. The confusion apparently stems from the popular terminological mix-up: many people call any function declaration "a prototype", while in fact "function declaration" is not the same thing as "prototype".
This is pretty old K&R C syntax (pre-dates ANSI/ISO C). Nowadays, you should not use it anymore (as you have already noticed its major disadvantage: the compiler won't check the types of arguments for you). The argument type actually defaults to int in your example.
At the time this syntax was used, one would sometimes find functions like
foo(p, q)
{
return q + p;
}
which was actually a valid definition, as the types for p, q, and the return type of foo default to int.
This is simply an old syntax, that pre-dates the "ANSI C" syntax you might be more familiar with. It's called "K&R C", typically.
Compilers support it to be complete, and to be able to handle old code bases, of course.
This is the original K&R syntax before C was standardized in 1989. C89 introduced function prototypes, borrowed from C++, and deprecated the K&R syntax. There is no reason to use it (and plenty of reasons not to) in new code.
That's a relic from when C had no prototypes for functions. Way back then, (I think) functions were assumed to return int and all its arguments were assumed to be int. There was no checking done on function parameters.
You're much better off using function prototypes in the current C language.
And you must use them in C99 (C89 still accepts the old syntax).
And C99 requires functions to be declared (possibly without a prototype). If you're writing a new function from scratch, you need to provide a declaration ... make it a prototype too: you lose nothing and gain extra checking from the compiler.

Using va_list as an array in C

Is it safe and defined behaviour to read va_list like an array instead of using the va_arg function?
EX:
void func(int string_count, ...)
{
va_list valist;
va_start(valist, string_count);
printf("First argument: %d\n", *((int*)valist));
printf("Second argument: %d\n", *(((int*)valist)+1));
va_end(valist);
}
Same question for assignment
EX:
void func(int string_count, ...)
{
va_list valist;
va_start(valist, string_count);
printf("Third argument: %d\n", *(((int*)valist)+2));
*((int*)valist+2)=33;
printf("New third argument: %d\n", *(((int*)valist)+2));
va_end(valist);
}
PS: This seems to work on GCC
No, it is not. You cannot assume anything, because the implementation varies across libraries.
The only portable way to access the values is to use the macros defined in stdarg.h for accessing the ellipsis arguments. The size of the type is important: otherwise you end up reading garbage, and if you read more bytes than have been passed, you have undefined behaviour.
So, to get a value, you have to use va_arg.
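If the goal is to index or modify the arguments like an array, one portable option is to copy them into a real array first. This is a minimal sketch, assuming all variadic arguments are int; collect_ints is a hypothetical helper name, not part of any library:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: copies up to `cap` of the `count` int arguments
   into `out` and returns how many were stored. */
size_t collect_ints(int *out, size_t cap, int count, ...)
{
    va_list ap;
    size_t stored = 0;

    va_start(ap, count);
    for (int i = 0; i < count && stored < cap; i++)
        out[stored++] = va_arg(ap, int);  /* read each value via va_arg */
    va_end(ap);

    return stored;
}
```

After the call, out[2] = 33 modifies the copy, which is well defined, instead of writing through a pointer derived from the va_list.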
See: STDARG documentation
You cannot rely on a guess as to how va_list works, or on a particular implementation. How va_list works depends on the ABI, the architecture, the compiler, etc. If you want a more in-depth view of va_list, see this answer.
edit
A couple of hours ago I wrote this answer explaining how to use the
va_*-macros. Take a look at that.
No, this is not safe and well-defined. The va_list structure could be anything (you assume it is a pointer to the first argument), and the arguments may or may not be stored contiguously in the "right order" in some memory area being pointed to.
Example of va_list implementation that doesn't work for your code - in this setup some arguments are passed in registers instead of the stack, but the va_arg still has to find them.
If an implementation's documentation specifies that va_list may be used in ways beyond those given in the Standard, you may use them in such fashion on that implementation. Attempting to use arguments in other ways may have unpredictable consequences even on platforms where the layout of parameters is specified. For example, on a platform where variadic arguments are pushed on the stack in reverse order, if one were to do something like:
int test(int x, ...)
{
if (!x)
return *(int*)(4+(uintptr_t)&x); // Address of first argument after x
... some other code using va_list.
}
int test2(void)
{
return test(0, someComplicatedComputation);
}
a compiler which is processing test2 might look at the definition of test,
notice that it (apparently) ignores its variadic arguments when the first
argument is zero, and thus conclude that it doesn't need to compute and
pass the result of someComplicatedComputation. Even if the documentation
for the platform documents the layout of variadic arguments, the fact that
the compiler can't see that they are accessed may cause it to conclude that
they are not.

Pointer to void as an argument in a function with no prototype for variable number of arguments

Say I have a function that should accept any number of parameters, so what I'm doing here is declaring no prototype and letting the function be implicitly declared when it is called in the code. I am using a pointer to void to receive the arbitrary number of parameters. However, when doing this, the address of the first parameter is the only thing that is passed, so for it to work I would have to declare the variables in the same order that I am going to pass them in the code:
unsigned char result=0;
unsigned char a=1;
unsigned char b=2;
unsigned char c=3;
char main (void)
{
for (;;)
{
result = function (&a, &b, &c);
result = function (&c, &b, &a);
}
}
function (void *vPointer)
{
return (1);
}
Also, I am declaring function without a return type, since otherwise it would not match the call (where it is also implicitly declared).
The result here is a reference to the first parameter sent to the function, so if I point to the next address in the first function call, it would work, but in the second call it gets the reference to c and whatever memory comes after it.
Does anyone know a way of sorting the parameter references the correct way, or an effective way to receive an unknown number of parameters in a function?
NOTE: (...) SHALL NOT be used.
All C functions should have prototypes. They're not actually mandatory, but there's no good reason not to use them (unless you're stuck with a pre-ANSI compiler that doesn't support them). (But see the bottom of this answer.)
If you want a function that takes a variable number of arguments, that prototype should end with , ..., and the function itself should use the <stdarg.h> mechanism to process its arguments. (This requires at least one argument with a defined type; that argument is used as an anchor for the following arguments.) It's documented here and elsewhere.
As I was typing this, you updated your question with "NOTE: No libraries (such as (...) )should be used". <stdarg.h> is one of the handful of headers that are required for all conforming C implementations, including freestanding (embedded) ones, because it doesn't define any functions, just types and macros. Your C implementation should support it. If it doesn't, then it's not a conforming C implementation, and you'll need to tell us exactly what compiler you're using and/or read its documentation to find out how it handles variadic functions, or an equivalent.
If you really can't use , ... and <stdarg.h>, (or perhaps the older <varargs.h>), then you can define your function with a fixed number of arguments, enough for all uses, then have callers pass extra null pointers.
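A minimal sketch of that fixed-arity fallback, assuming at most four pointer arguments are ever needed (the name sum_up_to_four and the choice of summing the pointed-to bytes are purely illustrative):

```c
#include <stddef.h>

/* Hypothetical example: adds up the bytes pointed to by up to four
   arguments. Callers pass NULL for the slots they do not use. */
unsigned sum_up_to_four(const unsigned char *a, const unsigned char *b,
                        const unsigned char *c, const unsigned char *d)
{
    const unsigned char *args[] = { a, b, c, d };
    unsigned total = 0;

    for (size_t i = 0; i < sizeof args / sizeof args[0]; i++)
        if (args[i] != NULL)
            total += *args[i];
    return total;
}
```

With a prototype like this in scope, calls such as sum_up_to_four(&a, &b, &c, NULL) and sum_up_to_four(&c, &b, &a, NULL) are both fully type-checked, and the order of the arguments is whatever the caller wrote.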
EDIT:
This is an update based on new information in comments and chat.
The OP has a homework assignment to implement printf for some TI microcontroller, for some reason not using either the , ... notation or <stdarg.h>. The compiler in question apparently implements C89/C90, so it does support both features; this is an arbitrary restriction.
This information should have been in the question, which is why I'm downvoting it until the OP updates it.
There is no portable way to achieve this -- which is exactly why , ... is part of the standard language, and <stdarg.h> is part of the standard library.
Probably the best approach would be to write a program that uses , ... and <stdarg.h>, then invoke the compiler so it shows just the output of the preprocessor (resolving the various va_* macros and the va_list type), and then imitate that. And you'd have to assume, or verify using the compiler documentation, that the calling convention for variadic and non-variadic functions is compatible. In other words, find out what this particular implementation does, and reinvent a similar wheel.
(I hope that the point of the homework assignment is to demonstrate how much better the standard techniques are.)
UPDATE 2:
I wrote above that all C functions should have prototypes. This may actually be a rare exception to this rule. At least one of these calls:
printf("Hello\n");
printf("x = %d\n", 42);
must produce a diagnostic from a conforming compiler unless either printf is declared with , ... (which is forbidden by the homework assignment), or there is no visible prototype for printf. If there's no prototype, then at least one of the calls will have undefined behavior (behavior that's not defined by the C standard, though it may be defined by a particular compiler).
In effect, to meet the homework requirements, you'll have to pretend that you're using a pre-ANSI C compiler.
The only "clean" way to use functions with variable arguments is to use variadic functions:
#include <stdarg.h>
void myfun(int foo, ...) {
va_list ap;
va_start(ap, foo);
// ...
va_end(ap);
}
You will need to make sure that you know which arguments you actually expect. Usually you either use your first argument to indicate how many (and which) arguments to expect (examples are an int that says "now come n arguments", or a format string like "%d %s:%s" that says an int and two char* follow), or you use a final terminating argument (e.g. read arguments until you encounter NULL).
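The terminating-argument variant could look roughly like this. It is a sketch only: total_length is a hypothetical function, and it assumes every variadic argument is a char * and that the list ends with (char *)NULL.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: adds up the lengths of all strings passed,
   stopping at the first null pointer. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        total += strlen(s);
    va_end(ap);

    return total;
}
```

The sentinel should be written as (char *)NULL rather than a bare NULL or 0, because in a variadic call nothing converts the argument to a pointer type for you.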
You could use an array of variable length:
unsigned char result=0;
unsigned char a=1;
unsigned char b=2;
unsigned char c=3;
function (int len, void *vPointer);
int main (void)
{
for (;;)
{
unsigned char args[3];
args[0] = a;
args[1] = b;
args[2] = c;
result = function (3, args);
args[0] = c;
args[1] = b;
args[2] = a;
result = function (3, args);
}
return 0;
}
function (int len, void *vPointer)
{
return (1);
}
But I recommend you use the standard way instead, i.e. variadic functions.
//jk
You can use a structure:-
typedef struct _Params {
int m_a;
int m_b;
int m_c;
} Params;
Then your parameters can't get mixed up. Just add more members up to the maximum you need.

Is it possible to portably define a function that accepts any number of arguments?

I have a vague understanding of what 'C portability' means (IIRC, it means that the C language spans multiple platforms, but must adhere to certain standards (?)) and would like to know if we can portably write programs that would accept any number of arguments (even none).
What I do know is that, without considering portability, all functions written with an empty pair of parentheses '()' can accept any number of arguments, but what about with it? There seem to be a few papers encouraging limits on the number of arguments accepted by portably defined functions, but I have yet to find one that says we cannot pass x arguments (where x is not bounded).
A function defined with an empty set of parentheses:
void f() {...}
accepts no parameters, and it is undefined behavior to call it with any.
A function declared with an empty set of parentheses:
void g();
must be called with the same number and types of arguments as in its definition, but the compiler won't check it. It is undefined behavior if you mess up.
To declare a function as taking no parameters (and thus get an error message if you mess up), use
void g(void);
A function may be variadic, but to call it you must see a declaration which states that the function is variadic:
void h(int nb, ...);
(Note that in C a variadic function must have at least one non-variadic parameter; in C++ this isn't the case.) A variadic function must be called with at least the non-variadic arguments specified.
The minimum limit on the number of parameters (for any function) is 127. (Well, what the standard says is that an implementation must be able to compile at least one program reaching that limit).
--
To clear up a confusion:
void f() { ... }
vs
void g(void) { ... }
f is defined as accepting no parameters and can't be called with any, but it is declared as accepting an unknown number of parameters, so the compiler will not ensure that no arguments are given. g is defined and declared as accepting no parameters. It's a leftover of the K&R days, with function definitions like
int h(i)
int i;
{...}
which declares a function taking an indeterminate number of parameters but defines it as taking one int parameter. This style is totally out of fashion, and the only case of practical importance remaining is the f vs g one.
Use <stdarg.h>.
See, for example, section 15, question 4 of the C-FAQ.
Edit: example of <stdarg.h> usage
#include <stdarg.h>
#include <stdio.h>
/* sum all values (no overflow checking) up to a 0 (zero) */
int sum(int v, ...) {
va_list va;
int s = v;
if (v) {
va_start(va, v);
while ((v = va_arg(va, int))) {
s += v;
}
va_end(va);
}
return s;
}
int main(void) {
printf("sum(5, 4, 6, 10) is %d\n", sum(5, 4, 6, 10, 0));
return 0;
}

Why does this variadic function fail on 4th parameter on Windows x64?

Below is code which includes a variadic function and calls to the variadic function. I would expect that it would output each sequence of numbers appropriately. It does when compiled as a 32-bit executable, but not when compiled as a 64-bit executable.
#include <stdarg.h>
#include <stdio.h>
#ifdef _WIN32
#define SIZE_T_FMT "%Iu"
#else
#define SIZE_T_FMT "%zu"
#endif
static void dumpargs(size_t count, ...) {
size_t i;
va_list args;
printf("dumpargs: argument count: " SIZE_T_FMT "\n", count);
va_start(args, count);
for (i = 0; i < count; i++) {
size_t val = va_arg(args, size_t);
printf("Value=" SIZE_T_FMT "\n", val);
}
va_end(args);
}
int main(int argc, char** argv) {
(void)argc;
(void)argv;
dumpargs(1, 10);
dumpargs(2, 10, 20);
dumpargs(3, 10, 20, 30);
dumpargs(4, 10, 20, 30, 40);
dumpargs(5, 10, 20, 30, 40, 50);
return 0;
}
Here is the output when compiled for 64-bit:
dumpargs: argument count: 1
Value=10
dumpargs: argument count: 2
Value=10
Value=20
dumpargs: argument count: 3
Value=10
Value=20
Value=30
dumpargs: argument count: 4
Value=10
Value=20
Value=30
Value=14757395255531667496
dumpargs: argument count: 5
Value=10
Value=20
Value=30
Value=14757395255531667496
Value=14757395255531667506
Edit:
Please note that the reason the variadic function pulls size_t out is because the real-world use of this is for a variadic function that accepts a list of pointers and lengths. Naturally the length argument should be a size_t. And in some cases a caller might pass in a well-known length for something:
void myfunc(size_t pairs, ...) {
va_list args;
size_t i;
va_start(args, pairs);
for (i = 0; i < pairs; i++) {
const void* ptr = va_arg(args, const void*);
size_t len = va_arg(args, size_t);
process(ptr, len);
}
va_end(args);
}
void user(void) {
myfunc(2, ptr1, ptr1_len, ptr2, 4);
}
Note that the 4 passed into myfunc might encounter the problem described above. And yes, really the caller should be using sizeof or the result of strlen or just plain put the number 4 into a size_t somewhere. But the point is that the compiler is not catching this (a common danger with variadic functions).
The right thing to do here is to eliminate the variadic function and replace it with a better mechanism that provides type safety. However, I would like to document this problem, and collect more detailed information as to exactly why this problem exists on this platform and manifests as it does.
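One possible shape for such a replacement is an array of pointer/length pairs. This is only a sketch: struct span and total_span_bytes are hypothetical names, and summing the lengths stands in for the process(ptr, len) call above.

```c
#include <stddef.h>

/* Hypothetical pointer/length pair replacing two variadic arguments. */
struct span {
    const void *ptr;
    size_t len;
};

/* Walks `count` spans; here it only adds up the lengths. */
size_t total_span_bytes(size_t count, const struct span spans[])
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += spans[i].len;
    return total;
}
```

A caller can write struct span spans[] = { { ptr1, ptr1_len }, { ptr2, 4 } }; and the literal 4 is converted to size_t by the initializer, so the width mismatch described above cannot occur.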
So basically, if a function is variadic, it must conform to a certain calling convention (most importantly, the caller must clean up the arguments, not the callee, since the callee has no idea how many arguments there will be).
The reason why it starts happening on the 4th is the calling convention used on x86-64. To my knowledge, both Visual C++ and gcc use registers for the first few parameters, and then use the stack after that.
I am guessing that this is the case even for variadic functions (which does strike me as odd since it would make the va_* macros more complicated).
On x86, the standard C calling convention is to always use the stack.
The problem is that you're using size_t to represent the type of the values. This is incorrect, the values are actually normal 32 bit values on Win64.
size_t should only be used for values which change size based on the 32- or 64-bitness of the platform (such as pointers). Change the code to use int or __int32 and this should fix your problem.
The reason this works fine on Win32 is that size_t is a different sized type depending on the platform. On 32-bit Windows it is 32 bits and on 64-bit Windows it is 64 bits. So on 32-bit Windows it just happens to match the size of the data type you are using.
A variadic function is only weakly type checked. In particular, the function signature does not provide enough information for the compiler to know the type of each argument assumed by the function.
In this case, size_t is 32-bits on Win32 and 64-bits on Win64. It has to vary in size like that in order to perform its defined role. So for a variadic function to pull arguments out correctly which are of type size_t, the caller had to make certain that the compiler could tell that the argument was of that type at compile-time in the calling module.
Unfortunately 10 is a constant of type int. There is no defined suffix letter that marks a constant to be of type size_t. You could hide that fact inside a platform-specific macro, but that would be no clearer than writing (size_t)10 at the call site.
It appears to work partially because of the actual calling convention used in Win64. From the examples given, we can tell that the first four integral arguments to a function are passed in registers, and the rest on the stack. That allowed count and the first three variadic parameters to be read correctly.
However it only appears to work. You are actually standing squarely in Undefined Behavior territory, and "undefined" really does mean "undefined": anything can happen.
On other platforms, anything can happen too.
Because variadic functions are implicitly unsafe, a special burden is placed on the coder to make certain that the type of each argument known at compile time matches the type that argument will be assumed to have at run time.
In some cases where the interfaces are well known, it is possible to warn about type mismatch. For example, gcc can often recognize that the type of an argument to printf() doesn't match the format string, and issue a warning. But doing that in the general case for all variadic functions is hard.
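GCC and Clang let you opt your own printf-style functions into that checking with the format function attribute. A minimal sketch, assuming a GCC-compatible compiler; format_into and PRINTF_LIKE are hypothetical names, and on other compilers the macro simply expands to nothing:

```c
#include <stdarg.h>
#include <stdio.h>

/* Enable format-string checking where the compiler supports it. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Parameter 3 is the format string, parameter 4 is the first value. */
int format_into(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);  /* forward the va_list */
    va_end(ap);
    return written;
}
```

With warnings enabled, a call such as format_into(buf, sizeof buf, "%s", 42) should then be flagged at compile time. This only helps functions that follow printf conventions; a function like dumpargs gets no such check.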
The reason for this is because size_t is defined as a 32-bit value on 32-bit Windows, and a 64-bit value on 64-bit Windows. When the 4th argument is passed into the variadic function, the upper bits appear to be uninitialized. The 4th and 5th values that are pulled out are actually:
Value=0xcccccccc00000028
Value=0xcccccccc00000032
I can solve this problem with a simple cast on all the arguments, such as:
dumpargs(5, (size_t)10, (size_t)20, (size_t)30, (size_t)40, (size_t)50);
This does not answer all my questions, however; such as:
Why is it the 4th argument? Likely because the first 3 are in registers?
How does one avoid this situation in a type-safe portable manner?
Does this happen on other 64-bit platforms, using 64-bit values (ignoring that size_t might be 32-bit on some 64-bit platforms)?
Should I pull out the values as 32-bit values regardless of the target platform, and will that cause problems if a 64-bit value is pushed into the variadic function?
What do the standards say about this behavior?
Edit:
I really wanted to get a quote from The Standard, but it's something that's not hyperlink-able, and costs money to purchase and download. Therefore I believe quoting it would be a copyright violation.
Referencing the comp.lang.c FAQ, it's made clear that when writing a function that takes a variable number of arguments, there's nothing you can do for type safety. It's up to the caller to make sure that each argument either perfectly matches or is explicitly cast. There are no implicit conversions.
That much should be obvious to those who understand C and printf (note that gcc has a feature to check printf-style format strings), but what's not so obvious is that not only are the types not implicitly converted, but if the size of the type passed doesn't match what's extracted, you can get uninitialized data, or undefined behavior in general. The "slot" where an argument is placed might not be initialized to 0, and there might not be a "slot" at all: on some platforms you could pass a 64-bit value and extract two 32-bit values inside the variadic function. It's undefined behavior.
If you are the one writing this function, it is your job to write the variadic function correctly and/or correctly document your function's calling conventions.
You already found that C plays fast-and-loose with types (see also signedness and promotion), so explicit casting is the most obvious solution. This is frequently seen with integer constants being explicitly defined with things like UL or ULL.
Most sanity checks on passed values will be application-specific or non-portable (e.g. pointer validity). You can use hacks like mandating that pre-defined sentinel value(s) be sent as well, but that's not infallible in all cases.
Best practice would be to document heavily, perform code reviews, and/or write unit tests with this bug in mind.
