I understand that for ordinary C functions:
g1(f1(), f2(), f3())
that the order of evaluating the arguments is unspecified: f3() might be called before f1() or vice versa.
But does the same hold true for variadic functions? The definition of va_arg() says:
Each invocation of the va_arg macro modifies ap to point to the next variable argument.
Though it doesn't specify what it means by 'next' (left to right or right to left?), common use cases make it seem likely that it means left to right.
Furthermore, can one assume that the required first argument is evaluated before (or after) the variable arguments? Or is that also unspecified?
void g2(int a, ...);
I suppose a strict reading of this says that one cannot assume any particular order of evaluation. But it certainly would make writing functions like printf() much more difficult, if not intractable.
The order of evaluation of function arguments does not (strictly) depend on whether the function in question is variadic or not. In both cases, the order is unspecified.
The description of va_arg is just telling you what order it reads the arguments in. By the time you're inside the variadic function, the arguments were already evaluated and passed to the function.
Each invocation of the va_arg macro modifies ap to point to the next variable argument.
Here, "next" means left to right, in the order the arguments were passed to ....
Furthermore, can one assume that the required first argument is evaluated before (or after) the variable arguments?
This is also unspecified. There is no exception for variadic arguments when it comes to order of evaluation, AFAIK.
The arguments are all evaluated when you call the function, although the order is not specified. By the time you use the va_arg macro, all the arguments have already been evaluated.
It's the same as if I call f(int a, int b): both a and b have already been evaluated by the time I'm in the body of f.
I understand that for ordinary C functions [...] the order of evaluating the arguments is unspecified [...] But does the same hold true for variadic functions? The definition of va_arg() says [...]
C specifies that
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call
(C17 6.5.2.2/7)
That applies to all functions, including variadic ones. Among other things, it means that the specifications for the va_arg macro are irrelevant to the order of evaluation of the actual arguments to the function in which that macro appears. All actual arguments to the function are evaluated before execution of the function body begins.
The only distinction C draws on the calling side between variadic functions and non-variadic functions is in the argument type conversions that apply. The variable arguments to a variadic function (or to a function without an in-scope prototype) are subject to the default argument promotions, whereas that is not the case for non-variadic arguments to functions with in-scope prototypes.
Furthermore, can one assume that the required first argument is evaluated before (or after) the variable arguments? Or is that also unspecified?
It is also unspecified. Again, the only distinction C makes between the semantics of calling a variadic function and of calling a non-variadic one is to do with rules for argument type promotions. And that's not really so much a distinction as it is covering cases that don't otherwise arise, and even then it is in a manner that is consistent with other C semantics for arguments whose types are not specified via a function prototype.
I suppose a strict reading of this says that one cannot assume any particular order of evaluation. But it certainly would make writing functions like printf() much more difficult, if not intractable.
No "strict" reading is required. C does not specify any rules for the relative order of evaluation of arguments to the same function call. Period. But that causes no particular issue with the implementation of variadic functions, because all the arguments are evaluated before execution of the function body starts. Variadic functions are subject to constraints on the order in which the values of the variable arguments are read within the function, but that has nothing to do with the order in which the argument expressions are evaluated on the caller's side.
I know that how arguments are passed to functions is not part of the C standard, and is dependent on the hardware architecture and calling convention.
I also know that an optimizing compiler may automatically inline functions to save on call overhead, and omit code that has no "side effects".
But, I have a question about a specific case:
Let's say there is a non-trivial function that cannot be inlined or removed and must be called, and that is declared to take no arguments:
int veryImportantFunc() {
/* do some important stuff */
return result;
}
But this function is called with arguments:
int result = veryImportantFunc(1, 2, 3);
Is the compiler allowed to call the function without passing these arguments?
Or is there some standard or technical limitation that would prevent this kind of optimization?
Also, what if argument evaluation has side effects:
int counter = 1;
int result = veryImportantFunc(1, ++counter, 3);
Is the compiler obligated to evaluate the expression even without passing the result, or would it be legal to drop the evaluation, leaving counter == 1?
And finally, what about extra arguments:
char* anotherFunc(int answer) {
/* Do stuff */
return question;
}
if this function is called like this:
char* question = anotherFunc(42, 1);
Can the 1 be dropped by the compiler based on the function declaration?
EDIT: To clarify: I have no intention of writing the kind of code that is in my examples, and I did not find this in any code I am working on.
This question is to learn about how compilers work and what the relevant standards say, so to all of you who advised me to stay away from this kind of code: thank you, but I already know that.
To begin with, "declared to take no arguments" is wrong: int veryImportantFunc() declares a function taking an unspecified number of arguments. This is obsolete C style and shouldn't be used. For a function taking no arguments, use (void).
Is the compiler allowed to call the function without passing these arguments?
If the actual function definition does not match the number of arguments, the behavior is undefined.
Also, what if argument evaluation has side effects
Doesn't matter, since arguments are evaluated (in unspecified order) before the function is called.
Is the compiler obligated to evaluate even without passing the result, or would it be legal to drop the evaluation leaving counter == 1?
It will evaluate the arguments and then invoke undefined behavior. Anything can happen.
And finally, what about extra arguments:
Your example won't compile, as it isn't valid C.
The following quotes from the C standard are relevant to your different questions:
6.5.2.2 Function calls
...
2. If the expression that denotes the called function has a type that includes a prototype, the number of arguments shall agree with the number of parameters.
...
4. An argument may be an expression of any complete object type. In preparing for the call to a function, the arguments are evaluated, and each parameter is assigned the value of the corresponding argument.
...
6. If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined.
...
10. There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
Let's say there is a non-trivial function that cannot be inlined or removed and must be called, and that is declared to take no arguments:
int veryImportantFunc() {
/* do some important stuff */
return result;
}
But this function is called with arguments:
There are two possibilities:
the function is declared with a "full" prototype such as
int veryImportantFunc(void);
in this case the call with extra arguments won't compile, as the number of parameters and arguments must match;
the function is declared as taking an unspecified number of arguments, i.e. the declaration visible to the call site is
int veryImportantFunc();
in this case, the call is undefined behavior, as the usage doesn't match the actual function definition.
All the other considerations about optimization aren't particularly interesting, as what you are trying to do is illegal however you look at it.
We can stretch this and imagine a situation where passing extra useless arguments is legal, for example a variadic function never using the extra arguments.
In this case, as always, the compiler is free to perform any such optimization as long as the observable behavior isn't impacted and the program proceeds "as if" executed according to the C abstract machine.
Given that the details of argument passing aren't observable¹, the compiler could in principle optimize away the argument passing, while the argument evaluation may still need to be done if it has some observable impact on the program state.
That being said, I have a hard time imagining how such an optimization might be implemented in the "classical linking model", but with LTCG (link-time code generation) it shouldn't be impossible.
¹ The only observable effects according to the C standard are I/O and reads/writes of volatile objects.
Following Pascal's wager, it is better to be wrong in believing the compiler can make this optimisation than to be right in believing it doesn't. It serves no purpose to wrongly declare a function; if you really must, you can always put a stub in front of it:
int RealSlimShady(void) {
return Dufus;
}
int MaybeSlimShady(int Mathew, int Mathers) {
Used(Mathew);
Used(Mathers);
return RealSlimShady();
}
Everyone is happy, and if your compiler is worth its salt, there will be 0 code overhead.
This is a question about the norms of standard C11, concerning side effects when function arguments are evaluated in an expression.
I am trying to define a macro in standard C that emulates the "method"-like syntax of an OOP language, in a rudimentary way.
I have designed a solution, whose main ideas I will expose here, but I have some doubts about its conformance to C11.
I need to give the exposition first; at the end I will ask the specific question, which is related to the evaluation of expressions involving function calls. Sorry for the long post.
So, given a struct, or rather a struct * object x, I would be happy if I could make a "method" call in this way:
x->foo_method();
The typical manner in that this problem is solved is something like that:
Define the "class" by means of a struct declaration:
typedef struct foo_s { void (*foo_method)(struct foo_s *this); } *foo_class;
foo_class x = malloc(sizeof(struct foo_s));
x->foo_method = some_function_defined_over_there___;
Then make the call by repeating the object in the 'this' parameter:
x->foo_method(x);
One can try to define some kind of "method-call" macro:
#define call(X, M) ((X)->M(X))
However this approach is bad, since the evaluation of X can duplicate side effects (this is the well-known pitfall of repeating a macro parameter twice).
[By using tricky macros I can handle the case of an arbitrary number of parameters for the method M, for example by using __VA_ARGS__ and a few intermediate macro hacks.]
To solve the problem of repetition of macro arguments, I decided to implement a global stack, maybe hidden as a static array in a function:
void *my_stack(void *x, const char *operation)
{
    static void *stack[100] = { NULL };
    // 'operation' selects the "push" or "pop" operation on the stack.
    // ...
    // If the operation is "push", then 'x' itself is returned again.
}
So now I avoid the duplication of side effects in the macro, by writing X only once:
#define call(X, M) (((foo_class)my_stack((X), "push")) -> M (my_stack(0,"pop")))
As you can see, my intent is that the function-like macro be treated by the C compiler as an expression whose value is the value returned by the method M.
I have written the parameter X only once inside the macro body; its value is stored in the stack. Since one needs this value in order to access the "method" member of X itself, this is why the function my_stack returns the value of x: I need to reuse it immediately, as part of the same expression that pushed the value x onto the stack.
Although this idea seems to easily solve the problem of the duplication of X in the call(X,M) macro, more issues appear.
One can have "methods" whose arguments are also objects stored in the stack by using the same macro call().
Even more, we could have arguments in the "method" whose values are obtained as the result of evaluating other "methods".
Finally, other functions or methods appearing as arguments of a given "method" could modify the stack, because they are, probably, functions modifying the stack by using the call() macro.
I want my macro to be consistent in all of those cases. For example, let us suppose that x1, x2, x3 are foo_class objects.
On the other hand, let us suppose that we have, in foo_class, the following "method" member:
int (*meth)(foo_class this, int, int);
Finally, we could make the "method" call:
call(x1, meth, call(x2, meth, 2, 2), call(x3, meth, 3, 3));
[The real syntax for the macro is not necessarily as shown here. I expect the main idea is understood.]
The intent is to emulate this function call:
x1->meth(x1, x2->meth(x2,2,2), x3->meth(x3,3,3));
The problem here is that I'm using a stack to emulate the following duplications of objects in the call: x1->meth(x1,....), x2->meth(x2,...), x3->meth(x3,...).
For example: ((foo_class)(my_stack(x2,"push"))) -> meth (my_stack(0,"pop"), ...).
MY QUESTION IS: Can I always be sure that the "push"/"pop" pairing, in any possible expression that consistently uses the call() macro, yields the expected pair of objects every time?
For example, if I am "pushing" x2, it would be completely wrong that the x3 be "popped".
MY CONJECTURE IS: The answer would be YES, but after a deep analysis of the standard document ISO C11 around the topic of sequence points.
There is a sequence point between the expression that produces the "method" (actually, the "function") to be called, and the expressions of the arguments to be passed to it.
Thus, for example, x1 is stored in the stack before the meth method is considered to be called.
There is a sequence point after the evaluation of all the arguments passed to the function and before the actual function call.
Thus, for example, if new objects x4, x5, etc. are "pushed"/"popped" in the stack while the call to x1->meth(x1...x2...x3) happens, these objects x4 and x5 will appear and disappear in the stack after x2, x3, have already gone from the stack.
There is no sequence point between the arguments in a function call.
Thus, the following expressions could interleave while being evaluated (when they are arguments of the function call shown above, involving x1, x2, x3):
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
my_stack(x3,"push") -> meth(my_stack(0,"pop"),3,3)
It could happen that after the objects x2 and x3 are "pushed" onto the stack, the "pop" operations end up ill-paired: x3 could be "popped" in the meth(...,2,2) line, and x2 could be "popped" in the meth(...,3,3) line, contrary to what is desired.
This situation is unlikely, but it seems that under standard C99 there is no formal solution.
However, in C11 we have the concept of indeterminately sequenced side effects.
We have that:
When a function is called, all its side effects are resolved indeterminately sequenced with respect to any other expression around the expression that makes the function call. [See paragraph (12) here: sequence points].
Since the side effects in the function call of meth involved in the expression:
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
have to be resolved "completely before" or "completely after" the side effects in:
my_stack(x3,"push") -> meth(my_stack(0,"pop"),3,3)
I conclude that the "push" and "pop" operations are well paired.
Is my interpretation of the standard OK? I will cite it, just in case:
[C11, 6.5.2.2/10] There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.94)
[Footnote 94]: In other words, function executions do not ‘‘interleave’’ with each other.
That is, although the order of evaluation of arguments in a function call cannot be predicted, I think one can nevertheless be sure that the sequencing rules established in ISO C11 are enough to ensure that the "push" and "pop" operations work well in this case.
So, a "method"-like syntax can be used in C, in order to emulate a rudimentary but consistent OOP capability of "methods as members of objects".
No, I don't think you can be guaranteed that this does what you want. Let us decompose your expression
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
<<<<<< A >>>>>>>>>> <<<<<<< B >>>>>>
<<<<<<<<<<<<< C >>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<< D >>>>>>>>>>>>>>>>>>>>>>>>
The evaluations of B and C are completely independent and must both be done before the function call D. The arguments of a function and the function designator are not much different in that respect.
Because A and B are function calls, they are in fact sequenced, but indeterminately, so you don't know which one comes first and which second.
I think you would be far better off making your call an inline function. If you really need versions of call for different types, you could select the function with a _Generic expression. But as someone already said in the comments, you are really at the limits of what you should reasonably do in C.
If I had the following declaration:
extern volatile int SOME_REGISTER;
and later on:
void trigger_read_register()
{
SOME_REGISTER;
}
would calling trigger_read_register() issue a read request on SOME_REGISTER?
According to the C11 spec, accessing a volatile is considered a side effect, and thus the compiler shouldn't optimize the (otherwise useless) access in your example.
So, the answer is that yes, it should read from memory.
See the C11 standard (draft), section 5.1.2.3, paragraph 2:
Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of those operations are all side
effects, which are changes in the state of the execution
environment. Evaluation of an expression in general includes both
value computations and initiation of side effects. Value computation
for an lvalue expression includes determining the identity of the
designated object.
Further, paragraph 4 says:
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
I am studying undefined behavior in C and I came across a statement that says that
there is no particular order of evaluation of function arguments,
but then what about standard calling conventions like _cdecl and _stdcall, whose definitions say (in a book) that arguments are evaluated from right to left?
Now I am confused by these two definitions: one is in accordance with UB, and states something different from the other, which is in accordance with the definition of the calling convention. Please reconcile the two.
As Graznarak's answer correctly points out, the order in which arguments are evaluated is distinct from the order in which arguments are passed.
An ABI typically applies only to the order in which arguments are passed, for example which registers are used and/or the order in which argument values are pushed onto the stack.
What the C standard says is that the order of evaluation is unspecified. For example (remembering that printf returns an int result):
some_func(printf("first\n"), printf("second\n"));
the C standard says that the two messages will be printed in some order (evaluation is not interleaved), but explicitly does not say which order is chosen. It can even vary from one call to the next, without violating the C standard. It could even evaluate the first argument, then evaluate the second argument, then push the second argument's result onto the stack, then push the first argument's result onto the stack.
An ABI might specify which registers are used to pass the two arguments, or exactly where on the stack the values are pushed, which is entirely consistent with the requirements of the C standard.
But even if an ABI actually requires the evaluation to occur in a specified order (so that, for example, printing "second\n" followed by "first\n" would violate the ABI) that would still be consistent with the C standard.
What the C standard says is that the C standard itself does not define the order of evaluation. Some secondary standard is still free to do so.
Incidentally, this does not by itself involve undefined behavior. There are cases where the unspecified order of evaluation can lead to undefined behavior, for example:
printf("%d %d\n", i++, i++); /* undefined behavior! */
Argument evaluation and argument passing are related but different problems.
Arguments tend to be passed left to right, often with some arguments passed in registers rather than on the stack. This is what is specified by the ABI and _cdecl and _stdcall.
The order of evaluation of arguments before placing them in the locations that the function call requires is unspecified. It can evaluate them left to right, right to left, or some other order. This is compiler dependent and may even vary depending on optimization level.
_cdecl and _stdcall merely specify that the arguments are pushed onto the stack in right-to-left order, not that they are evaluated in that order. Think about what would happen if calling conventions like _cdecl, _stdcall, and pascal changed the order in which the arguments were evaluated.
If evaluation order were modified by calling convention, you would have to know the calling convention of the function you're calling in order to understand how your own code would behave. That's a leaky abstraction if I've ever seen one. Somewhere, buried in a header file someone else wrote, would be a cryptic key to understanding just that one line of code; but you've got a few hundred thousand lines, and the behavior changes for each one? That would be insanity.
I feel like much of the undefined behavior in C89 arose from the fact that the standard was written after multiple conflicting implementations existed. They were maybe more concerned with agreeing on a sane baseline that most implementers could accept than they were with defining all behavior. I like to think that all undefined behavior in C is just a place where a group of smart and passionate people agreed to disagree, but I wasn't there.
I'm tempted now to fork a C compiler and make it evaluate function arguments as if they're a binary tree that I'm running a breadth-first traversal of. You can never have too much fun with undefined behavior!
Check the book you mentioned for any references to "Sequence points", because I think that's what you're trying to get at.
Basically, a sequence point is a point at which you are certain that all preceding expressions have been fully evaluated and all their side effects have completed.
For example, the end of an initializer is a sequence point. This means that after:
bool foo = !(i++ > j);
You are sure that i will be equal to its initial value plus one, and that foo has been assigned either true or false. Another example:
int bar = i++ > j ? i : j;
Is perfectly predictable. It reads as follows: if the current value of i is greater than j (i is incremented right after this comparison, because the question mark is a sequence point), then assign i (its NEW value) to bar, else assign j. This is down to the fact that the question mark in the ternary operator is a valid sequence point.
All sequence points listed in the C99 standard (Annex C) are:
The following are the sequence points described in 5.1.2.3:
— The call to a function, after the arguments have been evaluated (6.5.2.2).
— The end of the first operand of the following operators: logical AND && (6.5.13);
logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
— The end of a full declarator: declarators (6.7.5);
— The end of a full expression: an initializer (6.7.8); the expression in an expression
statement (6.8.3); the controlling expression of a selection statement (if or switch)
(6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
expressions of a for statement (6.8.5.3); the expression in a return statement
(6.8.6.4).
— Immediately before a library function returns (7.1.4).
— After the actions associated with each formatted input/output function conversion
specifier (7.19.6, 7.24.2).
— Immediately before and immediately after each call to a comparison function, and
also between any call to a comparison function and any movement of the objects
passed as arguments to that call (7.20.5).
What this means, in essence, is that an expression that modifies the same object more than once without an intervening sequence point can invoke undefined behaviour, like, for example:
printf("%d, %d and %d\n", i++, i++, i--);
In this statement, the sequence point that applies is "the call to a function, after the arguments have been evaluated", i.e. after the arguments are evaluated. If we then look at the semantics, in the same standard under 6.5.2.2, point ten, we see:
10 The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
That means for i = 1, the values that are passed to printf could be:
1, 2, 3//left to right
But equally valid would be:
1, 0, 1//evaluated i-- first
//or
1, 2, 1//evaluated i-- second
What you can be sure of is that the new value of i after this call will be 2.
But all of the values listed above are, theoretically, equally valid, and 100% standard compliant.
But the appendix on undefined behaviour explicitly lists this as being code that invokes undefined behaviour, too:
Between two sequence points, an object is modified more than once, or is modified
and the prior value is read other than to determine the value to be stored (6.5).
In theory, your program could crash; instead of printing "1, 2 and 3", the output "666, 666 and 666" would be possible, too.
So finally I found it. It is because the arguments are passed after they are evaluated, so passing arguments is a completely different story from evaluating them. A C compiler, traditionally built to maximize speed and optimization, may evaluate the expressions in any order.
So argument passing and argument evaluation are different stories altogether.
Since the C standard does not specify any order for evaluating function arguments, every compiler implementation is free to adopt one. That's one reason why coding something like foo(i++, i++) is complete insanity: you may get different results when switching compilers.
One other important thing that has not been highlighted here: even if your favorite ARM compiler evaluates parameters left to right today, that is merely a convention of that compiler; there is no guarantee it will hold in all cases or in all subsequent versions.