I am learning C so I tried the below code and am getting an output of 7,6 instead of 6,7. Why?
#include <stdio.h>
int f1(int);
int main(void)
{
    int b = 5;
    printf("%d,%d", f1(b), f1(b));
    return 0;
}
int f1(int b)
{
    static int n = 5;
    n++;
    return n;
}
The order of the evaluation of the function arguments is unspecified in C. (Note there's no undefined behaviour here; the arguments are not allowed to be evaluated concurrently for example.)
Typically the evaluation of the arguments is either from right to left, or from left to right.
As a rule of thumb, don't call the same function twice in a function argument list if that function has side effects (as it does in your case), and don't pass the same argument twice if that allows something at the call site to be modified (e.g. when passing a pointer).
https://en.cppreference.com/w/c/language/eval_order
Before C11, you must follow Rule (2)
There is a sequence point after evaluation of the first (left) operand and
before evaluation of the second (right) operand of the following binary
operators: && (logical AND), || (logical OR), and , (comma).
This is because, before C11, arguments are considered to be separated by the comma operator. This is not optimal because arguments are pushed right to left on some platforms. Thus, C11 adds Rule (12), making it unspecified.
A function call that is not sequenced before or sequenced after another
function call is indeterminately sequenced (CPU instructions that
constitute different function calls cannot be interleaved, even if the
functions are inlined)
Even C99 designated initializers still go back to Rule (2), where earlier (left) initializers are resolved before later (right) initializers relative to the comma operator. That is, until C11 adds Rule (13) making it unspecified.
In initialization list expressions, all evaluations are indeterminately
sequenced
In other words, before Rule (12) and Rule (13), the comma operator from Rule (2) is the specified behavior. Rule (2) leads to inefficient code that cannot be optimized on some platforms. There are not enough registers if the number of structure members or function parameters exceeds some threshold. That is, "register pressure" becomes an issue.
Historically, aggregate type initializers and function arguments fall back to the comma operator. In C11, they specifically add the definition that commas in those aggregate type initializers and function arguments are not "comma operators", so that Rule (12) and Rule (13) make sense and Rule (2) is not applied.
Related
It's clear from the C standard that general function calls are expressions from the definition:
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
(6.5, paragraph 1)
Since () is an operator and the call returns a value, regular function calls are obviously expressions.
But those which don't return a value don't seem to fit with this definition. The function name itself does (as it designates a function), but this isn't a function call.
The standard does clearly say that a function call is an expression, and that it can return void, but this seems to conflict with the definition of an expression. What am I missing?
Calling a function is an expression regardless of the function's return type. C's grammar is orthogonal to its type system. They are independent pieces of the language. Grammatically func(); is an expression statement.
expression_statement
    : ';'
    | expression ';'
    ;
postfix_expression
    : primary_expression
    | postfix_expression '[' expression ']'
    | postfix_expression '(' ')'
    | postfix_expression '(' argument_expression_list ')'
There are very few things you can do with a void result. You can't assign it to a variable since void variables aren't allowed. If func()'s result is void you can use four operators:
Parentheses: (func())
Comma sequencing: func(), 42
Ternary operator: 42 ? func() : func().
Cast to void: (void) func()
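Some compilers also let a function returning void return a void result, although ISO C has a constraint against a return statement with an expression in such a function (C11 6.8.6.4), so a conforming compiler will diagnose it:
return func();
Finally, in a for(init; condition; increment) loop the three pieces are all expressions. init and increment (but not condition) can be void.
for (func(); 42; func()) { }
Few of these are useful and none are good style, but apart from the return case they're all legal.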
Paragraph 1 of clause 6.5 was not completely thought out with regard to void. The C standard is imperfect and has a number of defects. This paragraph should be received as a general description to orient readers and is not a precise mathematical specification of what an expression is.
It is said that:
An expression is a sequence of operators and operands that
specifies computation of a value, or
that designates an object or a function or
that generates side effects
or that performs a combination thereof.
The "specifies computation of a value" part is only one of the possibilities. The void function call would be the one "that generates side effects".
Any expression in the expression statement in C is considered a void expression. C11 6.8.3 Expression and null statements p2:
The expression in an expression statement is evaluated as a void expression for its side effects. 153)
153) Such as assignments, and function calls which have side effects.
i.e. in the expression statement
a = 5;
a = 5 is a void expression that is evaluated for its side effects only, i.e. the assignment of value 5 into a, not for computation of a value, even though a = 5 could be used for a computation of a value in other contexts. Likewise you can write a; and it is a legal use of an expression "evaluated for its side effects", even though it has none. It does not cease to be an expression there.
The LHS of a comma operator is a void expression. A void expression can be used in ? : - then both branches will be void expressions and the entire expression in itself will be a void expression.
An expression in C can be void.
Such an expression has no value, so it cannot be assigned to an object.
Moreover, any expression can be cast to void.
I have just written some sample code to try it out. Surprisingly, I did not get any compilation failure. As per C, we should have a declaration followed by initialization or use. Kindly explain.
#include <stdio.h>
int main(void) {
    int a = a = 1; //Why it compiles??
    printf("%d",a);
    return 0;
}
The above code compiles successfully and outputs 1. Please explain, and also cite anything from the standard that allows this.
Each assignment expression like a = 1 has - besides the "side effect" of assigning the value 1 to a - a result value, which is the value of a after the assignment (cf, for example, cppreference/assignment):
Assignment also returns the same value as what was stored in lhs (so
that expressions such as a = b = c are possible).
Hence, if you write, for example, int a; printf("%d",(a=1)), the output will be 1.
If you know chained assignments such as int a; a = a = 1, then this is equivalent to int a; a = (a=1), and since the result of (a=1) is 1, the result of a = (a=1) is 1, too.
The definition
int a = a = 1;
is equal to
int a = (a = 1);
and is also roughly equivalent to
int a;
a = (a = 1);
When you use a in the initialization, it has already been defined, it exists and can be assigned to. And more importantly, since it's defined then it can be used as a source for its own initialization.
The C standard does not define the behavior in this case, not because of the rule about unsequenced effects or explicit statement but rather because it fails to address the situation.
C 2011 (unofficial draft N1570) clause 6.7, paragraph 1, shows us the overall grammar of declarations. In this grammar, int a = a = 1; is a declaration in which:
int is a declaration-specifiers which consists solely of the type-specifier int.
a = a = 1 is an init-declarator, in which a is a declarator and a = 1 is an initializer. The declarator consists solely of the identifier a.
6.7.6 3 defines a full declarator to be a declarator that is not part of another declarator, and it says the end of a full declarator is a sequence point. However, these are not necessary for our analysis.
6.7.9 8 says “An initializer specifies the initial value stored in an object.”
6.7.9 11 says “The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.”
So, on one hand, the initializer, which has the value 1, specifies the initial value stored in a. On the other hand, the expression a = 1 has the side effect of storing 1 in a. I do not see anything in the C standard that says which occurs first. The rules about sequencing within expressions apply only to the evaluation of the initializer; they do not tell us the order of giving the “initial value” to a and the side effect of assigning to it.
It is reasonable to conclude that, whether a is given the initial value 1 or is assigned the value 1, it ends up with the value 1, so the behavior is defined. However, the standard famously makes it undefined behavior to modify the value of an object twice in an unsequenced way, even if the value being written is the same. The explicit statement of that rule is in 6.5 2, which applies to expressions, and hence does not apply in this situation where we have a conflict between an initialization and an expression. However, we might interpret the spirit of the standard to be:
In order to afford an implementation opportunity to do whatever it needs to do to store (or modify) a new value in an object, a sequencing for the store relative to other stores (or modifications) must be defined.
The standard fails to define a sequence for the initialization and the assignment side effect, and therefore it fails to afford the implementation this needed constraint.
Thus, the standard fails to specify the behavior in a way that guarantees an implementation will produce defined behavior.
Additionally, we can consider int a = 2 + (a = 1). In this case, the value of the initializer is 3, but the side effect assigns 1 to a. For this declaration, the standard does not say which value prevails (except that one might interpret “initial value” literally, thus implying that 3 must be assigned first, so the side effect must be later).
It has always been my understanding that the lack of a sequence point after the reading of the right expression in an assignment makes an example like the following produce undefined behavior:
void f(void)
{
    int *p;
    /*...*/
    p = (int [2]){*p};
    /*...*/
}
// p is assigned the address of the first element of an array of two ints, the
// first having the value previously pointed to by p and the second, zero. The
// expressions in this compound literal need not be constant. The unnamed object
// has automatic storage duration.
However, this is EXAMPLE 2 under "6.5.2.5 Compound literals" in the committee draft for the C11 standard, the version identified as n1570, which I understand to be the final draft (I don't have access to the final version).
So, my question: Is there something in the standard that gives this defined and specified behavior?
EDIT
I would like to expound on exactly what I see as the problem, in response to some of the discussion that has come up.
We have two conditions under which an assignment is explicitly stated to have undefined behavior, as per 6.5p2 of the standard quoted in the answer given by dbush:
1) A side effect on a scalar object is unsequenced relative to a different side effect on the same scalar object.
2) A side effect on a scalar object is unsequenced relative to a value computation using the value of the same scalar object.
An example of item 1 is "i = ++i + 1". In this case the side effect of writing the value i+1 into i due to ++i is unsequenced relative to the side effect of assigning the RHS to the LHS. There is a sequence point between the value calculations of each side and the assignment of RHS to LHS, as described in 6.5.16.1 given in the answer by Jens Gustedt below. However, the modification of i due to ++i is not subject to that sequence point, otherwise the behavior would be defined.
In the example I give above, we have a similar situation. There is a value computation, which involves the creation of an array and the conversion of that array to a pointer to its first element. There is also a side effect of writing a value to part of that array, *p to the first element.
So, I don't see what guarantees we have in the standard that the modification of the otherwise uninitialized first element of the array will be sequenced before the writing of the array address to p. What about this modification (writing *p to the first element) is different from the modification of writing i+1 to i?
To put it another way, suppose an implementation looked at the statement of interest in the example as three tasks: 1st, allocate space for the compound literal object; 2nd: assign a pointer to said space to p; 3rd: write *p to the first element in the newly allocated space. The value computation for both RHS and LHS would be sequenced before the assignment, as computing the value of the RHS only requires the address. In what way is this hypothetical implementation not standard compliant?
You need to look at the definition of the assignment operator in 6.5.16 (paragraph 3, immediately preceding subclause 6.5.16.1):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
The evaluations of the operands are unsequenced.
So here you clearly see that first it evaluates the expressions on both sides in any order or even concurrently, and then stores the value of the right into the object designated by the left.
Additionally, you should know that LHS and RHS of an assignment are evaluated differently. Citations are a bit too long, so here is a summary
For the LHS the evaluation leaves "lvalues", that is objects such as p, untouched. In particular it doesn't look at the contents of the object.
For the RHS there is "lvalue conversion", that is, for any object that is found there (e.g. *p) the contents of that object are loaded.
If the RHS contains an lvalue of array type, this array is converted to a pointer to its first element. This is what is happening to your compound literal.
Edit: You added another question
What about this modification (writing *p to the first element) is
different from the modification of writing i+1 to i?
The difference is simply that i is on the LHS of the assignment and thus has to be updated. The array from the compound literal is not on the LHS and thus is of no concern for the update.
Section 6.5p2 of the C standard details why this is valid:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings. 84)
And footnote 84 states:
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The posted snippet from 6.5.2.5 falls under the latter, as there is no side effect.
In (int [2]){*p}, *p provides an initial value for the compound literal. This is not an assignment, and it is not a side effect. The initial value is part of the object when the object is created. There is no moment when the array exists and it is not initialized.
In p = (int [2]){*p}, we know the side effect of updating p is sequenced after the computation of the right side because C 2011 [N1570] 6.5.16 3 says “The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.”
I was reading this excerpt from the GNU C manual:
You use the comma operator , to separate two (ostensibly related) expressions.
Later in the description:
If you want to use the comma operator in a function argument, you need
to put parentheses around it. That’s because commas in a function
argument list have a different meaning: they separate arguments.
Until now, everything is alright. The weird part is:
foo (x, (y=47, x), z); is a function call with just three arguments. (The second argument is (y=47, x).)
The question is: how is the parameter pushed on the stack, how do I access it from within the function?
In your case,
foo (x, (y=47, x), z);
is functionally similar to
foo (x, x, z);
As per the property of comma operator, the LHS operand is evaluated and the result is discarded, then the RHS operand is evaluated and that's the result.
For the sake of completeness, quoting C11, chapter §6.5.17:
The left operand of a comma operator is evaluated as a void expression; there is a
sequence point between its evaluation and that of the right operand. Then the right
operand is evaluated; the result has its type and value.
Point to note: the variable y will be updated, as the LHS operand is evaluated as a void expression, but this has no effect on the arguments of this function call. If y is a global variable that is used inside foo(), the function will see the value 47.
That said, to answer
how is the parameter pushed on the stack
is very, very implementation (architecture) dependent. C does not specify any order for function argument passing, and some architectures may not even use a "stack" for function argument passing at all!
I've encountered a case where cross-platform code behaved differently on a basic assignment statement.
One compiler evaluated the Lvalue first, the Rvalue second, and then performed the assignment.
Another compiler evaluated the Rvalue first, the Lvalue second, and then performed the assignment.
This can matter when the two operands influence each other, as shown in the following case:
#include <stdio.h>
#include <stdlib.h>
struct MM {
    int m;
};
int helper (struct MM** ppmm ) {
    (*ppmm) = (struct MM *) malloc (sizeof (struct MM));
    (*ppmm)->m = 1000;
    return 100;
}
int main() {
    struct MM mm = {500};
    struct MM* pmm = &mm;
    pmm->m = helper(&pmm); /* result depends on operand evaluation order */
    printf(" %d %d " , mm.m , pmm->m);
}
In the example above, the line pmm->m = helper(&pmm); depends on the order of evaluation. If the Lvalue is evaluated first, then pmm->m is equivalent to mm.m; if the Rvalue is evaluated first, then pmm->m refers to the MM instance allocated on the heap.
My question is whether the C standard determines the order of evaluation (I didn't find anything), or whether each compiler can choose what to do.
Are there any other similar pitfalls I should be aware of?
The semantics for evaluation of an = expression include that
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C2011, 6.5.16/3; emphasis added)
The emphasized provision (that the evaluations of the operands are unsequenced) explicitly permits your observed difference in the behavior of the program when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.
That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
(C2011, 6.5/2)
The called function has the side effect of modifying the value stored in main()'s variable pmm. Evaluation of the left-hand operand of the assignment involves a value computation using the value of pmm. These are unsequenced, therefore the behavior is undefined.
Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, it is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.
When using the simple assignment operator: =, the order of evaluation of operands is unspecified. There is also no sequence point in between the evaluations.
For example if you have two functions:
*Get() = logf(2.0f);
It is not specified in which order they are called at any time, and yet this behavior is completely defined.
A function call will introduce a sequence point. It will happen after the evaluation of the arguments and before the actual call. The end of a full expression, such as the ; that ends an expression statement, will also introduce a sequence point. This is important because an object must not be modified twice without an intervening sequence point, otherwise the behavior is undefined.
Your example is particularly complicated due to unspecified behavior, and may have different results depending on whether the left or right operand is evaluated first.
The left operand is evaluated first.
The left operand is evaluated, and pmm->m designates the member m of the struct mm. Then the function is called, and a sequence point occurs. It modifies the pointer pmm by pointing it to allocated memory, followed by a sequence point at the end of that statement. Then it stores the value 1000 to the member m, followed by another sequence point. The function returns 100, which is assigned to the left operand; since the left operand was evaluated first, the value 100 is assigned to the object mm, more specifically its member m.
mm.m has the value 100 and pmm->m has the value 1000. This is defined behavior; no object is modified twice between sequence points.
The right operand is evaluated first.
The function is called first, the sequence point occurs, and it modifies the pointer pmm by pointing it to the newly allocated struct, followed by a sequence point. Then it stores the value 1000 to the member m, followed by a sequence point. Then the function returns. Then the left operand is evaluated: pmm->m designates the member m of the newly allocated struct, which is modified by assigning it the value 100.
mm.m will have the value 500 since it was never modified, and pmm->m will have the value 100. No object was modified twice between sequence points. The behavior is defined.