In the following function:
int fun(int *k) {
*k += 4;
return 3 * (*k) - 1;
}
int main(void) {
int i = 10, j = 10, sum1, sum2;
sum1 = (i / 2) + fun(&i);
sum2 = fun(&j) + (j / 2);
}
You'd get sum1 equal to 46 and sum2 equal to 48. How would the function run if there were no precedence rules?
How drastically different would things be without consistent precedence rules?
The precedence rules tell us how an expression is structured, not how it is evaluated. In sum1 = (i / 2) + fun(&i);, the rules tell us things including:
i / 2 is grouped together because it is in parentheses; it cannot form, for example, (sum1 = i) / 2 + fun(&i);.
(i / 2) and fun(&i) are grouped together because + has higher precedence than =, making sum1 = ((i / 2) + fun(&i)); rather than (sum1 = (i / 2)) + fun(&i);.
The precedence rules do not tell us whether i / 2 or fun(&i) is evaluated first. In fact, no rules in the C standard specify whether i / 2 or fun(&i) is evaluated first. The compiler may choose.
If i / 2 is evaluated first, the result will be 10 / 2 + 41 and then 5 + 41 and finally 46. If fun(&i) is evaluated first, the result will be 14 / 2 + 41 and then 7 + 41 and finally 48. Your compiler chose the former. It could have chosen the latter.
How would the function run if there were no precedence rules?
If there were no rules, we would not know how the function would be executed. The rules are what tell us how it will be executed.
Some comments assert that the behavior of this program is undefined. That is incorrect. This misunderstanding comes from C 2018 6.5 2, which says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…
In your code, i / 2 uses i and fun(&i) contains a “side effect” on i (changing its value via an assignment). If these were unsequenced, the behavior would be undefined. However, there is a sequence point after evaluating the argument to fun and before calling it, and there are sequence points after each full expression in fun, including its return statement. Thus, there is some sequencing of the uses of i and the side effects on it. This sequencing is incompletely determined by the rules of the C standard, but it is, as defined by the standard, indeterminately sequenced, not unsequenced.
Related
I know that:
int b = 1, c = 2, d = 3, e = 4;
printf("%d %d %d", ++b, b, b++);
results in undefined behavior. Since
Modifying any object more than once between two sequence points is UB.
Undefined behavior and sequence points
But I don't know if:
int b = 1, c = 2, d = 3, e = 4;
printf("%d", b++ + ++c - --d - e--);
is also UB?
What I think is that increment/decrement operators will evaluate first because of the precedence, between them right to left since the associativity. Then arithmetic operators will be evaluated left to right.
Which will just be
(b) + (c + 1) - (d - 1) - (e)
that is, 1 + (2 + 1) - (3 - 1) - (4)
= (2 - 4)
= -2
Is it right?
But I don't know if: ... is also UB?
It is not, but your reasoning about why is fuzzy.
What I think is that increment/decrement operators will evaluate first because of the precedence, between them right to left since the associativity. Then arithmetic operators will be evaluated left to right.
Precedence determines how the result is calculated. It doesn't say anything about the ordering of the side-effects.
There is no equivalent of precedence telling you when the side effects (the stored value of b has been incremented, the stored value of e has been decremented) are observable during the statement. All you know is that the variables have taken their new values before the next statement (ie, by the ;).
So, the reason this is well-defined is that it does not depend on those side-effects.
I deliberately hand-waved the language to avoid getting bogged down, but I should probably clarify:
"during the statement" really means "before the next sequence point"
"before the next statement (... ;)" really means "at the next sequence point"
See Order of evaluation:
There is a sequence point after the evaluation of all function arguments and of the function designator, and before the actual function call.
So really the side-effects are committed before the call to printf, so earlier than the ; at the end of the statement.
There is a gigantic difference between the expressions
b++ + ++c - --d - e--
(which is fine), and
x++ + ++x - --x - x--
(which is rampantly undefined).
It's not using ++ or -- that makes an expression undefined. It's not even using ++ or -- twice in the same expression. No, the problem is when you use ++ or -- to modify a variable inside an expression, and you also try to use the value of that same variable elsewhere in the same expression, and without an intervening sequence point.
Consider the simpler expression
++z + z;
Now, obviously the subexpression ++z will increment z. So the question is, does the + z part use the old or the new value of z? And the answer is that there is no answer, which is why this expression is undefined.
Remember, expressions like ++z do not just mean, "take z's value and add 1". They mean, "take z's value and add 1, and store the result back into z". These expressions have side effects. And the side effects are at the root of the undefinedness issue.
A friend asked me to explain the difference between operator precedence and order of evaluation in simple terms. This is how I explained it to them :-
Let's take an example -
int x;
int a = 2;
int b = 5;
int c = 6;
int d = 4;
x = a * b / (c + d);
Here, the final value of x will become 1. This is because first, the values of c and d will be added together (6+4), then the values of a and b will be multiplied together (2*5), and finally, the division will take place (10/10), resulting in the final value becoming 1, which is then assigned to x.
All of this is specified by operator precedence.
In this example, the parentheses force the addition to take place before the multiplication and the division, even though addition has a lower precedence.
Also, the multiplication is executed before the division, because multiplication and division have the same precedence, and both of them have the associativity of left-to-right.
Now comes the important part, i.e. the order of evaluation of this expression.
On one system, the order of evaluation may be like this -
/* Step 1 */ x = a * b / (c + d);
/* Step 2 */ x = a * 5 / (c + d);
/* Step 3 */ x = a * 5 / (c + 4);
/* Step 4 */ x = a * 5 / (6 + 4);
/* Step 5 */ x = a * 5 / 10;
/* Step 6 */ x = 2 * 5 / 10;
/* Step 7 */ x = 10 / 10;
/* Step 8 */ x = 1;
Note that in any step, it is always ensured that the operator precedence is maintained, i.e. even though b was replaced by 5 in Step 2, the multiplication did not take place until Step 7. So, even though the order of evaluation is different for different systems, the operator precedence is always maintained.
On another system, the order of evaluation may be like this -
/* Step 1 */ x = a * b / (c + d);
/* Step 2 */ x = a * b / (6 + d);
/* Step 3 */ x = a * b / (6 + 4);
/* Step 4 */ x = a * b / 10;
/* Step 5 */ x = 2 * b / 10;
/* Step 6 */ x = 2 * 5 / 10;
/* Step 7 */ x = 10 / 10;
/* Step 8 */ x = 1;
Again, the operator precedence is maintained.
In the above example, the entire behaviour is well-defined. One reason for this is that all of the variables are different.
In technical terms, the behaviour in this example is well-defined because there are no unsequenced modifications to any variable.
So, on any system, x will always get assigned the value 1 finally.
Now, let's change the above example to this :-
int x;
int y = 1;
x = ++y * y-- / (y + y++);
Here, the final value that gets assigned to x varies between systems, making the behaviour undefined.
On one system, the order of evaluation may be like this -
/* Step 1 */ x = ++y * y-- / (y + y++); // (y has value 1)
/* Step 2 */ x = ++y * y-- / (1 + y++); // (y still has value 1)
/* Step 3 */ x = ++y * 1 / (1 + y++); // (y now has value 0)
/* Step 4 */ x = 1 * 1 / (1 + y++); // (y now has value 1)
/* Step 5 */ x = 1 * 1 / (1 + 1); // (y now has value 2)
/* Step 6 */ x = 1 * 1 / 2;
/* Step 7 */ x = 1 / 2;
/* Step 8 */ x = 0;
Again, the operator precedence is maintained.
On another system, the order of evaluation may be like this -
/* Step 1 */ x = ++y * y-- / (y + y++); // (y has value 1)
/* Step 2 */ x = ++y * y-- / (y + 1); // (y now has value 2)
/* Step 3 */ x = ++y * 2 / (y + 1); // (y now has value 1)
/* Step 4 */ x = ++y * 2 / (1 + 1); // (y still has value 1)
/* Step 5 */ x = ++y * 2 / 2; // (y still has value 1)
/* Step 6 */ x = 2 * 2 / 2; // (y now has value 2)
/* Step 7 */ x = 4 / 2;
/* Step 8 */ x = 2;
Again, the operator precedence is maintained.
How can I improve this explanation?
I would prefer an explanation that uses function calls. A function call makes it very obvious that "something needs to be evaluated before applying the operator".
Basic example:
int x = a() + b() * c();
must be calculated as
temp = result_of_b_func_call * result_of_c_func_call
x = result_of_a_func_call + temp
due to multiplication having higher precedence than addition.
However, the evaluation order of the 3 function calls is unspecified, i.e. the functions can be called in any order. Like
a(), b(), c()
or
a(), c(), b()
or
b(), a(), c()
or
b(), c(), a()
or
c(), a(), b()
or
c(), b(), a()
Another basic example would be to explain operator associativity - like:
int x = a() + b() + c();
must be calculated as
temp = result_of_a_func_call + result_of_b_func_call
x = temp + result_of_c_func_call
due to left-to-right associativity of addition. But again the order of the 3 function calls is unspecified.
If function calls are not an option, I would prefer something like
x = a * b + c / d
Here it's pretty obvious that there are two sub-expressions, i.e. a * b and c / d. Due to operator precedence both of these sub-expressions must be evaluated before the addition but the order of evaluation is unspecified, i.e. we can't tell whether the multiplication or the division is done first.
So it can be
temp1 = a * b
temp2 = c / d
x = temp1 + temp2
or it can be
temp2 = c / d
temp1 = a * b
x = temp1 + temp2
All we know is that the addition must be last.
6.5 Expressions
...
3 The grouping of operators and operands is indicated by the syntax.85) Except as specified
later, side effects and value computations of subexpressions are unsequenced.86)
85) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ? : (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
86) In an expression that is evaluated more than once during the execution of a program, unsequenced and
indeterminately sequenced evaluations of its subexpressions need not be performed consistently in
different evaluations.
C 2011 Online Draft
Precedence and associativity only control how expressions are parsed and which operators are grouped with which operands. They do not control the order in which subexpressions are evaluated.
Given your example
x = a * b / (c + d);
precedence and associativity cause the expression to be parsed as
(x) = ((a * b) / (c + d))
The multiplicative operators * and / have the same precedence and are left-associative, so a * b / (c + d) is parsed as (a * b) / (c + d) (as opposed to a * (b / (c + d))).
So what this tells us is that the result of a * b is divided by the result of c + d, but this does not mean that a * b must be evaluated before c + d or vice versa.
Each of a, b, c, and d may be evaluated in any order (including simultaneously if the architecture supports it). Similarly each of a * b and c + d may be evaluated in any order, and if the same expression is evaluated multiple times in the program, that order doesn't have to be consistent. Obviously both a and b have to be evaluated before a * b can be evaluated, and both c and d have to be evaluated before c + d can be evaluated, but that's the only ordering you can be certain about.
There are operators that force left-to-right evaluation - ||, &&, ?:, and the comma operator, but in general order of evaluation is a free-for-all.
It's not necessarily true to say that the "the parentheses force the addition to take place before the multiplication and the division". You can see this in a disassembly of the code (gcc 10.2.0):
x = a * b / (c + d);
1004010b6: 8b 45 fc mov -0x4(%rbp),%eax
1004010b9: 0f af 45 f8 imul -0x8(%rbp),%eax
1004010bd: 8b 4d f4 mov -0xc(%rbp),%ecx
1004010c0: 8b 55 f0 mov -0x10(%rbp),%edx
1004010c3: 01 d1 add %edx,%ecx
1004010c5: 99 cltd
1004010c6: f7 f9 idiv %ecx
The multiplication was performed first, followed by the addition, then the division.
Nope, you say
Here, the final value of x will become 1. This is because first, the values of c and d will be added together (6+4), then the values of a and b will be multiplied together (2*5), and finally, the division will take place (10/10), resulting in the final value becoming 1, which is then assigned to x.
The parentheses establish that 6 + 4 will be evaluated before the division is done... but they do not prevent the compiler from arranging to evaluate a * b first (because the multiplicative operators are left-associative, the multiplication is also performed before the division). You don't even know (unless you look at the assembler output) which order of subexpression evaluation the compiler will select. As stated, the fully parenthesized expression would be:
(x = ((a * b) / (c + d)));
so the compiler may start with either a * b or c + d. Then it will do the other operation, then the division, and finally the assignment. But beware: the assignment requires the address of x and not its value (it's an lvalue), so the address of x can be calculated at any point before the assignment is made. Finally, the (unused) value of the assignment is discarded.
a possible order could be:
calculate a * b
calculate address of x
calculate c + d
calculate the division (a*b)/(c+d)
store the result at position &x.
a different one:
calculate c + d
calculate a * b
calculate the division (a*b)/(c+d)
calculate address of x
store the result at position &x.
but you could also calculate the address of x in the first step.
In the code below, sum1 gets 46, as if the operator precedence were left to right, but sum2 gets 48, as if its precedence were right to left. Why are the answers different?
#include <stdio.h>
int func(int *k){
*k+=4;
return 3 * (*k)-1;
}
int main(void) {
int i = 10, j = 10, sum1, sum2;
sum1 = (i / 2) + func(&i);
sum2 = func(&j)+(j/2);
printf("%d\n",sum1);
printf("%d",sum2);
}
In the expression (i / 2) + func(&i), the compiler (or the C implementation generally) is free to evaluate either i / 2 first or func(&i) first. Similarly, in func(&j) + (j/2), the compiler is free to evaluate func(&j) or j/2 first.
Precedence is irrelevant. Precedence tells us how an expression is structured, but it does not fully determine the order in which it is evaluated. Precedence tells us that, in a * b + c * d, the structure must be (a * b) + (c * d). In a + b + c, precedence, in the form of left-to-right association for +, tells us the structure must be (a + b) + c. It does not tell that a must be evaluated before c. For example, in a() + b() + c(), the structure is (a() + b()) + c(), but the compiler may call the functions in any order, holding their results in temporary registers if needed, and then add the results.
In func(&j)+(j/2), there is no right-to-left precedence or association. No rule in the C standard says j/2 must be evaluated before func(&j).
A compiler might tend to evaluate subexpressions from left to right, in the absence of other constraints. However, various factors may alter that. For example, if one subexpression appears multiple times, the compiler might evaluate it early and retain its value for reuse. Essentially, the compiler builds a tree structure describing the expressions it needs to evaluate and then seeks optimal ways to evaluate them. It does not necessarily proceed left to right, and you cannot rely on any particular evaluation order.
Questions about Sequencing
The C standard has a rule, in C 2018 6.5 2, that says if a modification to an object, as occurs to i in the statement *k+=4;, is unsequenced relative to a value computation using the same object, as occurs for i in i / 2, then the behavior is undefined. However, this problem does not occur in this code because the modification and the value computation are indeterminately sequenced, not unsequenced: 6.5.2.2 10 says “… Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to execution of the called function.” C 5.1.2.3 3 says “… Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which…”
The program has undefined behavior because the order of evaluation of operands in an additive operator is unspecified and such evaluations of operands are unsequenced.
Pay attention to that within the assignment expression the variables i and j are being changed and these changes are not sequenced. Either i / 2 or j / 2 can be evaluated before the function call or vice versa.
The & preceding the variable name (&i or &j) is sending a pointer meaning that any changes to the variable are saved. In this case, the func call takes a variable and adds 4 to it then returns some value based on that variable. The function still changed the variable's value and since the equation is being processed left to right, each subsequent use of the variable has that change reflected.
sum1 = (i/2) +func(&i) (where i = 10)
--> sum1 = 5 + func(&i)
--> sum1 = 5 + 41 (and now i = 14)
sum2 = func(&j) + (j/2) (where j = 10)
--> sum2 = 41 + (j/2) (and now j = 14)
--> sum2 = 41 + 7
I understand that this topic has come up umpteen times but I request a moment.
I have tried understanding this many times, also in context of order of evaluation. I was looking for some explicit examples to understand op. precedence and I found one here: http://docs.roxen.com/pike/7.0/tutorial/expressions/operator_tables.xml What I would like to know is if the examples given there (I have cut-pasted them below) are correct.
1+2*2 => 1+(2*2)
1+2*2*4 => 1+((2*2)*4)
(1+2)*2*4 => ((1+2)*2)*4
1+4,c=2|3+5 => (1+4),(c=(2|(3+5)))
1 + 5&4 == 3 => (1 + 5) & (4 == 3)
c=1,99 => (c=1),99
!a++ + ~f() => (!(a++)) + (~(f()))
s == "klas" || i < 9 => (s == "klas") || (i < 9)
r = s == "sten" => r = (s == "sten")
For instance, is 1+2*2*4 really 1+((2*2)*4), or could it just as well have been 1+(2*(2*4)) according to the C specification? Any help or further reference to examples would be useful. Thanks again.
Although those examples come from a different language, I think they are the same as operator precedence in C. In general, you'd be better off using a reference for the C language, such as the C standard, or a summary such as the one in Wikipedia.
However, I don't believe that is actually what you are asking. Operator precedence has no implications for order of evaluation. All operator precedence does is show you how to parenthesize the expression. A C compiler is allowed to evaluate the operations in just about any order it wishes to. It is also allowed to use algebraic identities if it is provable that they will have the same result for all valid inputs (this is not usually the case for floating point calculations, but it is usually true for unsigned integer calculations).
The only cases where the compiler is required to produce code with a specific evaluation order are:
Short-circuit boolean operators && and ||: the left argument must be evaluated first, and in some cases the right argument may not be evaluated;
The so-called ternary operator ?:: the left argument (before the ?) must be evaluated first; subsequently, exactly one of the other two operands will be evaluated. Note that this operator groups to the right, demonstrating that there is no relationship between grouping and evaluation order. That is, pred_1 ? action_1() : pred_2 ? action_2() : action_3() is the same as pred_1 ? action_1() : (pred_2 ? action_2() : action_3()), but it's pred_1 which must be evaluated first.
The comma operator ,: the left argument must be evaluated first. This is not the same as the use of the comma in function calls.
Function arguments must be evaluated before the function is called, although the order of evaluation of the arguments is not specified, and neither is the order of evaluation of the expression which produces the function.
The last phrase refers to examples such as this:
// This code has Undefined Behaviour. DO NOT USE
typedef void(*takes_int_returns_void)(int);
takes_int_returns_void fvector[3] = {...};
//...
//...
(*fvector[i++])(i);
Here, a compiler may choose to increment i before or after it evaluates the argument to the function (or other less pleasant possibilities), so you don't actually know what value the function will be called with.
In the case of 1+2*2*4, the compiler must generate code which will produce 17. How it does that is completely up to the compiler. Furthermore, if x, y and z are all unsigned integers, a compiler may compile 1 + x*y*z with any order of multiplications it wants to, even reordering to y*(x*z).
Most binary operators associate from left to right when they have equal precedence.
Binary operators, other than assignment operators, go from left to right when they are of equal precedence, so 1 + 2 * 2 * 4 is equivalent to 1 + ((2 * 2) * 4). Obviously in this particular case 1 + (2 * (2 * 4)) gives the same answer, but it won't always. For instance, 1 + 2 / 2.0 * 4 evaluates to 1 + ((2 / 2.0) * 4) == 5.0 and not to 1 + (2 / (2.0 * 4)) == 1.25.
Order of evaluation is a completely different thing from operator precedence. For one thing, operator precedence is always well-defined, order of evaluation sometimes is not (e.g. the order in which function arguments are evaluated).
This is a perfect tutorial about operator precedence and order of evaluation. Enjoy!
My professor and I are engaging in a bit of a debate about the += operator in C. He says that += or =+ will work, but he is not certain why =+ works.
#include <stdio.h>
#include <stdlib.h>
int main()
{
int i = 0, myArray[5] = {1,1,1,1,1};
while(i < 5)
{
myArray[i] += 3 + i;
printf("%d\n", myArray[i]);
i++;
}
system("pause");
}
The output will yield 4, 5, 6, 7, 8. Changing the += operator to =+ yields the same results. However -= does not do the same as =- (which is obvious as it treats the 3 as a 3).
So C gurus:
Why does this work with =+?
How does a C compiler treat =+ versus +=?
He is wrong; += is completely different from =+.
The expression x =+ 3 is parsed as x = (+3).
Here, + becomes the (rather useless) unary + operator. (the opposite of negation)
The expression x =- 3 is parsed as x = (-3), using the unary negation operator.
Your professor is remembering ancient versions of C in which =+, =-, =* etc did in fact mean the same thing as +=, -=, *= etc. (We're talking older than the version generally referred to as "K&R" here. Version 6 UNIX, if memory serves.)
In current versions of C, they do not mean the same thing; the versions with the equals sign first will be parsed as if there was a space in between the equals and whatever comes after. This happens to produce a valid program (albeit not a program that does what you expect) for =- and =+ because - and + can be used as unary operators.
=* or =/ could be used to settle the argument. a *= 3 will multiply a by three, and a /= 3 will divide it by three, but a =* 3 is a semantic error (because unary * can only be applied to pointers) and a =/ 3 is a syntax error (because / cannot be used as a unary operator).
Code
myArray[i] += 3 + i;
will yield myArray[i] = myArray[i] + 3 + i;
whereas
myArray[i] =+ 3 + i;
yields myArray[i] = 3 + i
that's what I got.
+ is also a unary operator as is -.