I understand that this topic has come up umpteen times but I request a moment.
I have tried understanding this many times, also in context of order of evaluation. I was looking for some explicit examples to understand op. precedence and I found one here: http://docs.roxen.com/pike/7.0/tutorial/expressions/operator_tables.xml What I would like to know is if the examples given there (I have cut-pasted them below) are correct.
1+2*2 => 1+(2*2)
1+2*2*4 => 1+((2*2)*4)
(1+2)*2*4 => ((1+2)*2)*4
1+4,c=2|3+5 => (1+4),(c=(2|(3+5)))
1 + 5&4 == 3 => (1 + 5) & (4 == 3)
c=1,99 => (c=1),99
!a++ + ~f() => (!(a++)) + (~(f()))
s == "klas" || i < 9 => (s == "klas") || (i < 9)
r = s == "sten" => r = (s == "sten")
For instance, does 1+2*2*4 is really 1+((2*2)*4) or could as well have been, 1+(2*(2*4)) according to C specification. Any help or further reference to examples would be useful. Thanks again.
Although those examples come from a different language, I think they are the same as operator precedence in C. In general, you'd be better off using a reference for the C language, such as the C standard, or a summary such as the one in Wikipedia.
However, I don't believe that is actually what you are asking. Operator precedence has no implications for order of evaluation. All operator precedence does is show you how to parenthesize the expression. A C compiler is allowed to evaluate the operations in just about any order it wishes to. It is also allowed to use algebraic identities if it is provable that they will have the same result for all valid inputs (this is not usually the case for floating point calculations, but it is usually true for unsigned integer calculations).
The only cases where the compiler is required to produce code with a specific evaluation order are:
Short-circuit boolean operators && and ||: the left argument must be evaluated first, and in some cases the right argument may not be evaluated;
The so-called ternary operator ?:: the left argument (before the ?) must be evaluated first; subsequently, exactly one of the other two operators will be evaluated. Note that this operator groups to the right, demonstrating that there is no relationship between grouping and evaluation order. That is, pred_1 ? action_1() : pred_2 ? action_2() : pred_3 ? action_3() is the same as pred_1 ? action_1() : (pred_2 ? action_2() : pred_3 ? action_3()), but it's pred_1 which must be evaluated first.
The comma operator ,: the left argument must be evaluated first. This is not the same as the use of the comma in function calls.
Function arguments must be evaluated before the function is called, although the order of evaluation of the arguments is not specified, and neither is the order of evaluation of the expression which produces the function.
The last phrase refers to examples such as this:
// This code has Undefined Behaviour. DO NOT USE
typedef void(*takes_int_returns_void)(int);
takes_int_returns_void fvector[3] = {...}
//...
//...
(*fvector[i++])(i);
Here, a compiler may choose to increment i before or after it evaluates the argument to the function (or other less pleasant possibilities), so you don't actually know what value the function will be called with.
In the case of 1+2*2*4, the compiler must generate code which will produce 17. How it does that is completely up to the compiler. Furthermore, if all x, y and z are all unsigned integers, a compiler may compile 1 + x*y*z with any order of multiplications it wants to, even reordering to y*(x*z).
Most operators have precedence from left to right.This will give a detailed idea about operator precedence :
Click here!
Binary operators, other than assignment operators, go from left to right when they are of equal precedence, so 1 + 2 * 2 * 4 is equivalent to 1 + ((2 * 2) * 4). Obviously in this particular case 1 + (2 * (2 * 4)) gives the same answer, but it won't always. For instance, 1 + 2 / 2.0 * 4 evaluates to 1 + ((2 / 2.0) * 4) == 5.0 and not to 1 + (2 / (2.0 * 4)) == 1.25.
Order of evaluation is a completely different thing from operator precedence. For one thing, operator precedence is always well-defined, order of evaluation sometimes is not (e.g. the order in which function arguments are evaluated).
This is a perfect tutorial about operator precedence and order of evaluation. Enjoy!
Related
I know that:
int b = 1, c = 2, d = 3, e = 4;
printf("%d %d %d", ++b, b, b++);
results in undefined behavior. Since
Modifying any object more than once between two sequence points is UB.
Undefined behavior and sequence points
But I don't know if:
int b = 1, c = 2, d = 3, e = 4;
printf("%d", b++ + ++c - --d - e--);
is also UB?
What I think is that increment/decrement operators will evalute first because of the precedence, between them right to left since the associativity . Then arithmetic operators will be evaluated left to right.
Which will just be
(b) + (c + 1) - (d - 1) - (e)
that is, 1 + (2 + 1) - (3 - 1) - (4)
= (2 - 4)
= -2
Is it right?
But I don't know if: ... is also UB?
It is not, but your reasoning about why is fuzzy.
What I think is that increment/decrement operators will evaluate first because of the precedence, between them right to left since the associativity . Then arithmetic operators will be evaluated left to right.
Precedence determines how the result is calculated. It doesn't say anything about the ordering of the side-effects.
There is no equivalent of precedence telling you when the side effects (the stored value of b has been incremented, the stored value of e has been decremented) are observable during the statement. All you know is that the variables have taken their new values before the next statement (ie, by the ;).
So, the reason this is well-defined is that it does not depend on those side-effects.
I deliberately hand-waved the language to avoid getting bogged down, but I should probably clarify:
"during the statement" really means "before the next sequence point"
"before the next statement (... ;)" really means "at the next sequence point"
See Order of evaluation:
There is a sequence point after the evaluation of all function arguments and of the function designator, and before the actual function call.
So really the side-effects are committed before the call to printf, so earlier than the ; at the end of the statement.
There is a gigantic difference between the expressions
b++ + ++c - --d - e--
(which is fine), and
x++ + ++x - --x - x--
(which is rampantly undefined).
It's not using ++ or -- that makes an expression undefined. It's not even using ++ or -- twice in the same expression. No, the problem is when you use ++ or -- to modify a variable inside an expression, and you also try to use the value of that same variable elsewhere in the same expression, and without an intervening sequence point.
Consider the simpler expression
++z + z;
Now, obviously the subexpression ++z will increment z. So the question is, does the + z part use the old or the new value of z? And the answer is that there is no answer, which is why this expression is undefined.
Remember, expressions like ++z do not just mean, "take z's value and add 1". They mean, "take z's value and add 1, and store the result back into z". These expressions have side effects. And the side effects are at the root of the undefinedness issue.
I'm starting to learn to program in c, and I thought I was already pretty confident with the precedence of operators, until I did this:
a > b ? c = a : (c = b);
Of course at the first time I didn't use parenthesis on the last sentence, but since that ended up causing compiling issues I searched how to solve that problem on this forum and I read that adding parenthesis could do the job. However, I thought that the expressions inside parenthesis get executed before anything else written in the same line, which would mean that the c = b sentence was executed first and then the ternary operator. I did something similar but easier to read in order to get a better idea of what was happening with this operator precedence thing and tried executing this line:
printf("Number is %d", i) + (i = 5);
I know this expression returns returns a value, but since I don't need it and this isn't a line that I will keep for more than 5 seconds, I won't store it in any variable. What gets my attention in this case is that, when I execute the code, I doesn't show up on the screen with the value 5, but instead it uses the previous value, which means that the computer is just reading it from left to right. When I do:
(i = 5) + printf(Numer is %d, i);
it first does the assignment of i and only after that the printf function is executed. My question is: how does the computer execute an expression that uses operators of different orders of precedence? It clearly doesn't run first the operator with the highest precedence, because in the first printf the value stored wasn't the one assigned on the parenthesis, but it also doesn't just read from left to right because in that case there would be no operator precedence. How does it work?
Parenthesis and operator precedence only dictate how operands are grouped. It does not dictate the order of evaluation.
In this expression:
a > b ? c = a : (c = b);
The three parts of the ternary operator are a > b, c = a, and c = b respectively. This operator also has the property that only one of the second and third clause are evaluated, based on the result of the first. Formally speaking, there is a sequence point between the evaluation of the first clause and of either the second or third. So a > b is first evaluated. If it is nonzero, c = a is evaluated, otherwise c = b is evaluated.
In this expression:
printf("Number is %d", i) + (i = 5);
There is nothing that dictates whether printf("Number is %d", i) or i = 5 is evaluated first. Unlike the ternary operator, there is no sequence point between the evaluation of the operands of the + operator. This expression also has a problem: i is both read and written in the same expression without a sequence point. Doing so triggers undefined behavior. This is also true for:
(i = 5) + printf(Numer is %d, i);
On a side note, this:
a > b ? c = a : (c = b);
Can be more clearly written as:
c = a > b ? a : b;
In the follwing code,
int a = 1, b = 2, c = 3, d;
d = a++ && b++ || c++;
printf("%d\n", c);
The output will be 3 and I get that or evaluates first condition, sees it as 1 and then doesn't care about the other condition but in c, unary operators have a higher precedence than logical operators and like in maths
2 * 3 + 3 * 4
we would evaluate the above expression by first evaluating product and then the summation, why doesn't c do the same? First evaluate all the unary operators, and then the logical thing?
Please realize that precedence is not the same concept as order of evaluation. The special behavior of && and || says that the right-hand side is not evaluated at all if it doesn't have to be. Precedence tells you something about how it would be evaluated if it were evaluated.
Stated another way, precedence helps describe how to parse an expression. But it does not directly say how to evaluate it. Precedence tells us that the way to parse the expression you asked about is:
||
/ \
/ \
&& c++
/ \
/ \
a++ b++
But then when we go to evaluate this parse tree, the short-circuiting behavior of && and || tells us that if the left-hand side determines the outcome, we don't go down the right-hand side and evaluate anything at all. In this case, since a++ && b++ is true, the || operator knows that its result is going to be 1, so it doesn't cause the c++ part to be evaluated at all.
That's also why conditional expressions like
if(p != NULL && *p != '\0')
and
if(n == 0 || sum / n == 0)
are safe. The first one will not crash, will not attempt to access *p, in the case where p is NULL. The second one will not divide by 0 if n is 0.
It's very easy to get the wrong impression abut precedence and order of evaluation. When we have an expression like
1 + 2 * 3
we always say things like "the higher precedence of * over + means that the multiplication happens first". But what if we throw in some function calls, like this:
f() + g() * h()
Which of those three functions is going to get called first? It turns out we have no idea. Precedence doesn't tell us that. The compiler could arrange to call f() first, even though its result is needed last. See also this answer.
This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 1 year ago.
Why i can't do the following in Objective-C?
a = (a < 10) ? (a++) : a;
or
a = (a++)
The ++ in a++ is a post-increment. In short, a = a++ does the following:
int tmp = a; // get the value of a
a = a + 1 //post-increment a
a = tmp // do the assignment
As noted by others, in C this is actually undefined behavior and different compilers can order the operations differently.
Try to avoid using ++ and -- operators when you can. The operators introduce side effects which are hard to read. Why not simply write:
if (a < 10) {
a += 1;
}
To extend Sulthan's answer, there are several problem with your expressions, at least the simple assignment (case 2).
A. There is no sense in doing so. Even a++ has a value (is a non-void expression) that can be assigned, it automatically assigns the result to a itself. So the best you can expect is equivalent to
a++;
The assignment cannot improve the assignment at all. But this is not the error message.
B. Sulthans replacement of the statement is a better case. It is even worse: The ++ operator has the value of a (at the beginning of the expression) and the effect to increment a at some point in future: The increment can be delayed up to the next sequence point.
The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
(ISO/IEC 9899:TC3, 6.5.2.4, 2)
But the assignment operator = is not a sequence point (Annex C).
The following are the sequence points described in 5.1.2.3:
[Neither assignment operator nor ) is included in the list]
Therefore the expression can be replaced with what Sulthan said:
int tmp = a; // get the value of a
a = a + 1 //post-increment a
a = tmp // do the assignment
with the result, that a still contains the old value.
Or the expression can be replaced with this code …:
int tmp = a; // get the value of a
a = tmp // assignemnt
a = a + 1 // increment
… with a different result (a is incremented). This is what the error message says: There is no defined sequence (order the operations has to be applied.)
You can insert a sequence point using the comma operator , (what is the primary use of it), …
a++, a=a; // first increment, then assign
… but this shows the whole leak of meaning of what you want to do.
It is the same with your first example. Though ? is a sequence point …:
The following are the sequence points described in 5.1.2.3:
… The end of the first operand of the following operators: […] conditional ? (6.5.15);[…].
… both the increment (a++) and the assignment (a=) are after the ? operator is evaluated and therefore unsequenced ("in random order") again.
To make the comment "Be careful" more concrete: Don't use an incremented object in an expression twice. (Unless there is a clear sequence point).
int a = 1;
… = a++ * a;
… evaluates to what? 2? 1? Undefined, because the increment can take place after reading a "the second time".
BTW: The Q is not related to Objective-C, but to pure C. There is no Objective-C influence to C in that point. I changed the tagging.
In code snippet below
int jo=50;
if( jo =(rand()%100), jo!=50)
{
printf("!50");
}
% has highest precedence so rand()%100 will get executed first
!= has the precedence greater than = so jo != 50 should get execute right ?
, has the least precedence
still when i execute assignment occurs first then != and then , . I get an output !50 why ??
The issue is "sequence points":
http://www.angelikalanger.com/Articles/VSJ/SequencePoints/SequencePoints.html
Problematic vs. Safe Expressions
What is it that renders the
assignment x[i]=i++ + 1; a problematic one whereas the assignment
i=2; is harmless, in the sense that its result is well-defined and
predictable? The crux is that in the expression x[i]=i++ + 1; there
are two accesses to variable i and one of the accesses, namely the
i++, is a modifying access. Since the order of evaluation between
sequence points is not defined we do not know whether i will be
modified before it will be read or whether it will be read before the
modification. Hence the root of the problem is multiple access to a
variable between sequence points if one the accesses is a
modification.
Here is another example. What will happen here if i and j have
values 1 and 2 before the statement is executed?
f(i++, j++, i+j);
Which value will be passed to function f as the third argument?
Again, we don't know. It could be any of the following: 3, 4, or 5. It
depends on the order in which the function arguments are evaluated.
The common misconception here is that the arguments would be evaluated
left to right. Or maybe right to left? In fact, there is no order
whatsoever mandated by the language definition.
Precedence does not control the order of execution. Precedence only controls the grouping - that is, precedence says what the operands are for each operation, not when each operation happens.
In this example, the precedence of % is irrelevant due to the parentheses - these say that the operands of % are rand() and 100.
The precedece of , being lower than that of = and != tells us that the operands of = are jo and (rand()%100), and that the operands of != are jo and 50.
The operands of , are then jo = (rand() % 100) and jo != 50.
The definition of the , operator says that the first operand is evaluated, then there is a sequence point, and then the second operand is evaluated. So this case, jo = (rand() % 100) is fully evaluated, which stores the result of rand() % 100 into jo; and then jo != 50 is evaluated. The value of the overall expression is the value of jo != 50.
Well, sequence points is the right answer. but let's translate from the textbook-ese.
The comma operator has a special property: it makes sure that what's on its left hand side is evaluated first. So, when you get to the expression
jo =(rand()%100), jo!=50
even though the != binds more tightly than ',', so that the expressiojn, fully parenthesized is
(jo =(rand()%100)),(jo!=50)
the first part is evaluated first.
To remember this, you can pronouce or read the comma operator as "and then", so
j0=(rand()%100)
"and then"
jo!=50.
It's a mistake to think of "precedence" as "done first".
Consider the following code snippet:
f() + g() + h()
Which has add operation has higher precedence, the one that sums the results of f() and g(), or the one that sums the results of that and h()?
It's a trick question, because there is no need to invoke "precedence" at all. But there is still an order of operations, because function calls in C introduce "sequence points", which is how C allows you to determine "what happens when", as it were.
In your particular code, you have a comma operator—which is quite different from the comma punctuator in function arguments—in this part:
jo = (rand() % 100), jo != 50
The comma operator introduces a sequence point (as does the function call to rand), so we know that rand runs and produces a value, then that value % 100 is computed and assigned to jo, and finally jo is compared with 50.
(There is a sequence point after the evaluation of the controlling expression in the if as well, and one at each statement-ending semicolon.)