A problem with the most simple C operators - c

Can someone explain how this operation works?
#include <stdio.h>
int main()
{
int a = 5;
int b, c;
c = (a = 3) + (b = a + 2) - (a = 2);
printf("%d %d %d", a, b, c);
return 0;
}
The output is 2 5 5
Parentheses go first and start from the left.
c = 3 + (b = 3 + 2) - (a = 2)
c = 3 + 5 - 2
And the output should be 2 5 6

The three assignments in (a = 3) + (b = a + 2) - (a = 2) are unsequenced relative to each other: they may occur in any order, or even at the same time. Because a is modified twice and also read with no sequencing in between, the behavior is undefined.
It is best to rewrite this code.

C doesn’t force left-to-right evaluation of arithmetic expressions - each of a = 3, b = a + 2, and a = 2 may be evaluated in any order (they are said to be unsequenced with respect to each other). Since two of those expressions modify the value of a, the result of b = a + 2 can vary based on the compiler, optimization settings, even the surrounding code - the result could be 7 (a still 5), 5 (a is 3), 4 (a is 2), or something else completely.
The behavior of modifying a multiple times and using it in a value computation without an intervening sequence point is undefined, meaning the compiler isn’t required to handle the situation in any particular way. It may give you the result you expect, but it doesn’t have to. The result doesn’t have to be consistent from build to build, or even from run to run.
This should give the results you would expect based on a left-to-right reading of your expression.
a = 3;
b = a + 2; // b = 5
c = a + b - 2; // c = 6
a = 2;

Related

Operator precedence versus order of evaluation

A friend asked me to explain the difference between operator precedence and order of evaluation in simple terms. This is how I explained it to them :-
Let's take an example -
int x;
int a = 2;
int b = 5;
int c = 6;
int d = 4;
x = a * b / (c + d);
Here, the final value of x will become 1. This is because first, the values of c and d will be added together (6+4), then the values of a and b will be multiplied together (2*5), and finally, the division will take place (10/10), resulting in the final value becoming 1, which is then assigned to x.
All of this is specified by operator precedence.
In this example, the parentheses force the addition to take place before the multiplication and the division, even though addition has a lower precedence.
Also, the multiplication is executed before the division, because multiplication and division have the same precedence, and both of them have the associativity of left-to-right.
Now comes the important part, i.e. the order of evaluation of this expression.
On one system, the order of evaluation may be like this -
/* Step 1 */ x = a * b / (c + d);
/* Step 2 */ x = a * 5 / (c + d);
/* Step 3 */ x = a * 5 / (c + 4);
/* Step 4 */ x = a * 5 / (6 + 4);
/* Step 5 */ x = a * 5 / 10;
/* Step 6 */ x = 2 * 5 / 10;
/* Step 7 */ x = 10 / 10;
/* Step 8 */ x = 1;
Note that in any step, it is always ensured that the operator precedence is maintained, i.e. even though b was replaced by 5 in Step 2, the multiplication did not take place until Step 7. So, even though the order of evaluation is different for different systems, the operator precedence is always maintained.
On another system, the order of evaluation may be like this -
/* Step 1 */ x = a * b / (c + d);
/* Step 2 */ x = a * b / (6 + d);
/* Step 3 */ x = a * b / (6 + 4);
/* Step 4 */ x = a * b / 10;
/* Step 5 */ x = 2 * b / 10;
/* Step 6 */ x = 2 * 5 / 10;
/* Step 7 */ x = 10 / 10;
/* Step 8 */ x = 1;
Again, the operator precedence is maintained.
In the above example, the entire behaviour is well-defined. One reason for this is that all of the variables are different.
In technical terms, the behaviour in this example is well-defined because there are no unsequenced modifications to any variable.
So, on any system, x will always get assigned the value 1 finally.
Now, let's change the above example to this :-
int x;
int y = 1;
x = ++y * y-- / (y + y++);
Here, the final value that gets assigned to x varies between systems, making the behaviour undefined.
On one system, the order of evaluation may be like this -
/* Step 1 */ x = ++y * y-- / (y + y++); // (y has value 1)
/* Step 2 */ x = ++y * y-- / (1 + y++); // (y still has value 1)
/* Step 3 */ x = ++y * 1 / (1 + y++); // (y now has value 0)
/* Step 4 */ x = 1 * 1 / (1 + y++); // (y now has value 1)
/* Step 5 */ x = 1 * 1 / (1 + 1); // (y now has value 2)
/* Step 6 */ x = 1 * 1 / 2;
/* Step 7 */ x = 1 / 2;
/* Step 8 */ x = 0;
Again, the operator precedence is maintained.
On another system, the order of evaluation may be like this -
/* Step 1 */ x = ++y * y-- / (y + y++); // (y has value 1)
/* Step 2 */ x = ++y * y-- / (y + 1); // (y now has value 2)
/* Step 3 */ x = ++y * 2 / (y + 1); // (y now has value 1)
/* Step 4 */ x = ++y * 2 / (1 + 1); // (y still has value 1)
/* Step 5 */ x = ++y * 2 / 2; // (y still has value 1)
/* Step 6 */ x = 2 * 2 / 2; // (y now has value 2)
/* Step 7 */ x = 4 / 2;
/* Step 8 */ x = 2;
Again, the operator precedence is maintained.
How can I improve this explanation?
I would prefer an explanation that uses function calls. A function call makes it very obvious that "something needs to be evaluated before applying the operator".
Basic example:
int x = a() + b() * c();
must be calculated as
temp = result_of_b_func_call * result_of_c_func_call
x = result_of_a_func_call + temp
due to multiplication having higher precedence than addition.
However, the evaluation order of the 3 function calls is unspecified, i.e. the functions can be called in any order. Like
a(), b(), c()
or
a(), c(), b()
or
b(), a(), c()
or
b(), c(), a()
or
c(), a(), b()
or
c(), b(), a()
Another basic example would be to explain operator associativity - like:
int x = a() + b() + c();
must be calculated as
temp = result_of_a_func_call + result_of_b_func_call
x = temp + result_of_c_func_call
due to left-to-right associativity of addition. But again the order of the 3 function calls is unspecified.
If function calls are not an option, I would prefer something like
x = a * b + c / d
Here it's pretty obvious that there are two sub-expressions, i.e. a * b and c / d. Due to operator precedence both of these sub-expressions must be evaluated before the addition but the order of evaluation is unspecified, i.e. we can't tell whether the multiplication or the division is done first.
So it can be
temp1 = a * b
temp2 = c / d
x = temp1 + temp2
or it can be
temp2 = c / d
temp1 = a * b
x = temp1 + temp2
All we know is that the addition must be last.
6.5 Expressions
...
3 The grouping of operators and operands is indicated by the syntax.85) Except as specified
later, side effects and value computations of subexpressions are unsequenced.86)
85) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ? : (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
86) In an expression that is evaluated more than once during the execution of a program, unsequenced and
indeterminately sequenced evaluations of its subexpressions need not be performed consistently in
different evaluations.
C 2011 Online Draft
Precedence and associativity only control how expressions are parsed and which operators are grouped with which operands. They do not control the order in which subexpressions are evaluated.
Given your example
x = a * b / (c + d);
precedence and associativity cause the expression to be parsed as
(x) = ((a * b) / (c + d))
The multiplicative operators * and / have the same precedence and are left-associative, so a * b / (c + d) is parsed as (a * b) / (c + d) (as opposed to a * (b / (c + d))).
So what this tells us is that the result of a * b is divided by the result of c + d, but this does not mean that a * b must be evaluated before c + d or vice versa.
Each of a, b, c, and d may be evaluated in any order (including simultaneously if the architecture supports it). Similarly each of a * b and c + d may be evaluated in any order, and if the same expression is evaluated multiple times in the program, that order doesn't have to be consistent. Obviously both a and b have to be evaluated before a * b can be evaluated, and both c and d have to be evaluated before c + d can be evaluated, but that's the only ordering you can be certain about.
There are operators that force left-to-right evaluation - ||, &&, ?:, and the comma operator, but in general order of evaluation is a free-for-all.
It's not necessarily true to say that "the parentheses force the addition to take place before the multiplication and the division". You can see this in a disassembly of the code (gcc 10.2.0):
x = a * b / (c + d);
1004010b6: 8b 45 fc mov -0x4(%rbp),%eax
1004010b9: 0f af 45 f8 imul -0x8(%rbp),%eax
1004010bd: 8b 4d f4 mov -0xc(%rbp),%ecx
1004010c0: 8b 55 f0 mov -0x10(%rbp),%edx
1004010c3: 01 d1 add %edx,%ecx
1004010c5: 99 cltd
1004010c6: f7 f9 idiv %ecx
The multiplication was performed first, followed by the addition, then the division.
Nope, you say
Here, the final value of x will become 1. This is because first, the values of c and d will be added together (6+4), then the values of a and b will be multiplied together (2*5), and finally, the division will take place (10/10), resulting in the final value becoming 1, which is then assigned to x.
Precedence establishes that 6 + 4 will be evaluated before the division is done, but it does not stop the compiler from evaluating a * b first (the multiplicative operators are left-associative, which also means that the multiplication is performed before the division). Unless you look at the assembler output, you don't even know which order of subexpression evaluation the compiler will select. As stated, the fully parenthesized expression would be:
(x = ((a * b) / (c + d)));
So the compiler may start with either a * b or c + d. It then does the other operation, then the division, and finally the assignment. But beware: the assignment requires the address of x and not its value (x is an lvalue), so the address of x can be calculated at any point before the assignment is made. Finally, the (unused) value of the assignment is discarded.
a possible order could be:
calculate a * b
calculate address of x
calculate c + d
calculate the division (a*b)/(c+d)
store the result at position &x.
a different one:
calculate c + d
calculate a * b
calculate the division (a*b)/(c+d)
calculate address of x
store the result at position &x.
but you could also calculate the address of x in the first step.

Is (a = 0, a) + (a =1, a) undefined behaviour for int a?

Is
int main()
{
int a;
int b = (a = 0, a) + (a = 1, a);
}
defined? Without the , a in each term, the program behaviour is clearly undefined due to multiple unsequenced writes to a, but don't the commas introduce adequate sequence points?
No, it isn't well-defined. Suppose we replace every sequence point in your code with the pseudo code "SQ":
SQ
int b = (a = 0 SQ a) + (a = 1 SQ a) SQ
Then we have SQ a) + (a = 1 SQ, where a read of a and a side effect on a happen between the same pair of sequence points, so it is still undefined behavior.
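It is tempting to try to repair this (with code that is of course very bad and fishy) by swapping the comma operands:
(0, a = 0) + (0, a = 1)
The order of evaluation of the + operands is still unspecified, and each comma operator only sequences the 0 before the assignment inside its own parentheses. The compiler is not required to finish one parenthesis before starting the other, so under the C11 rules (6.5p2) the two assignments to a remain unsequenced relative to each other and this form should not be relied on either.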

what is the value of a++ + a if value of a is 5? [duplicate]

I am a beginner in C, and I am finding it difficult to understand post- and pre-increment. My code is given below. I compiled it with a Turbo C++ compiler and got the output
a = 6 and b = 10, but since the post-increment operator is used the output should be a = 6 and b = 11. Why is that not happening? Could someone explain it?
#include<stdio.h>
int main()
{
int a=5,b;
b = a++ + a;
printf("\na = %d and b = %d",a,b);
return 0;
}
The behaviour of a++ + a; is undefined in C. This is because the + is not a sequence point, so you're attempting to increment a and read a in the same expression with nothing ordering the two.
So you can't guarantee a particular answer.
In order to understand prefix and postfix increments, use statements like b = a++; and b = ++a;
What happens in the following?
b = a++ + a;
1) Is a incremented and its original value is then added to the original value of a?
2) Is a incremented and its original value is then added to the new value of a?
3) Is a on the right side fetched first and then added to the original value of an incremented a?
C allows any of these approaches (and likely others) as this line of code lacks a sequence point which would define evaluation order. This lack of restriction often allows compilers to generate more optimized code. It comes at a cost, as the approaches do not generate the same result when accessing a in the various ways above.
Therefore it is undefined behavior. Instead:
b = a++;
b = b + a;
or
b = a;
b = b + a++;
After int a = 5; the value of a is 5
b = a; // b is 5;
After int a = 5; the value of a++ is 5
b = a++; // b is 5
but the side effect of a++ is to increase the value of a. That increase can happen anytime between the last and next sequence points (basically the last and next semicolon).
So
/* ... */;
b = a++ + a;
#if 0
/* side-effect */ 5 + 6
5 /* side-effect */ + 6
5 + /* side effect mixed with reading the value originating a strange value */ BOOM
5 + 5 /* side effect */
#endif

what is the difference between i = i + j; and i += j; in c language? [duplicate]

what is the difference between i = i + j; and i += j; in C?
Are they equivalent? Is there any side effect of i?
I was trying to check the assignment mechanism in C using the GCC compiler.
They're almost the same. The only difference is that i is only evaluated once in the += case versus twice in the other case.
There is almost no difference, but if i is a complex expression, it is only computed once. Suppose you had:
int ia[] = {1, 2, 3, 4, 5};
int *pi = &(ia[0]); // Yes, I know. I could just have written pi = ia;
*pi++ += 10;
// ia now is {11, 2, 3, 4, 5}.
// pi now points to ia[1].
// Note this would be undefined behavior:
*pi++ = *pi++ + 10;
i = i + j is equivalent to i += j, but the two are not identical.
In some (rare) cases i += j differs from i = i + j because the expression i itself has a side effect.
Another pitfall is operator precedence, i.e.
i = i * j + k;
is not same as
i *= j + k;
The two statements i = i + j and i += j are functionally the same. In the first case you are using the general assignment operation, while the second one uses a compound assignment operator. += is the additive assignment operator (addition followed by assignment).
The use of compound assignment operators gives shorter source code that is less susceptible to maintenance errors. With an optimising compiler the generated object code is usually identical, so any difference in size or speed should not be relied upon.
Syntactic sugar baby.
Any differences are just going to come down to compiler implementation.
http://en.wikipedia.org/wiki/Syntactic_sugar
In both cases i (the variable or expression being assigned) must be an lvalue. In most simple cases this will yield code that is identical in both cases so long as i is not declared volatile.
However there are a few cases where a lvalue can be an expression involving operators, and this may cause evaluation of i twice. The most plausible example of an lvalue expression that might be used in that way is perhaps simple dereferencing of a pointer (*p):
*p = *p + j ;
*p += j ;
may generate different code, but the difference is trivially optimised away, so I would not expect to see it even without optimisation enabled. Again, *p must not be volatile-qualified, otherwise the expressions are semantically different.
A less plausible scenario is to select the assignment target with a conditional operator. In C the result of ?: is not an lvalue (it is in C++), so in C the selection has to go through a pointer. For example the following adds j to b or c depending on a:
*(a ? &b : &c) += j ;
*(a ? &b : &c) = *(a ? &b : &c) + j ;
These might generate different code - the compiler might reasonably not spot that idiom and apply an optimisation. If the expression a has side effects - for example were the expression getchar() == '\n' or a is volatile (regardless of b or c), then they are not equivalent, since the second form evaluates a twice. Assuming the left-hand a happens to be evaluated first (the order is unspecified), the second would evaluate to:
c = b + j for the input "Y\n",
b = b + j for input "\n\n",
c = c + j for input "YN".
These points are of course mostly irrelevant - if you write code like that and it does things you did not expect, sympathy may be in short supply!

why lvalue required as increment operand error? [duplicate]

Why do I get the error "lvalue required as increment operand" in a=b+(++c++); ?
I just wanted to assign 'b+(c+1)' to 'a' and increment 'c' by 2 at the same time.
I'm a beginner and just wanted a clarification about what an "lvalue error" actually is.
main()
{
int a=1,b=5,c=3;
a=b+(++c++);
printf("a=%d b= %d c= %d \n",a,b,c);
}
Postfix increment binds tighter than prefix increment so what you would want would be something like:
a = b + (++c)++;
This is not legal C, though, as the result of prefix increment (like the result of postfix increment in your example) is not an lvalue. This means that it's just a value; it doesn't refer to a particular object like 'c' any more, so trying to change it makes no sense. It would have no visible effect as no object would be updated.
Personally I think that doing it in two statements is clearer in any case.
a = b + c + 1;
c += 2;
The "lvalue required" error means that the operand is not something the operation can modify: ++ needs an object such as a variable (an lvalue), not just a value.
C files are basically nothing but text files, which require a particular formatting, so the compiler can understand it.
Writing something like ++Variable++ is complete nonsense for the compiler.
You can basically imagine ++c as:
Var += 1;
return Var;
while c++ is:
int Buf = Var;
Var += 1;
return Buf;
To 'repair' your code:
#include <stdio.h>
int main(void) {
int a=1,b=5,c=3;
a = b + (++c); // equals 5 + 4
printf("a=%d b= %d c= %d \n",a,b, ++c); // a = 9, b = 5, c = 5
return 0;
}
This way, you'll get the result you wanted, without the compiler complaining.
Please remember, that when using ++c or c++ in a combined operation, the order DOES matter.
When using ++c, the higher value will be used in the operation, when using c++, it will operate with the old value.
That means:
int a, c = 5;
a = 5 + ++c; //a = 11
while
int a, c = 5;
a = 5 + c++; //a = 10
Because in the latter case, c will only be '6' AFTER it is added to 5 and stored in a.
