C operator order - c

Why is the postfix increment operator (++) executed after the assignment (=) operator in the following example? According to the precedence/priority lists for operators ++ has higher priority than = and should therefore be executed first.
int a,b;
b = 2;
a = b++;
printf("%d\n",a);
will output a = 2.
PS: I know the difference between ++b and b++ in principle, but just looking at the operator priorities these precende list tells us something different, namely that ++ should be executed before =

++ is evaluated first. It is post-increment, meaning it evaluates to the value stored and then increments. Any operator on the right side of an assignment expression (except for the comma operator) is evaluated before the assignment itself.

It is. It's just that, conceptually at least, ++ happens after the entire expression a = b++ (which is an expression with value a) is evaluated.

Operator precedence and order of evaluation of operands are rather advanced topics in C, because there exists many operators that have their own special cases specified.
Postfix ++ is one such special case, specified by the standard in the following manner (6.5.2.4):
The value computation of the result is sequenced before the side
effect of updating the stored value of the operand.
It means that the compiler will translate the line a = b++; into something like this:
Read the value of b into a CPU register. ("value computation of the result")
Increase b. ("updating the stored value")
Store the CPU register value in a.
This is what makes postfix ++ different from prefix ++.

The increment operators do two things: add +1 to a number and return a value. The difference between post-increment and pre-increment is the order of these two steps. So the increment actually is executed first and the assignment later in any case.

Related

Operator associativity, precedence

I just wonder if, for the following code, the compiler uses associativity/precedence alone or some other logic to evaluate.
int i = 0, k = 0;
i = k++;
If we evaluate based on associativity and precedence, postfix ++ has higher precedence than =, so k++(which becomes 1) is evaluated first and then comes =, now the value of k which is 1 is assigned to i.
So the value of i and k would be 1. However, the value of i is 0 and k is 1.
So I think that the compiler splits this i = k++; into two (i = k; k++;). So here compiler is not going for the statements associativity/precedence, it splits the line as well. Can someone explain how the compiler resolves these kinds of statements?
++ does two separate things.
k++ does two things:
It has the value of k before any increment is performed.
It increments k.
These are separate:
Producing the value of k occurs as part of the main evaluation of i = k++;.
Incrementing k is a side effect. It is not part of the main evaluation. The program may increment the value of k after evaluating the rest of the expression or during it. It may even increment the value before the rest of the expression, as long as it “remembers” the pre-increment value to use for the expression.
Precedence and associativity are not involved.
This effectively has nothing to do with precedence or associativity. The increment part of a ++ operator is always separate from the main evaluation of an expression. The value used for k++ is always the value of k before the increment regardless of what other operators are present.
Supplement
It is important to understand that the increment part of ++ is detached from the main evaluation and is sort of “floating around” in time–it is not anchored to a certain spot in the code, and you do not control when it occurs. This is important because if there is another use or modification of the operand, such as in k * k++, the increment can occur before, during, or after the main evaluation of the other occurrence. When this happens, the C standard does not define the behavior of the program.
Postfix operators have higher precedence than assignment operators.
This expression with the assignment operator
i = k++
contains two operands.
It is equivalently can be rewritten like
i = ( k++ );
The value of the expression k++ is 0. So the variable i will get the value 0.
The operands of the assignment operator can be evaluated in any order.
According to the C Standard (6.5.2.4 Postfix increment and decrement operators)
2 The result of the postfix ++ operator is the value of the operand.
As a side effect, the value of the operand object is incremented (that
is, the value 1 of the appropriate type is added to it).
And (6.5.16 Assignment operators)
3 An assignment operator stores a value in the object designated by
the left operand. An assignment expression has the value of the left
operand after the assignment,111) but is not an lvalue. The type of an
assignment expression is the type the left operand would have after
lvalue conversion. The side effect of updating the stored value of
the left operand is sequenced after the value computations of the left
and right operands. The evaluations of the operands are unsequenced.
Unlike C++, C does not have "pass by reference". Only "pass by value". I'm going to borrow some C++ to explain. Let's implement the functionality of ++ for both postfix and prefix as regular functions:
// Same as ++x
int inc_prefix(int &x) { // & is for pass by reference
x += 1;
return x;
}
// Same as x++
int inc_postfix(int &x) {
int tmp = x;
x += 1;
return tmp;
}
So your code is now equivalent to:
i = inc_postfix(k);
EDIT:
It's not completely equivalent for more complex things. Function calls introduces sequence points for instance. But the above is enough to explain what happens for OP.
It's similar to (only with an additional sequence point for illustration):
i = k; // i = 0
k = k + 1; // k = 1
Operator associativity doesn't apply here. Operator precedence merely states which operand that sticks to which operator. It's not particularly relevant in this case, it just says that the expression should be parsed as i = (k++); and not as (i = k)++; which wouldn't make any sense.
From there on, how this expression is evaluated/executed is specified by specific rules for each operator. The postfix operator is specified to behave as (6.5.2.4):
The value computation of the result is sequenced before the side effect of
updating the stored value of the operand.
That is, k++ is guaranteed to evaluate to 0 and then at some point later on, k is increased by 1. We don't really know when, only that it happens somewhere between the point when k++ is evaluated but before the next sequence point, in this case the ; at the end of the line.
The assignment operator behaves as (6.5.16):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
In this case, the right operand of = has its value computed before updating the left operand.
In practice, this means that the executable can look as either this:
k is evaluated to 0
set i to 0
increase k by 1
semicolon/sequence point
Or this:
k is evaluated to 0
increase k by 1
set i to 0
semicolon/sequence point
Precedence and associativity only affect how operators and operands are associated with each other - they do not affect the order in which expressions are evaluated. Precedence rules dictate that
i = k++
is parsed as
i = (k++)
instead of something like
(i = k)++
The postfix ++ operator has a result and a side effect. In the expression
i = k++
the result of k++ is the current value of k, which gets assigned to i. The side effect is to increment k.
It's logically equivalent to writing
tmp = k
i = tmp
k = k + 1
with the caveat that the assignment to i and the update to k can happen in any order - the operations can even be interleaved with each other. What matters is that i gets the value of k before the increment and that k gets incremented, not necessarily the order in which those operations occur.
The fundamental issue here is that precedence is not the right way to think about what
i = k=+;
means.
Let's talk about what k++ actually means. The definition of k++ is that if gives you the old value of k, and then adds 1 to the stored value of k. (Or, stated another way, it takes the old value of k, plus 1, and stores it back into k, while giving you the old value of k.)
As far as the rest of the expression is concerned, the important thing is what the value of k++ is. So when you say
i = k++;
the answer to the question of "What gets stored in i?" is, "The old value of k".
When we answer the question of "What gets stored in i?", we don't think about precedence at all. We think about the meaning of the postfix ++ operator.
See also this older question.
Postscript: The other thing you have to be really careful about is when you think about the side question, "When does it store the new value into k? It turns out that's a really hard question to answer, because the answer is not as well defined as you might like. The new value gets stored back into k sometime before the end of the larger expression it's in (formally, "before the next sequence point"), but we don't know whether it happens before or after, say, the point at which the thing gets stored into i, or before or after other interesting points in the expression.
Ahh, this is quite an interesting question. To help you understand better, this is what actually happens.
I'm going to try to explain using a bit of operator overloading concepts from C++, so bear with me if you do not know C++.
This is how you would overload the postfix-increment operator:
int operator++(int) // Note that the 'int' parameter is just a C++ way of saying that this is the postfix and not prefix operator
{
int copy = *this; // *this just means the current object which is calling the function
*this += 1;
return copy;
}
Essentially what the postfix-increment operator does is that it creates a copy of the operand, increases the original variable, and then returns the copy.
In your case of i = k++, k++ does actually happen first but the value returned is actually k (think of it like a function call). This then gets assigned to i.

*p++->str : Understanding evaluation of ->

My question is about the following line of code, taken from "The C Programming Language" 2nd Edition:
*p++->str;
The book says that this line of code increments p after accessing whatever str points to.
My understanding is as follows:
Precedence and associativity say that the order in which the operators will be evaluated is
->
++
*
The postfix increment operator ++ yields a value (i.e. value of its operand), and has the side effect of incrementing this operand before the next sequence point (i.e. the following ;)
Precedence and associativity describe the order in which operators are evaluated and not the order in which the operands of the operators are evaluated.
My Question:
My question is around the evaluation of the highest precedence operator (->) in this expression. I believe that to evaluate this operator means to evaluate both of the operands, and then apply the operator.
From the perspective of the -> operator, is the left operand p or p++? I understand that both return the same value.
However, if the first option is correct, I would ask "how is it possible for the evaluation of the -> operator to ignore the presence of the ++".
If the second option is correct, I would ask "doesn't the evaluation of -> in this case then require the evaluation of a lower precedence operator ++ here (and the evaluation of ++ completes before that of ->)"?
To understand the expression *p++->str you need to understand how *p++ works, or in general how postfix increment works on pointers.
In case of *p++, the value at the location p points to is dereferenced before the increment of the pointer p.
n1570 - §6.5.2.4/2:
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]. The value computation of the result is sequenced before the side effect of updating the stored value of the operand.
In case of *p++->str, ++ and -> have equal precedence and higher than * operator. This expression will be parenthesised as *((p++)->str) as per the operator precedence and associativity rule.
One important note here is precedence and associativity has nothing to do with the order of evaluation. So, though ++ has higher precedence it is not guaranteed that p++ will be evaluated first. Which means the expression p++ (in the expression *p++->str) will be evaluated as per the rule quoted above from the standard. (p++)->str will access the str member p points to and then it's value is dereferenced and then the value of p is incremented any time between the last and next sequence point.
Postfix ++ and -> have the same precedence. a++->b parses as (a++)->b, i.e. ++ is done first.
*p++->str; executes as follows:
The expression parses as *((p++)->str). -> is a meta-postfix operator, i.e. ->foo is a postfix operator for all identifiers foo. Postfix operators have the highest precedence, followed by prefix operators (such as *). Associativity doesn't really apply: There is only one operand and only one way to "associate" it with a given operator.
p++ is evaluated. This yields the (old) value of p and schedules an update, setting p to p+1, which will happen at some point before the next sequence point. Call the result of this expression tmp0.
tmp0->str is evaluated. This is equivalent to (*tmp0).str: It dereferences tmp0, which must be a pointer to a struct or union, and gets the str member. Call the result of this expression tmp1.
*tmp1 is evaluated. This dereferences tmp1, which must be a pointer (to a complete type). Call the result of this expression tmp2.
tmp2 is ignored (the expression is in void context). We reach ; and p must have been incremented before this point.

Undefined behaviour seems to be contradicting with operator precedence rule in C [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 4 years ago.
Consider the following line of code.
int i = 2;
i = i++
The second line of code has been identified as undefined. I know this question has been asked before several times and example being this.
But nowhere could I see the issue of operator precedence being addressed in this issue. It has been clearly mentioned that postfix operator precedes assignment operator.
i = (i++)
So clearly i++ will be evaluated first and this value of i is the assigned to i again.
This looks like this particular undefined behavior is contradicting the precedence rule.
Similar to this is the code:
int i = 2;
i++ * i++;
Here according to operator precedence the code can be written as
int i =2;
(i++) * (i++)
Now we do not know whether the (i++) in LHS or RHS of '*' operator is going to be evaluated first. But either way it is going produce the same result. So how is it undefined?
If we write say:
int p;
p = f1() + f2()
where f1() and f2() are defined functions then obviously it's clear we can't decide whether f1() or f2() is going to be evaluated first as precedence rules does not specify this. But a confusion like this does not seem to arise in the current problem.
Please explain.
I do not understand why this question got a negative vote. I needed a clarity between operator precedence and UB and I have seen no other question addressing it.
What you're looking for is in section 6.5 on Expressions, paragraph 3 of the C standard:
The grouping of operators and operands is indicated by the syntax.
Except as specified later, side effects and value computations of
subexpressions are unsequenced.
This means that the side effect of incrementing (or decrementing) via the ++ or -- operators doesn't necessarily happen immediately when the operator is encountered. The only guarantee is that it happens before the next sequence point.
In the case of i = i++;, there is no sequence point in the evaluation of the operands of = nor in the evaluation of postfix ++. So an implementation is free to perform assigning the current value of i to itself and the side effect of incrementing of i in any order. So i could potentially be either 2 or 3 in your example.
This goes back to paragraph 2:
If a side effect on a scalar object is unsequenced relative
to either a different side effect on the same scalar object
or a value computation using the value of the same scalar
object, the behavior is undefined.
Since i = i++ attempts to update i more than once without a sequence point, it invokes undefined behavior. The result could be 2 or 3, or something else might happen as a result of optimizations for example.
The reason that this is not undefined:
int p;
p = f1() + f2()
Is because a variable is not being updated more than once in a sequence point. It could however be unspecified behavior if both f1 and f2 update the same global variables, since the evaluation order is unspecified.
The problem with using
i = i++
is that the order in which the address of i is accessed to read and write is not specified. As a consequence, at the end of that line, the value of i could be 3 or 2.
When will it be 3?
Evaluate the RHS - 2
Assign it to the LHS. i is now 2.
Increment i. i is now 3.
When will it be 2?
Evaluate the RHS - 2
Increment i. i is now 3.
Assign the result of evaluating the RHS to the LHS. i is now 2.
Of course, if there is a race condition, we don't know what's going to happen.
But nowhere could I see the issue of operator precedence being addressed in this issue.
Operator precedence only affects how expressions are parsed (which operands are grouped with which operators) - it has no effect on how expressions are evaluated. Operator precedence says that a * b + c should be parsed as (a * b) + c, but it doesn't say that either a or b must be evaluated before c.
Now we do not know whether the (i++) in LHS or RHS of '*' operator is going to be evaluated first. But either way it is going produce the same result. So how is it undefined?
Because the side effect of the ++ operator does not have to be applied immediately after evaluation. Side effects may be deferred until the next sequence point, or they may applied before other operations, or sprinkled throughout. So if i is 2, then i++ * i++ may be evaluated as 2 * 2, or 2 * 3, or 3 * 2, or 2 * 4, or 4 * 4, or something else entirely.
OPERATOR PRECEDENCE DOES NOT RESOLVE UNDEFINED EXPRESSION ISSUES.
Sorry for shouting, but people ask about this all the time. I'm afraid I have to say your research on this question must not have been very thorough, if you didn't see this aspect being discussed.
See for example this essay.
The expression i = i++ tries to assign to object i twice. It's therefore undefined. Period. Precedence doesn't save you.

(p++)->x Why are the parentheses unnecessary? (K&R)

From page 123 of The C Programming Language by K&R:
(p++)->x increments p after accessing x. (This last set of parentheses is unnecessary. Why?)
Why is it unnecessary considering that -> binds stronger than ++?
EDIT: Contrast the given expression with ++p->x, the latter is evaluated as ++(p->x) which would increment x, not p. So in this case parentheses are necessary and we must write (++p)->x if we want to increment p.
The only other possible interpretation is:
p++(->x)
and that doesn't mean anything. It's not even valid. The only possible way to interpret this in a valid way is (p++)->x.
Exactly because -> binds stronger than ++. (it doesn't, thanks #KerrekSB.)
increments p after accessing x.
So first you access x of p, then you increment p. That perfectly matches the order of evaluation of the -> and the + operators.
Edit: aww, these edit's...
So what happens when you write ++p->x is that it could be interpreted either as ++(p->x) or as (++p)->x (which one is actually chosen is just a matter of language design, K&R thought it would be a good idea to make it evaluate as in the first case). The thing is that this ambiguity doesn't exist in the case of p++->x, since it can only be interpreted as (p++)->x. The other alternatives, p(++->x), p(++->)x and p++(->x) are really just syntactically malformed "expressions".
The maximal munch strategy says that p++->x is divided into the following preprocessing tokens:
p then ++ then -> then x
In p++->x expression there are two operators, the postfix ++ operator and the postifx -> operator. Both operators being postfix operators, they have the same precedence and there is no ambiguity in parsing the expression. p++->x is equivalent to (p++)->x.
For ++p->x expression, the situation is different.
In ++p->x, the ++ is not a postfix operator, it is the ++ unary operator. C gives postfix operators higher precedence over all unary operators and this is why ++p->x is actually equivalent to ++(p->x).
EDIT: I changed the first part of the answer as a result of Steve's comment.
Both post-increment and member access operator are postfix expressions and bind the same. Considering that they apply to the primary or postfix expression to the left, there can't be ambiguity.
In
p++->x
The postfix-++ operator can apply only to the expression to the left of it (i.e. to p).
Similarly ->x can only be an access to the expression to its left, which is p++. Writing that expression as (p++) is not needed, but also does no harm.
The "after" in your description of the effects, does not express temporal order of increment and member access. It only expresses that the result of p++ is the value p had before the increment and that that value is the value used for the member access.
The expresion p++ results in a pointer with the value of p. Later on, the ++ part is performed, but for the purposes of interpreting the expression, it may just as well not be there. ->x makes the compiler add the offset for the member x to the original address in p and access that value.
If you change the statement to :
p->x; p++;
it would do exactly the same thing.
The order of precedence is actually exactly the same, as can be seen here - but it doesn't really matter.

Operator associativity in C specifically prefix and postfix increment and decrement

In C operation associativity is as such for increment, decrement and assignment.
2. postfix ++ and --
3. prefix ++ and --
16. Direct assignment =
The full list is found here Wikipedia Operators in C
My question is when we have
int a, b;
b = 1;
a = b++;
printf("%d", a); // a is equal to 1
b = 1;
a = ++b;
printf("%d", a); //a is equal to 2
Why is a equal to 1 with b++ when the postfix increment operator should happen before the direct assignment?
And why is the prefix increment operator different than the postfix when they are both before the assignment?
I'm pretty sure I don't understand something very important when it comes to operation associativity.
The postfix operator a++ will increment a and then return the original value i.e. similar to this:
{ temp=a; a=a+1; return temp; }
and the prefix ++a will return the new value i.e.
{ a=a+1; return a; }
This is irrelevant to the operator precedence.
(And associativity governs whether a-b-c equals to (a-b)-c or a-(b-c).)
Operator precedence and associativity does not tell you what happens before and what happens after. Operator precedence/associativity has nothing to do with it. In C language temporal relationships like "before" or "after" are defined by so called sequence points and only by sequence points (and that's a totally separate story).
Operator precedence/associativity simply tells you which operands belong to which operators. For example, the expression a = b++ can be formally interpreted as (a = b)++ and as a = (b++). Operator precedence/associativity is this case simply tells you that the latter interpretation is correct and the former is incorrect (i.e. ++ applies to b and not to the result of a = b).
That, once again, does not mean that b should be incremented first. Operator precedence/associativity, once again, has noting to do with what happens "first" and what happens "next". It simply tells you that the result of b++ expression is assigned to a. By definition, the result of b++ (postfix increment) is the original value of b. This is why a will get the original value of b, which is 1. When the variable b will get incremented is completely irrelevant, as long as a gets assigned b's original value. The compiler is allowed to evaluate this expression in any order and increment b at any time: anything goes, as long as a somehow gets the original value of b (and nobody really cares how that "somehow" works internally).
For example, the compiler can evaluate a = b++ as the following sequence of elementary operations
(1) a := b
(2) b := b + 1
or it can evaluate it as follows
(1) b := b + 1
(2) a = b - 1
Note that in the first case b is actually incremented at the end, while in the second case b is incremented first. But in both cases a gets the same correct value - the original value of b, which is what it should get.
But I have to reiterate that the above two examples are here just for illustrative purposes. In reality, expressions like a = ++b and a = b++ have no sequence points inside, which means that from your point of view everything in these expressions happens simultaneously. There's no "before", "after", "first", "next" or "last". Such expressions are "atomic" in a sense that they cannot be meaningfully decomposed into a sequence of smaller steps.
As AndreyT already pointed out, precedence and associativity don't tell you about order of evaluation. They only tell you about grouping. For example, precedence is what tells use that a*b+c is grouped as (a*b)+c instead of a*(b+c). The compiler is free to evaluate a, b and c in any order it sees fit with either of those expressions. Associativity tells you about grouping when you have operators of the same precedence, most often, the same operators. For example, it's what tells you that a-b-c is equivalent to (a-b)-c, not a-(b-c) (otherwise stated, subtraction is left associative).
Order of evaluation is defined by sequence points. There's a sequence point at the end of a full expression (among other things). At the sequence point, all the previous evaluations have to have taken place, and none of the subsequent evaluations can have taken place yet.
Looking at your specific examples, in a=b++;, the result is mostly from the definition of post-increment itself. A post-increment yields the previous value of the variable, and sometime before the next sequence point, the value of that variable will be incremented. A pre-increment yields the value of the variable with the increment applied. In neither case, however, does that mean the variable has to be incremented in any particular order relative to the assignment. For example, in your pre-increment example, the compiler is entirely free to do something equivalent to:
temp = b+1;
a = temp;
b = b + 1;
Likewise, in the post-increment version, the variable can be incremented before or after the assignment:
a = b;
b = b + 1;
or:
temp = b;
b = b + 1;
a = temp;
Either way, however, the value assigned to a must be the value of b before it's incremented.

Resources