Can someone please explain whether i = x[i]++; leads to undefined behavior?
Note: x[i] and i are not both volatile and x[i] does not overlap i.
There is C11, 6.5 Expressions, 2 (emphasis added):
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings. 84)
As I understand:
there is no "different side effect on the same scalar object"
there is no "value computation using the value of the same scalar object"
Are there "multiple allowable orderings"?
Overall: how can the i = x[i]++; be interpreted w.r.t. sequence points, side effects, and undefined behavior (if any)?
UPD. Conclusion: the i = x[i]++; leads to 2 side effects:
1. "the value of the operand object is incremented" (Postfix increment)
2. "updating the stored value of the left operand" (Assignment operators)
The Standard does not define the order in which the side effects take place.
Hence, per C11, 4. Conformance, 2:
Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior.
Experiments show that GCC/LLVM/ICC have order 1-2, while MSVC (and some others) have order 2-1.
Extra (speculating): why not make it unspecified behavior? Example: "an example of unspecified behavior is the order in which the side effects take place"?
Imagine:
i = 3;
x[] = {1, 1, 1, 1, 1};
So x[i] equals 1; x[i]++ evaluates to 1 (the old value) and increments x[3] to 2, so x becomes {1, 1, 1, 2, 1} and i becomes 1.
Why would there be any undefined behaviour?
If it were true that
there is no "different side effect on the same scalar object"
there is no "value computation using the value of the same scalar object"
(in every allowed ordering of the subexpressions), then the provision you cite would present no particular issue. That is, the antecedent of its "if" would not hold, so the consequence of that "if" (undefined behavior) would not be asserted.
However, there is both a side effect on i and a value computation using the value of i. The former is the side effect of the assignment, and the latter is the value computation of x[i]++. This is not a problem, however, because, for all forms of assignment,
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.
(C17 6.5.16/3)
Also, for completeness,
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
(C17 6.5/1)
Thus, the assignment's side effect on i is sequenced after the value computation of x[i]++, which is sequenced after the evaluation of i.
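The ordering derived above can be spelled out with explicit temporaries. This is only a minimal sketch: the helper name assign_post_increment is hypothetical, and it assumes that i is a valid index and that x[i] does not overlap i, as stated in the question.

```c
/* Hypothetical helper: a well-defined spelling of
 *     i = x[i]++;
 * Returns the new value of i. Assumes i is a valid index into x. */
int assign_post_increment(int *x, int i)
{
    int old = x[i];  /* value computation of x[i]++: reads i, then x[i] */
    x[i] = old + 1;  /* side effect of the postfix increment */
    i = old;         /* side effect of the assignment, after the value computation */
    return i;
}
```

The two side effects touch different objects, so their relative order does not affect the result.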
There is a value computation using the value of the same scalar object. x[i] uses the value of i.
Since C11, there is a sequence relation in assignment.
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
(C11 6.5.16/3)
Prior to then the way the standard discussed expressions was looser. It did not describe a sequenced before relation. Instead we have:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
(C99 6.5/2)
You aren't reading the value "other than to determine the value to be stored", so the behaviour is defined.
I have a legacy code performing double buffering with this instruction:
bufferIndex = ++bufferIndex & 1;
clang warns about unsequenced modification order of bufferIndex
warning: multiple unsequenced modifications to 'bufferIndex' [-Wunsequenced]
When in doubt, I would avoid such a construct, especially since, knowing that bufferIndex has been properly initialized to either 0 or 1, this flip-flop could have been written more simply:
bufferIndex ^= 1;
Even without any doubt, I would avoid such a pre-increment construct, so that I don't confuse my reviewers and don't trigger puzzling warnings needlessly.
But that's not my point. For my own understanding, I want to know whether the modification really is unsequenced here.
Is the evaluation order really undefined in this case, or is the warning a bit abusive?
Note: it's not the same case as Unsequenced modification warning, where the variable clearly appears twice in an unsequenced manner, nor the same as the other possible duplicates suggested by SO (unless I overlooked something).
It's not at all obvious that this applies to this case of assignment operator = to my understanding, since expression has to be evaluated BEFORE being assigned to, and since the sole side effect is pre-increment and will necessarily happen sometime during evaluation and before assignment.
This statement still triggers undefined behavior.
None of the operators involved (=, binary &, prefix ++) introduce a sequence point, and both the = and ++ operators update one of their operands as a side effect. And because bufferIndex is being modified by a side effect multiple times without a sequence point, we have undefined behavior.
This is spelled out in section 6.5p2 of the C standard:
If a side effect on a scalar object is unsequenced relative
to either a different side effect on the same scalar object
or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple
allowable orderings of the subexpressions of an expression, the
behavior is undefined if such an unsequenced side effect occurs
in any of the orderings.84)
And in fact footnote 84 referenced above gives an almost identical example to yours:
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
Is the evaluation order really undefined in this case, or is the warning a bit abusive?
The relevant rule is this:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings.
(C17, paragraph 6.5/2)
Pursuant to that, we also have
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
(C17, paragraph 6.5/1)
Note that only the value computations are constrained by that, not the application of side effects.
Of the assignment operator, we have:
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C17, paragraph 6.5.16/3)
Note that that sequences the assignment's side effect relative to the value computations of its left and right operands, but not relative to any other side effects.
There is also a sequence point at the terminating semicolon, meaning that all the value computations and side effects of the code preceding are sequenced before all those of the code following.
There are no other sequencing constraints on the given statement expression. In particular, the bitwise AND operator (&) does not add any sequencing constraints, nor does the pre-increment operator (prefix ++).
This last may be what has you confused. The pre-increment operator both computes a result (equal to the stored value of its operand plus 1) and has a side effect of updating the stored value of the object designated by its operand. Although you might think about it in terms of updating the variable and then evaluating to the resulting value, that ordering is neither required nor specified.
None of the applicable constraints sequence the side effect of the assignment on variable bufferIndex relative to the side effect of the pre-increment on that same variable, therefore the behavior is undefined.
There are two side effects on bufferIndex - one from the ++ operation and one from the =, and yes, they are unsequenced with respect to each other and can happen in any order:
tmp = bufferIndex + 1
bufferIndex = tmp & 1
bufferIndex = bufferIndex + 1
or
tmp = bufferIndex + 1
bufferIndex = bufferIndex + 1
bufferIndex = tmp & 1
or in any other order. They can even be executed simultaneously (either interleaved or in parallel if the system supports it).
The behavior is undefined - the compiler is not required to handle the situation in any particular way; any result is equally "correct" as far as the language is concerned.
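For comparison, the toggle can be written so that bufferIndex is modified only once per statement. This is a minimal sketch: the function names are hypothetical, the unsigned type is an assumption (the question does not show the declaration), and the XOR variant assumes the index is already 0 or 1.

```c
/* Hypothetical helpers showing well-defined ways to flip a double-buffer index. */

/* Reads the old value, then performs a single store to bufferIndex. */
unsigned flip_by_increment(unsigned bufferIndex)
{
    bufferIndex = (bufferIndex + 1) & 1u;
    return bufferIndex;
}

/* Compound assignment: one read-modify-write of bufferIndex. */
unsigned flip_by_xor(unsigned bufferIndex)
{
    bufferIndex ^= 1u;
    return bufferIndex;
}
```

Both versions have exactly one side effect on bufferIndex, sequenced after the read that feeds it, so neither falls under 6.5p2.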
I'm looking at the final draft of C17, N2176. Here, I'm concerned with what kind of expression with side effects would have its behaviour undefined.
In section 6.5 Expressions of the standard, there is paragraph 2 that starts with:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined.
As I understand it, evaluation of the expression x=1 would both produce a value and initiate a side effect changing the value of the object designated by x. The determining factor would then be whether the side effect is sequenced in any way in relation to the value computation that uses the value of the object designated by x.
The description in section 6.5.16 Assignment operators contains this sentence:
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
That does not resolve sequencing of value computation of the whole assignment and the side effect of the assignment.
Also, another sentence:
An assignment expression has the value of the left operand after the
assignment, but is not an lvalue.
Specifies what the final value should be, but does not mandate any sequencing. And I don't see any other text mentioning sequencing regarding side effect and value.
I know that when written as a full statement x=1; the value of assignment is not used. However, the standard says that the value is discarded. That means that it is as if first the value was evaluated and later discarded, so the undefined behaviour should still be triggered.
Is there any other part of the standard that makes this statement behaviour not undefined?
The value of the assignment expression is not a use of the assigned object. This is because the value of x = 1, in, say y = x = 1, is not obtained by lvalue conversion of x but is a value computation of the assignment expression, per C 2018 6.5.16 3:
… An assignment expression has the value of the left operand after the assignment,…
It further says:
… The type of an assignment expression is the type the left operand would have after lvalue conversion…
This is about type, rather than value, but its use of the subjunctive “would have” indicates that “lvalue conversion” is fictitious; it does not actually occur. So an assignment expression is not obtaining its value by reading the left operand; its value is merely a result of the operation.
There is a footnote there (115) that says:
The implementation is permitted to read the object to determine the value but is not required to, even when the object has volatile-qualified type.
This tells us that the left operand may in fact be used to compute the value of the assignment expression. However, this is a statement that it may be implemented by doing so, not a statement that this is part of the semantics of assignment expressions in the C model of computing.
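As a small illustration of the value of an assignment expression being used, here is a sketch built on the y = x = 1 example mentioned above. The function name is hypothetical.

```c
/* Hypothetical demo: the value of the inner assignment feeds the outer one. */
int chained_assignment(void)
{
    int x = 0;
    int y = 0;
    y = x = 1;     /* parsed as y = (x = 1); the value of (x = 1) is 1 */
    return x + y;  /* both objects now hold 1 */
}
```

In the C model, y receives the value of the expression x = 1, not the result of a separate read of x.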
I'm getting different results for the same code with different compilers. Is this undefined behaviour?
#include <stdio.h>
int a;
int b=10;
int puan_ekle(int puan, int bonus){
    puan=puan+bonus;
    a=puan-5;
    bonus--;
    return bonus;
}
int main(){
    a=23;
    printf("Result1 %d \n", a);
    a=a+puan_ekle(a,b);
    printf("Result2 %d \n", a);
    a=a+puan_ekle(a,b);
    printf("Result3 %d \n", a);
}
The behavior is unspecified, not undefined.
The C standard distinguishes these. C 2018 3.4.4 1 says:
unspecified behavior
behavior, that results from the use of an unspecified value, or other behavior upon which this document provides two or more possibilities and imposes no further requirements on which is chosen in any instance
And 3.4.3 1 says:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements
In some situations, when an object is both used for its value and modified, a rule in the C standard makes the behavior undefined. 6.5 2 says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
Let’s see how this applies to a=a+puan_ekle(a,b);. In this expression:
1. a is modified by the a=.
2. a is used in the a+.
3. a is used in the arguments (a,b).
4. Inside the function puan_ekle, a is modified with a=puan-5;.
The modifications are side effects—they are something that happens separately from computing the value of the expression. If either of the modifications, 1 or 4, is unsequenced relative to any of the other items, the behavior is undefined.
Regarding 1, 6.5.16 3 says:
… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands…
So 1 is sequenced after 2 and 3. Since 4 is a side effect, not a value computation, we still have to consider the relationship of 1 and 4. To resolve this, we will consider sequence points. Per 5.1.2.3, “The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.”
Next we need to know what a full expression is and that there is a sequence point after each full expression. 6.8 4 says:
A full expression is an expression that is not part of another expression, nor part of a declarator or abstract declarator… There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.
This means that every statement inside puan_ekle is or contains a full expression: puan=puan+bonus is a full expression, a=puan-5 is a full expression, bonus-- is a full expression, and the bonus in return bonus is a full expression. So, after a=puan-5, there is a sequence point.
For a=, the side effect of modifying a is sequenced after the value computations of the operands, and evaluating those operands includes calling the function, which includes its sequence points. So effect 4, modifying a in a=puan-5;, must be completed before execution continues to the next statement, and hence must be completed before effect 1. So 1 and 4 are sequenced.
What is left is to consider effect 4 with respect to 2 and 3. 4 is sequenced after 3 because a function call is sequenced after evaluation of its arguments, per 6.5.2.2 10:
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call…
Now all we have left is the sequencing of 2 relative to 4. In this, there is no specification of which is first. The evaluations of the operands of + are unsequenced, so, for a+puan_ekle(a,b), a C implementation may do either a first or puan_ekle(a,b) first. However, whichever it does first, there is a sequence point between 2 and 4:
If a is evaluated first, then, before the function call, there is a sequence point (per 6.5.2.2 10, quoted above).
If puan_ekle(a,b) is evaluated first, there is a sequence point after the full expression a=puan-5.
Thus, 2 and 4 are indeterminately sequenced. (5.1.2.3 3: “… Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which…”) But they are not unsequenced, so there is no undefined behavior. The behavior is unspecified because there are two possibilities. The C implementation is required to implement one of those two possibilities, which is different from undefined behavior, in which there would be no requirements.
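The two permitted outcomes can be made explicit by forcing each order with temporaries. This is a minimal sketch: puan_ekle and a follow the question's code, while read_then_call and call_then_read are hypothetical names introduced here.

```c
int a;  /* global modified by puan_ekle, as in the question */

int puan_ekle(int puan, int bonus)
{
    puan = puan + bonus;
    a = puan - 5;   /* side effect on the global a (item 4) */
    bonus--;
    return bonus;
}

/* Forces "read a first, then call": one of the two allowed orders. */
int read_then_call(int start, int bonus)
{
    a = start;
    int lhs = a;                    /* item 2 happens first */
    int rhs = puan_ekle(a, bonus);  /* the call changes a afterwards */
    a = lhs + rhs;
    return a;
}

/* Forces "call first, then read a": the other allowed order. */
int call_then_read(int start, int bonus)
{
    a = start;
    int rhs = puan_ekle(a, bonus);  /* the call changes a first */
    int lhs = a;                    /* item 2 sees the updated a */
    a = lhs + rhs;
    return a;
}
```

With the question's values (a = 23, b = 10), the first order gives 32 and the second gives 37, which matches the observation that different compilers print different results.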
The order of evaluation of the operands of an additive operator is unspecified.
So for example in this statement
a=a+puan_ekle(a,b);
one compiler may first evaluate the value of a and then call the function puan_ekle(a,b), which has the side effect of changing a, while another compiler may first call the function and only then read the value of a, after the function has changed it.
So the program has unspecified behavior.
If the function had no side effect, the behavior would be well defined regardless of the order of evaluation of the operands of the additive operator.
Does this C99 code produce undefined behavior?
#include <stdio.h>
int main() {
    int a[3] = {0, 0, 0};
    a[a[0]] = 1;
    printf("a[0] = %d\n", a[0]);
    return 0;
}
In the statement a[a[0]] = 1; , a[0] is both read and modified.
I looked at the n1124 draft of ISO/IEC 9899. It says (in 6.5 Expressions):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
However, I feel it strange. Does this actually produce undefined behavior?
(I also want to know about this problem in other ISO C versions.)
the prior value shall be read only to determine the value to be stored.
This is a bit vague and caused confusion, which is partly why C11 threw it out and introduced a new sequencing model.
What it is trying to say is that: if reading the old value is guaranteed to occur earlier in time than writing the new value, then that's fine. Otherwise it is UB. And of course it is a requirement that the new value be computed before it is written.
(Of course the description I have just written will be found by some to be more vague than the Standard text!)
For example x = x + 5 is correct because it is not possible to work out x + 5 without first knowing x. However a[i] = i++ is wrong because the read of i on the left hand side is not required in order to work out the new value to store in i. (The two reads of i are considered separately).
Back to your code now. I think it is well-defined behaviour because the read of a[0] in order to determine the array index is guaranteed to occur before the write.
We cannot write until we have determined where to write. And we do not know where to write until after we read a[0]. Therefore the read must come before the write, so there is no UB.
Someone commented about sequence points. In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.
Does this C99 code produce undefined behavior?
No. It will not produce undefined behavior. a[0] is modified only once between two sequence points (first sequence point is at the end of initializer int a[3] = {0, 0, 0}; and second is after the full expression a[a[0]] = 1).
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
An object can be read more than once in the course of modifying itself, and that is perfectly well-defined behavior. Look at this example:
int x = 10;
x = x*x + 2*x + x%5;
Second statement of the quote says:
Furthermore, the prior value shall be read only to determine the value to be stored.
Every x in the above expression is read to determine the value to be stored in x itself.
NOTE: There are two parts to the quote mentioned in the question. The first part says: "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.", and therefore an expression like
i = i++;
comes under UB (Two modifications between previous and next sequence points).
Second part says: Furthermore, the prior value shall be read only to determine the value to be stored., and therefore the expressions like
a[i++] = i;
j = (i = 2) + i;
invoke UB. In both expressions i is modified only once between the previous and next sequence points, but the read of the rightmost i does not determine the value to be stored in i.
In C11 standard this has been changed to
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
In expression a[a[0]] = 1, there is only one side effect to a[0] and the value computation of index a[0] is sequenced before the value computation of a[a[0]].
C99 presents an enumeration of all the sequence points in annex C. There is one at the end of
a[a[0]] = 1;
because it is a complete expression statement, but there are no sequence points inside. Although logic dictates that the subexpression a[0] must be evaluated first, and the result used to determine to which array element the value is assigned, the sequencing rules do not ensure it. When the initial value of a[0] is 0, a[0] is both read and written between two sequence points, and the read is not for the purpose of determining what value to write. Per C99 6.5/2, the behavior of evaluating the expression is therefore undefined, but in practice I don't think you need to worry about it.
C11 is better in this regard. Section 6.5, paragraph (1) says
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
Note in particular the second sentence, which has no analogue in C99. You might think that would be sufficient, but it isn't. It applies to the value computations, but it says nothing about the sequencing of side effects relative to the value computations. Updating the value of the left operand is a side effect, so that extra sentence does not directly apply.
C11 nevertheless comes through for us on this one, as the specifications for the assignment operators provide the needed sequencing (C11 6.5.16(3)):
[...] The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(In contrast, C99 just says that updating the stored value of the left operand happens between the previous and next sequence points.) With sections 6.5 and 6.5.16 together, then, C11 gives a well-defined sequence: the inner [] is evaluated before the outer [], which is evaluated before the stored value is updated. This satisfies C11's version of 6.5(2), so in C11, the behavior of evaluating the expression is defined.
The value is well defined, unless a[0] contains a value that is not a valid array index (in your code, a valid index is non-negative and less than 3). You could change the code to the more readable and equivalent
index = a[0];
a[index] = 1; /* still UB if index < 0 || index >= 3 */
In the expression a[a[0]] = 1 it is necessary to evaluate a[0] first. If a[0] happens to be zero, then a[0] will be modified. But there is no way for a compiler (short of not complying with the standard) to change order of evaluations and modify a[0] before attempting to read its value.
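The two-statement form above can be wrapped with the bounds check it hints at. This is a minimal sketch: the helper name store_one_at_first and its bool return convention are hypothetical, not part of the question's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: well-defined equivalent of a[a[0]] = 1 with a bounds check.
 * Returns false and leaves the array untouched if a[0] is not a valid index. */
bool store_one_at_first(int *a, size_t n)
{
    if (n == 0) {
        return false;
    }
    int index = a[0];                    /* the read completes here */
    if (index < 0 || (size_t)index >= n) {
        return false;                    /* the store would be out of bounds */
    }
    a[index] = 1;                        /* the write, in a later statement */
    return true;
}
```

Splitting the read and the write into separate statements puts a sequence point between them, so the result does not depend on which revision of the standard is consulted.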
A side effect includes modification of an object [1].
The C standard says that behavior is undefined if a side effect on an object is unsequenced relative to another side effect on the same object or to a value computation using the value of the same object [2].
The object a[0] in this expression is modified (side effect) and its value (value computation) is used to determine the index. It would seem this expression yields undefined behavior:
a[a[0]] = 1
However, the text on assignment operators in the standard explains that the value computations of both the left and right operands of the operator = are sequenced before the left operand is modified [3].
The behavior is thus defined, as the first rule [2] isn't violated, because the modification (side effect) is sequenced after the value computation of the same object.
[1] (Quoted from ISO/IEC 9899:201x 5.1.2.3 Program execution 2):
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment.
[2] (Quoted from ISO/IEC 9899:201x 6.5 Expressions 2):
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
[3] (Quoted from ISO/IEC 9899:201x 6.5.16 Assignment operators 3):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.
In C99 6.5 says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored
What does "Furthermore, the prior value shall be read only to determine the value to be stored" mean? In C99, why is a[i++] = 1 undefined behavior?
a[i++] = 1 is defined (unless it has other reasons to be undefined than the sequencing of side-effects: out of bound access, or uninitialized i).
You mean a[i++] = i, which is undefined behavior because it reads i between the same sequence points as i++, which changes it.
The “Furthermore, the prior value shall be read only to determine the value to be stored” part means that i = i + 1; is allowed, although it reads from i and modifies i.
On the other hand, a[i] = (i=1); isn't allowed, because despite writing to i only once, the read from i is not for computing the value being stored.
The "prior value shall be read only to determine the value to be stored" wording is admittedly counterintuitive; why should the purpose for which a value is read matter?
The point of that sentence is to impose a requirement for which results depend on which operations.
I'll steal examples from Pascal's answer.
This:
i = i + 1;
is perfectly fine. i is read and written in the same expression, with no intervening sequence point, but it's ok because the write cannot occur until after the read has completed. The value to be stored cannot be computed until the expression i + 1, and its subexpression i, have been completely evaluated. (And i + 1 has no side effects that might be delayed until after the write.) That dependency imposes a strict ordering: the read must be completed before the write can begin.
On the other hand, this:
a[i] = (i=1);
has undefined behavior. The subexpression a[i] reads the value of i, and the subexpression i=1 writes the value of i. But the value to be stored in i by the write does not depend on the evaluation that reads i on the left hand side, and so the ordering of the read and the write are not defined. The "value to be stored" is 1; the read of i in a[i] does not determine that value.
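Since the undefined form does not say which value of i selects the element, a well-defined rewrite has to pick one. This is a minimal sketch of both readings: the function names are hypothetical, and a is assumed to have enough elements for both the old and the new index.

```c
/* Hypothetical helpers: two well-defined readings of the undefined a[i] = (i = 1). */

/* Store into the element selected by the old i, then update i. */
void store_then_set(int *a, int *i)
{
    a[*i] = 1;
    *i = 1;
}

/* Update i first, then store into the element selected by the new i. */
void set_then_store(int *a, int *i)
{
    *i = 1;
    a[*i] = 1;
}
```

Each statement ends at a sequence point, so the read of i and the write to i are ordered in both versions.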
I suspect this confusion is why the 2011 revision of the ISO C standard (available in draft form as N1570) re-worded that section. The standard still has the concept of sequence points, but 6.5p2 now says:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings.
And paragraph 1 states explicitly what was only implicitly assumed in C99:
The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator.
Section 5.1.2.3 paragraph 2 explains the sequenced before and sequenced after relationships.