A discussion arose around the C statement x = b[i] + i++; and its definedness.
The argument for said statement to be undefined goes something like this:
§ 6.5 of C99 states:
[…] the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
Thus it is not guaranteed that i is incremented after it is used in the subscript operator as index of the array.
However, I interpret said specification differently.
§ 6.5 of C99 additionally states:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
§ 5.1.2.3 of C99 states:
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
A list of sequence points is given in Annex C, and IMHO only the following one matches the statement in question:
The end of a full expression
The evaluation of b[i] (the value of element i of b) and that of i++ (just i) can happen in any order before the addition (and evaluation of =, which is the value of the RHS) is done. However, the side effects of the whole statement are deferred until after all these evaluations because that's the only sequence point. In this case the side effects are the change of x and the increment of i.
Who is right? Are there additional paragraphs relevant for the argument? Is it any different in C++?
Side effects don't have to be deferred until the sequence point -- they may be applied immediately upon evaluation. Or not.
C 2011 has some slightly different (more precise) language:
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
C 2011, §6.5 ¶2
i++ has a side effect on i, b[i] uses i in a value computation, and the two subexpressions are unsequenced relative to each other (i.e., there is no intervening sequence point). Thus, the behavior of b[i] + i++ is undefined.
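A minimal sketch of a well-defined replacement, assuming the intended meaning of x = b[i] + i++; is "add b[i] and the current value of i, then advance i". Moving the increment into its own full expression removes the unsequenced read/write pair on i. The function name sum_then_advance and its pointer parameters are hypothetical, chosen only for illustration.

```c
/* Hypothetical helper: a well-defined rewrite of x = b[i] + i++;
   assuming the intent is "use the current i twice, then increment it". */
int sum_then_advance(const int *b, int *i)
{
    int x = b[*i] + *i; /* both reads of *i, and no write to it, in this full expression */
    (*i)++;             /* the increment is a separate full expression */
    return x;
}
```

With int b[] = {10, 20, 30}; and i == 1, the call returns 20 + 1 == 21 and leaves i == 2.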
Your quotation from section 6.5 is the relevant one:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. [In that event, f]urthermore, the prior value shall be read [between those sequence points] only to determine the value to be stored.
(Clarifications mine.)
In your statement, the value of i is both modified and used as an index into b. Your statement contains no internal sequence points, so these effects must occur between the same pair of sequence points. The statement therefore violates the quoted requirement. Section 4, paragraph 2 then applies:
If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. [...]
That's all there is to it. No other considerations are required. Your argument about actual order of operations is completely irrelevant.
Nevertheless, your claim that
the side effects of the whole statement are deferred until after all these evaluations because that's the only sequence point.
reflects a serious misunderstanding of sequence points. Sequence points do not represent times when things happen, but rather boundaries between which things happen. Not only are side effects not deferred to the next sequence point, they are far less constrained (by the standard) than operations involved in computing the values of expressions.
§6.5.2.4 states
The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
This is just what Eugene's comment suggested. In case this is not clear enough, the sentence from § 6.5 (2) cited in the question
the prior value shall be read only to determine the value to be stored.
is violated directly as well. The value of i is not just read to determine the value after incrementing but also as operand of the subscript operator.
This question and its accepted answer might give additional insights, as they discuss the sequence points introduced by the comma (,) operator and its interaction with the potentially UB-provoking behavior of assignments.
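For illustration, a small sketch of how the comma operator changes things: there is a sequence point after its left operand, so an increment performed there is complete before the right operand reads i again. The helper name next_element is hypothetical.

```c
/* Hypothetical helper: advance *i, then read the element it now selects.
   The sequence point after the comma operator's left operand makes this
   well defined, unlike b[i] + i++. */
int next_element(const int *b, int *i)
{
    return ((*i)++, b[*i]);
}
```

Starting from i == 0 with b = {10, 20, 30}, the call returns b[1], that is 20, and leaves i == 1.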
As pointed out by Karl Knechtel, I am confused: isn't the fetch performed by the array indexing unsequenced relative to the i++ increment? If they are unsequenced, why doesn't this line of the C Standard, 6.5 ¶2, apply (emphasis added to the words/phrase which I understand to apply here):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
I read the question "I can not understand some sentences in C99", in which the OP tries to understand why a[i++] = 1 is undefined. The accepted (and one of the highest-voted) answers, by Pascal Cuoq, says that this is defined behavior.
I also tried compiling the program with the -std=c99, -Wall and -Wextra flags and a slew of other flags (basically all the flags that are enabled in GCC 11.2.0), but the code didn't produce any warning.
However, my question/confusion is: why is this defined behaviour?
From the C11 standard, §6.5 ¶2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
My understanding/reasoning, after reading through most of the threads on SO (with tags [C] and [sequence-points]), is that i++ results in a side effect of updating the value of i. In that case, this side effect is unsequenced relative to the value computation using the same scalar object. I understand that a[integer object] constitutes a value computation. Then, shouldn't it be undefined behavior?
Even from the C99 S6.5(p2)
Furthermore, the prior value shall be read only to determine the value to be stored.
I understand/construe that this sentence should also render a[i++] = 1 undefined?
in that case this side-effect is unsequenced to the value computation using the same scalar object.
The scalar object involved in i++ is i. The side effect of updating i is not unsequenced relative to the computation of the value of i++ because C 2018 6.5.2.4 (which specifies behavior of postfix increment and decrement operators) paragraph 2 says:
… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…
C 2011 has the same wording. (C 2018 contains only technical corrections and clarifications to C 2011.)
Even from the C99 S6.5(p2)
Furthermore, the prior value shall be read only to determine the value to be stored.
A rule in the C 1999 standard has no application to the 2011 or 2018 standards; it must be interpreted separately. Between 1999 and 2011, the standard moved from solitary sequence points to finer rules about sequencing relationships.
In i++, the prior value is read to determine what the new value of i should be, so it conforms to that rule.
The rule was an attempt to say that any reads of a scalar object had to be in the prerequisite chain of any write of the object. For example, in i = 3*i + i*i, all three reads of i are necessary to compute the value to be written to i, so they are necessarily performed before the write. But in i = ++i + i;, the read of i for that last term is not a prerequisite for writing to i for the ++i, so it is not necessarily performed before the write. Thus, it would not conform to the rule.
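The conforming example can be written out as a short sketch; the non-conforming i = ++i + i; is deliberately left out, since compiling and running it would demonstrate nothing reliable. The function name poly_step is hypothetical.

```c
/* Every read of i on the right-hand side feeds the value that is written
   back to i, so C99's "prior value shall be read only to determine the
   value to be stored" is satisfied and the statement is well defined. */
int poly_step(int i)
{
    i = 3 * i + i * i;
    return i;
}
```

For i == 2 this gives 3*2 + 2*2 == 10.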
… I am confused that isn't fetching the operation of the array indexing unsequenced relative to the i++ increment operation?
The read of the array element is unsequenced relative to the update of i, and that is okay because there is no rule that requires it to be sequenced. C 2018 6.5 2 says, emphasis added:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
The array element is a different scalar object from i, so we do not care that there is no sequencing between the read of the array element and the update to i.
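A minimal sketch of the well-defined a[i++] = 1 from the question, wrapped in a hypothetical helper mark_and_advance so its effect can be checked: the element selected is the one at the old value of i, and i ends up incremented.

```c
/* a[i++] = 1 is well defined: i++ yields the old value of i, which selects
   the element. The update of i and the store into that element are not
   sequenced relative to each other, but they touch different scalar objects. */
void mark_and_advance(int *a, int *i)
{
    a[(*i)++] = 1;
}
```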
Thanks a lot for the immediate responses. I will attempt an answer in language that I understand.
After a suggestion to read the article "use of abstract tree to tackle sequence point problem" (I know, it's not normative), let me represent a[i++] = 1 using an abstract syntax tree.
I realize that merely asking about undefined behavior leads some people to downvote, but I have a question comparing the C99 version of Sep 2007 (the only one I have access to, and so the one that matters to me) with the one from 2011. The relevant quotes are from 6.5 (2) in either version:
2007: "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored. (highlight added)"
2011: "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. (...)"
An example given to illustrate what contradicts this in the 2007 version is:
i = ++i + 1;
As C considers an assignment always an expression (no assignment statements), this expression is semantically delineated by 2 sequence points. It is fairly obvious that both versions declare the above to result in undefined behavior.
However, given the highlighted sentence of the 2007 version, it would be my understanding that even the following expression (lying again between two sequence points) would result in undefined behavior:
++i; // or i++; or a = ++i;
since, clearly, the "value to be stored" is not only read ("stored" is a bit ambiguous, but I would naturally read it as the one read): it is read, incremented, then stored back. It is sequenced, though, and so fine (as it probably should be) under the 2011 wording.
Was this adjustment to the wording made to address the above, in order to match intent to description?
Note: I realize that to an extent this is opinion-based, but (1) the best-case would be that someone actually involved in writing the standard sees this, and (2) while I believe my interpretation to be reasonable/"true", if someone argues convincingly against it, this would be useful too.
However, given the highlighted sentence of the 2007 version, it would be my understanding that even the following expression (lying again between two sequence points) would result in undefined behavior:
++i; // or i++; or a = ++i;
You understood wrong. This is best explained in C FAQ question 3.8:
... And that's what the second sentence says: if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written. This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification. For example, the old standby i = i + 1 is allowed, because the access of i is used to determine i's final value. The example
a[i] = i++
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored.
In the case of i++; or ++i;, both the access of i and its increment have to do with the value which ends up being stored in i.
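A short sketch showing that the statements from the question are all well defined when each appears as its own full expression; the function count_up and its out-parameter are hypothetical, used only to expose the final values.

```c
/* Each statement modifies i once and reads it only to compute the value
   stored back into i, so none of them is undefined. */
int count_up(int *a)
{
    int i = 0;
    ++i;      /* i == 1 */
    i++;      /* i == 2 */
    *a = ++i; /* i == 3, and *a receives the incremented value 3 */
    return i;
}
```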
For my compiler class, we are gradually creating a pseudo-PASCAL compiler. It does, however, follow the same precedence as C. That being said, in the section where we create prefix and postfix operators, I get 0 for
int a = 1;
int b = 2;
++a - b++ - --b + a--
whereas C gives 1. What I don't understand is how you can even get 1. Doing straight prefix first, the answer should be 2; doing postfix first, it should be -2. Doing everything left to right, I get 0.
My question is: what should the precedence of my operators be to return 1?
Operator precedence tells you, for example, whether ++a - b means (++a) - b or ++(a - b). Clearly it should be the former, since the latter isn't even valid. In your implementation it's clearly the former (or you wouldn't be getting a result at all), so you implemented operator precedence correctly.
Operator precedence has nothing to do with the order in which subexpressions are evaluated. In fact, the order in which the operands of + and - are evaluated is unspecified in C, and any code that modifies the same variable twice without a sequence point in between invokes undefined behavior. So whichever order you choose is fine, and 0 is as valid a result as any other value.
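As a sketch of that point, here is one order a compiler (such as the pseudo-PASCAL one in the question) could choose: evaluate the operands strictly left to right and apply each side effect immediately. Splitting the expression into separate statements makes this particular order explicit and well defined in C; it is not what C mandates for the original expression. The function name left_to_right is hypothetical.

```c
/* One possible, fully sequenced evaluation of ++a - b++ - --b + a--
   with a = 1 and b = 2, taking the operands strictly left to right. */
int left_to_right(void)
{
    int a = 1;
    int b = 2;
    int t1 = ++a; /* a becomes 2, t1 = 2 */
    int t2 = b++; /* t2 = 2, b becomes 3 */
    int t3 = --b; /* b becomes 2, t3 = 2 */
    int t4 = a--; /* t4 = 2, a becomes 1 */
    return t1 - t2 - t3 + t4; /* 2 - 2 - 2 + 2 == 0 */
}
```

This reproduces the 0 from the question; a different, equally permissible order can produce a different number.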
It is illegal to change variables several times in a row like that (roughly, between assignments; the standard talks about sequence points). Technically, this is what the C standard calls undefined behaviour. The compiler has no obligation to detect that you are writing nonsense, and can assume you never will. Anything whatsoever can happen when you run the program (or even while compiling). Also see "nasal demons" in the Jargon File.
The ++ increment and -- decrement operators can be placed before or after a value, with different effects. If placed before the operand (prefix), its value is immediately changed; if placed after the operand (postfix), its value is noted first, then the value is changed.
McGrath, Mike. (2006). C programming in easy steps, 2nd Edition. United Kingdom : Computer Step.
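A small sketch of the difference the quoted passage describes, observing only the result of the increment expression and the variable's value once the statement has finished. (The standard itself only promises that the stored value is updated before the next sequence point, not "immediately".) The struct incr_demo and both functions are hypothetical.

```c
/* Hypothetical record of an increment: the expression's result and the
   variable's value after the full expression has completed. */
struct incr_demo {
    int result;
    int after;
};

/* Prefix: the result is the incremented value. */
struct incr_demo prefix_demo(int v)
{
    struct incr_demo d;
    d.result = ++v;
    d.after = v;
    return d;
}

/* Postfix: the result is the old value; v is still incremented. */
struct incr_demo postfix_demo(int v)
{
    struct incr_demo d;
    d.result = v++;
    d.after = v;
    return d;
}
```

Starting from 5, prefix_demo yields result 6 and after 6, while postfix_demo yields result 5 and after 6.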
C99 §6.5 Expressions
(1) An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof.
(2) Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)
with the footnotes
72) A floating-point status flag is not an object and can be set more than once within an expression.
73) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
where C11 §6.5 changed to (the text of (1) has an addendum):
(1) […] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
(2) If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)
where footnote 84 in C11 is the same as 73 in C99.
I'm a little confused… I read C11 (2) as "[…] either (a different side effect on the same scalar object) or (a value computation using the value of the same scalar object) […]" which seems to not even allow foo = ++i (there is a side effect and we use a value depending on the changed object). I'm not a native speaker, though, so it would be nice if one could tell me how this sentence should be "parsed". I understand C99, but I don't quite understand the wording of C11.
Anyway, the actual question: Is this a change from C99 to C11, or are these wordings equivalent? If they are equivalent, why has the wording been changed? And if not, could someone give an example of an expression which is UB in C99 but not in C11, or vice versa?
C11 (and also C++11) has completely reworked the wording of sequencing because C11 now has threads, and it had to explain what sequencing between threads that access the same data means. The intention of the committee was to keep things backward compatible to C99 for the case where there is only one thread of execution.
Let's have a look at the C99 version:
1. Between the previous and next sequence point
2. an object
3. shall have
4. its stored value modified at most once
5. by the evaluation of an expression.
compared to the new text
If a side effect on
different terminology for 4, modifying the stored value
a scalar object
a restriction of the previous wording in 2. The new text only says something about scalar objects.
is unsequenced relative to either
unsequenced is a generalization of the concept in 1., that two evaluations were separated by a sequence point. Think of two threads that modify the same data without using a lock or something similar.
a different side effect on the same scalar object
the object is only allowed to be modified once
or a value computation using the value of the same scalar object,
or a read of the value may not happen concurrently with the modification
the behavior is undefined.
The "shall" in 3. is saying this implicitly. All "shall"s lead to UB if they are not fulfilled.
I'm a little confused… I read C11 (2) as "[…] either (a different side effect on the same scalar object) or (a value computation using the value of the same scalar object) […]" which seems to not even allow foo = ++i (there is a side effect and we use a value depending on the changed object).
If you read the standard quote carefully
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)
then you will find that your wording should be:
If a side effect on a scalar object is unsequenced relative to either (a different side effect on the same scalar object) or (a value computation using the value of the same scalar object).
This means that foo = ++i is a defined statement. It is true that there is a side effect on i (on foo also) but nothing is unsequenced here for the object i.
This is an explanation of foo = ++i but not really an answer to the question.
Prefix increment is defined in terms of compound assignment; see 6.5.3.1/2:
The expression ++E is equivalent to (E+=1)
For assignment in general, there's a guarantee in 6.5.16/3
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
So foo = ++i is equivalent to foo = (i+=1). The inner i+=1 requires the modification of i to be sequenced after the computation i+1. The resulting value of the expression (i+=1) is specified in 6.5.16/3 as:
An assignment expression has the value of the left operand after the assignment, but is not an lvalue.
It seems as if this requires the value computation of i+=1 to be sequenced after the modification of i, and in C++11, this is even guaranteed explicitly [expr.ass]/1
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
(which is clearer to me, but I know C++ far better than C)
The modification of i is sequenced before the value computation of i+=1, so we don't have UB accessing the value of ++i in foo = ++i (as the value computation of the left and right operands of foo = x are sequenced before the modification of foo).
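A minimal sketch of the two equivalent forms discussed above, with the hypothetical helpers pre_increment_assign and compound_assign. In both, i is written once and foo receives the incremented value.

```c
/* foo = ++i is specified as equivalent to foo = (i += 1); both store the
   incremented value into i, and foo receives that incremented value. */
int pre_increment_assign(int *i)
{
    int foo;
    foo = ++*i;
    return foo;
}

int compound_assign(int *i)
{
    int foo;
    foo = (*i += 1);
    return foo;
}
```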
As far as I understand it,
If a side effect on a scalar object is unsequenced relative to ... a value computation using the value of the same scalar object
does not apply here because of (1) which states that
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
In other words, the result is defined to "come later", i.e. it is sequenced.
When does the post-increment operator perform the increment? I have come across two opinions:
1) From http://gd.tuwien.ac.at/languages/c/programming-bbrown/c_015.htm:
POST means do the operation after any assignment operation.
2) Closer to home, an answer on SO (albeit on C++) says:
... that delays the increment until the end of the expression (next sequence point).
So does the post increment operation...
A) wait until a sequence point is reached or
B) happen post an assignment operator or
C) happen anytime before the sequence point?
The correct interpretation is C, i.e. the increment happens sometime before the next sequence point. Specifically, the C standard (C99, 6.5.2.4 ¶2) says this:
The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
Full paragraph quotation:
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
The post-increment operation always occurs before the next sequence point, irrespective of the expression in which the increment operator is used.
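A sketch showing that the increment is not simply delayed to the end of the full expression (opinion 2 above). It relies on the sequence point C99 places after the arguments are evaluated and before the actual call: by the time the called function runs, the post-increment in its argument expression has already been stored. The names counter, observe and post_increment_visible are hypothetical.

```c
/* Hypothetical global counter read from inside the called function. */
static int counter = 0;

/* Returns 1 if counter was already incremented when the call began. */
static int observe(int arg)
{
    return counter - arg;
}

int post_increment_visible(void)
{
    counter = 0;
    /* arg receives the old value 0; the store of 1 into counter is complete
       at the sequence point before the call, so observe() sees counter == 1. */
    return observe(counter++);
}
```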
See this link for more info http://en.wikipedia.org/wiki/Sequence_point