Defined behaviour for expressions - c

The C99 Standard says in $6.5.2.
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
(emphasis by me)
It goes on to note, that the following example is valid (which seems obvious at first)
a[i] = i;
While it does not explicitly state what a and i are.
Although I believe it does not, I'd like to know whether this example covers the following case:
int i = 0, *a = &i;
a[i] = i;
This will not change the value of i, but access the value of i to determine the address where to put the value. Or is it irrelevant that we assign a value to i which is already stored in i? Please shed some light.
Bonus question; What about a[i]++ or a[i] = 1?

The first sentence:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
is clear enough. The language doesn't impose an order of evaluation on subexpressions unless there's a sequence point between them, and rather than requiring some unspecified order of evaluation, it says that modifying an object twice produces undefined behavior. This allows aggressive optimization while still making it possible to write code that follows the rules.
The next sentence:
Furthermore, the prior value shall be read only to determine the value to be stored
does seem unintuitive at first (and second) glance; why should the purpose for which a value is read affect whether an expression has defined behavior?
But what it reflects is that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated. The C90 and C99 standards do not state this explicitly.
A clearer violation of that sentence, given in an example in the footnote, is:
a[i++] = i; /* undefined behavior */
Assuming that a is a declared array object and i is a declared integer object (no pointer or macro trickery), no object is modified more than once, so it doesn't violate the first sentence. But the evaluation of i++ on the LHS determines which object is to be modified, and the evaluation of i on the RHS determines the value to be stored in that object -- and the relative order of the read operation on the RHS and the write operation on the LHS is not defined. Again, the language could have required the subexpressions to be evaluated in some unspecified order, but instead it left the entire behavior undefined, to permit more aggressive optimization.
In your example:
int i = 0, *a = &i;
a[i] = i; /* undefined behavior (I think) */
the previous value of i is read both to determine the value to be stored and to determine which object it's going to be stored in. Since a[i] refers to i (but only because i==0), modifying the value of i would change the object to which the lvalue a[i] refers. It happens in this case that the value stored in i is the same as the value that was already stored there (0), but the standard doesn't make an exception for stores that happen to store the same value. I believe the behavior is undefined. (Of course the example in the standard wasn't intended to cover this case; it implicitly assumes that a is a declared array object unrelated to i.)
As for the example that the standard says is allowed:
int a[10], i = 0; /* implicit, not stated in standard */
a[i] = i;
one could interpret the standard to say that it's undefined. But I think that the second sentence, referring to "the prior value", applies only to the value of an object that's modified by the expression. i is never modified by the expression, so there's no conflict. The value of i is used both to determine the object to be modified by the assignment, and the value to be stored there, but that's ok, since the value of i itself never changes. The value of i isn't "the prior value", it's just the value.
The C11 standard has a new model for this kind of expression evaluation -- or rather, it expresses the same model in different words. Rather than "sequence points", it talks about side effects being sequenced before or after each other, or unsequenced relative to each other. It makes explicit the idea that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated.
In the N1570 draft, section 6.5 says:
1 An expression is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof. The value computations of the operands
of an operator are sequenced before the value computation of the
result of the operator.
2 If a side effect on a scalar object is unsequenced relative to
either a different side effect on the same scalar object or a
value computation using the value of the same scalar object, the
behavior is undefined. If there are multiple allowable orderings
of the subexpressions of an expression, the behavior is undefined
if such an unsequenced side effect occurs in any of the orderings.
3 The grouping of operators and operands is indicated by the syntax.
Except as specified later, side effects and value computations
of subexpressions are unsequenced.

Reading an object's value to determine where to store it does not count as "determining the value to be stored". This means that the only point of contention can be whether or not we are "modifying" the object i: if we are, it is undefined; if we are not, it is OK.
Does storing the value 0 into an object which already contains the value 0 count as "modifying the stored value"? By the plain English definition of "modify" I would have to say not; leaving something unchanged is the opposite of modifying it.
It is, however, clear that this would be undefined behaviour:
int i = 0, *a = &i;
a[i] = 1;
Here there can be no doubt that the stored value is read for a purpose other than determining the value to be stored (the value to be stored is a constant), and that the value of i is modified.

Related

Are a[i]=y++; and a[i++]=y; undefined behavior or unspecified in C language?

When I was looking for the expression v[i++]=i; why it is to define the behavior, I suddenly saw an explanation because the expression exists between two sequence points in the program, and the c standard stipulates that in the two sequence points The order of occurrence of the side effects is uncertain, so when the expression is run in the program, it is not sure whether the ++ operator is operated first or the = operator is operated first. I am puzzled by this. When the expression is evaluated In the process, shouldn't the priority be used to judge first, and then the sequence point should be introduced to judge which sub-expression is executed first? Am I missing something?
When user AnT stands with Russia explained it like this, does it mean that writing in the code such as a[i]=y++; or a[i++]=y; in the program can not be sure ++ operator and = operator can not determine who runs first.
The reason v[i++]=i; is undefined behavior is because the variable i is both read and written in the same expression without sequencing.
Expressions such as a[i]=y++ and a[i++]=y do not exhibit undefined behavior because no variable is both read and written in the expression without sequencing.
The = operator does however ensure that both of its operands are fully evaluated before the side effect of assigning to the left side. Specifically, a[i] is evaluated to be an lvalue designating the ith element of the array a, and y++ is evaluated to be the current value of y.
The specific rule in the C standard is C 2018 6.5 2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
The first sentence is the critical one here. First, consider v[i] = i++;. Here, the i in v[i] computes the value of i, and the i++ both computes the value of i and increments the stored value of i. Computing the value of i is a value computation of i. Incrementing the stored value of i is a side effect. To determine whether the behavior of v[i] = i++; is undefined, we ask whether the side effect is unsequenced relative to any other side effect on i or to a value computation on i.
There is no other side effect on i, so it is not unsequenced relative to any other side effect.
There is a value computation in i++, but the side effect and this value computation are sequenced by the specification of the postfix ++ operator. C 2018 6.5.2.4 2 says:
… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…
So we know the computation of the value of i in i++ is sequenced before the side effect of incrementing the stored value.
Now we consider the value computation of the i in v[i]. The ++ specification does not tell us about this, so let’s consider the assignment operator, =. The specification of assignment does say something about sequencing, in C 2018 6.5.16 3:
… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
The first sentence tells us the update of v[i] is sequenced after the value computations of the left and right operands. But it does not tell us anything about the side effect in ++ relative to the value computation of i in v[i].
Therefore, the value computation of i in v[i] is unsequenced relative to the side effect on i in i++, so the behavior of the statement is not defined by the C standard.
In a[i] = y++; we have:
A value computation on i in a[i].
A value computation on y in y++.
An update of the stored value of y in y++.
A value computation on a in a[i].
An update of the stored value of a[i] in a[i] = ….
The only object that is updated twice or that is both updated and evaluated is y, and we know from above that the value computation on y in y++ is sequenced before the update of y. So this statement does not contain any side effect that is unsequenced relative to another side effect or value cmputation on the same object. So its behavior is not undefined by the rule in C 2018 6.5 2.
Similarly, in a[i++] = y;, we have:
A value computation on i in a[i++].
An update of the stored value of i in i++.
A value computation on y.
A value computation on a in a[i].
An update of the stored value of a[i] in a[i++] = ….
Again, there is only one object with two operations on it, and those operations are sequenced. The behavior is not undefined by the rule in C 2018 6.5 2.
Note
In the above, we assume neither a nor v is a pointer such that a[i] or v[i] would be i or y. If instead we consider this code:
int y = 3;
int *a = &y;
int i = 0;
a[i] = y++;
Then the behavior is undefined because a[i] is y, so the code updates y twice, once for the assignment a[i] = … and once for y++, and these updates are unsequenced. The specification of assignment says the update to the left operand is sequenced after the value computation of the result (which is the value of the right side of the assignment), but the increment for ++ is a side effect, not part of the value computation. So the two updates are unsequenced, and the behavior is not defined by the C standard.
An attempt to explain the "standardese" terms plainly:
The standard says (C17 6.5) that in an expression, a side effect of a variable may not occur in an unsequenced order in relation to a value computation of that same object.
To make sense of these strange terms:
Side effect = writing to a variable or perform a read or write access to a volatile variable.
Value computation = reading the value from memory.
Unsequenced = The order between accesses/evaulations is not specified nor well-defined. C has the concept of sequence points, which are certain points in the program that when reached, previous side effects must have been evaluated. For example, a ; introduces a sequence point. Two parts of an expression are unsequenced in relation to each other when the order of evaluation of each part is not well-defined before the next sequence point. (A complete list of all sequence points can be found in C17 Annex C.)
So when translated from standardese to English, v[i++]=i; has undefined behavior since i is written to in an unspecified order related to the other read of i in the same expression. How do we know that?
The assignment operator = says that (6.5.16) "the evaluations of the operands are unsequenced", refering to the left and right operands of =.
The postfix ++ operator says that (6.5.2.4) "As a side effect, the
value of the operand object is incremented" and "The value computation of the result is sequenced before the side effect of updating the stored value of the operand". In practice meaning that i is first read and the ++ is applied later, though before the next sequence point, in this case the ;.
In case of a[i]=y++; or a[i++]=y; everything happens on different variables. There are two side effects, updating i (or y) and updating a[i] but they are done on different objects, so both examples are well-defined.
The C standard (C11 draft) says the following about the postfix ++ operator:
(6.5.2.4.2) The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]
A sequence point is defined by a point in the code where it is guaranteed that all side effects before the point have taken effect and no side effects after the point have taken effect.
There is no intermediate sequence points in the expression v[i++] = i;. Thus it is not defined whether the side effect of the expression i++ (incrementing i) takes effect before or after the right-hand side i is evaluated. Thus it is the value of the right-hand side i which is not defined in this expression.
This problem does not exist in the expression a[i++] = y; because the value of the right-hand side y is not affected by the side effect of i++.
When the expression is evaluated In the process
Which expression?
v[i++]=i;
is a statement. It consists of a toplevel assignment expression a = b, where a and b are both themselves expressions.
The left-hand expression a is itself of the form c[d], where d is another subexpression of the form d ++ and d is yet another expression, finally resolved to i.
If it helps we can write the whole thing out in pseudo-function-call style, like
assign(array_index(v, increment_and_return_old_value(i)), i);
Now, the problem is that the standard doesn't tell us whether the final value parameter i is obtained before or after i is mutated by increment_and_return_old_value(i) (or by i++).
... and then the sequence point should be introduced to judge which sub-expression is executed first?
The , in a function call parameter list isn't a sequence point. The relative order in which function parameters are evaluated is not defined (only that they must all have been evaluated before the function body is entered).
The same logic applies to the original code - the standard says there is no sequence point, so there is no sequence point.
does it mean that writing in the code such as a[i]=y++; or a[i++]=y; in the program can not be sure ++ operator and = operator can not determine who runs first.
It's not the assignment that is the problem, it is evaluating the right-hand operand to be assigned.
And, in these cases, there is no relationship between left-hand side thing being assigned to and the right-hand side value being assigned. So although we still cannot be sure which is evaluated first, it doesn't matter.
If I wrote out explicitly
int *lhs = &a[i];
int rhs = y++;
*lhs = rhs;
then reversing the first two lines would make no difference. Their relative order doesn't matter, so the lack of a defined relative order doesn't matter.
Conversely, for completeness,
int *lhs = v[i++];
int rhs = i;
*lhs = rhs;
is the original case where the order of the first two lines does matter, and the fact that it is unspecified is a problem.

Is this gcc "sequence point" UB warning valid in C?

#include <stdio.h>
int main(void)
{
char buf[100], *p = buf;
p = buf + sprintf(p=buf, "%d", 123);
return 0;
}
Using gcc 9.3.0 or 12.1.0 with -std=c17 (or c11/c99/c89) I get:
warning: operation on ‘p’ may be undefined [-Wsequence-point]
pointing to the = in p = buf +....
I see p is being assigned to twice in one expression, but there is a sequence point after the assignment in the the function call. Can this be undefined behavior?
The code was originally p += sprintf(p=buf, "%d", 123);. That got the same warning, pointing to +=. Is this the same situation? Is this UB?
The issue we are concerned about is whether p is both modified twice without sequencing, per C 2018 6.5 2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…
As explained below, GCC is wrong to complain there may be undefined behavior due to this, subject to ensuring that sprintf is a function call. Per 7.1.4 1, any function declared in a standard library header may additionally be implemented as a function-like macro. This means that sprintf may be macro-replaced in p = buf + sprintf(p=buf, "%d", 123);, yielding some expression in which there is no sequencing between p=buf and larger p = buf +…. However, GCC complains even when this possibility is suppressed by enclosing sprintf in parentheses, using p = buf + (sprintf)(p=buf, "%d", 123);.
The two accesses of p in question are its updates performed as side effects of the assignments. An assignment evaluates its left operand for its lvalue, evaluates its right operand, and, as a side effect, updates the object referred to by its left operand. The first two evaluations are unsequenced, but evaluating the left operand for its lvalue is neither a side effect on the referenced object nor a value computation using its value, so we are unconcerned with it. (Evaluating p for its lvalue is trivial; p refers to the object named “p”. Evaluating an expression for its lvalue is more complicated in expressions such as q[3+f(x)], which must evaluate the subscript expression and, if q is a pointer rather than an array, retrieve its value. Then a final address is calculated, which serves for the lvalue of the object being assigned to.)
For the assignment, C 2018 6.5.16 3 says:
… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands…
This means the update of p in the larger assignment is sequenced after the value computations of the operands, but it does not tell us the update is sequenced after the update in p=buf.
However, p=buf appears in the arguments of sprintf. C 2018 6.5.2.2 10 tells us:
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call…
Per 5.1.2.3 3, “… The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B…
There is a problem here. We can easily take A as p=buf, and there is a sequence point between this A and, say, the call to sprintf, and therefore the side effect associated with A is sequenced before the call to sprintf. However, what do we use for B to ensure that the side effect of the larger assignment p = buf + … is after the sequence point? We cannot take B to be this larger assignment itself because then the sequence point would not be between A and B; it would be after A but somewhere inside B.
My interpretation is this is a defect in the wording of the standard. If we need to literally identify some entirely separate expression B in order to apply this rule about sequence points, then the presence of a sequence point before the function call loses much of its effect. It would not make much sense to say there is a sequence point between the evaluations of the arguments and literally the “actual call” since the “actual call” is not an expression. So the wording about sequence points between expressions must be intended to apply to parts of expressions or things done in the course of evaluating expressions.
If sprintf were an ordinary routine instead of a library routine, this would not be a problem. In an ordinary routine, there are expressions inside the function, and the sequence point means our A (p=buf) is sequenced before those expressions, and those expressions are in turn sequenced before the update of p for the larger p = buf + … because they are part of the right operand of =, whose evaluation is sequenced before the side effect of updating p. However, C library routines are specified “holistically,” as special routines that perform stated effects, not as ordinary C code. Nonetheless, my interpretation is that the argument p=buf is intended to be completed before the function call and as part of the right operand of the larger p = buf +… and hence its side effect is sequenced before the side effect in that larger assignment.

Does a[a[0]] = 1 produce undefined behavior?

Does this C99 code produce undefined behavior?
#include <stdio.h>
int main() {
int a[3] = {0, 0, 0};
a[a[0]] = 1;
printf("a[0] = %d\n", a[0]);
return 0;
}
In the statement a[a[0]] = 1; , a[0] is both read and modified.
I looked n1124 draft of ISO/IEC 9899. It says (in 6.5 Expressions):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
However, I feel it strange. Does this actually produce undefined behavior?
(I also want to know about this problem in other ISO C versions.)
the prior value shall be read only to determine the value to be stored.
This is a bit vague and caused confusion, which is partly why C11 threw it out and introduced a new sequencing model.
What it is trying to say is that: if reading the old value is guaranteed to occur earlier in time than writing the new value, then that's fine. Otherwise it is UB. And of course it is a requirement that the new value be computed before it is written.
(Of course the description I have just written will be found by some to be more vague than the Standard text!)
For example x = x + 5 is correct because it is not possible to work out x + 5 without first knowing x. However a[i] = i++ is wrong because the read of i on the left hand side is not required in order to work out the new value to store in i. (The two reads of i are considered separately).
Back to your code now. I think it is well-defined behaviour because the read of a[0] in order to determine the array index is guaranteed to occur before the write.
We cannot write until we have determined where to write. And we do not know where to write until after we read a[0]. Therefore the read must come before the write, so there is no UB.
Someone commented about sequence points. In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.
Does this C99 code produce undefined behavior?
No. It will not produce undefined behavior. a[0] is modified only once between two sequence points (first sequence point is at the end of initializer int a[3] = {0, 0, 0}; and second is after the full expression a[a[0]] = 1).
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
An object can be read more than once to modify itself and its a perfectly defined behavior. Look at this example
int x = 10;
x = x*x + 2*x + x%5;
Second statement of the quote says:
Furthermore, the prior value shall be read only to determine the value to be stored.
All the x in the above expression is read to determine the value of object x itself.
NOTE: Note that there are two parts of the quote mentioned in the question. First part says: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression., and
therefore the expression like
i = i++;
comes under UB (Two modifications between previous and next sequence points).
Second part says: Furthermore, the prior value shall be read only to determine the value to be stored., and therefore the expressions like
a[i++] = i;
j = (i = 2) + i;
invoke UB. In both expressions i is modified only once between previous and next sequence points, but the reading of the rightmost i do not determine the value to be stored in i.
In C11 standard this has been changed to
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
In expression a[a[0]] = 1, there is only one side effect to a[0] and the value computation of index a[0] is sequenced before the value computation of a[a[0]].
C99 presents an enumeration of all the sequence points in annex C. There is one at the end of
a[a[0]] = 1;
because it is a complete expression statement, but there are no sequence points inside. Although logic dictates that the subexpression a[0] must be evaluated first, and the result used to determine to which array element the value is assigned, the sequencing rules do not ensure it. When the initial value of a[0] is 0, a[0] is both read and written between two sequence points, and the read is not for the purpose of determining what value to write. Per C99 6.5/2, the behavior of evaluating the expression is therefore undefined, but in practice I don't think you need to worry about it.
C11 is better in this regard. Section 6.5, paragraph (1) says
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
Note in particular the second sentence, which has no analogue in C99. You might think that would be sufficient, but it isn't. It applies to the value computations, but it says nothing about the sequencing of side effects relative to the value computations. Updating the value of the left operand is a side effect, so that extra sentence does not directly apply.
C11 nevertheless comes through for us on this one, as the specifications for the assignment operators provide the needed sequencing (C11 6.5.16(3)):
[...] The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(In contrast, C99 just says that updating the stored value of the left operand happens between the previous and next sequence points.) With sections 6.5 and 6.5.16 together, then, C11 gives a well-defined sequence: the inner [] is evaluated before the outer [], which is evaluated before the stored value is updated. This satisfies C11's version of 6.5(2), so in C11, the behavior of evaluating the expression is defined.
The value is well defined, unless a[0] contains a value that is not a valid array index (i.e. in your code is not negative and does not exceed 3). You could change the code to the more readable and equivalent
index = a[0];
a[index] = 1; /* still UB if index < 0 || index >= 3 */
In the expression a[a[0]] = 1 it is necessary to evaluate a[0] first. If a[0] happens to be zero, then a[0] will be modified. But there is no way for a compiler (short of not complying with the standard) to change order of evaluations and modify a[0] before attempting to read its value.
A side effect includes modification of an object1.
The C standard says that behavior is undefined if a side effect on object is unsequenced with a side effect on the same object or a value computation using the value of the same object2.
The object a[0] in this expression is modified (side effect) and it's value (value computation) is used to determine the index. It would seem this expression yields undefined behavior:
a[a[0]] = 1
However the text in assignment operators in the standard, explains that the value computation of both left and right operands of the operator =, is sequenced before the left operand is modified3.
The behavior is thus defined, as the first rule1 isn't violated, because the modification (side effect) is sequenced after the value computation of the same object.
1 (Quoted from ISO/IEC 9899:201x 5.1.2.3 Program Exectution 2):
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment.
2 (Quoted from ISO/IEC 9899:201x 6.5 Expressions 2):
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
3 (Quoted from ISO/IEC 9899:201x 6.5.16 Assignment operators 3):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.

What makes C standard so difficult to determine the sequence point? [duplicate]

What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow up to my previous answer and contains C++11 related material..
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]: // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. Unspecified order of elaboration is merely selection of one of several possible serial orderings, this is quite different to before and after orderings, because if there is no specified ordering, concurrent evaluation is possible: not so with the old rules. For example in:
f (a,b)
previously either a then b, or, b then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99(ISO/IEC 9899:TC3) which seems absent from this discussion thus far the following steteents are made regarding order of evaluaiton.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)

I can not understand some sentences in C99

In C99 6.5 says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored
What does "Furthermore, the prior value shall be read only to determine the value to be stored" mean? In C99, why a[i++] = 1 is undefined behavior?
a[i++] = 1 is defined (unless it has other reasons to be undefined than the sequencing of side-effects: out of bound access, or uninitialized i).
You mean a[i++] = i, which is undefined behavior because it reads i between the same sequence points as i++, which change it.
The “Furthermore, the prior value shall be read only to determine the value to be stored” part means that i = i + 1; is allowed, although it reads from i and modifies i.
On the other hand, a[i] = (i=1); isn't allowed, because despite writing to i only once, the read from i is not for computing the value being stored.
The "prior value shall be read only to determine the value to be stored" wording is admittedly counterintuitive; why should the purpose for which a value is read matter?
The point of that sentence is to impose a requirement for which results depend on which operations.
I'll steal examples from Pascal's answer.
This:
i = i + 1;
is perfectly fine. i is read and written in the same expression, with no intervening sequence point, but it's ok because the write cannot occur until after the read has completed. The value to be stored cannot be computed until the expression i + 1, and its subexpression i, have been completely evaluated. (And i + 1 has no side effects that might be delayed until after the write.) That dependency imposes a strict ordering: the read must be completed before the write can begin.
On the other hand, this:
a[i] = (i=1);
has undefined behavior. The subexpression a[i] reads the value of i, and the subexpression i=1 writes the value of i. But the value to be stored in i by the write does not depend on the evaluation that reads i on the left hand side, and so the ordering of the read and the write are not defined. The "value to be stored" is 1; the read of i in a[i] does not determine that value.
I suspect this confusion is why the 2011 revision of the ISO C standard (available in draft form as N1570) re-worded that section. The standard still has the concept of sequence points, but 6.5p2 now says:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings.
And paragraph 1 states explicitly what was only implicitly assumed in C99:
The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator.
Section 5.1.2.3 paragraph 2 explains the sequenced before and sequenced after relationships.

Resources