Take the following:
int a(void) {
puts("a");
return 0;
}
int b(void) {
puts("b");
return 1;
}
int c(void) {
puts("c");
return 2;
}
int d(void) {
puts("d");
return 3;
}
Will the following have predictable behavior?
int arr[4][4][4][4];
arr[a()][b()][c()][d()] = 1;
Is it guaranteed to print in this order:
a
b
c
d
I am aware that constructs such as the following are invalid:
int i;
i = i++;
This is because = is an unsequenced operator, so whether i or i++ is evaluated first is undefined. It is undefined behavior to access and modify a single object before another sequence point.
Put another way, is the following valid:
int i = 0, arr[4][4][4][4];
arr[i++][i++][i++][i++] = 1;
Or does it invoke undefined behavior due to unsequenced modification and access to i?
According to the C standard, is there a defined sequence point between each successive [] while indexing a multidimensional array?
To be clear, neither of these examples have to do with precedence, the placement of implicit parenthesis, the order in which the operators operate on their operands. The question is about sequencing, the order in which the operands themselves are evaluated.
Will the following have predictable behavior?
int arr[4][4][4][4];
arr[a()][b()][c()][d()] = 1;
No.
While the evaluation of the array elements will be evaluated from left to right, as one is an operand to the next, there is no guarantee that the array indexes themselves will be evaluated from left to right.
To be more specific, arr[a()] is evaluated before arr[a()][b()], which is evaluated before arr[a()][b()][c()], which is evaluated before arr[a()][b()][c()][d()]. However, a(), b(), c(), and d() may be evaluated in any order.
Section 6.5p3 of the C standard regarding expressions states:
The grouping of operators and operands is indicated by the syntax.
Except as specified later, side effects and value computations of
subexpressions are unsequenced
Sections 6.5.2.1 regarding array subscripting makes no mention of sequencing of the operands E1[E2], although it does state that the prior expression is exactly equivalent to (*((E1)+(E2))). Then, looking at section 6.5.3.2 regarding the indirection operator * and section 6.5.6 regarding the additive operator +, neither make any mention of the evaluation of their operands being sequenced in any way. So 6.5p3 applies, and the functions a, b, c, and d may be called in any order.
For the same reasons, this:
arr[i++][i++][i++][i++] = 1;
Triggers undefined behavior since the evaluation of the array indices are unsequenced with relation to each other, and you have multiple side effects on the same object without a sequence point.
The indices of a multidimensional array access are not guaranteed to evaluate in any particular order. A demonstration (courtesy of yano) shows the functions being called from left to right, but selecting a different compiler in Godbolt evaluates them right to left.
Upon further investigation, the second code example causes a warning with Clang:
warning: multiple unsequenced modifications
to 'i' [-Wunsequenced]
arr[i++][i++][i++][i++] = 1;
^ ~~
The multidimensional array access can be broken down into a series of (array)[index] where array is a higher dimension of the multidimensional array (arr itself is the highest dimension) and index is the index expression, such as a function call or i++ expression.
The Standard holds that lhs[rhs] is equivalent to *((lhs)+(rhs)), hence it is indeterminate whether any given index, or the array it is indexing, is evaluated first, since the + operator is unsequenced. In all cases where array is not arr itself, evaluating array involves evaluating its index in an even higher dimension.
Therefore, the order in which the indices of a multidimensional array access are evaluated is indeterminate.
Given the constructs
int x; // At file scope
... and then within some function
arr[f()][x] += 1;
arr[x][f()] += 2;
it would not be unusual for a compiler to process each line by performing the call to f() prior to the read of x, regardless of whether the function call was the first or second index. While the actual addition of the inner index to the sub-array pointer may not be possible until after the sub-array address is computed, but a compiler may compute any or all of the array subscripts before starting work on address calculations.
Related
When I was looking for the expression v[i++]=i; why it is to define the behavior, I suddenly saw an explanation because the expression exists between two sequence points in the program, and the c standard stipulates that in the two sequence points The order of occurrence of the side effects is uncertain, so when the expression is run in the program, it is not sure whether the ++ operator is operated first or the = operator is operated first. I am puzzled by this. When the expression is evaluated In the process, shouldn't the priority be used to judge first, and then the sequence point should be introduced to judge which sub-expression is executed first? Am I missing something?
When user AnT stands with Russia explained it like this, does it mean that writing in the code such as a[i]=y++; or a[i++]=y; in the program can not be sure ++ operator and = operator can not determine who runs first.
The reason v[i++]=i; is undefined behavior is because the variable i is both read and written in the same expression without sequencing.
Expressions such as a[i]=y++ and a[i++]=y do not exhibit undefined behavior because no variable is both read and written in the expression without sequencing.
The = operator does however ensure that both of its operands are fully evaluated before the side effect of assigning to the left side. Specifically, a[i] is evaluated to be an lvalue designating the ith element of the array a, and y++ is evaluated to be the current value of y.
The specific rule in the C standard is C 2018 6.5 2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
The first sentence is the critical one here. First, consider v[i] = i++;. Here, the i in v[i] computes the value of i, and the i++ both computes the value of i and increments the stored value of i. Computing the value of i is a value computation of i. Incrementing the stored value of i is a side effect. To determine whether the behavior of v[i] = i++; is undefined, we ask whether the side effect is unsequenced relative to any other side effect on i or to a value computation on i.
There is no other side effect on i, so it is not unsequenced relative to any other side effect.
There is a value computation in i++, but the side effect and this value computation are sequenced by the specification of the postfix ++ operator. C 2018 6.5.2.4 2 says:
… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…
So we know the computation of the value of i in i++ is sequenced before the side effect of incrementing the stored value.
Now we consider the value computation of the i in v[i]. The ++ specification does not tell us about this, so let’s consider the assignment operator, =. The specification of assignment does say something about sequencing, in C 2018 6.5.16 3:
… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
The first sentence tells us the update of v[i] is sequenced after the value computations of the left and right operands. But it does not tell us anything about the side effect in ++ relative to the value computation of i in v[i].
Therefore, the value computation of i in v[i] is unsequenced relative to the side effect on i in i++, so the behavior of the statement is not defined by the C standard.
In a[i] = y++; we have:
A value computation on i in a[i].
A value computation on y in y++.
An update of the stored value of y in y++.
A value computation on a in a[i].
An update of the stored value of a[i] in a[i] = ….
The only object that is updated twice or that is both updated and evaluated is y, and we know from above that the value computation on y in y++ is sequenced before the update of y. So this statement does not contain any side effect that is unsequenced relative to another side effect or value cmputation on the same object. So its behavior is not undefined by the rule in C 2018 6.5 2.
Similarly, in a[i++] = y;, we have:
A value computation on i in a[i++].
An update of the stored value of i in i++.
A value computation on y.
A value computation on a in a[i].
An update of the stored value of a[i] in a[i++] = ….
Again, there is only one object with two operations on it, and those operations are sequenced. The behavior is not undefined by the rule in C 2018 6.5 2.
Note
In the above, we assume neither a nor v is a pointer such that a[i] or v[i] would be i or y. If instead we consider this code:
int y = 3;
int *a = &y;
int i = 0;
a[i] = y++;
Then the behavior is undefined because a[i] is y, so the code updates y twice, once for the assignment a[i] = … and once for y++, and these updates are unsequenced. The specification of assignment says the update to the left operand is sequenced after the value computation of the result (which is the value of the right side of the assignment), but the increment for ++ is a side effect, not part of the value computation. So the two updates are unsequenced, and the behavior is not defined by the C standard.
An attempt to explain the "standardese" terms plainly:
The standard says (C17 6.5) that in an expression, a side effect of a variable may not occur in an unsequenced order in relation to a value computation of that same object.
To make sense of these strange terms:
Side effect = writing to a variable or perform a read or write access to a volatile variable.
Value computation = reading the value from memory.
Unsequenced = The order between accesses/evaulations is not specified nor well-defined. C has the concept of sequence points, which are certain points in the program that when reached, previous side effects must have been evaluated. For example, a ; introduces a sequence point. Two parts of an expression are unsequenced in relation to each other when the order of evaluation of each part is not well-defined before the next sequence point. (A complete list of all sequence points can be found in C17 Annex C.)
So when translated from standardese to English, v[i++]=i; has undefined behavior since i is written to in an unspecified order related to the other read of i in the same expression. How do we know that?
The assignment operator = says that (6.5.16) "the evaluations of the operands are unsequenced", refering to the left and right operands of =.
The postfix ++ operator says that (6.5.2.4) "As a side effect, the
value of the operand object is incremented" and "The value computation of the result is sequenced before the side effect of updating the stored value of the operand". In practice meaning that i is first read and the ++ is applied later, though before the next sequence point, in this case the ;.
In case of a[i]=y++; or a[i++]=y; everything happens on different variables. There are two side effects, updating i (or y) and updating a[i] but they are done on different objects, so both examples are well-defined.
The C standard (C11 draft) says the following about the postfix ++ operator:
(6.5.2.4.2) The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]
A sequence point is defined by a point in the code where it is guaranteed that all side effects before the point have taken effect and no side effects after the point have taken effect.
There is no intermediate sequence points in the expression v[i++] = i;. Thus it is not defined whether the side effect of the expression i++ (incrementing i) takes effect before or after the right-hand side i is evaluated. Thus it is the value of the right-hand side i which is not defined in this expression.
This problem does not exist in the expression a[i++] = y; because the value of the right-hand side y is not affected by the side effect of i++.
When the expression is evaluated In the process
Which expression?
v[i++]=i;
is a statement. It consists of a toplevel assignment expression a = b, where a and b are both themselves expressions.
The left-hand expression a is itself of the form c[d], where d is another subexpression of the form d ++ and d is yet another expression, finally resolved to i.
If it helps we can write the whole thing out in pseudo-function-call style, like
assign(array_index(v, increment_and_return_old_value(i)), i);
Now, the problem is that the standard doesn't tell us whether the final value parameter i is obtained before or after i is mutated by increment_and_return_old_value(i) (or by i++).
... and then the sequence point should be introduced to judge which sub-expression is executed first?
The , in a function call parameter list isn't a sequence point. The relative order in which function parameters are evaluated is not defined (only that they must all have been evaluated before the function body is entered).
The same logic applies to the original code - the standard says there is no sequence point, so there is no sequence point.
does it mean that writing in the code such as a[i]=y++; or a[i++]=y; in the program can not be sure ++ operator and = operator can not determine who runs first.
It's not the assignment that is the problem, it is evaluating the right-hand operand to be assigned.
And, in these cases, there is no relationship between left-hand side thing being assigned to and the right-hand side value being assigned. So although we still cannot be sure which is evaluated first, it doesn't matter.
If I wrote out explicitly
int *lhs = &a[i];
int rhs = y++;
*lhs = rhs;
then reversing the first two lines would make no difference. Their relative order doesn't matter, so the lack of a defined relative order doesn't matter.
Conversely, for completeness,
int *lhs = v[i++];
int rhs = i;
*lhs = rhs;
is the original case where the order of the first two lines does matter, and the fact that it is unspecified is a problem.
Does this C99 code produce undefined behavior?
#include <stdio.h>
int main() {
int a[3] = {0, 0, 0};
a[a[0]] = 1;
printf("a[0] = %d\n", a[0]);
return 0;
}
In the statement a[a[0]] = 1; , a[0] is both read and modified.
I looked n1124 draft of ISO/IEC 9899. It says (in 6.5 Expressions):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
However, I feel it strange. Does this actually produce undefined behavior?
(I also want to know about this problem in other ISO C versions.)
the prior value shall be read only to determine the value to be stored.
This is a bit vague and caused confusion, which is partly why C11 threw it out and introduced a new sequencing model.
What it is trying to say is that: if reading the old value is guaranteed to occur earlier in time than writing the new value, then that's fine. Otherwise it is UB. And of course it is a requirement that the new value be computed before it is written.
(Of course the description I have just written will be found by some to be more vague than the Standard text!)
For example x = x + 5 is correct because it is not possible to work out x + 5 without first knowing x. However a[i] = i++ is wrong because the read of i on the left hand side is not required in order to work out the new value to store in i. (The two reads of i are considered separately).
Back to your code now. I think it is well-defined behaviour because the read of a[0] in order to determine the array index is guaranteed to occur before the write.
We cannot write until we have determined where to write. And we do not know where to write until after we read a[0]. Therefore the read must come before the write, so there is no UB.
Someone commented about sequence points. In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.
Does this C99 code produce undefined behavior?
No. It will not produce undefined behavior. a[0] is modified only once between two sequence points (first sequence point is at the end of initializer int a[3] = {0, 0, 0}; and second is after the full expression a[a[0]] = 1).
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
An object can be read more than once to modify itself and its a perfectly defined behavior. Look at this example
int x = 10;
x = x*x + 2*x + x%5;
Second statement of the quote says:
Furthermore, the prior value shall be read only to determine the value to be stored.
All the x in the above expression is read to determine the value of object x itself.
NOTE: Note that there are two parts of the quote mentioned in the question. First part says: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression., and
therefore the expression like
i = i++;
comes under UB (Two modifications between previous and next sequence points).
Second part says: Furthermore, the prior value shall be read only to determine the value to be stored., and therefore the expressions like
a[i++] = i;
j = (i = 2) + i;
invoke UB. In both expressions i is modified only once between previous and next sequence points, but the reading of the rightmost i do not determine the value to be stored in i.
In C11 standard this has been changed to
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
In expression a[a[0]] = 1, there is only one side effect to a[0] and the value computation of index a[0] is sequenced before the value computation of a[a[0]].
C99 presents an enumeration of all the sequence points in annex C. There is one at the end of
a[a[0]] = 1;
because it is a complete expression statement, but there are no sequence points inside. Although logic dictates that the subexpression a[0] must be evaluated first, and the result used to determine to which array element the value is assigned, the sequencing rules do not ensure it. When the initial value of a[0] is 0, a[0] is both read and written between two sequence points, and the read is not for the purpose of determining what value to write. Per C99 6.5/2, the behavior of evaluating the expression is therefore undefined, but in practice I don't think you need to worry about it.
C11 is better in this regard. Section 6.5, paragraph (1) says
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
Note in particular the second sentence, which has no analogue in C99. You might think that would be sufficient, but it isn't. It applies to the value computations, but it says nothing about the sequencing of side effects relative to the value computations. Updating the value of the left operand is a side effect, so that extra sentence does not directly apply.
C11 nevertheless comes through for us on this one, as the specifications for the assignment operators provide the needed sequencing (C11 6.5.16(3)):
[...] The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(In contrast, C99 just says that updating the stored value of the left operand happens between the previous and next sequence points.) With sections 6.5 and 6.5.16 together, then, C11 gives a well-defined sequence: the inner [] is evaluated before the outer [], which is evaluated before the stored value is updated. This satisfies C11's version of 6.5(2), so in C11, the behavior of evaluating the expression is defined.
The value is well defined, unless a[0] contains a value that is not a valid array index (i.e. in your code is not negative and does not exceed 3). You could change the code to the more readable and equivalent
index = a[0];
a[index] = 1; /* still UB if index < 0 || index >= 3 */
In the expression a[a[0]] = 1 it is necessary to evaluate a[0] first. If a[0] happens to be zero, then a[0] will be modified. But there is no way for a compiler (short of not complying with the standard) to change order of evaluations and modify a[0] before attempting to read its value.
A side effect includes modification of an object1.
The C standard says that behavior is undefined if a side effect on object is unsequenced with a side effect on the same object or a value computation using the value of the same object2.
The object a[0] in this expression is modified (side effect) and it's value (value computation) is used to determine the index. It would seem this expression yields undefined behavior:
a[a[0]] = 1
However the text in assignment operators in the standard, explains that the value computation of both left and right operands of the operator =, is sequenced before the left operand is modified3.
The behavior is thus defined, as the first rule1 isn't violated, because the modification (side effect) is sequenced after the value computation of the same object.
1 (Quoted from ISO/IEC 9899:201x 5.1.2.3 Program Exectution 2):
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment.
2 (Quoted from ISO/IEC 9899:201x 6.5 Expressions 2):
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
3 (Quoted from ISO/IEC 9899:201x 6.5.16 Assignment operators 3):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.
What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow up to my previous answer and contains C++11 related material..
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]: // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. Unspecified order of elaboration is merely selection of one of several possible serial orderings, this is quite different to before and after orderings, because if there is no specified ordering, concurrent evaluation is possible: not so with the old rules. For example in:
f (a,b)
previously either a then b, or, b then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99(ISO/IEC 9899:TC3) which seems absent from this discussion thus far the following steteents are made regarding order of evaluaiton.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)
What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow up to my previous answer and contains C++11 related material..
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]: // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. Unspecified order of elaboration is merely selection of one of several possible serial orderings, this is quite different to before and after orderings, because if there is no specified ordering, concurrent evaluation is possible: not so with the old rules. For example in:
f (a,b)
previously either a then b, or, b then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99(ISO/IEC 9899:TC3) which seems absent from this discussion thus far the following steteents are made regarding order of evaluaiton.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)
The C99 Standard says in $6.5.2.
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
(emphasis by me)
It goes on to note, that the following example is valid (which seems obvious at first)
a[i] = i;
While it does not explicitly state what a and i are.
Although I believe it does not, I'd like to know whether this example covers the following case:
int i = 0, *a = &i;
a[i] = i;
This will not change the value of i, but access the value of i to determine the address where to put the value. Or is it irrelevant that we assign a value to i which is already stored in i? Please shed some light.
Bonus question; What about a[i]++ or a[i] = 1?
The first sentence:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
is clear enough. The language doesn't impose an order of evaluation on subexpressions unless there's a sequence point between them, and rather than requiring some unspecified order of evaluation, it says that modifying an object twice produces undefined behavior. This allows aggressive optimization while still making it possible to write code that follows the rules.
The next sentence:
Furthermore, the prior value shall be read only to determine the value to be stored
does seem unintuitive at first (and second) glance; why should the purpose for which a value is read affect whether an expression has defined behavior?
But what it reflects is that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated. The C90 and C99 standards do not state this explicitly.
A clearer violation of that sentence, given in an example in the footnote, is:
a[i++] = i; /* undefined behavior */
Assuming that a is a declared array object and i is a declared integer object (no pointer or macro trickery), no object is modified more than once, so it doesn't violate the first sentence. But the evaluation of i++ on the LHS determines which object is to be modified, and the evaluation of i on the RHS determines the value to be stored in that object -- and the relative order of the read operation on the RHS and the write operation on the LHS is not defined. Again, the language could have required the subexpressions to be evaluated in some unspecified order, but instead it left the entire behavior undefined, to permit more aggressive optimization.
In your example:
int i = 0, *a = &i;
a[i] = i; /* undefined behavior (I think) */
the previous value of i is read both to determine the value to be stored and to determine which object it's going to be stored in. Since a[i] refers to i (but only because i==0), modifying the value of i would change the object to which the lvalue a[i] refers. It happens in this case that the value stored in i is the same as the value that was already stored there (0), but the standard doesn't make an exception for stores that happen to store the same value. I believe the behavior is undefined. (Of course the example in the standard wasn't intended to cover this case; it implicitly assumes that a is a declared array object unrelated to i.)
As for the example that the standard says is allowed:
int a[10], i = 0; /* implicit, not stated in standard */
a[i] = i;
one could interpret the standard to say that it's undefined. But I think that the second sentence, referring to "the prior value", applies only to the value of an object that's modified by the expression. i is never modified by the expression, so there's no conflict. The value of i is used both to determine the object to be modified by the assignment, and the value to be stored there, but that's ok, since the value of i itself never changes. The value of i isn't "the prior value", it's just the value.
The C11 standard has a new model for this kind of expression evaluation -- or rather, it expresses the same model in different words. Rather than "sequence points", it talks about side effects being sequenced before or after each other, or unsequenced relative to each other. It makes explicit the idea that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated.
In the N1570 draft, section 6.5 says:
1 An expression is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof. The value computations of the operands
of an operator are sequenced before the value computation of the
result of the operator.
2 If a side effect on a scalar object is unsequenced relative to
either a different side effect on the same scalar object or a
value computation using the value of the same scalar object, the
behavior is undefined. If there are multiple allowable orderings
of the subexpressions of an expression, the behavior is undefined
if such an unsequenced side effect occurs in any of the orderings.
3 The grouping of operators and operands is indicated by the syntax.
Except as specified later, side effects and value computations
of subexpressions are unsequenced.
Reading an object's value to determine where to store it does not count as "determining the value to be stored". This means that the only point of contention can be whether or not we are "modifying" the object i: if we are, it is undefined; if we are not, it is OK.
Does storing the value 0 into an object which already contains the value 0 count as "modifying the stored value"? By the plain English definition of "modify" I would have to say not; leaving something unchanged is the opposite of modifying it.
It is, however, clear that this would be undefined behaviour:
int i = 0, *a = &i;
a[i] = 1;
Here there can be no doubt that the stored value is read for a purpose other than determining the value to be stored (the value to be stored is a constant), and that the value of i is modified.