Does x = i++ in C have a defined behavior and why? - c

According to the FAQ, i = i++ is undefined behaviour in C because the statement has only one sequence point (the full expression), and between sequence points i is changed twice (once by the side effect of i++ and once by =), so the behaviour is undefined.
Question 1: do I understand that correctly, or have I misunderstood why i = i++ is undefined?
The second question is: is x = i++ a valid expression?
I guess it is valid and the value of x will always be the original value of i. Although there is only one sequence point in this statement, both x and i are modified only once, and i++ has higher precedence, which means it should be valid and i++ will always be done before the assignment, making x equal to the original value of i. Is that correct?

Yes, at least as of C++03. I believe C++11 changes this somewhat, but I can't get my hands on a copy of that standard to check.
Because you modify x once, and i once. There's no multiple writes to a single variable without an intervening sequence point.
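For illustration, here is a minimal sketch of that well-defined case (initial values chosen arbitrarily): x and i are each written exactly once, so the result is predictable.
#include <stdio.h>

int main(void)
{
    int i = 5;
    int x;

    x = i++;   /* x and i are each modified once; the value of i++ is the old i */

    printf("x = %d, i = %d\n", x, i);   /* always prints: x = 5, i = 6 */
    return 0;
}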

Related

Does the statement `int val = (++i > ++j) ? ++i : ++j;` invoke undefined behavior?

Given the following program:
#include <stdio.h>

int main(void)
{
    int i = 1, j = 2;
    int val = (++i > ++j) ? ++i : ++j;
    printf("%d\n", val); // prints 4
    return 0;
}
The initialization of val seems like it could be hiding some undefined behavior, but I don't see any point at which an object is either modified more than once or modified and used without a sequence point in between. Could someone either correct or corroborate me on this?
The behavior of this code is well defined.
The first expression in a conditional is guaranteed to be evaluated before either the second expression or the third expression, and only one of the second or third will be evaluated. This is described in section 6.5.15p4 of the C standard:
The first operand is evaluated; there is a sequence point between its evaluation and the evaluation of the second or third operand (whichever is evaluated). The second operand is evaluated only if the first compares unequal to 0; the third operand is evaluated only if the first compares equal to 0; the result is the value of the second or third operand (whichever is evaluated), converted to the type described below.
In the case of your expression:
int val = (++i > ++j) ? ++i : ++j;
++i > ++j is evaluated first. The incremented values of i and j are used in the comparison, so it becomes 2 > 3. The result is false, so then ++j is evaluated and ++i is not. So the (again) incremented value of j (i.e. 4) is then assigned to val.
too late, but maybe useful.
(++i > ++j) ? ++i : ++j;
In ISO/IEC 9899:201x, Annex C (informative), Sequence points, we find that there is a sequence point
Between the evaluations of the first operand of the conditional ?: operator and whichever of the second and third operands is evaluated
For the behavior to be well defined, the same object must not be modified more than once (via side effects) between two sequence points.
In your expression the only conflict that could appear would be between the first and second ++i or ++j.
At every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine (this is what you would compute on paper, like on a Turing machine).
Quote from 5.1.2.3p3 Program execution
The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
When you have side effects in your code, they are produced by different expressions. The rule says that between two sequence points the implementation may interleave those side effects however it wishes.
For example, take i = i++. Because none of the operators involved in this expression introduces a sequence point, the implementation may order the side-effecting expressions as it wants. The C language allows any of these sequences:
i = i; i = i+1;
i = i+1; i = i;
tmp = i; i = i+1; i = tmp;
tmp = i; i = tmp; i = i+1;
or anything else that produces the same result that the abstract semantics of the computation asks for. The ISO 9899 standard defines the C language in terms of these abstract semantics.
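A minimal sketch of that unpredictability (it assumes nothing about any particular compiler; the point is precisely that no particular output can be relied on):
#include <stdio.h>

int main(void)
{
    int i = 1;

    i = i++;             /* two unsequenced modifications of i: undefined behavior */

    printf("%d\n", i);   /* may print 1, 2, or anything else; the program has no defined meaning */
    return 0;
}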
There may be no UB in your program, but in the question:
Does the statement int val = (++i > ++j) ? ++i : ++j; invoke undefined behavior?
The answer is yes. Either or both of the increment operations may overflow, since i and j are signed, in which case all bets are off.
Of course this doesn't happen in your full example because you've specified the values as small integers.
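To make the overflow point concrete, here is a hedged sketch with different (assumed) initial values; the ?: sequencing is still fine, but the increment itself overflows:
#include <limits.h>

int main(void)
{
    int i = INT_MAX, j = 0;                /* assumed values, unlike the original example */

    int val = (++i > ++j) ? ++i : ++j;     /* ++i overflows a signed int: undefined behavior */

    (void)val;
    return 0;
}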
I was going to comment on Doug Currie's answer that signed integer overflow was a bit too far-fetched, although technically correct as an answer. On the contrary!
On second thought, I think Doug's answer is not only correct but, assuming a program that is not an entirely trivial three-liner like the example (say, one with a loop), should be extended to a clear, definite "yes". Here's why:
The compiler sees int i = 1, j = 2;, so it knows that ++i will be equal to j and thus cannot possibly be larger than j or even ++j. Modern optimizers see such trivial things.
Unless, of course, one of them overflows. But the optimizer knows that this would be UB, therefore assumes it will never happen, and optimizes accordingly.
So the ternary operator's condition is always false (certainly in this easy example, but even if invoked repeatedly in a loop this would be the case!), and i will only ever be incremented once, whereas j will always be incremented twice. Thus not only is j always larger than i, the gap even grows at every iteration (until overflow happens, but per our assumption that never happens).
Thus, the optimizer is allowed to turn this into ++i; j += 2; unconditionally, which surely isn't what one would expect.
The same applies for e.g. a loop with unknown values of i and j, such as user-supplied input. The optimizer might very well recognize that the sequence of operations only depends on the initial values of i and j. Thus, the sequence of increments followed by a conditional move can be optimized by duplicating the loop, once for each case, and switching between the two with a single if(i>j). And then, while we're at it, it might fold the loop of repeated increment-by-twos into something like (j-i)<<1 which it just adds. Or something.
Under the assumption that overflow never happens -- which is the assumption that the optimizer is allowed to make, and does make -- such a modification, which may completely change the entire sense and mode of operation of the program, is perfectly fine.
Try and debug that.
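As a purely hypothetical sketch of the transformation described above (this is not the output of any particular compiler, just the kind of rewrite the as-if rule permits):
#include <stdio.h>

int main(void)
{
    int i = 1, j = 2;

    /* Original: int val = (++i > ++j) ? ++i : ++j;
     * With these initial values, and assuming signed overflow never happens,
     * the condition is provably false, so an optimizer may emit the equivalent of: */
    ++i;
    j += 2;
    int val = j;

    printf("%d\n", val);   /* prints 4, matching the abstract semantics */
    return 0;
}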

What would happen if "i = i++" was not considered undefined behavior? [closed]

I'm having trouble understanding the difference between unspecified and undefined behavior. I think trying to understand some examples would be useful. For instance, x = x++. The problem with this assignment is that:
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
This violates a "shall" rule but does not explicitly mention undefined behavior; however, it involves UB according to:
The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.
Assuming none of these rules existed and there are no other rules that "invalidate" x = x++. The value of x would then be unspecified, right?
The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construct is valid.
Edit: As pointed out by P.W, there is a somewhat related, well-received, version of this question for C++: What made i = i++ + 1; legal in C++17?.
I'm having trouble understanding the difference between unspecified and undefined behavior.
Then let's start with the definitions of those terms from the Standard:
undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
(C2011, 3.4.3)
unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
(C2011, 3.4.4)
You remark that
The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construct is valid.
It is perhaps over-aggrandizing to call that an argument, as if there were some doubt about its validity. In truth, it reflects explicit language from the standard:
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ''behavior that is undefined''.
(C2011, 4/2; emphasis added)
When you posit
Assuming none of these rules existed and there are no other rules that "invalidate" x = x++.
that doesn't necessarily change anything. In particular, removing the explicit rule that the order of evaluation of the operands is unspecified does not make the order specified. I'd be inclined to argue that the order remains unspecified, but the alternative is that the behavior would be undefined. The primary purpose served by explicitly saying it's unspecified is to sidestep that question.
The rule explicitly declaring UB when an object is modified twice between sequence points is a little less clear, but falls in the same boat. One could argue that the standard still did not define behavior for your example case, leaving it undefined. I think that's a bit more of a stretch, but that's exactly why it is useful to have an explicit rule, one way or the other. It would be possible to define behavior for your case -- Java does, for example -- but C chooses not to, for a variety of technical and historical reasons.
The value of x would then be unspecified, right?
That's not entirely clear.
Please understand, too, that the various provisions of the standard for the most part do not stand alone. They are designed to work together, as a (mostly) coherent whole. Removing or altering random provisions has considerable risk of producing inconsistencies or gaps, leaving it difficult to reason about the result.
Modern C11/C17 has changed the text, but it has pretty much the same meaning. C17 6.5/2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
There are several slightly different issues here, mixed into one:
Between sequence points, x is written to (side effect) more than once. This is UB as per the above.
Between sequence points, the expression contains at least one side effect and there is a value computation of the same variable that is not used to determine the value to be stored. This is also UB as per the above.
In the expression x = x++, the evaluation of the operand x is not sequenced in relation to the operand x++. The evaluation order is unspecified behavior as per C17 6.5.16.
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.
If not for the first cited part labelling this UB, then we still wouldn't know if the x++ would be sequenced before or after the evaluation of the left x operand, so it is hard to reason about how this could become "just unspecified behavior".
C++17 actually fixed this part, making it well-defined there, unlike in C or earlier C++ versions. They did so by defining the sequence order (C++17 8.5.18):
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
I don't see how there can be any middle-ground here; either the expression is undefined or it is well-defined.
Unspecified behavior is deterministic behavior which we cannot know or assume anything about. But unlike undefined behavior, it won't cause crashes and random program behavior.
A good example is a() + b(). We can't know which function will be executed first - the program doesn't even have to be consistent if the same expression appears later in the same program. But we can know that both functions will be executed, one before the other.
Unlike x = a() + b() + x++; which is undefined behavior and we can't assume anything about it. One, both or none of the functions might be executed, in any order. The program might crash, produce incorrect results, produce seemingly correct results or do nothing at all.
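A small sketch of the unspecified (but not undefined) case, using hypothetical helper functions a() and b() whose only purpose is to make the evaluation order visible:
#include <stdio.h>

static int a(void) { puts("a() ran"); return 1; }
static int b(void) { puts("b() ran"); return 2; }

int main(void)
{
    /* Unspecified order: both calls happen and the sum is always 3,
     * but "a() ran" and "b() ran" may appear in either order. */
    int sum = a() + b();
    printf("sum = %d\n", sum);
    return 0;
}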
There have been instances in other programming languages when a previously undefined behavior has become defined in a later standard. One instance I can remember is in C++ where what was undefined behavior in C++11 became well defined in C++17.
i = i++ + 1; // the behavior is undefined in C++11
i = i++ + 1; // the behavior is well-defined in C++17. The value of i is incremented
There has been a well received question on this topic.
What made this well defined is a guarantee in the C++17 standard that
The right operand is sequenced before the left operand.
So in a sense, it is upto the standards committee people to change the standard and provide strong guarantees to make it well defined.
But I do not think that something as simple as x = x++; will be made unspecified. It will either be undefined or well-defined.
The problem seems to be that it cannot be properly defined what i = i++; would mean:
Interpretation 1:
int i1 = i;
int i2 = i1 + 1;
i = i2;
i = i1;
In this interpretation the value of i is retrieved and 1 is added (i2), then this i2 is saved to i but the original i in i1 is further used in the assignment (because here the ++ is interpreted to apply to the value after it has been used) and so i is unchanged.
Interpretation 2:
int i1 = i;
i1 = i1 + 1;
i = i1;
int i2 = i;
i = i2;
In this interpretation the i++ is performed first (and modifies i) and now the modified i is retrieved again and used in the assignment (so i has the incremented value).
Interpretation 3:
int i1 = i;
i = i1;
int i2 = i1 + 1;
i = i2;
In this interpretation first the assignment of i to i is executed and then i is incremented.
To me, all three of these interpretations are correct, and there could even be a few more, but each does something different. Hence the standard could not (and did not) define it; which interpretation a compiler uses is up to the compiler builder, and as a result the behavior a compiler exhibits is undefined: undefined behavior.
(A compiler could even generate a jmp toTheMoon instruction or ignore the whole statement.)
The order of evaluation and application of the side effect of ++ is left unspecified - the language standard does not mandate left-to-right or right-to-left order (for arithmetic operators, anyway). Consider the well-defined expression a = b++ * ++c. The expressions a, b++, and ++c may be evaluated in any order. Similarly, the side effects to b and c may be applied immediately after evaluation, or deferred until just before the next sequence point, or anywhere in between. All that matters is that the result of b * (c+1) is computed before being assigned to a. The following is one perfectly legal evaluation:
tmp <- c + 1;
a = b * tmp;
c <- c + 1
b <- b + 1
So is this:
c <- c + 1
a <- b * c
b <- b + 1
So is this:
tmp1 <- b
b <- b + 1
tmp2 <- c + 1
a <- tmp1 * tmp2
c <- c + 1
What matters is that, no matter what order of evaluation is chosen, you will always get the same result.
x = x++ could be evaluated in either of the following ways, depending on when the side effect is applied:
Option 1       Option 2
--------       --------
tmp <- x       tmp <- x
x <- x + 1     x <- tmp
x <- tmp       x <- x + 1
The problem is that the two methods give different results. Other, completely different methods may be available based on the instruction set that give different results than these two.
The language standard doesn't mandate what to do when an expression gives different results depending on the order in which it is evaluated - it doesn't place any requirements on the compiler or the runtime environment to pick either option. This is what undefined means - literally, the behavior is not defined by the language specification. You will get a result, but it's not guaranteed to be consistent, or the result you would expect.
Undefined does not mean illegal. Nor does it mean your code is guaranteed to crash. It just means that the result is not predictable or guaranteed to be consistent. An implementation doesn't even have to issue a diagnostic saying "hey, dummy, this is a bad idea."
An implementation is free to define and document a behavior left undefined by the standard (such as MSVC defining fflush on input streams). A number of compilers take advantage of certain behaviors being undefined to perform some optimizations. And some compilers do issue warnings for common mistakes like x = x++.
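For example, GCC's -Wsequence-point warning (enabled by -Wall) will typically flag such an expression, and Clang has a similar -Wunsequenced diagnostic; the exact wording varies by compiler and version, so treat the comment below as an approximation rather than a guaranteed message:
int bad(void)
{
    int x = 0;
    x = x++;   /* typical diagnostic: "operation on 'x' may be undefined" */
    return x;
}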

Why is `x-- > 0` not undefined behaviour, while `x = x--` is?

As everyone knows, this loops through zero:
while (x-- > 0) { /* also known as x --> 0 */
    printf("x = %d\n", x);
}
But x = x-- yields undefined behaviour.
Both examples need some 'return' value of x--, which is not there I guess. How can it be that x-- > 0 is defined but x = x-- is not?
Because in x = x-- you're modifying the value of x twice without an intervening sequence point. So the order of operations is not defined. In x-- > 0 the value of x is modified once, and it is clearly defined that result of evaluating x-- will be the value of x before the decrement.
I don't know where you got that idea about "need some 'return' value of x--, which is not there". Firstly, it is not exactly clear what you mean. Secondly, regardless of what you mean this doesn't seem to have anything to do with the source of undefined behavior in x = x--.
x = x-- produces undefined behavior because it attempts to modify x twice without an intervening sequence point. No "need" for any "return value" is involved here.
The underlying problem with x = x-- is that it has two side-effects that occur at undefined moments in undefined order. One side-effect is introduced by the assignment operator. Another side-effect is introduced by postfix -- operator. Both side-effects attempt to modify the same variable x and generally contradict each other. This is why the behavior in such cases is declared undefined de jure.
For example, if the original value of x was 5, then your expression requires x to become both 4 (side-effect of decrement) and 5 (side-effect of assignment) at the same time. Needless to say, it is impossible for x to become 4 and 5 at the same time.
Although such a straightforward contradiction (like 4 vs 5) is not required for UB to occur. Every time you have two side-effects hitting the same variable without intervening sequence point, the behavior is undefined, even if the values these side-effects are trying to put into the variable match.
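To make the contrast concrete, here is a minimal program; the undefined statement is deliberately left commented out.
#include <stdio.h>

int main(void)
{
    int x = 3;

    /* Well defined: x is modified once per iteration, and the comparison
     * uses the value x had before the decrement. */
    while (x-- > 0)
        printf("x = %d\n", x);   /* prints 2, 1, 0 */

    /* Undefined: two unsequenced side effects on the same object.
     *     x = x--;
     */
    return 0;
}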
In order to understand this you need to have a basic understanding of sequence points. See this link: http://en.wikipedia.org/wiki/Sequence_point
For the = operator there is no sequence point, so there is no guarantee that the value of x will be modified before it is again assigned to x.
When you are checking the condition in the while loop x-- > 0, x-- is evaluated and the value is used in the relational operator evaluation so there is no chance of undefined behaviour because x is getting modified only once.
Just to add something to other answers, try reading this wikipedia page about sequence points.
I suggest reading https://stackoverflow.com/a/21671069/258418. If you put together that = is not a sequence point with the fact that the compiler is free to interleave operations as long as they are not separated by a sequence point (see the answers linked above), you see that e.g. both of the following sequences would be legal:
load i to reg
increment i
assign reg to i
=> i ends up with the previous value of i
load i to reg
assign reg to i
increment i
=> i ends up with the previous value of i, plus 1
In general: avoid assigning (this includes modifying via pre/post ++/--) to the same variable twice in one expression.
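A sketch of that advice in practice: give each side effect its own statement, so every sequencing question disappears.
#include <stdio.h>

int main(void)
{
    int i = 5;

    /* instead of the undefined  i = i--;  keep the two effects separate: */
    int old = i;   /* use the current value ...               */
    i--;           /* ... then apply the decrement on its own */

    printf("old = %d, i = %d\n", old, i);   /* old = 5, i = 4 */
    return 0;
}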

is this c code still not of a defined output?

I was reading C Traps and Pitfalls and read that the following code may work on some implementations and won't on others due to an undefined order of = and ++. Is this still true of C?
int i = 0;
while (i < n)
    y[i] = x[i++];
If so, that's really incredible.
Nothing incredible. It is well established undefined behavior. Read more about sequence points.
Just writing as:
int i = 0;
while (i < n)
{
    y[i] = x[i];
    i++;
}
is safer and more readable.
The postfix ++ has a result and a side effect. The result is the current value of the operand. The side effect is that the operand gets incremented by one. Where the problem comes in is that the side effect doesn't have to be applied immediately after the expression has been evaluated; it only has to be applied before the next sequence point.
From the C language standard (n1256):
6.5 Expressions
...
2 Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression.72) Furthermore, the prior value
shall be read only to determine the value to be stored.73)
...
72) A floating-point status flag is not an object and can be set more than once within an expression.
73) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
It is not particularly surprising that this code has undefined behaviour, because it's semantically ambiguous: in y[i], which value of i is intended? The value before the increment, or after? (Bear in mind that the = operator does not specify that one side is evaluated before the other.)
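To make the ambiguity explicit, here is a sketch (with made-up array contents) that writes out the two plausible readings as separate, well-defined statements:
#include <stdio.h>

int main(void)
{
    int x[4] = {10, 20, 30, 40};
    int y[4] = {0, 0, 0, 0};
    int i = 1;

    /* Reading 1: y is indexed with i's value before the increment. */
    y[i] = x[i];      /* y[1] = x[1] */
    i++;

    /* Reading 2 (restart from i == 1): y is indexed with the incremented i,
     * while x still sees the old value of i. */
    i = 1;
    y[i + 1] = x[i];  /* y[2] = x[1] */
    i++;

    printf("y[1] = %d, y[2] = %d\n", y[1], y[2]);   /* different elements were written */
    return 0;
}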
Yes, it's still UB.
§1.9/7: At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Well, i++ means "increment i after using its value", so I thought it was correct (well, I changed my mind after reading the other posts). Rather, I would do:
while ((i++) < n)
    y[i] = x[i];

why "f = f++" is unsafe in c?

I read about 'side effect' from this website:
but I still don't understand why f = f++ is considered unsafe?
Can somebody explain?
The problem is sequence points. There are two operations in this statement with no sequence point between them, so there is no defined order: does the assignment happen first, or the increment?
Nothing says it's unsafe, it's just undefined, which means that different implementations may have different results or it may format your hard drive...
Using x and x++ (or ++x) within the same statement is undefined behaviour in C. The compiler is free to do whatever it wants: either increment x before doing the assignment, or after that. Taking Ólafur's code, it might yield f == 5 or f == 6, depending on your compiler.
The article at the (cleaned up) link you provided gives the answer. "C makes almost no promise that side effects will occur in a predictable order within a single expression." This means that you don't know in what order the = and the ++ will occur. It's compiler dependent.
If you follow the link from that article to the article about sequence points on the same site, you'll see that the compiler can optimize what and when it writes values back from the registers into the variables.
From the standard
6.5 (2) If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.74)
74) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
I support Arthur's answer in this respect. Though the behaviour of the post-increment operator, i.e. f++, is confusing, it is not considered unsafe. You should first understand how the compiler interprets it: whether it will increment f after it encounters the statement terminator (;) or immediately after using the value of f.

Resources