C compiler's pre/post incrementation evaluation in expressions [duplicate] - c

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 8 years ago.
Today I found something that made me very anxious about my C++ or basic programming skills. The problem is C++ expression evaluation with post/pre-incrementation.
Let's check this with a trivial example of code:
int a = 5;
int d = a++ + a;
I expected that the left and right operands of the '=' sign would be calculated independently, and that the final result would be (a++) 5 + (a) 5, where the post-incremented 'a' has the value 6 after 'd' is computed.
But, here's what I got under two popular C compilers:
MinGW: d == 11;
MSVC: d == 10;
Same situation is with:
int a = 5;
int d = a-- + a;
where compilers gave:
MinGW: d == 9; // 5 + 4 , a=4 after 'a--', before '+a'?
MSVC: d == 10; // 5 + 5 , a=4 after 'a-- + a'?
The MSVC output is exactly what I expected. The question is: what is really happening here? Which compiler is closer to the behaviour defined as standard?

Funny that you should ask about the "behaviour defined as standard"; in fact, both compilers adhere perfectly to the standard, since your programs invoke undefined behaviour.
In a nutshell, the operands to + (and most other binary operators) are unsequenced relative to each other: they can be evaluated in either order, and an expression whose result depends on a particular order (via side effects on the same object) invokes undefined behaviour.
With undefined behaviour, of course, a conforming compiler can choose to do anything, legally.

The order of evaluation for the expression a++ + a is not specified by the C++ standard (strictly speaking, the behavior is undefined), so each compiler is free to evaluate the expression however it wants. Since both compilers are correct, you need to rewrite your expression into two separate statements to get the particular behavior that you want.
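As a minimal sketch of the "separate statements" rewrite, the two results reported in the question (10 and 11) can each be obtained in a well-defined way. The helper names below are hypothetical and chosen only for illustration:

```c
/* Hypothetical helper: read a twice, then increment (5 + 5 when a == 5). */
int sum_then_increment(int *a)
{
    int d = *a + *a;   /* both reads see the old value */
    (*a)++;            /* side effect applied afterwards, in its own statement */
    return d;
}

/* Hypothetical helper: increment between the two reads (5 + 6 when a == 5). */
int increment_between_reads(int *a)
{
    int old = *a;      /* the value that a++ would yield */
    (*a)++;            /* the side effect is complete after this statement */
    return old + *a;   /* the second read sees the new value */
}
```

Each statement ends at a sequence point, so the order of the reads and the increment no longer depends on the compiler.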

Related

Rewriting a piece of code to become more readable [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
sequence points in c
(4 answers)
Closed 1 year ago.
val = n++ + arr[n];
How can I rewrite the line of code above to become more readable?
How is this code evaluated by a compiler?
This code is invalid (reason) so you should bin it.
It is better to write more lines to keep the code readable and correct than to write "hacky" complex expressions.
val = n + arr[n];
n++;
It's not a matter of readability. That's undefined behavior.
In C, + is not a sequence point, therefore you can't know whether n++ will be executed before or after arr[n].
Sequence points in the C standard
See the section relative to Program execution
The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B. (A summary of the sequence points is given in annex C.)
It depends on the case. If you want the incremented index to be used, you can do it like this:
val = n + arr[n + 1];
n++;
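The two rewrites suggested in these answers are not equivalent, so it helps to pick one deliberately. A small sketch, with hypothetical function names, that makes the difference explicit:

```c
/* Reads arr[n] with the old n, then increments n. */
int add_using_old_index(int *n, const int arr[])
{
    int val = *n + arr[*n];
    (*n)++;
    return val;
}

/* Reads arr[n + 1], i.e. the element the incremented n would select,
   then increments n. The caller must ensure n + 1 is a valid index. */
int add_using_new_index(int *n, const int arr[])
{
    int val = *n + arr[*n + 1];
    (*n)++;
    return val;
}
```

With arr = {10, 20, 30} and n = 0, the first version yields 10 and the second yields 20; both leave n at 1.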

What would happen if "i = i++" was not considered undefined behavior? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm having trouble understanding the difference between unspecified and undefined behavior. I think trying to understand some examples would be useful. For instance, x = x++. The problem with this assignment is that:
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
This violates a shall rule, but does not explicitly invoke undefined behavior, but it involves UB according to:
The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.
Assuming none of these rules existed and there are no other rules that "invalidate" x = x++. The value of x would then be unspecified, right?
The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.
Edit: As pointed out by P.W, there is a somewhat related, well-received, version of this question for C++: What made i = i++ + 1; legal in C++17?.
I'm having trouble understanding the difference between unspecified and undefined behavior.
Then let's start with the definitions of those terms from the Standard:
undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
(C2011, 3.4.3)
unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
(C2011, 3.4.4)
You remark that
The doubt arose because it is sometimes argued that things in C are
UB by "default" and are only valid if you can justify that the construction
is valid.
It is perhaps over-aggrandizing that to call it an argument, as if there were some doubt about its validity. In truth, it reflects explicit language from the standard:
If a ''shall'' or ''shall not'' requirement that appears outside of a
constraint or runtime- constraint is violated, the behavior is
undefined. Undefined behavior is otherwise indicated in this
International Standard by the words ''undefined behavior'' or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe ''behavior
that is undefined''.
(C2011, 4/2; emphasis added)
When you posit
Assuming none of these rules existed and there are no other rules that
"invalidate" x = x++.
, that doesn't necessarily change anything. In particular, removing the explicit rule that the order of evaluation of the operands is unspecified does not make the order specified. I'd be inclined to argue that the order remains unspecified, but the alternative is that the behavior would be undefined. The primary purpose served by explicitly saying it's unspecified is to sidestep that question.
The rule explicitly declaring UB when an object is modified twice between sequence points is a little less clear, but falls in the same boat. One could argue that the standard still did not define behavior for your example case, leaving it undefined. I think that's a bit more of a stretch, but that's exactly why it is useful to have an explicit rule, one way or the other. It would be possible to define behavior for your case -- Java does, for example -- but C chooses not to do so, for a variety of technical and historical reasons.
The value of x would then be unspecified, right?
That's not entirely clear.
Please understand, too, that the various provisions of the standard for the most part do not stand alone. They are designed to work together, as a (mostly) coherent whole. Removing or altering random provisions has considerable risk of producing inconsistencies or gaps, leaving it difficult to reason about the result.
Modern C11/C17 has changed the text, but it has pretty much the same meaning. C17 6.5/2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
There are several slightly different issues here, mixed into one:
Between sequence points, x is written to (side effect) more than once. This is UB as per the above.
Between sequence points, the expression contains at least one side effect and there is a value computation of the same variable not related to which value to be stored. This is also UB as per the above.
In the expression x = x++, the evaluation of the operand x is not sequenced in relation to the operand x++. The evaluation order is unspecified behavior as per C17 6.5.16.
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.
If not for the first cited part labelling this UB, then we still wouldn't know if the x++ would be sequenced before or after the evaluation of the left x operand, so it is hard to reason about how this could become "just unspecified behavior".
C++17 actually fixed this part, making it well-defined there, unlike in C or earlier C++ versions. They did so by defining the sequence order (C++17 8.5.18):
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
I don't see how there can be any middle-ground here; either the expression is undefined or it is well-defined.
Unspecified behavior is deterministic behavior which we cannot know or assume anything about. But unlike undefined behavior, it won't cause crashes and random program behavior.
A good example is a() + b(). We can't know which function will be executed first - the program doesn't even have to be consistent if the same line appears later on in the same program. But we can know that both functions will be executed, one before the other.
Unlike x = a() + b() + x++; which is undefined behavior and we can't assume anything about it. One, both or none of the functions might be executed, in any order. The program might crash, produce incorrect results, produce seemingly correct results or do nothing at all.
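The a() + b() case can be sketched as follows. The functions and the call counter are hypothetical, added only to make the guarantee observable: the order of the two printed lines may differ between compilers, but both calls happen and the sum is the same either way.

```c
#include <stdio.h>

static int calls = 0;  /* counts how many of the two functions ran */

int a(void) { puts("a called"); calls++; return 2; }
int b(void) { puts("b called"); calls++; return 3; }

int sum_ab(void)
{
    /* Either "a called" or "b called" may be printed first (unspecified),
       but the calls do not overlap and the result is always 2 + 3. */
    return a() + b();
}
```

Because function calls are not interleaved with each other, updating calls in both functions is fine; only the order is left open.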
There have been instances in other programming languages when a previously undefined behavior has become defined in a later standard. One instance I can remember is in C++ where what was undefined behavior in C++11 became well defined in C++17.
i = i++ + 1; // the behavior is undefined in C++11
i = i++ + 1; // the behavior is well-defined in C++17. The value of i is incremented
There has been a well received question on this topic.
What made this well defined is a guarantee in the C++17 standard that
The right operand is sequenced before the left operand.
So in a sense, it is up to the standards committee to change the standard and provide strong guarantees to make it well defined.
But I do not think that something as simple as x = x++; will be made unspecified. It will either be undefined or well-defined.
The problem seems to be that it cannot be properly defined what i= i++; would mean:
Interpretation 1:
int i1= i;
int i2= i1+1;
i = i2;
i = i1;
In this interpretation the value of i is retrieved and 1 is added (i2), then this i2 is saved to i but the original i in i1 is further used in the assignment (because here the ++ is interpreted to apply to the value after it has been used) and so i is unchanged.
Interpretation 2:
int i1= i;
i1= i1+1;
i= i1;
int i2= i;
i= i2;
In this interpretation the i++ is performed first (and modifies i) and now the modified i is retrieved again and used in the assignment (so i has the incremented value).
Interpretation 3:
int i1= i;
i = i1;
int i2= i1+1;
i= i2;
In this interpretation first the assignment of i to i is executed and then i is incremented.
To me, all these three interpretations are correct, and there could even be a few more interpretations, but they each do something different. Hence the standard could/did not define it and which interpretation a compiler uses is up to the compiler builder and as a result which behavior a compiler exhibits is undefined: undefined behavior.
(A compiler could even generate a jmp toTheMoon instruction or ignore the whole statement.)
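To see that the three interpretations really differ, each one can be written out as ordinary, well-defined C. This is only a sketch of the readings listed above, with hypothetical function names; it does not describe what any particular compiler does with i = i++.

```c
/* Interpretation 1: the incremented value is stored, then overwritten
   by the original value. */
int interpretation1(int i)
{
    int i1 = i;
    int i2 = i1 + 1;
    i = i2;
    i = i1;
    return i;
}

/* Interpretation 2: the increment is performed first, then the modified
   i is read again and assigned. */
int interpretation2(int i)
{
    int i1 = i;
    i1 = i1 + 1;
    i = i1;
    int i2 = i;
    i = i2;
    return i;
}

/* Interpretation 3: the assignment happens first, then the increment. */
int interpretation3(int i)
{
    int i1 = i;
    i = i1;
    int i2 = i1 + 1;
    i = i2;
    return i;
}
```

Starting from 5, the first reading leaves 5 while the other two leave 6.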
The order of evaluation and application of the side effect of ++ is left unspecified - the language standard does not mandate left-to-right or right-to-left order (for arithmetic operators, anyway). Consider the well-defined expression a = b++ * ++c. The expressions a, b++, and ++c may be evaluated in any order. Similarly, the side effects to b and c may be applied immediately after evaluation, or deferred until just before the next sequence point, or anywhere in between. All that matters is that the result of b * (c+1) is computed before being assigned to a. The following is one perfectly legal evaluation:
tmp <- c + 1
a <- b * tmp
c <- c + 1
b <- b + 1
So is this:
c <- c + 1
a <- b * c
b <- b + 1
So is this:
tmp1 <- b
b <- b + 1
tmp2 <- c + 1
a <- tmp1 * tmp2
c <- c + 1
What matters is that, no matter what order of evaluation is chosen, you will always get the same result.
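The three orderings above can be checked against the expression itself. The following is a sketch under the assumption that a, b and c are distinct int objects, modelled here as members of a hypothetical struct:

```c
struct vars { int a, b, c; };

/* The expression itself, which is well-defined because a, b and c
   are distinct objects. */
struct vars direct(struct vars v)
{
    v.a = v.b++ * ++v.c;
    return v;
}

/* First ordering: compute the product, then apply both side effects. */
struct vars order1(struct vars v)
{
    int tmp = v.c + 1;
    v.a = v.b * tmp;
    v.c = v.c + 1;
    v.b = v.b + 1;
    return v;
}

/* Second ordering: update c first, multiply, then update b. */
struct vars order2(struct vars v)
{
    v.c = v.c + 1;
    v.a = v.b * v.c;
    v.b = v.b + 1;
    return v;
}

/* Third ordering: save b, update b, compute, then update c. */
struct vars order3(struct vars v)
{
    int tmp1 = v.b;
    v.b = v.b + 1;
    int tmp2 = v.c + 1;
    v.a = tmp1 * tmp2;
    v.c = v.c + 1;
    return v;
}
```

For b = 3 and c = 4, every version ends with a = 15, b = 4 and c = 5.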
x = x++ could be evaluated in either of the following ways, depending on when the side effect is applied:
Option 1        Option 2
--------        --------
tmp <- x        tmp <- x
x <- x + 1      x <- tmp
x <- tmp        x <- x + 1
The problem is that the two methods give different results. Other, completely different methods may be available based on the instruction set that give different results than these two.
The language standard doesn't mandate what to do when an expression gives different results depending on the order in which it is evaluated - it doesn't place any requirements on the compiler or the runtime environment to pick either option. This is what undefined means - literally, the behavior is not defined by the language specification. You will get a result, but it's not guaranteed to be consistent, or the result you would expect.
Undefined does not mean illegal. Nor does it mean your code is guaranteed to crash. It just means that the result is not predictable or guaranteed to be consistent. An implementation doesn't even have to issue a diagnostic saying "hey, dummy, this is a bad idea."
An implementation is free to define and document a behavior left undefined by the standard (such as MSVC defining fflush on input streams). A number of compilers take advantage of certain behaviors being undefined to perform some optimizations. And some compilers do issue warnings for common mistakes like x = x++.

C Pointers from Past Paper [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 8 years ago.
I have another C pointers question.
Consider executing the following program:
int x[5] = {0,3,5,7,9};
int* y = &x[2];
*(y+2) = *(y--);
What values does the array x hold afterwards?
What the hell is going on with y--? I know how *(y+2) works, and understand the rest, but not how y-- ties in with the rest.
Also, the answer given is {0, 3, 5, 5, 9}.
There's no sequence point between y-- and y + 2 in *(y+2) = *(y--);, so whether y + 2 refers to &x[4] or &x[3] is unspecified. Depending on how your compiler does things, you can either get 0 3 5 5 9 or 0 3 5 7 5.
That there is no sequence point between the two expressions means, in a nutshell, that it is not specified whether the side effects of one operation (y-- in this case) have been applied by the time the other (y + 2) is evaluated. You can read more about sequence points here.
ISO/IEC 9899:201x
6.5 Expressions
p2: If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.
You should not trust the answers given by your professor in this case.
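Both outcomes mentioned in this answer (0 3 5 5 9 and 0 3 5 7 5) can be reproduced with well-defined code by making the intended order explicit. The function names are hypothetical, and this is only a sketch of the two readings, not a statement about what a given compiler does with the original line:

```c
/* Yields 0 3 5 5 9: the decrement is applied before y + 2 is computed. */
void store_after_decrement(int x[5])
{
    int *y = &x[2];
    int value = *y;    /* the value of *(y--) */
    y--;               /* y now points at x[1] */
    *(y + 2) = value;  /* writes x[3] */
}

/* Yields 0 3 5 7 5: y + 2 is computed before the decrement is applied. */
void store_before_decrement(int x[5])
{
    int *y = &x[2];
    int *dst = y + 2;  /* &x[4] */
    *dst = *y;         /* writes x[4] */
    y--;               /* kept only to mirror the original side effect */
}
```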
Expanding on Wintermute's answer a bit...
The problem is with the statement
*(y+2) = *(y--);
The expression y-- evaluates to the current value of y, and as a side effect decrements the variable. For example:
int a = 10;
int b;
b = a--;
After the above expression has been evaluated, b will have the value 10 and a will have the value 9.
However, the C language does not require that the side effect be applied immediately after the expression has been evaluated, only that it be applied before the next sequence point (which in this case is at the end of the statement). Neither does it require that expressions be evaluated from left to right (with a few exceptions). Thus, it's not guaranteed that the value of y in y+2 represents the value of y before or after the decrement operation.
The C language standard explicitly calls operations like this out as undefined behavior, meaning that the compiler is free to handle the situation in any way it wants to. The result will vary based on the compiler, compiler settings, and even the surrounding code, and any answer will be equally correct as far as the language definition is concerned.
In order to make this well-defined and give the same result on every compiler, you would need to decrement y before the assignment statement:
y--;
*(y+2) = *y;
This is consistently one of the most misunderstood and mis-taught aspects of the C language. If your professor is expecting this particular result to be well-defined, then he doesn't know the language as well as he thinks he does. Then again, he's not unique in that respect.
Repeating and expanding on the snippet from the C 2011 draft standard that Wintermute posted:
6.5 Expressions
...
2 If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.84)
3 The grouping of operators and operands is indicated by the syntax.85) Except as specified
later, side effects and value computations of subexpressions are unsequenced.86)
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
85) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ? : (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
86) In an expression that is evaluated more than once during the execution of a program, unsequenced and
indeterminately sequenced evaluations of its subexpressions need not be performed consistently in
different evaluations.
Emphasis added. Note that this has been true since the C89 standard, although the wording has changed a bit since then.
"Unsequenced" simply means it's not guaranteed that one operation is completed before the other. The assignment operator does not introduce a sequence point, so it's not guaranteed that the LHS of the expression is evaluated before the RHS.
Now for the hard bit - your professor obviously expects a specific behavior for these kinds of expressions. If he gives a test or a quiz that asks what the result of something like a[i] = i--; will be, he's probably not going to accept an answer of "the behavior is undefined", at least not on its own. You might want to discuss the answers Wintermute and I have given with him, along with the sections of the standard quoted above.
The problem is in this statement.
*(y+2) = *(y--);
Because in C, reading a variable in an expression that also modifies it, with nothing sequencing the read relative to the modification, has undefined behavior.
Another example is:
i = 5;
v[i] = i++;
In this case the most likely outcome (AFAIK) is that the compiler evaluates either the RHS or the LHS first. If the LHS is evaluated first, we get v[5] = 5; and i equals 6 after the assignment. If the RHS is evaluated first, it yields 5, but by the time the left side is evaluated i already equals 6, so we end up with v[6] = 5;. However, given the quote "undefined behavior allow the compiler to do anything it chooses, even to make demons fly out of your nose", you should not count on either of those options. Expect anything, because what happens depends on the compiler.
First of all int x[5] = {0, 3, 5, 7, 9} means
x[0] = 0, x[1] = 3, x[2] = 5, x[3] = 7, x[4] = 9
Next, int *y = &x[2]. Here you are using the pointer y to point to the address of x[2].
Now here comes your confusion: *(y + 2) means you are pointing to the address of x[4],
and in *(y--), y-- is a post-decrement operator, hence the value at *y, which is x[2] = 5, must be used first. So the value assigned is x[4] = 5.
Assuming the compiler computes y + 2 before the decrement takes effect, the final output would be 0 3 5 7 5.

when are the assignment operators inside a parenthesis in a expression evaluated in c? [duplicate]

This question already has answers here:
a = (a + b) - (b = a); C++ vs php
(3 answers)
Closed 9 years ago.
So I came across this snippet of code in a Quora article to swap two numbers.
a = a + b - (b = a);
I tried this out and it worked fine. But since b = a is in parentheses, shouldn't b be assigned the value of a first? Then the whole thing should become a + a - a, making a retain its value?
I tried a = b + (b = a); with a = 5 and b = 10, and I got a = 10 in the end. I guess it evaluated as a = a + a.
Why this anomaly?
This is undefined behavior because of section 6.5, paragraph 2 of the C99 draft standard, which states:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
In this case we are modifying b and using its value to determine the result of a. The standard gives the following examples as undefined:
i = ++i + 1;
a[i++] = i;
Cranking up the warning level, at least in gcc, would have alerted you to the problem. Using -W -Wall I receive the following warning:
warning: operation on ‘b’ may be undefined [-Wsequence-point]
Precedence establishes what operands are linked by what operators. It does not establish order of evaluation.
The assignment operator has very low precedence. As a result, without the parentheses, the expression
a = a + b - b = a
would be parsed as:
(a) = (a + b - b) = (a)
This would result in an error, because (a + b - b) is not an lvalue. So the parentheses are required in order to group the two operands b and a with the assignment operator, as in your original statement:
a = a + b - (b = a)
But all the parentheses impose is the grouping, not the order of evaluation.
All you can be sure of is that whenever (b = a) is evaluated:
the value of the entire expression will be the value of a
As a side effect, b will be assigned the value of a.
When that will happen, however, is not predictable across compilers. The standard is explicit: in a complex expression, the order in which the subexpressions are evaluated and the order in which side-effects take place is unspecified, i.e., compiler dependent. The standard does not impose any requirements on the order in which subexpressions should be evaluated.
More generally, in C, if you modify a variable's value while using that variable elsewhere in the expression, the result you get will be undefined, which means anything could happen. You absolutely cannot rely on the expression a = a + b - (b = a), as it both modifies the value of b and uses the value of b elsewhere in the expression.
So the expression is evoking both unspecified behavior (relying on a specific order of evaluation) and undefined behavior (modifying a variable's value while using it elsewhere in the expression). Quite an impressive feat, when you think about it!
Edit: The above is true for C99. The latest standard, C11, explicitly calls this behavior undefined: "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings." (6.5, paragraph 2). The draft standard is available for free.
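For comparison, a well-defined swap needs nothing more than a temporary. This is a minimal sketch; the function name swap_ints is hypothetical:

```c
/* Swaps the two ints pointed to by a and b using a temporary.
   Every read and write is in its own statement, so the order is fixed. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Calling swap_ints(&a, &b) with a = 5 and b = 10 leaves a = 10 and b = 5 on every conforming compiler.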
As stated in the comments, it's undefined behavior. Here the code happened to be read straight from left to right instead of using PEMDAS, but nothing guarantees that order.

how it's answer is 36? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Could anyone explain these undefined behaviors (i = i++ + ++i , i = i++, etc…)
main()
{
int a=5;
a= a++ + ++a + ++a + a++ + a++;
printf("%d",a);
}
This is not defined.
You can find the Committee Draft from May 6, 2005 of the C-standard here (pdf)
See section 6.5 Expressions:
2 Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
and the example:
71) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
The answer is actually undefined.
The answer is undefined because there are situations in which it is not obvious how the code should be parsed.
is a+++b: a + ++b or a++ + b?
Consider that white space is usually just ignored when lexing the source code. It may depend upon the implementation of the compiler (and some other languages with the same ++ operators may choose to give priority to one instead of another), but in general this is not safe.
For example, in Java your line of code gives 37 as the answer, because Java binds the ++ operators in a specific way according to precedence and evaluates the operands from left to right, but that's just a choice.
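The value 37 can be reproduced in well-defined C by spelling out a strict left-to-right evaluation, one operand per statement. This is a sketch that assumes that order (the one Java specifies); C itself makes no such promise for the original one-line expression, and the function name is hypothetical:

```c
/* Evaluates a++ + ++a + ++a + a++ + a++ strictly left to right,
   then assigns the sum back to a. Comments show the values for a == 5. */
int left_to_right(int a)
{
    int t1 = a++;   /* 5, a becomes 6  */
    int t2 = ++a;   /* 7, a becomes 7  */
    int t3 = ++a;   /* 8, a becomes 8  */
    int t4 = a++;   /* 8, a becomes 9  */
    int t5 = a++;   /* 9, a becomes 10 */
    a = t1 + t2 + t3 + t4 + t5;
    return a;
}
```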
