undefined behaviour in C [duplicate] - c

This question already has answers here:
Closed 11 years ago.
Possible Duplicates:
Parameter evaluation order before a function calling in C
order of evaluation of function parameters
What will be the output of the following code:
n=5;
printf("%d %d\n", ++n, power(2, n));
output=32
Shouldn't the output be 2^6 = 64?
Will different compilers give different results?

Order of evaluation of function arguments is unspecified. The compiler can compute the arguments in any order it pleases, but it must do it in some particular order (so there's no undefined behaviour here). The output can be either 32 or 64.
UPD: this is wrong, there's UB here, see here.

Contrary to what other answers say, the code may indeed exhibit undefined behavior.
As has been stated, the order of evaluation of function arguments is unspecified in C. In your program, you have an expression composed of several subexpressions:
ex0(ex1, ex2, ex3(ex4, ex5));
There is only a partial ordering between these subexpressions: ex4 and ex5 must obviously be evaluated before ex3 can be evaluated, and ex1, ex2, and ex3 must be evaluated before ex0 can be evaluated. Other than that, the order of evaluation of the subexpressions is unspecified, and it is up to your compiler to decide the order in which to evaluate the subexpressions.
There are certain, valid orders of evaluations that yield undefined behavior. For example, if ++n is evaluated before power(2, n), the results are undefined: there is a sequence point after the evaluation of all of the arguments to a function but not before or between, and the C Language Standard very clearly states (C99 §6.5/2; emphasis mine):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
If ++n is evaluated first, this rule is violated because n is modified by ++n and n is read for the call to power(2, n) without a sequence point between those two steps.
A program that exhibits undefined behavior may produce any result: it may crash, it may print an unexpected answer, or it may appear to work as you expect it to work. Since your program potentially exhibits undefined behavior, it's difficult to discuss with certainty the actual behavior that you see. It is best to avoid writing programs that potentially (or worse, actually) exhibit undefined behavior.

And what does your power function contain? I just checked the C math library's pow function. I had to cast the result to int, like this:
#include <cstdio>
#include <cmath>
using namespace std;
int main () {
int n=5;
printf("%d %d\n", ++n, (int)pow(2.0, n));
return 0;
}
Output: 6 64
I used Microsoft Compiler(used by Visual Studio). Hope it helps.

Related

What would happen if "i = i++" was not considered undefined behavior? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm having trouble understanding the difference between unspecified and undefined behavior. I think trying to understand some examples would be useful. For instance, x = x++. The problem with this assignment is that:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
This violates a shall rule, but does not explicitly invoke undefined behavior, but it involves UB according to:
The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.
Assuming none of these rules existed and there are no other rules that "invalidate" x = x++. The value of x would then be unspecified, right?
The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.
Edit: As pointed out by P.W, there is a somewhat related, well-received, version of this question for C++: What made i = i++ + 1; legal in C++17?.
I'm having trouble understanding the difference between unspecified and undefined behavior.
Then let's start with the definitions of those terms from the Standard:
undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
(C2011, 3.4.3)
unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
(C2011, 3.4.4)
You remark that
The doubt arose because it is sometimes argued that things in C are
UB by "default" and are only valid if you can justify that the construction
is valid.
It is perhaps over-aggrandizing that to call it an argument, as if there were some doubt about its validity. In truth, it reflects explicit language from the standard:
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ''behavior that is undefined''.
(C2011, 4/2; emphasis added)
When you posit
Assuming none of these rules existed and there are no other rules that
"invalidate" x = x++.
, that doesn't necessarily change anything. In particular, removing the explicit rule that the order of evaluation of the operands is unspecified does not make the order specified. I'd be inclined to argue that the order remains unspecified, but the alternative is that the behavior would be undefined. The primary purpose served by explicitly saying it's unspecified is to sidestep that question.
The rule explicitly declaring UB when an object is modified twice between sequence points is a little less clear, but falls in the same boat. One could argue that the standard still did not define behavior for your example case, leaving it undefined. I think that's a bit more of a stretch, but that's exactly why it is useful to have an explicit rule, one way or the other. It would be possible to define behavior for your case -- Java does, for example -- but C chooses not to do so, for a variety of technical and historical reasons.
The value of x would then be unspecified, right?
That's not entirely clear.
Please understand, too, that the various provisions of the standard for the most part do not stand alone. They are designed to work together, as a (mostly) coherent whole. Removing or altering random provisions has considerable risk of producing inconsistencies or gaps, leaving it difficult to reason about the result.
Modern C11/C17 has changed the text, but it has pretty much the same meaning. C17 6.5/2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
There are several slightly different issues here, mixed into one:
Between sequence points, x is written to (side effect) more than once. This is UB as per the above.
Between sequence points, the expression contains at least one side effect and there is a value computation of the same variable not related to which value to be stored. This is also UB as per the above.
In the expression x = x++, the evaluation of the operand x is not sequenced in relation to the operand x++. The evaluation order is unspecified behavior as per C17 6.5.16.
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
If not for the first cited part labelling this UB, then we still wouldn't know if the x++ would be sequenced before or after the evaluation of the left x operand, so it is hard to reason about how this could become "just unspecified behavior".
C++17 actually fixed this part, making it well-defined there, unlike in C or earlier C++ versions. They did so by defining the sequence order (C++17 8.5.18):
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.
I don't see how there can be any middle-ground here; either the expression is undefined or it is well-defined.
Unspecified behavior is deterministic behavior which we cannot know or assume anything about. But unlike undefined behavior, it won't cause crashes and random program behavior.
A good example is a() + b(). We can't know which function will be executed first - the program doesn't even have to be consistent if the same line appears later on in the same program. But we can know that both functions will be executed, one before the other.
Unlike x = a() + b() + x++; which is undefined behavior and we can't assume anything about it. One, both or none of the functions might be executed, in any order. The program might crash, produce incorrect results, produce seemingly correct results or do nothing at all.
There have been instances in other programming languages when a previously undefined behavior has become defined in a later standard. One instance I can remember is in C++ where what was undefined behavior in C++11 became well defined in C++17.
i = i++ + 1; // the behavior is undefined in C++11
i = i++ + 1; // the behavior is well-defined in C++17. The value of i is incremented
There has been a well received question on this topic.
What made this well defined is a guarantee in the C++17 standard that
The right operand is sequenced before the left operand.
So in a sense, it is up to the standards committee to change the standard and provide strong guarantees to make it well defined.
But I do not think that something as simple as x = x++; will be made unspecified. It will either be undefined or well-defined.
The problem seems that it cannot be properly defined what i= i++; would mean:
Interpretation 1:
int i1= i;
int i2= i1+1;
i = i2;
i = i1;
In this interpretation the value of i is retrieved and 1 is added (i2), then this i2 is saved to i but the original i in i1 is further used in the assignment (because here the ++ is interpreted to apply to the value after it has been used) and so i is unchanged.
Interpretation 2:
int i1= i;
i1= i1+1;
i= i1;
int i2= i;
i= i2;
In this interpretation the i++ is performed first (and modifies i) and now the modified i is retrieved again and used in the assignment (so i has the incremented value).
Interpretation 3:
int i1= i;
i = i1;
int i2= i1+1;
i= i2;
In this interpretation first the assignment of i to i is executed and then i is incremented.
To me, all these three interpretations are correct, and there could even be a few more interpretations, but they each do something different. Hence the standard could/did not define it and which interpretation a compiler uses is up to the compiler builder and as a result which behavior a compiler exhibits is undefined: undefined behavior.
(A compiler could even generate a jmp toTheMoon instruction or ignore the whole statement.)
The order of evaluation and application of the side effect of ++ is left unspecified - the language standard does not mandate left-to-right or right-to-left order (for arithmetic operators, anyway). Consider the well-defined expression a = b++ * ++c. The expressions a, b++, and ++c may be evaluated in any order. Similarly, the side effects to b and c may be applied immediately after evaluation, or deferred until just before the next sequence point, or anywhere in between. All that matters is that the result of b * (c+1) is computed before being assigned to a. The following is one perfectly legal evaluation:
tmp <- c + 1
a <- b * tmp
c <- c + 1
b <- b + 1
So is this:
c <- c + 1
a <- b * c
b <- b + 1
So is this:
tmp1 <- b
b <- b + 1
tmp2 <- c + 1
a <- tmp1 * tmp2
c <- c + 1
What matters is that, no matter what order of evaluation is chosen, you will always get the same result.
x = x++ could be evaluated in either of the following ways, depending on when the side effect is applied:
Option 1        Option 2
--------        --------
tmp <- x        tmp <- x
x <- x + 1      x <- tmp
x <- tmp        x <- x + 1
The problem is that the two methods give different results. Other, completely different methods may be available based on the instruction set that give different results than these two.
The language standard doesn't mandate what to do when an expression gives different results depending on the order in which it is evaluated - it doesn't place any requirements on the compiler or the runtime environment to pick either option. This is what undefined means - literally, the behavior is not defined by the language specification. You will get a result, but it's not guaranteed to be consistent, or the result you would expect.
Undefined does not mean illegal. Nor does it mean your code is guaranteed to crash. It just means that the result is not predictable or guaranteed to be consistent. An implementation doesn't even have to issue a diagnostic saying "hey, dummy, this is a bad idea."
An implementation is free to define and document a behavior left undefined by the standard (such as MSVC defining fflush on input streams). A number of compilers take advantage of certain behaviors being undefined to perform some optimizations. And some compilers do issue warnings for common mistakes like x = x++.

C - Prefix and postfix operators in function calls [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 5 years ago.
I have seen in this site that prefix increment or postfix increment in a function call may cause undefined behavior. I have gone through one of those recently. The source code is something like this :
#include <stdio.h>
void call(int,int,int);
int main()
{
int a=10;
call(a,a++,++a);
printf("****%d %d %d***_\n",a,a++,++a);
return 0;
}
void call(int x,int y,int z)
{
printf("%d %d %d",x,y,z);
}
The output comes out as 12 11 12****14 13 14***_. But, when a is printed first in the function, shouldn't it be 10? Why does it become 12? Also, why does a++ decrease from 12 to 11? Can someone please kindly explain? Thank you.
There are two things your example code requires us to consider:
1. The order of evaluation of function arguments is unspecified. Therefore, either ++a or a++ is evaluated first, but which one is up to the implementation.
2. Modifying the value of a more than once without a sequence point in between the modifications is undefined behavior.
Because of point 2, you have double undefined behavior here (you do it twice). Note undefined behavior doesn't mean nothing happens; it means anything can happen.
call(a,a++,++a); /* UB 1 */
printf("****%d %d %d***_\n",a,a++,++a); /* UB 2 */
That is undefined behaviour and as such it is entirely up to the implementation of the compiler in which order the following operations are done:
1. submit argument a
2. submit argument a++
3. submit argument ++a
4. increment a for ++a
5. increment a for a++
The only thing that the compiler knows is: 2. has to happen before 5. and 4. has to happen before 3.
You are observing:
++a;
submit argument 2
a++;
submit the other arguments
The C and C++ standards do not indicate an order of evaluation for function arguments. To be blunt, it is not incorrect for a compiler to cause the parameters to be evaluated from right to left, or left to right.
The best answer is, Stay Away from this type of 'Undefined Behavior'; as it can lead to subtle portability problems.
There's no requirement that a be equal to anything; that's what is meant by undefined behavior. It's entirely up to the compiler to evaluate the arguments to call and to printf in any order it sees fit, because the language does not specify what order they need to be evaluated.

Calling Convention Confusion [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Could anyone explain these undefined behaviors (i = i++ + ++i , i = i++, etc…)
I'm not able to understand the output of this program (using gcc).
main()
{
int a=10;
printf("%d %d %d\n",++a, a++,a);
}
Output:
12 10 12
Also, please explain the order of evaluation of arguments of printf().
The compiler will evaluate printf's arguments in whatever order it happens to feel like at the time. It could be an optimization thing, but there's no guarantee: the order they are evaluated isn't specified by the standard, nor is it implementation defined. There's no way of knowing.
But what is specified by the standard, is that modifying the same variable twice in one operation is undefined behavior; ISO C++03, 5[expr]/4:
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
printf("%d %d %d\n",++a, a++,a); could do a number of things; work how you expected it, or work in ways you could never understand.
You shouldn't write code like this.
AFAIK there is no defined order of evaluation for the arguments of a function call, and the results might vary for each compiler. In this instance I can guess the middle argument was evaluated first, followed by the first, and then the third.
As haggai_e hinted, the parameters are evaluated in this order: middle, left, right.
To fully understand why these particular numbers are showing up, you have to understand how the increment works.
a++ means "do something with a, and then increment it afterwards".
++a means "increment a first, then do something with the new value".
In your particular example, printf evaluates a++ first, reads 10 and prints it and only then increments it to 11. printf then evaluates ++a, increments it first, reads 12 and prints it out. The last variable printf evaluates is read as it is (12) and is printed without any change.
Although they are evaluated in a random order, they are displayed in the order you mentioned them. That's why you get 12 10 12 and not 10 12 12.

Is the output of printf ("%d %d", c++, c); also undefined?

I recently came across a post What is the correct answer for cout << c++ << c;? and was wondering whether the output of
int c = 0;
printf ("%d %d", c++, c);
is also undefined??
I have studied in lectures that post-fix and prefix operators increment value only after getting a semicolon. So according to me, the output 0 0 is correct !!!
I have studied in lectures that post-fix and prefix operators increment value only after getting a semicolon.
Send your lecturer to me so that I can politely point out his mistake.
Exactly when the side effect of either pre- or postfix ++ and -- is applied is unspecified, apart from the requirement that it happen before the next sequence point. In an expression like
x = a++ * b
a may be updated immediately after a++ has been evaluated, or the update may be deferred until a++ * b has been evaluated and the result assigned to x, or anywhere in between.
This is why expressions like i++ * i++ and printf("%d %d", c++, c) and a[i++] = i and a host of others are all bad juju. You will get different results based on the compiler, optimization settings, surrounding code, etc. The language standard explicitly leaves the behavior undefined so that the compiler is under no obligation to "do the right thing", whatever the right thing may be. Remember, the definition for undefined behavior is
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
This is a deliberate design decision - the rationale for leaving the order of these operations unspecified is to give the implementation freedom to rearrange the evaluation order for optimization purposes. However, in exchange for this freedom, certain operations will not have well-defined results.
Note that a compiler is free to try to detect these cases and issue a diagnostic; printf("%d %d", c++, c); would be easy enough to catch, but this would be a bugger to detect in the general case. Imagine if that had been written printf("%d %d", (*p)++, c); if p points to c, then the behavior is undefined, otherwise it's okay. If p is assigned in a different translation unit, then there's no way to know at compile time whether this is a problem or not.
This concept is not difficult to understand, yet it is one of the most consistently misunderstood (and mis-taught) aspects of the C language. No doubt this is why the Java and C# language specifications force a specific evaluation order for everything (all operands are evaluated left-to-right, and all side effects are applied immediately).
I have studied in lectures that post-fix and prefix operators increment value only after getting a semicolon
This is not how the standard describes it. A sequence point is a point in the program's execution at which all side effects of previous evaluations are complete. The comma between arguments to a function is not a sequence point, so the behavior there is undefined.
The evaluation order of function arguments is unspecified. There is no guarantee that the arguments to a function will be evaluated in the order (1, 2, N), so there is no guarantee that the increment will be evaluated before the second argument is passed.
So according to me, the output 0 0 is correct !!!
No, the behavior is undefined, so you cannot reasonably claim that the output will be 0 0.
The behavior of the program is undefined because it has violated the requirements of 6.5 Expressions:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
c++ and c are both evaluated without an intervening sequence point, and the prior value of c is read both to determine the value to be stored by c++, and to determine the value of the expression c.
The behaviour will be definitely undefined due to the undefined evaluation order of parameters. You can prove this "undefined output" doing some random testing:
printf("%d %d\n", c++, c);
// result: 0 1
printf("%d %d %d\n", c++, c, c++);
// result: 1 2 0
printf("%d %d %d %d\n", c++, c++, c++, c);
// result: 2 1 0 3
printf("%d %d %d %d\n", c++, c, c++, c);
// result: 1 2 0 2
printf("%d %d %d %d\n", c++, c, c, c);
// result: 0 1 1 1
You are right: it is undefined. The reason is that, though it is guaranteed that the three arguments to printf() will be evaluated before printf() is called, the sequence in which the three arguments are evaluated is undefined.
It is technically incorrect that the incrementation occurs only after the semicolon, incidentally. What the standard guarantees is that the incrementation will occur no later than the semicolon. [Actually, in your case, I believe that the standard guarantees that it will occur before control is passed to the printf() function -- but now this answer is starting to spin off into realms of pedantic trivia, so let me let the matter rest there!]
Anyway, in short, you are right. The behavior is undefined.
Update: As @R.. rightly observes, the undefined behavior comes from the lack of a sequence point between arguments. The standard is quite careful regarding the participles unspecified and undefined, so the correction is accepted with thanks.
This program exhibits a combination of both unspecified behavior and undefined behavior. Starting with the unspecified behavior, the draft C99 standard in section 6.5 paragraph 3 says:
The grouping of operators and operands is indicated by the syntax.
Except as specified later (for the function-call (), &&, ||, ?:, and
comma operators), the order of evaluation of subexpressions and the
order in which side effects take place are both unspecified.
It also says except as specified later and specifically cites function-call (), so we see that later on the draft standard in section 6.5.2.2 Function calls paragraph 10 says:
The order of evaluation of the function designator, the actual
arguments, and subexpressions within the actual arguments is
unspecified, but there is a sequence point before the actual call.
So we do not know whether the read of c or the evaluation of c++ will happen first at this line of code:
printf ("%d %d", c++, c);
Furthermore, section 6.5.2.4 Postfix increment and decrement operators, paragraph 2, says:
[...] After the result is obtained, the value of the operand is incremented. [...] The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
So all we know is that, when performing the post-increment, c will be updated after its value is read but before the next sequence point, which is right before printf is called, and nothing else. As for the undefined behavior, if we look at section 6.5 paragraph 2 from the draft standard, it says:
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
In the printf expression, c's prior value is being read in order to evaluate both c++ and c, and so we are now in undefined territory.

operator precedence problem? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Compilers and argument order of evaluation in C++
i have a print statement like below...
int i=0;
printf("%d,%d,%d,%d,%d,%d",i++,i,++i,i--,++i,i);
according to precedence i++,i,++i,i--,++i,i should be evaluated like below step by step...
0,i,++i,i--,++i,i // after this i=1;
0,i,++i,1,++i,i // after this i=0;
0,i,++i,1,1,i // after this i=1;
0,i,2,1,1,i // after this i=2;
0,2,2,1,1,2 // after this i=2;
and final result as i think from this logic should be...
0,2,2,1,1,2
but i am getting 2,2,2,1,2,2 what is the reason behind this?
btw i m using visual c++ 2010...
That's undefined behavior. The order of evaluation of the arguments to a function is specifically left unspecified by the C and C++ language standards in order to let the compiler produce the most optimal machine code across a broad range of hardware.
You're not allowed to modify a variable more than once between sequence points. The language standard says that sequence points only come at particular points in your code, such as at the semicolons that delimit statements. There is a sequence point after the initial assignment int i=0;, there is a sequence point after printf returns, and there is a sequence point after all of the arguments to printf have been evaluated but before printf actually gets called, but there is not a sequence point in between the evaluation of each of the arguments.
Not.
The comma operator here:
a,b;
and the comma which separates functions arguments:
f(a,b);
are different.
Only the real comma operator is a sequence point, and it orders the evaluation of its left and right operands (the a and b expressions in my example).
The comma between function arguments is not a sequence point, and the order of argument evaluation is unspecified (even the same compiler may evaluate them in a different order at different call sites). Also, it is illegal (undefined behaviour) to change the same lvalue (the i variable in your example) two or more times in the part of the program between sequence points. Every modification of the same object must be separated by a sequence point (e.g. with ;) from another modification of the same object.
The comma in your case is not the , operator that guarantees evaluation in sequence, but belongs to the function call syntax and the sequence in which function arguments are evaluated is undefined. So your code should surely exhibit undefined behaviour (which it seemingly does).
It's up to the compiler what code it generates. Evaluation order of the arguments of a function call is not defined.
