What is Undefined Behaviour in C? [duplicate] - c

This question already has answers here:
Undefined, unspecified and implementation-defined behavior
(9 answers)
Closed 9 years ago.
What is undefined behaviour in C?
I am using the GCC compiler. In some cases I get a correct value even though the program's output was supposed to be undefined. I ran those programs several times and the result was consistent, while for some other programs the result was inconsistent. So in which cases should I consider that the program's behaviour is really undefined? Are there any RULES for this?

Undefined behaviour means the compiler can emit any code it likes. Your program might show the results you expect, or it might format your hard drive, or it might start sending emails to the Taliban. Anything can happen.

The definition of undefined behavior:
C11(ISO/IEC 9899:201x) §3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
There is also a list of undefined behaviors in C11 §J.2 Undefined behavior

A behavior, upon use of a non-portable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements.
Example:
i = ++i;
For more you can read this.
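A minimal, self-contained way to try this (a sketch; whatever gets printed, if anything, is not guaranteed by the standard, and different compilers or options may differ):
#include <stdio.h>

int main(void)
{
    int i = 1;
    i = ++i;           /* two unsequenced modifications of i: undefined behaviour in C */
    printf("%d\n", i); /* whatever this prints is not something you can rely on */
    return 0;
}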

I think, in simple words, if the behavior of a construct is not guaranteed to be consistent across all compilers or all situations, you can call it undefined behavior.

This can be illustrated by an example:
#include <stdio.h>

int *ptr;

void func2(void)
{
    int k = 300;   /* k is likely to reuse the stack slot that t occupied */
}

void func1(void)
{
    int t = 100;
    ptr = &t;      /* ptr now points to a local whose lifetime ends when func1 returns */
}

int main(int argc, char *argv[])
{
    func1();
    printf("The value of t=%d\r\n", *ptr);
    func2();
    printf("The value of t=%d\r\n", *ptr);
    return 0;
}
On my machine, I got the following.
joshis1#(none) temp]$ ./ud.out
The value of t=100
The value of t=300
This shows that the value of t was not guaranteed. Once t went out of scope, its stack space was reused for k, so ptr was still pointing at the same memory location even though the object whose address it held no longer existed. You will get a consistent result if you don't call func2(). The compiler does not guarantee the outcome: this is called undefined behaviour.
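For contrast, a minimal sketch of one possible fix is to give t static storage duration, so the object outlives the call and the pointer stays valid:
void func1(void)
{
    static int t = 100;   /* static storage duration: t lives for the entire program */
    ptr = &t;
}
With this change, calling func2() can no longer overwrite the value, and both printf calls reliably print 100.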

Related

What is the expected output of initializing a variable to itself? [duplicate]

I noticed just now that the following code can be compiled with clang/gcc/clang++/g++, using c99, c11, c++11 standards.
int main(void) {
    int i = i;
}
and even with -Wall -Wextra, none of the compilers even reports warnings.
By modifying the code to int i = i + 1; and with -Wall, they may report:
why.c:2:13: warning: variable 'i' is uninitialized when used within its own initialization [-Wuninitialized]
int i = i + 1;
~ ^
1 warning generated.
My questions:
Why is this even allowed by compilers?
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
Because i is uninitialized when used to initialize itself, it has an indeterminate value at that time. An indeterminate value can be either an unspecified value or a trap representation.
If your implementation supports padding bits in integer types and if the indeterminate value in question happens to be a trap representation, then using it results in undefined behavior.
If your implementation does not have padding in integers, then the value is simply unspecified and there is no undefined behavior.
EDIT:
To elaborate further, the behavior can still be undefined if the address of i is never taken. This is detailed in section 6.3.2.1p2 of the C11 standard:
If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage
class (never had its address taken), and that object is uninitialized
(not declared with an initializer and no assignment to it
has been performed prior to use), the behavior is undefined.
So if you never take the address of i, then you have undefined behavior. Otherwise, the statements above apply.
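A small sketch contrasting the two cases this answer describes (the comments state the behaviour as argued above, not additional guarantees):
#include <stdio.h>

int main(void)
{
    int i;               /* address never taken: i could have been declared register */
    /* printf("%d\n", i);   reading i here would be undefined behaviour per 6.3.2.1p2 */

    int j;
    int *pj = &j;        /* taking the address: j could not have been declared register */
    (void)pj;
    printf("%d\n", j);   /* absent trap representations in int, this prints an unspecified value */
    return 0;
}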
This is about the warning; it's not related to the standard.
Warnings are heuristics with an "optimistic" approach. The warning is issued only when the compiler is sure that it's going to be a problem. In cases like this you have better luck with clang or the newest versions of gcc, as stated in the comments (see another related question of mine: why am I not getting an "used uninitialized" warning from gcc in this trivial example?).
anyway, in the first case:
int i = i;
does nothing, since i==i already. It is possible that the assignment is completely optimized out as it's useless. With compilers which don't "see" self-initialization as a problem you can do this without a warning:
int i = i;
printf("%d\n",i);
Whereas this triggers a warning all right:
int i;
printf("%d\n",i);
Still, it's bad enough not to be warned about this, since from now on i is seen as initialized.
In the second case:
int i = i + 1;
A computation between an uninitialized value and 1 must be performed. Undefined behaviour happens there.
I believe you are fine with getting the warning in the case of
int i = i + 1;
as expected; however, you expect the warning to be displayed in the case of
int i = i;
as well.
Why is this even allowed by compilers?
There is nothing inherently wrong with the statement. See the related discussions:
Why does the compiler allow initializing a variable with itself?
Why is initialization of a new variable by itself valid?
for more insight.
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
This is undefined behavior, as the type int can have trap representations and you have never taken the address of the variable in question. So, technically, you'll face UB as soon as you try to use the (indeterminate) value stored in variable i.
You should turn on your compiler warnings. In gcc, compile with -Winit-self to get a warning in C.
For C++, -Winit-self is already enabled by -Wall.

Change pointer's value in a function [duplicate]

This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 4 years ago.
I want to modify my pointer's value in a function but I don't get why this example works.
#include <stdio.h>

void foo(int **p);

int main()
{
    int i = 97;
    int *p = &i;
    foo(&p);
    printf("%d", *p);
    return 0;
}

void foo(int **p)
{
    int j = 2;
    *p = &j;
    printf("%d", **p);
}
The output is 2 2, but how can that be possible if j doesn't exist anymore in main? What does p point to?
I would have expected that the second time I printed the value of *p (in main) there would be garbage.
Dereferencing a pointer that points to an invalid object (e.g. to one with automatic storage duration that has gone out of scope) is undefined behaviour (cf, for example, this online C standard draft):
3.4.3
1 undefined behavior behavior, upon use of a nonportable or erroneous
program construct or of erroneous data, for which this International
Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
So undefined behaviour means that anything might happen, even that it "works as intended" as in your case. Yet you must not rely on such behaviour.
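If the goal is simply to let foo change what p points to in a well-defined way, one common approach (a sketch, assuming dynamic allocation is acceptable here) is to point it at storage that outlives the call:
#include <stdio.h>
#include <stdlib.h>

void foo(int **p)
{
    int *j = malloc(sizeof *j);   /* allocated storage outlives the function call */
    if (j == NULL)
        return;                   /* leave *p untouched if allocation fails */
    *j = 2;
    *p = j;
    printf("%d", **p);
}

int main(void)
{
    int i = 97;
    int *p = &i;
    foo(&p);
    printf("%d", *p);             /* well-defined: p points to a live object */
    if (p != &i)
        free(p);                  /* free only if foo() actually replaced p */
    return 0;
}
A static local in foo would work just as well for a toy example; the key point is that the pointed-to object must still be alive when main dereferences p.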

C unchecked function return undefined behavior

In C, this pattern is fairly common:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int init_ptr_or_return_err(int *p) {
    srand(time(NULL));
    // random to make code compile/demonstrate the question
    int error = rand() % 2;
    if (error) {
        return 1;
    }
    *p = 10;
    return 0;
}

int main() {
    int a;
    init_ptr_or_return_err(&a);
    printf("a = %d\n", a);
    return 0;
}
In the main function above, without checking the return code of the function, accessing the value of a might be undefined at runtime (but this is not statically determinable). So, it is usually wrapped in a block such as:
if (init_ptr_or_return_err(&a)) {
    // error handling
} else {
    // access a
}
In this case, the compiler knows that a is initialized in the else because the function returns 0 if and only if it sets a. So, technically, accessing a in the else is defined, but accessing a in the if is undefined. However, return 0 could easily be "return some fixed, but statically unknown value from a file" (and then check just that value before accessing a). So in either case, it isn't statically determinable whether a is initialized or not.
Therefore, it seems to me like in general, the compiler cannot statically decide if this is undefined behavior or not and therefore should not be able to e.g. optimize it out.
What are the exact semantics of such code (is it undefined behavior, something else, or is there a difference between static and runtime undefined behavior) and where does the standard specify this? If this is not defined by the standard, I am using gcc, so answers in the context of gcc would be helpful.
The vast majority of undefined behavior is not statically determinate. Most undefined behavior is of the form "if this statement is reached, and these conditions are met, the program has undefined behavior".
That's the case here. When the program is invoked at a time such that rand() returns an odd number, it has undefined behavior. When it's invoked at a time such that rand() returns an even number, the behavior is well-defined.
Further, the compiler is free to assume you will only invoke the program at a time when rand() returns an even number. For example it might optimize out the branch to the return 1; case, and thereby always print 10.
This code does not necessarily invoke undefined behavior. It is a wide-spread myth that reading an uninitialized variable always invokes undefined behavior. UB only occurs in two special cases:
Reading uninitialized variables that didn't have their address taken, or
Running on an exotic system with trap representations for plain integers, where an indeterminate value could be a trap.
On mainstream 2's complement platforms (x86/x64, ARM, PowerPC, almost anything...), this code will merely use an unspecified value and invoke unspecified behavior. This is because the variable had its address taken. This is explained in detail here.
Meaning that the result isn't reliable, but the code will execute as expected, without optimizations going bananas etc.
Indeed the compiler will most likely not optimize out the function, but that is because of the time() call and not because of some poorly-defined behavior.

Is the behaviour of a program that has undefined behaviour on an unreachable path defined? [duplicate]

This question already has answers here:
Can code that will never be executed invoke undefined behavior?
(9 answers)
Closed 4 years ago.
Consider
void swap(int* a, int* b)
{
    if (a != b) {
        *a = *a ^ *b;
        *b = *a ^ *b;
        *a = *a ^ *b;
    }
}

int main()
{
    int a = 0;
    int b = 1;
    swap(&a, &b); // after this b is 0 and a is 1
    return a > b ? 0 : a / b;
}
swap is an attempt to fool the compiler into not optimising out the program.
Is the behaviour of this program defined? a / b is never reached, but if it were, you'd get a division by zero.
It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
(C2011, 3.4.3/1; emphasis added)
Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.
* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).
The behavior of an expression that is not evaluated is irrelevant to the behavior of a program. Behavior that would be undefined if the expression were evaluated has no bearing on the behavior of the program.
If it did, then this code would be useless:
if (p != NULL)
    …; // Use pointer p.
(Your XORs could have undefined behavior, as they may produce a trap representation. You can defeat optimization for academic examples like this by declaring an object to be volatile. If an object is volatile, the C implementation cannot know whether its value may change due to external means, so each use of the object requires the implementation to read its value.)
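As a sketch of that volatile suggestion (the only goal here is to stop the implementation from folding the values away at compile time):
#include <stdio.h>

int main(void)
{
    volatile int a = 0;   /* volatile: every read and write must really be performed */
    volatile int b = 1;
    int t = a;            /* an ordinary swap through a temporary, no XOR tricks needed */
    a = b;
    b = t;
    printf("%d %d\n", a, b);
    return 0;
}
Because a and b are volatile, the implementation cannot assume it knows their values, so neither the swap nor a later comparison of them can simply be optimised away.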
In general, code which would invoke Undefined Behavior if executed must not have any effect if it is not executed. There are, however, a few cases where real-world implementations may behave in contrary fashion and refuse to generate code which, while not a constraint violation, could not possibly execute in defined behavior.
extern struct foo z;

int main(int argc, char **argv)
{
    if (argc > 2) z;
    return 0;
}
By my reading of the Standard, it explicitly characterizes lvalue conversions on incomplete types as invoking Undefined Behavior (among other things, it's unclear what an implementation could generate code for such a thing), so the Standard would impose no requirements upon behavior if argc is 3 or more. I can't identify any constraint in the Standard that the above code would violate, however, nor any reason behavior should not be fully defined if argc is 2 or less. Nonetheless, many compilers including gcc and clang reject the above code entirely.

How undefined is undefined behavior?

I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>

int main()
{
    int v = 0;
    scanf("%d", &v);
    if (v != 0)
    {
        int *p;
        *p = v; // Oops
    }
    return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?
I'd say that the behavior is undefined only if the user enters a number different from 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).
A hint of this can be found in the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" were instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that does an addition with user-provided data without doing extensive checks on it is subject to this kind of undefined behavior - but the addition is UB only when the user provides data that actually makes it overflow.
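A sketch of what such a check might look like before adding user input to a value (base and the exact messages are just for illustration):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int base = 1000;                       /* hypothetical value we want to add to */
    int v = 0;
    if (scanf("%d", &v) != 1)
        return 1;
    if ((v > 0 && base > INT_MAX - v) ||   /* adding v would overflow */
        (v < 0 && base < INT_MIN - v)) {   /* adding v would underflow */
        fprintf(stderr, "input out of range\n");
        return 1;
    }
    printf("%d\n", base + v);              /* this addition is now well-defined */
    return 0;
}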
Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
    /* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
    if (0) main(42); /* not a constraint violation */
}

int main(void) {
    if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.
Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>

int
main()
{
    int v;
    scanf("%d", &v);
    if (v != 0)
    {
        printf("Hello\n");
        int *p;
        *p = v; // Oops
    }
    return v;
}
What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is non-zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
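The "little test program" suggested above might look like this (a sketch; note that the INT_MAX + 1 expression itself overflows, and with optimisation enabled the last line is where the surprise shows up):
#include <limits.h>
#include <stdio.h>

int test(int x) { return x + 1 > x; }

int main(void)
{
    printf("INT_MAX       = %d\n", INT_MAX);
    printf("INT_MAX + 1   = %d\n", INT_MAX + 1);    /* signed overflow: undefined behaviour */
    printf("test(INT_MAX) = %d\n", test(INT_MAX));  /* GCC may fold this call to a constant 1 */
    return 0;
}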
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.
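For completeness, a sketch of the same program with the undefined store removed, so its behaviour no longer depends on what the compiler chooses to assume:
#include <stdio.h>

int main(void)
{
    int v = 0;
    if (scanf("%d", &v) != 1)
        return 0;
    if (v != 0)
    {
        int target = 0;
        int *p = &target;   /* p now points at a real object */
        *p = v;             /* a perfectly well-defined store */
    }
    return v;
}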
If v is 0, your random pointer assignment never gets executed and the function will return zero, so it is not undefined behaviour.
When you declare a variable (especially an explicit pointer), a piece of memory is set aside for it (usually the size of an int). That piece of memory is handed to you without its old contents being cleared (this depends on how the implementation handles it; it might fill the space with zeroes), so your int *p will hold a random value (junk) that is then interpreted as an address. When you try to dereference it (i.e. access that piece of memory), the location will almost always be memory your program is not supposed to touch, so trying to read or modify it will typically result in an access violation from the memory manager.
So in this example, any value other than 0 will result in undefined behavior, because no one knows where p will point at that moment.
I hope this explanation is of some help.
Edit: Ah, sorry, again a few answers ahead of me :)
It is simple. If a piece of code doesn't execute, it doesn't have any behavior, whether defined or not.
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.
I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants to when it sees that statement. Now, every compiler will handle it as expected, but they still have every right to do whatever they want to - including changing parts unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.
Your program is pretty well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer; its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that uninitialised variables do not have a known value, and your program (though it might compile) will not be correct.
