How undefined is undefined behavior? - c

I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>
int main()
{
int v = 0;
scanf("%d", &v);
if (v != 0)
{
int *p;
*p = v; // Oops
}
return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?

I'd say that the behavior is undefined only if the users inserts any number different from 0. After all, if the offending code section is not actually run the conditions for UB aren't met (i.e. the non-initialized pointer is not created neither dereferenced).
A hint of this can be found into the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that does an addition with user-provided data without doing extensive check on it is subject to this kind of undefined behavior - but an addition is UB only when the user provides such particular data.

Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
/* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
if (0) main(42); /* not a constraint violation */
}
int main(void) {
if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.

Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>
int
main()
{
int v;
scanf("%d", &v);
if (v != 0)
{
printf("Hello\n");
int *p;
*p = v; // Oops
}
return v;
}
What does this program do if you provide "1" as input? If you answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is non-zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.

If v is 0, your random pointer assignment never gets executed, and the function will return zero, so it is not undefined behaviour

When you declare variables (especially explicit pointers), a piece of memory is allocated (usually an int). This peace of memory is being marked as free to the system but the old value stored there is not cleared (this depends on the memory allocation being implemented by the compiler, it might fill the place with zeroes) so your int *p will have a random value (junk) which it has to interpret as integer. The result is the place in memory where p points to (p's pointee). When you try to dereference (aka. access this piece of the memory), it will be (almost every time) occupied by another process/program, so trying to alter/modify some others memory will result in access violation issues by the memory manager.
So in this example, any other value then 0 will result in undefined behavior, because no one knows what *p will point to at this moment.
I hope this explanation is of any help.
Edit: Ah, sorry, again few answers ahead of me :)

It is simple. If a piece of code doesn't execute, it doesn't have a behavior!!!, whether defined or not.
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.

I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants to when it sees that statement. Now, every compiler will handle it as expected, but they still have every right to do whatever they want to - including changing parts unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.

Your program is pretty-well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer, its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that unintialised variables do not have a known value, and your program (though it might compile) will not be correct.

Related

What is the difference between directly accessing a variable by name and accessing a variable by using *(&variable) in C?

Suppose we declare a variable
int i = 10;
And we have these 2 statements-
printf("%d",i);
printf("%d",*(&i));
These two statements print the same value, i.e. 10
From my understanding of pointers, not just their output is same, but the above two statements mean exactly the same. They are just two different ways of writing the same statement.
However, I came across an interesting code-
#include <stdio.h>
int main(){
const int i = 10;
int* pt1 = &i;
*pt1 = 20;
printf("%d\n", i);
printf("%d\n", *(&i));
return 0;
}
To my surprise, the result is-
10
20
This suggests that i and *(&i) don't mean the same if i is declared with the const qualifier. Can anyone explain?
The behavior of *pt1 = 20; is not defined by the C standard, because pt1 has been improperly set to point to the const int i. C 2018 6.7.3 says:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined…
Because of this, the behavior of the entire program is not defined by the C standard.
In C code with defined behavior, *(&i) is defined to produce the value of i. However, in code with behavior not defined by the C standard, the normal rules that would apply to *(&i) are canceled. It could produce the value that the const i was initialized to, it could produce the value the program attempted to change i to, it could produce some other value, it could cause the program to crash, or it could cause other behavior.
Think of what it means to declare something as const - you're telling the compiler you don't want the value of i to change, and that it should flag any code that has an expression like i = 20.
However, you're going behind the compiler's back and trying to change the value of i indirectly through pt1. The behavior of this action is undefined - the language standard places no requirements on the compiler or the runtime environment to handle this situation in any particular way. The code is erroneous and you should not expect a meaningful result.
One possible explanation (of many) for this result is that the i in the first printf statement was replaced with the constant 10. After all, you promised that the value of i would never change, so the compiler is allowed to make that optimization.
But since you take the address of i, storage must be allocated for it somewhere, and in this case it apparently allocated storage in a writable segment (const does not mean "put this in read-only memory"), so the update through pt1 was successful. The compiler actually evaluates the expression *(&i) to retrieve the current value of i, rather than replacing it with a constant. On a different platform, or in a different program, or in a different build of the same program, you may get a different result.
The moral of this story is "don't do that". Don't attempt to modify a const object through a non-const expression.

C unchecked function return undefined behavior

In c, this pattern is fairly common:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int init_ptr_or_return_err(int *p) {
srand(time(NULL));
// random to make code compile/demonstrate the question
int error = rand() % 2;
if (error) {
return 1;
}
*p = 10;
return 0;
}
int main() {
int a;
init_ptr_or_return_err(&a);
printf("a = %d\n", a);
return 0;
}
In the main function above, without checking the return code of the function, accessing the value of a might be undefined at runtime (but this is not statically determinable). So, it is usually wrapped in a block such as:
if (init_ptr_or_return_err(&a)) {
// error handling
} else {
// access a
}
In this case, the compiler knows that a is initialized in the else because the function returns 0 if and only if it sets a. So, technically, accessing a in the else is defined, but accessing a in the if is undefined. However, return 0 could easily be "return some fixed, but statically unknown value from a file" (and then check just that value before accessing a). So in either case, it isn't statically determinable whether a is initialized or not.
Therefore, it seems to me like in general, the compiler cannot statically decide if this is undefined behavior or not and therefore should not be able to e.g. optimize it out.
What are the exact semantics of such code (is it undefined behavior, something else, or is there a difference between static and runtime undefined behavior) and where does the standard specify this? If this is not defined by the standard, I am using gcc, so answers in the context of gcc would be helpful.
The vast majority of undefined behavior is not statically determinate. Most undefined behavior is of the form "if this statement is reached, and these conditions are met, the program has undefined behavior".
That's the case here. When the program is invoked at a time such that rand() returns an odd number, it has undefined behavior. When it's invoked at a time such that rand() returns an even number, the behavior is well-defined.
Further, the compiler is free to assume you will only invoke the program at a time when rand() returns an even number. For example it might optimize out the branch to the return 1; case, and thereby always print 10.
This code does not necessarily invoke undefined behavior. It is a wide-spread myth that reading an uninitialized variable always invokes undefined behavior. UB only occurs in two special cases:
Reading uninitalized variables that didn't have their address taken, or
Running on an exotic systems with trap representations for plain integers, where an indeterminate value could be a trap.
On mainstream 2's complement platforms (x86/x64, ARM, PowerPC, almost anything...), this code will merely use an unspecified value and invoke unspecified behavior. This is because the variable had its address taken. This is explained in detail here.
Meaning that the result isn't reliable, but the code will execute as expected, without optimizations going bananas etc.
Indeed the compiler will most likely not optimize out the function, but that is because of the time() call and not because of some poorly-defined behavior.

Suppress Divide or Mod by zero Compiler Error in C

I am writing a C program where I want to trigger an exception such that the program crashes on Windows and invokes the Windows Error Reporting.
I thought, one good way to trigger a crash would be to divide by zero. However, when I try to do that, the compiler wouldn't compile the program and give the error: C2124.
Is there a way that I can force the compiler to compile the program and ignore the above divide by zero statement in the code?
Or should I find another way to crash my program? :)
#include <stdio.h>
int main(int argc, char **argv)
{
return 1/0;
}
abort, defined in <stdlib.h> is the preferred way to cause abnormal program termination. However, a solution to your specific intent to cause a divide-by-zero is to conceal the divisor from the compiler:
volatile int x = 0;
return 1/x;
The keyword volatile tells the compiler that x may be changed in ways unknown to the compiler. This prevents the compiler from knowing that it is zero, so the compiler must generate code to perform the division at run-time in case x is changed to something other than zero.
Of course, the behavior of division by zero is not defined, so it is not guaranteed to cause program termination.
Also keep in mind that “If you lie to the compiler, it will get its revenge.” (Henry Spencer).
1/0 is evaluated at compile time. The compiler is smart enough to detect that it's not going to work.
I'd do either:
int a = 0;
return 1/a;
(you may get a warning but that's going to compile)
Or trigger a null pointer access:
int *p = NULL;
return *p;
It's hard to get a C program to crash in a well-defined way. Really it's a contradiction in terms.
Favourite ways are inducing stack overflows, null pointer dereferences, and divisions by zero. But compilers are getting better at spotting these tricks. Recursions can get unrolled to loops, and since 1/0 is a compile-time evaluable constant expression, and it's behaviour is undefined, a compiler is allowed to fail compilation.
A more reliable way is to use void abort(void) from <stdlib.h>. That at least is guaranteed by the C standard not to return to the caller:
#include <stdlib.h>
int main(int, char **)
{
abort();
}

Is the behaviour of a program that has undefined behaviour on an unreachable path defined? [duplicate]

This question already has answers here:
Can code that will never be executed invoke undefined behavior?
(9 answers)
Closed 4 years ago.
Consider
void swap(int* a, int* b)
{
if (a != b){
*a = *a ^ *b;
*b = *a ^ *b;
*a = *a ^ *b;
}
}
int main()
{
int a = 0;
int b = 1;
swap(&a, &b); // after this b is 0 and a is 1
return a > b ? 0 : a / b;
}
swap is an attempt to fool the compiler into not optimising out the program.
Is the behaviour of this program defined? a / b is never reachable, but if it was then you'd get a division by zero.
It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
(C2011, 3.4.3/1; emphasis added)
Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.
* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).
The behavior of an expression that is not evaluated is irrelevant to the behavior of a program. Behavior that would be undefined if the expression were evaluated has no bearing on the behavior of the program.
If it did, then this code would be useless:
if (p != NULL)
…; // Use pointer p.
(Your XORs could have undefined behavior, as they may produce a trap representation. You can defeat optimization for academic examples like this by declaring an object to be volatile. If an object is volatile, the C implementation cannot know whether its value may change due to external means, so each use of the object requires the implementation to read its value.)
In general, code which would invoke Undefined Behavior if executed must not have any effect if it is not executed. There are, however, a few cases where real-world implementations may behave in contrary fashion and refuse to generate code which, while not a constraint violation, could not possibly execute in defined behavior.
extern struct foo z;
int main(int argc, char **argv)
{
if (argc > 2) z;
return 0;
}
By my reading of the Standard, it explicitly characterizes lvalue conversions on incomplete types as invoking Undefined Behavior (among other things, it's unclear what an implementation could generate code for such a thing), so the Standard would impose no requirements upon behavior if argc is 3 or more. I can't identify any constraint in the Standard that the above code would violate, however, nor any reason behavior should not be fully defined if argc is 2 or less. Nonetheless, many compilers including gcc and clang reject the above code entirely.

What main(++i) will return in C

I have a program like this.
‪#include<stdio.h>
#include<stdlib.h>
int main(int i) { /* i will start from value 1 */
if(i<10)
printf("\n%d",main(++i)); /* printing the values until i becomes 9 */
}
output :
5
2
2
2
Can anyone explain how the output is coming ?? what main(++i) is returning for each iteration.
Also it is producing output 5111 if i remove the \n in the printf function.
Thanks in advance.
First of all, the declaration of main() is supposed to be int main(int argc, char **argv). You cannot modify that. Even if your code compiles, the system will call main() the way it is supposed to be called, with the first parameter being the number of parameters of your program (1 if no parameter is given). There is no guarantee it will always be 1. If you run your program with additional parameters, this number will increase.
Second, your printf() is attempting to print the return value of main(++i), howover, your main() simply don't return anything at all. You have to give your function a return value if you expect to see any coherence here.
And finally, you are not supposed to call your own program's entrypoint, much less play with recursion with it. Create a separate function for this stuff.
Here's what the C Draft Standard (N1570) says about main:
5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no
prototype for this function. It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be
used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent or in some other implementation-defined manner.
Clearly, the main function in your program is neither of the above forms. Unless your platform supports the form you are using, your program is exhibiting undefined behavior.
This program has undefined behavior (UB) all over the place, and if you have a single instance undefined behavior in your program, you can't safely assume any output or behavior of your program - It legally can happen anything (although in real world the effects often are somewhat localized near the place of UB in the code.
The old C90 standard listed are more than 100 (if i recall right) situations of UB and there is a not known number of UBs on top, which is behavior for situations, the standard do not describe. A set of situations, that are UB exists, for every C and C++ Standard.
In your case (without consulting standards) instances of UB are at least:
not returning an value of a function that is declared with a return value. (exception: calling main the FIRST time - thanks, Jim for the comments)
defining (and calling) main other than with the predefined forms of the standard, or as specified (as implementation defined behavior) by your compiler.
Since you have at least one instance of UB in your program, speculations about the results, are somewhat... speculative and must make assumptions about your compiler, your operating system, hardware, and even software running on parallel, that are normally not documented or can be known.
You are not initializing i, so by default value will be taken from the address where it is stored in RAM.
This code will produce garbage output if you run the code multiple times after restarting your computer.
The output will also depend on compiler.
I'm surprised that even compiles.
When the operating system actually runs the program and main() gets called, two 32 (or 64) bit values are passed to it. You can either ignore them by declaring main(void), or use them by declaring main(int argc, char** args).
As the above prototype suggests, the first value passed is a count of the number of command-line arguments that are being passed to the process, and the second is a pointer to where a list of these arguments is stored in memory, likely on the program's local stack.
The value of argc is always at least 1, because the first item string in args is always the name of the program itself, generated by the OS.
Regarding your unexpected output, I'd say something is not getting pulled off or pushed onto the stack, so variables are getting mixed up. This is either due to the incomplete argument list for main() or the fact that you've declared main to return an int, but haven't returned anything.
I think, the main method is calling itself inside the main method.
to increment the value of a variable, i++ is printing the value of i before it increment while ++i it increment the value of i first before it print the value of i.
you could use this..
int x=0;
main()
{
do
{
printf(x++);
}while (i<10);
}

Resources