In C, this pattern is fairly common:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int init_ptr_or_return_err(int *p) {
    srand(time(NULL));
    // random to make code compile/demonstrate the question
    int error = rand() % 2;
    if (error) {
        return 1;
    }
    *p = 10;
    return 0;
}

int main() {
    int a;
    init_ptr_or_return_err(&a);
    printf("a = %d\n", a);
    return 0;
}
In the main function above, without checking the return code of the function, accessing the value of a might be undefined at runtime (but this is not statically determinable). So, it is usually wrapped in a block such as:
if (init_ptr_or_return_err(&a)) {
    // error handling
} else {
    // access a
}
In this case, the compiler knows that a is initialized in the else because the function returns 0 if and only if it sets a. So, technically, accessing a in the else is defined, but accessing a in the if is undefined. However, return 0 could easily be "return some fixed, but statically unknown value from a file" (and then check just that value before accessing a). So in either case, it isn't statically determinable whether a is initialized or not.
Therefore, it seems to me that, in general, the compiler cannot statically decide whether this is undefined behavior, and therefore should not be able to, e.g., optimize it out.
What are the exact semantics of such code (is it undefined behavior, something else, or is there a difference between static and runtime undefined behavior) and where does the standard specify this? If this is not defined by the standard, I am using gcc, so answers in the context of gcc would be helpful.
The vast majority of undefined behavior is not statically determinate. Most undefined behavior is of the form "if this statement is reached, and these conditions are met, the program has undefined behavior".
That's the case here. When the program is invoked at a time such that rand() returns an odd number, it has undefined behavior. When it's invoked at a time such that rand() returns an even number, the behavior is well-defined.
Further, the compiler is free to assume you will only invoke the program at a time when rand() returns an even number. For example it might optimize out the branch to the return 1; case, and thereby always print 10.
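To keep the behavior defined no matter what rand() returns, the caller can test the return value before using a, as the question's second snippet already suggests. A minimal complete sketch of that guarded pattern (not the only way to write it):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int init_ptr_or_return_err(int *p) {
    srand(time(NULL));
    int error = rand() % 2;
    if (error) {
        return 1;   /* error: *p was not written */
    }
    *p = 10;
    return 0;
}

int main() {
    int a;
    if (init_ptr_or_return_err(&a)) {
        fprintf(stderr, "init failed, a is never read\n");
        return 1;
    }
    printf("a = %d\n", a);   /* only reached when a was initialized */
    return 0;
}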
This code does not necessarily invoke undefined behavior. It is a widespread myth that reading an uninitialized variable always invokes undefined behavior. UB only occurs in two special cases:
Reading uninitialized variables that never had their address taken, or
Running on an exotic system with trap representations for plain integers, where an indeterminate value could be a trap.
On mainstream 2's complement platforms (x86/x64, ARM, PowerPC, almost anything...), this code will merely use an unspecified value and invoke unspecified behavior. This is because the variable had its address taken. This is explained in detail here.
Meaning that the result isn't reliable, but the code will execute as expected, without optimizations going bananas etc.
Indeed the compiler will most likely not optimize out the function, but that is because of the time() call and not because of some poorly-defined behavior.
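A minimal sketch of the distinction this answer draws; the relevant paragraph is C11 6.3.2.1 p2, and the claim that the address-taken read is merely unspecified is exactly the claim made above, so treat the comments as illustrative rather than normative:

#include <stdio.h>

int main(void) {
    int a;              /* address never taken: could have been declared register */
    int b;              /* address taken below */
    int *p = &b;

    /* printf("%d\n", a);    reading a would be undefined behavior per C11 6.3.2.1 p2,
                              because a never has its address taken */

    printf("%d\n", *p); /* per this answer: on mainstream two's complement platforms
                           this merely prints an unspecified (garbage) value */
    return 0;
}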
Hey, I'm working on an exercise where I have to write a function in C declared to return unsigned, and I have a question: do I have to return something because of the function's return type, or is it optional?
In normal use, any function declared to return a value ought to return a value. This is largely a matter of good programming practice. Failing to return a value is often a sign of an error. However, this is not required by the C standard, and there are two exceptions.
C 2018 6.9.1 12 allows a function to terminate by reaching the end of its code without returning a value, provided the caller does not use the value of the function:
Unless otherwise specified, if the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
One case where a function might return a value in some situations and not in others is where it is designed to perform various different actions upon command. For example:
unsigned DoSomething(int command, int parameters[])
{
    switch (command)
    {
        case GetSomething:
            …
            return Something;
        case RememberSomething:
            Something = parameters[0];
            …
            break;
    }
}
Such designs may be error-prone and ought to be considered carefully. Since the function is declared to return a value, a programmer might expect it to do so and inadvertently assign the value of the function to another object in a situation where no value is returned, resulting in undefined behavior.
Additionally, a function may not return at all, and there is a function specifier for explicitly telling the compiler this is so, _Noreturn, as specified in 6.7.4. This behavior is typically used only in abnormal situations. For example, the standard functions abort and exit terminate the program, so they do not return. Of course, they are declared with void, so they never return anything. However, you might have a function that calls exit in some circumstances and returns an integer in others.
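A hypothetical sketch of that last case (the function name and the error handling are made up for illustration): a function that returns an integer on success but calls exit, which C11 declares _Noreturn in <stdlib.h>, on a fatal error, so control never falls off the end:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: returns a positive count on success,
   terminates the program on a fatal error. exit() is declared
   _Noreturn (see 6.7.4), so there is no path that reaches the
   closing } without a return. */
int read_count_or_die(const char *text)
{
    int n = atoi(text);
    if (n <= 0) {
        fprintf(stderr, "bad count: %s\n", text);
        exit(EXIT_FAILURE);
    }
    return n;
}

int main(void)
{
    printf("%d\n", read_count_or_die("42"));
    return 0;
}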
It is a good idea to have all your non-void functions return something, but if you exit a function without a return statement (i.e. by reaching the closing }) and try to access its result, the behavior is undefined.
If you explore different compilers on Godbolt, with the following code as the C source:
int n()
{
    int a = 1234;
}
You will notice that some compilers issue a warning on this behavior depending on their warning levels.
The value you get if you try to use the result of the function as an int, for example in printf("%d\n", n());, is undefined: there is no guarantee about what value you will get, whether your program will crash, or whether it will do something else entirely.
That said, depending on the architecture you're compiling for, you may get a local value from your function as the return value. For example, on the x86 architecture, just about every calling convention specifies that the return value is whatever is in the register eax when the function returns. So, if the generated code for the function n above stores the local variable a in eax, the "return value" (quote-unquote) will be 1234.
This is not something you should ever rely on, though.
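A sketch illustrating that unreliability, assuming gcc or clang on x86 without optimization; the exact value observed is anyone's guess, which is the point:

#include <stdio.h>

/* Deliberately falls off the end without a return statement
   (compilers typically warn about this). */
int n(void)
{
    int a = 1234;
}

int main(void)
{
    /* Using the "result" is undefined behavior: you might see 1234 if the
       compiler happened to leave a in the return register (eax on x86),
       or any other value, or the optimizer may do something else entirely. */
    printf("%d\n", n());
    return 0;
}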
/* implementation of strcmp */
#include <stdio.h>
#include <string.h>

/* length of the string */
static const int MAX_LENGTH = 4;

/* gets string comparison's return value, i.e. the difference between the first unmatched characters */
int getStrCmpReturn(char[], char[]);

/* gets the maximum of the two integers */
int max(int, int);

int main()
{
    char string1[MAX_LENGTH];
    char string2[MAX_LENGTH];

    gets(string1);
    gets(string2);

    printf("\n%d", getStrCmpReturn(string1, string2));
    return 0;
}

int getStrCmpReturn(char string1[], char string2[])
{
    //int test = 50;
    int strCmpReturn = 0;
    int i;
    for (i = 0; i < max((int)strlen(string1), (int)strlen(string2)); i++)
    {
        if (string1[i] != string2[i])
        {
            strCmpReturn = string1[i] - string2[i];
            break;
        }
    }
    return strCmpReturn; // not required, why?
}

int max(int string1Length, int string2Length)
{
    if (string1Length >= string2Length)
    {
        return string1Length;
    }
    else
    {
        return string2Length;
    }
}
Look at the definition of the function getStrCmpReturn(). It is seen that if the return statement is removed or commented out, the function still returns the value stored in the variable strCmpReturn. Even if an extra variable is added to it, like int test = 50; (shown in the comments), the function still returns the value stored in the variable strCmpReturn.
How is the compiler able to guess that the value in "strCmpReturn" is to be returned, and not the ones stored in other variables like "test" or "i"?
The compiler is not guessing.
You have (without the required return) some undefined behavior. Be scared.
What might happen is that your particular compiler (with your particular compilation flag on your particular machine) has filled (by accident, bad luck or whatever reason) a processor register which contains some apparently suitable return value (as requested by the relevant ABI and calling conventions).
With different compilers (or different versions of them) or a different operating system or computer, or different optimizations flags you could observe some other behavior.
A compiler might use random numbers to allocate registers (or to make other decisions), but in practice compilers usually don't; in other words, compiler writers try to make their compilers somewhat deterministic, even though the C11 standard (read n1570) does not require that.
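One practical way to make this class of accident hard to miss, assuming gcc or clang, is to promote the relevant warning to an error by compiling with -Wall -Werror=return-type. A small sketch of the kind of code that then gets rejected (the function is made up for illustration):

/* Compile with:  gcc -Wall -Werror=return-type file.c
   Falling off the end of a non-void function then becomes a hard
   compile-time error ("control reaches end of non-void function")
   instead of silently returning whatever happens to be in the
   return register. */
int sign(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    /* missing "return 0;" here is what -Wreturn-type flags */
}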
6.9.1 Function definitions
...
12 If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
C 2011 online draft
In plain English, the behavior of this code is not predictable. It's working as expected for you with your particular combination of hardware, OS, and compiler, but that may not be the case with a different compiler, or even in a different program using the same compiler.
"Undefined behavior" means that the compiler and runtime environment are not required to "do the right thing", whatever the right thing would be. The code may work as expected, or it may crash immediately, or it may corrupt other data leading to a crash later on, or any of a hundred other outcomes.
C's definition is a bit loose in places. There is a constraint (i.e. a semantic rule) that says if a return statement appears in a function that returns anything other than void, then it must be followed by an expression (the return value). Similarly, there's a constraint that says if a return statement appears in a function returning void, then it must not be followed by an expression. However, there are no constraints that say a return statement must be present in either case.
Believe it or not, knowing the history of C, this makes sense. C didn't originally have a void type, and there wasn't a good way to distinguish between functions that computed and returned a value and functions that just executed some statements. It was a bit of a pain to force a return on something whose value would never be used anyway, so the presence of a return statement is not enforced by either the grammar or any constraints.
Consider
void swap(int* a, int* b)
{
    if (a != b) {
        *a = *a ^ *b;
        *b = *a ^ *b;
        *a = *a ^ *b;
    }
}

int main()
{
    int a = 0;
    int b = 1;
    swap(&a, &b); // after this b is 0 and a is 1
    return a > b ? 0 : a / b;
}
swap is an attempt to fool the compiler into not optimising out the program.
Is the behaviour of this program defined? a / b is never reachable, but if it was then you'd get a division by zero.
It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
(C2011, 3.4.3/1; emphasis added)
Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.
* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).
The behavior of an expression that is not evaluated is irrelevant to the behavior of a program. Behavior that would be undefined if the expression were evaluated has no bearing on the behavior of the program.
If it did, then this code would be useless:
if (p != NULL)
…; // Use pointer p.
(Your XORs could have undefined behavior, as they may produce a trap representation. You can defeat optimization for academic examples like this by declaring an object to be volatile. If an object is volatile, the C implementation cannot know whether its value may change due to external means, so each use of the object requires the implementation to read its value.)
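A minimal sketch of that volatile suggestion: because the implementation must actually read volatile objects at run time, it cannot fold the comparison or the division away at compile time, yet the division is still never evaluated:

#include <stdio.h>

int main(void)
{
    volatile int a = 1;   /* volatile: the compiler may not assume it knows these values */
    volatile int b = 0;

    /* At run time a > b is true, so a / b is never evaluated and there is
       no division by zero; but the compiler cannot prove that statically,
       so it cannot simply "optimize the program away". */
    return a > b ? 0 : a / b;
}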
In general, code which would invoke Undefined Behavior if executed must not have any effect if it is not executed. There are, however, a few cases where real-world implementations may behave in contrary fashion and refuse to generate code which, while not a constraint violation, could not possibly execute in defined behavior.
extern struct foo z;

int main(int argc, char **argv)
{
    if (argc > 2) z;
    return 0;
}
By my reading of the Standard, it explicitly characterizes lvalue conversions on incomplete types as invoking Undefined Behavior (among other things, it's unclear what an implementation could generate code for such a thing), so the Standard would impose no requirements upon behavior if argc is 3 or more. I can't identify any constraint in the Standard that the above code would violate, however, nor any reason behavior should not be fully defined if argc is 2 or less. Nonetheless, many compilers including gcc and clang reject the above code entirely.
What is undefined behaviour in C?
I am using the GCC compiler. In some cases I get a correct value even though the program's output was supposed to be undefined, and I ran those programs several times with consistent results. For some other programs, the results were not consistent. So in which cases should I consider that the program's behaviour is really undefined? Is there any kind of RULE for this?
Undefined behaviour means the compiler can emit any code it likes. Your program might show the results you expect, or it might format your hard drive, or it might start sending emails to the Taliban. Anything can happen.
The definition of undefined behavior:
C11(ISO/IEC 9899:201x) §3.4.3
1 undefined behavior
behavior, upon use of a non portable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
There is also a list of undefined behaviors in C11 §J.2 Undefined behavior
A behavior, upon use of a non-portable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements.
Example:
i = ++i;
For more you can read this.
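To see why that one-liner is listed as undefined, a minimal sketch with the relevant sequencing rule noted in the comments:

#include <stdio.h>

int main(void)
{
    int i = 5;

    /* Two unsequenced side effects on the same object i: the store done
       by ++ and the store done by the assignment. C11 6.5 p2 makes
       unsequenced conflicting side effects undefined behavior. */
    i = ++i;

    printf("%d\n", i);
    return 0;
}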
I think, in simple words, if the behavior of an instruction is not guaranteed to be consistent across all compilers or all situations, you can call it undefined behavior.
This can be illustrated by an example:
#include "stdio.h"
int *ptr;
void func2()
{
int k = 300;
}
void func1()
{
int t = 100;
ptr = &t;
}
int main(int argc, char *argv)
{
func1();
printf("The value of t=%d\r\n",*ptr);
func2();
printf("The value of t=%d\r\n",*ptr);
}
On my machine, I got the following.
joshis1#(none) temp]$ ./ud.out
The value of t=100
The value of t=300
This shows that the value of t was not guaranteed. Once the scope of t was over, its stack space was reused for k. Thus, ptr was accessing the same address (memory location), but the variable's scope was over. You will get a consistent result if you don't call func2(). Thus, the compiler doesn't guarantee the outcome; this is called undefined behaviour.
I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>

int main()
{
    int v = 0;
    scanf("%d", &v);
    if (v != 0)
    {
        int *p;
        *p = v; // Oops
    }
    return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?
I'd say that the behavior is undefined only if the user inserts any number different from 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).
A hint of this can be found into the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that adds user-provided data without extensive checks on it is subject to this kind of undefined behavior, but the addition is UB only when the user provides that particular data.
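A sketch of how such an addition can be guarded so the UB case is never executed (the helper name is made up; the checks rely only on <limits.h>):

#include <limits.h>
#include <stdio.h>

/* Adds x and y only if the result cannot overflow; returns 0 on success. */
int checked_add(int x, int y, int *result)
{
    if ((y > 0 && x > INT_MAX - y) ||
        (y < 0 && x < INT_MIN - y)) {
        return 1;   /* would overflow: signed overflow is UB, so refuse */
    }
    *result = x + y;
    return 0;
}

int main(void)
{
    int x, y, sum;
    if (scanf("%d %d", &x, &y) != 2)
        return 1;
    if (checked_add(x, y, &sum))
        printf("overflow would occur\n");
    else
        printf("%d\n", sum);
    return 0;
}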
Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
/* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
if (0) main(42); /* not a constraint violation */
}
int main(void) {
if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.
Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>

int main()
{
    int v;
    scanf("%d", &v);
    if (v != 0)
    {
        printf("Hello\n");
        int *p;
        *p = v; // Oops
    }
    return v;
}
What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
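A sketch of that little test program; note that the INT_MAX + 1 line itself has undefined behavior and is there precisely to show what a typical two's complement implementation happens to do with it:

#include <limits.h>
#include <stdio.h>

int test(int x) { return x + 1 > x; }

int main(void)
{
    printf("INT_MAX       = %d\n", INT_MAX);
    printf("INT_MAX + 1   = %d\n", INT_MAX + 1);    /* signed overflow: UB */
    printf("test(INT_MAX) = %d\n", test(INT_MAX));  /* with -O2, gcc typically prints 1 */
    return 0;
}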
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.
If v is 0, your random pointer assignment never gets executed, and the function will return zero, so it is not undefined behaviour.
When you declare a variable (especially a pointer), a piece of memory is set aside for it, but the old value stored there is not cleared (this depends on the implementation; it might fill the place with zeroes), so your int *p will hold a junk value, which is then interpreted as an address. That junk address is the place in memory p points to (p's pointee). When you try to dereference it (i.e. access that piece of memory), it will very often be memory your process does not own or is not allowed to modify, so trying to alter it will typically result in an access violation.
So in this example, any value other than 0 will result in undefined behavior, because no one knows what p will point to at that moment.
I hope this explanation is of some help.
Edit: Ah, sorry, a few answers got ahead of me again :)
It is simple. If a piece of code doesn't execute, it doesn't have a behavior at all, whether defined or not.
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.
I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants to when it sees that statement. In practice, most compilers will handle it as you'd expect, but they still have every right to do whatever they want, including changing parts of the program unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.
Your program is pretty well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer, its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that uninitialised variables do not have a known value, and your program (though it might compile) will not be correct.