6.8.5.6
An iteration statement whose controlling expression is not a constant expression,
that performs no input/output operations, does not access volatile objects, and
performs no synchronization or atomic operations in its body, controlling
expression, or (in the case of a for statement) its expression-3, may be assumed
by the implementation to terminate.
Does this mean the compiler is allowed to assume such a loop terminates, and so remove it, if the above conditions are met?
If so, I tried to reproduce a scenario of this kind, but without success.
I tried:
int main()
{
    // Some statements...
    {
        int a = 0;
        int b = 100;
        int i = 0;
        while (++i >= 0)
        {
            a = b;
        }
    }
    // Some statements...
    return 0;
}
Can anyone help me reproduce this scenario?
Thanks,
Well, the compiler can assume anything there. ++i eventually overflows, and signed integer overflow is Undefined Behavior. If you'd written while (i==0), 6.8.5.6 would apply.
Try with compiler optimization levels like -O2, -O3. It should help you :)
Note:
For GCC family of compilers try -O2 or -O3
For MSVC try /O2 or /Ox
Related
For a class, I wanted to demonstrate undefined behavior with goto to the students. I came up with the following program:
#include <stdio.h>
int main()
{
    goto x;
    for(int i = 0; i < 10; i++)
        x: printf("%d\n", i);
    return 0;
}
I would expect the compiler (gcc version 4.9.2) to warn me about the access to i being undefined behavior, but there is no warning, not even with:
gcc -std=c99 -Wall -Wextra -pedantic -O0 test.c
When running the program, i is apparently initialized to zero. To understand what is happening, I extended the code with a second variable j:
#include <stdio.h>
int main()
{
    goto x;
    for(int i = 0, j = 1; i < 10; i++)
        x: printf("%d %d\n", i, j);
    return 0;
}
Now the compiler warns me that I am accessing j without it being initialized. I understand that, but why is i not uninitialized as well?
Undefined behavior is a run-time phenomenon. Therefore it is quite rare that the compiler will be able to detect it for you. Most cases of undefined behavior are invoked when doing things beyond the scope of the compiler.
To make things even more complicated, the compiler might optimize the code. Suppose it decided to put i in a CPU register but j on the stack or vice versa. And then suppose that during debug build, it sets all stack contents to zero.
If you need to reliably detect undefined behavior, you need a static analysis tool which does checks beyond what is done by a compiler.
Now the compiler warns me that I am accessing j without it being
initialized. I understand that, but why is i not uninitialized as
well?
That's the point with undefined behavior: it sometimes works, or doesn't, or partially works, or prints garbage. The problem is that you can't know what exactly your compiler is doing under the hood, and it's not the compiler's fault for producing inconsistent results since, as you admit yourself, the behavior is undefined.
At that point the only thing that's guaranteed is that nothing is guaranteed as to how this will play out. Different compilers may give different results, and so may different optimization levels.
A compiler is not required to check for this or to handle it, so compilers generally don't. You can't use a compiler to check for undefined behavior reliably anyway. That's what unit tests, lots of test cases, and static analysis are for.
Using "goto" to skip a variable initialization would, per the C Standard, allow a compiler to do anything it wants even on platforms where it would normally yield an Indeterminate Value which may not behave consistently but wouldn't have any other side-effects. The behavior of gcc in this case doesn't seem to have devolved as much as its behavior in case of e.g. integer overflow, but its optimizations may be somewhat interesting though benign. Given:
int test(int x)
{
    int y;
    if (x) goto SKIP;
    y = x + 1;
SKIP:
    return y * 2;
}
int test2(unsigned short y)
{
    int q = 0; int i;
    for (i = 0; i <= y; i++)
        q += test(i);
    return q;
}
The compiler will observe that in all defined cases, test will return 2, and can thus eliminate the loop by generating code for test2 equivalent to:
int test2(unsigned short y)
{
    /* y + 1 iterations, each contributing 2 */
    return ((int)y + 1) << 1;
}
Such an example, however, may give the impression that compilers treat UB in a benign fashion. Unfortunately, in the case of gcc, that is no longer true in general. It used to be that on machines without hardware traps, compilers would treat uses of Indeterminate Value as simply yielding arbitrary values that may or may not behave in any consistent fashion, but without any other side-effects. I'm not sure of any cases where using goto to skip variable initialization would yet cause side-effects other than having a meaningless value in the variable, but that doesn't mean the authors of gcc won't decide to exploit that freedom in future.
How should we interpret the following program, and why does it compile successfully? (It's obvious that there is no error message.) I mean, how does the compiler interpret lines 2 and 3 inside main?
#include <stdio.h>
int main()
{
    int a,b;
    a; //(2)
    b; //(3)
    return 0;
}
Your
a;
is just an expression statement. As always in C, the full expression in an expression statement is evaluated and its result is immediately discarded.
For example, this
a = 2 + 3;
is an expression statement containing full expression a = 2 + 3. That expression evaluates to 5 and also has a side-effect of writing 5 into a. The result is evaluated and discarded.
Expression statement
a;
is treated in the same way, except that it has no side-effects. Since you forgot to initialize your variables, evaluation of the above expression can formally lead to undefined behavior.
Obviously, practical compilers will simply skip such expression statements entirely, since they have no observable behavior.
That's why you should use some compilation warning flags!
-Wall would trigger a "statement with no effect" warning.
If you want to see what the compilation produces, compile using -S.
Try it with your code, with/without -O (optimization) flag...
This is just like you try something like this:
#include <stdio.h>
int main(void){
    1;
    2;
    return 0;
}
As we can see, we have here two expressions, each followed by a semicolon (1; and 2;). Each is a well-formed statement according to the rules of the language.
There is nothing wrong with it; it is just useless.
But if you try to use those variables (a or b) while they are uninitialized, the behavior will be undefined.
Of course, the compiler will treat each one as a statement with no effect.
Later edit:
If you run this:
#include <stdio.h>
int main(void){
    int a;
    int b;
    printf("A = %d\n",a);
    printf("B = %d\n",b);
    if (a < b){
        printf("TRUE");
    }else{
        printf("FALSE");
    }
    return 0;
}
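You may get:
A = 0
B = 0
FALSE
but only because a and b happened to hold 0 on that run. They are uninitialized, so reading them is undefined behavior and any other output is just as possible.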
Statements in C which are not control structures (if, switch, for, while, do while) or control statements (break, continue, goto, return) are expression statements.
Every expression has a resulting value.
An expression statement is evaluated for its side effects (changing the value of an object, writing a file, reading volatile objects, and calling functions that do any of these things).
The final result of such an expression is always discarded.
For example, the function printf() returns an int value that in general is not used. This value is still produced, and then discarded.
However, the function printf() produces side effects, so it has to be executed.
If a statement has no side effects, then the compiler is free to discard it entirely.
I think it is not hard for a compiler to check whether a statement has any side effects. So what you can expect in this case is that the compiler will choose to do nothing.
Moreover, this does not affect the observable behaviour of the program, so the resulting execution is the same. Of course, the program will run faster if the compiler skips the useless computation.
Also, note that in some cases the floating point environment can set flags, which are considered side-effects.
The C Standard (C11) says, in 5.1.2.3 paragraph 4:
An actual implementation need not evaluate part of an expression if it
can deduce that its value is not used and that no needed side effects
are produced [...]
CONCLUSION: One has to read the documentation of the particular compiler one is using.
I have run into a problem while playing around with the auto-parallelization feature of the Oracle Solaris compiler. Let's say I have the following code:
int var = -1;
int i;
for (i = 0; i < 3; i++){
    bool flag = false;
    // do operations to set the flag
    if (flag == true)
        var = i;
}
// do other operations with var
When I compile this code, the compiler complains that the loop cannot be parallelized because of unsafe dependences.
Does anyone know what might be wrong here? Is there any way to avoid this but maintain the original functionality of the code?
Any help would be appreciated, thank you!
What the compiler sees is a bunch of loop iterations, all of which can potentially assign to var. If writing to var happens to be atomic, then var will end up with the value of i from whichever iteration wrote last, and one might argue that is OK. Under the assumption that writing to a variable is not atomic, most compilers see this as a data race (then what exactly ends up in var?). Thus the complaint.
Context
I was asked the following puzzle by one of my friends:
void fn(void)
{
    /* write something after this comment so that the program output is 10 */
    /* write something before this comment */
}
int main()
{
    int i = 5;
    fn();
    printf("%d\n", i);
    return 0;
}
I know there can be multiple solutions, some involving macro and some assuming something about the implementation and violating C.
One particular solution I was interested in is to make certain assumptions about stack and write following code: (I understand it is undefined behavior, but may work as expected on many implementations)
void fn(void)
{
    /* write something after this comment so that the program output is 10 */
    int a[1] = {0};
    int j = 0;
    while(a[j] != 5) ++j; /* Search stack until you find 5 */
    a[j] = 10; /* Overwrite it with 10 */
    /* write something before this comment */
}
Problem
This program worked fine in MSVC and gcc without optimization. But when I compiled it with gcc -O2 flag or tried on ideone, it loops infinitely in function fn.
My Observation
When I compiled the file with gcc -S vs gcc -S -O2 and compared, it clearly shows gcc kept an infinite loop in function fn.
Question
I understand because the code invokes undefined behavior, one can not call it a bug. But why and how does compiler analyze the behavior and leave an infinite loop at O2?
Several commenters asked how the behavior changes if some of the variables are made volatile. The results, as expected:
If i or j is made volatile, the program's behavior stays the same.
If the array a is made volatile, the program does not loop infinitely.
Moreover if I apply the following patch
- int a[1] = {0};
+ int aa[1] = {0};
+ int *a = aa;
The program behavior remains same (infinite loop)
If I compile the code with gcc -O2 -fdump-tree-optimized, I get the following intermediate file:
;; Function fn (fn) (executed once)
Removing basic block 3
fn ()
{
<bb 2>:
<bb 3>:
goto <bb 3>;
}
;; Function main (main) (executed once)
main ()
{
<bb 2>:
fn ();
}
Invalid sum of incoming frequencies 0, should be 10000
This verifies the assertions made after the answers below.
This is undefined behavior so the compiler can really do anything at all, we can find a similar example in GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks, where gcc takes a loop with undefined behavior and optimizes it to:
L2:
jmp .L2
The article says (emphasis mine):
Of course this is an infinite loop. Since SATD() unconditionally
executes undefined behavior (it’s a type 3 function), any
translation (or none at all) is perfectly acceptable behavior for a
correct C compiler. The undefined behavior is accessing d[16] just
before exiting the loop. In C99 it is legal to create a pointer to
an element one position past the end of the array, but that pointer
must not be dereferenced. Similarly, the array cell one element past
the end of the array must not be accessed.
If we examine your program with godbolt, we see the same thing:
fn:
.L2:
jmp .L2
The logic being used by the optimizer probably goes something like this:
All the elements of a are initialized to zero
a is never modified before or within the loop
So a[j] != 5 is always true -> infinite loop
Because the loop is infinite, a[j] = 10; is unreachable and can be optimized away, and so can a and j, since they are no longer needed to determine the loop condition.
which is similar to the case in the article which given:
int d[16];
analyzes the following loop:
for (dd=d[k=0]; k<16; dd=d[++k])
like this:
upon seeing d[++k], is permitted to assume that the incremented value
of k is within the array bounds, since otherwise undefined behavior
occurs. For the code here, GCC can infer that k is in the range 0..15.
A bit later, when GCC sees k<16, it says to itself: “Aha– that
expression is always true, so we have an infinite loop.”
Perhaps an interesting secondary point is whether an infinite loop is considered observable behavior (w.r.t. the as-if rule) or not, which affects whether an infinite loop can also be optimized away. We can see from C Compilers Disprove Fermat’s Last Theorem that before C11 there was at least some room for interpretation:
Many knowledgeable people (including me) read this as saying that the
termination behavior of a program must not be changed. Obviously some
compiler writers disagree, or else don’t believe that it matters. The
fact that reasonable people disagree on the interpretation would seem
to indicate that the C standard is flawed.
C11 adds clarification to section 6.8.5 Iteration statements and is covered in more detail in this answer.
In the optimized version, the compiler has decided a few things:
The array a doesn't change before that test.
The array a doesn't contain a 5.
Therefore, we can rewrite the code as:
void fn(void) {
    int a[1] = {0};
    int j = 0;
    while(true) ++j;
    a[j] = 10;
}
Now, we can make further decisions:
All the code after the while loop is dead code (unreachable).
j is written but never read. So we can get rid of it.
a is never read.
At this point, your code has been reduced to:
void fn(void) {
    int a[1] = {0};
    while(true);
}
And we can make the note that a is now never read, so let's get rid of it as well:
void fn(void) {
    while(true);
}
Now, the unoptimized code:
In unoptimized generated code, the array will remain in memory, and you'll literally walk it at runtime. It's possible that there will be a readable 5 somewhere after it once you walk past the end of the array.
Which is why the unoptimized version sometimes doesn't crash and burn.
If the loop does get optimized into an infinite loop, it could be due to static code analysis seeing that your array is
not volatile
contains only 0
never gets written to
and thus it is not possible for it to contain the number 5. Which means an infinite loop.
Even if it didn't do this, your approach could fail easily. For example, it's possible that some compiler would optimize your code without making your loop infinite, but would stuff the contents of i into a register, making it unavailable from the stack.
As a side note, I bet what your friend actually expected was this:
void fn(void)
{
    /* write something after this comment so that the program output is 10 */
    printf("10\n"); /* Output 10 */
    while(1); /* Endless loop, function won't return, i won't be output */
    /* write something before this comment */
}
or this (if stdlib.h is included):
void fn(void)
{
    /* write something after this comment so that the program output is 10 */
    printf("10\n"); /* Output 10 */
    exit(0); /* Exit gracefully */
    /* write something before this comment */
}
Using GCC (4.0 for me), is this legal:
if(__builtin_expect(setjmp(buf) != 0, 1))
{
    // handle error
}
else
{
    // do action
}
I found a discussion saying it caused a problem for GCC back in 2003, but I would imagine that they would have fixed it by now. The C standard says that it's illegal to use setjmp unless it's one of four conditions, the relevant one being this:
one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
But if this is a GCC extension, can I rely on it working under GCC, since it's already nonstandard functionality? I tested it and it seemed to work, though I don't know how much testing I'd have to do to actually break it. (I'm hiding the call to __builtin_expect behind a macro, which is defined as a no-op for non-GCC, so it would be perfectly legal for other compilers.)
I think that what the standard was talking about was to account for doing something like this:
int x = printf("howdy");
if (setjmp(buf) != x ) {
    function_that_might_call_longjmp_with_x(buf, x);
} else {
    do_something_about_them_errors();
}
In this case you could not rely on x having the value that it was assigned in the previous line anymore. The compiler may have moved the place where x had been (reusing the register it had been in, or something), so the code that did the comparison would be looking in the wrong spot. (you could save x to another variable, and then reassign x to something else before calling the function, which might make the problem more obvious)
In your code you could have written it as:
int conditional;
conditional = setjmp(buf) != 0 ;
if(__builtin_expect( conditional, 1)) {
    // handle error
} else {
    // do action
}
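And I think that we can satisfy ourselves that the line of code that assigns the variable conditional meets that requirement in practice, although strictly speaking the standard's list of permitted contexts for setjmp does not include the right-hand side of an assignment.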
But if this is a GCC extension, can I rely on it working under GCC, since it's already nonstandard functionality? I tested it and it seemed to work, though I don't know how much testing I'd have to do to actually break it. (I'm hiding the call to __builtin_expect behind a macro, which is defined as a no-op for non-GCC, so it would be perfectly legal for other compilers.)
You are correct, __builtin_expect should be a macro no-op for other compilers so the result is still defined.