Why does GCC -O2 and -O3 optimization break this program? - c

I've written this C code to find the sum of all integers that are equal to the sum of the factorials of their digits. It takes a minute or so to finish without any GCC optimization flags, and -O1 cuts that by about 15-20 seconds, but with -O2, -O3 or -Os it gets stuck in an infinite loop.
#include <stdio.h>
int main()
{
int i, j, factorials[10];
int Result=0;
for(i=0; i<10; i++)
{
factorials[i]=1;
for(j=i; j>0; j--)
{
factorials[i] *= j;
}
}
for(i=3; i>2; i++) //This is the loop the program gets stuck on
{
int Sum=0, number=i;
while(number)
{
Sum += factorials[number % 10];
number /= 10;
}
if(Sum == i)
Result += Sum;
}
printf("%d\n", Result);
return 0;
}
I've pinpointed that for(i=3; i>2; i++) is the cause of the problem. So obviously i is never less than 2?
Does this have anything to do with the fact that integer overflow behavior is undefined? If so, any more info on what exactly is going on with the program in these cases?
EDIT: I guess I should've mentioned that I am aware of other ways of writing that for loop so that it doesn't rely on overflow (I was hoping that INT_MAX+1 would be equal to INT_MIN, which is <2), but this was just done as a random test to see what would happen, and I posted it here to find out what exactly was going on :)

The loop is for (i = 3; i > 2; i++) and it has no break statements or other exit condition.
Eventually i will reach INT_MAX, and then i++ will cause signed integer overflow, which is undefined behaviour.
Possibly Sum or Result would also overflow before i did.
When a program is guaranteed to trigger undefined behaviour, the entire behaviour of the program is undefined.
gcc is well known for aggressively optimizing out paths that trigger UB. You could inspect the assembly code to see exactly what happened in your case. Perhaps the -O2 and higher cases removed the loop end condition check, but -O1 left it in there and "relied" on INT_MAX + 1 resulting in INT_MIN.

The for loop is for(i=3; i>2; i++) and inside this loop i is not modified, nor is there a break or any other way to exit the loop. You are relying on integer overflow to cause the exit condition to occur, but the compiler doesn't take that into consideration.
Instead, the compiler sees that i starts at 3 and is only ever incremented, and so i>2 is always true. Thus there is no need to test the condition at all in this context, since this must be an infinite loop.
If you change i to be unsigned int and set the condition for the loop exit to match, this "optimization" will no longer occur.

As you noticed yourself, signed integer overflow is undefined. The compiler decides to reason about your program assuming that you're smart enough to never cause undefined behavior. So it can conclude that since i is initialized to a number larger than 2 and only gets incremented, it will never be lower or equal to 2, which means that i > 2 can never be false. This in turn means that the loop will never terminate and can be optimized into an infinite loop.

I don't know what you are trying to do, but if you want to guard against integer overflow, just include limits.h in your source code and put this line inside your for loop:
if (i >= INT_MAX) break;
This exits the loop before i++ can push the variable past the largest value that fits in an int.

As you said, it's undefined behavior, so you can't rely on any particular behavior.
The two things you will most likely see are:
The compiler translates more or less directly to machine code, which does whatever it wants to do when the overflow happens (which is usually to roll over to the most negative value) and still includes the test (which, e.g., will fail if the value rolls over)
The compiler observes that the index variable starts at 3 and always increases, and consequently the loop condition always holds, and so it emits an infinite loop that never bothers to test the loop condition

Related

what is stack smashing (C)?

Code:
int str_join(char *a, const char *b) {
int sz =0;
while(*a++) sz++;
char *st = a -1, c;
*st = (char) 32;
while((c = *b++)) *++st = c;
*++st = 0;
return sz;
}
....
char a[] = "StringA";
printf("string-1 length = %d, String a = %s\n", str_join(&a[0],"StringB"), a);
Output:
string-1 length = 7, char *a = StringA StringB
*** stack smashing detected **** : /T02 terminated
Aborted (core dumped)
I don't understand why it's showing stack smashing. What is stack smashing? Or is it an error in my compiler?
Well, stack smashing or stack buffer overflow is a rather detailed topic to be discussed here, you can refer to this wiki article for more info.
Coming to the code shown here, the problem is, your array a is not large enough to hold the final concatenated result.
Thereby, by saying
while((c = *b++)) *++st = c;
you're essentially writing past the end of the array, which invokes undefined behavior. This is the reason you're getting the "stack smashing" error: the writes land in stack memory that does not belong to a.
To solve this, you need to make sure that array a contains enough space to hold both the first and second string concatenated together. You have to provide a larger destination array, in short.
Stack smashing means you've written outside of ("smashed" past/through) the function's storage space for local variables (this area is called the "stack", in most systems and programming languages). You may also find this type of error called a "stack buffer overflow".
In your code, C is probably putting the string pointed to by a on the stack. In your case, the place that causes the stack "smash" is when you increment st beyond the original a pointer and write to where it points, you're writing outside the area the C compiler guarantees to have reserved for the original string assigned into a.
Whenever you write outside an area of memory that is already properly "reserved" in C, that's "undefined behavior" (which just means that the C language/standard doesn't say what happens): usually, you end up overwriting something else in your program's memory (programs typically put other information right next to your variables on the stack, like return addresses and other internal details), or your program tries writing outside of the memory the operating system has "allowed" it to use. Either way, the program typically breaks, sometimes immediately and obviously (for example, with a "segmentation fault" error), sometimes in very hidden ways that don't become obvious until way later.
In this case, your compiler is building your program with special protections to detect this problem, and so your program exits with an error message. If the compiler didn't do that, your program would try to continue to run, except it might end up doing the wrong thing and/or crashing.
The solution comes down to explicitly making sure there is enough memory for your combined string. You can do this by declaring the "a" array with a length large enough for both strings, but that's usually only sufficient for simple uses where you know in advance how much space you need. For a general-purpose solution, you'd use a function like malloc to get a pointer to a new chunk of memory of the size you need once you've calculated what the full size is going to be (just remember to call free on pointers that you get from malloc and similar functions once you're done with them).
Minimal reproduction example with disassembly analysis
main.c
void myfunc(char *const src, int len) {
int i;
for (i = 0; i < len; ++i) {
src[i] = 42;
}
}
int main(void) {
char arr[] = {'a', 'b', 'c', 'd'};
int len = sizeof(arr);
myfunc(arr, len + 1);
return 0;
}
GitHub upstream.
Compile and run:
gcc -fstack-protector-all -g -O0 -std=c99 main.c
ulimit -c unlimited && rm -f core
./a.out
fails as desired:
*** stack smashing detected ***: terminated
Aborted (core dumped)
Tested on Ubuntu 20.04, GCC 10.2.0.
On Ubuntu 16.04, GCC 6.4.0, I could reproduce with -fstack-protector instead of -fstack-protector-all, but it stopped blowing up when I tested on GCC 10.2.0 as per Geng Jiawen's comment. man gcc clarifies that as suggested by the option name, the -all version adds checks more aggressively, and therefore presumably incurs a larger performance loss:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call "alloca", and functions with buffers larger than or equal to 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits. Only variables that are actually allocated on the stack are considered, optimized away variables or variables allocated in registers don't count.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
Disassembly
Now we look at the disassembly:
objdump -D a.out
which contains:
int main (void){
400579: 55 push %rbp
40057a: 48 89 e5 mov %rsp,%rbp
# Allocate 0x10 of stack space.
40057d: 48 83 ec 10 sub $0x10,%rsp
# Put the 8 byte canary from %fs:0x28 to -0x8(%rbp),
# which is right at the bottom of the stack.
400581: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400588: 00 00
40058a: 48 89 45 f8 mov %rax,-0x8(%rbp)
40058e: 31 c0 xor %eax,%eax
char arr[] = {'a', 'b', 'c', 'd'};
400590: c6 45 f4 61 movb $0x61,-0xc(%rbp)
400594: c6 45 f5 62 movb $0x62,-0xb(%rbp)
400598: c6 45 f6 63 movb $0x63,-0xa(%rbp)
40059c: c6 45 f7 64 movb $0x64,-0x9(%rbp)
int len = sizeof(arr);
4005a0: c7 45 f0 04 00 00 00 movl $0x4,-0x10(%rbp)
myfunc(arr, len + 1);
4005a7: 8b 45 f0 mov -0x10(%rbp),%eax
4005aa: 8d 50 01 lea 0x1(%rax),%edx
4005ad: 48 8d 45 f4 lea -0xc(%rbp),%rax
4005b1: 89 d6 mov %edx,%esi
4005b3: 48 89 c7 mov %rax,%rdi
4005b6: e8 8b ff ff ff callq 400546 <myfunc>
return 0;
4005bb: b8 00 00 00 00 mov $0x0,%eax
}
# Check that the canary at -0x8(%rbp) hasn't changed after calling myfunc.
# If it has, jump to the failure point __stack_chk_fail.
4005c0: 48 8b 4d f8 mov -0x8(%rbp),%rcx
4005c4: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
4005cb: 00 00
4005cd: 74 05 je 4005d4 <main+0x5b>
4005cf: e8 4c fe ff ff callq 400420 <__stack_chk_fail@plt>
# Otherwise, exit normally.
4005d4: c9 leaveq
4005d5: c3 retq
4005d6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4005dd: 00 00 00
Notice the handy comments automatically added by objdump's artificial intelligence module.
If you run this program multiple times through GDB, you will see that:
the canary gets a different random value every time
the last loop of myfunc is exactly what modifies the address of the canary
The canary is randomized by loading it from %fs:0x28, which contains a random value as explained at:
https://unix.stackexchange.com/questions/453749/what-sets-fs0x28-stack-canary
Why does this memory address %fs:0x28 ( fs[0x28] ) have a random value?
How to debug it?
See: Stack smashing detected

Does anyone know why gcc 4.8.4 optimizes this code into an infinite loop?

I find very strange the differences between the assembler results of the following code compiled without optimization and with -Os optimization.
#include <stdio.h>
int main(){
int i;
for(i=3;i>2;i++);
printf("%d\n",i);
return 0;
}
Without optimization the code results:
000000000040052d <main>:
40052d: 55 push %rbp
40052e: 48 89 e5 mov %rsp,%rbp
400531: 48 83 ec 10 sub $0x10,%rsp
400535: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
40053c: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
400543: eb 04 jmp 400549 <main+0x1c>
400545: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400549: 83 7d fc 02 cmpl $0x2,-0x4(%rbp)
40054d: 7f f6 jg 400545 <main+0x18>
40054f: 8b 45 fc mov -0x4(%rbp),%eax
400552: 89 c6 mov %eax,%esi
400554: bf f4 05 40 00 mov $0x4005f4,%edi
400559: b8 00 00 00 00 mov $0x0,%eax
40055e: e8 ad fe ff ff callq 400410 <printf@plt>
400563: b8 00 00 00 00 mov $0x0,%eax
400568: c9 leaveq
400569: c3 retq
and the output is: -2147483648 (as I expect on a PC)
With -Os the code results:
0000000000400400 <main>:
400400: eb fe jmp 400400 <main>
I think the second result is an error!!! I think the compiler should have compiled something corresponding to the code:
printf("%d\n",-2147483648);
The compiler is working as it should.
Signed integer overflow is illegal in C and results in undefined behaviour. Any program that relies on it is broken.
The compiler replaces for(i=3;i>2;i++); with while(1); because it sees that i starts from 3 and only increases, so its value can never be less than 3.
Only overflow could make the loop exit. But that is illegal, and the compiler assumes that you would never do such a dirty thing.
Because the loop is infinite, printf is never reached and can be removed.
The unoptimized version worked only by accident. The compiler could have done the same thing there and it would have been equally valid.
Well, the compiler is allowed to assume that the program will never exhibit undefined behaviour.
You get INT_MIN in the first case because INT_MAX + 1 happens to wrap around to INT_MIN (*), but this is undefined behaviour. The C99 draft (n1256) says at 6.5 Expressions §5: "If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined."
So compiler can say:
loop starts with an index value greater than the limit
index is always increased
if no UB occurs, index will always be greater than the limit => this is an infinite loop
With the as-if rule (5.1.2.3 Program execution §3: "An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced"), it can replace your loop with an infinite loop. The following instructions can no longer be reached and can be removed.
You invoked undefined behaviour and got... undefined behaviour.
(*) and even this is plainly implementation dependent: INT_MIN could be -2147483647 if you had 1's complement, 0x80000000 could be a negative 0, or overflow could raise a signal...

time taken by for loop with (2^32-1) or (2^64-1) or more is same

I measured the time taken by the loop for (i=0; i<4294967295; i++) in C. Surprisingly, it is very short (80-88 ns) on my node (clock speed 1600 MHz). Later, I tried running two such loops, one above the other (i.e. for(j=0; j<4294967295; j++) for(i=0; i<4294967295; i++)). Surprisingly, the time is just as short (about 80 ns). Could somebody explain why the time is so low when the loop performs so many i++ operations? Additionally, when I run two or three for loops, why is the time taken the same? Many thanks in advance for a reply!
If your loop is without side-effects, probably the compiler is optimizing it away completely. To trick the compiler into generating the loop anyway a common trick is to insert an asm nop inside the loop (compilers usually don't mess with hand-inserted assembly, and its cost is negligible).
I did an experiment with gcc and here are my results. Basically, as you can see below, the compiler removes empty/idle loops at high optimization levels.
Source file:
#include <stdio.h>
int main(void) {
int i;
for (i=0; i<1024; i++);
return 0;
}
Compilation with no optimization:
gcc -O0 main.c
Program disassembly with no optimization:
00000000004004ed <main>:
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004f8: eb 04 jmp 4004fe <main+0x11>
4004fa: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004fe: 81 7d fc ff 03 00 00 cmpl $0x3ff,-0x4(%rbp)
400505: 7e f3 jle 4004fa <main+0xd>
400507: b8 00 00 00 00 mov $0x0,%eax
40050c: 5d pop %rbp
40050d: c3 retq
40050e: 66 90 xchg %ax,%ax
Compilation with maximum optimization level:
gcc -O3 main.c
Program disassembly with maximum optimization level:
0000000000400400 <main>:
400400: 31 c0 xor %eax,%eax
400402: c3 retq
You can disassemble the program with the following command line tool:
objdump -d a.out
Besides, you can always disable compiler optimization for any function you want with compiler directives.
Under GCC, you can turn off optimization for selected functions manually, as in the example below.
#pragma GCC push_options
#pragma GCC optimize ("O0")
static void your_not_optimized_function() {
// your code
}
#pragma GCC pop_options
Under the Microsoft Visual C++ compiler, you can turn off optimization for selected functions manually, as in the example below.
#pragma optimize( "", off )
static void your_not_optimized_function() {
// your code
}
#pragma optimize( "", on )
To prevent your loops from being optimized out by the compiler, you need to do something unpredictable inside the loops. The easiest thing to do is call a random number generator, like this
srand(time(NULL));
int total = 0;
for ( int i = 0; i < 1000; i++ )
for ( int j = 0; j < 1000; j++ )
total += rand();
printf( "%d\n", total );
Note that you also have to do something with the results, e.g. print the total. Otherwise, the compiler can still optimize out the loops.
At least in the case of Microsoft compilers, you can use volatile on a variable to prevent the compiler from optimizing the loop away, but this will force the variable to be in memory instead of a register. You could also write a small test loop in assembler, sort of a very simple processor benchmark.
In a more realistic situation, where the loop is actually doing something, it shouldn't get optimized away and you'll be able to time it.

What happens to this in memory/compilation?

The code:
#include <stdio.h>
int main(int argc, char *argv[])
{
//what happens?
10*10;
//what happens?
printf("%d", 10*10);
return 0;
}
What happens in memory and at compile time for these two lines? Is the result of 10*10 stored anywhere?
The statement
10*10;
has no effect. The compiler may choose to not generate any code at all for this statement. On the other hand,
printf("%d", 10*10);
passes the result of 10*10 to the printf function, which prints the result (100) to the standard output.
Ask your compiler! They'll probably all have an interesting answer.
Here's what gcc -c noop.c -o noop.o -g3 had to say (I ran the object code through objdump --disassemble --source to produce the output below):
#include <stdio.h>
void test_code()
{
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
10*10;
//what happens?
printf("%d", 10*10);
4: b8 00 00 00 00 mov $0x0,%eax
9: be 64 00 00 00 mov $0x64,%esi
e: 48 89 c7 mov %rax,%rdi
11: b8 00 00 00 00 mov $0x0,%eax
16: e8 00 00 00 00 callq 1b <test_code+0x1b>
}
1b: 5d pop %rbp
1c: c3 retq
My compiler took the 10*10 being passed to printf, multiplied it at compile time, then used the result as an immediate ($0x64, i.e. 100 in decimal) and put it into a register to be used for printf:
mov $0x64,%esi
The 10*10 expression not assigned to any identifier was elided. Note that it's likely possible to find some compiler somewhere that decides to execute this computation and store it in registers.
In the first case, nothing: the compiler reduces an expression like that to a value, and since you are not assigning it to a variable it has no effect, so the compiler removes it.
In the second case, the value 100 is passed to printf.
Note that what happens depends on the compiler: some will precompute the value at compile time, while others will emit code that performs the operation at run time.
10*10;
Not stored. My guess is that it should give a compiler warning or error.
printf("%d", 10*10);
Should print: 100. The value of (10*10) is calculated (most likely by the compiler, not at run-time), and then sent to printf() by pushing the value (100) onto the stack. Hence, the value is stored on the stack until the original (pre-call-to-printf()) stack frame is restored upon printf()'s return.
In the first case, since the operation is not used anywhere, the compiler may optimise your code and not execute the instruction at all.
In the second case, the value is calculated using registers (stack) and printed to the console, not stored anywhere else.
The C standard describes what the program does on an abstract machine.
But to decide what actually happens, you always need to keep one rule in mind: provided no constraint was violated, the compiler only has to output code whose observable behavior is "as if" it did what you said.
It is explicitly allowed to achieve that result in any other way it favors.
This rule is known colloquially as the "as-if" rule.
Thus, your program is equivalent to, e.g.:
#include <stdio.h>
int main(void) {
fputs("100", stdout);
}
Or
#include <stdio.h>
int main(void) {
putchar('1');
putchar('0');
putchar('0');
}

for(i=0;i<10000000000;++i) compiles to endless loop?

I was running some tests to see how ++i and i++ translated to asm. I wrote a simple for :
int main()
{
int i;
for(i=0;i<1000000;++i);
return 0;
}
compiled it with gcc test.c -O0 -o test, and checked the asm with objdump -d test:
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp) // i=0;
4004f7: eb 04 jmp 4004fd <main+0x11>
4004f9: 83 45 fc 01 addl $0x1,-0x4(%rbp) // ++i;
4004fd: 81 7d fc 3f 42 0f 00 cmpl $0xf423f,-0x4(%rbp) //
400504: 7e f3 jle 4004f9 <main+0xd> //i<1000000;
400506: b8 00 00 00 00 mov $0x0,%eax
40050b: 5d pop %rbp
40050c: c3 retq
so far so good. The weird thing (if i understand asm code correctly) was when instead of i<1000000 i wrote i<10000000000. Exactly same for loop with stopping condition i<10000000000 translated to following assembler code :
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004f7: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004fb: eb fa jmp 4004f7 <main+0xb>
which is endless loop per my understanding, cause exactly same asm was generated for :
for(i=0;;++i);
The question is, is it really possible that it is compiled to endless loop? Why?
I'm using Ubuntu 13.04, x86_64.
Thanks.
This happens because an int on your architecture can never reach 10000000000. It will overflow at some point before reaching that value. Thus, the condition i < 10000000000 will always evaluate as true, meaning this is an infinite loop.
The compiler is able to deduce this at compile time, which is why it generates appropriate assembly for an infinite loop.
The compiler is able to warn you about this. For that to happen, you can enable the "extra" warning level with:
gcc -Wextra
GCC 4.8.2 for example will tell you:
warning: comparison is always true due to limited range of data type [-Wtype-limits]
for (i = 0; i < 10000000000; ++i);
^
And it even tells you the specific warning option that exactly controls this type of warning (Wtype-limits).
Integer range is: –2,147,483,648 to 2,147,483,647
You are like way above it.
If 10000000000 is outside the range of int, but inside the range of long or long long, for your compiler, then i < 10000000000 converts i to long or long long before making the comparison.
Realising it will always be true, the compiler then removes the redundant comparison.
I should hope there'd been some sort of compiler warning.
It is caused because you are using int for storing such big number.
As a result, the i wraps around itself, and never reaches the termination condition of the for loop.
When you exceed the limit for the data types in C/C++, funny things can happen.
The compiler can detect these things at compile time, and therefore, generates the code for infinite loop in assembly language.
Problem is:
You can't store such a large number in "i".
Look https://en.wikipedia.org/wiki/Integer_%28computer_science%29 for more information.
"i" (the variable) can't reach 10000000000, thus the loop evaluates true always and runs infinite times.
You can either use a smaller number or another container for i, such as the multiprecision library of Boost:
http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/boost_multiprecision/intro.html
This happens because the compiler sees that you are using a condition that can never be false, so the condition is simply never evaluated.
An int can never hold a value that is as large as 10000000000, so the value will always be lower than that. When the variable reaches its maximum value and you try to increase it further, it will typically wrap around and start from its lowest possible value (formally, signed overflow is undefined behavior).
The same removal of the condition happens if you use a literal value of true:
for (i = 0; true; ++i);
The compiler will just make it a loop without a condition, it won't actually evaluate the true value on each iteration to see if it is still true.

Resources