for(i=0;i<10000000000;++i) compiles to endless loop? - c

I was running some tests to see how ++i and i++ translate to asm. I wrote a simple for loop:
int main()
{
int i;
for(i=0;i<1000000;++i);
return 0;
}
compiled it with gcc test.c -O0 -o test, and checked the asm with objdump -d test:
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp) // i=0;
4004f7: eb 04 jmp 4004fd <main+0x11>
4004f9: 83 45 fc 01 addl $0x1,-0x4(%rbp) // ++i;
4004fd: 81 7d fc 3f 42 0f 00 cmpl $0xf423f,-0x4(%rbp) //
400504: 7e f3 jle 4004f9 <main+0xd> //i<1000000;
400506: b8 00 00 00 00 mov $0x0,%eax
40050b: 5d pop %rbp
40050c: c3 retq
So far so good. The weird thing (if I understand the asm correctly) happened when I wrote i<10000000000 instead of i<1000000. Exactly the same for loop with the stopping condition i<10000000000 translated to the following assembler code:
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004f7: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004fb: eb fa jmp 4004f7 <main+0xb>
which is an endless loop as far as I understand, because exactly the same asm was generated for:
for(i=0;;++i);
The question is, is it really possible that it is compiled to endless loop? Why?
I'm using Ubuntu 13.04, x86_64.
Thanks.

This happens because an int on your architecture can never reach 10000000000: its maximum value is far smaller. It will overflow at some point before reaching that value. Thus, the condition i < 10000000000 will always evaluate as true, meaning this is an infinite loop.
The compiler is able to deduce this at compile time, which is why it generates appropriate assembly for an infinite loop.
The compiler is able to warn you about this. For that to happen, you can enable the "extra" warning level with:
gcc -Wextra
GCC 4.8.2 for example will tell you:
warning: comparison is always true due to limited range of data type [-Wtype-limits]
for (i = 0; i < 10000000000; ++i);
^
And it even tells you the specific warning option that exactly controls this type of warning (Wtype-limits).
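One way to make the loop terminate is to give the counter a type wide enough for the bound. This is a minimal sketch, assuming int is 32 bits and long long is at least 64 bits. The names exceeds_int_range and count_up_to are hypothetical helpers, not part of the question's code.

```c
#include <limits.h>

/* Hypothetical helper: 1 if `limit` cannot be represented in an int. */
int exceeds_int_range(long long limit)
{
    return limit > INT_MAX;
}

/* Hypothetical helper: counts from 0 up to `limit` with a long long
   counter, so `i < limit` can actually become false. */
long long count_up_to(long long limit)
{
    long long i;
    for (i = 0; i < limit; ++i)
        ;               /* empty body, as in the question */
    return i;           /* equals limit when limit >= 0 */
}
```

With a long long counter the comparison against 10000000000 is no longer always true, so the compiler has to keep the exit test (or compute the result directly when optimizing).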

The range of a 32-bit int is –2,147,483,648 to 2,147,483,647.
10000000000 is far above it.

If 10000000000 is outside the range of int, but inside the range of long or long long, for your compiler, then i < 10000000000 converts i to long or long long before making the comparison.
Realising it will always be true, the compiler then removes the redundant comparison.
I would have hoped there'd be some sort of compiler warning.
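The conversion described above can be observed directly. This is a minimal sketch, assuming a platform where int is 32 bits; both helper names are made up for illustration.

```c
#include <limits.h>

/* Hypothetical helper: size in bytes of the type the compiler gives the
   unsuffixed constant. It does not fit in a 32-bit int, so it gets the
   first wider type that can hold it (long or long long). */
unsigned long constant_size(void)
{
    return (unsigned long)sizeof(10000000000);
}

/* Hypothetical helper: i is converted to that wider type before the
   comparison, so the result is 1 for every possible int value. */
int below_ten_billion(int i)
{
    return i < 10000000000;
}
```

Compiling this with -Wextra should trigger the same kind of "always true" diagnostic for below_ten_billion, which is the compiler telling you it has made this deduction.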

This happens because you are using an int to count towards such a large number.
As a result, i overflows (in practice it wraps around) and never reaches the termination condition of the for loop.
When you exceed the limits of a data type in C/C++, funny things can happen.
The compiler can detect this at compile time, and therefore generates the code for an infinite loop.

The problem is:
You can't store such a large number in "i".
See https://en.wikipedia.org/wiki/Integer_%28computer_science%29 for more information.
"i" (the variable) can't reach 10000000000, so the loop condition is always true and the loop runs forever.
You can either use a smaller number or a wider type for i. In C, long long (at least 64 bits) is large enough for 10000000000; in C++, another option is the multiprecision library of Boost:
http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/boost_multiprecision/intro.html

This happens because the compiler sees that you are using a condition that can never be false, so the condition is simply never evaluated.
An int can never hold a value as large as 10000000000, so the value will always be lower than that. When the variable reaches its maximum value and you try to increase it further, it will typically wrap around and start from its lowest possible value (formally, signed overflow is undefined behaviour in C, so this is not guaranteed).
The same removal of the condition happens if you use a literal value of true:
for (i = 0; true; ++i);
The compiler will just make it a loop without a condition, it won't actually evaluate the true value on each iteration to see if it is still true.

Related

Why assembly code is different for simple C program with different gcc version?

I'm understanding the basics of assembly and c programming.
I compiled following simple program in C,
#include <stdio.h>
int main()
{
int a;
int b;
a = 10;
b = 88;
return 0;
}
Compiled with following command,
gcc -ggdb -fno-stack-protector test.c -o test
The disassembled code for above program with gcc version 4.4.7 is:
55 push %ebp
89 e5 mov %esp,%ebp
83 ec 10 sub $0x10,%esp
c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
c7 45 fc 58 00 00 00 movl $0x58,-0x4(%ebp)
b8 00 00 00 00 mov $0x0,%eax
c9 leave
c3 ret
90 nop
However disassembled code for same program with gcc version 4.3.3 is:
8d 4c 23 04 lea 0x4(%esp), %ecx
83 e4 f0 and $0xfffffff0, %esp
55 push -0x4(%ecx)
89 e5 mov %esp,%ebp
51 push %ecx
83 ec 10 sub $0x10,%esp
c7 45 f4 0a 00 00 00 00 movl $0xa, -0xc(%ebp)
c7 45 f8 58 00 00 00 00 movl $0x58, -0x8(%ebp)
b8 00 00 00 00 mov $0x0, %eax
83 c4 10 add $0x10,%esp
59 pop %ecx
5d pop %ebp
8d 61 fc lea -0x4(%ecx),%esp
c3 ret
Why is there a difference in the assembly code?
As you can see in the second listing, why is %ecx pushed onto the stack?
What is the significance of and $0xfffffff0, %esp?
Note: the OS is the same.
Compilers are not required to produce identical assembly code for the same source code. The C standard allows the compiler to optimize the code as they see fit as long as the observable behaviour is the same. So, different compilers may generate different assembly code.
For your code, GCC 6.2 with -O3 generates just:
xor eax, eax
ret
because your code essentially does nothing. So, it's reduced to a simple return statement.
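If the goal is to still see the stores in optimized output, one option is to declare the variables volatile. Accesses to volatile objects count as observable behaviour, so the compiler may not drop them. This is a minimal sketch, and the function name is hypothetical.

```c
/* Hypothetical variant of the question's code. The volatile qualifier
   forces the compiler to perform both stores and both loads, even at -O3. */
int store_two_values(void)
{
    volatile int a;
    volatile int b;

    a = 10;
    b = 88;
    return a + b;   /* reads a and b back from memory */
}
```

The exact instructions will still differ between compiler versions; only the presence of the memory accesses is guaranteed.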
To give you an idea of how many ways there are to write valid code for a particular task, this example may help.
From time to time there are size-coding competitions, obviously targeting assembly programmers, since a compiler cannot compete with hand-written assembly at this level at all.
The competition tasks are fairly trivial, to keep the entry level and total effort reasonable, and come with precise input and output specifications (down to byte-perfect or pixel-perfect output).
So you have an almost trivial, exactly specified task, human-written code (at the moment still outperforming compilers for trivial tasks), and a single simple goal: minimal size.
By your logic, it is absolutely clear that every competitor should produce the same result.
The real-world answer to this is, for example:
Hugi Size Coding Competition Series - Compo29 - Random Maze Builder
12 entries, size of code (in bytes): 122, 122, 128, 135, 136, 137, 147, ... 278 (!).
And I bet the first two entries, both 122 bytes long, are probably different enough (I was too lazy to actually check them).
Now, producing valid machine code from a high-level programming language by machine (a compiler) is a much more complex task. Compilers can't compete with humans in reasoning; most of how good the code produced by a C++ compiler is stems from the C++ language itself being defined quite close to machine code (easy to compile), and from raw CPU power allowing compilers to work through thousands of variants for a particular code path, searching for a near-optimal solution mostly by brute force.
Still, the numerical "reasoning" behind optimizers is state of the art in its own way. Humans remain out of reach for compilers on small tasks, just as humans can't achieve the efficiency of compilers within reasonable effort when compiling a full-sized application.
At this point, reasoning about some debug code differing in a few helper prologue/epilogue instructions... Even if you found a difference in optimized code, and the difference looked "obvious" to a human, it is still quite a feat that the compiler can produce at least that, as the compiler has to apply universal rules to specific code without truly understanding the context of the task.

Why does GCC -O2 and -O3 optimization break this program?

I've written this C code for finding the sum of all integers which are equal to the sum of the factorial of their digits. It takes a minute or so to get the job done without any GCC optimization flags, using -O1 decreased that time by about 15-20 seconds but when I tried it with -O2, -O3 or -Os it gets stuck in an infinite loop.
#include <stdio.h>
int main()
{
int i, j, factorials[10];
int Result=0;
for(i=0; i<10; i++)
{
factorials[i]=1;
for(j=i; j>0; j--)
{
factorials[i] *= j;
}
}
for(i=3; i>2; i++) //This is the loop the program gets stuck on
{
int Sum=0, number=i;
while(number)
{
Sum += factorials[number % 10];
number /= 10;
}
if(Sum == i)
Result += Sum;
}
printf("%d\n", Result);
return 0;
}
I've pinpointed that for(i=3; i>2; i++) is the cause of the problem. So obviously i is never less than 2?
Does this have anything to do with the fact that integer overflow behavior is undefined? If so, any more info on what exactly is going on with the program in these cases?
EDIT: I guess I should've mentioned that I am aware of other ways of writing that for loop so that it doesn't rely on overflow (I was hoping that INT_MAX+1 would be equal to INT_MIN, which is <2), but this was just done as a random test to see what would happen and I posted it here to find out what exactly was going on :)
The loop is for (i = 3; i > 2; i++) and it has no break statements or other exit condition.
Eventually i will reach INT_MAX and then i++ will cause integer overflow which causes undefined behaviour.
Possibly Sum or Result would also overflow before i did.
When a program is guaranteed to trigger undefined behaviour, the entire behaviour of the program is undefined.
gcc is well known for aggressively optimizing out paths that trigger UB. You could inspect the assembly code to see exactly what happened in your case. Perhaps the -O2 and higher cases removed the loop end condition check, but -O1 left it in there and "relied" on INT_MAX + 1 resulting in INT_MIN.
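Here is a sketch of the same search that does not depend on overflow at all. It rests on one observation: an n-digit number has a digit-factorial sum of at most n * 9!, and for 8 or more digits that maximum has fewer digits than the number itself, so nothing above 7 * 9! = 2540160 can match. The function name and the UPPER_BOUND macro are hypothetical, and int is assumed to be at least 32 bits.

```c
#define UPPER_BOUND 2540160   /* 7 * 9!, see the reasoning above */

/* Hypothetical rewrite of the question's program as a function. */
int sum_of_digit_factorial_numbers(void)
{
    int factorials[10];
    int i, j, result = 0;

    for (i = 0; i < 10; i++) {
        factorials[i] = 1;
        for (j = i; j > 0; j--)
            factorials[i] *= j;
    }

    /* i stays far below INT_MAX, so i++ never overflows. */
    for (i = 3; i <= UPPER_BOUND; i++) {
        int sum = 0, number = i;
        while (number) {
            sum += factorials[number % 10];
            number /= 10;
        }
        if (sum == i)
            result += sum;
    }
    return result;
}
```

With an explicit bound the loop has a real exit condition, so the result no longer depends on the optimization level. The only matches in this range are 145 and 40585, giving 40730.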
The for loop is for(i=3; i>2; i++) and inside this loop i is not modified, nor is there a break or any other way to exit the loop. You are relying on integer overflow to cause the exit condition to occur, but the compiler doesn't take that into consideration.
Instead, the compiler sees that i starts at 3, and i is only ever incremented, and so i>2 is always true. Thus there is no need for i to exist at all in this context, since this must be an infinite loop.
If you change i to be unsigned int and set the condition for the loop exit to match, this "optimization" will no longer occur.
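The unsigned variant can be sketched as follows. Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1, so the compiler cannot assume that the counter only grows. The helper name is hypothetical, and the start value is a parameter only so that the loop can be tried near the wrap-around point.

```c
#include <limits.h>

/* Hypothetical helper: counts how many times the loop body runs when an
   unsigned counter starts at `start` and the loop relies on wrap-around. */
unsigned count_until_wrap(unsigned start)
{
    unsigned iterations = 0;
    unsigned i;

    /* UINT_MAX + 1 is defined to be 0 for unsigned types, and 0 > 2 is
       false, so the loop ends on the iteration after i == UINT_MAX. */
    for (i = start; i > 2; i++)
        iterations++;
    return iterations;
}
```

For example, count_until_wrap(UINT_MAX - 9u) runs the body ten times and then stops.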
I find very strange the differences between the assembler results of the following code compiled without optimization and with -Os optimization.
#include <stdio.h>
int main(){
int i;
for(i=3;i>2;i++);
printf("%d\n",i);
return 0;
}
Without optimization the code results:
000000000040052d <main>:
40052d: 55 push %rbp
40052e: 48 89 e5 mov %rsp,%rbp
400531: 48 83 ec 10 sub $0x10,%rsp
400535: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
40053c: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
400543: eb 04 jmp 400549 <main+0x1c>
400545: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400549: 83 7d fc 02 cmpl $0x2,-0x4(%rbp)
40054d: 7f f6 jg 400545 <main+0x18>
40054f: 8b 45 fc mov -0x4(%rbp),%eax
400552: 89 c6 mov %eax,%esi
400554: bf f4 05 40 00 mov $0x4005f4,%edi
400559: b8 00 00 00 00 mov $0x0,%eax
40055e: e8 ad fe ff ff callq 400410 <printf@plt>
400563: b8 00 00 00 00 mov $0x0,%eax
400568: c9 leaveq
400569: c3 retq
and the output is: -2147483648 (as I expect on a PC)
With -Os the code results:
0000000000400400 <main>:
400400: eb fe jmp 400400 <main>
I think the second result is an error!!! I think the compiler should have compiled something corresponding to the code:
printf("%d\n",-2147483648);
As you noticed yourself, signed integer overflow is undefined. The compiler decides to reason about your program assuming that you're smart enough to never cause undefined behavior. So it can conclude that since i is initialized to a number larger than 2 and only gets incremented, it will never be lower or equal to 2, which means that i > 2 can never be false. This in turn means that the loop will never terminate and can be optimized into an infinite loop.
I don't know what you are trying to do, but if you want to guard against integer overflow, include limits.h in your source code and put this line inside your for loop:
if (i >= INT_MAX) break;
This lets you leave the loop before the variable would exceed the largest value an int can hold.
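A minimal sketch of that guard in context. The function name and the start parameter are hypothetical; the parameter exists only so the behaviour near INT_MAX can be tried without billions of iterations.

```c
#include <limits.h>

/* Hypothetical helper: runs the question's loop from `start`, but leaves
   it before i++ could overflow. Returns the last value i reached. */
int run_until_int_max(int start)
{
    int i;
    for (i = start; i > 2; i++) {
        if (i == INT_MAX)
            break;          /* checked before the increment executes */
    }
    return i;
}
```

Because break skips the i++ of that iteration, the increment is never applied to INT_MAX and no undefined behaviour occurs.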
As you said, it's undefined behavior, so you can't rely on any particular behavior.
The two things you will most likely see are:
The compiler translates more or less directly to machine code, which does whatever it wants to do when the overflow happens (which is usually to roll over to the most negative value) and still includes the test (which, e.g., will fail if the value rolls over)
The compiler observes that the index variable starts at 3 and always increases, and consequently the loop condition always holds, and so it emits an infinite loop that never bothers to test the loop condition

time taken by for loop with (2^32-1) or (2^64-1) or more is same

I measured the time taken by the loop for (i=0; i<4294967295; i++) in C. Surprisingly, it is very short (80-88 ns) on my node (clock speed 1600 MHz). Later, I tried running two for loops, one above the other (i.e. for(j=0; j<4294967295; j++) for(i=0; i<4294967295; i++)). Surprisingly, this time was also short and the same (i.e., 80 ns). Could somebody explain why the time is so low for so many i++ operations? Additionally, when I run two or three for loops, why is the time taken the same? Many thanks in advance for a reply!
If your loop is without side-effects, probably the compiler is optimizing it away completely. To trick the compiler into generating the loop anyway a common trick is to insert an asm nop inside the loop (compilers usually don't mess with hand-inserted assembly, and its cost is negligible).
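A sketch of the asm trick. It assumes GCC or Clang (the __asm__ syntax is a compiler extension) and a target whose assembler accepts the mnemonic nop. The function name is hypothetical.

```c
/* GCC/Clang extension; assumes a target whose assembler accepts "nop". */
unsigned long spin(unsigned long iterations)
{
    unsigned long i;
    for (i = 0; i < iterations; i++)
        __asm__ volatile ("nop");   /* volatile asm must not be deleted */
    return i;
}
```

Because the asm statement is marked volatile, the compiler has to execute it once per iteration, which keeps the loop itself alive.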
I did an experiment with gcc and here are my results. Basically, as you can see below, the compiler removes empty/idle loops at high optimization levels.
Source file:
#include <stdio.h>
int main(void) {
int i;
for (i=0; i<1024; i++);
return 0;
}
Compilation with no optimization:
gcc -O0 main.c
Program disassembly with no optimization:
00000000004004ed <main>:
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004f8: eb 04 jmp 4004fe <main+0x11>
4004fa: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004fe: 81 7d fc ff 03 00 00 cmpl $0x3ff,-0x4(%rbp)
400505: 7e f3 jle 4004fa <main+0xd>
400507: b8 00 00 00 00 mov $0x0,%eax
40050c: 5d pop %rbp
40050d: c3 retq
40050e: 66 90 xchg %ax,%ax
Compilation with maximum optimization level:
gcc -O3 main.c
Program disassembly with maximum optimization level:
0000000000400400 <main>:
400400: 31 c0 xor %eax,%eax
400402: c3 retq
You can disassemble the program with the following command line tool:
objdump -d a.out
Besides, you can always disable compiler optimization for selected functions with compiler directives.
With GCC, it looks like the example below.
#pragma GCC push_options
#pragma GCC optimize ("O0")
static void your_not_optimized_function() {
// your code
}
#pragma GCC pop_options
With the Microsoft Visual C++ compiler, it looks like the example below.
#pragma optimize( "", off )
static void your_not_optimized_function() {
// your code
}
#pragma optimize( "", on )
To prevent your loops from being optimized out by the compiler, you need to do something unpredictable inside the loops. The easiest thing to do is call a random number generator, like this
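#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
srand((unsigned)time(NULL));
unsigned long long total = 0; /* wide enough that the sum cannot overflow */
for ( int i = 0; i < 1000; i++ )
for ( int j = 0; j < 1000; j++ )
total += rand();
printf( "%llu\n", total );
return 0;
}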
Note that you also have to do something with the results, e.g. print the total. Otherwise, the compiler can still optimize out the loops.
At least in the case of Microsoft compilers, you can use volatile on a variable to prevent the compiler from optimizing the loop away, but this will force the variable to be in memory instead of a register. You could also write a small test loop in assembler, sort of a very simple processor benchmark.
In a more realistic situation, where the loop is actually doing something, it shouldn't get optimized away and you'll be able to time it.
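A sketch of the volatile approach combined with a simple timing harness. The helper name is hypothetical. clock() measures CPU time at a coarse resolution, so this is only meant to show that the loop now takes measurable time, not to be a precise benchmark.

```c
#include <time.h>

/* Hypothetical helper: runs an empty loop whose counter is volatile, so
   every increment and comparison has to be performed, and returns the
   CPU time used in seconds as measured by clock(). */
double time_empty_loop(unsigned long iterations)
{
    volatile unsigned long i;
    clock_t start = clock();

    for (i = 0; i < iterations; i++)
        ;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With the counter forced into memory, the measured time should grow roughly in proportion to the iteration count instead of staying constant.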

What happens to this in memory/compilation?

The code:
#include <stdio.h>
int main(int argc, char *argv[])
{
//what happens?
10*10;
//what happens?
printf("%d", 10*10);
return 0;
}
What happens in memory/at compilation for these two lines? Is the result (10*10) stored anywhere?
The statement
10*10;
has no effect. The compiler may choose to not generate any code at all for this statement. On the other hand,
printf("%d", 10*10);
passes the result of 10*10 to the printf function, which prints the result (100) to the standard output.
Ask your compiler! They'll probably all have an interesting answer.
Here's what gcc -c noop.c -o noop.o -g3 had to say (I ran the object code through objdump --disassemble --source to produce the output below):
#include <stdio.h>
void test_code()
{
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
10*10;
//what happens?
printf("%d", 10*10);
4: b8 00 00 00 00 mov $0x0,%eax
9: be 64 00 00 00 mov $0x64,%esi
e: 48 89 c7 mov %rax,%rdi
11: b8 00 00 00 00 mov $0x0,%eax
16: e8 00 00 00 00 callq 1b <test_code+0x1b>
}
1b: 5d pop %rbp
1c: c3 retq
My compiler took the 10*10 being passed to printf, multiplied it at compile time, and then used the result as an immediate ($0x64, aka 100 in decimal) and put it into a register to be used for printf:
mov $0x64,%esi
The 10*10 expression not assigned to any identifier was elided. Note that it's likely possible to find some compiler somewhere that decides to execute this computation and store it in registers.
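That 10*10 is known at compile time is also visible in the language itself: it is an integer constant expression, so it may be used where C requires a value at translation time. This is a small sketch; the names HUNDRED, buffer and buffer_size are hypothetical.

```c
/* 10*10 is an integer constant expression, so it may appear where the
   language requires a value known at translation time. */
enum { HUNDRED = 10 * 10 };

static char buffer[10 * 10];    /* array size fixed by the compiler */

/* Hypothetical helper: reports the size the compiler gave the array. */
unsigned long buffer_size(void)
{
    return (unsigned long)sizeof buffer;
}
```

Neither declaration would compile if the multiplication had to wait until run time.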
In the first case, nothing: an expression like that is reduced to a value by the compiler, and since you are not assigning it to a variable, it does nothing and the compiler removes it.
In the second case, the value 100 is passed to printf.
Note that what happens depends on the compiler: in some the expression is evaluated in advance at compile time, in others the operation is executed at run time.
10*10;
Not stored. My guess is that it should give a compiler warning (GCC, for example, reports a statement with no effect when -Wall is enabled); it is not an error.
printf("%d", 10*10);
Should print: 100. The value of (10*10) is calculated (most likely by the compiler, not at run-time), and then passed to printf() as an argument. Depending on the calling convention, that means pushing the value (100) onto the stack or loading it into a register. If it is pushed, the value stays on the stack until the original (pre-call-to-printf()) stack frame is restored upon printf()'s return.
In the first case, since the operation is not used anywhere, the compiler may optimise your code and not execute the instruction at all.
In the second case, the value is calculated, passed to printf in a register or on the stack, and printed to the console; it is not stored anywhere else.
The C standard describes what the program does on an abstract machine.
But to decide what actually happens, always keep one rule in mind: as long as no constraint is violated, the compiler only has to produce code whose observable behavior is "as if" it did what you wrote.
It is explicitly allowed to use any other way to achieve that result it favors.
This rule is known colloquially as the "as-if"-rule.
Thus, your program is equal to e.g:
#include <stdio.h>
int main(void) {
fputs("100", stdout);
}
Or
#include <stdio.h>
int main(void) {
putchar('1');
putchar('0');
putchar('0');
}

why does compiler store variables in register? [duplicate]

Hi I have been reading this kind of stuff in various docs
register
Tells the compiler to store the variable being declared in a CPU register.
In standard C dialects, keyword register uses the following syntax:
register data-definition;
The register type modifier tells the compiler to store the variable being declared in a CPU register (if possible), to optimize access. For example,
register int i;
Note that TIGCC will automatically store often used variables in CPU registers when the optimization is turned on, but the keyword register will force storing in registers even if the optimization is turned off. However, the request for storing data in registers may be denied, if the compiler concludes that there is not enough free registers for use at this place.
http://tigcc.ticalc.org/doc/keywords.html#register
My point is not only about register. My point is: why would a compiler store variables in memory? The compiler's business is just to compile and generate an object file; the actual memory allocation happens at run time. Why would the compiler do this? I mean, in C, does memory allocation happen just by compiling the file, without running the object file?
The compiler is generating machine code, and the machine code is used to run your program. The compiler decides what machine code it generates, therefore making decisions about what sort of allocation will happen at runtime. It's not executing them when you type gcc foo.c but later, when you run the executable, it's the code GCC generated that's running.
This means that the compiler wants to generate the fastest code possible and makes as many decisions as it can at compile time, this includes how to allocate things.
The compiler doesn't run the code (unless it does a few rounds for profiling and better code execution), but it has to prepare it - this includes how to keep the variables your program defines, whether to use fast and efficient storage as registers, or using the slower (and more prone to side effects) memory.
Initially, your local variables would simply be assigned a location on the stack frame (except of course for memory you explicitly allocate dynamically). If your function declares an int, your compiler would likely grow the stack by a few additional bytes and use that memory address for storing the variable and for passing it as an operand to any operation your code performs on it.
However, since memory is slower (even when cached), and manipulating it puts more restrictions on the CPU, at a later stage the compiler may decide to try moving some variables into registers. This allocation is done through a complicated algorithm that tries to select the most reused and latency-critical variables that can fit within the number of logical registers your architecture has (while conforming to various restrictions, such as some instructions requiring an operand to be in a particular register).
There's another complication: some memory addresses may alias with external pointers in ways unknown at compilation time, in which case the variables stored there cannot be moved into registers. Compilers are usually a very cautious bunch and most of them avoid dangerous optimizations (otherwise they'd need to add special checks to avoid nasty things).
After all that, the compiler is still polite enough to let you advise it which variable is important and critical to you, in case it missed one. By marking such a variable with the register keyword you're basically asking the compiler to try to optimize for it by using a register, given that enough registers are available and that no aliasing is possible.
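A small sketch of the keyword in use. The function name is hypothetical. Note that the keyword is only a request: the compiler may keep the variable in memory anyway, and may equally well put an unmarked variable in a register.

```c
/* Hypothetical helper: sums an array using counters marked `register`.
   The keyword is only a request; the compiler is free to ignore it. */
long sum_array(const int *values, int count)
{
    register int i;
    register long total = 0;

    for (i = 0; i < count; i++)
        total += values[i];

    /* Writing &i here would be rejected: the address of a register
       variable cannot be taken. */
    return total;
}
```

The one effect the language does guarantee is the restriction on taking the address, which is also what rules out aliasing for such a variable.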
Here's a little example: Take the following code, doing the same thing twice but with slightly different circumstances:
#include <stdio.h>
int j;
int main() {
int i;
for (i = 0; i < 100; ++i) {
printf ("i'm here to prevent the loop from being optimized\n");
}
for (j = 0; j < 100; ++j) {
printf ("me too\n");
}
}
Note that i is local, j is global (and therefore the compiler doesn't know if anyone else might access it during the run).
Compiling in gcc with -O3 produces the following code for main:
0000000000400540 <main>:
400540: 53 push %rbx
400541: bf 88 06 40 00 mov $0x400688,%edi
400546: bb 01 00 00 00 mov $0x1,%ebx
40054b: e8 18 ff ff ff callq 400468 <puts@plt>
400550: bf 88 06 40 00 mov $0x400688,%edi
400555: 83 c3 01 add $0x1,%ebx # <-- i++
400558: e8 0b ff ff ff callq 400468 <puts@plt>
40055d: 83 fb 64 cmp $0x64,%ebx
400560: 75 ee jne 400550 <main+0x10>
400562: c7 05 80 04 10 00 00 movl $0x0,1049728(%rip) # 5009ec <j>
400569: 00 00 00
40056c: bf c0 06 40 00 mov $0x4006c0,%edi
400571: e8 f2 fe ff ff callq 400468 <puts@plt>
400576: 8b 05 70 04 10 00 mov 1049712(%rip),%eax # 5009ec <j> (loads j)
40057c: 83 c0 01 add $0x1,%eax # <-- j++
40057f: 83 f8 63 cmp $0x63,%eax
400582: 89 05 64 04 10 00 mov %eax,1049700(%rip) # 5009ec <j> (stores j back)
400588: 7e e2 jle 40056c <main+0x2c>
40058a: 5b pop %rbx
40058b: c3 retq
As you can see, the first loop counter sits in ebx, and is incremented on each iteration and compared against the limit.
The second loop however was the dangerous one, and gcc decided to pass the index counter through memory (loading it into rax every iteration). This example serves to show how better off you'd be when using registers, as well as how sometimes you can't.
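When a global counter really has to be used, one common workaround is to run the loop on a local copy and write the global once at the end. This is a sketch under the assumption that nothing else needs to observe j while the loop is running. The names repeat, count_call and calls are hypothetical.

```c
int j;                      /* global counter, as in the example above */

static int calls;           /* hypothetical: counts calls to count_call */
static void count_call(void) { calls++; }

/* Hypothetical helper: calls `body` `times` times. The loop runs on a
   local copy, which the compiler is free to keep in a register, and the
   global is written only once at the end. Returns the final value of j. */
int repeat(void (*body)(void), int times)
{
    int local;

    for (local = 0; local < times; ++local)
        body();
    j = local;
    return j;
}
```

Calling repeat(count_call, 100) runs the body 100 times and leaves j at 100, without a load and store of j on every iteration.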
The compiler needs to translate the code into machine instructions and tell the computer how to run the code. That includes how to perform operations (like multiplying two numbers) and where to store the data (stack, heap or registers).
