The code:
#include <stdio.h>
int main(int argc, char *argv[])
{
//what happens?
10*10;
//what happens?
printf("%d", 10*10);
return 0;
}
What happens in memory and during compilation for these two lines? Is the result of 10*10 stored anywhere?
The statement
10*10;
has no effect. The compiler may choose to not generate any code at all for this statement. On the other hand,
printf("%d", 10*10);
passes the result of 10*10 to the printf function, which prints the result (100) to the standard output.
Ask your compiler! They'll probably all have an interesting answer.
Here's what gcc -c noop.c -o noop.o -g3 had to say (I ran the object code through objdump --disassemble --source to produce the output below):
#include <stdio.h>
void test_code()
{
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
10*10;
//what happens?
printf("%d", 10*10);
4: b8 00 00 00 00 mov $0x0,%eax
9: be 64 00 00 00 mov $0x64,%esi
e: 48 89 c7 mov %rax,%rdi
11: b8 00 00 00 00 mov $0x0,%eax
16: e8 00 00 00 00 callq 1b <test_code+0x1b>
}
1b: 5d pop %rbp
1c: c3 retq
My compiler took the 10*10 being passed to printf, multiplied it at compile time, and then used the result as an immediate ($0x64, aka 100 in decimal) and put it into a register to be used for printf:
mov $0x64,%esi
The 10*10 expression not assigned to any identifier was elided. Note that it's likely possible to find some compiler somewhere that decides to execute this computation and store it in registers.
In the first case nothing happens: the compiler reduces an expression like that to a value, and since you are not assigning it to a variable it has no effect, so the compiler removes it.
In the second one the value 100 is passed to printf.
Note that what exactly happens depends on the compiler: some will evaluate the expression at compile time, others will emit code that performs the operation at run time.
10*10;
Not stored. Compilers typically warn about it (GCC, for example, reports "statement with no effect" with -Wall), but it is not an error.
printf("%d", 10*10);
Should print: 100. The value of (10*10) is calculated (most likely by the compiler, not at run-time), and then passed to printf() as an argument: pushed onto the stack or, as in the x86-64 listing above, loaded into a register, depending on the calling convention. If it is passed on the stack, it stays there until the original (pre-call-to-printf()) stack frame is restored upon printf()'s return.
In the first case, since the operation is not used anywhere, the compiler may optimise your code and not execute the instruction at all.
In the second case, the value is calculated using registers (stack) and printed to the console, not stored anywhere else.
The C standard describes what the program does on an abstract machine.
But to decide what actually happens, you need to always keep one rule in mind: as long as no constraint is violated, the compiler only has to emit code whose observable behavior is "as if" it did what you said.
It is explicitly allowed to use any other way to achieve that result it favors.
This rule is known colloquially as the "as-if"-rule.
Thus, your program is equivalent to e.g.:
#include <stdio.h>
int main(void) {
fputs("100", stdout);
}
Or
#include <stdio.h>
int main(void) {
putchar('1');
putchar('0');
putchar('0');
}
The GNU return address documentation states that __builtin_return_address(1) yields the return address of the caller of the current function.
Could someone expand on what this description means? After doing some testing, I realized it doesn't seem to be doing what I expect it to do (so I may not be understanding it correctly).
For example, I made the following very simple test code to further understand how this function works (as well as the others):
#include <stdio.h>
#include <stdint.h>
void foo(uint64_t x, uint64_t y){
void *ptr = __builtin_extract_return_addr(__builtin_return_address(0));
void *ptr2 = __builtin_extract_return_addr(__builtin_return_address(1));
printf("ret_addr(0)=%p\nret_addr(1)=%p\n", ptr, ptr2);
return;
}
int main(int argc, char **argv)
{
foo(1,1);
foo(1,1);
}
Disassembled code (for reference)
0000000000400536 <main>:
400536: 55 push %rbp
------- skipped -------
40054f: e8 93 ff ff ff callq 4004e7 <foo>
400554: be 01 00 00 00 mov $0x1,%esi
------- skipped -------
40055e: e8 84 ff ff ff callq 4004e7 <foo>
400563: b8 00 00 00 00 mov $0x0,%eax
------- skipped -------
Upon executing this code, the following is outputted:
ret_addr(0)=0x400554
ret_addr(1)=0x7f609c67cb97
ret_addr(0)=0x400563
ret_addr(1)=0x7f609c67cb97
So from this, I was a bit confused and wanted to ask for clarification.
I can see that __builtin_return_address(0) [ret_addr(0)] works fine as it returns the correct return address value of 0x400554 and 0x400563.
However, for __builtin_return_address(1) [ret_addr(1)], shouldn't the returned value be 0x40054f and 0x40055e? Those two addresses are the callq 4004e7 <foo> instructions shown in the disassembled code (and this is what I understand from the description).
Instead, I get some garbage value of 0x7f609c67cb97, and this value is the same for both foo functions, which even for garbage value, I would expect both to be different.
So to summarize, what is the purpose of __builtin_return_address(1) function? Is it supposed to return the exact address of the caller of the current function? (rather than simply finding the return address). If not, is it possible to find such an address? (I am thinking this may be a bit too difficult)
I believe my question is sort of similar to this: Getting the caller's Return Address.
__builtin_return_address(N) returns the return address N frames up the call stack. In your case __builtin_return_address(1) yields the return address of foo's caller, main, which is an address inside main's own caller, i.e. the glibc startup code. That is why it is the same for both calls to foo.
The N = 0 case means the immediate caller and always works, as you see in your example. Other values (N > 0) normally rely on frame pointers being present, which is only guaranteed when you compile with the -fno-omit-frame-pointer flag. When frame pointers are not available, the code generated for __builtin_return_address(N) may return garbage or even crash for non-zero N.
Is there a limit in nested calls of functions according to C99?
Example:
result = fn1( fn2( fn3( ... fnN(parN1, parN2) ... ), par2), par1);
NOTE: this code is definitely not good practice because it is hard to manage; however, it is generated automatically from a model, so manageability issues do not apply.
There is not directly a limitation, but a compiler is only required to allow some minimum limits for various categories:
From the C11 standard:
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: 18)
...
63 nesting levels of parenthesized expressions within a full expression
...
4095 characters in a logical source line
18) Implementations should avoid imposing fixed translation limits whenever possible
No. There is no limit.
As an example, this is a C snippet:
int func1(int a){return a;}
int func2(int a){return a;}
int func3(int a){return a;}
void main()
{
func1(func2(func3(16)));
}
The corresponding assembly code is:
0000000000000024 <main>:
24: 55 push %rbp
25: 48 89 e5 mov %rsp,%rbp
28: bf 10 00 00 00 mov $0x10,%edi
2d: e8 00 00 00 00 callq 32 <main+0xe>
32: 89 c7 mov %eax,%edi
34: e8 00 00 00 00 callq 39 <main+0x15>
39: 89 c7 mov %eax,%edi
3b: e8 00 00 00 00 callq 40 <main+0x1c>
40: 90 nop
41: 5d pop %rbp
42: c3 retq
The %eax register holds the return value of each function, and the %edi register holds the argument passed to the next one. As you can see, there are three callq instructions which correspond to three function calls. In other words, these nested functions are called one by one. There is no need to worry about the stack.
As mentioned in comments, compiler may crash when the code nests too deep.
I wrote a simple Python script to test this.
nest = 64000
funcs=""
call=""
for i in range(1, nest+1):
    funcs += "int func%d(int a){return a;}\n" %i
    call += "func%d(" %i
call += str(1) # parameter
call += ")" * nest + ";" # right parenthesis
content = '''
%s
void main()
{
%s
}
''' %(funcs, call)
with open("test.c", "w") as fd:
    fd.write(content)
nest = 64000 is OK, but 640000 will cause gcc-5.real: internal compiler error: Segmentation fault (program cc1).
No. Since these functions are executed one by one, there is no issue.
int res;
res = fnN(parN1, parN2);
....
res = fn2(res, par2);
res = fn1(res, par1);
The execution is linear with previous result being used for next function call.
Edit: As explained in comments, there might be a problem with parser and/or compiler to deal with such ugly code.
If this is not a purely theoretical question, the answer is probably "Try to rewrite your code so you don't need to do that, because the limit is more than enough for most sane use cases". If this is purely theoretical, or you really do need to worry about this limit and can't just rewrite, read on.
Section 5.2.4 of the C11 standard (latest draft, which is freely available and almost identical) specifies various limits on what implementations are required to support. If I'm reading that right, you can go up to 63 levels of nesting.
However, implementations are allowed to support more, and in practice they probably do. I had trouble finding the appropriate documentation for GCC (the closest I found was for expressions in the preprocessor), but I expect it doesn't have a hard limit except for system resources when compiling.
I'm trying to tweak the rules a little bit here, and malloc a buffer,
then copy a function to the buffer.
Calling the buffered function works, but the function throws a segmentation fault when I try to call another function from within it.
Any thoughts why?
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
int foo(int x)
{
printf("%d\n", x);
}
int bar(int x)
{
}
int main()
{
int foo_size = bar - foo;
void* buf_ptr;
buf_ptr = malloc(1024);
memcpy(buf_ptr, foo, foo_size);
mprotect((void*)(((int)buf_ptr) & ~(sysconf(_SC_PAGE_SIZE) - 1)),
sysconf(_SC_PAGE_SIZE),
PROT_READ|PROT_WRITE|PROT_EXEC);
int (*ptr)(int) = buf_ptr;
printf("%d\n", ptr(3));
return 0;
}
This code will throw a segfault, unless I change the foo function to:
int foo(int x)
{
//Anything but calling another function.
x = 4;
return x;
}
NOTE:
The code successfully copies foo into the buffer. I know I made some assumptions, but on my platform they're OK.
Your code is not position independent and even if it were, you don't have the correct relocations to move it to an arbitrary position. Your call to printf (or any other function) will be done with pc-relative addressing (through the PLT, but that's beside the point here). This means that the instruction generated to call printf isn't a call to a static address but rather "call the function X bytes from the current instruction pointer". Since you moved the code, the call is done to a bad address. (I'm assuming i386 or amd64 here, but generally it's a safe assumption; people who are on weird platforms usually mention that.)
More specifically, x86 has two different instructions for function calls. One is a call relative to the instruction pointer which determines the destination of the function call by adding a value to the current instruction pointer. This is the most commonly used function call. The second instruction is a call to a pointer inside a register or memory location. This is much less commonly used by compilers because it requires more memory indirections and stalls the pipeline. The way shared libraries are implemented (your call to printf will actually go to a shared library) is that for every function call you make outside of your own code the compiler will insert fake functions near your code (this is the PLT I mentioned above). Your code does a normal pc-relative call to this fake function and the fake function will find the real address to printf and call that. It doesn't really matter though. Almost any normal function call you make will be pc-relative and will fail. Your only hope in code like this are function pointers.
You might also run into some restrictions on executable mprotect. Check the return value of mprotect, on my system your code doesn't work for one more reason: mprotect doesn't allow me to do this. Probably because the backend memory allocator of malloc has additional restrictions that prevents executable protections of its memory. Which leads me to the next point:
You will break things by calling mprotect on memory that isn't managed by you. That includes memory you got from malloc. You should only mprotect things you've gotten from the kernel yourself through mmap.
Here's a version that demonstrates how to make this work (on my system):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
int
foo(int x, int (*fn)(const char *, ...))
{
fn("%d\n", x);
return 42;
}
int
bar(int x)
{
return 0;
}
int
main(int argc, char **argv)
{
size_t foo_size = (char *)bar - (char *)foo;
int ps = getpagesize();
void *buf_ptr = mmap(NULL, ps, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
if (buf_ptr == MAP_FAILED)
err(1, "mmap");
memcpy(buf_ptr, foo, foo_size);
int (*ptr)(int, int (*)(const char *, ...)) = buf_ptr;
printf("%d\n", ptr(3, printf));
return 0;
}
Here, I abuse the knowledge of how the compiler will generate the code for the function call. By using a function pointer I force it to generate a call instruction that isn't pc-relative. Also, I manage the memory allocation myself so that we get the right permissions from start and not run into any restrictions that brk might have. As a bonus we do error handling that actually helped me find a bug in the first version of this experiment and I also corrected other minor bugs (like missing includes) which allowed me to enable warnings in the compiler and catch another potential problem.
If you want to dig deeper into this you can do something like this. I added two versions of the function:
int
oldfoo(int x)
{
printf("%d\n", x);
return 42;
}
int
foo(int x, int (*fn)(const char *, ...))
{
fn("%d\n", x);
return 42;
}
Compile the whole thing and disassemble it:
$ cc -Wall -o foo foo.c
$ objdump -S foo | less
We can now look at the two generated functions:
0000000000400680 <oldfoo>:
400680: 55 push %rbp
400681: 48 89 e5 mov %rsp,%rbp
400684: 48 83 ec 10 sub $0x10,%rsp
400688: 89 7d fc mov %edi,-0x4(%rbp)
40068b: 8b 45 fc mov -0x4(%rbp),%eax
40068e: 89 c6 mov %eax,%esi
400690: bf 30 08 40 00 mov $0x400830,%edi
400695: b8 00 00 00 00 mov $0x0,%eax
40069a: e8 91 fe ff ff callq 400530 <printf@plt>
40069f: b8 2a 00 00 00 mov $0x2a,%eax
4006a4: c9 leaveq
4006a5: c3 retq
00000000004006a6 <foo>:
4006a6: 55 push %rbp
4006a7: 48 89 e5 mov %rsp,%rbp
4006aa: 48 83 ec 10 sub $0x10,%rsp
4006ae: 89 7d fc mov %edi,-0x4(%rbp)
4006b1: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4006b5: 8b 45 fc mov -0x4(%rbp),%eax
4006b8: 48 8b 55 f0 mov -0x10(%rbp),%rdx
4006bc: 89 c6 mov %eax,%esi
4006be: bf 30 08 40 00 mov $0x400830,%edi
4006c3: b8 00 00 00 00 mov $0x0,%eax
4006c8: ff d2 callq *%rdx
4006ca: b8 2a 00 00 00 mov $0x2a,%eax
4006cf: c9 leaveq
4006d0: c3 retq
The instruction for the function call in the printf case is "e8 91 fe ff ff". This is a pc-relative function call. 0xfffffe91 bytes in front of our instruction pointer. It's treated as a signed 32 bit value, and the instruction pointer used in the calculation is the address of the next instruction. So 0x40069f (next instruction) - 0x16f (0xfffffe91 in front is 0x16f bytes behind with signed math) gives us the address 0x400530, and looking at the disassembled code I find this at the address:
0000000000400530 <printf@plt>:
400530: ff 25 ea 0a 20 00 jmpq *0x200aea(%rip) # 601020 <_GLOBAL_OFFSET_TABLE_+0x20>
400536: 68 01 00 00 00 pushq $0x1
40053b: e9 d0 ff ff ff jmpq 400510 <_init+0x28>
This is the magic "fake function" I mentioned earlier. Let's not get into how this works. It's necessary for shared libraries to work and that's all we need to know for now.
The second function generates the function call instruction "ff d2". This means "call the function at the address stored inside the rdx register". No pc-relative addressing and that's why it works.
The compiler is free to generate the code the way it wants provided the observable results are correct (as if rule). So what you do is just an undefined behaviour invocation.
Visual Studio sometimes uses relays. That means that the address of a function just points to a relative jump. That's perfectly allowed per standard because of the as-if rule, but it would definitely break that kind of construction. Another possibility is to have local internal functions called with relative jumps but outside of the function itself. In that case, your code would not copy them, and the relative calls will just point to random memory. That means that with different compilers (or even different compilation options on the same compiler) it could give the expected result, crash, or directly end the program without error, which is exactly UB.
I think I can explain a bit. First of all, neither of your functions contains a return statement, and using the return value of such a function (as printf("%d\n", ptr(3)) does) is undefined behaviour per standard §6.9.1/12. Secondly, and this is common on a lot of platforms, apparently including yours: relative addresses of functions are hardcoded into the binary code of functions. That means that if foo contains a call to "printf" and you then move the code and execute it from another location, the address that "printf" is supposed to be called at turns bad.
I've written this C code for finding the sum of all integers which are equal to the sum of the factorial of their digits. It takes a minute or so to get the job done without any GCC optimization flags, using -O1 decreased that time by about 15-20 seconds but when I tried it with -O2, -O3 or -Os it gets stuck in an infinite loop.
int main()
{
int i, j, factorials[10];
int Result=0;
for(i=0; i<10; i++)
{
factorials[i]=1;
for(j=i; j>0; j--)
{
factorials[i] *= j;
}
}
for(i=3; i>2; i++) //This is the loop the program gets stuck on
{
int Sum=0, number=i;
while(number)
{
Sum += factorials[number % 10];
number /= 10;
}
if(Sum == i)
Result += Sum;
}
printf("%d\n", Result);
return 0;
}
I've pinpointed that for(i=3; i>2; i++) is the cause of the problem. So obviously i is never less than 2?
Does this have anything to do with the fact that integer overflow behavior is undefined? If so, any more info on what exactly is going on with the program in these cases?
EDIT: I guess I should've mentioned, I am aware of other ways of writing that for loop so that it doesn't rely on overflow (I was hoping that INT_MAX+1 would be equal to INT_MIN, which is <2), but this was just done as a random test to see what would happen and I posted it here to find out what exactly was going on :)
The loop is for (i = 3; i > 2; i++) and it has no break statements or other exit condition.
Eventually i will reach INT_MAX and then i++ will cause integer overflow which causes undefined behaviour.
Possibly Sum or Result would also overflow before i did.
When a program is guaranteed to trigger undefined behaviour, the entire behaviour of the program is undefined.
gcc is well known for aggressively optimizing out paths that trigger UB. You could inspect the assembly code to see what exactly happened in your case. Perhaps the -O2 and higher cases removed the loop end condition check, but -O1 left it in there and "relied" on INT_MAX + 1 resulting in INT_MIN.
The for loop is for(i=3; i>2; i++) and inside this loop i is not modified, nor is there a break or any other way to exit the loop. You are relying on integer overflow to cause the exit condition to occur, but the compiler doesn't take that into consideration.
Instead, the compiler sees that i starts at 3, and i is only ever incremented, and so i>2 is always true. Thus there is no need for i to exist at all in this context, since this must be an infinite loop.
If you change i to be unsigned int and set the condition for the loop exit to match, this "optimization" will no longer occur.
I find very strange the differences between the assembler results of the following code compiled without optimization and with -Os optimization.
#include <stdio.h>
int main(){
int i;
for(i=3;i>2;i++);
printf("%d\n",i);
return 0;
}
Without optimization the code results:
000000000040052d <main>:
40052d: 55 push %rbp
40052e: 48 89 e5 mov %rsp,%rbp
400531: 48 83 ec 10 sub $0x10,%rsp
400535: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
40053c: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
400543: eb 04 jmp 400549 <main+0x1c>
400545: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400549: 83 7d fc 02 cmpl $0x2,-0x4(%rbp)
40054d: 7f f6 jg 400545 <main+0x18>
40054f: 8b 45 fc mov -0x4(%rbp),%eax
400552: 89 c6 mov %eax,%esi
400554: bf f4 05 40 00 mov $0x4005f4,%edi
400559: b8 00 00 00 00 mov $0x0,%eax
40055e: e8 ad fe ff ff callq 400410 <printf@plt>
400563: b8 00 00 00 00 mov $0x0,%eax
400568: c9 leaveq
400569: c3 retq
and the output is: -2147483648 (as I expect on a PC)
With -Os the code results:
0000000000400400 <main>:
400400: eb fe jmp 400400 <main>
I think the second result is an error!!! I think the compiler should have compiled something corresponding to the code:
printf("%d\n",-2147483648);
As you noticed yourself, signed integer overflow is undefined. The compiler decides to reason about your program assuming that you're smart enough to never cause undefined behavior. So it can conclude that since i is initialized to a number larger than 2 and only gets incremented, it will never be lower or equal to 2, which means that i > 2 can never be false. This in turn means that the loop will never terminate and can be optimized into an infinite loop.
I don't know what you are trying to do, but if you want to guard against integer overflow, include limits.h in your source code and add this line inside your for loop:
if (i >= INT_MAX) break;
This lets you stop before the variable is incremented past the largest value an int can hold.
As you said, it's undefined behavior, so you can't rely on any particular behavior.
The two things you will most likely see are:
The compiler translates more or less directly to machine code, which does whatever it wants to do when the overflow happens (which is usually to roll over to the most negative value) and still includes the test (which, e.g., will fail if the value rolls over)
The compiler observes that the index variable starts at 3 and always increases, and consequently the loop condition always holds, and so it emits an infinite loop that never bothers to test the loop condition
I was running some tests to see how ++i and i++ translated to asm. I wrote a simple for :
int main()
{
int i;
for(i=0;i<1000000;++i);
return 0;
}
compiled it with gcc test.c -O0 -o test, and checked the asm with objdump -d test:
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp) // i=0;
4004f7: eb 04 jmp 4004fd <main+0x11>
4004f9: 83 45 fc 01 addl $0x1,-0x4(%rbp) // ++i;
4004fd: 81 7d fc 3f 42 0f 00 cmpl $0xf423f,-0x4(%rbp) //
400504: 7e f3 jle 4004f9 <main+0xd> //i<1000000;
400506: b8 00 00 00 00 mov $0x0,%eax
40050b: 5d pop %rbp
40050c: c3 retq
So far so good. The weird thing (if I understand the asm code correctly) was when I wrote i<10000000000 instead of i<1000000. Exactly the same for loop with the stopping condition i<10000000000 translated to the following assembler code:
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004f7: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004fb: eb fa jmp 4004f7 <main+0xb>
which is endless loop per my understanding, cause exactly same asm was generated for :
for(i=0;;++i);
The question is, is it really possible that it is compiled to endless loop? Why?
I'm using Ubuntu 13.04, x86_64.
Thanks.
This happens because the maximum value of an int on your architecture can never reach 10000000000. It will overflow at some point before reaching that value. Thus, the condition i < 10000000000 will always evaluate as true, meaning this is an infinite loop.
The compiler is able to deduce this at compile time, which is why it generates appropriate assembly for an infinite loop.
The compiler is able to warn you about this. For that to happen, you can enable the "extra" warning level with:
gcc -Wextra
GCC 4.8.2 for example will tell you:
warning: comparison is always true due to limited range of data type [-Wtype-limits]
for (i = 0; i < 10000000000; ++i);
^
And it even tells you the specific warning option that exactly controls this type of warning (Wtype-limits).
The range of a 32-bit int is -2,147,483,648 to 2,147,483,647.
10000000000 is way above that.
If 10000000000 is outside the range of int, but inside the range of long or long long, for your compiler, then i < 10000000000 converts i to long or long long before making the comparison.
Realising the comparison will always be true, the compiler then removes it as redundant.
I would hope there was some sort of compiler warning.
It is caused because you are using int for storing such big number.
As a result, the i wraps around itself, and never reaches the termination condition of the for loop.
When you exceed the limit for the data types in C/C++, funny things can happen.
The compiler can detect these things at compile time, and therefore, generates the code for infinite loop in assembly language.
Problem is:
You can't store such a large number in "i".
See https://en.wikipedia.org/wiki/Integer_%28computer_science%29 for more information.
"i" (the variable) can't reach 10000000000, thus the loop evaluates true always and runs infinite times.
You can either use a smaller number or another container for i, such as the multiprecision library of Boost:
http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/boost_multiprecision/intro.html
This happens because the compiler sees that you are using a condition that can never be false, so the condition is simply never evaluated.
An int can never hold a value that is as large as 10000000000, so the value will always be lower than that. When the variable reaches its maximum value and you try to increase it further, it will wrap around and start from its lowest possible value.
The same removal of the condition happens if you use a literal value of true:
for (i = 0; true; ++i);
The compiler will just make it a loop without a condition, it won't actually evaluate the true value on each iteration to see if it is still true.