I'm trying to view the disassembly of a simple C program in gdb.
C program:
#include <stdio.h>
int main(){
    int i = 2;
    if (i == 0){
        printf("YES, it's 0!\n");
    }else{
        printf("NO");
    }
    return 0;
}
The disassembled instructions:
0x0000000100401080 <+0>: push rbp
0x0000000100401081 <+1>: mov rbp,rsp
0x0000000100401084 <+4>: sub rsp,0x30
0x0000000100401088 <+8>: call 0x1004010e0 <__main>
0x000000010040108d <+13>: mov DWORD PTR [rbp-0x4],0x2
0x0000000100401094 <+20>: cmp DWORD PTR [rbp-0x4],0x0
0x0000000100401098 <+24>: jne 0x1004010ab <main+43>
0x000000010040109a <+26>: lea rax,[rip+0x1f5f] # 0x100403000
0x00000001004010a1 <+33>: mov rcx,rax
0x00000001004010a4 <+36>: call 0x100401100 <puts>
0x00000001004010a9 <+41>: jmp 0x1004010ba <main+58>
0x00000001004010ab <+43>: lea rax,[rip+0x1f5b] # 0x10040300d
0x00000001004010b2 <+50>: mov rcx,rax
0x00000001004010b5 <+53>: call 0x1004010f0 <printf>
0x00000001004010ba <+58>: mov eax,0x0
0x00000001004010bf <+63>: add rsp,0x30
0x00000001004010c3 <+67>: pop rbp
0x00000001004010c4 <+68>: ret
0x00000001004010c5 <+69>: nop
I suppose the instruction
0x00000001004010a4 <+36>: call 0x100401100 <puts>
corresponds to
printf("YES, it's 0!\n");
Assuming that is right, my question is: why is <puts> called here, while <printf> is called at 0x00000001004010b5 <+53>: call 0x1004010f0 <printf>?
Using the semantics defined in the C Standard, printf("YES, it's 0!\n") produces the same output as puts("YES, it's 0!"), which may be more efficient because the string does not need to be scanned for conversion specifications.
Since the return value is not used, the compiler can replace the printf call with the equivalent call to puts.
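A minimal sketch, assuming only standard C, that checks this equivalence: puts(s) writes s followed by a newline, so it is emulated here with fputs plus fputc on a temporary file, and its bytes are compared with what fprintf writes for the same format string. The helper names (capture, via_printf, via_puts, printf_puts_equivalent) are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run `writer` against a temporary file and copy
   what it wrote into buf (NUL-terminated). Returns bytes read, or 0. */
static size_t capture(void (*writer)(FILE *), char *buf, size_t cap) {
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    writer(f);
    rewind(f);
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return n;
}

/* printf-style: no conversion specifications, trailing newline. */
static void via_printf(FILE *f) { fprintf(f, "YES, it's 0!\n"); }

/* puts-style: the string without the newline, then one newline. */
static void via_puts(FILE *f) {
    fputs("YES, it's 0!", f);
    fputc('\n', f);
}

/* Returns 1 if both writers produced identical, non-empty output. */
int printf_puts_equivalent(void) {
    char a[64], b[64];
    size_t na = capture(via_printf, a, sizeof a);
    size_t nb = capture(via_puts, b, sizeof b);
    return na > 0 && na == nb && memcmp(a, b, na) == 0;
}
```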
This type of optimisation was likely introduced as a way to reduce the executable size for the classic K&R program hello.c. Replacing the printf with puts avoids linking the printf code, which is substantially larger than that of puts. In your case, this optimisation is counterproductive as both puts and printf are linked, and since modern systems use dynamic linking, it is no longer meaningful to try to reduce executable size this way.
You can play with compiler settings on the Godbolt Compiler Explorer to observe compiler behavior:
even with -O0, gcc performs the printf/puts substitution but clang does not; both compilers generate code for both calls and do not optimize away the test if (i == 0), which is expected with optimisations disabled. I suspect the gcc team could not resist biasing size benchmarks even with optimisations disabled.
with -O1 and beyond, both compilers only generate code for the else branch, calling printf.
if you change the second string to just "N", printf is converted to a call to putchar, yet another optimisation.
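To summarize the three cases above, here is a simplified, hypothetical model of the rewrite. It is not GCC's implementation; it only covers a literal format string passed to printf with no further arguments, and the function name printf_rewrite_target is made up for illustration.

```c
#include <string.h>

/* Hypothetical, simplified model of the printf rewrite; not GCC's code.
   Only handles a literal format string with no extra arguments. */
const char *printf_rewrite_target(const char *fmt) {
    size_t len = strlen(fmt);
    if (strchr(fmt, '%') != NULL)
        return "printf";   /* conversion specifications: keep printf */
    if (len == 0)
        return "nothing";  /* printf("") prints nothing at all */
    if (len == 1)
        return "putchar";  /* printf("N") -> putchar('N') */
    if (fmt[len - 1] == '\n')
        return "puts";     /* printf("s\n") -> puts("s") */
    return "printf";       /* e.g. printf("NO") stays a printf call */
}
```

This matches the disassembly in the question: "YES, it's 0!\n" ends in a newline and becomes puts, while "NO" has no trailing newline and stays printf.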
It's an optimization.
Calling printf with a format string that has no format specifiers and a trailing newline is equivalent to calling puts with the same string with the trailing newline removed.
Since printf has a lot of logic for handling format specifiers but puts just writes the string given, the latter will be faster. So in the case of the first call to printf the compiler sees this equivalence and makes the appropriate substitution.
I am trying to learn more about assembly and which optimizations compilers can and cannot do.
I have a test piece of code for which I have some questions.
See it in action here: https://godbolt.org/z/pRztTT, or check the code and assembly below.
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
for (int j = 0; j < 100; j++) {
if (argc == 2 && argv[1][0] == '5') {
printf("yes\n");
}
else {
printf("no\n");
}
}
return 0;
}
The assembly produced by GCC 10.1 with -O3:
.LC0:
.string "no"
.LC1:
.string "yes"
main:
push rbp
mov rbp, rsi
push rbx
mov ebx, 100
sub rsp, 8
cmp edi, 2
je .L2
jmp .L3
.L5:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
je .L4
.L2:
mov rax, QWORD PTR [rbp+8]
cmp BYTE PTR [rax], 53
jne .L5
mov edi, OFFSET FLAT:.LC1
call puts
sub ebx, 1
jne .L2
.L4:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
je .L4
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
jmp .L4
It seems like GCC produces two versions of the loop: one with the argv[1][0] == '5' condition but without the argc == 2 condition, and one without any condition.
My questions:
What is preventing GCC from splitting away the full condition? It is similar to this question, but there is no chance for the code to get a pointer into argv here.
In the loop without any condition (L3 in assembly), why is the loop body duplicated? Is it to reduce the number of jumps while still fitting in some sort of cache?
GCC doesn't know that printf won't modify memory pointed-to by argv, so it can't hoist that check out of the loop.
argc is a local variable whose address is never taken, so no global pointer variable can point to it, and GCC knows that calling an opaque function can't modify it. Proving that a local variable is truly private is part of escape analysis.
The OP tested this by copying argv[1][0] into a local char variable first: that let GCC hoist the full condition out of the loop.
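A sketch of that source-level fix, assuming the loop from the question: the whole condition is evaluated once into a local whose address is never taken, so the output calls inside the loop cannot be assumed to change it. The function print_answers is hypothetical, and it writes to a FILE * instead of stdout only so it can be exercised.

```c
#include <stdio.h>

/* Hypothetical rewrite of the question's loop: the condition is computed
   once, before the loop, into a private local. Returns the number of
   "yes" lines written to `out`. */
int print_answers(int argc, char *argv[], FILE *out, int iterations) {
    /* argc == 2 is checked first, so argv[1] is only read if it exists. */
    const int yes = argc == 2 && argv[1][0] == '5';
    int yes_lines = 0;
    for (int j = 0; j < iterations; j++) {
        if (yes) {
            fputs("yes\n", out);
            yes_lines++;
        } else {
            fputs("no\n", out);
        }
    }
    return yes_lines;
}
```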
In practice argv[1] won't be pointing to memory that printf can modify. But we only know that because printf is a C standard library function, and we assume that main is only called by the CRT startup code with the actual command line args. Not by some other function in this program that passes its own args. In C (unlike C++), main is re-entrant and can be called from within the program.
Also, in GNU C, printf can have custom format-string handling functions registered with it. Although in this case, the compiler built-in printf looks at the format string and optimizes it to a puts call.
So printf is already partly special, but I don't think GCC bothers to look for optimizations based on it not modifying any other globally-reachable memory. With a custom stdio output buffer, that might not even be true. printf is slow; saving some spill / reloads around it is generally not a big deal.
Would (theoretically) compiling puts() together with this main() allow the compiler to see puts() isn't touching argv and optimize the loop fully?
Yes, e.g. if you'd written your own write function that uses an inline asm statement around a syscall instruction (with a memory input-only operand to make it safe while avoiding a "memory" clobber) then it could inline and assume that argv[1][0] wasn't changed by the asm statement and hoist a check based on it. Even if you were outputting argv[1].
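A minimal sketch of such a write function, assuming x86-64 Linux and GCC-style extended asm (write is syscall number 1 there). The "m" input operand tells the compiler the asm reads the buffer, so no "memory" clobber is needed. The name raw_write is hypothetical, and the #else branch is just a plain POSIX write() fallback so the sketch still builds elsewhere.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__linux__)
/* x86-64 Linux only: write(2) via the syscall instruction. The "m"
   operand declares that the asm reads the whole buffer; since there is
   no "memory" clobber, other memory (e.g. argv[1][0]) may be assumed
   unchanged across it. The syscall instruction clobbers rcx and r11. */
static inline long raw_write(int fd, const char *buf, size_t len) {
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(1L), /* __NR_write on x86-64 Linux */
                       "D"((long)fd), "S"(buf), "d"(len),
                       "m"(*(const char (*)[])buf)
                     : "rcx", "r11");
    return ret;
}
#else
#include <unistd.h>
/* Fallback for other platforms: plain POSIX write(). */
static inline long raw_write(int fd, const char *buf, size_t len) {
    return (long)write(fd, buf, len);
}
#endif
```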
Or maybe do inter-procedural optimization without inlining.
Re: unrolling: that's odd; -funroll-loops isn't on by default in GCC at -O3, only with -O3 -fprofile-use or if enabled manually.
I have an exam coming up, and I'm struggling with assembly. I have written some simple C code, obtained its assembly code, and am now trying to comment on the assembly as practice. The C code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
int x = 10;
char const* y = argv[1];
printf("%s\n",y );
return 0;
}
Its assembly code:
0x00000000000006a0 <+0>: push %rbp # Creating stack
0x00000000000006a1 <+1>: mov %rsp,%rbp # Saving base of stack into base pointer register
0x00000000000006a4 <+4>: sub $0x20,%rsp # Allocate 32 bytes of space on the stack
0x00000000000006a8 <+8>: mov %edi,-0x14(%rbp) # First argument stored in stackframe
0x00000000000006ab <+11>: mov %rsi,-0x20(%rbp) # Second argument stored in stackframe
0x00000000000006af <+15>: movl $0xa,-0xc(%rbp) # Value 10 stored in x's address in the stackframe
0x00000000000006b6 <+22>: mov -0x20(%rbp),%rax # Second argument stored in return value register
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # ??
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # ??
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # ??
0x00000000000006c6 <+38>: mov %rax,%rdi # Return value copied to 1st argument register - why??
0x00000000000006c9 <+41>: callq 0x560 # printf??
0x00000000000006ce <+46>: mov $0x0,%eax # Value 0 is copied to return register
0x00000000000006d3 <+51>: leaveq # Destroying stackframe
0x00000000000006d4 <+52>: retq # Popping return address, and setting instruction pointer equal to it
Can a friendly soul help me out wherever I have "??" (meaning I don't understand what is happening or I'm unsure)?
0x00000000000006ba <+26>: mov 0x8(%rax),%rax # get argv[1] to rax
0x00000000000006be <+30>: mov %rax,-0x8(%rbp) # move argv[1] to local variable
0x00000000000006c2 <+34>: mov -0x8(%rbp),%rax # move local variable to rax (for move to rdi)
0x00000000000006c6 <+38>: mov %rax,%rdi # now rdi has argv[1]
0x00000000000006c9 <+41>: callq 0x560 # it is puts (optimized)
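To see why the displacement is 0x8: argv[1] is *(argv + 1), the pointer stored sizeof(char *) bytes past the start of the argv array, and on x86-64 that is 8 bytes. The hypothetical helper below does the same load by hand with an explicit byte offset; it is only an illustration, ordinary code should just write argv[1].

```c
/* Hypothetical helper mirroring `mov -0x20(%rbp),%rax` followed by
   `mov 0x8(%rax),%rax`: load the pointer one slot past argv[0]. */
const char *second_arg_by_offset(char const *argv[]) {
    const char *base = (const char *)argv;                 /* rax = argv */
    const char *const *slot =
        (const char *const *)(base + 1 * sizeof(char *));  /* 0x8(%rax) on x86-64 */
    return *slot;                                          /* argv[1] */
}
```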
I will try to make a guess:
mov -0x20(%rbp),%rax # retrieve argv (the char ** itself, not argv[0])
mov 0x8(%rax),%rax # load argv[1] (8 bytes past the start of argv) into rax
mov %rax,-0x8(%rbp) # store argv[1] (which now is in rax) into y
mov -0x8(%rbp),%rax # put y back into rax (redundant, but typical of unoptimized code, which reloads every variable from memory)
mov %rax,%rdi # copy y to rdi, the register that carries the first argument of printf
When you deal with assembly, please specify which architecture you are using. An Intel processor uses a different instruction set from an ARM one; even similar instructions may behave differently or rely on different assumptions. Optimisations also change the sequence of instructions generated by the compiler, so you might want to say whether you are using them (it looks like you are not) and which compiler you are using, as each one has its own policy for generating assembly.
Copying through rax before preparing the argument for printf is partly the compiler's choice (unoptimized code tends to route values through rax) and partly an obligation imposed by the architecture (on x86-64 Linux, the first argument is passed in rdi). For all those annoying reasons, most people prefer to use a "high level language" such as C, so that the generated instructions are always correct, even though they may look very dumb to a human (as we know, computers are dumb by design) and are not always the most efficient choice; that's why there are still many compilers around.
I can give you two more tips:
your IDE must have a way to interleave assembly instructions with C code, and to single-step through the assembly. Try to find it and explore it yourself
the IDE should also have a function to explore the memory of your program. If you find it, try entering the 0x560 address and look where it leads you. It will most likely lead to the entry point of printf (or of puts, if the compiler substituted it)
I hope that my answer will help you work it out, good luck
There's a series of problems in SPOJ about creating a function in a single line with some constraints. I've already solved the easy, medium and hard ones, but for the impossible one I keep getting Wrong Answer.
To sum it up, the problem requests to fill in the code of the return statement such that if x is 1, the return value should be 2. For other x values, it should return 3. The constraint is that the letter 'x' can't be used, and no more code can be added; one can only code that return statement. Clearly, to solve this, one must create a hack.
So I've used gcc's built-in way to get the stack frame, and then decremented the pointer to get a pointer to the first parameter. Other than that, the statement is just a normal comparison.
On my machine it works fine, but on the cluster (Intel Pentium G860) used by the online judge it doesn't, probably due to a different calling convention. I'm not sure I understood the processor's ABI (I'm not sure whether the stack frame pointer is saved on the stack or only in a register), or even whether I'm reading the correct ABI.
The question is: what would be the correct way to get the first parameter of a function using the stack?
My code is (it must be formatted this way, otherwise it's not accepted):
#include <stdio.h>
int count(int x){
return (*(((int*)__builtin_frame_address(0))-1) == 1) ? 2 : 3;
}
int main(i){
for(i=1;i%1000001;i++)
printf("%d %d\n",i,count(i));
return 0;
}
The question is: what would be the correct way to get the first parameter of a function using the stack?
There is no portable way to do it. You must assume a specific compiler, its settings and ABI, along with its calling conventions.
GCC is likely to "lay down" an int local variable at offset -0x4 (assuming that sizeof(int) == 4). You can observe this with the most basic definition of count:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: mov %edi,-0x4(%rbp)
5 return x == 1 ? 2 : 3;
0x00000000004004cb <+7>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cf <+11>: jne 0x4004d8 <count+20>
0x00000000004004d1 <+13>: mov $0x2,%eax
0x00000000004004d6 <+18>: jmp 0x4004dd <count+25>
0x00000000004004d8 <+20>: mov $0x3,%eax
6 }
0x00000000004004dd <+25>: leaveq
0x00000000004004de <+26>: retq
You can also see that the %edi register holds the first parameter. This is the case for the AMD64 ABI (%edi is also not preserved across calls).
Now, with that knowledge, you might write something like:
int count(int x)
{
return *((int*)(__builtin_frame_address(0) - sizeof(int))) == 1 ? 2 : 3;
}
which can be obfuscated as:
return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
However, the catch is that an optimizing compiler may enthusiastically assume that, since x is not referenced in count, it can simply skip storing it to the stack. For example, with the -O flag it produces the following assembly:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
5 return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
0x00000000004004c8 <+4>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cc <+8>: setne %al
0x00000000004004cf <+11>: movzbl %al,%eax
0x00000000004004d2 <+14>: add $0x2,%eax
6 }
0x00000000004004d5 <+17>: leaveq
0x00000000004004d6 <+18>: retq
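For reference, the setne/movzbl/add sequence above is a branch-free form of the comparison: setne %al yields 1 when the compared value is not 1 and 0 otherwise, movzbl widens it, and add $0x2 adds 2. Written in terms of the parameter, as in the plain definition of count, a sketch with a hypothetical name is:

```c
/* Branch-free equivalent of `x == 1 ? 2 : 3`, mirroring
   setne %al / movzbl %al,%eax / add $0x2,%eax. */
int count_branchless(int x) {
    return 2 + (x != 1);
}
```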
As you can see, the mov %edi,-0x4(%rbp) instruction is now missing, so the only way (see footnote 1) would be to access the value of x from the %edi register:
int count(int x)
{
return ({register int edi asm("edi");edi==1?2:3;});
}
but this method can't be "obfuscated" the same way, as whitespace is needed in the declaration of the variable that holds the value of %edi.
1) Not necessarily. Even if the compiler decides to skip the move from register to stack, it is still possible to "force" it with the inline assembly asm("mov %edi,-0x4(%rbp)");. Beware, though: the compiler may take its revenge sooner or later.
The C standard does NOT require a stack in any implementation, so your problem doesn't really make sense.
In the context of GCC, the behavior differs between x86 and x86-64 (and other architectures).
On x86, parameters reside on the stack, but on x86-64 the first six integer or pointer parameters (including implicit ones) are passed in registers, so basically you can't do the hack the way you describe.
If you want to hack the code, you need to specify the platform you want to run on; otherwise, there is no point in answering your question.
A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf@plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to determine this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb assumes that f's prologue ends at 0xffff0010.
I think this is the mode used by gcc when not optimizing.
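The line-table convention just described can be modeled in a few lines. This is a simplified, hypothetical sketch (struct line_entry and prologue_end are made-up names), not gdb's actual code, and a real line table comes from the DWARF .debug_line section rather than a plain array.

```c
#include <stddef.h>

/* One row of a simplified, hypothetical line table for one function. */
struct line_entry {
    unsigned line;
    unsigned long addr;
};

/* If the function's first line number appears again in its line table,
   the address of that second entry is taken as the end of the prologue.
   Returns 0 when the convention does not apply, in which case a
   debugger would fall back to other methods. */
unsigned long prologue_end(const struct line_entry *rows, size_t n) {
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (rows[i].line == rows[0].line)
            return rows[i].addr;
    }
    return 0;
}
```

With the two "line 23" rows from the example above, it returns 0xffff0010.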
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
In the Binutils 2.25 source tree, the prologue-skipping algorithm seems to be in symtab.c:skip_prologue_sal, which breakpoints.c:break_command (the command definition) calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologue of foo2 is longer than that of foo1 by two instructions because:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations; in particular, it must decrement rsp before making another call, to reserve room for its local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
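Why 0x10 in particular is consistent with the System V x86-64 rule that rsp must be 16-byte aligned at each call instruction: on entry rsp is 8 mod 16 because of the pushed return address, push %rbp restores 16-byte alignment, so a non-leaf function's sub must be a multiple of 16. foo2 uses 8 bytes of locals (-0x4 and -0x8 from rbp), which rounds up to 0x10. A sketch of that rounding, with a hypothetical helper name:

```c
#include <stddef.h>

/* Round n up to a multiple of align, where align is a power of two. */
size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

For foo2, align_up(8, 16) gives 16, i.e. the sub $0x10,%rsp above.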
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1 does not have local strings constants however.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyway. It makes register allocation easier.
On ELF platforms like Linux, debug information is stored in separate (non-executable) sections of the executable. These sections contain all the information the debugger needs. Check the DWARF2 specification for the specifics.
I'm in the process of trying to understand the stack mechanisms.
From the theory I have seen, before a function is called, its arguments are pushed onto the stack.
However when calling printf in the code below, none of them are pushed:
#include<stdio.h>
int main(){
char *s = " test string";
printf("Print this: %s and this %s \n", s, s);
return 1;
}
I've put a breakpoint in gdb at the printf call, and when displaying the stack, none of the 3 arguments have been pushed onto it.
The only thing pushed to the stack is the string address s as can be seen in the disassembled code below:
0x000000000040052c <+0>: push %rbp
0x000000000040052d <+1>: mov %rsp,%rbp
0x0000000000400530 <+4>: sub $0x10,%rsp
0x0000000000400534 <+8>: movq $0x400604,-0x8(%rbp) // variable pushed on the stack
0x000000000040053c <+16>: mov -0x8(%rbp),%rdx
0x0000000000400540 <+20>: mov -0x8(%rbp),%rax
0x0000000000400544 <+24>: mov %rax,%rsi
0x0000000000400547 <+27>: mov $0x400611,%edi
0x000000000040054c <+32>: mov $0x0,%eax
0x0000000000400551 <+37>: callq 0x400410 <printf@plt>
0x0000000000400556 <+42>: mov $0x1,%eax
0x000000000040055b <+47>: leaveq
Actually, the only argument appearing so far in the disassembled code is when "Print this: %s and this %s \n" is put in %edi...
0x0000000000400547 <+27>: mov $0x400611,%edi
So my question is: why am I not seeing three push instructions, one for each of my three arguments?
uname -a:
3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
On 64-bit x86-64 Linux systems, the x86-64 ABI (Application Binary Interface) does not push arguments on the stack but passes them in registers (this calling convention is slightly faster).
If you pass many arguments (e.g. a dozen), some of them get pushed on the stack.
Perhaps first read the Wikipedia page on x86 calling conventions before reading the x86-64 ABI specification.
For variadic functions like printf, the details are a bit scary.
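As a sketch of the integer/pointer part of that convention, assuming the System V AMD64 ABI used on x86-64 Linux: the first six such arguments go in rdi, rsi, rdx, rcx, r8, r9, and the rest on the stack. For the printf call in the question, the format string is argument 0 (mov $0x400611,%edi, which zero-extends into rdi), and the two copies of s are arguments 1 and 2 (rsi and rdx). The mov $0x0,%eax is also part of the convention: for variadic calls, %al carries an upper bound on the number of vector registers used, here 0. The helper names are hypothetical, and floating-point arguments (xmm0 to xmm7) are not modeled.

```c
#include <stddef.h>
#include <string.h>

/* System V AMD64 ABI: registers for the first six integer/pointer args. */
static const char *const int_arg_regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Location of the zero-based integer/pointer argument `index`. */
const char *sysv_int_arg_location(size_t index) {
    if (index < sizeof int_arg_regs / sizeof int_arg_regs[0])
        return int_arg_regs[index];
    return "stack";
}

/* Convenience check: is argument `index` passed in register `reg`? */
int is_passed_in(size_t index, const char *reg) {
    return strcmp(sysv_int_arg_location(index), reg) == 0;
}
```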
Depending on your compiler, you will need to allocate space on the heap for your pointer 's'.
Instead of
char *s;
use
char s[300];
to allocate 300 bytes of room
Otherwise 's' is simply pointing up the stack - which can be random
This could be partly why you are not seeing PUSH instructions.
Also, I don't see why there should be a PUSH instruction for the pointers passed to printf; the compiler is simply copying (MOV) the pointer values.