When I compile
#include <stdio.h>
int
main () {
return 0;
}
to x86 assembly the result is plain and expected:
$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
popl %ebp
ret
But when studying different disassembled binaries, the function prologue is never that simple. Indeed, changing the C source above into
#include <stdio.h>
int
main () {
printf("Hi");
return 0;
}
the result is
$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
subl $12, %esp
call printf
addl $16, %esp
movl $0, %eax
movl -4(%ebp), %ecx
leave
leal -4(%ecx), %esp
ret
In particular, I don't get why these instructions
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
are generated -- specifically, why not store %esp directly into %ecx, instead of %esp+4?
If main isn't a leaf function, it needs to align the stack for the benefit of any functions it calls. Functions that aren't called main just maintain the stack's alignment.
lea 4(%esp), %ecx # ecx = esp+4
andl $-16, %esp
pushl -4(%ecx) # load from ecx-4 and push that
It's pushing a copy of the return address, so it will be in the right place after aligning the stack. You're right, a different sequence would be more sensible:
mov (%esp), %ecx # or maybe even pop %ecx
andl $-16, %esp
push %ecx # push (mem) is slower than push reg
As Youka says in comments, don't expect code from -O0 to be optimized at all. Use -Og for optimizations that don't interfere with debuggability; the GCC manual recommends it for the edit-compile-debug cycle. -O0 output is harder to read, understand, and learn from than optimized code: it's easier to map back to the source, but it's terrible code.
I wrote a simple C program that copies its input to standard output, then converted it to assembly language. By the way, I am using AT&T syntax.
This is the simple C code.
#include <stdio.h>
int main()
{
int c;
while ((c = getchar ()) != EOF)
{
putchar(c);
}
return 0;
}
int c is a local variable.
Then I converted it to assembly language.
.file "question_1.c"
.text
.globl main
.type main, @function
//prolog
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp // we add 20 bytes to the stack
jmp .L2
.L3:
subl $12, %esp
pushl -12(%ebp)
call putchar
addl $16, %esp
.L2:
call getchar
movl %eax, -12(%ebp)
cmpl $-1, -12(%ebp)
jne .L3
//assumption this is the epilog
movl $0, %eax
movl -4(%ebp), %ecx
leave
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.9.4-2ubuntu1) 4.9.4"
.section .note.GNU-stack,"",@progbits
Normally the epilog is supposed to addl $20, because the prolog did subl $20.
So is the stack frame still there?
Or am I missing a crucial point?
I also have a question regarding the main function. Functions are normally "called", but where does that happen in the assembly code?
Thank you in advance.
Just after the main label, leal 4(%esp), %ecx saves four plus the stack pointer in %ecx. At the end of the routine, leal -4(%ecx), %esp writes four less than the saved value to the stack pointer. This directly restores the original value, instead of doing it by adding the amount that was subtracted.
I was wondering: programming in C, let's say we have two functions:
int get_a_value();
int calculate_something(int number);
And two versions of a third one:
/* version 1 */
int main()
{
int value = get_a_value();
int result = calculate_something(value);
return result;
}
/* version 2 */
int main()
{
int result = calculate_something(get_a_value());
return result;
}
Theoretically, what would be the difference between these two versions of the same thing in terms of correctness, memory use and efficiency? Would they generate different instructions? And under what circumstances would any differences be significant in practice?
Thanks in advance.
Copied both versions, compiled each with gcc -S to get the assembly output, and used sdiff to compare them side by side.
Results using gcc version 4.1.2 20070115 (SUSE Linux):
No optimization:
main:                       main:
.LFB2:                      .LFB2:
pushq %rbp                  pushq %rbp
.LCFI0:                     .LCFI0:
movq %rsp, %rbp             movq %rsp, %rbp
.LCFI1:                     .LCFI1:
subq $16, %rsp              subq $16, %rsp
.LCFI2:                     .LCFI2:
movl $0, %eax               movl $0, %eax
call get_a_value            call get_a_value
movl %eax, -8(%rbp)       | movl %eax, %edi
movl -8(%rbp), %edi       <
movl $0, %eax               movl $0, %eax
call calculate_something    call calculate_something
movl %eax, -4(%rbp)         movl %eax, -4(%rbp)
movl -4(%rbp), %eax         movl -4(%rbp), %eax
leave                       leave
ret                         ret
Basically, one extra move instruction. Both allocate the same amount of stack space (subq $16, %rsp reserves 16 bytes for the stack), so memory-wise there's no difference.
Level 1 optimization (-O1):
main:                       main:
.LFB2:                      .LFB2:
subq $8, %rsp               subq $8, %rsp
.LCFI0:                     .LCFI0:
movl $0, %eax               movl $0, %eax
call get_a_value            call get_a_value
movl %eax, %edi             movl %eax, %edi
movl $0, %eax               movl $0, %eax
call calculate_something    call calculate_something
addq $8, %rsp               addq $8, %rsp
ret                         ret
No differences.
Results using gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2):
No optimization:
main:                       main:
pushl %ebp                  pushl %ebp
movl %esp, %ebp             movl %esp, %ebp
subl $8, %esp               subl $8, %esp
                          > subl $12, %esp
                          > subl $4, %esp
call get_a_value            call get_a_value
                          > addl $4, %esp
movl %eax, %eax             movl %eax, %eax
movl %eax, -4(%ebp)       | pushl %eax
subl $12, %esp            <
pushl -4(%ebp)            <
call calculate_something    call calculate_something
addl $16, %esp              addl $16, %esp
movl %eax, %eax             movl %eax, %eax
movl %eax, -8(%ebp)       | movl %eax, -4(%ebp)
movl -8(%ebp), %eax       | movl -4(%ebp), %eax
movl %eax, %eax             movl %eax, %eax
leave                       leave
ret                         ret
Roughly the same number of instructions, ordered slightly differently.
Level 1 optimization (-O1):
main:                       main:
pushl %ebp                  pushl %ebp
movl %esp, %ebp             movl %esp, %ebp
subl $8, %esp             | subl $24, %esp
call get_a_value            call get_a_value
subl $12, %esp            | movl %eax, (%esp)
pushl %eax                <
call calculate_something    call calculate_something
leave                       leave
ret                         ret
Looks like the second version reserves a little more stack space.
So, for this particular example with these particular compilers, there's no huge difference between the two versions. In that case, I'd favor the first version for the following reasons:
Easier to trace in a debugger; you can examine the value returned from get_a_value before passing it to calculate_something;
It gives you a place to do some sanity checking, in case calculate_something isn't well-behaved for certain inputs;
It's a little easier on the eyes.
Just remember that terse doesn't necessarily mean fast or efficient, and what's fast/efficient under one particular compiler/hardware combination may be hopelessly busted under a different compiler/hardware combination. Some compilers actually have an easier time optimizing code that's written in a clear manner.
Your code should be, in order:
Correct - it doesn't matter how fast it is or how little memory it uses if it doesn't meet its requirements;
Secure - it doesn't matter how fast it is or how little memory it uses if it's a malware vector or risks exposing sensitive data to unauthorized parties (yes, I'm talking about Heart-frickin'-bleed);
Robust - it doesn't matter how fast it is or how little memory it uses if it dumps core because somebody sneezed in a different room;
Maintainable - it doesn't matter how fast it is or how little memory it uses if it has to be scrapped and rewritten because the requirements changed (which they do);
Efficient - now you can start worrying about performance and efficiency.
I performed a little test, generating assembler code for the 2 versions. Simply running a diff command from bash showed that the first version has 2 instructions more than the second one.
If you want to try by yourself simply compile with this command
gcc -S main.c -o asmout.s
gcc -S main2.c -o asmout2.s
and then check differences with
diff asmout.s asmout2.s
I got these 2 instructions more for the first one:
movl %eax, -8(%rbp)
movl -8(%rbp), %eax
EDIT:
As Keith Thompson suggested, if compiled with optimization options the generated assembler code is the same for both versions.
It really depends on the platform and the compiler, but with optimization on they should usually generate the same code. At worst version one will allocate space for an extra int. If placing the value of get_a_value in a variable makes your code more readable then I would go ahead and do that. The only time I would advise not doing so is in a deeply recursive function.
I have an assembly function like so
rfact:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $4, %esp
movl 8(%ebp), %ebx
movl $1, %eax
cmpl $1, %ebx
jle .L53
leal -1(%ebx), %eax
movl %eax, (%esp)
call rfact
imull %ebx, %eax
.L53:
addl $4, %esp
popl %ebx
popl %ebp
ret
I understand I can't just save this as rfact.s and assemble it. Certain directives (such as .text) have to be added at the top of the assembly. What are these for a Linux system? I'd also like to call this function from a main function written in a normal C file called rfactmain.c.
Here's a 'minimal' prefix of directives - for ELF/SysV i386, and GNU as:
.text
.p2align 4
.globl rfact
.type rfact, @function
I'd also recommend appending a function size directive at the end:
.size rfact, .-rfact
The easiest way to compile is with: gcc [-m32] -c -P rfact.S
With the -P option, you can use C-style comments and not have to worry about line number output, etc. This results in an object file you can link with. The -m32 flag is required if gcc targets x86-64 by default.
Is there any substantial optimization when omitting the frame pointer?
If I have understood correctly by reading this page, -fomit-frame-pointer is used when we want to avoid saving, setting up and restoring frame pointers.
Is this done for each function call and, if so, is it really worth avoiding a few instructions for every function?
Isn't that trivial for an optimization?
What are the actual implications of using this option apart from the debugging limitations?
I compiled the following C code with and without this option
int main(void)
{
int i;
i = myf(1, 2);
}
int myf(int a, int b)
{
return a + b;
}
using the following commands:
# gcc -S -fomit-frame-pointer code.c -o withoutfp.s
# gcc -S code.c -o withfp.s
diff -u 'ing the two files revealed the following assembly code:
--- withfp.s 2009-12-22 00:03:59.000000000 +0000
+++ withoutfp.s 2009-12-22 00:04:17.000000000 +0000
@@ -7,17 +7,14 @@
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
- pushl %ebp
- movl %esp, %ebp
pushl %ecx
- subl $36, %esp
+ subl $24, %esp
movl $2, 4(%esp)
movl $1, (%esp)
call myf
- movl %eax, -8(%ebp)
- addl $36, %esp
+ movl %eax, 20(%esp)
+ addl $24, %esp
popl %ecx
- popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
@@ -25,11 +22,8 @@
.globl myf
.type myf, @function
myf:
- pushl %ebp
- movl %esp, %ebp
- movl 12(%ebp), %eax
- addl 8(%ebp), %eax
- popl %ebp
+ movl 8(%esp), %eax
+ addl 4(%esp), %eax
ret
.size myf, .-myf
.ident "GCC: (GNU) 4.2.1 20070719
Could someone please shed light on the key points of the above code where -fomit-frame-pointer did actually make the difference?
Edit: objdump's output replaced with gcc -S's
-fomit-frame-pointer allows one extra register to be available for general-purpose use. I would assume this is really only a big deal on 32-bit x86, which is a bit starved for registers.*
One would expect to see EBP no longer saved and adjusted on every function call, and probably some additional use of EBP in normal code, and fewer stack operations on occasions where EBP gets used as a general-purpose register.
Your code is far too simple to see any benefit from this sort of optimization-- you're not using enough registers. Also, you haven't turned on the optimizer, which might be necessary to see some of these effects.
* ISA registers, not micro-architecture registers.
The only downside of omitting it is that debugging is much more difficult.
The major upside is that there is one extra general purpose register which can make a big difference on performance. Obviously this extra register is used only when needed (probably in your very simple function it isn't); in some functions it makes more difference than in others.
You can often get more meaningful assembly code from GCC by using the -S argument to output the assembly:
$ gcc code.c -S -o withfp.s
$ gcc code.c -S -o withoutfp.s -fomit-frame-pointer
$ diff -u withfp.s withoutfp.s
GCC doesn't care about the address, so we can compare the actual instructions generated directly. For your leaf function, this gives:
myf:
- pushl %ebp
- movl %esp, %ebp
- movl 12(%ebp), %eax
- addl 8(%ebp), %eax
- popl %ebp
+ movl 8(%esp), %eax
+ addl 4(%esp), %eax
ret
GCC doesn't generate the code to push the frame pointer onto the stack, and this changes the relative address of the arguments passed to the function on the stack.
Profile your program to see if there is a significant difference.
Next, profile your development process. Is debugging easier or more difficult? Do you spend more time developing or less?
Optimizations without profiling are a waste of time and money.
I'm just curious about the following example:
#include<stdio.h>
int test();
int test(){
// int a = 5;
// int b = a+1;
return ;
}
int main(){
printf("%u\n",test());
return 0;
}
I compiled it with 'gcc -Wall -o semicolon semicolon.c' to create an executable
and 'gcc -Wall -S semicolon.c' to get the assembler code, which is:
.file "semicolon.c"
.text
.globl test
.type test, @function
test:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
leave
ret
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
call test
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Since I'm not much of an assembler pro, I only know that printf prints what is in eax,
but I don't fully understand what 'movl %eax, 4(%esp)' means, which I assume fills eax before calling test.
But what is the value then? What does 4(%esp) mean, and what does the value of esp mean?
If I uncomment the lines in test(), printf prints 6, which is written in eax ^^
Your assembly language annotated:
test:
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
subl $4, %esp # Allocate some local space on the stack.
leave # Restore the old frame pointer/stack
ret
Note that nothing in test touches eax.
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx # Point past the return address.
andl $-16, %esp # Align the stack.
pushl -4(%ecx) # Push the return address.
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
pushl %ecx # save the old top of stack.
subl $20, %esp # Allocate some local space (for printf parameters and ?).
call test # Call test.
Note that at this point, nothing has modified eax. Whatever came into main is still here.
movl %eax, 4(%esp) # Save eax as a printf argument.
movl $.LC0, (%esp) # Send the format string.
call printf # Duh.
movl $0, %eax # Return zero from main.
addl $20, %esp # Deallocate local space.
popl %ecx # Restore the old top of stack.
popl %ebp # And the old frame pointer.
leal -4(%ecx), %esp # Fix the stack pointer.
ret
So, what gets printed out is whatever came in to main. As others have pointed out it is undefined: It depends on what the startup code (or the OS) has done to eax previously.
The semicolon has no return value, what you have there is an "empty return", like the one used to return from void functions - so the function doesn't return anything.
This actually causes a warning when compiling:
warning: `return' with no value, in function returning non-void
And I don't see anything placed in eax before calling test.
About 4(%esp): this refers to the memory at address esp + 4, i.e. the second word from the top of the stack.
The return value of an int function is passed in the EAX register. The test function does not set the EAX register because no return value is given. The result is therefore undefined.
A semicolon indeed has no value.
I think the correct answer is that a return <nothing> in an int function is an error, or at least has undefined behavior. That's why compiling this with -Wall yields
semi.c: In function ‘test’:
semi.c:6: warning: ‘return’ with no value, in function returning non-void
As for what 4(%esp) holds... it's a location on the stack where nothing was (intentionally) stored, so it will likely return whatever junk is found at that location. This could be the last expression evaluated to variables in the function (as in your example) or something completely different. This is what "undefined" is all about. :)