Motivation for useless prologue in gcc-compiled main(), disabling it?

Motivation for useless prologue in gcc-compiled main(), disabling it? - c

Given the following minimal test case:
void exit(int);
int main() {
exit(0);
}
GCC 4.9 and later with 32-bit x86 target produces something like:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
subl $12, %esp
pushl $0
call exit
Note the convoluted stack-realignment code. With the function renamed to anything but main, however, it gives the (much more reasonable):
xmain:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
subl $12, %esp
pushl $0
call exit
The differences are even more pronounced with -O. As main nothing changes; renamed, it yields:
xmain:
subl $24, %esp
pushl $0
call exit
The above was noticed in answering this question:
How do i get rid of call __x86.get_pc_thunk.ax
Is this behavior (and its motivation) documented anywhere, and is there any way to suppress it? GCC has x86 target-specific options to set the preferred/assumed incoming and outgoing stack alignment and enable/disable realignment for arbitrary functions, but they don't seem to be honored for main.

This answer is based on source diving. I do not know what the developers' intentions or motivations were. All of the code involved seems to date to 2008ish, which is after my own time working on GCC, but long enough ago that people's memories have probably gotten fuzzy. (GCC 4.9 was released in 2014; did you go back any farther than that? If I'm right about when this code was introduced, the clumsy stack alignment for main should start happening in version 4.4.)
GCC's x86 back end appears to have been coded to make extra-conservative assumptions about the stack alignment on entry to main, regardless of command-line options. The function ix86_minimum_incoming_stack_boundary is called to compute the expected stack alignment on entry for each function, and the last thing it does ...
12523 /* Stack at entrance of main is aligned by runtime. We use the
12524 smallest incoming stack boundary. */
12525 if (incoming_stack_boundary > MAIN_STACK_BOUNDARY
12526 && DECL_NAME (current_function_decl)
12527 && MAIN_NAME_P (DECL_NAME (current_function_decl))
12528 && DECL_FILE_SCOPE_P (current_function_decl))
12529 incoming_stack_boundary = MAIN_STACK_BOUNDARY;
12530
12531 return incoming_stack_boundary;
... is override the expected stack alignment to a conservative constant, MAIN_STACK_BOUNDARY, if the function being compiled is main. MAIN_STACK_BOUNDARY is 128 (bits) when compiling 64-bit code and 32 when compiling 32-bit code. As far as I can tell, there is no command-line knob that will make it expect the stack to be more aligned than that on entry to main. I can persuade it to skip stack alignment for main by telling it that no additional alignment is needed, compiling your test program with -m32 -mpreferred-stack-boundary=2 gives me
main:
pushl $0
call exit
with GCC 7.3.
The write-only manipulations of %ecx appear to be a missed-optimization bug. They are coming from this part of ix86_expand_prologue:
13695 /* Grab the argument pointer. */
13696 t = plus_constant (Pmode, stack_pointer_rtx, m->fs.sp_offset);
13697 insn = emit_insn (gen_rtx_SET (crtl->drap_reg, t));
13698 RTX_FRAME_RELATED_P (insn) = 1;
13699 m->fs.cfa_reg = crtl->drap_reg;
13700 m->fs.cfa_offset = 0;
13701
13702 /* Align the stack. */
13703 insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
13704 stack_pointer_rtx,
13705 GEN_INT (-align_bytes)));
13706 RTX_FRAME_RELATED_P (insn) = 1;
13707
The intention is to save a pointer to the incoming argument area before realigning the stack, so that it is straightforward to access arguments. Either because this happens fairly late in the pipeline (after register allocation), or because the instructions are marked FRAME_RELATED, nothing manages to delete those instructions again when they turn out to be unnecessary.
I imagine the GCC devs would at least listen to a bug report about this, but they might reasonably consider it low priority, because these are instructions that are executed only once in the lifetime of the whole program, they're only actually dead when main doesn't use its arguments, and they only happen in the traditional 32-bit ABI, which I have the impression is considered a second-class target nowadays.

main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
The above section replicates the invoking stack frame, which since you haven’t defined any arguments to main() consists of just the return address -4(%ecx) and frame pointer, into a $16 byte aligned stack; thus my WAG is that this is to accomodate runtimes (crt0.s) that do not align the stack properly.
The push %ebp was a bit of a giveaway -- it establishes a consistent looking backtrace through crt0.s despite this trampoline.
This is just a ‘normal’ call of exit, with the stack properly aligned...
subl $12, %esp
pushl $0
call exit

Related

Segmentation fault when calling x86 Assembly function from C program

I am writing a C program that calls an x86 Assembly function which adds two numbers. Below are the contents of my C program (CallAssemblyFromC.c):
#include <stdio.h>
#include <stdlib.h>
int addition(int a, int b);
int main(void) {
int sum = addition(3, 4);
printf("%d", sum);
return EXIT_SUCCESS;
}
Below is the code of the Assembly function (my idea is to code from scratch the stack frame prologue and epilogue, I have added comments to explain the logic of my code) (addition.s):
.text
# Here, we define a function addition
.global addition
addition:
# Prologue:
# Push the current EBP (base pointer) to the stack, so that we
# can reset the EBP to its original state after the function's
# execution
push %ebp
# Move the EBP (base pointer) to the current position of the ESP
# register
movl %esp, %ebp
# Read in the parameters of the addition function
# addition(a, b)
#
# Since we are pushing to the stack, we need to obtain the parameters
# in reverse order:
# EBP (return address) | EBP + 4 (return value) | EBP + 8 (b) | EBP + 4 (a)
#
# Utilize advanced indexing in order to obtain the parameters, and
# store them in the CPU's registers
movzbl 8(%ebp), %ebx
movzbl 12(%ebp), %ecx
# Clear the EAX register to store the sum
xorl %eax, %eax
# Add the values into the section of memory storing the return value
addl %ebx, %eax
addl %ecx, %eax
I am getting a segmentation fault error, which seems strange considering that I think I am allocating memory in accordance with the x86 calling conventions (e.x. allocating the correct memory sections to the function's parameters). Furthermore, if any of you have a solution, it would be greatly appreciated if you could provide some advice as to how to debug an Assembly program embedded with C (I have been using the GDB debugger but it simply points to the line of the C program where the segmentation fault happens instead of the line in the Assembly program).

Your function has no epilogue. You need to restore %ebp and pop the stack back to where it was, and then ret. If that's really missing from your code, then that explains your segfault: the CPU will go on executing whatever garbage happens to be after the end of your code in memory.
You clobber (i.e. overwrite) the %ebx register which is supposed to be callee-saved. (You mention following the x86 calling conventions, but you seem to have missed that detail.) That would be the cause of your next segfault, after you fixed the first one. If you use %ebx, you need to save and restore it, e.g. with push %ebx after your prologue and pop %ebx before your epilogue. But in this case it is better to rewrite your code so as not to use it at all; see below.
movzbl loads an 8-bit value from memory and zero-extends it into a 32-bit register. Here the parameters are int so they are already 32 bits, so plain movl is correct. As it stands your function would give incorrect results for any arguments which are negative or larger than 255.
You're using an unnecessary number of registers. You could move the first operand for the addition directly into %eax rather than putting it into %ebx and adding it to zero. And on x86 it is not necessary to get both operands into registers before adding; arithmetic instructions have a mem, reg form where one operand can be loaded directly from memory. With this approach we don't need any registers other than %eax itself, and in particular we don't have to worry about %ebx anymore.
I would write:
.text
# Here, we define a function addition
.global addition
addition:
# Prologue:
push %ebp
movl %esp, %ebp
# load first argument
movl 8(%ebp), %eax
# add second argument
addl 12(%ebp), %eax
# epilogue
movl %ebp, %esp # redundant since we haven't touched esp, but will be needed in more complex functions
pop %ebp
ret
In fact, you don't need a stack frame for this function at all, though I understand if you want to include it for educational value. But if you omit it, the function can be reduced to
.text
.global addition
addition:
movl 4(%esp), %eax
addl 8(%esp), %eax
ret

You are corrupting the stacke here:
movb %al, 4(%ebp)
To return the value, simply put it in eax. Also why do you need to clear eax? that's inefficient as you can load the first value directly into eax and then add to it.
Also EBX must be saved if you intend to use it, but you don't really need it anyway.

How do i get rid of call __x86.get_pc_thunk.ax

I tried to compile and convert a very simple C program to assembly language.
I am using Ubuntu and the OS type is 64 bit.
This is the C Program.
void add();
int main() {
add();
return 0;
}
if i use gcc -S -m32 -fno-asynchronous-unwind-tables -o simple.S simple.c this is how my assembly source code File should look like:
.file "main1.c"
.text
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call add
movl $0, %eax
movl %ebp, %esp
popl %ebp
ret
.size main, .-main
.ident "GCC: (Debian 4.4.5-8) 4.4.5" // this part should say Ubuntu instead of Debian
.section .note.GNU-stack,"",#progbits
but instead it looks like this:
.file "main0.c"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl %eax, %ebx
call add#PLT
movl $0, %eax
popl %ecx
popl %ebx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.section
.text.__x86.get_pc_thunk.ax,"axG",#progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, #function
__x86.get_pc_thunk.ax:
movl (%esp), %eax
ret
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",#progbits
At my University they told me to use the Flag -m32 if I am using a 64 bit Linux version. Can somebody tell me what I am doing wrong?
Am I even using the correct Flag?
edit after -fno-pie
.file "main0.c"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
call add
movl $0, %eax
addl $4, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",#progbits
it looks better but it's not exactly the same.
for example what does leal mean?

As a general rule, you cannot expect two different compilers to generate the same assembly code for the same input, even if they have the same version number; they could have any number of extra "patches" to their code generation. As long as the observable behavior is the same, anything goes.
You should also know that GCC, in its default -O0 mode, generates intentionally bad code. It's tuned for ease of debugging and speed of compilation, not for either clarity or efficiency of the generated code. It is often easier to understand the code generated by gcc -O1 than the code generated by gcc -O0.
You should also know that the main function often needs to do extra setup and teardown that other functions do not need to do. The instruction leal 4(%esp),%ecx is part of that extra setup. If you only want to understand the machine code corresponding to the code you wrote, and not the nitty details of the ABI, name your test function something other than main.
(As pointed out in the comments, that setup code is not as tightly tuned as it could be, but it doesn't normally matter, because it's only executed once in the lifetime of the program.)
Now, to answer the question that was literally asked, the reason for the appearance of
call __x86.get_pc_thunk.ax
is because your compiler defaults to generating "position-independent" executables. Position-independent means the operating system can load the program's machine code at any address in (virtual) memory and it'll still work. This allows things like address space layout randomization, but to make it work, you have to take special steps to set up a "global pointer" at the beginning of every function that accesses global variables or calls another function (with some exceptions). It's actually easier to explain the code that's generated if you turn optimization on:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
This is all just setting up main's stack frame and saving registers that need to be saved. You can ignore it.
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
The special function __x86.get_pc_thunk.bx loads its return address -- which is the address of the addl instruction that immediately follows -- into the EBX register. Then we add to that address the value of the magic constant _GLOBAL_OFFSET_TABLE_, which, in position-independent code, is the difference between the address of the instruction that uses _GLOBAL_OFFSET_TABLE_ and the address of the global offset table. Thus, EBX now points to the global offset table.
call add#PLT
Now we call add#PLT, which means call add, but jump through the "procedure linkage table" to do it. The PLT takes care of the possibility that add is defined in a shared library rather than the main executable. The code in the PLT uses the global offset table and assumes that you have already set EBX to point to it, before calling an #PLT symbol.  That's why main has to set up EBX even though nothing appears to use it. If you had instead written something like
extern int number;
int main(void) { return number; }
then you would see a direct use of the GOT, something like
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
movl number#GOT(%ebx), %eax
movl (%eax), %eax
We load up EBX with the address of the GOT, then we can load the address of the global variable number from the GOT, and then we actually dereference the address to get the value of number.
If you compile 64-bit code instead, you'll see something different and much simpler:
movl number(%rip), %eax
Instead of all this mucking around with the GOT, we can just load number from a fixed offset from the program counter. PC-relative addressing was added along with the 64-bit extensions to the x86 architecture. Similarly, your original program, in 64-bit position-independent mode, will just say
call add#PLT
without setting up EBX first. The call still has to go through the PLT, but the PLT uses PC-relative addressing itself and doesn't need any help from its caller.
The only difference between __x86.get_pc_thunk.bx and __x86.get_pc_thunk.ax is which register they store their return address in: EBX for .bx, EAX for .ax. I have also seen GCC generate .cx and .dx variants. It's just a matter of which register it wants to use for the global pointer -- it must be EBX if there are going to be calls through the PLT, but if there aren't any then it can use any register, so it tries to pick one that isn't needed for anything else.
Why does it call a function to get the return address? Older compilers would do this instead:
call 1f
1: pop %ebx
but that screws up return-address prediction, so nowadays the compiler goes to a little extra trouble to make sure every call is paired with a ret.

The extra junk you're seeing is due to your version of GCC special-casing main to compensate for possibly-broken entry point code starting it with a misaligned stack. I'm not sure how to disable this or if it's even possible, but renaming the function to something other than main will suppress it for the sake of your reading.
After renaming to xmain I get:
xmain:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
call add
movl $0, %eax
leave
ret

Disassembling simple C function

I'm trying to understand the underlying assembly for a simple C function.
program1.c
void function() {
char buffer[1];
}
=>
push %ebp
mov %esp, %ebp
sub $0x10, %esp
leave
ret
Not sure how it's arriving at 0x10 here? Isn't a character 1 byte, which is 8 bits, so it should be 0x08?
program2.c
void function() {
char buffer[4];
}
=>
push %ebp
mov %esp, %ebp
sub $0x18, %esp
mov ...
mov ...
[a bunch of random instructions]
Not sure how it's arriving at 0x18 here either? Also, why are there so many additional instructions after the SUB instruction? All I did was change the length of the array from 1 to 4.

gcc uses -mpreferred-stack-boundary=4 by default for x86 32 and 64bit ABIs, so it keeps %esp 16B-aligned.
I was able to reproduce your output with gcc 4.8.2 -O0 -m32 on the Godbolt Compiler Explorer
void f1() { char buffer[1]; }
pushl %ebp
movl %esp, %ebp # make a stack frame (`enter` is super slow, so gcc doesn't use it)
subl $16, %esp
leave # `leave` is not terrible compared to mov/pop
ret
You must be using a version of gcc with -fstack-protector enabled by default. Newer gcc isn't usually configured to do that, so you don't get the same sentinel value and check written to the stack. (Try a newer gcc in that godbolt link)
void f4() { char buffer[4]; }
pushl %ebp #
movl %esp, %ebp # make a stack frame
subl $24, %esp # IDK why it reserves 24, rather than 16 or 32B, but prob. has something to do with aligning the stack for the possible call to __stack_chk_fail
movl %gs:20, %eax # load a value from thread-local storage
movl %eax, -12(%ebp) # store it on the stack
xorl %eax, %eax # tmp59
movl -12(%ebp), %eax # D.1377, tmp60
xorl %gs:20, %eax # check that the sentinel value matches what we stored
je .L3 #,
call __stack_chk_fail #
.L3:
leave
ret
Apparently gcc considers char buffer[4] a "vulnerable object", but not char buffer[1]. Without -fstack-protector, there'd be little to no difference in the asm even at -O0.

Isn't a character 1 byte, which is 8 bits, so it should be 0x08?
This values are not bits, they are bytes.
Not sure how it's arriving at 0x10 here?
This lines:
push %ebp
mov %esp, %ebp
sub $0x10, %esp
Are allocating space on the stack, 16 bytes of memory are being reserved for the execution of this function.
All those bytes are needed to store information like:
A 4 byte memory address for the instruction that will be jumped to in the ret instruction
The local variables of the functions
Data structure alignment
Other stuff i can't remember right now :)
In your example, 16 bytes were allocated. 4 of them are for the address of the next instruction that will be called, so we have 12 bytes left. 1 byte is for the char array of size 1, which is probably optimized by the compiler to a single char. The last 11 bytes are probably to store some of the stuff i can't remember and the padding's added by the compiler.
Not sure how it's arriving at 0x18 here either?
Each of the additional bytes in your second example increased the stack size in 2 bytes, 1 byte for the char, and 1 likely for memory alignment purposes.
Also, why are there so many additional instructions after the SUB instruction?
Please update the question with the instructions.

This code is just setting up the stack frame. This is used as scratch space for local variables, and will have some kind of alignment requirement.
You haven't mentioned your platform, so I can't tell you exactly what the requirements are for your system, but obviously both values are at least 8-byte aligned (so the size of your local variables is rounded up so %esp is still a multiple of 8).
Search for "c function prolog epilog" or "c function call stack" to find more resources in this area.
Edit - Peter Cordes' answer explains the discrepancy and the mysterious extra instructions.
And for completeness, although Fábio already answered this part:
Not sure how it's arriving at 0x10 here? Isn't a character 1 byte, which is 8 bits, so it should be 0x08?
On x86, %esp is the stack pointer, and pointers store addresses, and these are addresses of bytes. Sub-byte addressing is rarely used (cf. Peter's comment). If you want to examine individual bits inside a byte, you'd usually use bitwise (&,|,~,^) operations on the value, but not change the address.
(You could equally argue that sub-cache-line addressing is a convenient fiction, but we're rapidly getting off-topic).

Whenever you allocate memory, your operating system almost never actually gives you exactly that amount, unless you use a function like pvalloc, which gives you a page-aligned amount of bytes (usually 4K). Instead, your operating system assumes that you might need more in the future, so goes ahead and gives you a bit more.
To disable this behavior, use a lower-level system call that doesn't do buffering, like sbrk(). These lecture notes are an excellent resource:
http://web.eecs.utk.edu/~plank/plank/classes/cs360/360/notes/Malloc1/lecture.html

x86 calling convention: should arguments passed by stack be read-only?

It seems state-of-art compilers treat arguments passed by stack as read-only. Note that in the x86 calling convention, the caller pushes arguments onto the stack and the callee uses the arguments in the stack. For example, the following C code:
extern int goo(int *x);
int foo(int x, int y) {
goo(&x);
return x;
}
is compiled by clang -O3 -c g.c -S -m32 in OS X 10.10 into:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _foo
.align 4, 0x90
_foo: ## #foo
## BB#0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
movl %eax, -4(%ebp)
leal -4(%ebp), %eax
movl %eax, (%esp)
calll _goo
movl -4(%ebp), %eax
addl $8, %esp
popl %ebp
retl
.subsections_via_symbols
Here, the parameter x(8(%ebp)) is first loaded into %eax; and then stored in -4(%ebp); and the address -4(%ebp) is stored in %eax; and %eax is passed to the function goo.
I wonder why Clang generates code that copy the value stored in 8(%ebp) to -4(%ebp), rather than just passing the address 8(%ebp) to the function goo. It would save memory operations and result in a better performance. I observed a similar behaviour in GCC too (under OS X). To be more specific, I wonder why compilers do not generate:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _foo
.align 4, 0x90
_foo: ## #foo
## BB#0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
leal 8(%ebp), %eax
movl %eax, (%esp)
calll _goo
movl 8(%ebp), %eax
addl $8, %esp
popl %ebp
retl
.subsections_via_symbols
I searched for documents if the x86 calling convention demands the passed arguments to be read-only, but I couldn't find anything on the issue. Does anybody have any thought on this issue?

The rules for C are that parameters must be passed by value. A compiler converts from one language (with one set of rules) to a different language (potentially with a completely different set of rules). The only limitation is that the behaviour remains the same. The rules of the C language do not apply to the target language (e.g. assembly).
What this means is that if a compiler feels like generating assembly language where parameters are passed by reference and are not passed by value; then this is perfectly legal (as long as the behaviour remains the same).
The real limitation has nothing to do with C at all. The real limitation is linking. So that different object files can be linked together, standards are needed to ensure that whatever the caller in one object file expects matches whatever the callee in another object file provides. This is what's known as the ABI. In some cases (e.g. 64-bit 80x86) there are multiple different ABIs for the exact same architecture.
You can even invent your own ABI that's radically different (and implement your own tools that support your own radically different ABI) and that's perfectly legal as far as the C standards go; even if your ABI requires "pass by reference" for everything (as long as the behaviour remains the same).

Actually, I just compiled this function using GCC:
int foo(int x)
{
goo(&x);
return x;
}
And it generated this code:
_foo:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
leal 8(%ebp), %eax
movl %eax, (%esp)
call _goo
movl 8(%ebp), %eax
leave
ret
This is using GCC 4.9.2 (on 32-bit cygwin if it matters), no optimizations. So in fact, GCC did exactly what you thought it should do and used the argument directly from where the caller pushed it on the stack.

The C programming language mandates that arguments are passed by value. So any modification of an argument (like an x++; as the first statement of your foo) is local to the function and does not propagate to the caller.
Hence, a general calling convention should require copying of arguments at every call site. Calling conventions should be general enough for unknown calls, e.g. thru a function pointer!
Of course, if you pass an address to some memory zone, the called function is free to dereference that pointer, e.g. as in
int goo(int *x) {
static int count;
*x = count++;
return count % 3;
}
BTW, you might use link-time optimizations (by compiling and linking with clang -flto -O2 or gcc -flto -O2) to perhaps enable the compiler to improve or inline some calls between translation units.
Notice that both Clang/LLVM and GCC are free software compilers. Feel free to propose an improvement patch to them if you want to (but since both are very complex pieces of software, you'll need to work some months to make that patch).
NB. When looking into produced assembly code, pass -fverbose-asm to your compiler!

Understanding empty main()'s translation into assembly

Could somebody please explain what GCC is doing for this piece of code? What is it initializing? The original code is:
#include <stdio.h>
int main()
{
}
And it was translated to:
.file "test1.c"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
leave
ret
I would be grateful if a compiler/assembly guru got me started by explaining the stack, register and the section initializations. I cant make head or tail out of the code.
EDIT:
I am using gcc 3.4.5. and the command line argument is gcc -S test1.c
Thank You,
kunjaan.

I should preface all my comments by saying, I am still learning assembly.
I will ignore the section initialization. A explanation for the section initialization and basically everything else I cover can be found here:
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
The ebp register is the stack frame base pointer, hence the BP. It stores a pointer to the beginning of the current stack.
The esp register is the stack pointer. It holds the memory location of the top of the stack. Each time we push something on the stack esp is updated so that it always points to an address the top of the stack.
So ebp points to the base and esp points to the top. So the stack looks like:
esp -----> 000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
If you push e4 on the stack this is what happens:
esp -----> 000a2 e4
000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
Notice that the stack grows towards lower addresses, this fact will be important below.
The first two steps are known as the procedure prolog or more commonly as the function prolog. They prepare the stack for use by local variables (See procedure prolog quote at the bottom).
In step 1 we save the pointer to the old stack frame on the stack by calling
pushl %ebp. Since main is the first function called, I have no idea what the previous value of %ebp points too.
Step 2, We are entering a new stack frame because we are entering a new function (main). Therefore, we must set a new stack frame base pointer. We use the value in esp to be the beginning of our stack frame.
Step 3. Allocates 8 bytes of space on the stack. As we mentioned above, the stack grows toward lower addresses thus, subtracting by 8, moves the top of the stack by 8 bytes.
Step 4; Aligns the stack, I've found different opinions on this. I'm not really sure exactly what this is done. I suspect it is done to allow large instructions (SIMD) to be allocated on the stack,
http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html
This code "and"s ESP with 0xFFFF0000,
aligning the stack with the next
lowest 16-byte boundary. An
examination of Mingw's source code
reveals that this may be for SIMD
instructions appearing in the "_main"
routine, which operate only on aligned
addresses. Since our routine doesn't
contain SIMD instructions, this line
is unnecessary.
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Steps 5 through 11 seem to have no purpose to me. I couldn't find any explanation on google. Could someone who really knows this stuff provide a deeper understanding. I've heard rumors that this stuff is used for C's exception handling.
Step 5, stores the return value of main 0, in eax.
Step 6 and 7 we add 15 in hex to eax for unknown reason. eax = 01111 + 01111 = 11110
Step 8 we shift the bits of eax 4 bits to the right. eax = 00001 because the last bits are shift off the end 00001 | 111.
Step 9 we shift the bits of eax 4 bits to the left, eax = 10000.
Steps 10 and 11 moves the value in the first 4 allocated bytes on the stack into eax and then moves it from eax back.
Steps 12 and 13 setup the c library.
We have reached the function epilogue. That is, the part of the function which returns the stack pointers, esp and ebp to the state they were in before this function was called.
Step 14, leave sets esp to the value of ebp, moving the top of stack to the address it was before main was called. Then it sets ebp to point to the address we saved on the top of the stack during step 1.
Leave can just be replaced with the following instructions:
mov %ebp, %esp
pop %ebp
Step 15, returns and exits the function.
1. pushl %ebp
2. movl %esp, %ebp
3. subl $8, %esp
4. andl $-16, %esp
5. movl $0, %eax
6. addl $15, %eax
7. addl $15, %eax
8. shrl $4, %eax
9. sall $4, %eax
10. movl %eax, -4(%ebp)
11. movl -4(%ebp), %eax
12. call __alloca
13. call ___main
14. leave
15. ret
Procedure Prolog:
The first thing a function has to do
is called the procedure prolog. It
first saves the current base pointer
(ebp) with the instruction pushl %ebp
(remember ebp is the register used for
accessing function parameters and
local variables). Now it copies the
stack pointer (esp) to the base
pointer (ebp) with the instruction
movl %esp, %ebp. This allows you to
access the function parameters as
indexes from the base pointer. Local
variables are always a subtraction
from ebp, such as -4(%ebp) or
(%ebp)-4 for the first local variable,
the return value is always at 4(%ebp)
or (%ebp)+4, each parameter or
argument is at N*4+4(%ebp) such as
8(%ebp) for the first argument while
the old ebp is at (%ebp).
http://www.milw0rm.com/papers/52
A really great stack overflow thread exists which answers much of this question.
Why are there extra instructions in my gcc output?
A good reference on x86 machine code instructions can be found here:
http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html
This a lecture which contains some of the ideas used below:
http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm
Here is another take on answering your question:
http://www.phiral.net/linuxasmone.htm
None of these sources explain everything.

Here's a good step-by step breakdown of a simple main() function as compiled by GCC, with lots of detailed info: GAS Syntax (Wikipedia)
For the code you pasted, the instructions break down as follows:
First four instructions (pushl through andl): set up a new stack frame
Next five instructions (movl through sall): generating a weird value for eax, which will become the return value (I have no idea how it decided to do this)
Next two instructions (both movl): store the computed return value in a temporary variable on the stack
Next two instructions (both call): invoke the C library init functions
leave instruction: tears down the stack frame
ret instruction: returns to caller (the outer runtime function, or perhaps the kernel function that invoked your program)

Well, dont know much about GAS, and i'm a little rusty on Intel assembly, but it looks like its initializing main's stack frame.
if you take a look, __main is some kind of macro, must be executing initializations.
Then, as main's body is empty, it calls leave instruction, to return to the function that called main.
From http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax#.22hello.s.22_line-by-line:
This line declares the "_main" label, marking the place that is called from the startup code.
pushl %ebp
movl %esp, %ebp
subl $8, %esp
These lines save the value of EBP on the stack, then move the value of ESP into EBP, then subtract 8 from ESP. The "l" on the end of each opcode indicates that we want to use the version of the opcode that works with "long" (32-bit) operands;
andl $-16, %esp
This code "and"s ESP with 0xFFFF0000, aligning the stack with the next lowest 16-byte boundary. (neccesary when using simd instructions, not useful here)
movl $0, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
This code moves zero into EAX, then moves EAX into the memory location EBP-4, which is in the temporary space we reserved on the stack at the beginning of the procedure. Then it moves the memory location EBP-4 back into EAX; clearly, this is not optimized code.
call __alloca
call ___main
These functions are part of the C library setup. Since we are calling functions in the C library, we probably need these. The exact operations they perform vary depending on the platform and the version of the GNU tools that are installed.
Here's a useful link.
http://unixwiz.net/techtips/win32-callconv-asm.html

It would really help to know what gcc version you are using and what libc. It looks like you have a very old gcc version or a strange platform or both. What's going on is some strangeness with calling conventions. I can tell you a few things:
Save the frame pointer on the stack according to convention:
pushl %ebp
movl %esp, %ebp
Make room for stuff at the old end of the frame, and round the stack pointer down to a multiple of 4 (why this is needed I don't know):
subl $8, %esp
andl $-16, %esp
Through an insane song and dance, get ready to return 1 from main:
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
Recover any memory allocated with alloca (GNU-ism):
call __alloca
Announce to libc that main is exiting (more GNU-ism):
call ___main
Restore the frame and stack pointers:
leave
Return:
ret
Here's what happens when I compile the very same source code with gcc 4.3 on Debian Linux:
.file "main.c"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
And I break it down this way:
Tell the debugger and other tools the source file:
.file "main.c"
Code goes in the text section:
.text
Beats me:
.p2align 4,,15
main is an exported function:
.globl main
.type main, #function
main's entry point:
main:
Grab the return address, align the stack on a 4-byte address, and save the return address again (why I can't say):
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
Save frame pointer using standard convention:
pushl %ebp
movl %esp, %ebp
Inscrutable madness:
pushl %ecx
popl %ecx
Restore the frame pointer and the stack pointer:
popl %ebp
leal -4(%ecx), %esp
Return:
ret
More info for the debugger?:
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
By the way, main is special and magical; when I compile
int f(void) {
return 17;
}
I get something slightly more sane:
.file "f.c"
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
movl $17, %eax
movl %esp, %ebp
popl %ebp
ret
.size f, .-f
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
There's still a ton of decoration, and we're still saving the frame pointer, moving it, and restoring it, which is utterly pointless, but the rest of the code make sense.

It looks like GCC is acting like it is ok to edit main() to include CRT initialization code. I just confirmed that I get the exact same assembly listing from MinGW GCC 3.4.5 here, with your source text.
The command line I used is:
gcc -S emptymain.c
Interestingly, if I change the name of the function to qqq() instead of main(), I get the following assembly:
.file "emptymain.c"
.text
.globl _qqq
.def _qqq; .scl 2; .type 32; .endef
_qqq:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
which makes much more sense for an empty function with no optimizations turned on.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Motivation for useless prologue in gcc-compiled main(), disabling it? - c

Related

Segmentation fault when calling x86 Assembly function from C program

How do i get rid of call __x86.get_pc_thunk.ax

Disassembling simple C function

x86 calling convention: should arguments passed by stack be read-only?

Understanding empty main()'s translation into assembly

Categories

Resources