Buffer overflow/overrun explanation?


The article at the following URL gives this function:
http://insecure.org/stf/smashstack.html
void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
    int *ret;
    ret = buffer1 + 12;
    (*ret) += 8;
}
void main() {
    int x;
    x = 0;
    function(1,2,3);
    x = 1;
    printf("%d\n",x);
}
The corresponding assembly code for main function is:
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
The code points ret at the address of the next instruction to be run. I cannot understand how, just by keeping the address of the next instruction in the ret variable, the program is going to jump to this next location.
I know how a buffer overflow works, but how does changing the ret variable perform a buffer overflow?
Even considering that this is a dummy program that is only meant to show how a buffer overflow works, changing the ret variable seems wrong.

Explanation of how this is an example of a buffer overrun:
The local variables of function, including buffer1, are on the stack, along with the return address, which is calculated as being 12 bytes beyond buffer1. This is an example of a buffer overrun because writing to an address 12 bytes beyond buffer1 is writing outside the proper bounds of buffer1. By replacing the return address by a number 8 larger than it was, when function finishes, rather than popping off a return to the statement following the function call as usual (x = 1;, in this case), the return address will be 8 bytes later (at the printf statement, in this case).
Skipping the x = 1; statement is not the buffer overflow -- it's the effect of the buffer overflow which modified the return address.
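To make the mechanism concrete without invoking undefined behavior, here is a minimal sketch that models the frame as a plain byte array. The layout constants and the function name are assumptions made for illustration (they follow the 12-byte distance described above); a real compiler may lay the frame out differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, measured from the start of buffer1. */
enum {
    BUFFER1_OFFSET     = 0,   /* char buffer1[5], padded to 8 bytes    */
    SAVED_EBP_OFFSET   = 8,   /* saved frame pointer of the caller     */
    RETURN_ADDR_OFFSET = 12,  /* return address pushed by the call     */
    FRAME_SIZE         = 16
};

/* Simulates "ret = buffer1 + 12; (*ret) += delta;" on a modeled frame
 * and returns the return address the function would then use. */
uint32_t overrun_return_address(uint32_t original_return, uint32_t delta) {
    uint8_t frame[FRAME_SIZE] = {0};
    uint8_t *buffer1 = frame + BUFFER1_OFFSET;
    uint32_t slot;

    /* The call instruction stored the return address in the frame. */
    memcpy(frame + RETURN_ADDR_OFFSET, &original_return, sizeof original_return);

    /* Writing 12 bytes past the start of buffer1 is outside buffer1's
     * 5 bytes: this is the overrun, and it hits the return address. */
    memcpy(&slot, buffer1 + 12, sizeof slot);
    slot += delta;
    memcpy(buffer1 + 12, &slot, sizeof slot);

    /* On return, the CPU pops whatever now sits in the return slot. */
    memcpy(&slot, frame + RETURN_ADDR_OFFSET, sizeof slot);
    return slot;
}
```

Calling overrun_return_address(0x80004a8, 8) yields 0x80004b0 in this model: the function never touches x, it only changes where execution resumes.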
Note on the calculation of 8 as the proper offset for skipping x = 1; statement:
See also FrankH's careful reevaluation of the calculation of 8 as the proper offset to add to the return address to achieve the intent of skipping x = 1;. His findings contradict the GDB-based analysis of the insecure.org source article. Regardless of this detail, the explanation of how a buffer overrun is used to change the return address remains the same -- it's just a question of what to write into the overrun.
For completeness, here is the GDB-based analysis of the insecure.org source article:
What we have done is add 12 to buffer1[]'s address. This new
address is where the return address is stored. We want to skip pass
the assignment to the printf call. How did we know to add 8 to the
return address? We used a test value first (for example 1), compiled
the program, and then started gdb:
[aleph1]$ gdb example3
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
We can see that when calling function() the RET will be 0x8004a8,
and we want to jump past the assignment at 0x80004ab. The next
instruction we want to execute is the at 0x8004b2. A little math
tells us the distance is 8 bytes.
A little better math tells us that the distance is 0x80004b2 - 0x80004a8 = 0xA or 10 bytes, not 8 bytes.

The layout on the stack is like this (addresses downwards - as stacks grow):
buffer1 + ...   value found         description
=================================================================================
+24             3                   # from main, pushl $0x3
+20             2                   # from main, pushl $0x2
+16             1                   # from main, pushl $0x1
+12             <main+24>           # from main, call 0x8000470 <function>
+8              <frameptr main>     # from function, pushl %ebp
+4              %ebp(function)      padding (3 bytes) # ABI - compiler will not _pack_ vars
+0              buffer1[5];
...             buffer2[10];        # might be optimized out (unused)
...             int *ret            # might be optimized out (reg used instead)
The tricky thing is that buffer1 starts at a four-byte-aligned address even though its size is not a multiple of four bytes. Its "effective size" is eight bytes, so if you add eight bytes to its start, you find the saved frame pointer, and if you go another four bytes down, the saved return address (which, according to your disassembly, is main+24 / 0x80004a8). Adding 8 to that lands at 0x80004b0, in the middle of the movl instruction at main+27, so the result is garbage: you're not skipping the x = 1 statement.
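The point about landing in the middle of an instruction can be checked mechanically. The sketch below is only an illustration: the table is copied from the instruction start addresses in the disassembly of main quoted in the question, and the function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Start addresses of every instruction in the quoted disassembly of main. */
static const uint32_t main_insn_starts[] = {
    0x8000490, 0x8000491, 0x8000493, 0x8000496, 0x800049d,
    0x800049f, 0x80004a1, 0x80004a3, 0x80004a8, 0x80004ab,
    0x80004b2, 0x80004b5, 0x80004b6, 0x80004bb, 0x80004c0,
    0x80004c3, 0x80004c5, 0x80004c6, 0x80004c7
};

/* Returns 1 if addr is the first byte of an instruction in main, else 0. */
int is_instruction_start(uint32_t addr) {
    size_t count = sizeof main_insn_starts / sizeof main_insn_starts[0];
    for (size_t i = 0; i < count; i++) {
        if (main_insn_starts[i] == addr) {
            return 1;
        }
    }
    return 0;
}
```

With the saved return address 0x80004a8, is_instruction_start(0x80004a8 + 8) is 0, while is_instruction_start(0x80004a8 + 10) is 1, which matches the corrected distance of 10 bytes for this particular build.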

Related

How does ESP and EBP registers act when a new program is executed?

I recently went through an Assembly language book by Richard Blum wherein there was a subject on the C program to assembly conversion.
Consider the following C program:
#include <stdio.h>
int main(){
int a=100;
int b=25;
if (a>b)
printf("The higher value is %d\n", a);
else
printf("The higher value is %d\n", b);
return 0;
}
When I compiled the above program using the -S option as:
gcc -S abc.c
I got the following result:
.file "abc.c"
.section .rodata
.LC0:
.string "The higher value is %d\n"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x7c,0x6
subl $20, %esp
movl $100, -16(%ebp)
movl $25, -12(%ebp)
movl -16(%ebp), %eax
cmpl -12(%ebp), %eax
jle .L2
subl $8, %esp
pushl -16(%ebp)
pushl $.LC0
call printf
addl $16, %esp
jmp .L3
.L2:
subl $8, %esp
pushl -12(%ebp)
pushl $.LC0
call printf
addl $16, %esp
.L3:
movl $0, %eax
movl -4(%ebp), %ecx
.cfi_def_cfa 1, 0
leave
.cfi_restore 5
leal -4(%ecx), %esp
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
.section .note.GNU-stack,"",@progbits
What I can't understand is this snippet:
.LFB0:
.cfi_startproc
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x7c,0x6
subl $20, %esp
I am unable to work out what is happening with the ESP and EBP registers. About EBP, I can understand to an extent that it is used as the base of the local stack frame, and so its value is saved by pushing it onto the stack.
Can you please elaborate on the above snippet?

This is a special form of function entry-sequence suitable for the main()
function. The compiler knows that main() really is called as main(int argc, char **argv, char **envp), and compiles this function according to that very special behavior. So what's sitting on the stack when this code is reached is four long-size values, in this order: envp, argv, argc, return_address.
So that means that the entry-sequence code is doing something like this
(rewritten to use Intel syntax, which frankly makes a lot more sense
than AT&T syntax):
; Copy esp+4 into ecx. The value at [esp] has the return address,
; so esp+4 is 'argc', or the start of the function's arguments.
lea ecx, [esp+4]
; Round esp down (align esp down) to the nearest 16-byte boundary.
; This ensures that regardless of what esp was before, esp is now
; starting at an address that can store any register this processor
; has, from the one-byte registers all the way up to the 16-byte xmm
; registers
and esp, 0xFFFFFFF0
; Since we copied esp+4 into ecx above, that means that [ecx] is 'argc',
; [ecx+4] is 'argv', [ecx+8] is 'envp', and [ecx-4] is the return
; address. The compiler pushes a duplicate copy of the return address
; onto the newly aligned stack, most likely so that the new frame still
; looks like a normal one to debuggers and stack unwinders.
push dword ptr [ecx-4]
; Preserve 'ebp'. The C ABI requires us not to damage 'ebp' across
; function calls, so we save its old value on the stack before we
; change it.
push ebp
; Set 'ebp' to the current stack pointer to set up the function's
; stack frame for real. The "stack frame" is the place on the stack
; where this function will store all its local variables.
mov ebp, esp
; Preserve 'ecx'. Ecx tells us what 'esp' was before we munged 'esp'
; in the 'and'-instruction above, so we'll need it later to restore
; 'esp' before we return.
push ecx
; Finally, allocate space on the stack frame for the local variables,
; 20 bytes worth. 'ebp' points to 'esp' plus 24 by this point, and
; the compiler will use 'ebp-16' and 'ebp-12' to store the values of
; 'a' and 'b', respectively. (So under 'ebp', going down the stack,
; the values will look like this: [ecx, unused, b, a, unused, unused].
; The unused slots are most likely padding that keeps 'esp' 16-byte
; aligned; the .cfi pseudo-ops only emit unwind metadata and do not
; occupy stack space.)
sub esp, 20
At the other end of the function, the inverse operations are used to put
the stack back the way it was before the function was called; it may be
helpful to examine what they're doing as well to understand what's happening
at the beginning:
; Return values are always passed in 'eax' in the x86 C ABI, so set
; 'eax' to the return value of 0.
mov eax, 0
; We pushed 'ecx' onto the stack a while back to save it. This
; instruction reads 'ecx' back from its slot at 'ebp-4', but does so
; without popping (which would alter 'esp', which doesn't currently
; point to the right location).
mov ecx, [ebp-4]
; Magic instruction! The 'leave' instruction is designed to shorten
; instruction sequences by "undoing" the stack in a single op.
; So here, 'leave' means specifically to do the following two
; operations, in order: esp = ebp / pop ebp
leave
; 'esp' is now set to what it was before we pushed 'ebp', and 'ebp'
; is back to the value that was used when this function was called.
; But that's still not quite right, so we set 'esp' specifically to
; 'ecx-4', which is the exact opposite of the very first instruction
; in the function.
lea esp, [ecx-4]
; Finally, the stack is back to the way it was when we were called,
; so we can just return.
ret
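As a rough cross-check of the register arithmetic in the entry sequence, the following sketch replays the prologue from the question's gcc -S output on plain integers. The Regs struct and function names are hypothetical and only track ecx, ebp, and esp; nothing is actually stored in memory.

```c
#include <stdint.h>

/* Hypothetical register snapshot used only for this illustration. */
typedef struct {
    uint32_t ecx;
    uint32_t ebp;
    uint32_t esp;
} Regs;

/* Mirrors "andl $-16, %esp": clear the low four bits of the address. */
uint32_t align_down_16(uint32_t addr) {
    return addr & 0xFFFFFFF0u;
}

/* Replays the prologue of main, given esp at function entry. */
Regs main_prologue(uint32_t esp_at_entry) {
    Regs r;
    r.ecx = esp_at_entry + 4;             /* leal 4(%esp), %ecx        */
    r.esp = align_down_16(esp_at_entry);  /* andl $-16, %esp           */
    r.esp -= 4;                           /* pushl -4(%ecx)            */
    r.esp -= 4;                           /* pushl %ebp                */
    r.ebp = r.esp;                        /* movl %esp, %ebp           */
    r.esp -= 4;                           /* pushl %ecx                */
    r.esp -= 20;                          /* subl $20, %esp            */
    return r;
}
```

For an example entry value of 0xffffd62c, this gives ecx = 0xffffd630, ebp = 0xffffd618, and esp = 0xffffd600. The three pushes plus the 20-byte subtraction total 32 bytes, so esp ends up 16-byte aligned again, which is one plausible reason the compiler picked 20.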

Buffer overflow buffer length

I have a buffer overflow problem that I need to solve. Below is the problem, at the bottom is my question:
#include <stdio.h>
#include <string.h>
void lan(void) {
printf("Your loyalty to your captors is touching.\n");
}
void vulnerable(char *str) {
char buf[LENGTH]; //Length is not given
strcpy(buf, str); //str to fixed size buf (uh-oh)
}
int main(int argc, char **argv) {
if (argc < 2)
return -1;
vulnerable(argv[1]);
return 0;
}
(gdb) disass vulnerable
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
End of assembler dump.
(gdb) disass lan
0x080483f4: push %ebp
0x080483f5: mov %esp, %ebp
0x080483f7: sub $0x4, %esp
0x080483fa: movl $0x8048514, (%esp)
0x08048401: call 0x8048324 <puts>
0x08048406: leave
0x08048407: ret
End of assembler dump.
Then we have the following info:
(gdb) break *0x08048420
Breakpoint 1 at 0x8048420
(gdb) run `perl -e 'print "\x90" x Length'`AAAABBBBCCCCDDDDEEEE
Breakpoint 1, 0x08048420 in vulnerable
(gdb) info reg $ebp
ebp 0xffffd61c 0xffffd61c
(gdb) # QUESTION: Where in memory does the buf buffer start?
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.
And finally, the perl command is a shorthand for writing out LENGTH copies of the character 0x90.
I've done a couple of problems of this sort before, but what stops me here is the following question: "By looking at the assembly code, what is the value of LENGTH?"
I'm not sure how to find that from the given assembly code. What I do know is.. the buffer that we're writing into is on the stack at the location -128(%ebp) (where -128 is a decimal number). However, I'm not sure where to go from here to get the length of the buffer.

Let's look at your vulnerable function.
First the compiler creates a frame and reserves 0x88 bytes on the stack:
0x08048408: push %ebp
0x08048409: mov %esp, %ebp
0x0804840b: sub $0x88, %esp
Then it puts two values onto the stack:
0x0804840e: mov 0x8(%ebp), %eax
0x08048411: mov %eax, 0x4(%esp)
0x08048415: lea -0x80(%ebp), %eax
0x08048418: mov %eax, (%esp)
And the last thing it does before returning is calling strcpy(buf, str):
0x0804841b: call 0x8048314 <strcpy>
0x08048420: leave
0x08048421: ret
So we can deduce that the two values it put on the stack are the arguments to strcpy.
mov 0x8(%ebp) would be char *str and lea -0x80(%ebp) would be a pointer to char buf[LENGTH].
Therefore, we know that your buffer starts at -0x80(%ebp), so it has a length of 0x80 = 128 bytes assuming the compiler didn't waste any space.

What I do know is.. the buffer that we're writing into is on the stack
at the location -128(%ebp)
Since the local variables end at %ebp, and you only have a single local variable, which is buf itself, you can conclude that it has a length of at most 128. It may be shorter if the compiler added some padding for alignment.
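The arithmetic from both answers can be written down as a short sketch. The function names are made up for this illustration, and the "+ 4" assumes the usual 32-bit frame in which the saved %ebp sits at (%ebp) and the return address directly above it at 4(%ebp).

```c
#include <stdint.h>

/* buf lives at -offset(%ebp), so it starts this far below the frame pointer. */
uint32_t buffer_start(uint32_t ebp, uint32_t offset_below_ebp) {
    return ebp - offset_below_ebp;
}

/* Locals end at %ebp, so the offset is an upper bound on the buffer length. */
uint32_t max_buffer_length(uint32_t offset_below_ebp) {
    return offset_below_ebp;
}

/* Bytes from the start of buf up to the saved return address:
 * the whole buffer region plus the 4-byte saved %ebp. */
uint32_t bytes_before_return_address(uint32_t offset_below_ebp) {
    return offset_below_ebp + 4;
}
```

With the values from the gdb session (%ebp = 0xffffd61c, offset 0x80), buffer_start gives 0xffffd59c and max_buffer_length gives 128.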

Using "lea ($30, %edx), %eax" to add edx and 30 and put it into eax

I'm studying for a midterm tomorrow and one of the questions on a previous midterm is:
Consider the following C function. Write the corresponding assembly language function to perform the same operation.
int myFunction (int a)
{
return (a + 30);
}
What I wrote down is:
.global _myFunction
_myFunction:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
lea ($30, %edx), %eax
leave
ret
where a is edx and a+30 would be eax. Is the use of lea correct in this case? Would it instead need to be
lea ($30, %edx, 1), %eax
Thanks.

If you are looking to simply add 30 using leal then you should do it this way:
leal 30(%edx), %eax
The notation is displacement(baseregister, offsetregister, scalarmultiplier). The displacement is placed on the outside. 30 is added to edx and stored in eax. In AT&T/GAS notation you can leave off both the offset and multiplier. In our example this leaves us with the equivalent of base + displacement or edx + 30 in this example.
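As a small illustration of that addressing form, the sketch below computes the same value lea would place in the destination register. The function name and parameter names are hypothetical; the formula displacement + base + index * scale is the standard x86 effective-address calculation.

```c
#include <stdint.h>

/* Effective address for displacement(base, index, scale) in AT&T syntax.
 * Leaving out the index and scale corresponds to index = 0, scale = 1. */
uint32_t effective_address(int32_t displacement, uint32_t base,
                           uint32_t index, uint32_t scale) {
    return base + index * scale + (uint32_t)displacement;
}
```

So "leal 30(%edx), %eax" with %edx holding 100 corresponds to effective_address(30, 100, 0, 1), which is 130.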
cHao also brings up a good point. Let us say the professor asks you to optimize your code. There are some inefficiencies in the fact that myFunction uses no local variables and doesn't need stack space for itself. Because of that all the stack frame creation and destruction can be removed. If you remove the stack frame then you no longer push %ebp as well. That means your first parameter int a is at 4(%esp) . With that in mind your function can be reduced to something this simple:
.global _myFunction
_myFunction:
movl 4(%esp), %eax
addl $30, %eax
ret
Of course, the moment you change your function so that it needs to store things on the stack, you would have to put the stack frame code back in (pushl %ebp, movl %esp, %ebp, leave, etc.).

C Code represented as Assembler Code - How to interpret?

I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret

Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(uint32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return res; // 64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
The stack pointer is aligned to an 8-byte boundary. I believe it's an ABI requirement.
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
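The "movl $0, -4(%ebp)" step above has a visible consequence in C: the multiplication itself is done in 32 bits and only the result is widened. The sketch below contrasts the function as written with a variant that widens one operand first. It assumes a platform where int is 32 bits, and both function names are made up for this comparison.

```c
#include <stdint.h>

/* Same expression as in the question: x * y is computed as a 32-bit
 * unsigned product (wrapping modulo 2^32), then zero-extended to 64 bits. */
uint64_t multiply_as_written(uint32_t x, uint32_t y) {
    uint64_t res;
    res = x * y;
    return res;
}

/* Widening one operand first makes the multiplication itself 64-bit. */
uint64_t multiply_widened(uint32_t x, uint32_t y) {
    return (uint64_t)x * y;
}
```

For small inputs such as 3 and 5 both return 15, but for 0x10000 and 0x10000 the first returns 0 while the second returns 0x100000000, which is why the compiler can simply store zero into the upper half in the listing.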

Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out if you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames

I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push always decrements esp before storing the value, and pop always increments esp after loading it, whereas mov leaves esp unchanged.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time or you can allocate up-front the space required and then load each value onto the stackslot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly adjusting esp before saving each value onto the stack.
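The two ways of placing arguments can be compared on a toy model. Everything here (the ToyStack type, the 64-byte size, the function names) is hypothetical and exists only to show that both methods leave the same bytes at the same offsets from esp.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: a small stack and an esp that indexes into it. */
typedef struct {
    uint8_t  mem[64];
    uint32_t esp;
} ToyStack;

/* push: move esp down by 4 first, then store at the new top of stack. */
void toy_push(ToyStack *s, uint32_t value) {
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* mov to offset(%esp): store relative to esp and leave esp unchanged. */
void toy_store(ToyStack *s, uint32_t offset, uint32_t value) {
    memcpy(&s->mem[s->esp + offset], &value, sizeof value);
}

/* Read a 32-bit value at offset(%esp). */
uint32_t toy_load(const ToyStack *s, uint32_t offset) {
    uint32_t value;
    memcpy(&value, &s->mem[s->esp + offset], sizeof value);
    return value;
}

/* Method 1: push each argument, last argument first. */
void pass_args_by_push(ToyStack *s, uint32_t a, uint32_t b) {
    toy_push(s, b);
    toy_push(s, a);
}

/* Method 2: reserve the space up front, then mov into the slots. */
void pass_args_by_mov(ToyStack *s, uint32_t a, uint32_t b) {
    s->esp -= 8;
    toy_store(s, 4, b);
    toy_store(s, 0, a);
}
```

Starting both stacks with esp = 64 and passing (3, 5), each method ends with esp = 56, the first argument at 0(%esp), and the second at 4(%esp). The second method changes esp once instead of once per argument.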
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr esp => uint32_t:a_copy = 3 <--. arg1 to 'multiply'
esp + 4 uint32_t:b_copy = 5 <--. arg2 to 'multiply'
^ esp + 8 ????
^ esp + 12 uint32_t:z = ? <--.
| esp + 16 uint32_t:b = 5 | local variables in 'main'
| esp + 20 uint32_t:a = 3 <--.
| ...
| ...
higher addr ebp previous frame

Understanding empty main()'s translation into assembly

Could somebody please explain what GCC is doing for this piece of code? What is it initializing? The original code is:
#include <stdio.h>
int main()
{
}
And it was translated to:
.file "test1.c"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
leave
ret
I would be grateful if a compiler/assembly guru got me started by explaining the stack, register and the section initializations. I can't make head or tail of the code.
EDIT:
I am using gcc 3.4.5. and the command line argument is gcc -S test1.c
Thank You,
kunjaan.

I should preface all my comments by saying, I am still learning assembly.
I will ignore the section initialization. An explanation of the section initialization and basically everything else I cover can be found here:
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
The ebp register is the stack frame base pointer, hence the BP. It stores a pointer to the beginning of the current stack frame.
The esp register is the stack pointer. It holds the memory location of the top of the stack. Each time we push something on the stack, esp is updated so that it always points to the top of the stack.
So ebp points to the base and esp points to the top. So the stack looks like:
esp -----> 000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
If you push e4 on the stack this is what happens:
esp -----> 000a2 e4
000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
Notice that the stack grows towards lower addresses, this fact will be important below.
The first two steps are known as the procedure prolog or more commonly as the function prolog. They prepare the stack for use by local variables (See procedure prolog quote at the bottom).
In step 1 we save the pointer to the old stack frame on the stack by calling
pushl %ebp. Since main is the first function called, I have no idea what the previous value of %ebp points to.
Step 2, We are entering a new stack frame because we are entering a new function (main). Therefore, we must set a new stack frame base pointer. We use the value in esp to be the beginning of our stack frame.
Step 3. Allocates 8 bytes of space on the stack. As we mentioned above, the stack grows toward lower addresses thus, subtracting by 8, moves the top of the stack by 8 bytes.
Step 4 aligns the stack. I've found different opinions on this, and I'm not really sure exactly why it is done. I suspect it is done to allow large (SIMD) operands to be allocated on the stack:
http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html
This code "and"s ESP with 0xFFFFFFF0,
aligning the stack with the next
lowest 16-byte boundary. An
examination of Mingw's source code
reveals that this may be for SIMD
instructions appearing in the "_main"
routine, which operate only on aligned
addresses. Since our routine doesn't
contain SIMD instructions, this line
is unnecessary.
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Steps 5 through 11 seem to have no purpose to me. I couldn't find any explanation on Google. Could someone who really knows this stuff provide a deeper understanding? I've heard rumors that this stuff is used for C's exception handling.
Step 5 stores 0 in eax.
Steps 6 and 7 add 15 (0xF) to eax twice, for an unknown reason. eax = 01111 + 01111 = 11110
Step 8 shifts the bits of eax 4 bits to the right. eax = 00001, because the low four bits are shifted off the end: 00001 | 1110.
Step 9 shifts the bits of eax 4 bits to the left. eax = 10000 (16). Shifting right and then left clears the low four bits, leaving a multiple of 16.
Steps 10 and 11 move the value in eax into the first 4 allocated bytes on the stack and then move it back into eax.
Steps 12 and 13 set up the C library.
We have reached the function epilogue. That is, the part of the function which returns the stack pointers, esp and ebp to the state they were in before this function was called.
Step 14, leave sets esp to the value of ebp, moving the top of stack to the address it was before main was called. Then it sets ebp to point to the address we saved on the top of the stack during step 1.
Leave can just be replaced with the following instructions:
mov %ebp, %esp
pop %ebp
Step 15, returns and exits the function.
1. pushl %ebp
2. movl %esp, %ebp
3. subl $8, %esp
4. andl $-16, %esp
5. movl $0, %eax
6. addl $15, %eax
7. addl $15, %eax
8. shrl $4, %eax
9. sall $4, %eax
10. movl %eax, -4(%ebp)
11. movl -4(%ebp), %eax
12. call __alloca
13. call ___main
14. leave
15. ret
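The arithmetic in steps 5 through 9 can be replayed in C to see what ends up in eax just before the call to __alloca. This is only a sketch of the listed instructions; the function name is made up, and what __alloca does with the value is not modeled.

```c
/* Replays steps 6 through 9 of the listing on a starting value of eax. */
unsigned eax_after_steps_6_to_9(unsigned eax) {
    eax += 15;   /* step 6: addl $15, %eax */
    eax += 15;   /* step 7: addl $15, %eax */
    eax >>= 4;   /* step 8: shrl $4, %eax  */
    eax <<= 4;   /* step 9: sall $4, %eax  */
    return eax;
}
```

Starting from the 0 stored in step 5, the result is 16. Starting from other values, the result is always a multiple of 16, which suggests the sequence is an unoptimized rounding computation.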
Procedure Prolog:
The first thing a function has to do
is called the procedure prolog. It
first saves the current base pointer
(ebp) with the instruction pushl %ebp
(remember ebp is the register used for
accessing function parameters and
local variables). Now it copies the
stack pointer (esp) to the base
pointer (ebp) with the instruction
movl %esp, %ebp. This allows you to
access the function parameters as
indexes from the base pointer. Local
variables are always a subtraction
from ebp, such as -4(%ebp) or
(%ebp)-4 for the first local variable,
the return address is always at 4(%ebp)
or (%ebp)+4, each parameter or
argument is at N*4+4(%ebp) such as
8(%ebp) for the first argument while
the old ebp is at (%ebp).
http://www.milw0rm.com/papers/52
A really great stack overflow thread exists which answers much of this question.
Why are there extra instructions in my gcc output?
A good reference on x86 machine code instructions can be found here:
http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html
This is a lecture which contains some of the ideas used below:
http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm
Here is another take on answering your question:
http://www.phiral.net/linuxasmone.htm
None of these sources explain everything.

Here's a good step-by step breakdown of a simple main() function as compiled by GCC, with lots of detailed info: GAS Syntax (Wikipedia)
For the code you pasted, the instructions break down as follows:
First four instructions (pushl through andl): set up a new stack frame
Next five instructions (movl through sall): generating a weird value for eax, which will become the return value (I have no idea how it decided to do this)
Next two instructions (both movl): store the computed return value in a temporary variable on the stack
Next two instructions (both call): invoke the C library init functions
leave instruction: tears down the stack frame
ret instruction: returns to caller (the outer runtime function, or perhaps the kernel function that invoked your program)

Well, I don't know much about GAS, and I'm a little rusty on Intel assembly, but it looks like it's initializing main's stack frame.
If you take a look, __main is some kind of macro that must be executing initializations.
Then, as main's body is empty, it executes the leave and ret instructions to return to the function that called main.
From http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax#.22hello.s.22_line-by-line:
This line declares the "_main" label, marking the place that is called from the startup code.
pushl %ebp
movl %esp, %ebp
subl $8, %esp
These lines save the value of EBP on the stack, then move the value of ESP into EBP, then subtract 8 from ESP. The "l" on the end of each opcode indicates that we want to use the version of the opcode that works with "long" (32-bit) operands;
andl $-16, %esp
This code "and"s ESP with 0xFFFFFFF0, aligning the stack with the next lowest 16-byte boundary. (necessary when using SIMD instructions, not useful here)
movl $0, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
This code moves zero into EAX, then moves EAX into the memory location EBP-4, which is in the temporary space we reserved on the stack at the beginning of the procedure. Then it moves the memory location EBP-4 back into EAX; clearly, this is not optimized code.
call __alloca
call ___main
These functions are part of the C library setup. Since we are calling functions in the C library, we probably need these. The exact operations they perform vary depending on the platform and the version of the GNU tools that are installed.
Here's a useful link.
http://unixwiz.net/techtips/win32-callconv-asm.html

It would really help to know what gcc version you are using and what libc. It looks like you have a very old gcc version or a strange platform or both. What's going on is some strangeness with calling conventions. I can tell you a few things:
Save the frame pointer on the stack according to convention:
pushl %ebp
movl %esp, %ebp
Make room for stuff at the old end of the frame, and round the stack pointer down to a multiple of 16 (why this is needed I don't know):
subl $8, %esp
andl $-16, %esp
Through an insane song and dance, compute the value 16 in eax:
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
Call __alloca (GNU-ism), which on this platform appears to take a byte count in eax and reserve that much stack:
call __alloca
Call ___main, which performs runtime initialization such as running global constructors (more GNU-ism):
call ___main
Restore the frame and stack pointers:
leave
Return:
ret
Here's what happens when I compile the very same source code with gcc 4.3 on Debian Linux:
.file "main.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",@progbits
And I break it down this way:
Tell the debugger and other tools the source file:
.file "main.c"
Code goes in the text section:
.text
Beats me:
.p2align 4,,15
main is an exported function:
.globl main
.type main, @function
main's entry point:
main:
Save the address of the arguments in ecx, align the stack on a 16-byte boundary, and push a copy of the return address (why I can't say):
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
Save frame pointer using standard convention:
pushl %ebp
movl %esp, %ebp
Inscrutable madness:
pushl %ecx
popl %ecx
Restore the frame pointer and the stack pointer:
popl %ebp
leal -4(%ecx), %esp
Return:
ret
More info for the debugger?:
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",@progbits
By the way, main is special and magical; when I compile
int f(void) {
return 17;
}
I get something slightly more sane:
.file "f.c"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
pushl %ebp
movl $17, %eax
movl %esp, %ebp
popl %ebp
ret
.size f, .-f
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",@progbits
There's still a ton of decoration, and we're still saving the frame pointer, moving it, and restoring it, which is utterly pointless, but the rest of the code makes sense.

It looks like GCC is acting like it is ok to edit main() to include CRT initialization code. I just confirmed that I get the exact same assembly listing from MinGW GCC 3.4.5 here, with your source text.
The command line I used is:
gcc -S emptymain.c
Interestingly, if I change the name of the function to qqq() instead of main(), I get the following assembly:
.file "emptymain.c"
.text
.globl _qqq
.def _qqq; .scl 2; .type 32; .endef
_qqq:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
which makes much more sense for an empty function with no optimizations turned on.