I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret
Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because`
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(u_int32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return ret; //64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
stack boundary is aligned by 8. I believe it's ABI requirement
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out of you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push and pop always increments and decrements esp respectively before derefing the address whereas mov does not.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time or you can allocate up-front the space required and then load each value onto the stackslot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly incrementing esp before saving the value onto the stack.
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr esp => uint32_t:a_copy = 3 <--. arg1 to 'multiply'
esp + 4 uint32_t:b_copy = 5 <--. arg2 to 'multiply'
^ esp + 8 ????
^ esp + 12 uint32_t:z = ? <--.
| esp + 16 uint32_t:b = 5 | local variables in 'main'
| esp + 20 uint32_t:a = 3 <--.
| ...
| ...
higher addr ebp previous frame
Related
I'm studying for a midterm tomorrow and one of the questions on a previous midterm is:
Consider the following C function. Write the corresponding assembly language function to perform the same operation.
int myFunction (int a)
{
return (a + 30);
}
What I wrote down is:
.global _myFunction
_myFunction:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
lea ($30, %edx), %eax
leave
ret
where a is edx and a+30 would be eax. Is the use of lea correct in this case? Would it instead need to be
lea ($30, %edx, 1), %eax
Thanks.
If you are looking to simply add 30 using leal then you should do it this way:
leal 30(%edx), %eax
The notation is displacement(baseregister, offsetregister, scalarmultiplier). The displacement is placed on the outside. 30 is added to edx and stored in eax. In AT&T/GAS notation you can leave off both the offset and multiplier. In our example this leaves us with the equivalent of base + displacement or edx + 30 in this example.
cHao also brings up a good point. Let us say the professor asks you to optimize your code. There are some inefficiencies in the fact that myFunction uses no local variables and doesn't need stack space for itself. Because of that all the stack frame creation and destruction can be removed. If you remove the stack frame then you no longer push %ebp as well. That means your first parameter int a is at 4(%esp) . With that in mind your function can be reduced to something this simple:
.global _myFunction
_myFunction:
movl 4(%esp), %eax
addl $30, %eax
ret
Of course the moment you change your function so that it needs to store things on the stack you would have to put the stack frame code back in (pushl %ebp, pushl %ebp, leave etc)
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am writing assembly language in C. I was given the following assembly code:
fn:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
movl %eax, -4(%ebp)
subl $1, 8(%ebp)
.f2:
cmpl $1, 8(%ebp)
jg .f3
movl -4(%ebp), %eax
leave
ret
I have written my code in C below:
fn (int x)
{
int y = 1;
if(x > 1):
int z = y *= x
x – 1;
return z;
}
Can someone tell me if I am on the right track with my C code? or if I am off and if I am can you point me in the right direction.
Thank You in advanced
fn:
pushl %ebp
movl %esp, %ebp
This is the usual function startup used in the calling convertion __cdecl, save existing %ebp to the stack then save %esp (current stack address) to %ebp. %ebp will be from now on the reference to the stack call point of this function used by the assembly code to access both parameters and local variables. To conclude this function, the opposite will be done in the future, as well as setting the "return" value to %eax.
At this point you can assume %ebp points to the current stack position where the previous %ebp is stored, %ebp + 4 is the function's return point, and everything from %ebp + 8 on is the data of the function parameters. Negative values will be accessing unused stack space, where the function can store it's local variables.
subl $16, %esp
This is reserving 22 bytes of information in the stack for local variables. It can be 22 variables of 1 byte or 1 variable of 22 bytes, there is no way to know. It can reserve unused bytes to make stack alignment.
Its important to change the value of %esp at this point so any call to push, pop and call won't overwrite this function's local variables.
movl $1, -4(%ebp)
Here it is possible to assume that the first of the local variables present in this function is a 32-bits integer at the address %ebp - 4. It's value was just set to 1. Lets call this variable i.
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
Based on what this imull is doing we can assume the function takes at least one parameter at the address %ebp + 8, and it seems its a 32-bits integer being multiplied by i. Lets call this parameter x.
movl %eax, -4(%ebp)
So far we can assume i = x * i.
subl $1, 8(%ebp)
And now x = x - 1.
.f2:
cmpl $1, 8(%ebp)
jg .f3
Here we are checking if x is greater than 1. If true, jumps to f3, that is a backward jump therefore it seems we have a loop going on.
movl -4(%ebp), %eax
leave
ret
Here outside the loop we see it set %eax to i and terminate the function, we can interpret it as return i.
Compiling the analyzed information together, we can assume the original function must have looked somewhat like this:
int fn(int x)
{
int i = 1;
while (x > 1)
{
i = i * x; // or just i *= x, same thing
x = x - 1; // or just x--, same thing
}
return i;
}
I am trying to get the max of three numbers using C to call a method in Assembly 32 bit AT & T. When the program runs, I get a segmentation fault(core dumped) error and cannot figure out why. My input has been a mix of positive/negative numbers and 1,2,3, both with the same error as a result.
Assembly
# %eax - first parameter
# %ecx - second parameter
# %edx - third parameter
.code32
.file "maxofthree.S"
.text
.global maxofthree
.type maxofthree #function
maxofthree:
pushl %ebp # old ebp
movl %esp, %ebp # skip over
movl 8(%ebp), %eax # grab first value
movl 12(%ebp), %ecx # grab second value
movl 16(%ebp), %edx # grab third value
#test for first
cmpl %ecx, %eax # compare first and second
jl firstsmaller # first smaller than second, exit if
cmpl %edx, %eax # compare first and third
jl firstsmaller # first smaller than third, exit if
leave # reset the stack pointer and pop the old base pointer
ret # return since first > second and first > third
firstsmaller: # first smaller than second or third, resume comparisons
#test for second and third against each other
cmpl %edx, %ecx # compare second and third
jg secondgreatest # second is greatest, so jump to end
movl %eax, %edx # third is greatest, move third to eax
leave # reset the stack pointer and pop the old base pointer
ret # return third
secondgreatest: # second > third
movl %ecx, %eax #move second to eax
leave # reset the stack pointer and pop the old base pointer
ret # return second
C code
#include <stdio.h>
#include <inttypes.h>
long int maxofthree(long int, long int, long int);
int main(int argc, char *argv[]) {
if (argc != 4) {
printf("Missing command line arguments. Instructions to"
" execute this program:- .\a.out <num1> <num2> <num3>");
return 0;
}
long int x = atoi(argv[1]);
long int y = atoi(argv[2]);
long int z = atoi(argv[3]);
printf("%ld\n", maxofthree(x, y, z)); // TODO change back to (x, y, z)
}
The code is causing a segmentation fault because it is trying to jump back to an invalid return address when the ret instruction is executed. This happens for all three different ret instructions.
The reason why it is occurring is because you don't pop the old base pointer before returning. A small change to the code will remove the fault. Change each ret instruction to:
leave
ret
The leave instruction will do the following:
movl %ebp, %esp
popl %ebp
Which will reset the stack pointer and pop the old base pointer that you saved.
Also, your comparisons are not doing what they are specified to do in the comments. When you do:
cmp %eax, %edx
jl firstsmaller
The jump will happen when %edx is smaller than %eax. So you want the code be
cmpl %edx, %eax
jl firstsmaller
which will jump when %eax is smaller than %edx, as specified in the comment.
Reference this this page for details on the cmp instruction in AT&T/GAS syntax.
You forgot to pop ebp before returning from the function.
Also, cmpl %eax, %ecx compares ecx to eax not the other way. So the code
cmpl %eax, %ecx
jl firstsmaller
will jump if ecx is smaller than eax.
I have this problem, I am recursively calling a function in C and C is lexically scoped, so I can only access the current stack frame. I want to extract the arguments and the local variables from the previous stack frame which was created under the previous function call while im on the current stack frame
I know that the values from the previous recursive call are still on the stack, but I cant access access these values because they're "buried" under the active stack frame?
I want to extract the arguments and local variables from the previous stack and copy them to copy_of_buried_arg and copy_of_buried_loc;
It is a requirement to use inline assembly using GAS to extract the variables, this is what I have so far, and I tried all day, I cant seem to figure it out, I drew the stack on paper and did the calculations but nothing is working, I also tried deleting calls to printf so the stack will be cleaner but I cant figure out the right arithmetic. Here is the code so far, my function halts on the second iteration
#include <stdio.h>
char glo = 97; // just for fun 97 is ascii lowercase 'a'
int copy_of_buried_arg;
char copy_of_buried_loc;
void rec(int arg) {
char loc;
loc = glo + arg * 2; // just for fun, some char arithmetic
printf("inside rec() arg=%d loc='%c'\n", arg, loc);
if (arg != 0) {
// after this assembly code runs, the copy_of_buried_arg and
// copy_of_buried_loc variables will have arg, loc values from
// the frame of the previous call to rec().
__asm__("\n\
movl 28(%esp), %eax #moving stack pointer to old ebp (pointing it to old ebp)\n\
addl $8, %eax #now eax points to the first argument for the old ebp \n\
movl (%eax), %ecx #copy the value inside eax to ecx\n\
movl %ecx, copy_of_buried_arg # copies the old argument\n\
\n\
");
printf("copy_of_buried_arg=%u copy_of_buried_loc='%c'\n",
copy_of_buried_arg, copy_of_buried_loc);
} else {
printf("there is no buried stack frame\n");// runs if argument = 0 so only the first time
}
if (arg < 10) {
rec(arg + 1);
}
}
int main (int argc, char **argv) {
rec(0);
return 0;
}
I can try to help, but don't have Linux or assembly in GAS. But the calculations should be similar:
Here's the stack after a couple of calls. A typical stack frame setup creates a linked list of stack frames, where EBP is the current stack frame and points to its old value for the previous stack frame.
+-------+
ESP-> |loc='c'| <- ESP currently points here.
+-------+
EBP-> |oldEBP |--+ <- rec(0)'s call frame
+-------+ |
|retaddr| | <- return value of rec(1)
+-------+ |
|arg=1 | | <- pushed argument of rec(1)
+-------+ |
|loc='a'| | <- local variable of rec(0)
+-------+ |
+--|oldEBP |<-+ <- main's call frame
| +-------+
| |retaddr| <- return value of rec(0)
| +-------+
| |arg=0 | <- pushed argument of rec(0)
| +-------+
\|/
to main's call frame
This is created by the following sequence:
Push arguments last arg first.
Call the function, pushing a return address.
Push soon-to-be old EBP, preserving previous stack frame.
Move ESP (top of stack, containing oldEBP) into EBP, creating new stack frame.
Subtract space for local variables.
This has the effect on a 32-bit stack that EBP+8 will always be the first parameter of the call, EBP+12 the 2nd parameter, etc. EBP-n is always an offset to a local variable.
The code to get the previous loc and arg is then (in MASM):
mov ecx,[ebp] // get previous stack frame
mov edx,[ecx]+8 // get first argument
mov copy_of_buried_arg,edx // save it
mov dl,[ecx]-1 // get first char-sized local variable.
mov copy_of_buried_loc,dl // save it
or my best guess in GAS (I don't know it but know it is backwards to MASM):
movl (%ebp),ecx
movl 8(%ecx),edx
movl edx,copy_of_buried_arg
movb -1(%ecx),dl
movb dl,copy_of_buried_loc
Output of your code with my MASM using VS2010 on Windows:
inside rec() arg=0 loc='a'
there is no buried stack frame
inside rec() arg=1 loc='c'
copy_of_buried_arg=0 copy_of_buried_loc='a'
inside rec() arg=2 loc='e'
copy_of_buried_arg=1 copy_of_buried_loc='c'
inside rec() arg=3 loc='g'
copy_of_buried_arg=2 copy_of_buried_loc='e'
inside rec() arg=4 loc='i'
copy_of_buried_arg=3 copy_of_buried_loc='g'
inside rec() arg=5 loc='k'
copy_of_buried_arg=4 copy_of_buried_loc='i'
inside rec() arg=6 loc='m'
copy_of_buried_arg=5 copy_of_buried_loc='k'
inside rec() arg=7 loc='o'
copy_of_buried_arg=6 copy_of_buried_loc='m'
inside rec() arg=8 loc='q'
copy_of_buried_arg=7 copy_of_buried_loc='o'
inside rec() arg=9 loc='s'
copy_of_buried_arg=8 copy_of_buried_loc='q'
inside rec() arg=10 loc='u'
copy_of_buried_arg=9 copy_of_buried_loc='s'
With my compiler (gcc 3.3.4) I ended up with this:
#include <stdio.h>
char glo = 97; // just for fun 97 is ascii lowercase 'a'
int copy_of_buried_arg;
char copy_of_buried_loc;
void rec(int arg) {
char loc;
loc = glo + arg * 2; // just for fun, some char arithmetic
printf("inside rec() arg=%d loc='%c'\n", arg, loc);
if (arg != 0) {
// after this assembly code runs, the copy_of_buried_arg and
// copy_of_buried_loc variables will have arg, loc values from
// the frame of the previous call to rec().
__asm__ __volatile__ (
"movl 40(%%ebp), %%eax #\n"
"movl %%eax, %0 #\n"
"movb 31(%%ebp), %%al #\n"
"movb %%al, %1 #\n"
: "=m" (copy_of_buried_arg), "=m" (copy_of_buried_loc)
:
: "eax"
);
printf("copy_of_buried_arg=%u copy_of_buried_loc='%c'\n",
copy_of_buried_arg, copy_of_buried_loc);
} else {
printf("there is no buried stack frame\n");// runs if argument = 0 so only the first time
}
if (arg < 10) {
rec(arg + 1);
}
}
int main (int argc, char **argv) {
rec(0);
return 0;
}
Here's the disassembly of the relevant part (get it with gcc file.c -S -o file.s):
_rec:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
addl %eax, %eax
addb _glo, %al
movb %al, -1(%ebp)
subl $4, %esp
movsbl -1(%ebp),%eax
pushl %eax
pushl 8(%ebp)
pushl $LC0
call _printf
addl $16, %esp
cmpl $0, 8(%ebp)
je L2
/APP
movl 40(%ebp), %eax #
movl %eax, _copy_of_buried_arg #
movb 31(%ebp), %al #
movb %al, _copy_of_buried_loc #
/NO_APP
subl $4, %esp
movsbl _copy_of_buried_loc,%eax
pushl %eax
pushl _copy_of_buried_arg
pushl $LC1
call _printf
addl $16, %esp
jmp L3
L2:
subl $12, %esp
pushl $LC2
call _printf
addl $16, %esp
L3:
cmpl $9, 8(%ebp)
jg L1
subl $12, %esp
movl 8(%ebp), %eax
incl %eax
pushl %eax
call _rec
addl $16, %esp
L1:
leave
ret
Those offsets from ebp (40 and 31) initially were set to an arbitrary guess value (e.g. 0) and then refined through observation of the disassembly and some simple calculations.
Note that the function uses extra 12+4=16 bytes of stack for the alignment and the parameter when it calls itself recursively:
subl $12, %esp
movl 8(%ebp), %eax
incl %eax
pushl %eax
call _rec
addl $16, %esp
There are also 4 bytes of the return address.
And then the function uses 4+8=12 bytes for the old ebp and its local variables:
_rec:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
So, in total the stack grows by 16+4+12=32 bytes with each recursive call.
Now, we know how to get our local arg and loc through ebp:
movl 8(%ebp), %eax ; <- arg
addl %eax, %eax
addb _glo, %al
movb %al, -1(%ebp) ; <- loc
So, we just add 32 to those offsets 8 and -1 and arrive at 40 and 31.
Do the same and you'll get your "buried" variables.
Wanting to see the output of the compiler (in assembly) for some C code, I wrote a simple program in C and generated its assembly file using gcc.
The code is this:
#include <stdio.h>
int main()
{
int i = 0;
if ( i == 0 )
{
printf("testing\n");
}
return 0;
}
The generated assembly for it is here (only the main function):
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
leave
ret
I am at an absolute loss to correlate the C code and assembly code. All that the code has to do is store 0 in a register and compare it with a constant 0 and take suitable action. But what is going on in the assembly?
Since main is special you can often get better results by doing this type of thing in another function (preferably in it's own file with no main). For example:
void foo(int x) {
if (x == 0) {
printf("testing\n");
}
}
would probably be much more clear as assembly. Doing this would also allow you to compile with optimizations and still observe the conditional behavior. If you were to compile your original program with any optimization level above 0 it would probably do away with the comparison since the compiler could go ahead and calculate the result of that. With this code part of the comparison is hidden from the compiler (in the parameter x) so the compiler can't do this optimization.
What the extra stuff actually is
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
This is setting up a stack frame for the current function. In x86 a stack frame is the area between the stack pointer's value (SP, ESP, or RSP for 16, 32, or 64 bit) and the base pointer's value (BP, EBP, or RBP). This is supposedly where local variables live, but not really, and explicit stack frames are optional in most cases. The use of alloca and/or variable length arrays would require their use, though.
This particular stack frame construction is different than for non-main functions because it also makes sure that the stack is 16 byte aligned. The subtraction from ESP increases the stack size by more than enough to hold local variables and the andl effectively subtracts from 0 to 15 from it, making it 16 byte aligned. This alignment seems excessive except that it would force the stack to also start out cache aligned as well as word aligned.
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
I don't know what all this does. alloca increases the stack frame size by altering the value of the stack pointer.
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
I think you know what this does. If not, the movl just befrore the call is moving the address of your string into the top location of the stack so that it may be retrived by printf. It must be passed on the stack so that printf can use it's address to infer the addresses of printf's other arguments (if any, which there aren't in this case).
leave
This instruction removes the stack frame talked about earlier. It is essentially movl %ebp, %esp followed by popl %ebp. There is also an enter instruction which can be used to construct stack frames, but gcc didn't use it. When stack frames aren't explicitly used, EBP may be used as a general puropose register and instead of leave the compiler would just add the stack frame size to the stack pointer, which would decrease the stack size by the frame size.
ret
I don't need to explain this.
When you compile with optimizations
I'm sure you will recompile all fo this with different optimization levels, so I will point out something that may happen that you will probably find odd. I have observed gcc replacing printf and fprintf with puts and fputs, respectively, when the format string did not contain any % and there were no additional parameters passed. This is because (for many reasons) it is much cheaper to call puts and fputs and in the end you still get what you wanted printed.
Don't worry about the preamble/postamble - the part you're interested in is:
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
It should be pretty self-evident as to how this correlates with the original C code.
The first part is some initialization code, which does not make any sense in the case of your simple example. This code would be removed with an optimization flag.
The last part can be mapped to C code:
movl $0, -4(%ebp) // put 0 into variable i (located at -4(%ebp))
cmpl $0, -4(%ebp) // compare variable i with value 0
jne L2 // if they are not equal, skip to after the printf call
movl $LC0, (%esp) // put the address of "testing\n" at the top of the stack
call _printf // do call printf
L2:
movl $0, %eax // return 0 (calling convention: %eax has the return code)
Well, much of it is the overhead associated with the function. main() is just a function like any other, so it has to store the return address on the stack at the start, set up the return value at the end, etc.
I would recommend using GCC to generate mixed source code and assembler which will show you the assembler generated for each sourc eline.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
See http://www.delorie.com/djgpp/v2faq/faq8_20.html
On linux, just use gcc. On Windows down load Cygwin http://www.cygwin.com/
Edit - see also this question Using GCC to produce readable assembly?
and http://oprofile.sourceforge.net/doc/opannotate.html
You need some knowledge about Assembly Language to understand assembly garneted by C compiler.
This tutorial might be helpful
See here more information. You can generate the assembly code with C comments for better understanding.
gcc -g -Wa,-adhls your_c_file.c > you_asm_file.s
This should help you a little.