C to Assembly code - what does it mean [closed] - c

Closed 7 years ago.
I'm trying to figure out exactly what is going on with the following assembly code. Can someone go down line by line and explain what is happening? I input what I think is happening (see comments) but need clarification.
.file "testcalc.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "x=%d, y=%d, z=%d, result=%d\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx // establish stack frame
andl $-16, %esp // decrement %esp by 16, align stack
pushl -4(%ecx) // push original stack pointer
pushl %ebp // save base pointer
movl %esp, %ebp // establish stack frame
pushl %ecx // save to ecx
subl $36, %esp // alloc 36 bytes for local vars
movl $11, 8(%esp) // store 11 in z
movl $6, 4(%esp) // store 6 in y
movl $2, (%esp) // store 2 in x
call calc // function call to calc
movl %eax, 20(%esp) // %esp + 20 into %eax
movl $11, 16(%esp) // WHAT
movl $6, 12(%esp) // WHAT
movl $2, 8(%esp) // WHAT
movl $.LC0, 4(%esp) // WHAT?!?!
movl $1, (%esp) // move result into address of %esp
call __printf_chk // call printf function
addl $36, %esp // WHAT?
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Original code:
#include <stdio.h>
int calc(int x, int y, int z);
int main()
{
int x = 2;
int y = 6;
int z = 11;
int result;
result = calc(x,y,z);
printf("x=%d, y=%d, z=%d, result=%d\n",x,y,z,result);
}

You didn't show the compilation command, which would have been useful, but it seems that you have optimizations enabled, so there is actually no space for local variables; they are optimized out:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
All of the code above sets up the stack frame. Because this is main, it differs a bit from a standard stack frame: it ensures the alignment of the stack with andl $-16, %esp, just in case.
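The effect of andl $-16, %esp can be sketched in C. This is only an illustration: align_down16 is a hypothetical helper name, and any address fed to it is made up.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors `andl $-16, %esp`.
 * -16 as a 32-bit value is 0xFFFFFFF0, so the AND clears the low four
 * bits and rounds the address down to a multiple of 16. */
static uint32_t align_down16(uint32_t sp)
{
    return sp & (uint32_t)-16;
}
```

For a made-up stack pointer of 0xbffff3ac this gives 0xbffff3a0; an address that is already a multiple of 16 comes back unchanged.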
pushl %ecx
It saves the original value of esp before the alignment correction, to restore it at the end.
subl $36, %esp
It allocates 36 bytes of stack space, not for local variables but for calling parameters.
movl $11, 8(%esp)
movl $6, 4(%esp)
movl $2, (%esp)
It stores the arguments for calc, written from right to left: the constants 11, 6 and 2 end up at 8(%esp), 4(%esp) and (%esp), so the call is calc(2, 6, 11).
call calc // function call to calc
It calls function calc with the arguments pointed to by esp.
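As a rough mental model, the reserved area can be pictured as an array of 4-byte slots indexed from %esp. The sketch below assumes that picture; AREA_WORDS and store_calc_args are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 36-byte area reserved by `subl $36, %esp`.
 * Index 0 corresponds to (%esp), index 1 to 4(%esp), and so on. */
enum { AREA_WORDS = 36 / 4 };

/* Mirrors the three movl instructions that precede `call calc`. */
static void store_calc_args(uint32_t area[AREA_WORDS])
{
    memset(area, 0, AREA_WORDS * sizeof area[0]);
    area[8 / 4] = 11;  /* movl $11, 8(%esp) -> third argument, z  */
    area[4 / 4] = 6;   /* movl $6, 4(%esp)  -> second argument, y */
    area[0 / 4] = 2;   /* movl $2, (%esp)   -> first argument, x  */
}
```

The first argument sits at the lowest address, which is where the callee expects to find it once the return address has been pushed on top.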
movl %eax, 20(%esp)
movl $11, 16(%esp)
movl $6, 12(%esp)
movl $2, 8(%esp)
movl $.LC0, 4(%esp)
movl $1, (%esp)
These are the arguments for __printf_chk, again written from right to left. The call is effectively __printf_chk(1, .LC0, 2, 6, 11, %eax), where %eax holds the return value of calc() (remember, no local variables!) and .LC0 is the address of the string literal. Look at these lines at the top of the assembly:
.LC0:
.string "x=%d, y=%d, z=%d, result=%d\n"
But what about that mysterious 1? On Ubuntu the default compilation options (-D_FORTIFY_SOURCE) turn printf into an inline function that forwards to __printf_chk(1, ...), or something like that, which does extra checks on the arguments.
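To make the forwarding idea concrete, here is a minimal sketch. It does not reproduce glibc's real headers or checks: my_printf_chk and checked_printf are hypothetical names, and the flag argument is simply ignored.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the checking function; the real
 * __printf_chk performs extra validation that is not modeled here. */
static int my_printf_chk(int flag, const char *fmt, ...)
{
    va_list ap;
    int n;

    (void)flag;            /* the leading 1 from the listing; unused in this sketch */
    va_start(ap, fmt);
    n = vprintf(fmt, ap);  /* forward to the ordinary formatter */
    va_end(ap);
    return n;
}

/* Hypothetical macro showing how a printf call could be rewritten so
 * that an extra first argument appears in the generated code. */
#define checked_printf(...) my_printf_chk(1, __VA_ARGS__)
```

With this in place, checked_printf("x=%d, y=%d\n", 2, 6) expands to a call with 1 as the first argument, which is why an extra value shows up at (%esp) in the listing.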
call __printf_chk
This is the call to the printf substitute function.
addl $36, %esp
This removes the 36 bytes added to the stack with subl $36, %esp.
popl %ecx
This pops the saved value back into ecx: the original, possibly unaligned stack pointer plus 4.
popl %ebp
leal -4(%ecx), %esp
This restores the previous stack frame.
ret
And this returns without a value, because you didn't write a return for main.

Related

stackframe doesn't get eliminated from the stack?

I wrote a simple C program that prints its input to standard output. Then I converted it to assembly language. By the way, I am using AT&T syntax.
This is the simple C code.
#include <stdio.h>
int main()
{
int c;
while ((c = getchar ()) != EOF)
{
putchar(c);
}
return 0;
}
int c is a local variable.
Then I converted it to assembly language.
.file "question_1.c"
.text
.globl main
.type main, @function
//prolog
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp // we add 20 bytes to the stack
jmp .L2
.L3:
subl $12, %esp
pushl -12(%ebp)
call putchar
addl $16, %esp
.L2:
call getchar
movl %eax, -12(%ebp)
cmpl $-1, -12(%ebp)
jne .L3
//assumption this is the epilog
movl $0, %eax
movl -4(%ebp), %ecx
leave
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.9.4-2ubuntu1) 4.9.4"
.section .note.GNU-stack,"",@progbits
Normally in the epilog we are supposed to addl 20, because in the prolog we subl 20.
So is the stack frame still there?
Or am I missing a crucial point?
I also have a question regarding the main function. Functions are normally "called", but where does that happen in the assembly code?
Thank you in advance.
Just after the main label, leal 4(%esp), %ecx saves four plus the stack pointer in %ecx. At the end of the routine, leal -4(%ecx), %esp writes four less than the saved value to the stack pointer. This directly restores the original value, instead of doing it by adding the amount that was subtracted.
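The bookkeeping can be traced with plain integer arithmetic. The following is a sketch only: run_prologue_epilogue is a hypothetical name, `extra` stands for whatever the body subtracts from %esp, and the reload of %ecx from its saved slot is modeled by simply keeping the variable.

```c
#include <stdint.h>

/* Hypothetical model of main's prologue and epilogue.
 * Returns the value of %esp just before `ret`. */
static uint32_t run_prologue_epilogue(uint32_t esp, uint32_t extra)
{
    uint32_t ecx = esp + 4;        /* leal 4(%esp), %ecx            */

    esp &= (uint32_t)-16;          /* andl $-16, %esp               */
    esp -= 4;                      /* pushl -4(%ecx)                */
    esp -= 4;                      /* pushl %ebp                    */
    esp -= 4;                      /* pushl %ecx (saved copy)       */
    esp -= extra;                  /* subl $20, %esp, for example   */

    /* Epilogue: %ecx is reloaded from its saved slot and then used
     * directly, so the current value of esp no longer matters. */
    esp = ecx - 4;                 /* leal -4(%ecx), %esp           */
    return esp;
}
```

Whatever values are passed for the starting stack pointer and for `extra`, the result equals the starting stack pointer, so no matching addl is needed.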

Why does GCC produce the following asm output?

I don't understand why gcc -S -m32 produces these particular lines of code:
movl %eax, 28(%esp)
movl $desc, 4(%esp)
movl 28(%esp), %eax
movl %eax, (%esp)
call sort_gen_asm
My question is: why is %eax pushed and then popped? And why is movl used instead of pushl and popl respectively? Is it faster? Is there some coding convention I don't yet know? I've just started looking at asm output closely, so I don't know much.
The C code:
void print_array(int *data, size_t sz);
void sort_gen_asm(array_t*, comparer_t);
int main(int argc, char *argv[]) {
FILE *file;
array_t *array;
file = fopen("test", "rb");
if (file == NULL) {
err(EXIT_FAILURE, NULL);
}
array = array_get(file);
sort_gen_asm(array, desc);
print_array(array->data, array->sz);
array_destroy(array);
fclose(file);
return 0;
}
It gives this output:
.file "main.c"
.section .rodata
.LC0:
.string "rb"
.LC1:
.string "test"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
movl $.LC0, 4(%esp)
movl $.LC1, (%esp)
call fopen
movl %eax, 24(%esp)
cmpl $0, 24(%esp)
jne .L2
movl $0, 4(%esp)
movl $1, (%esp)
call err
.L2:
movl 24(%esp), %eax
movl %eax, (%esp)
call array_get
movl %eax, 28(%esp)
movl $desc, 4(%esp)
movl 28(%esp), %eax
movl %eax, (%esp)
call sort_gen_asm
movl 28(%esp), %eax
movl 4(%eax), %edx
movl 28(%esp), %eax
movl (%eax), %eax
movl %edx, 4(%esp)
movl %eax, (%esp)
call print_array
movl 28(%esp), %eax
movl %eax, (%esp)
call array_destroy
movl 24(%esp), %eax
movl %eax, (%esp)
call fclose
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu8) 4.8.1"
.section .note.GNU-stack,"",@progbits
The save / load of eax is because you did not compile with optimizations. So any read/write of a variable will emit a read/write of a memory address.
Actually, for (almost) any line of code you will be able to identify the exact piece of assembler code resulting from it (let me advise you to compile with gcc -g -c -O0 and then objdump -S file.o):
#array = array_get(file);
call array_get
movl %eax, 28(%esp) #write array
#sort_gen_asm(array, desc);
movl 28(%esp), %eax #read array
movl %eax, (%esp)
...
About not pushing/popping: it is a standard zero-cost optimization. Instead of using push/pop every time you want to call a function, you subtract the maximum space needed from esp once at the beginning of the function and then store the function arguments at the bottom of that empty space. There are several advantages: faster code (esp does not change), the arguments do not have to be computed in any particular order, and esp has to be decremented anyway to make room for the local variables.
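The "maximum needed space" part can be sketched as a small calculation. This is an illustration under the assumption that every argument is 4 bytes wide; outgoing_area_bytes is a hypothetical helper, not something the compiler exposes.

```c
#include <stddef.h>

/* Hypothetical helper: given the number of 4-byte arguments of each
 * call made by a function, return the size of the outgoing-argument
 * area that has to be reserved once in the prologue. */
static size_t outgoing_area_bytes(const int *args_per_call, size_t ncalls)
{
    size_t max_args = 0;

    for (size_t i = 0; i < ncalls; i++) {
        if (args_per_call[i] > 0 && (size_t)args_per_call[i] > max_args)
            max_args = (size_t)args_per_call[i];
    }
    return max_args * 4;  /* every argument in this example is 4 bytes wide */
}
```

For the main in the question, the calls take 2, 2, 1, 2, 2, 1 and 1 arguments, so 8 bytes at (%esp) and 4(%esp) are enough for all of them. The rest of the 32 bytes holds the locals file and array at 24(%esp) and 28(%esp), plus padding.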
Some things have to do with calling conventions. Others with optimisations.
sort_gen_asm seems to use the cdecl calling convention, which requires its arguments to be pushed onto the stack in reverse order. Thus:
movl $desc, 4(%esp)
movl %eax, (%esp)
The other moves are partially unoptimised compiler routines:
movl %eax, 28(%esp) # save contents of %eax on the stack before calling
movl 28(%esp), %eax # retrieve saved 28(%esp) in order to prepare it as an argument
# Unoptimised compiler seems to have forgotten that it's
# still in the register

assembly code of the c function

I'm trying to understand the assembly code of the C function. I could not understand why andl -16 is done in main. Is it for allocating space for the local variables? If so, why is subl 32 done in main?
I could not understand the disassembly of func1. As I have read, the stack grows from higher addresses to lower addresses on 8086 processors. So why are the parameters accessed on the positive side of ebp and not on the negative side? The local variables inside func1 are 3 + return address + saved registers, so it should be 20, but why is it 24? (subl $24, %esp)
#include<stdio.h>
int add(int a, int b){
int res = 0;
res = a + b;
return res;
}
int func1(int a){
int s1,s2,s3;
s1 = add(a,a);
s2 = add(s1,a);
s3 = add(s1,s2);
return s3;
}
int main(){
int a,b;
a = 1;b = 2;
b = func1(a);
printf("\n a : %d b : %d \n",a,b);
return 0;
}
assembly code :
.file "sample.c"
.text
.globl add
.type add, @function
add:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $0, -4(%ebp)
movl 12(%ebp), %eax
movl 8(%ebp), %edx
leal (%edx,%eax), %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
leave
ret
.size add, .-add
.globl func1
.type func1, @function
func1:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %eax
movl %eax, 4(%esp)
movl 8(%ebp), %eax
movl %eax, (%esp)
call add
movl %eax, -4(%ebp)
movl 8(%ebp), %eax
movl %eax, 4(%esp)
movl -4(%ebp), %eax
movl %eax, (%esp)
call add
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl %eax, 4(%esp)
movl -4(%ebp), %eax
movl %eax, (%esp)
call add
movl %eax, -12(%ebp)
movl -12(%ebp), %eax
leave
ret
.size func1, .-func1
.section .rodata
.LC0:
.string "\n a : %d b : %d \n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $1, 28(%esp)
movl $2, 24(%esp)
movl 28(%esp), %eax
movl %eax, (%esp)
call func1
movl %eax, 24(%esp)
movl $.LC0, %eax
movl 24(%esp), %edx
movl %edx, 8(%esp)
movl 28(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5"
.section .note.GNU-stack,"",@progbits
The andl $-16, %esp aligns the stack pointer to a multiple of 16 bytes, by clearing the low four bits.
The only places where positive offsets are used with (%ebp) are parameter accesses.
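The reason the parameters sit at positive offsets is that the caller stored them before the call, at higher addresses than the saved %ebp. A minimal sketch of the layout, assuming 4-byte parameters and the usual pushl %ebp / movl %esp, %ebp prologue (param_offset is a hypothetical helper):

```c
/* Hypothetical helper: offset from %ebp of the index-th (0-based)
 * 4-byte parameter in a frame set up by pushl %ebp / movl %esp, %ebp.
 *   0(%ebp)  saved %ebp of the caller
 *   4(%ebp)  return address
 *   8(%ebp)  first parameter, 12(%ebp) second parameter, ... */
static int param_offset(int index)
{
    return 8 + 4 * index;
}
```

This matches add in the listing, which reads a from 8(%ebp) and b from 12(%ebp), while its local res lives below the frame pointer at -4(%ebp).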
You did not state what your target platform is or what switches you used to compile with. The assembly code shows some Ubuntu identifier has been inserted, but I am not familiar with the ABI it uses, beyond that it is probably similar to ABIs generally used with the Intel x86 architecture. So I am going to guess that the ABI requires 8-byte alignment at routine calls, and so the compiler makes the stack frame of func1 24 bytes instead of 20 so that 8-byte alignment is maintained.
I will further guess that the compiler aligned the stack to 16 bytes at the start of main as a sort of “preference” in the compiler, in case it uses SSE instructions that prefer 16-byte alignment, or other operations that prefer 16-byte alignment.
So, we have:
In main, the andl $-16, %esp aligns the stack to a multiple of 16 bytes as a compiler preference. Inside main, 28(%esp) and 24(%esp) refer to temporary values the compiler saves on the stack, while 8(%esp), 4(%esp), and (%esp) are used to pass parameters to func1 and printf. We see from the fact that the assembly code calls printf but it is commented out in your code that you have pasted C source code that is different from the C source code used to generate the assembly code: This is not the correct assembly code generated from the C source code.
In func1, 24 bytes are allocated on the stack instead of 20 to maintain 8-byte alignment. Inside func1, the parameter a is accessed through 8(%ebp); 4(%ebp) holds the return address. Locations from -12(%ebp) to -4(%ebp) hold the values of your variables. 4(%esp) and (%esp) are used to pass parameters to add.
Here is the stack frame of func1:
-4(%ebp) = 20(%esp): s1.
-8(%ebp) = 16(%esp): s2.
-12(%ebp) = 12(%esp): s3.
-16(%ebp) = 8(%esp): Unused padding.
-20(%ebp) = 4(%esp): Passes second parameter of add.
-24(%ebp) = 0(%esp): Passes first parameter of add.
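The table can be checked by hand with two small calculations. The sketch below takes the 8-byte alignment guess from this answer as an assumption; round_up and esp_offset are hypothetical helper names.

```c
/* Hypothetical helpers for checking the frame table by hand. */

/* Round n up to the next multiple of align (align must be positive). */
static int round_up(int n, int align)
{
    return (n + align - 1) / align * align;
}

/* Convert an %ebp-relative offset into an %esp-relative one, given that
 * %esp = %ebp - frame_size after the prologue. */
static int esp_offset(int ebp_offset, int frame_size)
{
    return ebp_offset + frame_size;
}
```

Three locals plus two outgoing argument slots need 20 bytes, and round_up(20, 8) gives the 24 seen in subl $24, %esp. With a frame size of 24, esp_offset(-4, 24) is 20 and esp_offset(-24, 24) is 0, which agrees with the first and last rows of the table.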
I would suggest working through this with the output of objdump -S which will give you interlisting with the C source.

i have a question about stack frame structure?

I'm new to assembly. I wrote a C file:
#include <stdlib.h>
int max( int c )
{
int d;
d = c + 1;
return d;
}
int main( void )
{
int a = 0;
int b;
b = max( a );
return 0;
}
and I used gcc -S as01.c to create an assembly file.
.file "as01.c"
.text
.globl max
.type max, @function
max:
pushl %ebp
movl %esp, %ebp
subl $32, %esp
movl $0, -4(%ebp)
movl $1, -24(%ebp)
movl $2, -20(%ebp)
movl $3, -16(%ebp)
movl $4, -12(%ebp)
movl $6, -8(%ebp)
movl 8(%ebp), %eax
addl $1, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
leave
ret
.size max, .-max
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $20, %esp
movl $0, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, (%esp)
call max
movl %eax, -8(%ebp)
I'm confused because of movl %eax, -4(%ebp) followed by movl -4(%ebp), %eax in max().
I know that %eax is used for returning the value of any function.
I think %eax is a temporary register for storing c + 1.
Is this right?
Thank you for your answers.
You don't have optimisation turned on, so the compiler is generating really bad code. The primary storage for all your values is in the stack frame, and values are loaded into registers only long enough to do the calculations.
The code actually breaks down into:
pushl %ebp
movl %esp, %ebp
subl $32, %esp
Standard function prologue, setting up a new stack frame and reserving 32 bytes for it.
movl $0, -4(%ebp)
movl $1, -24(%ebp)
movl $2, -20(%ebp)
movl $3, -16(%ebp)
movl $4, -12(%ebp)
movl $6, -8(%ebp)
Fill the stack frame with dummy values (presumably as a debugging aid).
movl 8(%ebp), %eax
addl $1, %eax
movl %eax, -4(%ebp)
Read the parameter c out of the stack frame, add one to it, store it into a (different) stack slot.
movl -4(%ebp), %eax
leave
ret
Read the value back out of the stack slot and return it.
If you compile this with optimisation, you'll see most of the code vanish. If you use -fomit-frame-pointer -Os, you should end up with this:
max:
movl 4(%esp), %eax
incl %eax
ret
movl %eax, -4(%ebp)
Here the value computed for d (now stored in eax) is saved in d's memory cell.
movl -4(%ebp), %eax
While here the return value (d's) gets loaded into eax, because, as you know, eax holds functions' return value.
As @David said, you're compiling without optimization, so gcc generates easy-to-debug code, which is quite inefficient and repetitive sometimes.
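One way to see the same store-then-reload pattern from the C side is to make d volatile, which obliges the compiler to really write and re-read the variable. This is a sketch for illustration only; max_like_unoptimized is a hypothetical name and the asm in the comments refers to the unoptimized listing above, not to guaranteed compiler output.

```c
/* Sketch: `volatile` forces the store to d and the reload from d to
 * happen, much like the movl pair in the unoptimized listing. */
static int max_like_unoptimized(int c)
{
    volatile int d;

    d = c + 1;   /* compare: movl %eax, -4(%ebp), store the result into d */
    return d;    /* compare: movl -4(%ebp), %eax, reload d as return value */
}
```

Without volatile and with optimization enabled, the compiler is free to keep c + 1 in %eax the whole time, which is what the three-instruction version shows.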

C - what is the return value of a semicolon?

I'm just curious about the following example:
#include<stdio.h>
int test();
int test(){
// int a = 5;
// int b = a+1;
return ;
}
int main(){
printf("%u\n",test());
return 0;
}
I compiled it with 'gcc -Wall -o semicolon semicolon.c' to create an executable
and with 'gcc -Wall -S semicolon.c' to get the assembler code, which is:
.file "semicolon.c"
.text
.globl test
.type test, @function
test:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
leave
ret
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
call test
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Since I'm not much of an assembler pro, I only know that printf prints what is in eax,
but I don't fully understand what 'movl %eax, 4(%esp)' means, which I assume fills eax before calling test.
But what is the value then? What does 4(%esp) mean, and what does the value of esp mean?
If I uncomment the lines in test(), printf prints 6, which is written in eax ^^
Your assembly language annotated:
test:
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
subl $4, %esp # Allocate some local space on the stack.
leave # Restore the old frame pointer/stack
ret
Note that nothing in test touches eax.
.size test, .-test
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx # Point past the return address.
andl $-16, %esp # Align the stack.
pushl -4(%ecx) # Push the return address.
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
pushl %ecx # save the old top of stack.
subl $20, %esp # Allocate some local space (for printf parameters and ?).
call test # Call test.
Note that at this point, nothing has modified eax. Whatever came into main is still here.
movl %eax, 4(%esp) # Save eax as a printf argument.
movl $.LC0, (%esp) # Send the format string.
call printf # Duh.
movl $0, %eax # Return zero from main.
addl $20, %esp # Deallocate local space.
popl %ecx # Restore the old top of stack.
popl %ebp # And the old frame pointer.
leal -4(%ecx), %esp # Fix the stack pointer,
ret
So, what gets printed out is whatever came in to main. As others have pointed out it is undefined: It depends on what the startup code (or the OS) has done to eax previously.
The semicolon has no return value, what you have there is an "empty return", like the one used to return from void functions - so the function doesn't return anything.
This actually causes a warning when compiling:
warning: `return' with no value, in function returning non-void
And I don't see anything placed in eax before calling test.
About 4(%esp): this refers to the memory at the address in the stack pointer (esp) plus 4, i.e. the second word from the top of the stack.
The return value of an int function is passed in the EAX register. The test function does not set the EAX register because no return value is given. The result is therefore undefined.
A semicolon indeed has no value.
I think the correct answer is that a return <nothing> for an int function is an error, or at least has undefined behavior. That's why compiling this with -Wall yields
semi.c: In function ‘test’:
semi.c:6: warning: ‘return’ with no value, in function returning non-void
As for what 4(%esp) holds... it's a location on the stack where nothing was (intentionally) stored, so it will likely return whatever junk is found at that location. This could be the last expression evaluated to variables in the function (as in your example) or something completely different. This is what "undefined" is all about. :)
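For comparison, here is a sketch of a well-defined variant. The names test_fixed and print_test_result are hypothetical, chosen to avoid clashing with the original test(); the body is the commented-out code from the question with an explicit return value.

```c
#include <stdio.h>

/* Corrected variant of test(): it returns a value explicitly, so the
 * caller no longer reads whatever happened to be left in eax. */
static int test_fixed(void)
{
    int a = 5;
    int b = a + 1;
    return b;
}

/* Prints the result with %d, which matches the int return type,
 * and returns the number of characters written. */
static int print_test_result(void)
{
    return printf("%d\n", test_fixed());
}
```

Calling print_test_result() prints 6, this time because the function actually returns it and not by accident.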
