Let's say I have this simple program in C.
int my_func(int a, int b, int c) //0x4000
{
int d = 0;
int e = 0;
return e+d;
}
int main()
{
my_func(1,2,3); // 0x5000
return 0;
}
Ignore the fact that this is essentially all dead code that could be optimized away completely. Let's say that my_func() lives at address 0x4000 and is called at address 0x5000.
From my understanding, a C compiler (I understand they can operate differently by vendor) may:
push c to the stack
push b to the stack
push a to the stack
push 0x5000 to the stack (return address)
call 0x4000
Then I'm assuming that to access a it uses sp (stack pointer) + 1, b is sp+2, and c is sp+3.
Since d and e are on the stack, I'm guessing our stack would now look like this?
c
b
a
0x5000
d
e
When we get to the end of the function, does it then pop e and d off the stack?
Then... push e+d? Or save it to a register to be used after return?
Return to 0x5000 because it's the top of the stack?
Then pop the return address (0x5000) and a, b and c?
I'm guessing this is why old C required all variables to be declared at the top of a function, so that the compiler could count the number of pops it needed to perform at the end of the function?
I understand that it could have stored 0x5000 in a register, but a C program is able to go multiple levels deep into many functions and there are only so many registers...
Thanks!
In the default calling convention for C, the caller frees the function arguments after the function returns, but the function itself manages its own local variables on the stack. For example, here is your code in assembly without any optimization:
my_func:
push ebp // +
mov ebp, esp // These 2 lines prepare function stack
sub esp, 16 // reserve memory for local variables
mov DWORD PTR [ebp-4], 0
mov DWORD PTR [ebp-8], 0
mov edx, DWORD PTR [ebp-8]
mov eax, DWORD PTR [ebp-4]
add eax, edx // <--return value in eax
leave // return esp to what it was at start of function
ret // return to caller
main:
push ebp
mov ebp, esp
push 3
push 2
push 1
call my_func
add esp, 12 // <- return esp to what it was before pushing arguments
mov eax, 0
leave
ret
As you can see, there is an add esp, 12 in main that restores esp to the value it had before the arguments were pushed. In my_func there is a pair like this:
push ebp
mov ebp, esp
sub esp, 16 // <--- size of stack
...
leave
ret
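This prologue/epilogue pair reserves some memory on the stack for the function's local variables. leave reverses the effect of push ebp / mov ebp, esp (it is equivalent to mov esp, ebp followed by pop ebp), which also releases the space reserved by sub esp, 16. The function uses ebp to access its arguments ([ebp+8], [ebp+12], ...) and its stack-allocated variables ([ebp-4], [ebp-8], ...). An integer or pointer return value is always returned in eax.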
A quick allocated stack size note:
As you can see, the function has a sub esp, 16 instruction even though it only keeps 2 int variables on the stack, for a total of 8 bytes. This is because the stack size is aligned to a specific boundary (at least with the default compile options). If you add 2 more int variables to my_func, the instruction is still sub esp, 16, because the 16 bytes of locals still fit in one 16-byte block. But if you add a 3rd extra int variable (5 in total), it becomes sub esp, 32. This alignment can be configured with GCC's -mpreferred-stack-boundary option.
By the way, all of this applies to 32-bit compilation. In contrast, 64-bit code normally passes arguments in registers rather than pushing them on the stack. As mentioned in a comment, in 64-bit code arguments go on the stack only from the 5th argument onward on the Microsoft x64 calling convention (and from the 7th integer argument onward on the System V AMD64 ABI used by Linux and macOS).
Update:
By "default calling convention" I mean cdecl, which is normally used when you compile code for x86 without any compiler options or specific function attributes. If you change the function's calling convention to stdcall, for example, this changes: the callee removes its own arguments (ret 12 for my_func) and the caller no longer does add esp, 12.
I'm pretty new to assembly and am trying my best to learn it. I'm taking a course, and they showed a very basic Hello World example, which I disassembled.
Original C file:
#include <stdio.h>
int main()
{
printf("Hello Students!\n");
return 0;
}
This was disassembled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
Disassembly:
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
I'm having trouble mapping this disassembly back to the original C file. Can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It aligns the stack pointer (ESP) down to a 16-byte boundary, then reserves another 16 bytes of space for outgoing function args.
call 401e80 <__main>
This call to __main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.
That's the end of the preamble; the part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" to the top of the stack. Combined with the earlier sub esp, 0x10, this is like a push instruction. In this 32-bit calling convention, function args are passed on the stack, not in registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
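This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call. GCC does this when the format string contains no % conversions and ends in a newline, since puts appends the newline itself.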
mov eax, 0x0
The return value of main (0) is loaded into the eax register.
leave
ret
leave tears down the stack frame (it copies EBP back into ESP) and restores the caller's EBP value; ret then exits the function. ret pops a return address off the stack, which only works when ESP is pointing at the return address.
Assume this code:
#include <stdio.h>

int add(int a, int b){
int c = a+b;
return c;
}
int main(){
printf("%d\n", add(3,4));
return 0;
}
The following is usually how this is implemented in assembly:
- push 4 to stack
- push 3 to stack
- push the return address, which is the address of the next instruction (the printf() call), to stack
- call add
- do addition and push c on the stack
- pop c from stack ??
- return to main
So what happens to the return value? It can't be in add's frame, since that frame is cleared at the end. Does it get put onto main's stack?
Let's assume the values are pushed to the stack and not passed in registers.
It depends on the architecture and calling convention. In x86-32, just about every calling convention returns the value in eax, or in edx:eax for 64-bit results. So your add function might have the instructions:
mov eax, dword ptr [esp+4] ; put 1st arg in eax
add eax, dword ptr [esp+8] ; add eax with 2nd arg
ret ; return
No extra work is needed since the return value is already supposed to be in eax.
That said you aren't going to find a "general case" answer for this unless you are asking about a specific architecture, and even then, there can be multiple different calling conventions on it.
I am trying to understand some basic assembly code concepts and am getting stuck on how the assembly code determines where to place things on the stack and how much space to give it.
To start playing around with it, I entered this simple code in godbolt.org's compiler explorer.
int main(int argc, char** argv) {
int num = 1;
num++;
return num;
}
and got this assembly code
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
mov DWORD PTR [rbp-4], 1
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
So a couple questions here:
Shouldn't the parameters have been placed on the stack before the call? Why are argc and argv placed at offset 20 and 32 from the base pointer of the current stack frame? That seems really far down to put them if we only need room for the one local variable num. Is there a reason for all of this extra space?
The local variable is stored at 4 below the base pointer. So if we were visualizing this in the stack and say the base pointer currently pointed at 0x00004000 (just making this up for an example, not sure if that's realistic), then we place the value at 0x00003FFC, right? And an integer is size 4 bytes, so does it take up the memory space from 0x00003FFC downward to 0x00003FF8, or does it take up the memory space from 0x00004000 to 0x00003FFC?
It looks like stack pointer was never moved down to allow room for this local variable. Shouldn't we have done something like sub rsp, 4 to make room for the local int?
And then if I modify this to add more locals to it:
int main(int argc, char** argv) {
int num = 1;
char *str1 = {0};
char *str2 = "some string";
num++;
return num;
}
Then we get
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-36], edi
mov QWORD PTR [rbp-48], rsi
mov DWORD PTR [rbp-4], 1
mov QWORD PTR [rbp-16], 0
mov QWORD PTR [rbp-24], OFFSET FLAT:.LC0
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
So now the main arguments got pushed even further down from the base pointer. Why is the space between the first two locals 12 bytes, but the space between the second two locals 8 bytes? Is that because of the sizes of the types?
I'm only going to answer this part of the question:
Shouldn't the parameters have been placed on the stack before the call? Why are argc and argv placed at offset 20 and 32 from the base pointer of the current stack frame?
The parameters to main are indeed set up by the code that calls main.
This appears to be code compiled according to the 64-bit ELF psABI for x86, in which the first several parameters to any function are passed in registers, not on the stack. When control reaches the main: label, argc will be in edi, argv will be in rsi, and a third argument conventionally called envp will be in rdx. (You didn't declare that argument, so you can't use it, but the code that calls main is generic and always sets it up.)
The instructions I believe you are referring to
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
are what compiler nerds call spill instructions: they are copying the initial values of the argc and argv parameters from their original registers to the stack, just in case those registers are needed for something else. As several other people pointed out, this is unoptimized code; these instructions are unnecessary and would not have been emitted if you had turned optimization on. Of course, if you'd turned optimization on you'd have gotten code that doesn't touch the stack at all:
main:
mov eax, 2
ret
In this ABI, the compiler is allowed to put the "spill slots," to which register values are saved, wherever it wants within the stack frame. Their locations do not have to make sense, and may vary from compiler to compiler, from patchlevel to patchlevel of the same compiler, or with apparently-unconnected changes to the source code.
(Some ABIs do specify stack frame layout in some detail, e.g. IIRC the 32-bit Windows ABI does this, to facilitate "unwinding", but that's not important right now.)
(To underline that the arguments to main are in registers, this is the assembly I get at -O1 from
int main(int argc) { return argc + 1; }
:
main:
lea eax, [rdi+1]
ret
Still doesn't do anything with the stack! (Besides ret.))
This is "compiler 101" and what you want to research is "calling convention" and "stack frame". The details are compiler/OS/optimizations dependent. Briefly, incoming parameters may be in registers or on stack. When a function is entered, it may create a stack frame to save some of the registers. And then it may define a "frame pointer" to reference stack locals and stack parameters off the frame pointer. Sometimes the stack pointer is used as a frame pointer as well.
As for registers, whoever defines the calling convention (usually the OS or platform vendor) specifies which registers are "volatile", meaning a routine can use them freely without saving them, and which are "preserved", meaning that a routine that uses them must save them on entry and restore them before returning. The calling convention also specifies which registers (if any) are used for parameter passing and function return.
I am currently trying to understand Writing buffer overflow exploits - a tutorial for beginners.
The C code, compiled with cc -ggdb exploitable.c -o exploitable
#include <stdio.h>
void exploitableFunction (void) {
char small[30];
gets (small);
printf("%s\n", small);
}
int main(void) {
exploitableFunction();
return 0;
}
seems to have the assembly code
0x000000000040063b <+0>: push %rbp
0x000000000040063c <+1>: mov %rsp,%rbp
0x000000000040063f <+4>: callq 0x4005f6 <exploitableFunction>
0x0000000000400644 <+9>: mov $0x0,%eax
0x0000000000400649 <+14>: pop %rbp
0x000000000040064a <+15>: retq
I think it does the following, but I'm really not sure about it and I would like to hear from somebody who is experienced with assembly code if I'm right / what is right.
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
400644: Copy the value from the address $0x0 to the EAX register
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
You want to save the base pointer because it is probably used by the calling function.
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
This gives you a fixed position into the stack, which might contain parameters for the function. It can also be used as a base address for any local variables.
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
"call" means pushing the return address (address of the next instruction) onto the stack, and then jumping to the start of the called function.
400644: Copy the value from the address $0x0 to the EAX register
It is actually the value 0 from the return statement.
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
This restores the base pointer we saved at the top. The calling function may rely on rbp being unchanged, because rbp is a callee-saved register.
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
It was the constant from return 0. Using EAX for a small return value is a common convention.
I found a link that has code similar to yours, with a full explanation.
40063b: Push the old base pointer onto the stack to save it for later. It's saved because main is not the only function in the program: main itself was called by other code, which expects rbp to be preserved.
40063c: copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main’s stack frame.
40063f: Call the function at address 0x4005f6. This pushes the address of the next instruction onto the stack and loads 0x4005f6 into the program counter; when the function returns, the saved address (0x400644 here) is popped from the stack back into the program counter.
400644: This instruction copies 0 into %eax. The x86 calling conventions store a function's integer return value in %eax.
400649: We pop the old base pointer off the stack and store it back in %rbp
40064a: Jump back to the return address, which is also stored on the stack. Because this is main, it returns into the C runtime startup code, which ends the program.
Also, you didn't include the assembly code for exploitableFunction; what you posted is only main.
The function entry saves ebp and moves esp into ebp. All parameters of the function are then addressed relative to ebp. This is the standard cdecl convention (in 32-bit Intel syntax):
; int example(char *s, int i)
push ebp                      ; save the caller's value of ebp
mov ebp, esp                  ; set up our base pointer to the stack frame
sub esp, 16                   ; room for automatic variables
mov eax, dword ptr [ebp+8]    ; eax has s ([ebp+4] is the return address)
mov ecx, dword ptr [ebp+12]   ; ecx has i (ecx need not be preserved; ebx would)
...                           ; do your thing
mov eax, dword ptr [result]   ; function return value in eax
mov esp, ebp                  ; free the automatic variables
pop ebp                       ; restore caller's base pointer
ret
When calling this function, the compiler pushes the parameters onto the stack and then calls the function. Upon return, it cleans up the stack:
; i = example(myString, k);
mov eax, [ebp+16]       ; this gets a parameter of the current function (k)
push eax                ; this will be parameter i
mov eax, [ebp-16]       ; this gets a local variable (myString)
push eax                ; this is parameter s
call example
add esp, 8              ; remove the pushed parameters from the stack
mov dword ptr [i], eax  ; save return value - always in eax
Different compilers can use different conventions about passing parameters in registers, but I think the above is the basics of calls in C (using cdecl).
I made a simple program that just pushes a number and displays it on the screen, but I don't know what is going wrong.
section .data
value db 10
section .text
global main
extern printf
main:
push 10 ; can we push a value directly on the stack?
call printf
add esp,4
ret
I get a segmentation fault for the above.
section .data
value db 10
section .text
global main
extern printf
main:
push [value]
call printf
add esp,4
ret
In the second version I'm pushing the value pointed to by the value variable onto the stack,
but I get "operation size not specified".
Yes, you can push any DWORD value (in 32-bit assembler) onto the stack.
The problem in the first code fragment is that printf expects the first argument to be a format string (in C, you'd write printf("%d\n", 10);). So something like
section .data
fmt db "%d", 10, 0
...
push 10
push fmt
call printf
add esp, 8
will work.
In the second code fragment, instead of push [value] you should write push dword [value], but that's not correct if your value variable is a single byte: it would push the byte at value together with the 3 bytes that follow it. Either declare it as a DWORD (dd), or perform
movsx eax, byte [value] ; if it's a signed integer; movzx for unsigned
push eax
And one more thing. When calling printf (or any of the C library functions), beware of stack alignment. Some platforms require that the stack be 16-byte aligned at the time of a function call (this is necessary for correct execution of optimized CPU instructions like SSE). So, to make the stack aligned:
push ebp
mov ebp, esp
sub esp, 8 ; reserve 8 bytes for parameters
and esp, -16 ; align the stack (the reserved space can increase)
mov dword [esp], fmt ; put parameters into stack
mov dword [esp+4], 10
call printf
mov esp, ebp ; restore stack
pop ebp
xor eax, eax ; return 0 from main
ret