My question is related to Stack allocation, padding, and alignment. Consider the following function:
void func(int a,int b)
{
char buffer[5];
}
At assembly level, the function looks like this:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
I want to know how the 24 bytes on the stack are allocated. I understand that 16 bytes are allocated for the char buffer[5]. I don't understand what the extra 8 bytes are for and how they are allocated. The top answer in the above link says that it is for ret and leave. Can someone please expand on that?
I'm thinking that the stack structure looks like this:
[bottom] b , a , return address , frame pointer , buffer1 [top]
But this could be wrong, because I'm writing a simple buffer overflow and trying to change the return address, and for some reason the return address is not changing. Is something else present on the stack?
There are a couple of reasons for extra space. One is for alignment of variables. A second is to introduce padding for checking the stack (typically a debug build rather than a release build use of space). A third is to have additional space for temporary storage of registers or compiler generated temporary variables.
In the C calling sequence, the way it is normally done is there will be a series of push instructions pushing the arguments onto the stack and then a call instruction is used to call the function. The call instruction will push the return address onto the stack.
When the function returns, the calling function then removes the pushed arguments. For instance, a call to a function (this is Visual Studio 2005 with a C++ program) looks like:
push OFFSET ?pHead@@3VPerson@@A ; pHead
call ?exterminateStartingFrom@@YAXPAVPerson@@@Z ; exterminateStartingFrom
add esp, 4
This is pushing the address of a variable onto the stack, calling the function (the function name is mangled per C++), and then after the called function returns, it readjusts the stack by adding to the stack pointer the number of bytes used for the address.
The following is the entry part of the called function. What this does is to allocate space on the stack for the local variables. Notice that after setting up the entry environment, it then gets the function argument from the stack.
push ebp
mov ebp, esp
sub esp, 232 ; 000000e8H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-232]
When the function returns, it basically adjusts the stack back to where it was at the time the function was called. Each function is responsible for cleaning up whatever changes it has made to the stack before it returns.
pop edi
pop esi
pop ebx
add esp, 232 ; 000000e8H
pop ebp
ret 0
You mention that you are trying to change the return address. From these examples, what you can see is that the return address is pushed right after the last argument, so it sits at the next lower address after the arguments, above the saved ebp and the local variables.
Here is a brief writeup on function call conventions. Also take a look at this document on Intel assembler instructions.
Doing some example work with Visual Studio 2005, what I see is that with the following code I can access the return address for this example function.
void MyFunct (unsigned short arg) {
unsigned char *retAddress = (unsigned char *)&arg;
retAddress -=4;
printf ("Return address is 0x%2.2x%2.2x%2.2x%2.2x\n", retAddress[3], retAddress[2], retAddress[1], retAddress[0]);
}
Notice that the call instruction in this 32-bit Windows build stores the return address in memory from low byte to high byte (little-endian), which is why the printf above prints the bytes in reverse index order.
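To make the byte order concrete, here is a minimal sketch in portable C that rebuilds a 32-bit value from four bytes stored low byte first. The function names are made up for illustration and the sample bytes are not from a real run.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Rebuild a 32-bit value from four bytes stored low byte first
 * (little-endian), the order x86 uses for an address held in memory. */
uint32_t from_little_endian(const unsigned char bytes[4])
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Print the value high byte first, as the printf in MyFunct does. */
void print_address(const unsigned char bytes[4])
{
    printf("0x%08lx\n", (unsigned long)from_little_endian(bytes));
}
```

If the four bytes in memory were 0x78, 0x56, 0x34, 0x12, from_little_endian would return 0x12345678, which is the value the CPU would actually jump to.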
The extra space is for the stack alignment, which is usually done for better performance.
I am having an issue with some inline assembly. I am writing a compiler that compiles to assembly, and for portability I made it emit the main function in C and just use inline assembly. Even the simplest inline assembly is giving me a segfault, though. Thanks for your help.
int main(int argc, char** argv) {
__asm__(
"push $1\n"
);
return 0;
}
TLDR at bottom. Note: everything here is assuming x86_64.
The issue here is that compilers will effectively never use push or pop in a function body (except for prologues/epilogues).
Consider this example.
When the function begins, room is made on the stack in the prologue with:
push rbp
mov rbp, rsp
sub rsp, 32
This creates 32 bytes of room for main. Then notice how throughout the function, instead of pushing items to the stack, they are mov'd to the stack through offsets from rbp:
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
The reason for this is it allows for variables to be stored anywhere at anytime, and loaded from anywhere at anytime without requiring a huge amount of push/pops.
Consider the case where variables are stored using push and pop. Say a variable is stored early on in the function, let's call this foo. 8 variables on the stack later, you need foo, how should you access it?
Well, you can pop everything until foo, and then push everything back, but that's costly.
It also doesn't work when you have conditional statements. Say a variable is only ever stored if foo is some certain value. Now you have a conditional where the stack pointer could be at one of two locations after it!
For this reason, compilers always prefer to use rbp - N to store variables, as at any point in the function, the variable will still live at rbp - N.
NB: On different ABIs (such as i386 System V), arguments to functions may be passed on the stack, but this isn't too much of an issue, as ABIs will generally specify how this should be handled. Again, using i386 System V as an example, the calling convention for a function will go something like:
push edi ; 2nd argument to the function.
push eax ; 1st argument to the function.
call my_func
; here, it can be assumed that the stack has been corrected
So, why does push actually cause an issue?
Well, I'll add a small asm snippet to the code
At the end of the function, we now have the following:
push 64
mov eax, 0
leave
ret
There are two things that fail now due to pushing to the stack.
The first is the restore of rbp in the epilogue (see this thread).
The epilogue will attempt to pop the value of rbp that was stored at the beginning of the function (notice the only push that the compiler generates is at the start: push rbp). Strictly speaking, leave first copies rbp back into rsp, which discards anything pushed in between; the failure described here happens when the epilogue is a plain pop rbp, which is what the compiler typically emits when the function reserves no stack space, as in the main from the question.
This is so that the stack frame of the caller is preserved following main. By pushing to the stack, in our case rbp is now going to be set to 64, since the last value pushed is 64. When the caller of main resumes its execution and tries to access a value at, say, rbp - 8, a crash will occur, as rbp - 8 is 0x38 in hex, which is an invalid address.
But that assumes the caller even gets execution back!
After rbp has been restored with the invalid value, the next thing on the stack will be the original value of rbp.
The ret instruction will pop a value from the stack, and return to that address...
Notice how this might be slightly problematic?
The CPU is going to try and jump to the value of rbp stored at the start of the function!
On nearly every modern program, the stack is a "no execute" zone (see here), and attempting to execute code from there will immediately cause a crash.
So, TLDR: Pushing to the stack violates assumptions made by the compiler, most importantly about the return address of the function. This violation causes program execution to end up on the stack (generally), which will cause a crash
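The effect of a stray push can be modelled with a toy stack in plain C. This is only a sketch under stated assumptions: the stack is an array of 8-byte slots, rsp is an index rather than a real address, and RETURN_ADDRESS and CALLER_RBP are made-up placeholder values. It compares an epilogue of pop rbp; ret with one that uses leave; ret.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS    16
#define RETURN_ADDRESS UINT64_C(0x401000)       /* made-up address in the caller */
#define CALLER_RBP     UINT64_C(0x7ffd00001000) /* made-up saved frame pointer */

struct machine {
    uint64_t stack[STACK_SLOTS];
    int rsp;          /* index of the top slot; the stack grows downward */
    uint64_t rbp;
};

static void push(struct machine *m, uint64_t value)
{
    assert(m->rsp > 0);
    m->stack[--m->rsp] = value;
}

static uint64_t pop(struct machine *m)
{
    assert(m->rsp < STACK_SLOTS);
    return m->stack[m->rsp++];
}

/* Run prologue, optional stray push, and epilogue.
 * Returns the address that ret would jump to. */
uint64_t run_function(int stray_push, int use_leave)
{
    struct machine m = { {0}, STACK_SLOTS, CALLER_RBP };

    push(&m, RETURN_ADDRESS);     /* call: pushes the return address */
    push(&m, m.rbp);              /* push rbp */
    m.rbp = (uint64_t)m.rsp;      /* mov rbp, rsp */

    if (stray_push)
        push(&m, 64);             /* the inline asm: push 64 */

    if (use_leave)
        m.rsp = (int)m.rbp;       /* first half of leave: mov rsp, rbp */
    m.rbp = pop(&m);              /* pop rbp */
    return pop(&m);               /* ret */
}
```

In this model, run_function(1, 0) returns CALLER_RBP instead of RETURN_ADDRESS, meaning ret would jump to whatever the saved frame pointer held. With leave (run_function(1, 1)) the stray slot is discarded and the return address is found again.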
Let's say I have this simple program in C.
int my_func(int a, int b, int c) //0x4000
{
int d = 0;
int e = 0;
return e+d;
}
int main()
{
my_func(1,2,3); // 0x5000
return 0;
}
Ignore the fact that it is essentially all dead code that could be optimized away completely. We'll say that my_func() lives at address 0x4000 and it is being called at address 0x5000.
From my understanding, a C compiler (I understand they can operate differently by vendor) may:
push c to the stack
push b to the stack
push a to the stack
push 0x5000 to the stack (return address)
call 0x4000
Then I'm assuming to access a it uses sp (stack pointer) + 1. b is sp+2 and c is sp+3.
Since d and e are on the stack, I'm guessing our stack would now look like this?
c
b
a
0x5000
d
e
When we get to the end of the function.
Does it then pop e and d off the stack?
Then... push e+d? Or save it to a register to be used after return?
Return to 0x5000 because it's the top of the stack?
Then pop the return address (0x5000) and a, b and c?
I'm guessing this is why old C required all the variables to be declared at the top of a function, so that the compiler could count the number of pops it needed to perform at the end of the function?
I understand that it could have stored 0x5000 in a register, but a C program is able to go multiple levels deep into many functions and there are only so many registers...
Thanks!
In the default calling convention for C, the caller frees the function arguments after the function returns, but the function itself manages its own variables on the stack. For example, here is your code in assembly without any optimization:
my_func:
push ebp // +
mov ebp, esp // These 2 lines prepare function stack
sub esp, 16 // reserve memory for local variables
mov DWORD PTR [ebp-4], 0
mov DWORD PTR [ebp-8], 0
mov edx, DWORD PTR [ebp-8]
mov eax, DWORD PTR [ebp-4]
add eax, edx // <--return value in eax
leave // return esp to what it was at start of function
ret // return to caller
main:
push ebp
mov ebp, esp
push 3
push 2
push 1
call my_func
add esp, 12 // <- return esp to what it was before pushing arguments
mov eax, 0
leave
ret
As you see, there is a add esp, 12 in main for returning esp as it was before pushing arguments. In my_func there is a pair like this:
push ebp
mov ebp, esp
sub esp, 16 // <--- size of stack
...
leave
ret
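These two pieces work as a pair. The prologue reserves some stack memory for the function, and leave undoes it: it copies ebp back into esp (releasing the reserved space) and pops the saved ebp, reversing push ebp / mov ebp, esp. The function uses ebp to access its arguments and its stack-allocated variables. An integer return value is always in eax.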
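The resulting layout can be checked with a small model in C. This is a sketch, not real stack inspection: it assumes 4-byte slots, uses byte offsets into an array in place of addresses, and reuses the question's 0x5000 as a placeholder return address. The function name load_from_frame is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64

struct stack32 {
    uint32_t words[STACK_BYTES / 4];
    uint32_t esp;   /* byte offset into the model stack */
    uint32_t ebp;
};

static void push32(struct stack32 *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->words[s->esp / 4] = value;
}

/* Model the call my_func(1, 2, 3) under cdecl and return the 32-bit
 * word the callee would read at [ebp + offset]. */
uint32_t load_from_frame(int offset)
{
    struct stack32 s = { {0}, STACK_BYTES, 0 };
    int address;

    push32(&s, 3);        /* push 3 (last argument first) */
    push32(&s, 2);        /* push 2 */
    push32(&s, 1);        /* push 1 */
    push32(&s, 0x5000);   /* call my_func: pushes the return address */
    push32(&s, s.ebp);    /* push ebp */
    s.ebp = s.esp;        /* mov ebp, esp */
    s.esp -= 16;          /* sub esp, 16 */

    address = (int)s.ebp + offset;
    assert(address >= 0 && address <= STACK_BYTES - 4 && address % 4 == 0);
    return s.words[address / 4];
}
```

Here load_from_frame(8), load_from_frame(12) and load_from_frame(16) give the arguments 1, 2 and 3, while load_from_frame(4) gives the return address. Locals live at negative offsets such as ebp-4 and ebp-8.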
A quick allocated stack size note:
As you can see, the function has a sub esp, 16 instruction even though you keep only 2 variables of type int on the stack, which have a total size of 8 bytes. This is because the stack size is aligned to specific boundaries (at least with default compile options). If you add 2 more int variables to my_func, this instruction is still sub esp, 16, because the total still fits within the 16-byte alignment. But if you add a 3rd additional int (5 in total), this instruction becomes sub esp, 32. This alignment can be configured with the -mpreferred-stack-boundary option in GCC.
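The rounding rule itself is easy to sketch in C. This shows only the arithmetic of rounding up to a 16-byte boundary, assuming 4-byte ints. A real compiler may reserve more space for other reasons, so treat it as an illustration rather than a prediction of GCC's output. The function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Reserved bytes for `count` int locals, assuming 4-byte ints and
 * 16-byte rounding of the local area. */
size_t reserved_for_ints(size_t count)
{
    return align_up(count * 4, 16);
}
```

Under these assumptions, 2 ints (8 bytes) and 4 ints (16 bytes) both round to 16, while 5 ints (20 bytes) round to 32.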
By the way, all of this applies to 32-bit compilation. In contrast, in 64-bit code you normally pass arguments through registers rather than by pushing them onto the stack. As mentioned in a comment, in 64-bit code arguments are passed on the stack only starting with the 5th argument (in the Microsoft x64 calling convention).
Update:
By default calling convention, I mean cdecl, which is normally used when you compile code for x86 without any compiler options or specific function attributes. If you change the function's calling convention to stdcall, for example, all of this will change.
I am currently trying to understand Writing buffer overflow exploits - a tutorial for beginners.
The C code, compiled with cc -ggdb exploitable.c -o exploitable
#include <stdio.h>
void exploitableFunction (void) {
char small[30];
gets (small);
printf("%s\n", small);
}
main() {
exploitableFunction();
return 0;
}
seems to have the assembly code
0x000000000040063b <+0>: push %rbp
0x000000000040063c <+1>: mov %rsp,%rbp
0x000000000040063f <+4>: callq 0x4005f6 <exploitableFunction>
0x0000000000400644 <+9>: mov $0x0,%eax
0x0000000000400649 <+14>: pop %rbp
0x000000000040064a <+15>: retq
I think it does the following, but I'm really not sure about it and I would like to hear from somebody who is experienced with assembly code if I'm right / what is right.
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
400644: Copy the value from the address $0x0 to the EAX register
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
You want to save the base pointer because it is probably used by the calling function.
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
This gives you a fixed position into the stack, which might contain parameters for the function. It can also be used as a base address for any local variables.
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
"call" means pushing the return address (address of the next instruction) onto the stack, and then jumping to the start of the called function.
400644: Copy the value from the address $0x0 to the EAX register
It is actually the value 0 from the return statement.
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
This restores the base pointer we saved at the top. The calling function might assume that we do.
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
It was the constant from return 0. Using EAX for a small return value is a common convention.
I found a link that has code similar to yours with a full explanation.
40063b: push the old base pointer onto the stack to save it for later. It is pushed because main is not the only function in the program; some other function called it and expects its own base pointer to be intact afterwards.
40063c: copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main's stack frame.
40063f: call the function at address 0x4005f6. This pushes the program counter (the address of the next instruction, 0x400644 here) onto the stack and loads 0x4005f6 into the program counter. When the function returns, the saved address is popped from the stack back into the program counter.
400644: This instruction copies 0 into %eax. The x86 calling convention dictates that a function's return value is stored in %eax.
400649: We pop the old base pointer off the stack and store it back in %rbp.
40064a: jumps back to the return address, which is also stored in the stack frame. For main, this marks the end of the program.
Also, you didn't include the assembly code for exploitableFunction; what you posted is only the main function.
The function entry saves ebp and moves esp into ebp. All parameters of the function will now be addressed using ebp. This is a standard cdecl convention (in Intel assembler):
; int example(char *s, int i)
push ebp ; save the caller's value of ebp
mov ebp, esp ; set up our base pointer to the stack frame
sub esp, 16 ; room for automatic variables
mov eax, dword ptr [ebp+8] ; eax has s
mov ecx, dword ptr [ebp+12] ; ecx has i (ecx is caller-saved, unlike ebx)
... ; do your thing
mov eax, dword ptr [result] ; function return value in eax
mov esp, ebp ; release the automatic variables
pop ebp ; restore caller's base pointer
ret
When calling this function, the compiler pushes the parameters onto the stack and then calls the function. Upon return, it cleans up the stack:
; i = example(myString, k);
mov eax, [ebp+16] ; this gets a parameter of the current function
push eax ; this will be parameter i
mov eax, [ebp-16] ; this gets a local variable
push eax ; this is parameter s
call example
add esp, 8 ; remove the pushed parameters from the stack
mov dword ptr [i], eax ; save return value - always in eax
Different compilers can use different conventions about passing parameters in registers, but I think the above is the basics of calls in C (using cdecl).
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main(int argc, char *argv[]){
char a[5];
char b[10];
strcpy(a,"nop");
gets(b);
printf("Hello there %s. Value in a is %s.\n",b,a);
exit(0);
}
The first few lines of assembly output show:
push %ebp
mov %esp,%ebp
sub $0x28,%esp
mov $0x80c5b08,%edx
lea -0xd(%ebp),%eax
mov (%edx),%edx
mov %edx,(%eax)
lea -0x17(%ebp),%eax
mov %eax,(%esp)
call 0x8049c60 <gets>
I'm confused for a few reasons. First, why do we do sub $0x28,%esp, which accounts for 40 bytes, if char *argv[] accounts for 8 bytes, int argc accounts for 4, a accounts for 8, and b accounts for 12 -> 8+4+8+12 = 32?
I'm also struggling to see where strcpy happens and what accounts for the two memory addresses $0x80c5b08 and 0x8049c60.
All right, I'll give you what I can, but my assembly is a bit rusty. First, let's start with what you are looking at. With AT&T syntax, the operand order is reversed compared to Intel syntax: AT&T writes operation source, destination, while Intel writes operation destination, source, which is part of the reason some people prefer to read Intel.
The instructions are not too difficult to digest from an overview standpoint. If you look at the first two instructions, the first pushes the previous base pointer onto the stack to save it (when this function returns, the previous base pointer is restored for the calling routine). The second sets the base pointer for this function to the current stack pointer (the top of the stack). Together, the two lines are known as the function prologue.
push %ebp
mov %esp,%ebp
The next line subtracts 40 bytes (0x28 hex) from the stack pointer (the stack grows toward lower addresses) to create space for the local variables a and b, where the "nop" data and the results of gets will be copied. I'm not sure what precise alignment it is trying to achieve, but the storage for a will be 5 bytes and b 10.
sub $0x28,%esp
The following line moves the pointer address 0x80c5b08 to the general purpose dx register (edx for 80386 32-bit registers). In assembly, you put the address of the data you want to manipulate into one of the CPU registers, before you do something with it. Here it looks to be putting the memory address for "nop" in edx.
mov $0x80c5b08,%edx
The next instruction, lea (load effective address), computes the address base pointer - 13 (0xd hex) and puts it in the eax register. This is the beginning address of a, so that the string "nop" can be copied there.
lea -0xd(%ebp),%eax
The following mov instructions copy the data pointed to by edx to the memory location specified in eax, copying "nop" (4 bytes including the terminating null) to a.
mov (%edx),%edx
mov %edx,(%eax)
The next lea loads the address base pointer - 23 (0x17 hex), the start of b, into eax, and the mov places that address at the top of the stack as the argument for gets, which then fills the memory at that location.
lea -0x17(%ebp),%eax
mov %eax,(%esp)
call 0x8049c60 <gets>
Afterwards, there will be instructions to load the memory addresses for a and b along with the address of the format string before calling printf. Hopefully this will help.
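The offsets in the listing can be checked with a little arithmetic, sketched here in C. It assumes, as the two lea instructions suggest, that a starts at ebp-0xd and b starts at ebp-0x17. The function names are invented for the example.

```c
#include <assert.h>

/* Offset (relative to ebp) one past the last byte of a local that
 * starts at `offset` and is `size` bytes long. */
int slot_end(int offset, int size)
{
    assert(size > 0);
    return offset + size;
}

/* Bytes of the reserved area that lie below the lowest local. */
int bytes_below(int frame_size, int lowest_offset)
{
    assert(lowest_offset < 0 && frame_size >= -lowest_offset);
    return frame_size + lowest_offset;
}
```

With these assumptions, slot_end(-0x17, 10) is -0xd, so the 10 bytes of b end exactly where a begins, and slot_end(-0xd, 5) is -8, leaving 8 unused bytes between a and the saved ebp. bytes_below(0x28, -0x17) is 17: the space under b, which includes the slot at (%esp) where the argument for gets is stored. Together 5 + 10 + 8 + 17 gives the 40 bytes reserved.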
There may be some padding after the local variables
as there needs to be (32-bit aligned) room for the parameter for gets()
and the PC register that will be saved via the call instruction.
Note: the ebp register has to point to the next available stack address
after the local stack frame.
Note: the gets() function should never be used,
for several reasons. Use fgets() instead.
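As a minimal sketch of the fgets() alternative, the helper below reads one line without writing past the buffer. The name read_line is made up for this example; only fgets() and strcspn() are standard library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf, never writing more than `size`
 * bytes (including the terminating null). Input longer than the
 * buffer is truncated rather than overflowing it.
 * Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 1;
}
```

In the program above, gets(b) could become read_line(stdin, b, sizeof b), so that at most 9 characters plus the terminator ever land in b.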
The strcpy() was replaced by the compiler with a macro invocation.
That macro produced the following:
mov $0x80c5b08,%edx
lea -0xd(%ebp),%eax
mov (%edx),%edx
mov %edx,(%eax)
"I'm also struggling to see where strcpy() happens and what accounts for the two memory addresses $0x80c5b08 and 0x8049c60"
The 0x80c5b08 is the address of the literal that is to be copied into the variable.
The 0x8049c60 is the linked address of the gets() function.
Some time ago I was experimenting with writing assembly
routines and linking them with C programs, and I found that
I can just skip the standard C-call prologue and epilogue
push ebp
mov ebp, esp
(sub esp, 4
...
mov esp, ebp)
pop ebp
just skip it all and address everything through esp, like
mov eax, [esp+4] ;; take argument
mov [esp-4], eax ;; use some local variable storage
It seems to work quite well. Why is ebp used? Is
addressing through ebp maybe faster, or what?
There's no requirement to use a stack frame, but there are certainly some advantages:
Firstly, if every function uses this same process, we can use this knowledge to easily determine a sequence of calls (the call stack) by reversing the process. We know that after a call instruction, ESP points to the return address, and that the first thing the called function will do is push the current EBP and then copy ESP into EBP. So, at any point, the data pointed to by EBP is the previous EBP, and the data at EBP+4 is the return address of the last function call. We can therefore print the call stack (assuming 32-bit) using something like this (excuse the rusty C++):
void LogStack(DWORD ebp)
{
DWORD prevEBP = *((DWORD*)ebp);
DWORD retAddr = *((DWORD*)(ebp+4));
if (retAddr == 0) return;
HMODULE module;
GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, (const char*)retAddr, &module);
char* fileName = new char[256];
fileName[255] = 0;
GetModuleFileNameA(module, fileName, 255);
printf("0x%08x: %s\n", retAddr, fileName);
delete [] fileName;
if (prevEBP != 0) LogStack(prevEBP);
}
This will then print out the entire sequence of calls (well, their return addresses) up until that point.
Furthermore, since EBP doesn't change unless you explicitly update it (unlike ESP, which changes when you push/pop), it's usually easier to reference data on the stack relative to EBP, rather than relative to ESP, since with the latter, you have to be aware of any push/pop instructions that might have been called between the start of the function and the reference.
As others have mentioned, you should avoid using stack addresses below ESP as any calls you make to other functions are likely to overwrite the data at these addresses. You should instead reserve space on the stack for use by your function by the usual:
sub esp, [number of bytes to reserve]
After this, the region of the stack between the new ESP and the initial ESP (that is, [number of bytes reserved] bytes starting at the new ESP) is safe to use.
Before exiting your function you must release the reserved stack space using a matching:
add esp, [number of bytes reserved]
The use of EBP is of great help when debugging code, as it allows debuggers to traverse the stack frames in a call chain.
It creates a singly linked list that links the frame pointers of all the callers of a function. From the EBP for a routine, you can recover the entire call stack for that routine.
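The linked-list idea can be sketched in portable C by modelling each frame as the two words that EBP points at. This is an illustration on hand-built data, not a real stack walker: struct frame_record and walk_frames are hypothetical names, and the layout assumes every function uses the standard push ebp / mov ebp, esp prologue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two words found at [ebp] and [ebp+4] after a standard prologue. */
struct frame_record {
    const struct frame_record *previous;  /* saved frame pointer of the caller */
    uintptr_t return_address;             /* where this function returns to */
};

/* Follow the chain of saved frame pointers, storing up to `max`
 * return addresses (innermost first). Returns how many were stored. */
size_t walk_frames(const struct frame_record *frame, uintptr_t *out, size_t max)
{
    size_t count = 0;

    assert(out != NULL || max == 0);
    while (frame != NULL && count < max) {
        out[count++] = frame->return_address;
        frame = frame->previous;
    }
    return count;
}
```

A debugger does essentially this starting from the current EBP, then maps each return address to a function name. The walk stops when the saved frame pointer is null or the output buffer is full.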
See http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
And in particular the page it links to which covers your question: http://blogs.msdn.com/b/larryosterman/archive/2007/03/12/fpo.aspx
It works. However, once you get an interrupt, the processor will push its registers and flags onto the stack, overwriting your value.
The stack is there for a reason, use it...