I want to allocate on the heap a stack frame for each function call, but for that I need to modify the stack base pointer for each function.
Is there any way to get that pointer and modify it in C under Linux?
The only (practical) way to manually modify ebp is inline assembly or just write a function in assembly. If you have an assembly function you could do something like this (I am not really experienced with x86 assembly fyi)
; void function(void* heapPtr)
function:
push ebp
mov reg, esp
mov esp, [esp+4]
mov ebp, esp
...
mov esp, reg
pop ebp
ret
Have a look at the function alloca(). Its use is not recommended, but most compilers has an implementation for it.
Related
I am trying to understand some assembly code so that I can work with it in C. Assume the following NASM code
get_entry_point:
%define STACK_LIMIT 8
%define return_val [ebp-4]
%define base_ptr [ebp-8]
Is the ebp register simply accessing the parameters to a function and storing them inside base_ptr and return_val, or is it doing something else?
void __stdcall get_entry_point(unsigned long return_val, unsigned long base_ptr);
And with regards to this following code
push ebp
mov ebp, esp
sub esp, STACK_LIMIT
pushad
What would this look like inside the prototyped C function above?
Is the ebp instruction
ebp is not an instruction, it's a register. It's the assembly equivalent of a variable.
%define return_val [ebp-4] is also not an instruction. It just means, that wherever the code says return_val, the assembler will treat it as if you wrote [ebp-4].
And with regards to this following code ... What would this look like inside the prototyped C function above?
This is standard code which the compiler adds to the start of every function if it needs it.
It looks like the { at the start of the function, because it is automatically inserted at the start of the function.
Im pretty new to assembly, and am trying my best to learn it. Im taking a course to learn it and they mentioned a very remedial Hello World example, that I decomplied.
original c file:
#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}
This was decompiled using the following command:
C:> objdump -d -Mintel HelloStudents.exe > disasm.txt
decompliation (assembly):
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret
Im having issues mapping this output from the decompliation, to the original C file can someone help?
Thank you very much!
The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
This is all preamble, setting up the stack frame for the main function. It aligns the stack pointer (ESP) by 16 bytes then reserves another 16 bytes of space for outgoing function args.
call 401e80, <___main>
This function call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.
That's the end of the pre-amble; the part of the function that implements the C statements in your source starts with:
mov DWORD PTR [esp], 0x404000
This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earliersub esp, 16, this is like apush` instruction. In this 32-bit calling convention, function args are passed on the stack, not registers, so that's where the compiler has to put them before function calls.
call 4025f8 <_puts>
This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call.
mov eax, 0x0
The return value of main is loaded into the eax register
leave
ret
Restore the previous EBP value, and tear down the stack frame, then exit the function. ret pops a return address off the stack, which can only work when ESP is pointing at the return address.
I am currently trying to understand Writing buffer overflow exploits - a tutorial for beginners.
The C code, compiled with cc -ggdb exploitable.c -o exploitable
#include <stdio.h>
void exploitableFunction (void) {
char small[30];
gets (small);
printf("%s\n", small);
}
main() {
exploitableFunction();
return 0;
}
seems to have the assembly code
0x000000000040063b <+0>: push %rbp
0x000000000040063c <+1>: mov %rsp,%rbp
0x000000000040063f <+4>: callq 0x4005f6 <exploitableFunction>
0x0000000000400644 <+9>: mov $0x0,%eax
0x0000000000400649 <+14>: pop %rbp
0x000000000040064a <+15>: retq
I think it does the following, but I'm really not sure about it and I would like to hear from somebody who is experienced with assembly code if I'm right / what is right.
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
400644: Copy the value from the address $0x0 to the EAX register
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
You want to save the base pointer because it is probably used by the calling function.
40063c: Copy the value from the stack pointer register into the base pointer register (why?)
This gives you a fixed position into the stack, which might contain parameters for the function. It can also be used as a base address for any local variables.
40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
"call" means pushing the return address (address of the next instruction) onto the stack, and then jumping to the start of the called function.
400644: Copy the value from the address $0x0 to the EAX register
It is actually the value 0 from the return statement.
400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
This restores the base pointer we saved at the top. The calling function might assume that we do.
40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
It was the constant from return 0. Using EAX for a small return value is a common convention.
I found a Link which have similar code to your own with full explenation.
40063b: push the old base pointer onto the stack to save it for later. It's pushed because this is not the only process in the code. some other process call it.
40063c: copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main’s stack frame.
40063f: call the function in address 0x4005f6 which push the program counter into stack and load address 0x4005f6 into program conter, when the function returns, pop operation is happened to return the saved address in the stack to program counter which is 0x400644 here
400644: This instruction copies 0 into %eax, The x86 calling convention dictates that a function’s return value is stored in %eax
400649: We pop the old base pointer off the stack and store it back in %rbp
40064a: jumps back to return address, which is also stored in the stack frame. which specify the end of the program.
Also you didn't mention the assembly code for the function exploitableFunction. here is only main function
The function entry saves bp and moves sp into bp. All parameters of the function will now be addressed using bp. This is a standard cdecl convention (in Intel assembler):
; int example(char *s, int i)
push bp ; save the caller's value of bp
mov bp,sp ; set-up our base pointer to the stack-frame
sub sp, 16 ; room for automatic variables
mov ax,dword ptr [bp+8] ; ax has *s
mov bx,dword ptr [bp+12] ; bx has i
... ; do your thing
mov ax, dword ptr[result] ; function return in ax
pop bp ; restore caller's base-pointer
ret
When calling this function, the compiler pushes the parameters onto the stack and then calls the function. Upon return, it cleans up the stack:
; i= example(myString, k);
mov ax, [bp+16] ; this gets a parameter of the curent function
push ax ; this will be parameter i
mov ax, [bp-16] ; this gets a local variable
push ax ; this is parameter s
call example
add sp,8 ; remove the pushed parameters from the stack
mov dword ptr [i], ax ; save return value - always in ax
Different compilers can use different conventions about passing parameters in registers, but I think the above is the basics of calls in C (using cdecl).
That's what I understood by reading some memory segmentation documents: when a function is called, there are a few instructions (called function prologue) that save the frame pointer on the stack, copy the value of the stack pointer into the base pointer and save some memory for local variables.
Here's a trivial code I am trying to debug using GDB:
void test_function(int a, int b, int c, int d) {
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1, 2, 3, 4);
}
The purpose of debugging this code was to understand what happens in the stack when a function is called: so I had to examine the memory at various step of the execution of the program (before calling the function and during its execution). Although I managed to see things like the return address and the saved frame pointer by examining the base pointer, I really can't understand what I'm going to write after the disassembled code.
Disassembling:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400509 <+0>: push rbp
0x000000000040050a <+1>: mov rbp,rsp
0x000000000040050d <+4>: mov ecx,0x4
0x0000000000400512 <+9>: mov edx,0x3
0x0000000000400517 <+14>: mov esi,0x2
0x000000000040051c <+19>: mov edi,0x1
0x0000000000400521 <+24>: call 0x4004ec <test_function>
0x0000000000400526 <+29>: pop rbp
0x0000000000400527 <+30>: ret
End of assembler dump.
(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x00000000004004ec <+0>: push rbp
0x00000000004004ed <+1>: mov rbp,rsp
0x00000000004004f0 <+4>: mov DWORD PTR [rbp-0x14],edi
0x00000000004004f3 <+7>: mov DWORD PTR [rbp-0x18],esi
0x00000000004004f6 <+10>: mov DWORD PTR [rbp-0x1c],edx
0x00000000004004f9 <+13>: mov DWORD PTR [rbp-0x20],ecx
0x00000000004004fc <+16>: mov DWORD PTR [rbp-0x4],0x7a69
0x0000000000400503 <+23>: mov BYTE PTR [rbp-0x10],0x41
0x0000000000400507 <+27>: pop rbp
0x0000000000400508 <+28>: ret
End of assembler dump.
I understand that "saving the frame pointer on the stack" is done by " push rbp", "copying the value of the stack pointer into the base pointer" is done by "mov rbp, rsp" but what is getting me confused is the lack of a "sub rsp $n_bytes" for "saving some memory for local variables". I've seen that in a lot of exhibits (even in some topics here on stackoverflow).
I also read that arguments should have a positive offset from the base pointer (after it's filled with the stack pointer value), since if they are located in the caller function and the stack grows toward lower addresses it makes perfect sense that when the base pointer is updated with the stack pointer value the compiler goes back in the stack by adding some positive numbers. But my code seems to store them in a negative offset, just like local variables.. I also can't understand why they are put in those registers (in the main).. shouldn't they be saved directly in the rsp "offsetted"?
Maybe these differences are due to the fact that I'm using a 64 bit system, but my researches didn't lead me to anything that would explain what I am facing.
The System V ABI for x86-64 specifies a red zone of 128 bytes below %rsp. These 128 bytes belong to the function as long as it doesn't call any other function (it is a leaf function).
Signal handlers (and functions called by a debugger) need to respect the red zone, since they are effectively involuntary function calls. All of the local variables of your test_function, which is a leaf function, fit into the red zone, thus no adjustment of %rsp is needed. (Also, the function has no visible side-effects and would be optimized out on any reasonable optimization setting).
You can compile with -mno-red-zone to stop the compiler from using space below the stack pointer. Kernel code has to do this because hardware interrupts don't implement a red-zone.
But my code seems to store them in a negative offset, just like local variables
The first x86_64 arguments are passed on registers, not on the stack. So when rbp is set to rsp, they are not on the stack, and cannot be on a positive offset.
They are being pushed only to:
save register state for a second function call.
In this case, this is not required since it is a leaf function.
make register allocation easier.
But an optimized allocator could do a better job without memory spill here.
The situation would be different if you had:
x86_64 function with lots of arguments. Those that don't fit on registers go on the stack.
IA-32, where every argument goes on the stack.
the lack of a "sub rsp $n_bytes" for "saving some memory for local variables".
The missing sub rsp on red zone of leaf function part of the question had already been asked at: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
I am writing an assembly function to be called from C that will call the sidt machine instruction with a memory address passed to the C function. From my analysis of the code produced by MSVC10, I have constructed the following code (YASM syntax):
SECTION .data
SECTION .text
GLOBAL _sidtLoad
_sidtLoad:
push ebp
mov ebp,esp
sub esp,0C0h
push ebx
push esi
push edi
lea edi,[ebp-0C0h]
mov ecx,30h
mov eax,0CCCCCCCCh
sidt [ebp+8]
pop edi
pop esi
pop ebx
add esp,0C0h
cmp ebp,esp
mov esp,ebp
pop ebp
ret
Here is the C function signature:
void sidtLoad (void* mem);
As far as I can see everything should work, I have even checked the memory address passed to the function and have seen that it matches the address stored at [ebp+8] (the bytes are reversed, which I presume is a result of endianness and should be handled by the machine). I have tried other arguments for the sidt instruction, such as [ebp+12], [ebp+4], [ebp-8] etc but no luck.
P.S I am writing this in an external assembly module in order to get around the lack of inline assembly when targeting x64 using MSVC10. However, this particular assembly function/program is being run as x86, just so I can get to grips with the whole process.
P.P.S I am not using the __sidt intrinsic as the Windows Driver Kit (WDK) doesn't seem to support it. Or at least I can't get it to work with it!
Your code will write the IDT into the memory address ebp+8, which is not what you want. Instead, you need something like:
mov eax, [ebp+8]
sidt [eax]
compile:
int sidLoad_fake(void * mem) {
return (int)mem;
}
to assembly, and then look at where it pulls the value from to put in the return register.
I think that the first few arguments on x86_64 are passed in registers.