Whenever I use gdb on my programs (32-bit, Red Hat Linux), I see that the addresses held in the registers are extremely far away from the addresses assigned to the machine instructions.
Using a simple "hello world" program:
0x080483a4 <main+0>: lea 0x4(%esp),%ecx
0x080483a8 <main+4>: and $0xfffffff0,%esp
0x080483ab <main+7>: pushl -0x4(%ecx)
....
where the info registers command shows:
esp 0xffffb970 0xffffb970
ebp 0xffffb978 0xffffb978
esi 0x9d5ca0 10312864
edi 0x0 0
eip 0x80483b5 0x80483b5 <main+17>
The esp and ebp values show a frame at a location very different from where the code is located.
What I think is that...
In a 4 GB RAM stick there are 2^32 memory locations. The stack is located in the top part (near 0xFFFFFFFF) and grows down, while things like this hello program, the OS itself, and other currently running programs are at the bottom (starting at 0x00000000) and grow up, right beneath the heap, which is also on the opposite side from the stack.
I'm probably wrong, but I was hoping to get an answer from you guys.
Also, is there more information on what's going on with the rest of the memory? Is there a course or book that covers broad questions similar to what I just asked? I feel like I have to fill in a lot of gaps about what's going on in my intro assembly language class.
Thank you.
Unless you have a time machine and go back to the 1960s, all addresses that you see are virtual addresses. Two addresses could be right next to each other in physical memory, yet one has a high virtual address and the other a low one. If your OS is allocating 4K pages, only the low 12 address bits (the offset within the page) pass through unchanged; the higher bits are translated by the hardware (the MMU) using the page tables.
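To make the 4K-page point concrete, here is a minimal sketch that splits a 32-bit virtual address into its page number and page offset. It assumes 4 KiB pages (12 offset bits); the helper names page_number, page_offset and show are hypothetical, and the two sample addresses are the esp value and main's address from the question.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, i.e. 2^12 bytes, so the low 12 bits are the page offset. */
#define PAGE_SHIFT 12u
#define PAGE_MASK ((UINT32_C(1) << PAGE_SHIFT) - 1u)   /* 0xfff */

static_assert(PAGE_MASK == 0xfffu, "4 KiB pages have a 12-bit offset");

/* Upper 20 bits: the virtual page number, translated by the MMU via the page tables. */
static uint32_t page_number(uint32_t vaddr) { return vaddr >> PAGE_SHIFT; }

/* Low 12 bits: the offset within the page, copied unchanged into the physical address. */
static uint32_t page_offset(uint32_t vaddr) { return vaddr & PAGE_MASK; }

/* Print one address split into page number and offset. */
static void show(const char *name, uint32_t vaddr)
{
    printf("%-4s 0x%08" PRIx32 " -> page 0x%05" PRIx32 ", offset 0x%03" PRIx32 "\n",
           name, vaddr, page_number(vaddr), page_offset(vaddr));
}
```

Calling show("esp", 0xffffb970u) and show("main", 0x080483a4u) reports page 0xffffb with offset 0x970 and page 0x08048 with offset 0x3a4. The two values are far apart only as virtual page numbers; nothing in them says where the corresponding physical pages are.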
I'm a student learning computer security. Recently, I learned about stack buffer overflows in C.
I understood the concepts and ran sample code written in C.
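void main(){
    /* Shellcode: execve("/bin/sh", NULL, NULL) for 32-bit Linux */
    char buf[] = "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0";
    int* p;
    /* Relies on p sitting directly below the saved EBP in this build:
       &p + 1 is then the saved EBP and &p + 2 is main's return address. */
    p = (int*)&p + 2;
    /* Overwrite the return address with the address of buf,
       so main's ret jumps into the shellcode. */
    *p = (int)buf;
    return;
}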
Runtime Environment
Architecture: i686
OS: Ubuntu 16.04 32-bit
Compiler: gcc
ASLR: turned off (sysctl -w kernel.randomize_va_space=0)
Options: gcc -z execstack -mpreferred-stack-boundary=2 -fno-stack-protector
But I'm confused about what is saved on the stack and which memory gets overwritten.
The binary code above, "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0",
corresponds to this assembly code:
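.global main
main:
jmp strings          # jump forward to the call below
start:
xor %eax, %eax       # eax = 0
movb $0xb, %al       # eax = 11, the execve system call number on 32-bit Linux
xor %edx, %edx       # envp = NULL
xor %ecx, %ecx       # argv = NULL
popl %ebx            # ebx = address of "/bin/sh" (the return address pushed by call)
int $0x80            # execve("/bin/sh", NULL, NULL)
strings:
call start           # pushes the address of the string below, then jumps back to start
.string "/bin/sh"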
This amounts to execve("/bin/sh", NULL, NULL);
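The jmp/call/pop trick is what lets the shellcode find the address of "/bin/sh" without knowing where it will be loaded. As a sketch, the two relative branches in the byte string can be decoded by hand: a short jmp is EB followed by a signed 8-bit displacement, a near call is E8 followed by a signed little-endian 32-bit displacement, and both targets are counted from the end of the branch instruction. The helper names jmp_short_target and call_near_target are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The bytes from the question (the string literal adds one NUL at the end). */
static const unsigned char code[] =
    "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh";

/* jmp rel8 (EB xx): target = offset of the next instruction + signed 8-bit displacement. */
static long long jmp_short_target(const unsigned char *p, long long at)
{
    assert(p[at] == 0xEB);
    int rel = p[at + 1] < 0x80 ? p[at + 1] : p[at + 1] - 0x100;
    return at + 2 + rel;                  /* the instruction is 2 bytes long */
}

/* call rel32 (E8 xx xx xx xx): the displacement is little-endian and signed. */
static long long call_near_target(const unsigned char *p, long long at)
{
    assert(p[at] == 0xE8);
    uint32_t raw = (uint32_t)p[at + 1] | (uint32_t)p[at + 2] << 8 |
                   (uint32_t)p[at + 3] << 16 | (uint32_t)p[at + 4] << 24;
    long long rel = raw < 0x80000000u ? (long long)raw : (long long)raw - 0x100000000LL;
    return at + 5 + rel;                  /* the instruction is 5 bytes long */
}
```

The jmp at offset 0 lands on the call at offset 13. The call's displacement 0xfffffff0 is -16, so it jumps back to offset 2 (start) and pushes offset 18, which is where "/bin/sh" begins; popl %ebx then picks that address up.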
When the buffer overflow occurs, main's return address on the stack is overwritten so that it points at these bytes. But my understanding is that the stack stores data such as local variables, previous frame pointers, and return addresses.
I think the bytes above are not data but actually instructions. If so, why is this valid? Does the stack store instructions and execute them one by one by popping them? Or am I misunderstanding something?
And if the stack stores instructions, how do the previous stack frame pointer (fp) and return address (ra) work?
I learned that the previous function's stack frame address is stored in fp and the address of the next instruction in the code area is stored in ra. So, when the called function terminates, sp is popped and then ra is used to restore the previous function's state and run the next instruction. Is that correct? Or am I misunderstanding something?
I really want to understand this.
Thank you for your help.
Data are instructions are instructions are instructions.
The stack is memory is memory is memory.
That's all there is to it.
Since the stack is ordinary memory, just like what you get with malloc, only growing downward and used implicitly by some instructions, you can put any data on the stack.
Since instructions are data, it follows that you can put instructions on the stack.
This particular exploit works by overwriting the return address with a specific value: the address of a sequence of instructions placed on the stack (in the question's code, the bytes in buf).
That's why you need to tell GCC to make the stack executable (the code is on the stack) and not to generate a canary (each of these protections would suffice on its own to prevent the attack). You also need to tell Linux not to randomize the process address space layout, or the specific, fixed value used to overwrite the return address won't work.
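The fp and ra registers most likely come from a RISC architecture (MIPS and RISC-V, for example, have them); x86 has no such registers. On x86, call pushes the return address onto the stack, and ebp serves as a frame pointer only by convention.
The execution flow is redirected when main returns with ret: ret pops whatever value is on top of the stack into the instruction pointer and continues from there.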
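To see why overwriting a single stack slot redirects execution, here is a deliberately simplified model that covers only call and ret (everything else about x86 is left out, and the names machine, push, pop, call and ret are hypothetical). call pushes the address of the instruction after it and jumps; ret pops the top slot into the instruction pointer without checking where that value came from.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine: a downward-growing stack of 32-bit slots plus an instruction pointer. */
struct machine {
    uint32_t stack[64];
    int sp;          /* index of the top slot: 64 means empty, decreases on push */
    uint32_t ip;
};

static void push(struct machine *m, uint32_t v)
{
    assert(m->sp > 0);
    m->stack[--m->sp] = v;
}

static uint32_t pop(struct machine *m)
{
    assert(m->sp < 64);
    return m->stack[m->sp++];
}

/* call: push the return address (the instruction after the call), then jump. */
static void call(struct machine *m, uint32_t target, uint32_t insn_len)
{
    push(m, m->ip + insn_len);
    m->ip = target;
}

/* ret: pop the top slot into ip, whatever it contains. */
static void ret(struct machine *m)
{
    m->ip = pop(m);
}
```

A normal call/ret pair comes back to the instruction after the call. If the saved slot is overwritten between the two, as *p = (int)buf does in the question, ret jumps wherever the new value points.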
Look in Intel's manuals for how the call/ret pair works, and then see it in practice by stepping into a call with a debugger.
Make sure you understand the calling convention and keep an eye on the stack every time you step.
I am trying to familiarize myself with gdb and had a few questions based upon its format and what it shows:
─── Assembly ────
0x00000000004004ed main+0 push %rbp
0x00000000004004ee main+1 mov %rsp,%rbp
!0x00000000004004f1 main+4 movl $0x539,-0x4(%rbp)
What does the memory address on the left column signify, and why is each instruction a variable-width "between" the next address?
What does the second column mean?
─── Registers ───────────────────────
rax 0x00000000004004ed
rbx 0x0000000000000000
rcx 0x0000000000000000
Is the value next to the register its memory location, or the value contained in the register?
─── Stack ───────────────────
[0] from 0x000000000040058c in main+47 at main.c:7
What is this line telling us: does the stack start at memory address 0x000000000040058c, and what does the main+47 refer to?
What does the memory address on the left column signify, and why is each instruction a variable-width "between" the next address?
x86 machine-code instructions are variable length. Some instructions take a single byte, while, for example, movabs $0x12345678abcdef, %rax takes 10 bytes. The hard limit is 15 bytes, but only intentional padding with redundant prefixes can get all the way to 15.
Many other architectures are RISC and have fixed-width instructions.
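Because the length varies, the only way to read each instruction's size off a listing like the one in the question is to subtract consecutive addresses. A minimal sketch (insn_lengths is a hypothetical name); the length of the last listed instruction cannot be derived without the next address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* len[i] = addr[i+1] - addr[i]; fills n - 1 entries, the last length stays unknown. */
static void insn_lengths(const uint64_t *addr, size_t n, uint64_t *len)
{
    assert(n >= 2);
    for (size_t i = 0; i + 1 < n; i++)
        len[i] = addr[i + 1] - addr[i];
}
```

For 0x4004ed, 0x4004ee and 0x4004f1 this gives 1 byte for push %rbp and 3 bytes for mov %rsp,%rbp. The 32-bit listing in the first question works the same way: 4 bytes for lea 0x4(%esp),%ecx and 3 bytes for and $0xfffffff0,%esp.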
What does the second column mean?
It is the offset of the instruction from the symbol main; for example, main+4 is 0x4004ed + 4 = 0x4004f1. Note that the actual location in memory is not assigned at compile time.
(Editor's note: this is not a PIE executable so the absolute address actually is set at link time. We can tell because the address is 0x00400... in the low 32 bits of address space, not 0x55555555....)
Is the value next to the register its memory location, or the value contained in the registry?
Registers are not stored in memory (except in rare architectures); registers don't have addresses and are a separate space from memory. It's also not showing the value pointed to by a register that happens to be holding a valid address.
The value shown is the value in the register itself. Note that rbx and rcx both show 0x0.
What is this line telling us: does the stack start at memory address 0x000000000040058c, and what does the main+47 refer to?
(editor's note: this part is wrong but I'm not sure enough exactly what it is to replace it with something else. But 0x40058c is definitely not a plausible value for RSP. main+47 is a code address somewhere inside main, like always for GDB symbol+number).
This is the location of the stack. Your code is small, so main takes up fewer than 48 bytes. Note that memory is normally allocated in blocks, so the stack would not appear at main+7, or whatever immediately follows the movl instruction.
@daShier's answer is mostly right, but is completely wrong about this part:
What is this line telling us: does the stack start at memory address 0x000000000040058c, and what does the main+47 refer to?
I think this is a qword value on the stack (pointed to by RSP). It's probably main's return address, or maybe just a value that was in RBP when main pushed it.
(But a return address is plausible: main starting at 0x4004ed is not far from 0x40058c).
main + 47 = 0x40051c is a code address inside main, corresponding to the C source on line 7 of main.c (main.c:7). This symbol+number form is GDB's way of printing addresses in a human-readable way, relative to the closest preceding symbol, i.e. the function they're in. I think that's the breakpoint you were stopped at when you copy/pasted this: it tells you where execution is now, or was when this snapshot of data on the stack was taken.
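The symbol+offset idea can be sketched as a lookup for the symbol with the highest address that is still at or below the given address. This only illustrates the nearest-preceding-symbol idea, not GDB's actual implementation; the table is hypothetical, with only main's address (0x4004ed) taken from the question and the _start entry made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym { const char *name; uint64_t addr; };

/* Hypothetical symbol table; only main's address comes from the question. */
static const struct sym syms[] = {
    { "_start", 0x400400 },   /* made-up entry for illustration */
    { "main",   0x4004ed },
};

/* Return the closest symbol at or below addr (NULL if none) and store addr - sym->addr. */
static const struct sym *lookup(uint64_t addr, uint64_t *offset)
{
    assert(offset != NULL);
    const struct sym *best = NULL;
    for (size_t i = 0; i < sizeof syms / sizeof syms[0]; i++)
        if (syms[i].addr <= addr && (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    if (best != NULL)
        *offset = addr - best->addr;
    return best;
}
```

lookup(0x40051c, &off) returns main with off == 47, which GDB would print as main+47; lookup(0x4004f1, &off) gives main+4, matching the second column of the assembly pane.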
I'm not sure how you got GDB to print that Stack dump. It's a slightly different format from info stack or backtrace. TUI mode layout reg or any other layout doesn't include a Stack pane.
But anyway, 0x000000000040058c is most certainly not a stack address; it's in the same 4kiB virtual memory page as main so it's in the .text section. (In fact it's only 0x70 bytes past main + 47). That virtual page will be executable and not writeable.
RSP (stack pointer) values are things like 0x7ffff7fd4100, near the top of the lower 48 bits of virtual address space. (The top of the user-space part of the usable (canonical) part of virtual address space on x86-64).
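With 48-bit virtual addresses, an address is canonical only when bits 63 through 47 are all equal, which leaves a lower half ending at 0x00007fffffffffff (user space on Linux) and an upper half starting at 0xffff800000000000. A sketch assuming 48-bit addresses (4-level paging); is_canonical and is_lower_half are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 48   /* assumption: 4-level paging, 48-bit virtual addresses */

static_assert(VA_BITS > 0 && VA_BITS < 64, "shift amounts must be valid");

/* Canonical: bits 63..47 are all copies of bit 47 (all zeros or all ones). */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS - 1);          /* the 17 bits 63..47 */
    return top == 0 || top == (UINT64_MAX >> (VA_BITS - 1));
}

/* Lower canonical half: where user-space code, heap and stack live on Linux. */
static bool is_lower_half(uint64_t addr)
{
    return (addr >> (VA_BITS - 1)) == 0;
}
```

Both 0x40058c and 0x7ffff7fd4100 are canonical lower-half addresses, so this check alone does not separate code from stack; what separates them is how far apart they are, with code near the bottom of the space and the stack near the top of the lower half.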
As I said, main+47 is just a code address inside main. It has nothing to do with 47 or 48 bytes of stack space.
My understanding is that, to accommodate the x86 Streaming SIMD Extensions (SSE), gcc aligns the stack on 16-byte boundaries. Consider Ubuntu 16.04 (32-bit x86) with gcc version 4.8.4. What sections of the stack get aligned to a 16-byte boundary?
My question as a whole is about data-alignment on the stack and how the stack is aligned by gcc. There are 3 positions I am curious about, EBP, ESP, and the region between the previous ESP and new EBP.
From textbooks I know that EBP points to an address where the previous frame pointer is stored. Is EBP considered the bottom of the stack? Is there padding inserted before or after pushing the previous frame pointer to the stack?
I also know that ESP points to the top of the stack. Must ESP also be 16 byte aligned? Is there padding placed at the top of the stack to align ESP to a 16 byte boundary?
What confuses me most is the region of data between the new EBP and the previous ESP. This is where the function arguments and the caller's return address are pushed. Are these considered part of the previous stack frame or the current stack frame? How must this section of data be aligned?
This is a grey area for me. I have analyzed the stack using gdb but it is not clear to me how the stack is aligned and why it is done that way. I hope that someone on SO can shed some light on the design of the call stack by gcc.
In textbooks, the common description of the stack is implementation-agnostic and typically leaves out any discussion of alignment, padding, and the hard boundaries of the call stack. I have an image to include but am too new to the site to be allowed to post images.
Linux x86_64.
gcc 5.x
I was studying the output of the same code compiled with and without -fomit-frame-pointer (gcc enables that option by default at "-O3").
pushq %rbp
movq %rsp, %rbp
...
popq %rbp
My question is:
If I globally disable the frame pointer, even, at the extreme, when compiling an operating system, is there a catch?
I know that interrupts use that information, so is that option good only for user space?
Compilers always generate self-consistent code, so disabling the frame pointer is fine as long as you don't use external or hand-crafted code that makes assumptions about it (e.g. by relying on the value of rbp).
Interrupts don't use the frame pointer information. They may use the current stack pointer to save a minimal context, but this depends on the type of interrupt and the OS (a hardware interrupt probably uses a ring 0 stack).
You can look at Intel manuals for more information on this.
About the usefulness of the frame pointer:
Years ago, after compiling a couple of simple routines and looking at the generated 64-bit assembly code, I had the same question.
If you don't mind reading a whole lot of notes I wrote for myself back then, here they are.
Note: Asking about the usefulness of something is a little bit relative. Writing assembly code for the current main 64-bit ABIs, I found myself using the stack frame less and less. However, this is just my coding style and opinion.
I like using the frame pointer, writing the prologue and epilogue of a function, but I like direct uncomfortable answers too, so here's how I see it:
Yes, the frame pointer is almost useless in x86_64.
Beware: it is not completely useless, especially for humans, but a compiler doesn't need it anymore.
To better understand why we have a frame pointer in the first place it is better to recall some history.
Back in the real mode (16 bit) days
When Intel CPUs supported only "16 bit mode" there were some limitations on how to access the stack; in particular, this instruction was (and still is) illegal:
mov ax, WORD [sp+10h]
because sp cannot be used as a base register. Only a few designated registers could be used for such purpose, for example bx or the more famous bp.
Nowadays it's not a detail everybody pays attention to, but bp has an advantage over the other base registers: by default it implies the use of ss as the segment/selector register, just like the implicit uses of sp (by push, pop, etc.), and like esp does on later 32-bit processors.
Even if your program was scattered all across memory, with each segment register pointing to a different area, bp and sp acted the same; after all, that was the intent of the designers.
So a stack frame was usually necessary and consequently a frame pointer.
bp effectively partitioned the stack into four parts: the arguments area, the return address, the old bp (just a WORD) and the local variables area. Each area was identified by the offset used to access it: positive for the arguments and return address, zero for the old bp, negative for the local variables.
Extended effective addresses
As the Intel CPUs were evolving, the more extensive 32-bit addressing modes were added.
Specifically the possibility to use any 32-bit general-purpose register as a base register, this includes the use of esp.
With instructions like this
mov eax, DWORD [esp+10h]
now valid, the stack frame and the frame pointer seemed doomed.
This was not the case, at least in the beginning.
It is true that it is now possible to use esp exclusively, but the separation of the stack into the four areas mentioned above is still useful, especially for humans.
Without the frame pointer, every push or pop changes the offset of an argument or local variable relative to esp, producing code that looks unintuitive at first sight. Consider how to implement the following C routine with the cdecl calling convention:
int my_add(int a, int b);

int my_routine(int a, int b)
{
    return my_add(a, b);
}
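without and with a stack frame:
my_routine:
push DWORD [esp+08h]
push DWORD [esp+08h]
call my_add
add esp, 08h          ; cdecl: the caller removes the two arguments
ret

my_routine:
push ebp
mov ebp, esp
push DWORD [ebp+0Ch]
push DWORD [ebp+08h]
call my_add
add esp, 08h          ; cdecl: the caller removes the two arguments
pop ebp
ret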
At first sight it seems that the first version pushes the same value twice. It actually pushes two separate arguments, however: the first push lowers esp, so the same effective address calculation points the second push at a different argument.
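This is easy to check with a tiny simulation of the stack at the entry of my_routine. The sketch assumes 4-byte slots and a small array standing in for memory, and mem, esp, load, push_dword and push_mem are hypothetical names. At entry [esp] holds the return address, [esp+4] holds a and [esp+8] holds b; like the real instruction, push DWORD [esp+disp] computes its source address before esp is lowered.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 32-bit stack: mem[i] is the dword at byte address 4*i. */
static uint32_t mem[16];
static uint32_t esp;

static uint32_t load(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < 16);
    return mem[addr / 4];
}

static void push_dword(uint32_t value)
{
    assert(esp >= 4);
    esp -= 4;
    mem[esp / 4] = value;
}

/* push DWORD [esp+disp]: the source is read using the old esp, then esp drops by 4. */
static void push_mem(uint32_t disp)
{
    push_dword(load(esp + disp));
}
```

Running push_mem(8) twice from the entry state pushes b first and then a, leaving a at [esp] and b at [esp+4]: exactly the cdecl argument order my_add expects.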
If you add local variables (especially lots of them) then the situation quickly becomes hard to read: Does mov eax, [esp+0CAh] refer to a local variable or to an argument? With a stack frame we have fixed offsets for the arguments and local variables.
Even the compilers at first still preferred the fixed offsets given by the use of the frame base pointer. I see this behavior changing first with gcc.
In a debug build the stack frame effectively adds clarity to the code and makes it easy for the (proficient) programmer to follow what is going on and, as pointed out in the comment, lets them recover the stack frame more easily.
The modern compilers however are good at math and can easily keep count of the stack pointer movements and generate the appropriate offsets from esp, omitting the stack frame for faster execution.
When a CISC requires data alignment
Until the introduction of SSE instructions the Intel processors never asked much from the programmers compared to their RISC brothers.
In particular, they never required data alignment: we could access 32-bit data at an address that is not a multiple of 4 without major complaint (depending on the DRAM data width, this may result in increased latency).
SSE used 16-byte operands that, for most instructions with a memory operand, had to be accessed on a 16-byte boundary. As the SIMD paradigm was implemented efficiently in hardware and became more popular, alignment on 16-byte boundaries became important.
The main 64-bit ABIs now require it: the stack must be aligned on paragraphs (i.e., 16 bytes).
Now, we are usually called such that the stack is aligned after the prologue, but suppose we are not blessed with that guarantee; we would then need to do one of these:
push rbp
mov rbp, rsp
and spl, 0f0h
sub rsp, 10h*k
or
push rbp
mov rbp, rsp
sub rsp, xxx
and spl, 0f0h
One way or another, the stack is aligned after these prologues; however, we can no longer use a negative offset from rbp to access local variables that need alignment, because the frame pointer itself is not aligned.
We need to use rsp. We could arrange a prologue that has rbp pointing at the top of an aligned area of local variables, but then the arguments would be at unknown offsets.
We can arrange a complex stack frame (maybe with more than one pointer) but the key of the old fashioned frame base pointer was its simplicity.
So we can use the frame pointer to access the arguments on the stack and the stack pointer for the local variables, fair enough.
Alas, the role of the stack in argument passing has been reduced: for a small number of arguments (currently four on Windows x64, six integer arguments on System V x86-64) it is not even used, and in the future it will probably be used even less.
So we don't use the frame pointer for local variables (mostly), nor for the arguments (mostly); so what do we use it for?
It saves a copy of the original rsp, so to restore the stack pointer at function exit, a mov is enough. If the stack is aligned with an and, which is not invertible, an original copy is necessary.
Actually some ABIs guarantee that after the standard prologue the stack is aligned thereby allowing us to use the frame pointer as usual.
Some variables don't need alignment and can be accessed with an unaligned frame pointer, this is usually true for hand crafted code.
Some functions take more parameters than can be passed in registers.
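The and spl, 0f0h step in the prologues above rounds the stack pointer down to a multiple of 16 by clearing its low four bits. A small sketch in C (align_down is a hypothetical name) also shows why that step cannot be undone, which is the reason a copy of the original stack pointer has to be kept, in rbp or elsewhere: sixteen different incoming values all collapse to the same aligned value.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of align (a power of two); for 16 this is what
   "and rsp, -16" (or "and spl, 0f0h") does to the stack pointer. */
static uint64_t align_down(uint64_t sp, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    return sp & ~(align - 1);
}
```

For example, every value from 0x7fffffffe460 to 0x7fffffffe46f maps to 0x7fffffffe460, so the aligned value alone cannot tell the epilogue which of the sixteen original values to restore.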
Summary
The frame pointer is a vestigial paradigm from 16-bit programs that has proven itself still useful on 32-bit machines because of its simplicity and clarity when accessing local variables and arguments.
On 64-bit machines, however, the strict alignment requirements remove most of that simplicity and clarity, although the frame pointer is still used in debug builds.
On the fact that the frame pointer can be used to do fun things: it is true, I guess. I've never seen such code, but I can imagine how it would work.
I, however, focused on the housekeeping role of the frame pointer as this is the way I always have seen it.
All the crazy things can be done with any pointer set to the same value as the frame pointer; I give the latter a more "special" role.
VS2013 for example sometimes uses rdi as a "frame pointer", but I don't consider it a real frame pointer if it doesn't use rbp/ebp/bp.
To me the use of rdi means a Frame Pointer Omission optimization :)
I would like to divide a stack into stack frames by looking at the raw data on the stack. I thought to do so by finding a "linked list" of saved EBP pointers.
Can I assume that a (standard and commonly used) C compiler (e.g. gcc) will always update and save EBP on a function call in the function prologue?
pushl %ebp
movl %esp, %ebp
Or are there cases where some compilers might skip that part for functions that don't get any parameters and don't have local variables?
The x86 calling conventions and the Wiki article on function prologue don't help much with that.
Is there any better method to divide a stack into stack frames just by looking at its raw data?
Thanks!
Some versions of gcc have a -fomit-frame-pointer optimization option. If memory serves, it can be used even with parameters/local variables (they index directly off of ESP instead of using EBP). Unless I'm badly mistaken, MS VC++ can do roughly the same.
Offhand, I'm not sure of a way that's anywhere close to universally applicable. If you have code with debug info, it's usually pretty easy -- otherwise though...
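For comparison, here is roughly what walking the saved-EBP "linked list" looks like when every function does keep the push ebp / mov ebp, esp prologue. The sketch runs on a made-up array standing in for raw stack memory (all values and the names snapshot, read_word and walk_frames are hypothetical): each saved EBP holds the address of the caller's saved EBP, and the word just above it is the return address. With -fomit-frame-pointer, some functions never write such a link, so the walk silently skips their frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw 32-bit stack snapshot: words[i] lives at address base + 4*i. */
struct snapshot { const uint32_t *words; size_t n; uint32_t base; };

/* Read one aligned word; returns 0 if addr is outside the snapshot. */
static int read_word(const struct snapshot *s, uint32_t addr, uint32_t *out)
{
    if (addr < s->base || (addr - s->base) % 4 != 0 || (addr - s->base) / 4 >= s->n)
        return 0;
    *out = s->words[(addr - s->base) / 4];
    return 1;
}

/* Follow saved-EBP links starting at ebp, collecting the return address at [ebp+4]. */
static size_t walk_frames(const struct snapshot *s, uint32_t ebp, uint32_t *ret, size_t max)
{
    size_t count = 0;
    uint32_t next, ra;
    while (count < max && read_word(s, ebp, &next) && read_word(s, ebp + 4, &ra)) {
        ret[count++] = ra;
        if (next <= ebp)      /* callers' frames must be at higher addresses */
            break;
        ebp = next;
    }
    return count;
}
```

Starting from the innermost EBP, the walk yields one return address per frame until it reaches a link that does not point further up the stack (here, a saved EBP of 0).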
Even with the framepointer optimized out, stackframes are often distinguishable by looking through stack memory for saved return addresses instead. Remember that a function call sequence in x86 always consists of:
call someFunc ; pushes return address (instr. following `call`)
...
someFunc:
push EBP ; if framepointer is used
mov EBP, ESP ; if framepointer is used
push <nonvolatile regs>
...
so your stack will always - even if the framepointers are missing - have return addresses in there.
How do you recognize a return address?
To start with, on x86, instructions have different lengths. That means return addresses, unlike other pointers (!), tend to be misaligned values. Statistically, 3/4 of them are not a multiple of four.
Any misaligned pointer is a good candidate for a return address.
Then, remember that call instructions on x86 have specific opcode formats: read a few bytes before the return address and check whether you find a call opcode there (most of the time it's five bytes back for a direct call, and two or three bytes back for an indirect call through a register or a register-based memory operand, such as call eax or call [ebp-8]). If so, you've found a return address.
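The two checks can be sketched as follows for 32-bit code, given the bytes just before a candidate address. The function names and the exact set of call encodings checked are assumptions, and a real scanner would still need a disassembler to rule out false positives: E8 rel32 is the 5-byte direct call, FF D0 to FF D7 is a 2-byte call through a register, and FF 50 to FF 57 plus a disp8 (excluding FF 54, which needs a SIB byte) is a 3-byte call [reg+disp8].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic: does a call instruction end exactly at code[ret]? */
static bool looks_like_return_site(const uint8_t *code, size_t ret)
{
    assert(code != NULL);
    /* E8 rel32: direct near call, 5 bytes. */
    if (ret >= 5 && code[ret - 5] == 0xE8)
        return true;
    /* FF /2 with mod=11: call reg (e.g. FF D0 = call eax), 2 bytes. */
    if (ret >= 2 && code[ret - 2] == 0xFF && (code[ret - 1] & 0xF8) == 0xD0)
        return true;
    /* FF /2 with mod=01 and no SIB: call [reg+disp8] (e.g. FF 55 F8 = call [ebp-8]), 3 bytes. */
    if (ret >= 3 && code[ret - 3] == 0xFF && (code[ret - 2] & 0xF8) == 0x50 && code[ret - 2] != 0x54)
        return true;
    return false;
}

/* Return addresses are often not 4-byte aligned, unlike most data pointers. */
static bool misaligned(uint32_t value) { return value % 4 != 0; }
```

A stack word that is misaligned and whose preceding bytes pass looks_like_return_site is a strong return-address candidate; a word that only passes the alignment test is a weak one.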
This is also a way to distinguish C++ vtables from return addresses by the way - vtable entrypoints you'll find on the stack, but looking "back" from those addresses you don't find call instructions.
With that method, you can get candidates for the call sequence out of the stack even without having symbols, framesize debugging information or anything.
The details of how to piece the actual call sequence together from those candidates are less straightforward though, you need a disassembler and some heuristics to trace potential call flows from the lowest-found return address all the way up to the last known program location. Maybe one day I'll blog about it ;-) though at this point I'd rather say that the margin of a stackoverflow posting is too small to contain this ...