I am trying to familiarize myself with gdb and had a few questions based upon its format and what it shows:
─── Assembly ────
0x00000000004004ed main+0 push %rbp
0x00000000004004ee main+1 mov %rsp,%rbp
!0x00000000004004f1 main+4 movl $0x539,-0x4(%rbp)
What does the memory address on the left column signify, and why is each instruction a variable-width "between" the next address?
What does the second column mean?
.
─── Registers ───────────────────────
rax 0x00000000004004ed
rbx 0x0000000000000000
rcx 0x0000000000000000
Is the value next to the register its memory location, or the value contained in the registry?
.
─── Stack ───────────────────
[0] from 0x000000000040058c in main+47 at main.c:7
What is this line telling us: does the stack start at memory address 0x000000000040058c, and what does the main+47 refer to?
What does the memory address on the left column signify, and why is
each instruction a variable-width "between" the next address?
x86 machine-code instructions are variable length. So some instructions take a single byte while for example movabs $0x12345678abcdef, %rax takes 10. The hard limit is 15 bytes, but only intentional padding with redundant prefixes can get all the way to 15.
Many other architectures are RISC and have fixed-width instructions.
What does the second column mean?
It tells you the relative address from the symbol main. Note that the actual location in memory is not assigned at compile time.
(Editor's note: this is not a PIE executable so the absolute address actually is set at link time. We can tell because the address is 0x00400... in the low 32 bits of address space, not 0x55555555....)
Is the value next to the register its memory location, or the value
contained in the registry?
Registers are not stored in memory (except in rare architectures); registers don't have addresses and are a separate space from memory. It's also not showing the value pointed to by a register that happens to be holding a valid address.
The value shown is the value in the resister itself. Note that rbx and rcx are both showing 0x0.
What is this line telling us: does the stack start at memory address
0x000000000040058c, and what does the main+47 refer to?
(editor's note: this part is wrong but I'm not sure enough exactly what it is to replace it with something else. But 0x40058c is definitely not a plausible value for RSP. main+47 is a code address somewhere inside main, like always for GDB symbol+number).
This is the location of the stack. Your code is small, so main is only taking space less than 48 addresses. Note that memory is normally allocated in blocks, so the stack would not appear at main+7, or whatever immediately follows the movl instruction.
#daShier's answer is mostly right, but is completely wrong about this part:
What is this line telling us: does the stack start at memory address
0x000000000040058c, and what does the main+47 refer to?
I think this is a qword value on the stack (pointed to by RSP). It's probably main's return address, or maybe just a value that was in RBP when main pushed it.
(But a return address is plausible: main starting at 0x4004ed is not far from 0x40058c).
main + 47 = 0x40051c is a code address inside main, corresponding to C source on line 7 of main.c. (main.c:7). This symbol+number is GDB's way of printing addresses in a human-readable way, relative to the closest symbol above them. i.e. what function they're in. ; I think that's the breakpoint you're stopped at when you copy/pasted this. It's telling you where execution is now. Or was when this snapshot of data on the stack was taken.
I'm not sure how you got GDB to print that Stack dump. It's a slightly different format from info stack or backtrace. TUI mode layout reg or any other layout doesn't include a Stack pane.
But anyway, 0x000000000040058c is most certainly not a stack address; it's in the same 4kiB virtual memory page as main so it's in the .text section. (In fact it's only 0x70 bytes past main + 47). That virtual page will be executable and not writeable.
RSP (stack pointer) values are things like 0x7ffff7fd4100, near the top of the lower 48 bits of virtual address space. (The top of the user-space part of the usable (canonical) part of virtual address space on x86-64).
As I said, main+47 is just a code address inside main. It has nothing to do with 47 or 48 bytes of stack space.
Related
I'm a student learning computer security. Recently, I learned stack buffer overflow on c.
I understood its concepts and run sample codes written by c.
void main(){
char buf[] = "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0";
int* p;
p = (int*)&p + 2;
*p = (int)buf;
return;
}
Runtime Environment
Architecture: i686
OS: ubuntu 16.04 32bit
Compiler: gcc
Turn off ASLR(sysctl -w kernel.randomize_va_space=0)
Options: gcc -z execstack -mpreferred-stack-boundary=2 -fno-stack-protector
But I confuse what stack is saved and which memories are overlapped.
Above binary code, "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0",
the same assembly code is
.global main
main:
jmp strings
start:
xor %eax, %eax
movb $0xb, %al
xor %edx, %edx
xor %ecx, %ecx
popl %ebx
int $0x80
strings:
call start
.string "/bin/sh"
means execve("/bin/sh", NULL, NULL);.
When buffer overflow occurs, the binaries are overlapped return addresses of main on stack. But, I'm understood to be that the stack stores data s.t local variables, previous frame pointers, and return address.
I think the above binaries are not data, actually instructions. If so, why is this valid? The stack stores instructions and executes one-by-one by popping them? Or I misunderstand something?
And if the stack stores instructions, how do previous stack frame pointers(fp) and return addresses(ra) work?
I learned that previous function's stack frame address is stored in fp and next instruction's address on code area is stored in ra. So, when called function is terminated, sp is popped and then ra does to restore previous function state and run next instruction. Is it correct? Or I misunderstand something?
I want to know really this..
Thank you for your help.
Data are instructions are instructions are instructions.
The stack is memory is memory is memory.
That's just that.
Since the stack is ordinary memory, just like what you get with malloc, only growing downward and used implicitly by some instructions, you can put any data on the stack.
Since instructions are data, it follows that you can put instructions on the stack.
This particular exploitation works by overwriting the return address with a specific value and everything above it with a sequence of instructions.
That's why you need to tell GCC to make the stack executable (the code is on the stack) and not to generate a canary (both of these protections will suffice to prevent the attack) and also you need to tell Linux not to randomize the process address space layout (or the specific, fixed, value used to overwrite the return address won't work).
The fp and ra thing is most likely for a RISC architecture, x86 doesn't have such registers.
The execution flow is redirected when main returns (with ret), that's what ret does.
Look in Intel's manuals how the call/ret pair works and then see it in practice by just stepping into a call with a debugger.
Make sure you understand the calling convention and keep an eye on the stack every time you step.
Linux x86_64.
gcc 5.x
I was studying the output of two codes, with -fomit-frame-pointer and without (gcc at "-O3" enables that option by default).
pushq %rbp
movq %rsp, %rbp
...
popq %rbp
My question is :
If I globally disable that option, even for, at the extreme, compiling an operating system, is there a catch ?
I know that interrupts use that information, so is that option good only for user space ?
The compilers always generate self consistent code, so disabling the frame pointer is fine as long as you don't use external/hand crafted code that makes some assumption about it (e.g. by relying on the value of rbp for example).
The interrupts don't use the frame pointer information, they may use the current stack pointer for saving a minimal context but this is dependent on the type of interrupt and OS (an hardware interrupt uses a Ring 0 stack probably).
You can look at Intel manuals for more information on this.
About the usefulness of the frame pointer:
Years ago, after compiling a couple of simple routines and looking at the generated 64 bit assembly code I had your same question.
If you don't mind reading a whole lot of notes I have written for myself back then, here they are.
Note: Asking about the usefulness of something is a little bit relative. Writing assembly code for the current main 64 bit ABIs I found my self using the stack frame less and less. However this is just my coding style and opinion.
I like using the frame pointer, writing the prologue and epilogue of a function, but I like direct uncomfortable answers too, so here's how I see it:
Yes, the frame pointer is almost useless in x86_64.
Beware it is not completely useless, especially for humans, but a compiler doesn't need it anymore.
To better understand why we have a frame pointer in the first place it is better to recall some history.
Back in the real mode (16 bit) days
When Intel CPUs supported only "16 bit mode" there were some limitation on how to access the stack, particularly this instruction was (and still is) illegal
mov ax, WORD [sp+10h]
because sp cannot be used as a base register. Only a few designated registers could be used for such purpose, for example bx or the more famous bp.
Nowadays it's not a detail everybody put their eyes on but bp has the advantage over other base register that by default it implicitly implicates the use of ss as a segment/selector register, just like implicit usages of sp (by push, pop, etc), and like esp does on later 32-bit processors.
Even if your program was scattered all across memory with each segment register pointing to a different area, bp and sp acted the same, after all that was the intent of the designers.
So a stack frame was usually necessary and consequently a frame pointer.
bp effectively partitioned the stack in four parts: the arguments area, the return address, the old bp (just a WORD) and the local variables area. Each area being identified by the offset used to access it: positive for the arguments and return address, zero for the old bp, negative for the local variables.
Extended effective addresses
As the Intel CPUs were evolving, the more extensive 32-bit addressing modes were added.
Specifically the possibility to use any 32-bit general-purpose register as a base register, this includes the use of esp.
Being instructions like this
mov eax, DWORD [esp+10h]
now valid, the use of the stack frame and the frame pointer seems doomed to an end.
Likely this was not the case, at least in the beginnings.
It is true that now it is possible to use entirely esp but the separation of the stack in the mentioned four areas is still useful, especially for humans.
Without the frame pointer a push or a pop would change an argument or local variable offset relative to esp, giving form to code that look non intuitive at first sight. Consider how to implement the following C routine with cdecl calling convention:
void my_routine(int a, int b)
{
return my_add(a, b);
}
without and with a framestack
my_routine:
push DWORD [esp+08h]
push DWORD [esp+08h]
call my_add
ret
my_routine:
push ebp
mov ebp, esp
push DWORD [ebp+0Ch]
push DWORD [ebp+08h]
call my_add
pop ebp
ret
At first sight it seems that the first version pushes the same value twice. It actually pushes the two separate arguments however, as the first push lowers esp so the same effective address calculation points the second push to a different argument.
If you add local variables (especially lots of them) then the situation quickly becomes hard to read: Does mov eax, [esp+0CAh] refer to a local variable or to an argument? With a stack frame we have fixed offsets for the arguments and local variables.
Even the compilers at first still preferred the fixed offsets given by the use of the frame base pointer. I see this behavior changing first with gcc.
In a debug build the stack frame effectively adds clarity to the code and makes it easy for the (proficient) programmer to follow what is going on and, as pointed out in the comment, lets them recover the stack frame more easily.
The modern compilers however are good at math and can easily keep count of the stack pointer movements and generate the appropriate offsets from esp, omitting the stack frame for faster execution.
When a CISC requires data alignment
Until the introduction of SSE instructions the Intel processors never asked much from the programmers compared to their RISC brothers.
In particular they never asked for data alignment, we could access 32 bit data on an address not a multiple of 4 with no major complaint (depending on the DRAM data width, this may result on increased latency).
SSE used 16 bytes operands that needed to be accessed on 16 byte boundary, as the SIMD paradigm becomes implemented efficiently in the hardware and becomes more popular the alignment on 16 byte boundary becomes important.
The main 64 bit ABIs now require it, the stack must be aligned on paragraphs (ie, 16 bytes).
Now, we are usually called such that after the prologue the stack is aligned, but suppose we are not blessed with that guarantee, we would need to do one of this
push rbp push rbp
mov rbp, rsp mov rbp, rsp
and spl, 0f0h sub rsp, xxx
sub rsp, 10h*k and spl, 0f0h
One way or another the stack is aligned after these prologues, however we can no longer use a negative offset from rbp to access local vars that need alignment, because the frame pointer itself is not aligned.
We need to use rsp, we could arrange a prologue that has rbp pointing at the top of an aligned area of local vars but then the arguments would be at unknown offsets.
We can arrange a complex stack frame (maybe with more than one pointer) but the key of the old fashioned frame base pointer was its simplicity.
So we can use the frame pointer to access the arguments on the stack and the stack pointer for the local variables, fair enough.
Alas the role of stack for arguments passing has been reduced and for a small number of arguments (currently four) it is not even used and in the future it will probably be used even less.
So we don't use the frame pointer for local variables (mostly), nor for the arguments (mostly), for what do we use it?
It saves a copy of the original rsp, so to restore the stack pointer at function exit, a mov is enough. If the stack is aligned with an and, which is not invertible, an original copy is necessary.
Actually some ABIs guarantee that after the standard prologue the stack is aligned thereby allowing us to use the frame pointer as usual.
Some variables don't need alignment and can be accessed with an unaligned frame pointer, this is usually true for hand crafted code.
Some functions require more than four parameters.
Summary
The frame pointer is a vestigial paradigm from 16 bit programs that has proven itself still useful on 32 bit machines because of its simplicity and clarity when accessing local variables and arguments.
On 64 bit machines however the strict requirements vanish most of the simplicity and clarity, the frame pointer remains used in debug mode however.
On the fact that the frame pointer can be used to make fun things: it is true I guess, I've never seen such code but I can image how it would work.
I, however, focused on the housekeeping role of the frame pointer as this is the way I always have seen it.
All the crazy things can be done with any pointer set to the same value of the frame pointer, I give the latter a more "special" role.
VS2013 for example sometimes uses rdi as a "frame pointer", but I don't consider it a real frame pointer if it doesn't use rbp/ebp/bp.
To me the use of rdi means a Frame Pointer Omission optimization :)
Whenever I use gdb on my programs (32bit, red hat linux), I see that the addresses used in the registers are extremely far away from the addresses that has been linked to the machine instructions.
Using a simple "hello world program"
0x080483a4 <main+0>: lea 0x4(%esp),%ecx
0x080483a8 <main+4>: and $0xfffffff0,%esp
0x080483ab <main+7>: pushl -0x4(%ecx)
....
where the info registers command produces
esp 0xffffb970 0xffffb970
ebp 0xffffb978 0xffffb978
esi 0x9d5ca0 10312864
edi 0x0 0
eip 0x80483b5 0x80483b5 <main+17>
Those esp and ebp shows a frame very different location from where the code is located.
What I think is that...
in the 4gb ram stick, there are 2^32 memory locations, the stack is located more in the top part (where it's 0xFFFFFFFF) and grows down and things like these hello program, the OS itself, other currently running programs are at the bottom and grow up (so it starts at 0x00000000), right beneath the heap which is also on the opposite side of where the stack is.
I'm probably wrong, but I was hoping to get an answer from you guys.
Also, is there more info on what's going on with the rest of the memory? Is there a course/book that covers broad questions similar to what I just asked? I feel like I have to fill in alot of gaps with what's going on in my intro assembly language class.
Thank you.
Unless you have a time machine and go back to the 1960's all addresses that you see are virtual addresses. They could be right near each other in physical memory but still one has a high address and the other low. If your OS is allocating 4K pages, only the low 12 address bits are not mapped by hardware.
I've seen an annotation for pushing/popping multiple registers in the same line, e.g:
push {fp, lr}
I couldn't find out who is pushed first - fp or lr?
An additional question - does SP point to the last occupied address in the stack or the first free one?
From the ARM ARM:
The registers are stored in sequence, the lowest-numbered register to the lowest memory address (start_address), through to the highest-numbered register to the highest memory address (end_address)
On ARM, the stack pointer normally points to the last occupied address on the stack. For example, when setting up the initial stack pointer, you normally initialize with with the address one past the end of the stack.
PUSH is just a synonym for STMDB using sp as the base register. The DB indicates the 'decrement-before' addressing mode.
Using this example coming from wikipedia, in which DrawSquare() calls DrawLine(),
(Note that this diagram has high addresses at the bottom and low addresses at the top.)
Could anyone explain me what ebp and esp are in this context?
From what I see, I'd say the stack pointer points always to the top of the stack, and the base pointer to the beginning of the the current function? Or what?
edit: I mean this in the context of windows programs
edit2: And how does eip work, too?
edit3: I have the following code from MSVC++:
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
hInstance= dword ptr 8
hPrevInstance= dword ptr 0Ch
lpCmdLine= dword ptr 10h
nShowCmd= dword ptr 14h
All of them seem to be dwords, thus taking 4 bytes each. So I can see there is a gap from hInstance to var_4 of 4 bytes. What are they? I assume it is the return address, as can be seen in wikipedia's picture?
(editor's note: removed a long quote from Michael's answer, which doesn't belong in the question, but a followup question was edited in):
This is because the flow of the function call is:
* Push parameters (hInstance, etc.)
* Call function, which pushes return address
* Push ebp
* Allocate space for locals
My question (last, i hope!) now is, what is exactly what happens from the instant I pop the arguments of the function i want to call up to the end of the prolog? I want to know how the ebp, esp evolve during those moments(I already understood how the prolog works, I just want to know what is happening after i pushed the arguments on the stack and before the prolog).
esp is as you say it is, the top of the stack.
ebp is usually set to esp at the start of the function. Function parameters and local variables are accessed by adding and subtracting, respectively, a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which enables stack walking in a debugger and viewing other frame's local variables to work.
Most function prologs look something like:
push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.
Then later in the function you may have code like (presuming both local variables are 4 bytes)
mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local
FPO or frame pointer omission optimization which you can enable will actually eliminate this and use ebp as another register and access locals directly off of esp, but this makes debugging a bit more difficult since the debugger can no longer directly access the stack frames of earlier function calls.
EDIT:
For your updated question, the missing two entries in the stack are:
nShowCmd = dword ptr +14h
hlpCmdLine = dword ptr +10h
PrevInstance = dword ptr +0Ch
hInstance = dword ptr +08h
return address = dword ptr +04h <==
savedFramePointer = dword ptr +00h <==
var_4 = dword ptr -04h
var_8 = dword ptr -08h
var_C = dword ptr -0Ch
This is because the flow of the function call is:
Push parameters (hInstance, PrevInstance, hlpCmdLine, nShowCmd)
Call function, which pushes return address
Push ebp
Allocate space for locals
ESP (Stack Pointer) is the current stack pointer, which will change any time a word or address is pushed or popped on/off the stack. EBP (Base Pointer) is a more convenient way for the compiler to keep track of a function's parameters and local variables than using the ESP directly.
Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack by the calling function (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP, Instruction Pointer) onto the stack.
Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the Frame Pointer, because it points to the center of the function call frame.
Upon exit, all the function has to do is set ESP to the value of EBP (which deallocates the local variables from the stack, and exposes the entry EBP on the top of the stack), then pop the old EBP value from the stack, and then the function returns (popping the return address into EIP).
Upon returning back to the calling function, it can then increment ESP in order to remove the function arguments it pushed onto the stack just prior to calling the other function. At this point, the stack is back in the same state it was in prior to invoking the called function.
You have it right. The stack pointer points to the top item on the stack and the base pointer points to the "previous" top of the stack before the function was called.
When you call a function, any local variable will be stored on the stack and the stack pointer will be incremented. When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer (which was the "previous" top before the function call).
Doing memory allocation this way is very, very fast and efficient.
EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. I try to add some info you might be interested in using Visual Studio.
Storing the caller EBP as the first local variable is called a standard stack frame, and this may be used for nearly all calling conventions on Windows. Differences exist whether the caller or callee deallocates the passed parameters, and which parameters are passed in registers, but these are orthogonal to the standard stack frame problem.
Speaking about Windows programs, you might probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, that makes it nearly impossible to do walk the stack without using the dbghlp library and the PDB file for the executable.
This Frame Pointer Omission means that the compiler does not store the old EBP on a standard place and uses the EBP register for something else, therefore you have hard time finding the caller EIP without knowing how much space the local variables need for a given function. Of course Microsoft provides an API that allows you to do stack-walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.
To avoid FPO in your compilation units, you need to avoid using /O2 or need to explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will have hard time to do stack walks without the dbghlp.dll.
First of all, the stack pointer points to the bottom of the stack since x86 stacks build from high address values to lower address values. The stack pointer is the point where the next call to push (or call) will place the next value. It's operation is equivalent to the C/C++ statement:
// push eax
--*esp = eax
// pop eax
eax = *esp++;
// a function call, in this case, the caller must clean up the function parameters
move eax,some value
push eax
call some address // this pushes the next value of the instruction pointer onto the
// stack and changes the instruction pointer to "some address"
add esp,4 // remove eax from the stack
// a function
push ebp // save the old stack frame
move ebp, esp
... // do stuff
pop ebp // restore the old stack frame
ret
The base pointer is top of the current frame. ebp generally points to your return address. ebp+4 points to the first parameter of your function (or the this value of a class method). ebp-4 points to the first local variable of your function, usually the old value of ebp so you can restore the prior frame pointer.
Long time since I've done Assembly programming, but this link might be useful...
The processor has a collection of registers which are used to store data. Some of these are direct values while others are pointing to an area within RAM. Registers do tend to be used for certain specific actions and every operand in assembly will require a certain amount of data in specific registers.
The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. The stack pointer will point at the next location where new data can be pushed to the stack, where it will stay until it's popped back again.
Base registers or segment registers just point to the address space of a large amount of data. Combined with a second regiser, the Base pointer will divide the memory in huge blocks while the second register will point at an item within this block. Base pointers therefor point to the base of blocks of data.
Do keep in mind that Assembly is very CPU specific. The page I've linked to provides information about different types of CPU's.