stack operations on a basic C program

stack operations on a basic C program - disassembly

Im disassembling this basic C code, trying to figure out what operations
are done on the stack. Im doing in it on a vm, 32 bit, gcc 4.4.3, ubuntu based
distro. I compiled the code with this flags.
gcc -ggdb -mpreferred-stack-boundary=2 -fno-stack-protector -o ExploitMe ExploitMe.c
#include<stdio.h>
#include<string.h>
main(int argc, char **argv)
{
char buffer[80];
strcpy(buffer, argv[1]);
return 1;
}
The problems is that i cannot figure out why on operation 3, the stack
pointer is moved 0x58, the char is 80 characters long, shouldnt it be 0x50 ?
dump of assembler code for function main:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
=> 0x080483e7 <+3>: sub $0x58,%esp
0x080483ea <+6>: mov 0xc(%ebp),%eax
0x080483ed <+9>: add $0x4,%eax
0x080483f0 <+12>:mov (%eax),%eax
0x080483f2 <+14>:mov %eax,0x4(%esp)
0x080483f6 <+18>:lea -0x50(%ebp),%eax
0x080483f9 <+21>:mov %eax,(%esp)
0x080483fc <+24>:call 0x804831c <strcpy#plt>
0x08048401 <+29>:mov $0x1,%eax
0x08048406 <+34>:leave
0x08048407 <+35>:ret
End of assembler dump.
Im stuck on it, i see later that is taking the exected lenght but what
is the program making between those ops ?¿
0x080483f6 <+18>:lea -0x50(%ebp),%eax
Thank you

The compiler is free to arrange the stack however it sees fit.

The other 8 bytes are for the arguments to strcpy. Rather than push them on to the stack, the compiler has realised that it can simply subtract an extra 8 bytes from the stack pointer and then store the registers to memory. This means that the stack pointer only has to be adjusted once.

it is probably allocating a couple more locations for storing the passed in parameters (argv, argc). and/or it needs some more local storage. Compilers do whatever they want to implement the high level code, the same code will produce dozens/hundreds of different assembly langauge sequences depending on the compiler, version, and optimization settings as well as configure/build settings when the compiler itself was compiled.
You often see this sort of a stack frame though and usually due to a combination of performance and instruction set features/limitations. Much easier to code and debug if you move the stack pointer once or make a copy of it with another register, within the function everything is referenced to one static point while the prepparing, calling, and cleaning up of functions messes with the real stack pointer.
You will often also see that the stack frame leaves room for the passed in parameters and other local variables even if optimization has removed the need for those variables to actually spend any time on the stack. Up front the need for a stack frame and size is determined and optimization comes later and the compiler doesnt always go back and realize that if it makes another pass on the function it can make the stack frame smaller. Likewise the compiler writer can more easily debug if they know that their stack frame always starts with passed in parameters then the local variables in order, very fast and easy to read and debug the code, just an example.
Bottom line though is Oli's answer, the compiler can do whatever it wants so long as it implements your code. My extension to that is the output from the same high level code varies widely depending on the compiler and options. And it is rarely perfectly optimized.

Related

what's happened when process is attacked by "stack buffer overflow"?

I'm a student learning computer security. Recently, I learned stack buffer overflow on c.
I understood its concepts and run sample codes written by c.
void main(){
char buf[] = "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0";
int* p;
p = (int*)&p + 2;
*p = (int)buf;
return;
}
Runtime Environment
Architecture: i686
OS: ubuntu 16.04 32bit
Compiler: gcc
Turn off ASLR(sysctl -w kernel.randomize_va_space=0)
Options: gcc -z execstack -mpreferred-stack-boundary=2 -fno-stack-protector
But I confuse what stack is saved and which memories are overlapped.
Above binary code, "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0",
the same assembly code is
.global main
main:
jmp strings
start:
xor %eax, %eax
movb $0xb, %al
xor %edx, %edx
xor %ecx, %ecx
popl %ebx
int $0x80
strings:
call start
.string "/bin/sh"
means execve("/bin/sh", NULL, NULL);.
When buffer overflow occurs, the binaries are overlapped return addresses of main on stack. But, I'm understood to be that the stack stores data s.t local variables, previous frame pointers, and return address.
I think the above binaries are not data, actually instructions. If so, why is this valid? The stack stores instructions and executes one-by-one by popping them? Or I misunderstand something?
And if the stack stores instructions, how do previous stack frame pointers(fp) and return addresses(ra) work?
I learned that previous function's stack frame address is stored in fp and next instruction's address on code area is stored in ra. So, when called function is terminated, sp is popped and then ra does to restore previous function state and run next instruction. Is it correct? Or I misunderstand something?
I want to know really this..
Thank you for your help.

Data are instructions are instructions are instructions.
The stack is memory is memory is memory.
That's just that.
Since the stack is ordinary memory, just like what you get with malloc, only growing downward and used implicitly by some instructions, you can put any data on the stack.
Since instructions are data, it follows that you can put instructions on the stack.
This particular exploitation works by overwriting the return address with a specific value and everything above it with a sequence of instructions.
That's why you need to tell GCC to make the stack executable (the code is on the stack) and not to generate a canary (both of these protections will suffice to prevent the attack) and also you need to tell Linux not to randomize the process address space layout (or the specific, fixed, value used to overwrite the return address won't work).
The fp and ra thing is most likely for a RISC architecture, x86 doesn't have such registers.
The execution flow is redirected when main returns (with ret), that's what ret does.
Look in Intel's manuals how the call/ret pair works and then see it in practice by just stepping into a call with a debugger.
Make sure you understand the calling convention and keep an eye on the stack every time you step.

Why does main initialize stack frame when there are no variables

why does this code:
#include "stdio.h"
int main(void) {
puts("Hello, World!");
}
decide to initialize a stack frame? Here is the assembly code:
.LC0:
.string "Hello, World!"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov eax, 0
pop rbp
ret
Why does the compiler initialize a stack frame only for it to be destroyed later, withoput it ever being used? This surely wont cause any errors on the outside of the main function because I never use the stack, so I wont cause any errors. Why is it compiled this way?

Having these steps in every compiled function is the "baseline" for the compiler, unoptimized. It looks clean in disassembly, and makes sense. However, the compiler can optimize the output to reduce overhead from code that has no real effect. You can see this by compiling with different optimization levels.
What you got is like this:
.LC0:
.string "Hello, World!"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov eax, 0
pop rbp
ret
That's compiled in GCC with no optimization.
Adding the flag -O4 gives this output:
.LC0:
.string "Hello, World!"
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
call puts
xor eax, eax
add rsp, 8
ret
You'll notice that this still moves the stack pointer, but it skips changing the base pointer, and avoid the time-consuming memory access associated with that.
The stack is assumed to be aligned on a 16-byte boundary. With the return address having been pushed, this leaves another 8 bytes to be subtracted to get to the boundary before the function call.

It's very common for compilers to generate unoptimized code in the least complicated way possible (or at least the least complicated way that doesn't lead to code that's so bad that the optimizer won't be able to fix it) to keep the code simple and to stick to the one-responsibility principle (in the sense that making code more efficient is the optimizer's job).
Generating code to initialize the stack for all functions is less complicated than only doing so where necessary. Since the optimizer will be able to remove the unnecessary code anyway (and it will do so in more cases than a simple "does this function have any local variables?" check would), generating the unnecessary code won't have any effect as long as optimizations are enabled (and if they're not, it's expected that the generated code will contain inefficiencies).
If we did add a "does this function have any local variables?" check to the function that generates the stack-initialization code, we'd be re-inventing a less powerful version of an optimization that the optimizer already performs anyway, so we'd be violating the one-responsibility principle and increasing the complexity of the part of the compiler that could otherwise be relatively simple (as opposed to the optimizer, which is full of complicated algorithms anyway).

The stack frame makes it possible to inspect the call stack during runtime. This is useful:
when debugging
in code that relies on __builtin_frame_address(level) with level > 0
As already pointed out by others, a compiler may omit the stackframe on higher optimization levels.
See also:
How do you get gcc's __builtin_frame_address to work with -O2?

Assembly: push vs movl

I have some C code that I compiled with gcc:
int main() {
int x = 1;
printf("%d\n",x);
return 0;
}
I've run it through gdb 7.9.1 and come up with this assembler code for main:
0x0000000100000f40 <+0>: push %rbp # save original frame pointer
0x0000000100000f41 <+1>: mov %rsp,%rbp # stack pointer is new frame pointer
0x0000000100000f44 <+4>: sub $0x10,%rsp # make room for vars
0x0000000100000f48 <+8>: lea 0x47(%rip),%rdi # 0x100000f96
0x0000000100000f4f <+15>: movl $0x0,-0x4(%rbp) # put 0 on the stack
0x0000000100000f56 <+22>: movl $0x1,-0x8(%rbp) # put 1 on the stack
0x0000000100000f5d <+29>: mov -0x8(%rbp),%esi
0x0000000100000f60 <+32>: mov $0x0,%al
0x0000000100000f62 <+34>: callq 0x100000f74
0x0000000100000f67 <+39>: xor %esi,%esi # set %esi to 0
0x0000000100000f69 <+41>: mov %eax,-0xc(%rbp)
0x0000000100000f6c <+44>: mov %esi,%eax
0x0000000100000f6e <+46>: add $0x10,%rsp # move stack pointer to original location
0x0000000100000f72 <+50>: pop %rbp # reclaim original frame pointer
0x0000000100000f73 <+51>: retq
As I understand it, push %rbb pushes the frame pointer onto the stack, so we can retrieve it later with pop %rbp. Then, sub $0x10,%rsp clears 10 bytes of room on the stack so we can put stuff on it.
Later interactions with the stack move variables directly into the stack via memory addressing, rather than pushing them onto the stack:
movl $0x0, -0x4(%rbp)
movl $0x1, -0x8(%rbp)
Why does the compiler use movl rather than push to get this information onto the stack?
Does referencing the register after the memory address also put that value into that register?

It is very common for modern compilers to move the stack pointer once at the beginning of a function, and move it back at the end. This allows for more efficient indexing because it can treat the memory space as a memory mapped region rather than a simple stack. For example, values which are suddenly found to be of no use (perhaps due to an optimized shortcutted operator) can be ignored, rather than forcing one to pop them off the stack.
Perhaps in simpler days, there was a performance reason to use push. With modern processors, there is no advantage, so there's no reason to make special cases in the compiler to use push/pop when possible. It's not like compiler-written assembly code is readable!

While Cort is correct, there is another important reason for this practice of apparently allocating space on the stack. According to the ABI, function calls must find the stack 16 byte aligned. Rather than fiddling with the stack every single time a call needs to be made from a function, it is generally easier and more efficient to adjust the stack for proper alignment first and then modify the values that might otherwise have been pushed onto it.
So, the stack is absolutely adjusted for local variable space, but it is also adjusted to provide correct stack alignment for calls into the standard library.

I'm not an authority on assemblers or compilers but I've played around with MASM back in the day and did spend a whole bunch of time with WinDbg while debugging production C++ issues.
I think the answer to your question is because it's easier.
push/pop instructions write to and read from the stack but they also modify the stack as they are processed. C/C++ compiler uses stack for all its local variables. It does it by shifting stack pointer by the exact number of bytes that is needed to hold all local variables and it does so right when you enter the function.
After that reading and writing all those variables can be done from anywhere in the function and also as many times as you want by simply using the mov instructions. If you look at pure assembly, you might question why create a hole in the stack just to copy two values into that space using mov when you could have done two push instructions.
But look at it from compiler author perspective. The process of entering a function, and allocating stack for local variables is coded separately and is completely decoupled from the process of reading/writing those variables.

Do functions occupy memory space?

void demo()
{
printf("demo");
}
int main()
{
printf("%p",(void*)demo);
return 0;
}
The above code prints the address of function demo.
So if we can print the address of a function, that means that this function is present in the memory and is occupying some space in it.
So how much space it is occupying in the memory?

You can see for yourself using objdump -r -d:
0000000000000000 <demo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <demo+0x13>
f: R_X86_64_PC32 printf-0x4
13: 5d pop %rbp
14: c3 retq
0000000000000015 <main>:
EDIT
I took your code and compiled (but not linked!) it. Using objdump you can see the actual way the compiler lays out the code to be run. At the end of the day there is no such thing as a function: for the CPU it's just a jump to some location (that in this listing happens to be labeled). So the size of the "function" is the size of the code that comprises it.
There seems to be some confusion that this is somehow not "real code". Here is what GDB says:
Dump of assembler code for function demo:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov $0x400614,%edi
0x0000000000400536 <+9>: mov $0x0,%eax
0x000000000040053b <+14>: callq 0x400410 <printf#plt>
0x0000000000400540 <+19>: pop %rbp
0x0000000000400541 <+20>: retq
This is exactly the same code, with exactly the same size, patched by the linker to use real addresses. gdb prints offsets in decimal while objdump uses the more favourable hex. As you can see, in both cases the size is 21 bytes.

So if we can print the address of a function, that means that this
function is present in the memory and is occupying some space in it.
Yes, the functions you write are compiled into code that's stored in memory. (In the case of an interpreted language, the code itself is kept in memory and executed by an interpreter.)
So how much space it is occupying in the memory?
The amount of memory depends entirely on the function. You can write a very long function or a very short one. The long one will require more memory. Space used for code generally isn't something you need to worry about, though, unless you're working in an environment with severe memory constraints, such as on a very small embedded system. On desktop computer (or even mobile device) with a modern operating system, the virtual memory system will take care of moving pages of code into or out of physical memory as they're needed, so there's very little chance that your code will consume too much memory.

Of course it's occupying space in memory, the entire program is loaded in memory once you execute it. Typically, the program instructions are stored in the lowest bytes of the memory space, known as the text section. You can read more about that here: http://www.geeksforgeeks.org/memory-layout-of-c-program/

Yes, all functions that you use in your code do occupy memory space. However, the memory space does not necessarily belong exclusively to your function. For example, an inline function would occupy space inside each function from where it is called.
The standard does not provide a way to tell how much space a function occupies in memory, as pointer arithmetic, the trick that lets you compute sizes of contiguous memory regions in the data memory, is not defined for function pointers. Moreover, ISO C forbids conversion of function pointer to object pointer type, so you cannot get around this restriction by casting your function pointer to, say, a char*.
printf("%p",demo);
The above code prints the address of function demo().
That is undefined behavior: %p expects a void*, while you are passing it a void (*)(). You should see a compiler warning, telling that what you are doing is not valid (demo).

As for determining the amount of memory it is occupying, this is not possible at run-time. However, there are other ways you can determine it:
How to get the length of a function in bytes?

The functions are compiled into machine code that will run only on a specific ISA (x86, probably ARM if it's going to run on your phone, etc.) Since different processors may need more or fewer instructions to run the same function, and the length of instructions can also vary, there is no way to know in advance exactly how big the function will be until you compile it.
Even if you know what processor and operating system it will be compiled for, different compilers will create different, equivalent representations of the function depending on which instructions they use and how they optimize the code.
Also, keep in mind a function occupies memory in different ways. I think you are talking about the code itself, which is its own section. During execution, the function can also occupy space on the stack - every time the function is called, more memory is taken up in the form of a stack frame. The amount depends on the number and type of local variables and arguments declared by the function.

Yes however you can declare it as being inline, so the compiler will take the source code and move it where ever you call that function. Or you can also use preprocessor macros. Though do keep in mind using inline will generate larger code but it will execute faster, and the compiler can decide to ignore your inline request if it feels that it will become to large.

what's this between local var and EBP on the stack?

For this simple code:
void main(){
char buf[12];
buf[11]='\xff';
puts(buf);
}
I use gdb to debug this code and get its stack info like this:
0xbffff480: 0x40158ff4 0x40158ff4 0xff0494dc 0x40158ff4
0xbffff490: 0x00000000 0x40016ca0 0xbffff4f8 0x40045de3
0xbffff480 is where "buf" starts, and the last two words are EBP and RET, but what the hell is that between buf and EBP? obviously I don't have any other local vars. And I also tried if I allocate 8 bytes for buf on stack, it just continues from EBP, but if I allocate 9 or more bytes, it always has something in between. Could somebody explain this to me please? Thanks a lot! I am on linux 2.6.9
disassembly for main:
0x080483c4 <main+0>: push %ebp
0x080483c5 <main+1>: mov %esp,%ebp
0x080483c7 <main+3>: sub $0x28,%esp
0x080483ca <main+6>: and $0xfffffff0,%esp
0x080483cd <main+9>: mov $0x0,%eax
0x080483d2 <main+14>: add $0xf,%eax
0x080483d5 <main+17>: add $0xf,%eax
0x080483d8 <main+20>: shr $0x4,%eax
0x080483db <main+23>: shl $0x4,%eax
0x080483de <main+26>: sub %eax,%esp
0x080483e0 <main+28>: movb $0xff,0xfffffff3(%ebp)
0x080483e4 <main+32>: lea 0xffffffe8(%ebp),%eax
0x080483e7 <main+35>: mov %eax,(%esp)
0x080483ea <main+38>: call 0x80482e4
0x080483ef <main+43>: leave
0x080483f0 <main+44>: ret

That would be padding. Your compiler is probably aligning EBP on a 8-byte boundary, because aligned memory is almost always easier and faster to work with (from the processor's point of view). Some data types even require proper alignment to work.
You don't see any padding when you allocate only 8 bytes in your buffer because, in that case, EBP is already properly aligned.

Normally gcc keep the stack aligned to a multiple of 16 for the sake of being able to use SSE instructions. Reading a disassembly of your main would be instructive.

GCC is known to be somewhat trigger happy with stack usage. It often reserves a bit more stack space than strictly needed. Also, it will try to achieve 16-byte stack alignment even when this is not necessary. This seems to be what happens with the first instructions (40 bytes reserved, then %esp alignment to a multiple of 16).
The code you show, however, contains come strange things, especially the sequence from offsets 9 to 27: this is a long, slow, convoluted way of subtracting 16 from %esp, something which could have been done in a single opcode. Subtracting some bytes from %esp at that point is logical in preparation for calling an external function (puts()), and the count (16) respects alignment, but why doing so in such a weird way ?
It might be possible that this sequence is meant to be patched up in some way (e.g. at link time) to support either stack smashing detection code, or some sort of profiling code. I cannot reproduce this on my own systems. You should specify the version of gcc and libc you are using, the exact compilation flags, and the Linux distribution (because distributors may activate some options by default). The "2.6.9" figure is the kernel version, and it has no bearing whatsoever on the problem at hand (it just tells us that the system is quite old).

You should really include an aseembly dump rather than a pure hex dump. but its more than likely one of 2 things:
A stack frame being restored
A stack check to ensure there was no corupption
the start may also contain a stack alignment, forcing one or both of the above