Why am I allowed to exit main using ret? - c

I am trying to figure out exactly how a program's stack is set up.
I have learned that calling the function with
call pointer;
Is effectively the same as:
mov register, pc ;programcounter
add register, 1 ; where 1 is one instruction not 1 byte ...
push register
jump pointer
However, this would mean that when the Unix Kernel calls the main function that the stack base should point to reentry in the kernel function which calls main.
Therefore jumping "*rbp-1" in the C - Code should reenter the main function.
This, however, is not what happens in the following code:
#include <stdlib.h>
#include <unistd.h>
extern void ** rbp(); //pointer to stack pointing to function
int main() {
void ** p = rbp();
printf("Main: %p\n", main);
printf("&Main: %p\n", &main); //WTF
printf("*Main: %p\n", *main); //WTF
printf("Stackbasepointer: %p\n", p);
int (*c)(void) = (*p)-4;
asm("movq %rax, 0");
c();
return 0; //should never be executed...
}
Assembly file: rsp.asm
...
.intel_syntax
.text:
.global _rbp
_rbp:
mov rax, rbp
ret;
This is not allowed, unsurprisingly, maybe because the instructions at this point are not exactly 64 bits, maybe because UNIX does not allow this...
But also this call is not allowed:
void (*c)(void) = (*p);
asm("movq %rax, 0"); //Exit code is 11, so now it should be 0
c(); //this comes with stack corruption, when successful
This means I am not obliged to exit the main - calling function.
My question then is: why am I allowed to do so when I use ret, as seen at the end of every GCC main function, which should do effectively the same as the code above? And how does a Unix system check for such attempts effectively?
I hope my question is clear...
Thank you.
P.S.: Code compiles only on macOS, change assembly for linux

C main is called (indirectly) from CRT startup code, not directly from the kernel.
After main returns, that code calls atexit functions to do stuff like flushing stdio buffers, then passes main's return value to a raw _exit system call. Or exit_group which exits all threads.
You make several wrong assumptions, all I think based on a misunderstanding of how kernels work.
The kernel runs at a different privilege level from user-space (ring 0 vs. ring 3 on x86). Even if user-space knew the right address to jump to, it can't jump into kernel code. (And even if it could, it wouldn't be running with kernel privilege level).
ret isn't magic, it's basically just pop %rip and doesn't let you jump anywhere you couldn't jump to with other instructions. Also doesn't change privilege level (see footnote 1).
Kernel addresses aren't mapped / accessible when user-space code is running; those page-table entries are marked as supervisor-only. (Or they're not mapped at all in kernels that mitigate the Meltdown vulnerability, so entering the kernel goes through a "wrapper" block of code that changes CR3.)
Virtual memory is how the kernel protects itself from user-space. User-space can't modify page tables directly, only by asking the kernel to do it via mmap and mprotect system calls. (And user-space can't execute privileged instructions like mov cr3, rax to install new page tables. That's the purpose of having ring 0 (kernel mode) vs. ring 3 (user mode).)
The kernel stack is separate from the user-space stack for a process. (In the kernel, there's also a small kernel stack for each task (aka thread) that's used during system calls / interrupts while that user-space thread is running. At least that's how Linux does it, IDK about others.)
The kernel doesn't literally call user-space code; the user-space stack doesn't hold any return address back into the kernel. A kernel->user transition involves swapping stack pointers as well as changing privilege levels, e.g. with an instruction like iret (interrupt-return).
Plus, leaving a kernel code address anywhere user-space can see it would defeat kernel ASLR.
Footnote 1: (The compiler-generated ret will always be a normal near ret, not a retf that could return through a call gate or something to a privileged cs value. x86 handles privilege levels via the low 2 bits of CS but nevermind that. MacOS / Linux don't set up call gates that user-space can use to call into the kernel; that's done with syscall or int 0x80 instructions.)
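As a small illustration of the point that user-space changes its own mappings only by asking the kernel, here is a minimal sketch using mmap and mprotect. The helper name page_permission_demo is made up for this example, and it assumes a POSIX system where MAP_ANONYMOUS is available.

```c
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one page, write to it, then ask the kernel
 * to make it read-only. Returns the value read back, or -1 on failure. */
int page_permission_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Ask the kernel for one private, zero-filled, read/write page. */
    int *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 42; /* allowed: the page is writable */

    /* Only the kernel edits the page tables; we can merely request it. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    int value = p[0]; /* reads still work; a write here would raise SIGSEGV */
    munmap(p, (size_t)page);
    return value;
}
```

Every permission change goes through a system call; there is no user-mode instruction in this sketch that touches the page tables directly.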
In a fresh process (after an execve system call replaced the previous process with this PID with a new one), execution begins at the process entry point (usually labeled _start), not at the C main function directly.
C implementations come with CRT (C RunTime) startup code that has (among other things) a hand-written asm implementation of _start which (indirectly) calls main, passing args to main according to the calling convention.
_start itself is not a function. On process entry, RSP points at argc, and above that on the user-space stack is argv[0], argv[1], etc. (i.e. the char *argv[] array is right there by value, and above that the envp array.) _start loads argc into a register and puts pointers to the argv and envp into registers. (The x86-64 System V ABI that MacOS and Linux both use documents all this, including the process-startup environment and the calling convention.)
If you try to ret from _start, you're just going to pop argc into RIP, and then code-fetch from absolute address 1 or 2 (or other small number) will segfault. For example, Nasm segmentation fault on RET in _start shows an attempt to ret from the process entry point (linked without CRT startup code). It has a hand-written _start that just falls through into main.
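To make the entry-point layout concrete, here is a rough model in C of what a _start routine computes from the words at the initial stack pointer. It is only a simulation: the struct and function names are invented for this sketch, and the "stack" is an ordinary array of pointers with argc stored in the first slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of walking the words found at the initial stack pointer. */
struct startup_args {
    int argc;
    char **argv;
    char **envp;
};

/* Hypothetical model of what _start derives. `sp` points at argc,
 * followed by argc argv pointers, a NULL, the envp pointers, and a NULL. */
struct startup_args parse_initial_stack(char **sp)
{
    struct startup_args a;
    a.argc = (int)(uintptr_t)sp[0];   /* first word is argc, not a pointer */
    a.argv = sp + 1;                  /* argv[] is stored in place */
    a.envp = a.argv + a.argc + 1;     /* skip argv[] and its NULL terminator */
    return a;
}
```

Feeding it the words { 2, "prog", "arg", NULL, "HOME=/", NULL } yields argc 2, argv pointing at "prog", and envp pointing at "HOME=/". It also shows why a ret at this point goes wrong: the word at the top of the stack is argc, not a return address.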
When you run gcc main.c, the gcc front-end runs multiple other programs (use gcc -v to show details). This is how the CRT startup code gets linked into your process:
gcc preprocesses (CPP) and compiles+assembles main.c to main.o (or a temporary file). On MacOS, the gcc command is actually clang which has a built-in assembler, but real gcc really does compile to asm and then run as on that. (The C preprocessor is built-in to the compiler, though.)
gcc runs something like ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie /usr/lib/Scrt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtbeginS.o main.o -lc -lgcc /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtendS.o. That's actually simplified a lot, with some of the CRT files left out, and paths canonicalized to remove ../../lib parts. Also, it doesn't run ld directly, it runs collect2 which is a wrapper for ld. But anyway, that statically links in those .o CRT files that contain _start and some other stuff, and dynamically links libc (-lc) and libgcc (for GCC helper functions like implementing __int128 multiply and divide with 64-bit registers, in case your program uses those).
.intel_syntax
.text:
.global _rbp
_rbp:
mov rax, rbp
ret;
This is not allowed, ...
The only reason that doesn't assemble is because you tried to declare .text: as a label, instead of using the .text directive. If you remove the trailing : it does assemble with clang (which treats .intel_syntax the same as .intel_syntax noprefix).
For GCC / GAS to assemble it, you'd also need the noprefix to tell it that register names aren't prefixed by %. (Yes you can have Intel op dst, src order but still with %rsp register names. No you shouldn't do this!) And of course GNU/Linux doesn't use leading underscores.
Not that it would always do what you want if you called it, though! If you compiled main without optimization (so -fno-omit-frame-pointer was in effect), then yes you'd get a pointer to the stack slot below the return address.
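And you definitely use the value incorrectly. (*p)-4; loads the saved RBP value (*p) and then subtracts 4 from it. *p has type void* because p has type void **, and arithmetic on void* is a GNU extension that counts in bytes, so the result is the saved RBP minus 4 bytes. (p-4 would instead step back by four 8-byte pointers, because that's how C pointer math works. The return address is in p[1], the slot above the saved RBP.)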
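Pointer arithmetic scales by the size of the pointed-to type, so the unit of an offset depends on which pointer it is applied to. A small sketch (the helper names are made up) that measures this for a void ** and for a char *:

```c
#include <stddef.h>

/* Byte distance covered by stepping a void ** back by n elements (n <= 15). */
size_t bytes_for_slot_steps(size_t n)
{
    void *slots[16] = {0};
    void **p = &slots[15];
    void **q = p - n;                 /* scaled by sizeof(void *) */
    return (size_t)((char *)p - (char *)q);
}

/* Byte distance covered by stepping a char * back by n elements (n <= 15). */
size_t bytes_for_byte_steps(size_t n)
{
    char bytes[16] = {0};
    char *p = &bytes[15];
    char *q = p - n;                  /* scaled by sizeof(char), which is 1 */
    return (size_t)(p - q);
}
```

On x86-64, where pointers are 8 bytes, stepping a void ** by 4 moves 32 bytes, while stepping a char * by 4 moves 4 bytes.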
I think you're trying to get your own return address and re-run the call instruction (in main's caller) that reached main, eventually leading to a stack overflow from pushing more return addresses. In GNU C, use void * __builtin_return_address (0) to get your own return address.
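A minimal sketch of that builtin, assuming GCC or clang (the function names are invented here). It only reads the return address; it does not jump anywhere.

```c
#include <stdio.h>

/* noinline keeps this a real call, so it has a return address of its own. */
__attribute__((noinline)) void *own_return_address(void)
{
    return __builtin_return_address(0);
}

/* Prints where own_return_address() will return to: an address inside
 * this function, just after the call instruction. */
void show_return_address(void)
{
    printf("return address: %p\n", own_return_address());
}
```

The printed value points just past the call instruction, so finding the start of that call still requires knowing its length, which is the subject of the next paragraph.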
x86 call rel32 instructions are 5 bytes, but the call that called main was probably an indirect call, using a pointer in a register. So it might be a 2-byte call *%rax or a 3-byte call *%r12; you don't know unless you disassemble your caller. (I'd suggest single-stepping by instructions (GDB / LLDB stepi) off the end of main using a debugger in disassembly mode. If it has any symbol info for main's caller, you'll be able to scroll backward and see what the previous instruction was.)
If not, you might have to try and see what looks sane; x86 machine code can't be unambiguously decoded backwards because it's variable-length. You can't tell the difference between a byte within an instruction (like an immediate or ModRM) vs. the start of an instruction. It all depends on where you start disassembling from. If you try a few byte offsets, usually only one will produce anything that looks sane.
asm("movq %rax, 0"); //Exit code is 11, so now it should be 0
This is a store of RAX to absolute address 0, in AT&T syntax. This of course segfaults. Exit code 11 is from SIGSEGV, which is signal 11. (Use kill -l to see signal numbers.)
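The connection between a store to address 0 and the signal number can be checked from C. This is a sketch under the assumption of a POSIX system where page 0 is unmapped; the function name is made up.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that stores to address 0 and returns the number of the
 * signal that killed it, or -1 if it was not killed by a signal. */
int signal_from_null_store(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Read the address through a volatile so the compiler cannot
         * see that it is 0 and transform the store. */
        volatile uintptr_t addr = 0;
        *(volatile int *)addr = 0;
        _exit(0); /* not reached if the store faults */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Comparing the result against SIGSEGV rather than a literal 11 keeps the check independent of the platform's signal numbering.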
Perhaps you wanted mov $0, %eax. Although that's still pointless here, you're about to call through your function pointer. In debug mode, the compiler might load it into RAX and step on your value.
Also, writing a register in an asm statement is never safe when you don't tell the compiler which registers you're modifying (using constraints).
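For comparison, here is a sketch of the same idea written with an output constraint, so the compiler knows which register the asm statement writes. It assumes GCC or clang with the default AT&T syntax; the function name is made up, and the non-x86 branch is only there so the sketch builds elsewhere.

```c
/* Returns 0, produced by an inline-asm instruction on x86. */
unsigned zero_via_asm(void)
{
    unsigned value;
#if defined(__x86_64__) || defined(__i386__)
    /* "=r"(value): the compiler picks a register for %0 and is told
     * that the instruction writes it. */
    __asm__("movl $0, %0" : "=r"(value));
#else
    value = 0; /* fallback so the sketch builds on other architectures */
#endif
    return value;
}
```

Because the register is chosen by the compiler, nothing it was keeping in RAX (such as a function pointer it is about to call through) gets stepped on.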
printf("Main: %p\n", main);
printf("&Main: %p\n", &main); //WTF
main and &main are the same thing because main is a function. That's just how C syntax works for function names. main isn't an object that can have its address taken. (See also: & operator optional in function pointer assignment.)
It's similar for arrays: the bare name of an array can be assigned to a pointer or passed to functions as a pointer arg. But &array is also the same pointer, same as &array[0]. This is true only for arrays like int array[10], not for pointers like int *ptr; in the latter case the pointer object itself has storage space and can have its own address taken.
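These rules can be checked directly. A short sketch (all names here are invented for the example):

```c
static int answer(void) { return 42; }

/* 1 if answer, &answer and *answer all yield the same function pointer. */
int function_names_agree(void)
{
    int (*a)(void) = answer;
    int (*b)(void) = &answer;
    int (*c)(void) = *answer;
    return a == b && b == c;
}

/* 1 if array, &array and &array[0] all point at the same address. */
int array_names_agree(void)
{
    int array[10] = {0};
    void *decayed = array;
    void *whole = &array;
    void *first = &array[0];
    return decayed == whole && whole == first;
}

/* 1 if a pointer variable's own address differs from what it points to. */
int pointer_has_own_address(void)
{
    int array[10] = {0};
    int *ptr = array;
    return (void *)&ptr != (void *)ptr;
}
```

All three helpers return 1: the three spellings of a function name agree, the three spellings of an array address agree, and a pointer variable is a separate object with its own storage.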

I think there are quite a few misunderstandings you have here. First, main is not what gets called by the kernel. The kernel allocates a process and loads our binary into memory - usually from an ELF file if you are using a Unix-based OS. This ELF file contains all of the sections that need to be mapped into memory and an address that is the "Entry Point" for the code in the ELF (among other things). The ELF can specify any address for the loader to jump to in order to start launching the program. In applications built with GCC, this is a function called _start. _start then sets up the stack and does any other initialization it needs to before calling __libc_start_main, which is a libc function that can do additional setup before calling main.
Here is an example of a start function:
00000000000006c0 <_start>:
6c0: 31 ed xor %ebp,%ebp
6c2: 49 89 d1 mov %rdx,%r9
6c5: 5e pop %rsi
6c6: 48 89 e2 mov %rsp,%rdx
6c9: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
6cd: 50 push %rax
6ce: 54 push %rsp
6cf: 4c 8d 05 0a 02 00 00 lea 0x20a(%rip),%r8 # 8e0 <__libc_csu_fini>
6d6: 48 8d 0d 93 01 00 00 lea 0x193(%rip),%rcx # 870 <__libc_csu_init>
6dd: 48 8d 3d 7c ff ff ff lea -0x84(%rip),%rdi # 660 <main>
6e4: ff 15 f6 08 20 00 callq *0x2008f6(%rip) # 200fe0 <__libc_start_main@GLIBC_2.2.5>
6ea: f4 hlt
6eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
As you can see, this function sets the value of the stack pointer and the stack base pointer. Therefore, there is no valid stack frame in this function. The frame pointer is not even set to anything but 0 until you call main (at least by this compiler).
Now what is important to see here is that the stack was initialized in this code, and by the loader; it is not a continuation of the kernel's stack. Each program has its own stack, and these are all different from the kernel's stack. In fact, even if you knew the address of the stack in the kernel, you could not read from it or write to it from your program, because your process can only see the pages of memory that have been mapped for it, which the MMU enforces under the kernel's control.
Just to clarify, when I said the stack was "created" I did not mean that it was allocated. I only mean that the stack pointer and stack base are set here. The memory for it is allocated when the program is loaded, and pages are added to it as needed whenever a page fault is triggered by a write to an unallocated part of the stack. Upon entering _start there is clearly some stack in existence, as is evident from the pop %rsi instruction; however, these are not the final stack pointer values that the program will use. Those are set up in _start (maybe they get changed in __libc_start_main later on, I'm not sure).

However, this would mean that when the Unix Kernel calls the main function that the stack base should point to reentry in the kernel function which calls main.
Absolutely not.
This particular question covers the details for MacOS, please have a look. In any case main is most likely returning to start function of the C standard library. Details of implementation differ between different *nix operating systems.
Therefore jumping "*rbp-1" in the C - Code should reenter the main function.
You have no guarantee about what the compiler will emit or what the state of rsp/rbp will be when you call the rbp() function. You can't make such assumptions.
Btw, if you want to access stack entries in 64-bit code, you would do this in +-8 increments (so rbp+8, rbp-8, rsp+8, rsp-8 respectively).

Related

what's happened when process is attacked by "stack buffer overflow"?

I'm a student learning computer security. Recently, I learned about stack buffer overflows in C.
I understood the concept and ran sample code written in C.
void main(){
char buf[] = "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0";
int* p;
p = (int*)&p + 2;
*p = (int)buf;
return;
}
Runtime Environment
Architecture: i686
OS: ubuntu 16.04 32bit
Compiler: gcc
Turn off ASLR(sysctl -w kernel.randomize_va_space=0)
Options: gcc -z execstack -mpreferred-stack-boundary=2 -fno-stack-protector
But I am confused about what is saved on the stack and which memory gets overwritten.
The byte string above, "\xeb\x0b\x31\xc0\xb0\x0b\x31\xd2\x31\xc9\x5b\xcd\x80\xe8\xf0\xff\xff\xff/bin/sh\x0",
is equivalent to this assembly code:
.global main
main:
jmp strings
start:
xor %eax, %eax
movb $0xb, %al
xor %edx, %edx
xor %ecx, %ecx
popl %ebx
int $0x80
strings:
call start
.string "/bin/sh"
which means execve("/bin/sh", NULL, NULL);
When the buffer overflow occurs, these bytes overwrite main's return address on the stack. But as I understand it, the stack stores data such as local variables, previous frame pointers, and return addresses.
I think the bytes above are not data but actually instructions. If so, why is this valid? Does the stack store instructions and execute them one by one by popping them? Or do I misunderstand something?
And if the stack stores instructions, how do the previous stack frame pointer (fp) and the return address (ra) work?
I learned that the previous function's stack frame address is stored in fp and the address of the next instruction in the code area is stored in ra. So, when the called function terminates, sp is popped and then ra is used to restore the previous function's state and run the next instruction. Is that correct? Or do I misunderstand something?
I really want to understand this.
Thank you for your help.
Data are instructions are instructions are instructions.
The stack is memory is memory is memory.
That's just that.
Since the stack is ordinary memory, just like what you get with malloc, only growing downward and used implicitly by some instructions, you can put any data on the stack.
Since instructions are data, it follows that you can put instructions on the stack.
This particular exploitation works by overwriting the return address with a specific value and everything above it with a sequence of instructions.
That's why you need to tell GCC to make the stack executable (the code is on the stack) and not to generate a canary (either of these protections is enough to prevent the attack), and why you also need to tell Linux not to randomize the process address space layout (otherwise the specific, fixed value used to overwrite the return address won't work).
The fp and ra thing is most likely for a RISC architecture, x86 doesn't have such registers.
The execution flow is redirected when main returns (with ret), that's what ret does.
Look in Intel's manuals how the call/ret pair works and then see it in practice by just stepping into a call with a debugger.
Make sure you understand the calling convention and keep an eye on the stack every time you step.
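The idea that instructions are just bytes, and that only page permissions decide whether they can run, can be shown without any overflow. The following is a sketch only: it assumes x86-64 and a POSIX system that permits switching a page to executable, the function name is made up, and it returns -1 anywhere those assumptions do not hold.

```c
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copies six bytes into a fresh page, marks the page executable, and
 * calls it. Returns 42 on x86-64, or -1 if unsupported or refused. */
int run_bytes_as_code(void)
{
#if defined(__x86_64__)
    /* Machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, sizeof code); /* at this point it is plain data */

    /* Without execute permission, jumping to these bytes would fault. */
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    /* Object-to-function pointer cast: common on POSIX, not ISO C. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, (size_t)page);
    return result;
#else
    return -1;
#endif
}
```

The same bytes are data while they are being copied and code once the page is executable. A non-executable stack blocks the attack above in exactly this way: the bytes are still there, but the CPU refuses to fetch instructions from that page.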

Why doesn't my C program have a jmp esp instruction in the binary?

Why can you find jmp esp only in big applications?
In this little program you can't find jmp esp. But why?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
return 0;
}
AT&T jmp *%esp / Intel jmp esp has machine code ff e4. You should be looking for that byte sequence at any offset.
(I assembled a .s with that instruction and used objdump -d to get the machine code.)
There is a lot of discussion in comments from people who thought you were talking about jmp *(%esp) as a ret without pop. For future readers, see Why JMP ESP instead of directly jumping into the stack on security.SE for more about this ret2reg technique to defeat stack ASLR when trying to return to your executable payload. (But not defeating non-executable stacks, so this is rarely useful on its own in modern systems.) It's a special case of a ROP gadget.
Compilers are never going to use that instruction intentionally, so you'll only ever find it as part of the bytes for another instruction, or in a non-code section. Or not at all if no data happens to include it.
Also, your search method could miss it if it did occur.
objdump | grep 'jmp.*esp' is not good here. That will miss ff e4 as part of mov eax, 0x1234e4ff for example. And disassembly of data sections similarly will only "check" bytes where objdump decides that an instruction starts. (It doesn't do overlapping disassembly starting from every possible byte address; it gets to the end of one instruction and assumes the next instruction starts there.)
But even so, I compiled your code with gcc8.2 with optimization disabled (gcc -m32 foo.c) and searched for e4 bytes in the output of hexdump -C. None of them were preceded by an ff byte. (I tried again with gcc -m32 -no-pie -fno-pie foo.c, still no ff e4)
There's no reason to expect that to appear in a tiny executable.
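Searching at every byte offset, rather than only where a disassembler thinks an instruction starts, takes just a few lines. A sketch with a made-up helper name:

```c
#include <stddef.h>
#include <string.h>

/* Returns the first offset at which `needle` occurs in `hay`, or -1.
 * Every byte offset is tried, not just instruction boundaries. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

For the bytes b8 ff e4 34 12 (the encoding of mov eax, 0x1234e4ff), searching for ff e4 returns offset 1, which is the occurrence a grep over disassembly text would miss.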
You could introduce one with a global const unsigned char jmp_esp[] = { 0xff, 0xe4 };
But note that modern toolchains (like late 2018 / 2019) put even the .rodata section in a non-executable segment. So you'd need to compile with -zexecstack for byte sequences in non-code sections to be useful as gadgets.
But you probably need -z execstack or something else to make the stack itself executable, for your payload itself to be in an executable page, not just a jmp esp in a const array.
If you disabled library ASLR, then you could use an ff e4 at a known address somewhere in libc. But with normal randomization of library mapping addresses, it's probably just as easy to try to guess the stack address of your buffer directly, +- some bytes you fill with a NOP slide. (Unless you can get the program you're attacking to leak a library address, defeating ASLR).

How to find the "exit" of a C program

The test is on 32-bit x86 Linux.
So basically I am trying to log information about executed basic blocks by inserting instrumentation instructions in assembly code.
My strategy is like this: write the index of each executed basic block into a global array, and flush the array from memory to disk when the array is full (16M).
Here is my problem. I need to flush the array to disk when execution of the instrumented binary is over, even if it has not reached the 16M boundary. However, I just don't know where to find the exit of an assembly program.
I tried this:
grep exit in the target assembly program, and flush the memory right before the call exit instruction. But according to some debugging experience, the target C program, say, a md5sum binary, does not call exit when it finishes the execution.
Flush the memory at the end of the main function. However, in the assembly code, I just don't know where the exact end of the main function is. I could take a conservative approach, say, looking for all the ret instructions, but it seems to me that not every main function ends with a ret instruction.
So here is my question: how do I identify the exact execution end of assembly code, and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different input the binary could exit at different positions, so I guess I need some conservative estimation. Am I clear? Thanks!
I believe you cannot do that in the general case. First, if main is returning some code, it is an exit code (if main has no explicit return the recent C standards require that the compiler adds an implicit return 0;). Then a function could store the address of exit in some data (e.g. a global function pointer, a field in a struct, ...), and some other function could indirectly call that through a function pointer. Practically, a program can load some plugins using dlopen and use dlsym for the "exit" name, or simply call exit inside the plugin, etc... AFAIU solving that problem (of finding actual exit calls, in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.
Without claiming an exhaustive approach, I would suggest something else (assuming you are interested in instrumenting programs coded in C or C++, etc... whose source code is available to you). You could customize the GCC compiler with MELT to change the basic blocks processed inside GCC to call some of your instrumentation functions. It is not trivial, but it is doable... Of course you'll need to recompile some C code with such a customized GCC to instrument it.
(Disclaimer, I am the main author of MELT; feel free to contact me for more...)
BTW, do you know about atexit(3)? It could be helpful for your flushing issue... And you might also use LD_PRELOAD tricks (read about dynamic linkers, see ld-linux(8)).
atexit() will properly handle 95+% of programs. You can either modify its chain of registered handlers, or instrument it as you are other blocks. However, some programs may terminate by use of _exit() which does not invoke atexit handlers. Probably instrumenting _exit to invoke data flushing and installing an atexit (or on_exit() on BSD-like programs) handler should cover nearly 100% of programs.
Addendum: Note that the Linux Base Specification says that the C library startup shall:
call the initializer function (*init)().
call main() with appropriate arguments.
call exit() with the return value from main().
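The difference between exit() and _exit() with respect to atexit handlers can be observed with a small experiment. This is a sketch with invented names; the handler stands in for "flush the trace buffer" and simply writes one byte to a pipe so the parent can tell whether it ran.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int flush_fd = -1; /* where the handler reports that it ran */

/* Stand-in for "flush the trace buffer to disk". */
static void flush_handler(void)
{
    if (flush_fd >= 0) {
        ssize_t n = write(flush_fd, "F", 1);
        (void)n;
    }
}

/* Runs a child that registers flush_handler and then terminates with
 * exit() (use_raw_exit == 0) or _exit() (use_raw_exit != 0).
 * Returns 1 if the handler ran, 0 if it did not, -1 on error. */
int handler_ran_in_child(int use_raw_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL); /* avoid the child re-flushing inherited stdio buffers */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        flush_fd = fds[1];
        if (atexit(flush_handler) != 0)
            _exit(2);
        if (use_raw_exit)
            _exit(0); /* skips atexit handlers */
        exit(0);      /* runs atexit handlers first */
    }

    close(fds[1]);
    char byte;
    ssize_t got = read(fds[0], &byte, 1); /* 0 means EOF: nothing written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return got < 0 ? -1 : got == 1;
}
```

With exit() the handler runs and the parent reads one byte; with _exit() the pipe just reaches end-of-file. That second case is why _exit needs to be instrumented separately.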
A method that should work every time is to create a shared memory section and store your data there.
You also create a child process which waits for the process being debugged to finish.
As soon as the process being debugged has finished the child process will finalize the write operations using the data that is in the shared memory.
This should work on all forms of exit, process interruptions (e.g. Ctrl+C, closing the terminal window, ...) or even if the process has been killed using "kill".
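A rough sketch of the shared-memory idea follows. For brevity it reverses the roles described above: the watcher is the parent and the instrumented code runs in the child, which lets the watcher use plain waitpid. The struct, the 64-entry limit, and the function name are all assumptions made for this example.

```c
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct trace {
    volatile size_t count;   /* number of valid entries */
    volatile int blocks[64]; /* recorded basic-block indices */
};

/* Maps shared memory, forks a worker that records n block indices and
 * then dies from SIGKILL, and afterwards reports how many entries the
 * worker left behind. Returns that count, or -1 on error. */
long collect_after_kill(size_t n)
{
    if (n > 64)
        return -1;
    struct trace *t = mmap(NULL, sizeof *t, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (t == MAP_FAILED)
        return -1;
    t->count = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(t, sizeof *t);
        return -1;
    }
    if (pid == 0) {
        for (size_t i = 0; i < n; i++) {
            t->blocks[i] = (int)i;
            t->count = i + 1;
        }
        kill(getpid(), SIGKILL); /* die without running any exit handler */
        _exit(1);                /* not reached */
    }

    int status = 0;
    waitpid(pid, &status, 0);
    long seen = (long)t->count; /* data survives the worker's death */
    munmap(t, sizeof *t);
    return seen;
}
```

Even though the worker is killed with a signal that cannot be caught, the watcher still sees every entry, because the shared mapping belongs to both processes.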
But according to some debugging experience, the target C program, say, a md5sum binary, does not call exit when it finishes the execution.
Let's take a look at a md5sum binary on an i686 GNU/Linux system:
In the disassembly (objdump -d /usr/bin/md5sum) we have this:
Disassembly of section .text:
08048f50 <.text>:
8048f50: 55 push %ebp
8048f51: 89 e5 mov %esp,%ebp
8048f53: 57 push %edi
8048f54: 56 push %esi
8048f55: 53 push %ebx
8048f56: 83 e4 f0 and $0xfffffff0,%esp
8048f59: 81 ec c0 00 00 00 sub $0xc0,%esp
8048f5f: 8b 7d 0c mov 0xc(%ebp),%edi
[ ... ]
8049e8f: 68 b0 d6 04 08 push $0x804d6b0
8049e94: 68 40 d6 04 08 push $0x804d640
8049e99: 51 push %ecx
8049e9a: 56 push %esi
8049e9b: 68 50 8f 04 08 push $0x8048f50
8049ea0: e8 4b ef ff ff call 8048df0 <__libc_start_main@plt>
8049ea5: f4 hlt
This is all startup boilerplate code. The actual program's main call is invoked inside the call __libc_start_main. If the program returns from that, then, hey look, there is a hlt instruction. That's your target. Look for that hlt instruction and instrument that as the end of the program.
You could try this:
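#include <iostream>
#include <string>
int main()
{
    bool keepGoing = true;
    while (keepGoing) {
        std::string x;
        // Stop on the word "stop", and also on end of input.
        if (!(std::cin >> x) || x == "stop") {
            keepGoing = false;
        }
    }
}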
even though it is primitive... I probably butchered the coding but it's just a concept.

Do functions occupy memory space?

void demo()
{
printf("demo");
}
int main()
{
printf("%p",(void*)demo);
return 0;
}
The above code prints the address of function demo.
So if we can print the address of a function, that means that this function is present in the memory and is occupying some space in it.
So how much space it is occupying in the memory?
You can see for yourself using objdump -r -d:
0000000000000000 <demo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <demo+0x13>
f: R_X86_64_PC32 printf-0x4
13: 5d pop %rbp
14: c3 retq
0000000000000015 <main>:
EDIT
I took your code and compiled (but not linked!) it. Using objdump you can see the actual way the compiler lays out the code to be run. At the end of the day there is no such thing as a function: for the CPU it's just a jump to some location (that in this listing happens to be labeled). So the size of the "function" is the size of the code that comprises it.
There seems to be some confusion that this is somehow not "real code". Here is what GDB says:
Dump of assembler code for function demo:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov $0x400614,%edi
0x0000000000400536 <+9>: mov $0x0,%eax
0x000000000040053b <+14>: callq 0x400410 <printf@plt>
0x0000000000400540 <+19>: pop %rbp
0x0000000000400541 <+20>: retq
This is exactly the same code, with exactly the same size, patched by the linker to use real addresses. gdb prints offsets in decimal while objdump uses the more favourable hex. As you can see, in both cases the size is 21 bytes.
So if we can print the address of a function, that means that this
function is present in the memory and is occupying some space in it.
Yes, the functions you write are compiled into code that's stored in memory. (In the case of an interpreted language, the code itself is kept in memory and executed by an interpreter.)
So how much space it is occupying in the memory?
The amount of memory depends entirely on the function. You can write a very long function or a very short one. The long one will require more memory. Space used for code generally isn't something you need to worry about, though, unless you're working in an environment with severe memory constraints, such as on a very small embedded system. On desktop computer (or even mobile device) with a modern operating system, the virtual memory system will take care of moving pages of code into or out of physical memory as they're needed, so there's very little chance that your code will consume too much memory.
Of course it's occupying space in memory, the entire program is loaded in memory once you execute it. Typically, the program instructions are stored in the lowest bytes of the memory space, known as the text section. You can read more about that here: http://www.geeksforgeeks.org/memory-layout-of-c-program/
Yes, all functions that you use in your code do occupy memory space. However, the memory space does not necessarily belong exclusively to your function. For example, an inline function would occupy space inside each function from where it is called.
The standard does not provide a way to tell how much space a function occupies in memory, as pointer arithmetic, the trick that lets you compute sizes of contiguous memory regions in the data memory, is not defined for function pointers. Moreover, ISO C forbids conversion of function pointer to object pointer type, so you cannot get around this restriction by casting your function pointer to, say, a char*.
printf("%p",demo);
The above code prints the address of function demo().
That is undefined behavior: %p expects a void*, while you are passing it a void (*)(). You should see a compiler warning, telling that what you are doing is not valid (demo).
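Casting to void * before printing, as in printf("%p", (void *)demo), is widely supported but is not something ISO C itself guarantees for function pointers. A sketch that stays within ISO C prints the pointer's bytes instead; the helper and the demo_target function are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* An arbitrary function whose address we want to show. */
void demo_target(void) {}

/* Writes the object representation of a function pointer as hex digits
 * (bytes in memory order) into `out`. Returns the number of characters
 * written, or -1 if `out` is too small. */
int format_function_pointer(char *out, size_t out_size, void (*fn)(void))
{
    const unsigned char *bytes = (const unsigned char *)&fn;
    size_t need = 2 * sizeof fn + 1; /* two digits per byte plus NUL */
    if (out_size < need)
        return -1;
    for (size_t i = 0; i < sizeof fn; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)bytes[i]);
    return (int)(2 * sizeof fn);
}
```

On a little-endian machine the digits appear least significant byte first, so the output looks reversed compared with what %p would print.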
As for determining the amount of memory it is occupying, this is not possible at run-time. However, there are other ways you can determine it:
How to get the length of a function in bytes?
The functions are compiled into machine code that will run only on a specific ISA (x86, probably ARM if it's going to run on your phone, etc.) Since different processors may need more or fewer instructions to run the same function, and the length of instructions can also vary, there is no way to know in advance exactly how big the function will be until you compile it.
Even if you know what processor and operating system it will be compiled for, different compilers will create different, equivalent representations of the function depending on which instructions they use and how they optimize the code.
Also, keep in mind a function occupies memory in different ways. I think you are talking about the code itself, which is its own section. During execution, the function can also occupy space on the stack - every time the function is called, more memory is taken up in the form of a stack frame. The amount depends on the number and type of local variables and arguments declared by the function.
Yes, however you can declare it as inline, so the compiler may take the function's code and place it wherever you call that function. Or you can also use preprocessor macros. Keep in mind, though, that inlining tends to generate larger code in exchange for faster execution, and the compiler can decide to ignore your inline request if it feels that the code would become too large.

stack operations on a basic C program

I'm disassembling this basic C code, trying to figure out what operations
are done on the stack. I'm doing it on a 32-bit VM, gcc 4.4.3, Ubuntu-based
distro. I compiled the code with these flags.
gcc -ggdb -mpreferred-stack-boundary=2 -fno-stack-protector -o ExploitMe ExploitMe.c
#include<stdio.h>
#include<string.h>
main(int argc, char **argv)
{
char buffer[80];
strcpy(buffer, argv[1]);
return 1;
}
The problem is that I cannot figure out why, at operation 3 (offset <+3>), the stack
pointer is moved by 0x58. The char buffer is 80 bytes long, so shouldn't it be 0x50?
dump of assembler code for function main:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
=> 0x080483e7 <+3>: sub $0x58,%esp
0x080483ea <+6>: mov 0xc(%ebp),%eax
0x080483ed <+9>: add $0x4,%eax
0x080483f0 <+12>: mov (%eax),%eax
0x080483f2 <+14>: mov %eax,0x4(%esp)
0x080483f6 <+18>: lea -0x50(%ebp),%eax
0x080483f9 <+21>: mov %eax,(%esp)
0x080483fc <+24>: call 0x804831c <strcpy@plt>
0x08048401 <+29>: mov $0x1,%eax
0x08048406 <+34>: leave
0x08048407 <+35>: ret
End of assembler dump.
I'm stuck on it. I see that later it uses the expected length, but what
is the program doing between those instructions?
0x080483f6 <+18>: lea -0x50(%ebp),%eax
Thank you
The compiler is free to arrange the stack however it sees fit.
The other 8 bytes are for the arguments to strcpy. Rather than push them on to the stack, the compiler has realised that it can simply subtract an extra 8 bytes from the stack pointer and then store the registers to memory. This means that the stack pointer only has to be adjusted once.
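The arithmetic behind that explanation, written out as a tiny sketch (the function name is made up): 80 bytes of buffer plus two 4-byte outgoing arguments for strcpy.

```c
#include <stddef.h>

/* Bytes the prologue must reserve when locals and outgoing call
 * arguments share one stack adjustment. */
size_t frame_bytes(size_t local_bytes, size_t outgoing_args, size_t word_size)
{
    return local_bytes + outgoing_args * word_size;
}
```

frame_bytes(80, 2, 4) is 88, which is 0x58, matching the sub $0x58,%esp in the listing. The buffer itself still starts at -0x50(%ebp), i.e. 80 bytes below the frame pointer.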
It is probably allocating a couple more locations for storing the passed-in parameters (argv, argc), and/or it needs some more local storage. Compilers do whatever they want to implement the high-level code; the same code will produce dozens or hundreds of different assembly language sequences depending on the compiler, version, and optimization settings, as well as the configure/build settings used when the compiler itself was compiled.
You often see this sort of stack frame though, usually due to a combination of performance and instruction set features/limitations. It is much easier to code and debug if you move the stack pointer once or make a copy of it in another register; within the function everything is referenced to one static point, while the preparing, calling, and cleaning up of functions messes with the real stack pointer.
You will often also see that the stack frame leaves room for the passed-in parameters and other local variables even if optimization has removed the need for those variables to actually spend any time on the stack. The need for a stack frame and its size are determined up front and optimization comes later, and the compiler doesn't always go back and realize that if it made another pass on the function it could make the stack frame smaller. Likewise the compiler writer can more easily debug if they know that their stack frame always starts with the passed-in parameters and then the local variables in order: very fast and easy to read and debug the code, just an example.
Bottom line though is Oli's answer, the compiler can do whatever it wants so long as it implements your code. My extension to that is the output from the same high level code varies widely depending on the compiler and options. And it is rarely perfectly optimized.

Resources