Understanding space allocation for variables on stack - c

I want to understand how space is allocated for variables on the stack.
Take this C program with no variables:
main() { return 0; }
Its disassembly is
push ebp
mov ebp, esp
sub esp, 0c0h
main() {
int i = 10; }
The disassembly for this program is
push ebp
mov ebp, esp
sub esp, 0cch
I am initializing an int variable, whose size is 4 bytes, but in the disassembly above the compiler allocates 12 extra bytes (0cch - 0c0h).
For the following program
main() { long long int i = 10LL; }
The disassembly is
push ebp
mov ebp, esp
sub esp, 0D0h
In the disassembly above the compiler allocates 16 extra bytes (0D0h - 0C0h) for the long long int, whose size is 8 bytes.
Why does the compiler allocate 12 bytes for an int, whose size is 4 bytes (I would have expected 8- or 16-byte alignment), and 16 bytes for a long long int, whose size is 8 bytes?
Can someone please clarify this.
Thanks.

The compiler is free to allocate as much extra storage as it wants. The C standard does not dictate constraints on the stack allocation.
EDIT:
I did some experimentation on godbolt with the ICC compiler, the only compiler there that generates code like your example. I disproved my earlier guess about the arguments to main. I also tried creating some character arrays and found that the stack is always allocated in increments of 16 bytes: a char array of 1-16 bytes causes a 16-byte allocation, one of 17-32 bytes causes a 32-byte allocation, and so on.
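The rounding observed above can be sketched in C. This is only an illustration of the pattern seen in that experiment, not a rule any compiler guarantees; the helper name round_up_to_16 is made up for this example, and the 16-byte granularity is an assumption taken from the observation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a requested local-variable size up to the
 * next multiple of 16, mirroring the allocation pattern observed above.
 * Assumption: 16-byte granularity; other compilers and flags may differ. */
size_t round_up_to_16(size_t requested)
{
    assert(requested <= (size_t)-1 - 15); /* avoid overflow in the addition */
    return (requested + 15) / 16 * 16;
}
```

Under that assumption, any request from 1 to 16 bytes maps to 16, and any request from 17 to 32 bytes maps to 32.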

Related

how to increment value in assembly which was defined in C?

I want to combine C and assembly code.
I have the following C code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
extern void _increment(unsigned short *x);
int main(int argc, char** argv)
{
unsigned short x = 0;
_increment(&x);
printf("%d", x);
return 0;
}
and the assembly (32-bit NASM) code:
section .text
global _increment
_increment:
push ebx
mov bx, word [esp+8] ;what is stored in bx here? the x address?
mov ax, [bx] ;what is stored in ax here? the x value?
;add word [esp+8], 1 -> doesn't work
pop ebx
ret
section .data
If I execute this I get a segmentation fault. Can someone explain what is stored in the registers and on the stack, and maybe how I can increment the value of x and not its address?
It's clear you understand how pointers work in general; the only mistake was the size of the pointer itself. On nearly all CPUs, the size of a pointer (not the size of the thing it points to) equals the size of the instruction pointer register (often called PC, for program counter, on non-x86 architectures). There are exceptions to this rule, but usually it just matches your hardware's default register size. That is why, as you mentioned in the comments, you only loaded half the pointer: you used bx as the destination of your load from the stack. In x86, the register operand that's not in brackets determines the number of bytes that will be read from or written to memory. This is why you don't get an assembler error for mismatched operand size when you assemble mov bx, word [esp+8], even though esp is 32-bit and bx is 16-bit: the right operand is a 16-bit memory access rather than a 32-bit register. Loading the full 32-bit pointer into a 32-bit register such as ebx, and then operating on word [ebx], avoids the truncation.
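For comparison, here is a minimal C sketch of what the assembly routine is meant to do. It is not the asker's code: the function names are chosen for this example. It shows that the pointer passed in is a full-width value, while only the object it points to is 16 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* C equivalent of the intended routine: the parameter is a full-width
 * pointer; only the pointee is an unsigned short. */
void increment(unsigned short *x)
{
    assert(x != NULL);
    *x = (unsigned short)(*x + 1u); /* wraps modulo USHRT_MAX + 1 */
}

/* Returns 1 when a pointer is wider than the unsigned short it points to,
 * which is the case on typical 32- and 64-bit targets. */
int pointer_wider_than_pointee(void)
{
    return sizeof(unsigned short *) > sizeof(unsigned short);
}
```

Calling increment(&x) with x equal to 0 leaves x equal to 1. The assembly version has to do the same two steps: fetch the whole pointer, then modify the 16-bit value it refers to.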

How to know what size a char array stores

I am writing a dummy example to simulate buffer overflow attack.
Here is the code:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
void target() {
printf("You overflowed successfully, gg");
exit(0);
}
void vulnerable(char* str1) {
char buf[5];
strcpy(buf, str1);
}
int main() {
printf("%zu", sizeof(char));
vulnerable("abcdefghijklmnop");
printf("This only prints in normal control flow");
}
I have checked the size of char, which is 1. My buffer size is 5, but it is still able to hold 16 values before the program hits a segmentation fault. How is this possible? I know I am missing something basic. Can anyone explain this?
I cannot stress enough that this is undefined behavior. Compilers are not obligated by the C standard to support this behavior and their implementations will vary.
However, to illuminate the issue you are experiencing and hopefully provide insights into what may be occurring, here is an example.
I compiled your code using https://godbolt.org/, x86-64 gcc 11.1, with the -m32 flag. Here is the notable assembly for the vulnerable function.
vulnerable(char*):
push ebp
mov ebp, esp
sub esp, 24
sub esp, 8
push DWORD PTR [ebp+8]
lea eax, [ebp-13]
push eax
call strcpy
add esp, 16
nop
leave
ret
At a high level, this is what is happening:
The stack frame is adjusted such that the previous ebp is pushed on the stack at the new location of ebp, which is the previous esp. Then 32 bytes are subtracted to extend the new stack frame.
The argument to this function (technically a char*) is stored at ebp+8 by the calling convention used
The memory location of buf is at ebp-13.
strcpy is called with the two parameters that you would expect, buf and the function argument.
So by this implementation, buf starts at ebp-13, which means 13 bytes can be written before reaching the saved base pointer at ebp. The next 4 bytes overwrite that saved base pointer, and only after those 17 bytes does a write reach the return address at ebp+4. strcpy also copies the null terminator, so calling vulnerable("abcdefghijklmnop") copies 17 bytes into buf. Those 17 bytes fill the space up to ebp and overwrite the entire saved base pointer, stopping just short of the return address; one more character would begin to overwrite it.
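The byte count in this paragraph can be checked with a short C sketch. The helper names strcpy_bytes and fits_in are invented for this example; they only count bytes and never perform the unsafe copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes strcpy writes for a given source: every character
 * plus the terminating null byte. */
size_t strcpy_bytes(const char *src)
{
    assert(src != NULL);
    return strlen(src) + 1;
}

/* Returns 1 if src, including its terminator, fits in a destination of
 * dst_size bytes, and 0 if copying it would overflow. */
int fits_in(size_t dst_size, const char *src)
{
    return strcpy_bytes(src) <= dst_size;
}
```

For the 16-character string in the question, strcpy_bytes gives 17, and fits_in(5, ...) reports that it does not fit in buf. A 5-byte buffer holds at most 4 characters plus the terminator.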
It's worth noting that overwriting the return address will almost always cause a segfault (in the case of exploits, overwriting the return address to something "valid" can result in code execution). It's also worth noting that corrupting the value of the previous ebp will corrupt the stack frame of the previous function, but it may not result in a crash. In this case it likely isn't crashing your program because you simply return to main and then exit.
Exploiting Buffer Overflows
If you are interested in exploiting this vulnerability, you need to use the information I provided above to craft a payload. Let's say you want to call the target function. You first need to identify the address of that function in memory. Assuming features such as Address Space Layout Randomization (ASLR) and Position Independent Code (PIC) are turned off, then your functions will be loaded into consistent memory locations. One way to determine the address of target is by disassembling the binary by attaching a debugger or using a tool such as objdump. Let's say the address of target is 0x0408aab0. Then all you need to do is replace the location of the return address on the stack with that value. Let's put it all together.
The address of target is 0x0408aab0 (hypothetically)
The return address is at ebp+4
The return address is 17 bytes away from the start of your buffer, since (ebp+4) - (ebp-13) = 17
Thus,
Your payload would look like: 17_byte_padding + 0x0408aab0. Depending on the endianness of your system, you may need to write the address bytes in reverse; x86 is little-endian, so the lowest byte comes first. Under these assumptions you could generate such a payload with python -c "print 'A'*17 + '\xb0\xaa\x08\x04'" (Python 2 syntax).
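The layout arithmetic can also be written out in C. This is a sketch under the same assumptions as the answer: the offsets ebp-13 and ebp+4 come from the disassembly shown earlier, the address 0x0408aab0 is hypothetical, and the names padding_length and build_payload are invented for this example. The distance from the start of buf to the return address is the difference of those two offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets relative to ebp, taken from the disassembly above. */
enum { BUF_OFFSET = -13, RET_OFFSET = 4 };

/* Distance in bytes from the start of buf to the return address. */
size_t padding_length(void)
{
    return (size_t)(RET_OFFSET - BUF_OFFSET);
}

/* Fill out[] with padding followed by addr in little-endian byte order.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t out_size, uint32_t addr)
{
    size_t pad = padding_length();
    assert(out != NULL);
    if (out_size < pad + 4)
        return 0;
    memset(out, 'A', pad);
    for (size_t i = 0; i < 4; i++)
        out[pad + i] = (unsigned char)(addr >> (8 * i)); /* low byte first */
    return pad + 4;
}
```

With these offsets the padding is 17 bytes and the address bytes follow as b0 aa 08 04. A different compiler, flag set, or buffer size would change BUF_OFFSET, so the numbers have to be re-read from the actual disassembly each time.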

x86 mov instruction in C pointer of different size

I'm trying to replicate an x86 mov instruction, such as mov %ecx,-0x4(%ebp) in C and am confused about how to do it. I have an int array for the registers and an int displacement. How would I move the value of %ecx into the memory address 4 less than the value stored in %ebp?
I have:
int* destAddress=(int*)(displacement + registers[destination]);
*destAddress=registers[source];
I'm getting a Warning: cast to pointer from integer of different size.
mov %ecx,-0x4(%ebp)
or, in Intel syntax:
mov DWORD PTR [ebp-4], ecx
is storing the value in ECX into the memory location [ebp-4].
EBP is the "base pointer" and is commonly used (in unoptimized code) to access data on the stack. Based on the negative offset, this instruction is almost certainly storing the value of ECX into the first DWORD-sized local variable.
If you wanted to translate this to C, it would be:
int local = value;
assuming that value is mapped to the ECX register, and local is a local variable allocated on the stack. Really, that's it.
[Except that a C compiler would generally put a local variable like this in a register, so this would really translate to something more like mov edx, ecx. The only time it would spill to the stack would be if it ran out of registers (which isn't uncommon in the very register-poor x86 ISA). Alternatively, you could force it to spill by making the variable volatile: volatile int local = value;. But there is no good reason for doing that in real code.]
There is pointer dereferencing going on here under the hood, of course, as you see in the assembly-language instruction, but it doesn't manifest in the C representation.
If you wanted to get some pointer notation in there, say you had an array of values allocated on the stack, and wanted to initialize its first member:
int array[4];
array[0] = value; // set first element of array to 'value' (== ECX)
The displacement (-4) won't appear at all in the C code. The C compiler handles that.
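If the goal is to emulate the instruction, as the question's register array suggests, one way to avoid the "cast to pointer from integer of different size" warning is to never turn the emulated address into a host pointer at all. The sketch below is an assumption about how such an emulator could be laid out, not the asker's design: the Machine struct, the register enum, MEM_SIZE, and the store32/load32 helpers are all invented for this example. Emulated memory is a plain byte array and the computed address is used as an index into it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, REG_COUNT };
#define MEM_SIZE 4096u /* hypothetical size of the emulated memory */

typedef struct {
    uint32_t regs[REG_COUNT]; /* emulated 32-bit registers */
    uint8_t  mem[MEM_SIZE];   /* emulated memory, indexed by address */
} Machine;

/* Emulates mov %src, disp(%base). Returns 0 on success and -1 if the
 * 4-byte store would fall outside the emulated memory. */
int store32(Machine *m, int src, int base, int32_t disp)
{
    /* Unsigned arithmetic wraps modulo 2^32, like the real address math. */
    uint32_t addr = m->regs[base] + (uint32_t)disp;
    if (addr > MEM_SIZE - 4)
        return -1;
    memcpy(&m->mem[addr], &m->regs[src], 4); /* stored in host byte order */
    return 0;
}

/* Reads back a 32-bit value from emulated memory. */
uint32_t load32(const Machine *m, uint32_t addr)
{
    uint32_t value;
    assert(addr <= MEM_SIZE - 4);
    memcpy(&value, &m->mem[addr], 4);
    return value;
}
```

With ebp set to 100, store32(&m, ECX, EBP, -4) writes the value of ecx to emulated address 96. Because the address stays a 32-bit integer index, the code behaves the same on a 64-bit host.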

How does GCC implement variable-length arrays?

How does GCC implement Variable-length arrays (VLAs)? Are such arrays essentially pointers to the dynamically allocated storage such as returned by alloca?
The other alternative I could think of is that such an array is allocated as the last variable in a function, so that the offsets of the other variables are known at compile time. However, the offset of a second VLA would then again not be known at compile time.
Here's the allocation code (x86 - the x64 code is similar) for the following example line taken from some GCC docs for VLA support:
char str[strlen (s1) + strlen (s2) + 1];
where the calculation for strlen (s1) + strlen (s2) + 1 is in eax (GCC MinGW 4.8.1 - no optimizations):
mov edx, eax
sub edx, 1
mov DWORD PTR [ebp-12], edx
mov edx, 16
sub edx, 1
add eax, edx
mov ecx, 16
mov edx, 0
div ecx
imul eax, eax, 16
call ___chkstk_ms
sub esp, eax
lea eax, [esp+8]
add eax, 0
mov DWORD PTR [ebp-16], eax
So it looks to be essentially alloca().
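To see the run-time sizing from the C side, here is a small sketch built around the same declaration. The function name concat_in_vla is invented for this example, and it assumes a compiler with VLA support (C99, or C11 with the optional feature, as GCC provides).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Builds the concatenation of s1 and s2 in a VLA sized at run time and
 * returns sizeof the array, which for a VLA is also evaluated at run time. */
size_t concat_in_vla(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    char str[strlen(s1) + strlen(s2) + 1]; /* size known only at run time */
    strcpy(str, s1);
    strcat(str, s2);
    assert(strlen(str) + 1 == sizeof str); /* the array is exactly full */
    return sizeof str;
}
```

For "ab" and "cde" the array is 6 bytes long. The generated code listed above may reserve more than that, since it rounds the requested size up to a multiple of 16 before subtracting it from esp, but sizeof still reports the declared size.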
Well, these are just a few wild stabs in the dark, based on the restrictions around VLA's, but anyway:
VLA's can't be:
extern
struct members
static
declared with unspecified bounds (save for function prototype)
All this points to VLA's being allocated on the stack, rather than the heap. So yes, VLA's probably are the last chunks of stack memory allocated whenever a new block is allocated (block as in block scope, these are loops, functions, branches or whatever).
That's also why VLA's increase the risk of stack overflow, in some cases significantly (word of warning: don't even think about using VLA's in combination with recursive function calls, for example!).
This is also why access after the block ends is very likely to cause issues: once the block ends, anything pointing to what was VLA memory is pointing to invalid memory.
But on the plus side: this is also why these arrays are thread safe, though (owing to threads having their own stack), and why they're faster compared to heap memory.
The size of a VLA can't be:
an extern value
zero or negative
The extern restriction is pretty self-evident, as is the non-zero, non-negative one. However, if the variable that specifies the size of a VLA is a signed int, for example, the compiler won't produce an error: the evaluation, and thus allocation, of a VLA is done at run time, not compile time. Hence the size of a VLA can't be, and needn't be, known at compile time.
As MichaelBurr rightly pointed out, VLA's are very similar to alloca memory, with one, IMHO, crucial distinction: memory allocated by alloca is valid from the point of allocation, and throughout the rest of the function. VLA's are block scoped, so the memory is freed once you exit the block in which a VLA is used:
void alloca_diff( void )
{
char *alloca_c, *vla_c;
for (int i=1;i<10;++i)
{
char *alloca_mem = alloca(i*sizeof(*alloca_mem));
alloca_c = alloca_mem;//valid
char vla_arr[i];
vla_c = vla_arr;//invalid
}//end of scope, VLA memory is freed
printf("alloca: %c\n", *alloca_c);//fine
printf("vla: %c\n", *vla_c);//undefined behaviour... avoid!
}//end of function alloca memory is freed, irrespective of block scope

How do local variables get stored in stack?

It is known that when we declare local variables they are stored on the stack, which is FILO. I was asked to draw a diagram showing how those variables are pushed onto the stack, and I got a little confused by the sample code I was given:
int giveMe_Red(int *xPos, int *yPos)
{
int count = 0;
int *nextpos, ifTreped;
int loc[8] = {0};
.
.
.
.
return count;
}
Could anyone please help me understand how every variable gets stored in memory, such as arrays, pointers, etc.? Say, "count" in level 0, then "*nextpos" in level 1 of the stack, or something else. If there is recursion, how are they stored?
The details depend on the processor, but for example in x86 normally the stack space for all variables is allocated at once with a single subtraction to esp.
The standard prologue is
push ebp ; Save old base pointer
mov ebp, esp ; Mark current base pointer
sub esp, stack_space ; Allocate stack space
the epilogue is
mov esp, ebp ; Free stack space
pop ebp ; Reload old base pointer
ret ; Return to caller
In your case the space needed (assuming 32bit and that those are all the locals) would be
4 bytes for count
4 for nextpos
4 for ifTreped
4*8 for the loc array
for a total of 44 bytes (+ 4 for the space needed to save ebp).
After the sub esp, 44 there would be the code to zero all elements of loc.
EDIT
After checking with gcc, it seems the allocated space is 48 bytes (not 44); I'm not sure why, but possibly for stack alignment reasons.
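The 44-byte count can be reproduced with a struct that models the locals. This is only a sketch under the answer's assumptions (32-bit ints and 32-bit pointers): the Frame32 type is invented for this example, a uint32_t stands in for the 32-bit int pointer so the result does not depend on the host, and the member order is arbitrary because a real compiler may order and pad locals however it likes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the locals of giveMe_Red on a 32-bit target. */
typedef struct {
    int32_t  loc[8];   /* int loc[8]        : 8 * 4 = 32 bytes */
    int32_t  ifTreped; /* int ifTreped      : 4 bytes */
    uint32_t nextpos;  /* stand-in for int* : 4 bytes on a 32-bit target */
    int32_t  count;    /* int count         : 4 bytes */
} Frame32;

/* Total bytes needed for the modeled locals. */
size_t locals_bytes(void)
{
    return sizeof(Frame32);
}

/* Offset of count from the start of the modeled block. */
size_t count_offset(void)
{
    return offsetof(Frame32, count);
}
```

locals_bytes() gives 44, matching the list above. The 48 that gcc actually reserves is the next multiple of 16, which is consistent with the stack-alignment guess.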
Jonathan's reply is correct, but doesn't answer your question. So let's take the simplest case, assume no optimisation, all arguments are passed on the stack, 32 bit ints, 32 bit addresses, the stack grows downwards, caller cleans up.
Before the call to giveMe_Red is made, SP points somewhere. To be able to return from the call you need the return address on the stack; that's four bytes. The two pointer arguments also go on the stack, 4 bytes each, so SP is now down 12 bytes from its original position. Once giveMe_Red is called more space is needed: four bytes for count, four bytes for the int pointer, four more for ifTreped and finally 8 times four bytes for the int array.
To implement recursion (giveMe_Red calling itself directly or indirectly through another function), giveMe_Red needs to set up a new stack frame for each call. The same sequence as above is repeated. There is usually one more trick: because you need to be able to access your local variables and the arguments, another register called BP is usually saved and restored (on the stack). If you want to learn more, Aho, Sethi, Ullman, Lam: Compilers (The Dragon Book) is still the standard reference.
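The point about recursion can be observed from C: each active call gets its own copy of the locals. The sketch below is not from the answer; the names descend, local_addr, and all_frames_distinct, and the depth of 8, are chosen for this example. It records the address of one local at every recursion level and checks that no two levels share storage.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 8

static uintptr_t local_addr[MAX_DEPTH]; /* address of `local` at each depth */

/* Recurse MAX_DEPTH levels, recording where each call's local lives.
 * Returns 0 + 1 + ... + (MAX_DEPTH - 1); every local is read after the
 * nested call returns, so all of them are alive at the same time. */
int descend(int depth)
{
    assert(depth >= 0 && depth < MAX_DEPTH);
    volatile int local = depth;
    local_addr[depth] = (uintptr_t)&local;
    int below = (depth + 1 < MAX_DEPTH) ? descend(depth + 1) : 0;
    return below + local;
}

/* Returns 1 if every recorded address differs from all the others. */
int all_frames_distinct(void)
{
    for (int i = 0; i < MAX_DEPTH; i++)
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (local_addr[i] == local_addr[j])
                return 0;
    return 1;
}
```

After descend(0) returns, all_frames_distinct() reports that the eight locals lived at eight different addresses. On a typical x86 build the addresses also decrease with depth, but the code does not rely on the direction of stack growth.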

Resources