Variable stack size - c

My system (linux kernel 2.6.32-24) is implementing a feature named Address Space Layout Randomization (ASLR). ASLR seems to change the stack size:
void f(int n)
{
printf(" %d ", n);
f(n + 1);
}
int main(...)
{
f(0);
}
Obviously if you execute the program you'll get a stack overflow. The problem is that segmentation fault happens on different values of "n" on each execution. This is clearly caused by the ASLR (if you disable it the program exits always at the same value of "n").
I have two questions:
does it mean that ASLR make stack size slightly variable?
if so, do you see a problem in this fact? Could be a kernel bug?

It might mean that in one instance the stack happens to flow into some other allocated block, and in the other instance, it trips over unallocated address-space.

ASLR stands for "address space layout randomization". What it does is change various section/segment start addresses on each run, and yes, this includes the stack.
It's not a bug; it's by design. Its purpose, in part, is to make it harder to gain access by overflowing buffers, since in order to execute arbitrary code, you need to trick the CPU into "returning" to a certain point on the stack, or in the runtime libraries. Legitimate code would know where to return to, but some canned exploit wouldn't -- it could be a different address every time.
As for why the apparent stack size changes, stack space is allocated in pages, not bytes. Tweaking the stack pointer, especially if it's not by a multiple of the page size, changes the amount of space you see available.

Related

how to fix stack overflow error?

so, i was making this program that let people know the number of contiguous subarray which sum is equal to a certain value.
i have written the code , but when i try to run this code in vcexpress 2010, it says these error
Unhandled exception at 0x010018e7 in test 9.exe: 0xC00000FD: Stack overflow.
i have tried to search for the solution in this website and other webisites, but i can't seem to find any solution which could help me fix the error in this code(they are using recursion while i'm not).
i would be really grateful if you would kindly explain what cause this error in my code, and how to fix this error. any help would be appreciated. Thank you.
here is my code :
#include <stdio.h>
int main ()
{
int n,k,a=0,t=0;
unsigned long int i[1000000];
int v1,v2=0,v3;
scanf("%d %d",&n,&k);
for(v3=0;v3<n;v3++)
{
scanf("%d",&i[v3]);
}
do
{
for(v1=v2;v1<n;v1++)
{
t=i[v1]+t;
if(t==k)
{
a++;
break;
}
}
t=0;
v2++;
}while(v2!=n);
printf("%lu",a);
return 0;
}
Either move
unsigned long int i[1000000];
outside of main, thus making it a global variable (not an automatic one), or better yet, use some C dynamic heap allocation:
// inside main
unsigned long int *i = calloc(1000000, sizeof(unsigned long int));
if (!i) { perror("calloc"); exit(EXIT_FAILURE); };
BTW, for such a pointer, I would use (for readability reasons) some other name than i. And near the end of main you'll better free(i); to avoid memory leaks.
Also, you could move these 2 lines after the read of n and use calloc(n, sizeof(unsigned long int)) instead of calloc(1000000, sizeof(unsigned long int)) ; then you can handle arrays bigger than a million elements if your computer and system provides enough resources for that.
Your initial code is declaring an automatic variable which goes into the call frame of main on your call stack (which has a limited size, typically a megabyte or a few of them). On some operating systems there is a way to increase the size of that call stack (in an OS-specific way). BTW each thread has its own call stack.
As a rule of thumb, your C functions (including main) should avoid having call frames bigger than a few kilobytes. With the GCC compiler, you could invoke it with gcc -Wall -Wextra -Wframe-larger-than=1024 -g to get useful warnings and debug information.
Read the virtual address space wikipage. It has a nice picture worth many words. Later, find the way to query, on your operating system, the virtual address space of your process (on Linux, use proc(5) like cat /proc/$$/maps etc...). In practice, your virtual address space is likely to contain many segments (perhaps a dozen, sometimes thousands). Often, the dynamic linker or some other part of your program (or of your C standard library) uses memory-mapped files. The standard C heap (managed by malloc etc) may be organized in several segments.
If you want to understand more about virtual address space, take time to read a good book, like: Operating systems, three easy pieces (freely downloadable).
If you want to query the organization of the virtual address space in some process, you need to find an operating-system specific way to do that (on Linux, for a process of pid 1234, use /proc/1234/maps or /proc/self/maps from inside the process).
Memory is laid out much more differently than simply 4 segments(which was done long ago). The answer to the question can be generalized this way - the global or dynamically allocated memory space is handled differently than that of local variables by the system, where as the memory for local variable is limited in size, memory for dynamic allocation or global variables doesn't put a lower constraint like this.
In modern system the concept of virtual address space is there. The process from your program gets a chunk of it. That portion of memory is now responsible for holding the required memory.
Now for dynamic allocation and so on, the process can request more memory and depending on the other processes and so on, new memory request is serviced. For dynamic or global array there is no limit process wise (of course system wise there is- it cant haverequest all memory). That's why dynamic allocation or using global variable won't cause the process to run out of it's allocated memory unlike the automatic lifetime memory that it originally had for local variables.
Basically you can check your stack size
for example in Linux : ulimit -s (Kbytes)
and then decide how you manipulate your code regarding that.
As a concept I would never allocate big piece of memory on the stack because unless you know exactly the depth of your function call and the stack use, it's hard to control the precised allocated memory on stack during run time

Why are stackoverflow errors chaotic?

This simple C program rarely terminates at the same call depth:
#include <stdio.h>
#include <stdlib.h>
void recursive(unsigned int rec);
int main(void)
{
recursive(1);
return 0;
}
void recursive(unsigned int rec) {
printf("%u\n", rec);
recursive(rec + 1);
}
What could be the reasons behind this chaotic behavior?
I am using fedora (16GiB ram, stack size of 8192), and compiled using cc without any options.
EDIT
I am aware that this program will throw a stackoverflow
I know that enabling some compiler optimizations will change the behavior and that the program will reach integer overflow.
I am aware that this is undefined behavior, the purpose of this question is to understand/get an overview of the implementation specific internal behaviors that might explain what we observe there.
The question is more, given that on Linux the thread stack size is fixed and given by ulimit -s, what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
EDIT 2
#BlueMoon always sees the same output on his CentOS, while on my Fedora, with a stack of 8M, I see different outputs (last printed integer 261892 or 261845, or 261826, or ...)
Change the printf call to:
printf("%u %p\n", rec, &rec);
This forces gcc to put rec on the stack and gives you its address which is a good indication of what's going on with the stack pointer.
Run your program a few times and note what's going on with the address that's being printed at the end. A few runs on my machine shows this:
261958 0x7fff82d2878c
261778 0x7fffc85f379c
261816 0x7fff4139c78c
261926 0x7fff192bb79c
First thing to note is that the stack address always ends in 78c or 79c. Why is that? We should crash when crossing a page boundary, pages are 0x1000 bytes long and each function eats 0x20 bytes of stack so the address should end with 00X or 01X. But looking at this closer, we crash in libc. So the stack overflow happens somewhere inside libc, from this we can conclude that calling printf and everything else it calls needs at least 0x78c = 1932 (possibly plus X*4096) bytes of stack to work.
The second question is why does it take a different number of iterations to reach the end of the stack? A hint is in the fact that the addresses we get are different on every run of the program.
1 0x7fff8c4c13ac
1 0x7fff0a88f33c
1 0x7fff8d02fc2c
1 0x7fffbc74fd9c
The position of the stack in memory is randomized. This is done to prevent a whole family of buffer overflow exploits. But since memory allocations, especially at this level, can only be done in multiple of pages (4096 bytes) all initial stack pointers would be aligned at 0x1000. This would reduce the number of random bits in the randomized stack address, so additional randomness is added by just wasting a random amount of bytes at the top of the stack.
The operating system can only account the amount of memory you use, including the limit on the stack, in whole pages. So even though the stack starts at a random address, the last accessible address on the stack will always be an address ending in 0xfff.
The short answer is: to increase the amount of randomness in the randomized memory layout a bunch of bytes on the top of the stack are deliberately wasted, but the end of the stack has to end on a page boundary.
You won't have the same behaviour between executions because it depends on the current memory available. The more memory you have available, the further you'll go in this recursive function.
Your program runs infinitely as there is no base condition in your recursive function. Stack will grow continuously by each function call and will result in stack overflow.
If it would be the case of tail-recursion optimization (with option -O2), then stack overflow will occur for sure. Its invoke undefined behavior.
what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
When stack overflow occurs it invokes undefined behavior. Nothing can be said about the result in this case.
Your recursive call is not necessarily going to cause undefined behaviour due to stackoverflow (but will due to integer overflow) in practice. An optimizing compiler could simply turn your compiler into an infinite "loop" with a jump instruction:
void recursive(int rec) {
loop:
printf("%i\n", rec);
rec++;
goto loop;
}
Note that this is going to cause undefined behaviour since it's going to overflow rec (signed int overflow is UB). For example, if rec is of an unsigned int, for example, then the code is valid and in theory, should run forever.
The above code can cause two issue:
Stack Overflow.
Integer overflow.
Stack Overflow: When a recursive function is called, all its variable is pushed onto the call stack including its return address. As there is no base condition which will terminate the recursion and the stack memory is limited, the stack will exhausted resulting Stack Overflow exception. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash.
Note that, every time a function exits/return, all of the variables pushed onto the stack by that function, are freed (that is to say, they are deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables. But for recursive function, the return address are still on the stack until the recursion terminates. Moreover, automatic local variables are allocated as a single block and stack pointer advanced far enough to account for the sum of their sizes. You maybe interested at Recursive Stack in C.
Integer overflow: As every recursive call of recursive() increments rec by 1, there is a chance that Integer Overflow can occur. For that, you machine must have a huge stack memory as the range of unsigned integer is: 0 to 4,294,967,295. See reference here.
There is a gap between the stack segment and the heap segment. Now because the size of heap is variable( keeps on changing during execution), therefore the extent to which your stack will grow before stackoverflow occurs is also variable and this is the reason why your program rarely terminates at the same call depth.
When a process loads a program from an executable, typically it allocates areas of memory for the code, the stack, the heap, initialised and uninitialised data.
The stack space allocated is typically not that large, (10s of megabytes probably) and so you would imagine that physical RAM exhaustion would not be an issue on a modern system and the stack overflow would always happen at the same depth of recursion.
However, for security reasons, the stack isn't always in the same place. Address Space Layout Randomisation ensures that the base of the stack's location varies between invocations of the program. This means that the program may be able to do more (or fewer) recursions before the top of the stack hits something inaccessible like the program code.
That's my guess as to what is happening, anyway.

Accessing memory below the stack on linux

This program accesses memory below the stack.
I would assume to get a segfault or just nuls when going out of stack bounds but I see actual data. (This is assuming 100kb below stack pointer is beyond the stack bounds)
Or is the system actually letting me see memory below the stack? Weren't there supposed to be kernel level protections against this, or does that only apply to allocated memory?
Edit: With 1024*127 below char pointer it randomly segfaults or runs, so the stack doesn't seem to be a fixed 8MB, and there seems to be a bit of random to it too.
#include <stdio.h>
int main(){
char * x;
int a;
for( x = (char *)&x-1024*127; x<(char *)(&x+1); x++){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
}
Edit: Another wierd thing. The first program segfaults at only 1024*127 but if I printf downwards away from the stack I don't get a segfault and all the memory seems to be empty (All 0x00):
#include <stdio.h>
int main(){
char * x;
int a;
for( x = (char *)(&x); x>(char *)&x-1024*1024; x--){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
}
When you access memory, you're accessing the process address space.
The process address space is divided into pages (typically 4 KB on x86). These are virtual pages: their contents are held elsewhere. The kernel manages a mapping from virtual pages to their contents. Contents can be provided by:
A physical page, for pages that are currently backed by physical RAM. Accesses to these happen directly (via the memory management hardware).
A page that's been swapped out to disk. Accessing this will cause a page fault, which the kernel handles. It needs to fill a physical page with the on-disk contents, so it finds a free physical page (perhaps swapping that page's contents out to disk), reads in the contents from disk, and updates the mapping to state that "virtual page X is in physical page Y".
A file (i.e. a memory mapped file).
Hardware devices (i.e. hardware device registers). These don't usually concern us in user space.
Suppose that we have a 4 GB virtual address space, split into 4 KB pages, giving us 1048576 virtual pages. Some of these will be mapped by the kernel; others will not. When the process starts (i.e. when main() is invoked), the virtual address space will contain, amongst other things:
Program code. These pages are usually readable and executable.
Program data (i.e. for initialised variables). This usually has some read-only pages and some read-write pages.
Code and data from libraries that the program depends on.
Some pages for the stack.
These things are all mapped as pages in the 4 GB address space. You can see what's mapped by looking at /proc/(pid)/maps, as one of the comments has pointed out. The precise contents and location of these pages depend on (a) the program in question, and (b) address space layout randomisation (ASLR), which makes locations of things harder to guess, thereby making certain security exploitation techniques more difficult.
You can access any particular location in memory by defining a pointer and dereferencing it:
*(unsigned char *)0x12345678
If this happens to point to a mapped page, and that page is readable, then the access will succeed and yield whatever's mapped at that address. If not, then you'll receive a SIGSEGV from the kernel. You could handle that (which is useful in some cases, such as JIT compilers), but normally you don't, and the process will be terminated. As noted above, due to ASLR, if you do this in a program and run the program several times then you'll get non-deterministic results for some addresses.
There is usually quite a bit of accessible memory below the stack pointer, because that memory is used when you grow the stack normally. The stack itself is only controlled by the value of the stack pointer - it is a software entity, not a hardware entity.
However, system code may assume typical stack usage. I. e., on some systems, the stack is used to store state for a context switch, while a signal handler runs, etc. This also depends on whether the hardware automatically switches stack pointers when leaving user mode. If the system does use your stack for this, it will clobber the data you stored there, and that can really happen at every point in your program.
So it is not safe to manipulate stack memory below the stack pointer. It's not even safe to assume that a value that has successfully been written will still be the same in the next line code. Only the portion above the stack pointer is guaranteed not to be touched by the runtime/kernel.
It goes without saying, that this code invokes undefined behavior. The pointer arithmetic is invalid, because the address &x-1024*127 is not allocated to the variable x, so that dereferencing this pointer invokes undefined behavior.
This is undefined behavior in C. You're accessing a random memory address which, depending on the platform, may or may not be on the stack. It may or may not be in memory this user can access; if not you will get a segfault or similar. There are absolutely no promises either way.
Actually, it's not undefined behaviour, it's pretty well defined. Accessing memory locations through pointers is and was always defined since C is as close to the hardware as it can be.
I however agree that accessing hardware through pointers when you don't know exactly what you're doing is a dangerous thing to do.
Don't Do That. (If you're one of the five or six people who has a legitimate reason to do this, you already know it and don't need our advice.)
It would be a poor world with only five or six people legitimately programming operating systems, embedded devices and drivers (although it sometimes appears as if the latter is the case...).
This is undefined behavior in C. You're accessing a random memory address which, depending on the platform, may or may not be on the stack. It may or may not be in memory this user can access; if not you will get a segfault or similar. There are absolutely no promises either way.
Don't Do That. (If you're one of the five or six people who has a legitimate reason to do this, you already know it and don't need our advice.)

Why does this stack overflow so quickly?

So I wrote a toy C program that would intentionally cause a stack overflow, just to play around with the limits of my system:
#include <stdio.h>
int kefladhen(int i) {
int j = i + 1;
printf("j is %d\n",j);
kefladhen(j);
}
int main() {
printf("Hello!:D\n");
kefladhen(0);
}
I was surprised to find that the last line printed before a segmentation fault was "j is 174651". Of course the exact number it got to varied a little each time I ran it, but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop. I thought that maybe printf was incurring some overhead, but printf returns before I call kefladhen() recursively so the stack pointer should be back where it was before. I'm storing exactly one int per call, so each stack frame should only be 8 bytes total, right? So 174-thousand odd of them is only about a megabyte and a half of actual memory used, which seems way low to me. What am I misunderstanding here?
...but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop...
Note that the stack isn't the general memory pool. The stack is a chunk pre-allocated for the purpose of providing the stack. It could be just 1MB out of those 4GB of memory on the machine. My guess is your stack size is about 1.3MB; that would be enough for 174,651 eight-byte frames (four bytes for return address, four bytes for the int).
I think that the key misunderstanding here is that the stack does not grow dynamically by itself. It is set statically to a relatively small number, but you can change it in runtime (here is a link to an answer explaining how it is done with setrlimit call).
Others have already discussed the size and allocation of the stack. The reason why the "the exact number it got to varied a little each time I ran it" has to do with cache performance on multi-threaded systems.
In general, you can expect the memory pre-allocated for the stack to be page aligned. However, the starting stack pointer will vary from thread-to-thread/process-to-process/task-to-task. This is to help avoid cache line flushes, invalidations and loads. If all the tasks/threads/processes had the same virtual address for the stack pointer, you would expect that there would be more cache collisions whenever a context switch occurs. In an attempt to reduce this likelihood, many OSes will have the starting stack pointer somewhere within the starting stack page, but not necessarily at the very top, or at the same position. Thus, when a context switch occurs and a subsequent stack access occurs, there is ...
a better chance the variable will already be in the cache
a better chance that there will not be a cache collision
Hope this helps.

Is there any way to determine the available stack space at run time?

I know that stack size is fixed. So we can not store large objects on stack and we shift to dynamic allocations (e.g. malloc). Also, stack gets used when there is nesting of function calls so we avoid recursive functions as well for this reason. Is there any way at runtime to determine how much stack memory is used so far and how much is left ?
Here, I am assuming linux environment (gcc compiler) with x86 architecture.
There is a pthread API to determine where the stack lies:
#include <pthread.h>
void PrintStackInfo (void)
{ pthread_attr_t Attributes;
void *StackAddress;
int StackSize;
// Get the pthread attributes
memset (&Attributes, 0, sizeof (Attributes));
pthread_getattr_np (pthread_self(), &Attributes);
// From the attributes, get the stack info
pthread_attr_getstack (&Attributes, &StackAddress, &StackSize);
// Done with the attributes
pthread_attr_destroy (&Attributes);
printf ("Stack top: %p\n", StackAddress);
printf ("Stack size: %u bytes\n", StackSize);
printf ("Stack bottom: %p\n", StackAddress + StackSize);
}
On i386, the stack starts at the bottom and grows towards the top.
So you know you have ($ESP - StackAddress) bytes available.
In my system, I have a wrapper around pthread_create(), so each thread starts in my private function. In that function, I find the stack as described above, then find the unused portion, then initialize that memory with a distinctive pattern (or "Patton", as my Somerville, MA-born father-in-law would say).
Then when I want to know how much of the stack has been used, I start at the top and search towards the bottom for the first value that doesn't match my pattern.
Just read %esp, and remember its value goes down. You already know your defaulted max size from the environment, as well as your threads' starting point.
gcc has great assembly support, unlike some flakes out there.
If your application needs to be sure it can use X MB of memory the usual approach is for the process to alloc it at startup time (and fail to start if it cannot alloc the minimum requirement).
This of course, means the application has to employ its own memory management logic.
You can see the state of the stack virtual memory area by looking at /proc/<pid>/smaps. The stack vma grows down automatically when you use more stack spa. You can check how much stack space you are really using by checking how far %esp is from the upper limit of the stack area on smaps (as the stack grows down). Probably the first limit you will hit if you use too much stack space will be the one set by ulimit.
But always remember that these low level details may vary without any notice. Don't expect all Linux kernel versions and all glibc versions to have the same behavior. I would never make my program rely on this information.
That's very much depending on your OS and its memory management. On Linux you can use procfs. It's something like /proc/$PID/memory. I'm not on a Linux box right now.
GCC generally adds 16 bits for the registers (to jump back to the function context referred from) to the stack-frame. Normally you can gain more information on how the program exactly is compiled by disassembling it. Or use -S to get the assembly.
Tcl had a stack check at some time, to avoid crashing due to unlimted recursion or other out of stack issues. Wasn't too portable, e.g. crashed on one of the BSDs..., but you could try to find the code they used.

Resources