So I wrote a toy C program that would intentionally cause a stack overflow, just to play around with the limits of my system:
#include <stdio.h>
int kefladhen(int i) {
int j = i + 1;
printf("j is %d\n",j);
kefladhen(j);
}
int main() {
printf("Hello!:D\n");
kefladhen(0);
}
I was surprised to find that the last line printed before a segmentation fault was "j is 174651". Of course the exact number it got to varied a little each time I ran it, but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop. I thought that maybe printf was incurring some overhead, but printf returns before I call kefladhen() recursively so the stack pointer should be back where it was before. I'm storing exactly one int per call, so each stack frame should only be 8 bytes total, right? So 174-thousand odd of them is only about a megabyte and a half of actual memory used, which seems way low to me. What am I misunderstanding here?
...but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop...
Note that the stack isn't the general memory pool. The stack is a chunk pre-allocated for the purpose of providing the stack. It could be just 1MB out of those 4GB of memory on the machine. My guess is your stack size is about 1.3MB; that would be enough for 174,651 eight-byte frames (four bytes for return address, four bytes for the int).
I think that the key misunderstanding here is that the stack does not grow dynamically by itself. It is set statically to a relatively small number, but you can change it in runtime (here is a link to an answer explaining how it is done with setrlimit call).
Others have already discussed the size and allocation of the stack. The reason why the "the exact number it got to varied a little each time I ran it" has to do with cache performance on multi-threaded systems.
In general, you can expect the memory pre-allocated for the stack to be page aligned. However, the starting stack pointer will vary from thread-to-thread/process-to-process/task-to-task. This is to help avoid cache line flushes, invalidations and loads. If all the tasks/threads/processes had the same virtual address for the stack pointer, you would expect that there would be more cache collisions whenever a context switch occurs. In an attempt to reduce this likelihood, many OSes will have the starting stack pointer somewhere within the starting stack page, but not necessarily at the very top, or at the same position. Thus, when a context switch occurs and a subsequent stack access occurs, there is ...
a better chance the variable will already be in the cache
a better chance that there will not be a cache collision
Hope this helps.
Related
This simple C program rarely terminates at the same call depth:
#include <stdio.h>
#include <stdlib.h>
void recursive(unsigned int rec);
int main(void)
{
recursive(1);
return 0;
}
void recursive(unsigned int rec) {
printf("%u\n", rec);
recursive(rec + 1);
}
What could be the reasons behind this chaotic behavior?
I am using fedora (16GiB ram, stack size of 8192), and compiled using cc without any options.
EDIT
I am aware that this program will throw a stackoverflow
I know that enabling some compiler optimizations will change the behavior and that the program will reach integer overflow.
I am aware that this is undefined behavior, the purpose of this question is to understand/get an overview of the implementation specific internal behaviors that might explain what we observe there.
The question is more, given that on Linux the thread stack size is fixed and given by ulimit -s, what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
EDIT 2
#BlueMoon always sees the same output on his CentOS, while on my Fedora, with a stack of 8M, I see different outputs (last printed integer 261892 or 261845, or 261826, or ...)
Change the printf call to:
printf("%u %p\n", rec, &rec);
This forces gcc to put rec on the stack and gives you its address which is a good indication of what's going on with the stack pointer.
Run your program a few times and note what's going on with the address that's being printed at the end. A few runs on my machine shows this:
261958 0x7fff82d2878c
261778 0x7fffc85f379c
261816 0x7fff4139c78c
261926 0x7fff192bb79c
First thing to note is that the stack address always ends in 78c or 79c. Why is that? We should crash when crossing a page boundary, pages are 0x1000 bytes long and each function eats 0x20 bytes of stack so the address should end with 00X or 01X. But looking at this closer, we crash in libc. So the stack overflow happens somewhere inside libc, from this we can conclude that calling printf and everything else it calls needs at least 0x78c = 1932 (possibly plus X*4096) bytes of stack to work.
The second question is why does it take a different number of iterations to reach the end of the stack? A hint is in the fact that the addresses we get are different on every run of the program.
1 0x7fff8c4c13ac
1 0x7fff0a88f33c
1 0x7fff8d02fc2c
1 0x7fffbc74fd9c
The position of the stack in memory is randomized. This is done to prevent a whole family of buffer overflow exploits. But since memory allocations, especially at this level, can only be done in multiple of pages (4096 bytes) all initial stack pointers would be aligned at 0x1000. This would reduce the number of random bits in the randomized stack address, so additional randomness is added by just wasting a random amount of bytes at the top of the stack.
The operating system can only account the amount of memory you use, including the limit on the stack, in whole pages. So even though the stack starts at a random address, the last accessible address on the stack will always be an address ending in 0xfff.
The short answer is: to increase the amount of randomness in the randomized memory layout a bunch of bytes on the top of the stack are deliberately wasted, but the end of the stack has to end on a page boundary.
You won't have the same behaviour between executions because it depends on the current memory available. The more memory you have available, the further you'll go in this recursive function.
Your program runs infinitely as there is no base condition in your recursive function. Stack will grow continuously by each function call and will result in stack overflow.
If it would be the case of tail-recursion optimization (with option -O2), then stack overflow will occur for sure. Its invoke undefined behavior.
what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
When stack overflow occurs it invokes undefined behavior. Nothing can be said about the result in this case.
Your recursive call is not necessarily going to cause undefined behaviour due to stackoverflow (but will due to integer overflow) in practice. An optimizing compiler could simply turn your compiler into an infinite "loop" with a jump instruction:
void recursive(int rec) {
loop:
printf("%i\n", rec);
rec++;
goto loop;
}
Note that this is going to cause undefined behaviour since it's going to overflow rec (signed int overflow is UB). For example, if rec is of an unsigned int, for example, then the code is valid and in theory, should run forever.
The above code can cause two issue:
Stack Overflow.
Integer overflow.
Stack Overflow: When a recursive function is called, all its variable is pushed onto the call stack including its return address. As there is no base condition which will terminate the recursion and the stack memory is limited, the stack will exhausted resulting Stack Overflow exception. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash.
Note that, every time a function exits/return, all of the variables pushed onto the stack by that function, are freed (that is to say, they are deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables. But for recursive function, the return address are still on the stack until the recursion terminates. Moreover, automatic local variables are allocated as a single block and stack pointer advanced far enough to account for the sum of their sizes. You maybe interested at Recursive Stack in C.
Integer overflow: As every recursive call of recursive() increments rec by 1, there is a chance that Integer Overflow can occur. For that, you machine must have a huge stack memory as the range of unsigned integer is: 0 to 4,294,967,295. See reference here.
There is a gap between the stack segment and the heap segment. Now because the size of heap is variable( keeps on changing during execution), therefore the extent to which your stack will grow before stackoverflow occurs is also variable and this is the reason why your program rarely terminates at the same call depth.
When a process loads a program from an executable, typically it allocates areas of memory for the code, the stack, the heap, initialised and uninitialised data.
The stack space allocated is typically not that large, (10s of megabytes probably) and so you would imagine that physical RAM exhaustion would not be an issue on a modern system and the stack overflow would always happen at the same depth of recursion.
However, for security reasons, the stack isn't always in the same place. Address Space Layout Randomisation ensures that the base of the stack's location varies between invocations of the program. This means that the program may be able to do more (or fewer) recursions before the top of the stack hits something inaccessible like the program code.
That's my guess as to what is happening, anyway.
On Linux, using C, assume I have an dynamically determined n naming the number of elements I have to store in an array (int my_array[n]) just for a short period of time, say, one function call, whereby the called function only uses little memory (some hundred bytes).
Mostly n is little, some tenths. But sometimes n may be big, as much as 1000 or 1'000'000.
How do I calculate, whether my stack can hold n*o + p bytes without overflowing?
Basically: How much bytes are there left on my stack?
Indeed, the checking available stack question gives good answer.
But a more pragmatic answer is: don't allocate big data on the call stack.
In your case, you could handle differently the case when n<100 (and then allocating on the stack, perhaps thru alloca, makes sense) and the case when n>=100 (then, allocate on the heap with malloc (or calloc) and don't forget to free it). Make the threshold 100 a #define-d constant.
A typical call frame on the call stack should be, on current laptops or desktops, a few kilobytes at most (and preferably less if you have recursion or threads). The total stack space is ordinarily at most a few megabytes (and sometimes much less: inside the kernel, stacks are typically 4Kbytes each!).
If you are not using threads, or if you know that your code executes on the main stack, then
Record current stack pointer when entering main
In your routine, get current stack limit (see man getrlimit)
Compare difference between current stack pointer and the one recorded in step 1 with the limit from step 2.
If you are using threads and could be executing on a thread other than main, see man pthread_getattr_np
My system (linux kernel 2.6.32-24) is implementing a feature named Address Space Layout Randomization (ASLR). ASLR seems to change the stack size:
void f(int n)
{
printf(" %d ", n);
f(n + 1);
}
int main(...)
{
f(0);
}
Obviously if you execute the program you'll get a stack overflow. The problem is that segmentation fault happens on different values of "n" on each execution. This is clearly caused by the ASLR (if you disable it the program exits always at the same value of "n").
I have two questions:
does it mean that ASLR make stack size slightly variable?
if so, do you see a problem in this fact? Could be a kernel bug?
It might mean that in one instance the stack happens to flow into some other allocated block, and in the other instance, it trips over unallocated address-space.
ASLR stands for "address space layout randomization". What it does is change various section/segment start addresses on each run, and yes, this includes the stack.
It's not a bug; it's by design. Its purpose, in part, is to make it harder to gain access by overflowing buffers, since in order to execute arbitrary code, you need to trick the CPU into "returning" to a certain point on the stack, or in the runtime libraries. Legitimate code would know where to return to, but some canned exploit wouldn't -- it could be a different address every time.
As for why the apparent stack size changes, stack space is allocated in pages, not bytes. Tweaking the stack pointer, especially if it's not by a multiple of the page size, changes the amount of space you see available.
i declared a struct variable in C of size greater than 1024bytes. On running Coverity (a static code analyzer application) it reports that this stack variable is greater than 1024 bytes and therefore a cause of error.
I'd like to know if I need to worry about this warning? Is there really a maximum limit to the size of a single stack variable?
thanks,
che
The maximum size of a variable is the limited by the maximum size of the stack (specifically, how much of the stack is left over from any current use including variables and parameters from functions higher on the stack as well as process frame overhead).
On Windows, the stacksize of the first thread is a property of the executable set during linking while the stacksize of a thread can be specified during thread creation.
On Unix, the stacksize of the first thread is usually only limited only by how much room there is for it to grow. Depending on how the particular Linux lays out memory and your use of shared objects, that can vary. The stacksize of a thread can also be specified during thread creation.
The problem it is trying to protect you from is stack overflow, because of different execution paths, it is very hard to find in testing. Mostly for this reason - it is considered bad form to allocate a large amount of data on the stack. You are only really likely to run into a real problem on an embedded system though.
In other words, it sets an arbitrary limit to what it considers too much data on the stack.
Yes. Of course it's limited by the address space of your system. It's also limited by the amount of space allocated to the stack by your OS, which usually can't be changed after your program starts but can be changed beforehand (either by the launching process, or by the properties of the executable). At a quick glance, the maximum stack size on my OS X system is 8 MiB and on Linux it's 10 MiB. On some systems, you can even allocate a different amount of stack to each different thread you start, although this is of limited usefulness. Most compilers also have another limit to how much they'll allow in a single stack frame.
On a modern desktop, I wouldn't worry about a 1k stack allocation unless the function were recursive. If you're writing embedded code or code for use inside an OS kernel, it would be a problem. Code in the Linux kernel is only permitted 64 KiB stacks or less, depending on configuration options.
This article is pretty interesting regarding stack size http://www.embedded.com/columns/technicalinsights/47101892?_requestid=27362
Yes is it OS dependent and also other things dependent. Sorry to be so vague. You may also be able to dig up some code in the gcc collection for testing stack size.
If your function was involved (directly or indirectly) in recursion, then allocating a large amount on the stack would limit the depth of recursion and might well blow the stack. Under Windows this stack reserve defaults to 1MB, though you can increase it statically with linker commands. The stack will grow as it is used, but the operating system sometimes cannot extend it. I discuss this in a little more detail on my website here.
As I have seen, a C compiler(turbo) provides a maximum size of 64000k for a variable. If we need more size, then it is declared as "huge".
It's not a good idea to try to use a massive amount of stack space.
Here is a link to the default gcc stack size: http://www.cs.nyu.edu/exact/core/doc/stackOverflow.txt
Also, you could specify --stack,xxxxx to customize the stack size, so it's best to assume xxxxx is a small number and stick with heap allocation.
Stack, heap, low, high VM -- Nuts, for the first thread, the stack at the top pf 64 bit VM, there should be no limit, so it seems like a gcc/c compiler bug that for local automatic "int x[2621440];" I get SIGSEGV. The compiler should be letting the first thread stack grow until it hits the heap, which in a 16 billion billion byte VM is pretty unlikely for now. The kindest thing is to call it a compiler "limitation". (In testing some while back, probably on a Solaris SPARC, it seemed that local variables processed faster than global ones. Go figure!)
What's the advantage of using malloc (besides the NULL return on failure) over static arrays? The following program will eat up all my ram and start filling swap only if the loops are uncommented. It does not crash.
...
#include <stdio.h>
unsigned int bigint[ 1u << 29 - 1 ];
unsigned char bigchar[ 1u << 31 - 1 ];
int main (int argc, char **argv) {
int i;
/* for (i = 0; i < 1u << 29 - 1; i++) bigint[i] = i; */
/* for (i = 0; i < 1u << 31 - 1; i++) bigchar[i] = i & 0xFF; */
getchar();
return 0;
}
...
After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit? Apparently I can have as many of of them as I want. It will segfault, but only if I ask for (and try to use) more than malloc would give me anyway.
Is there a way to determine if a static array was actually allocated and safe to use?
EDIT: I'm interested in why malloc is used to manage the heap instead of letting the virtual memory system handle it. Apparently I can size an array to many times the size I think I'll need and the virtual memory system will only keep in ram what is needed. If I never write to e.g. the end (or beginning) of these huge arrays then the program doesn't use the physical memory. Furthermore, if I can write to every location then what does malloc do besides increment a pointer in the heap or search around previous allocations in the same process?
Editor's note: 1 << 31 causes undefined behaviour if int is 32-bit, so I have modified the question to read 1u. The intent of the question is to ask about allocating large static buffers.
Well, for two reasons really:
Because of portability, since some systems won't do the virtual memory management for you.
You'll inevitably need to divide this array into smaller chunks for it to be useful, then to keep track of all the chunks, then eventually as you start "freeing" some of the chunks of the array you no longer require you'll hit the problem of memory fragmentation.
All in all you'll end up implementing a lot of memory management functionality (actually pretty much reimplementing the malloc) without the benefit of portability.
Hence the reasons:
Code portability via memory management encapsulation and standardisation.
Personal productivity enhancement by the way of code re-use.
Please see:
malloc() and the C/C++ heap
Should a list of objects be stored on the heap or stack?
C++ Which is faster: Stack allocation or Heap allocation
Proper stack and heap usage in C++?
About C/C++ stack allocation
Stack,Static and Heap in C++
Of Memory Management, Heap Corruption, and C++
new on stack instead of heap (like alloca vs malloc)
with malloc you can grow and shrink your array: it becomes dynamic, so you can allocate exactly for what you need.
This is called custom memory management, I guess.
You can do that, but you'll have to manage that chunk of memory yourself.
You'd end up writing your own malloc() woring over this chunk.
Regarding:
After some trial and error I found the
above is the largest static array
allowed on my 32-bit Intel machine
with GCC 4.3. Is this a standard
limit, a compiler limit, or a machine
limit?
One upper bound will depend on how the 4GB (32-bit) virtual address space is partitioned between user-space and kernel-space. For Linux, I believe the most common partitioning scheme has a 3 GB range of addresses for user-space and a 1 GB range of addresses for kernel-space. The partitioning is configurable at kernel build-time, 2GB/2GB and 1GB/3GB splits are also in use. When the executable is loaded, virtual address space must be allocated for every object regardless of whether real memory is allocated to back it up.
You may be able to allocate that gigantic array in one context, but not others. For example, if your array is a member of a struct and you wish to pass the struct around. Some environments have a 32K limit on struct size.
As previously mentioned, you can also resize your memory to use exactly what you need. It's important in performance-critical contexts to not be paging out to virtual memory if it can be avoided.
There is no way to free stack allocation other than going out of scope. So when you actually use global allocation and VM has to alloc you real hard memory, it is allocated and will stay there until your program runs out. This means that any process will only grow in it's virtual memory use (functions have local stack allocations and those will be "freed").
You cannot "keep" the stack memory once it goes out of scope of function, it is always freed. So you must know how much memory you will use at compile time.
Which then boils down to how many int foo[1<<29]'s you can have. Since first one takes up whole memory (on 32bit) and will be (lets lie: 0x000000) the second will resolve to 0xffffffff or thereaobout. Then the third one would resolve to what? Something that 32bit pointers cannot express. (remember that stack reservations are resolved partially at compiletime, partially runtime, via offsets, how far the stack offset is pushed when you alloc this or that variable).
So the answer is pretty much that once you have int foo [1<<29] you cant have any reasonable depth of functions with other local stack variables anymore.
You really should avoid doing this unless you know what you're doing. Try to only request as much memory as you need. Even if it's not being used or getting in the way of other programs it can mess up the process its self. There are two reasons for this. First, on certain systems, particularly 32bit ones it can cause address space to be exhausted prematurely in rare circumstances. Additionally many kernels have some kind of per process limit on reserved/virtual/not in use memory. If your program asks for memory at points in run time the kernel can kill the process if it asks for memory to be reserved that exceeds this limit. I've seen programs that have either crashed or exited due to a failed malloc because they are reserving GBs of memory while only using a few MB.