On Linux, using C, assume I have a dynamically determined n giving the number of elements I have to store in an array (int my_array[n]) just for a short period of time, say, one function call, where the called function itself uses only a little memory (a few hundred bytes).
Mostly n is small, a few tens. But sometimes n may be big, as much as 1,000 or 1,000,000.
How do I calculate whether my stack can hold n*o + p bytes without overflowing?
Basically: how many bytes are left on my stack?
Indeed, the checking available stack question gives a good answer.
But a more pragmatic answer is: don't allocate big data on the call stack.
In your case, you could handle differently the case when n < 100 (then allocating on the stack, perhaps through alloca, makes sense) and the case when n >= 100 (then allocate on the heap with malloc (or calloc) and don't forget to free it). Make the threshold 100 a #define-d constant.
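A minimal sketch of that dual strategy, assuming a hypothetical threshold and a toy workload (summing n ints) standing in for whatever your function actually does:

```c
#include <stdlib.h>
#include <string.h>

#define STACK_THRESHOLD 100   /* hypothetical cutoff; tune for your program */

/* Sum n ints, using a stack buffer for small n and the heap otherwise. */
long sum_n(const int *src, size_t n)
{
    int small[STACK_THRESHOLD];               /* used when n is small */
    int *buf = (n < STACK_THRESHOLD) ? small
                                     : malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;                            /* heap allocation failed */

    memcpy(buf, src, n * sizeof *buf);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];

    if (buf != small)                         /* only free what came from malloc */
        free(buf);
    return total;
}
```

Note the matching check at the end: free must run only on the heap path, which is exactly the maintenance hazard discussed further down.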
A typical call frame on the call stack should be, on current laptops or desktops, a few kilobytes at most (and preferably less if you have recursion or threads). The total stack space is ordinarily at most a few megabytes (and sometimes much less: inside the kernel, stacks are typically 4Kbytes each!).
If you are not using threads, or if you know that your code executes on the main stack, then
Record current stack pointer when entering main
In your routine, get current stack limit (see man getrlimit)
Compare difference between current stack pointer and the one recorded in step 1 with the limit from step 2.
If you are using threads and could be executing on a thread other than main, see man pthread_getattr_np
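The three steps above can be sketched as follows for the main stack (a rough estimate, assuming the address of a local variable recorded near the top of main approximates the stack base; argv and the environment also live above main's frame, so treat the result as approximate):

```c
#include <stddef.h>
#include <sys/resource.h>

/* Estimate how many bytes of main-thread stack are still available,
   given the address of a local variable recorded in main() (step 1). */
size_t stack_remaining(const char *stack_base)
{
    struct rlimit rl;
    char here;                                    /* current stack depth */
    if (getrlimit(RLIMIT_STACK, &rl) != 0)        /* step 2: get the limit */
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return (size_t)-1;                        /* effectively unlimited */
    size_t used = (size_t)(stack_base - &here);   /* step 3: stack grows down */
    return (rl.rlim_cur > used) ? (size_t)rl.rlim_cur - used : 0;
}
```

Usage: in main, record `char base;` early and later test something like `if (stack_remaining(&base) > n * sizeof(int) + SAFETY_MARGIN)` before declaring the VLA, where SAFETY_MARGIN is your own slack for deeper call frames.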
Related
This is my problem in essence. In the life of a function, I generate some integers, then use the array of integers in an algorithm that is also part of the same function. The array of integers will only be used within the function, so naturally it makes sense to store the array on the stack.
The problem is I don't know the size of the array until I'm finished generating all the integers.
I know how to allocate a fixed-size and a variable-sized array on the stack. However, I do not know how to grow an array on the stack, and that seems like the best way to solve my problem. I'm fairly certain this is possible to do in assembly: you just increment the stack pointer and store an int for each int generated, so the array of ints would be at the end of the stack frame. Is this possible to do in C, though?
I would disagree with your assertion that it "naturally makes sense to store the array on the stack". Stack memory is really designed for when you know the size at compile time. I would argue that dynamic memory is the way to go here.
C doesn't define what the "stack" is. It only has static, automatic and dynamic allocations. Static and automatic allocations are handled by the compiler, and only dynamic allocation puts the controls in your hands. Thus, if you want to manually deallocate an object and allocate a bigger one, you must use dynamic allocation.
Don't use dynamic arrays on the stack (compare Why is the use of alloca() not considered good practice?), better allocate memory from the heap using malloc and resize it using realloc.
Never Use alloca()
IMHO this point hasn't been made well enough in the standard references.
One rule of thumb is:
If you're not prepared to statically allocate the maximum possible size as a fixed-length C array, then you shouldn't do it dynamically with alloca() either.
Why? The reason you're trying to avoid malloc() is performance.
alloca() will be slower and won't work in any circumstance static allocation will fail. It's generally less likely to succeed than malloc() too.
One thing is sure. Statically allocating the maximum will outdo both malloc() and alloca().
Static allocation is typically damn near a no-op. Most systems will advance the stack pointer for the function call anyway. There's no appreciable difference for how far.
So what you're telling me is you care about performance but want to hold back on a no-op solution? Think about why you feel like that.
The overwhelming likelihood is you're concerned about the size allocated.
But as explained it's free and it gets taken back. What's the worry?
If the worry is "I don't have a maximum or don't know if it will overflow the stack", then you shouldn't be using alloca(), precisely because you don't have a maximum and don't know whether it will overflow the stack.
If you do have a maximum and know it isn't going to blow the stack then statically allocate the maximum and go home. It's a free lunch - remember?
That makes alloca() either wrong or sub-optimal.
Every time you use alloca() you're either wasting your time or coding in one of the difficult-to-test-for arbitrary scaling ceilings that sleep quietly until things really matter then f**k up someone's day.
Don't.
PS: If you need a big 'workspace' but the malloc()/free() overhead is a bottle-neck for example called repeatedly in a big loop, then consider allocating the workspace outside the loop and carrying it from iteration to iteration. You may need to reallocate the workspace if you find a 'big' case but it's often possible to divide the number of allocations by 100 or even 1000.
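A sketch of that workspace-reuse pattern, with a made-up per-iteration size formula standing in for the real workload (it returns the final capacity just so the effect is observable):

```c
#include <stdlib.h>

/* One buffer carried across iterations, grown only when a bigger case
   appears, instead of a malloc/free pair on every iteration. */
size_t process_all(size_t iterations)
{
    double *workspace = NULL;
    size_t capacity = 0;

    for (size_t i = 0; i < iterations; i++) {
        size_t needed = (i % 7 + 1) * 16;     /* hypothetical: 16..112 */
        if (needed > capacity) {              /* grow only when required */
            double *bigger = realloc(workspace, needed * sizeof *bigger);
            if (bigger == NULL)
                break;                        /* handle allocation failure */
            workspace = bigger;
            capacity = needed;
        }
        /* ... use workspace[0 .. needed-1] here ... */
    }
    free(workspace);                          /* one free, after the loop */
    return capacity;
}
```

With this shape, the number of actual allocations is bounded by the number of distinct "new maximum" cases, not by the iteration count.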
Footnote:
There must be some theoretical algorithm where a() calls b() and if a() requires a massive environment b() doesn't and vice versa.
In that event there could be some kind of freaky play-off where the stack overflow is prevented by alloca(). I have never heard of or seen such an algorithm. Plausible specimens will be gratefully received!
The innards of the C compiler require stack frame sizes to be fixed or calculable at compile time. It's been a while since I used C (I'm now a C++ convert) and I don't know exactly why this is. http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html provides a useful comparison of the pros and cons of the two approaches.
I appreciate your assembly-code analogy, but C is largely managed, if that makes any sense, by the operating system, which imposes/provides the task, process and stack notions.
In order to address your issue, dynamic memory allocation looks ideal.
int *a = malloc(sizeof(int));
and dereference it to store the value.
Each time a new integer needs to be added to the existing list of integers:
int *temp = realloc(a,sizeof(int) * (n+1)); /* n = number of new elements */
if(temp != NULL)
a = temp;
Once done using this memory, free() it.
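Reallocating for every single element is O(n^2) in total copying, so in practice you'd grow the capacity geometrically. A self-contained sketch, where the "generated" integers are just squares standing in for whatever your program produces (collect_squares and its doubling factor are illustrative choices, not a fixed recipe):

```c
#include <stdlib.h>

/* Collect `limit` generated ints into a heap array, doubling the
   capacity so n appends cost O(n) total. On success *out receives the
   heap array (caller frees) and the count is returned; 0 means failure. */
size_t collect_squares(int **out, int limit)
{
    size_t n = 0, cap = 0;
    int *a = NULL;

    for (int i = 0; i < limit; i++) {
        if (n == cap) {                       /* out of room: double it */
            size_t newcap = cap ? cap * 2 : 16;
            int *tmp = realloc(a, newcap * sizeof *tmp);
            if (tmp == NULL) { free(a); return 0; }
            a = tmp;
            cap = newcap;
        }
        a[n++] = i * i;                       /* "generate" the next int */
    }
    *out = a;
    return n;
}
```

Note that realloc(NULL, size) behaves like malloc(size), which is why the loop needs no special first-time case.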
Is there an upper limit on the size? If you can impose one, so the size is at most a few tens of KiB, then yes alloca is appropriate (especially if this is a leaf function, not one calling other functions that might also allocate non-tiny arrays this way).
Or since this is C, not C++, use a variable-length array like int foo[n];.
But always sanity-check your size, otherwise it's a stack-clash vulnerability waiting to happen. (Where a huge allocation moves the stack pointer so far that it ends up in the middle of another memory region, where other things get overwritten by local variables and return addresses.) Some distros enable hardening options that make GCC generate code to touch every page in between when moving the stack pointer by more than a page.
It's usually not worth it to check the size and use alloca for small, malloc for large, since you also need another check at the end of your function to call free if the size was large. It might give a speedup, but it makes your code more complicated and more likely to get broken during maintenance if future editors don't notice that the memory is only sometimes malloced. So only consider a dual strategy if profiling shows it actually matters, and you care about performance more than simplicity / human-readability / maintainability for this particular project.
A size check for an upper limit (else log an error and exit) is more reasonable, but then you have to choose an upper limit beyond which your program will intentionally bail out, even though there's plenty of RAM you're choosing not to use. If there is a reasonable limit where you can be pretty sure something's gone wrong, like the input being intentionally malicious from an exploit, then great, if(size>limit) error(); int arr[size];.
If neither of those conditions can be satisfied, your use case is not appropriate for C automatic storage (stack memory), because it might need to be large. Just use dynamic allocation, even if you'd rather have avoided malloc.
On Windows x86/x64, the default user-space stack size is 1MiB, I think. On x86-64 Linux it's 8MiB (ulimit -s). Thread stacks are allocated with the same size. But remember, your function will be part of a chain of function calls (so if every function used a large fraction of the total size, you'd have a problem if they called each other). And any stack memory you dirty won't get handed back to the OS even after the function returns, unlike malloc/free, where a large allocation can give the memory back instead of leaving it on the free list.
Kernel thread stacks are much smaller, like 16 KiB total for x86-64 Linux, so you never want VLAs or alloca in kernel code, except maybe for a tiny max size, like up to 16 or maybe 32 bytes, not large compared to the size of the pointer that would be needed to store a kmalloc return value.
I invoke a function recursively, and on the 147th invocation the program executable stops (Code::Blocks).
Before invoking the function again, it assigns one global int variable to a local, one global two-dimensional int array to a local, and one global string variable to a local variable. So maybe those 146 copies became a very heavy load for the program?
The function is:
It seems your stack is overflowing due to the recursive calls.
Quoting from the Wikipedia page on stack overflow:
In software, a stack overflow occurs when the stack pointer exceeds
the stack bound. The call stack may consist of a limited amount of
address space, often determined at the start of the program. The size
of the call stack depends on many factors, including the programming
language, machine architecture, multi-threading, and amount of
available memory. When a program attempts to use more space than is
available on the call stack (that is, when it attempts to access
memory beyond the call stack's bounds, which is essentially a buffer
overflow), the stack is said to overflow, typically resulting in a
program crash.
Very deep recursion, and large stack variables combined with recursion, are common causes of stack overflow.
You may want to write smarter code that avoids recursion altogether.
The links below may help you get there.
Way to go from recursion to iteration
Replace Recursion with Iteration
Each time you invoke your function, you allocate:
int visitedS[2416] = 2416 * 32 bits = 9.4KB
char pathS[4500] = 4500 * 8 bits = 4.4KB
So that's almost 14KB that gets placed on the stack every time you recurse.
After 147 recursions, you've put 1.98MB on the stack. That's not so huge - a typical Linux stack limit is 8MB.
I would check - through using a debugger or even adding debug print statements - your assumption that this is truly happening after 147 recursions. Perhaps there is a bug causing more invocations than you believed.
Even so, it may well be worth thinking about ways to reduce the memory footprint of each invocation. You seem to be creating local arrays which are copies of a global. Why not just use the data in the global? If your function must make changes to that data, keep a small set of deltas locally.
Which is preferable way to allocate memory for a function that is frequently allocating and freeing memory ? Assume this function is getting called around 500 to 1000 times a second on a 1GHz Processor.
(Please ignore static and global variables/allocation. I am interested only this specific case:)
void Test()
{
    char *ptr = malloc(512); /* 512 bytes */
    ...
    free(ptr);
}
OR
void Test()
{
struct MyStruct localvar; // 512 byte sized structure
...
}
Stack allocation of local variables is faster than heap allocation with malloc. However, the total stack space is limited (e.g. to several megabytes), so you should limit yourself to "small" data on the stack. (512 bytes is small by today's standards, but 256KB would be too large for local stack allocation.)
If your function is very deeply recursive, then perhaps even 512 bytes could be too big, because you'll need that for each recursive call frame.
But calling malloc a few thousand times per second should be painless (IMHO a typical small-sized malloc takes a few dozen microseconds).
For your curiosity, and outside of the C world, you might be interested in A. Appel's old paper garbage collection can be faster than stack allocation (but perhaps cache-performance considerations weaken this claim today).
Local variables are allocated essentially "for free", so there is no contest here if we are only interested in performance.
However:
the choice between a local and a heap-allocated variable is not normally something that you are free to decide without constraint; usually there are factors that mandate the choice, so your question is a bit suspect because it seems to disregard this issue
while allocating on the stack is "free" performance-wise, space on the stack might be limited (although of course 512 bytes is nothing)
Which is preferable way to allocate memory....
Which allocation is faster ?
Do you want the faster way, or the preferable way?
Anyway, in the case you mentioned, I think the second option:
struct MyStruct localvar;
is more efficient, since the memory allocation is done on the stack, which is a lot more efficient than using dynamic memory allocation functions like malloc.
Optimizing
Also, if you are doing this for performance & optimizing...
On my PC, using malloc to allocate strings instead of declaring a char array from the stack gives me a lag of ~ 73 nanoseconds per string.
if you copied 50 strings in your program:
4757142 / 50 = 95142 (and a bit) runs of your program
If I run your program 50 times a day:
95142 / 50 = 1902 (and a bit) days
1902 days = 5 1/5 years
So if you run your program every day for 5 years and 2 months, you'll save the time to blink your eye an extra time. Wow, how rewarding...
Turn on your disassembler when you enter your function, and step through the 2 cases.
The local variable (stack based) will require 0 extra cycles -- you won't even see where the allocation comes, because the function will allocate all the local variables in 1 cycle by just moving the stack pointer, and free all the local variables in 1 cycle by restoring the stack pointer. It doesn't matter if you have 1 or 1000 local variables, the "allocation" takes the same amount of time.
The malloc variable ... well, you will quickly get bored click-stepping through the thousands of instructions that are executed to get memory from the system heap. On top of that, you might notice that the number of cycles varies from call to call, depending on how many things you have already allocated from the heap, as malloc requires a "search" through the heap structure every time you ask for memory.
My rule of thumb: always use the stack if possible, instead of malloc/free or new/delete. In addition to faster performance, you get the added benefit of not having to worry about memory or resource leaks. In C this just means forgetting to call free(), but in C++ exceptions can ruin your day if something throws an exception before you call delete. If you use stack variables, this is all handled automatically! However, only use the stack if you are talking about "small" pieces of memory (bytes and KB) and not huge objects (not MB or GB!). If you are talking about huge objects anyways, you are not talking about performance any more and you will probably not be calling free/delete in the same function call anyways.
Stack allocation is faster than malloc+free.
Stack allocations are typically measured in instructions, while malloc+free may require multiple locks (as one example of why it takes long in comparison).
The local variable case will be much faster: allocating a variable on the stack takes no extra time, it just changes the amount the stack pointer is moved. Whereas malloc will have to do some bookkeeping.
Another advantage by using the stack is that it does not fragment the memory space, which the malloc has a tendency to do. Of course this is just an issue for long-running processes.
for example I can do
int *arr;
arr = (int *)malloc(sizeof(int) * 1048575);
but I cannot do this without the program crashing:
int arr[1048575];
why is this so?
Assuming arr is a local variable, declaring it as an array uses memory from the (relatively limited) stack, while malloc() uses memory from the (comparatively limitless) heap.
If you're allocating these as local variables in functions (which is the only place you could have the pointer declaration immediately followed by a malloc call), then the difference is that malloc will allocate a chunk of memory from the heap and give you its address, while directly doing int arr[1048575]; will attempt to allocate the memory on the stack. The stack has much less space available to it.
The stack is limited in size for two main reasons that I'm aware of:
Traditional imperative programming makes very little use of recursion, so deep recursion (and heavy stack growth) is "probably" a sign of infinite recursion, and hence a bug that's going to kill the process. It's therefore best if it is caught before the process consumes the gigabytes of virtual memory (on a 32 bit architecture) that will cause the process to exhaust its address space (at which point the machine is probably using far more virtual memory than it actually has RAM, and is therefore operating extremely slowly).
Multi-threaded programs need multiple stacks. Therefore, the runtime system needs know that the stack will never grow beyond a certain bound, so it can put another stack after that bound if a new thread is created.
When you declare an array, you are placing it on the stack.
When you call malloc(), the memory is taken from the heap.
The stack is usually more limited compared to the heap, and is usually transient (but it depends on how often you enter and exit the function that this array is declared in).
For such a large amount of memory (maybe not by today's standards?), it is good practice to malloc it, assuming you want the array to stick around for a bit.
I'm testing the timing of an algorithm that does lots of recursive calls. My program dies at about 128k recursive calls, and this takes only .05 seconds. I'd like to allow more memory to have longer timings in my analysis. I'm running linux and using gcc. Is there a system call, or environment variable, or gcc flag, or wrapper, or something?
Try to organize your recursive function to have tail recursion.
That is to say, make sure the last operation of your recursive function is the recursive call. By doing this, the compiler can optimize it into simply iterations.
Tail recursion will help you because iterations will dramatically decrease the stack space used.
With tail recursion, you typically pass your value UP all the way, calculating 1 step at a time. So when the recursion is done, all that needs to be done is to return.
Example:
Convert the following code:
unsigned fact(unsigned x)
{
if(x == 1)
return 1;
//--> Not tail recursive, multiply operation after the recursive call
return fact(x-1) * x;
}
To this:
unsigned tail_fact(unsigned x, unsigned cur_val); /* forward declaration */
unsigned fact(unsigned x)
{
return tail_fact(x, 1);
}
unsigned tail_fact(unsigned x, unsigned cur_val)
{
if(x == 1)
return cur_val;
return tail_fact(x-1, x * cur_val);
}
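To make "the compiler can optimize it into simply iterations" concrete, here is a sketch of the loop a tail-call-optimizing compiler (with optimization enabled) can effectively turn tail_fact into, with the accumulator becoming a plain local variable and no stack growth regardless of x:

```c
/* Iterative equivalent of the tail-recursive tail_fact(x, 1). */
unsigned iter_fact(unsigned x)
{
    unsigned cur_val = 1;     /* plays the role of the cur_val parameter */
    while (x > 1) {
        cur_val *= x;         /* same step as tail_fact's x * cur_val */
        x--;                  /* same step as the x-1 recursive argument */
    }
    return cur_val;
}
```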
There is no stack-size compiler option for gcc under Linux. However, this text discusses how to set the stack size on Linux using the ulimit command.
You have three options:
Rewrite the program to make it non-recursive. This is the best, but not always possible.
Rewrite the program to use a separate stack for storing the execution state. This way you preserve the recursive nature but no longer use the system stack for storing the recursion algorithm state data.
Tweak the environment to postpone the inevitable. Visual C++ has a linker setting for the stack size; gcc almost surely has one too.
Although other answers talk about how to either avoid recursion altogether, or how to use tail recursion, or how to simply set a larger stack size, I think for completeness that it's worthwhile to consider memory usage patterns (to answer "how to allow more memory ... on lots of recursion").
Out of habit, many programmers will allocate buffers inside the recursive function, and reallocate new buffers when the function is called recursively:
int recursive_function(int x)
{
if (1 == x)
return 1;
int scratchpad[100];
... // use scratchpad
return recursive_function(x-1) + scratchpad[x-1];
}
Since this is a throwaway sample, I won't bother worrying about invalid input (negative values, values larger than 100) and I will assume somebody asking a question about programming either knows how to do that or is smart enough to find out.
The important point here is that scratchpad takes up 400 bytes (on a 32 bit machine, 800 bytes on a 64 bit machine) of the stack each and every time recursive_function() is called, so if recursive_function() is called recursively 100 times, then 40,000 bytes (or 80,000 bytes on a 64 bit machine) of stack space are being used for buffers, and it's very likely you can modify the function to reuse the same buffer on each call:
int recursive_function(int x, int* buffer, int buffer_length)
{
if (1 == x)
return 1;
... // use buffer (and buffer_length to avoid overflow)
int temp_value = buffer[x-1];
return recursive_function(x-1, buffer, buffer_length) + temp_value;
}
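A self-contained sketch of the calling side of this pattern, using a toy recursive sum (the function bodies here are hypothetical stand-ins; your real algorithm goes where the buffer is used):

```c
#include <stdlib.h>

/* Toy recursion that fills and reuses ONE shared scratch buffer
   instead of declaring a fresh array in every call frame. */
static int recurse(int x, int *buffer, int buffer_length)
{
    if (x <= 1 || x > buffer_length)
        return 1;
    buffer[x - 1] = x;                 /* "use" the shared scratch space */
    return recurse(x - 1, buffer, buffer_length) + buffer[x - 1];
}

/* Wrapper: one malloc, one free, no matter how deep the recursion. */
int solve(int x)
{
    int buffer_length = 100;
    int *buffer = malloc(buffer_length * sizeof *buffer);
    if (buffer == NULL)
        return -1;
    int result = recurse(x, buffer, buffer_length);
    free(buffer);
    return result;
}
```

The per-call stack cost is now three scalar arguments instead of a 400-byte array, which is the whole point of the refactoring above.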
Of course you could instead use a std::vector, which handles some details for you to protect you against memory leaks and buffer overruns (and, for the record, keeps the data on the heap [see footnote], meaning it will likely use less stack space).
40k or even 80k may not seem like much but things can add up. If the function doesn't have many other stack-allocated variables then this can dominate stack space (that is, if it weren't for the extra space the buffers take up you may be able to call the function far more times).
This may seem obvious, but it does come up, even in nonrecursive functions. Additionally, buffers aren't always obvious as arrays. They may be strings or objects, for instance.
Footnote: STL containers such as std::vector don't necessarily put all their data on the heap. They actually take a template argument to specify the allocator used; it's just that the allocator they use by default puts the data on the heap. Obviously, unless you specify an allocator that somehow puts the data on the stack, the end result will be the same: using STL containers will probably use less stack memory than using stack-allocated arrays or objects.
I say "probably" because although the data is kept on the heap (or somewhere else), the container can only access that data through pointers it keeps internally. If the container is on the stack, those pointers reside on the stack and take up space there. So a one- or two-element std::vector may actually take up more space on the stack than the corresponding array.
Take a look at setrlimit():
RLIMIT_STACK
This is the maximum size of the initial thread's stack, in bytes. The implementation does not automatically grow the stack beyond this limit. If this limit is exceeded, SIGSEGV shall be generated for the thread. If the thread is blocking SIGSEGV, or the process is ignoring or catching SIGSEGV and has not made arrangements to use an alternate stack, the disposition of SIGSEGV shall be set to SIG_DFL before it is generated.