Classical Stack Overflowing?

Classical Stack Overflowing? - c

I am new to this forum and I am still an amateur in programming languages so please be kind with any of my silly mistakes :p
I am programming a recursive function which builds a kd-tree for a search process. I am using the c language on Visual Studio '08. After some seconds of processing, the program execution halts due to an error namely:
Unhandled exception at 0x77063de7 in run_FAST_corner_detection.exe:
0xC00000FD: Stack overflow
Now when the code breaks, there is a green arrow just near the instruction:
kd_node = malloc(sizeof(struct kd_node));
//this function allocates a pointer to a reserved memory of size struct kd_node.
Is this the classical problem of running out of memory?
How can I monitor the stack memory? (I know that this question has been asked repeatedly but honestly I have yet found no good method to do this).

Well, the stack overflow might be due to you calling malloc while deep in the recursion. The call to malloc pushes the return address and perhaps even parameters on the stack and this might be the thing which causes the stack to overflow. Don't do recursions in your code - try to make the code iterative (with loops instead). This is especially true for when the recursion is not bounded.

To monitor stack usage, simply take the address of any local variable (one defined within a function), and compare it against the address of a local variable in your main function (or thread entry function):
int stack_bottom;
int stack-usage () {
int top = 0;
/* Note, stack grows downward through memory, so high - low is .. */
return stack_bottom - (int)&top;
}
....
int main () {
int bottom = 0;
stack_bottom = (int)&bottom;
....
}
To reduce stack usage, either limit recursion, or avoid using large local variables (such as structs, arrays) and don't use alloca. You can replace large local variable with pointers to dynamically allocated heap memory (but don't forget to free it!)

Related

How to measure the amount of stack an arbitrary function call uses in C?

Our company bought a proprietary C function: we have a compiled library ProcessData.a and an interface file to call it:
# ProcessData.h
void ProcessData(char* pointer_to_data, int data_len);
We want to use this function on an ARM embedded CPU and we want to know how much stack space it might use.
Question: how to measure the stack usage of an arbitrary function?
What I tried so far is to implement the following helper functions:
static int* stackPointerBeforeCall;
void StartStackMeasurement(void) {
asm ("mov %0, sp" : "=r"(stackPointerBeforeCall));
// For some reason I can't overwrite values immediately below the
// stack pointer. I suspect a return address is placed there.
static int* pointer;
pointer = stackPointerBeforeCall - 4;
// Filling all unused stack space with a fixed constant
while (pointer != &_sstack) {
*pointer = 0xEEEEEEEE;
pointer--;
}
*pointer = 0xEEEEEEEE;
}
void FinishStackMeasurement(void) {
int* lastUnusedAddress = &_sstack;
while (*lastUnusedAddress == 0xEEEEEEEE) {
lastUnusedAddress++;
}
// Printing how many stack bytes a function has used
printf("STACK: %d\n", (stackPointerBeforeCall-lastUnusedAddress)*sizeof(int));
}
And then use them just before and after the function call:
StartStackMeasurement();
ProcessData(array, sizeof(array));
FinishStackMeasurement();
But this seems like a dangerous hack - especially the part where I am subtracting 4 from the stackPointerBeforeCall and overwriting everything below. Is there a better way?

Compile the program and analyze the assembly or machine code for the function in question.  Many functions use the stack in a static manner, and this static size can be reasoned by analysis of the compiled code.  Some functions dynamically allocate stack space based on some computation, usually associated with some input parameter.  In those cases, you'll see different instructions being used to allocate stack space, and will have to work back to reason how the dynamic stack size might be derived.
Of course, this analysis would have to be redone with updates to the function (library).

You can use getrusage which is a function that gets you the resource usage of your software, in particular ru_isrss which is
An integral value expressed the same way, which is the amount of unshared memory used for stack space
(source)
You can then compare it to the stack usage of your program with a mocked call to the library.
However, this will only work if your system has implemented ru_isrss (unlike linux), otherwise the field will be set to 0.

Move current stack frame in C

I was wondering if there would be a convenient way to copy the current stack frame, move it somewhere else, and then 'return' from the function, from the new location?
I have been playing around with setjmp and longjmp while allocating large arrays on the stack to force the stack pointer away. I am familiar with the calling conventions and where arguments to functions end up etc, but I am not extremely experienced with pointer arithmetic.
To describe the end goal in general terms; The ambition is to be able to allocate stack frames and to jump to another stack frame when I call a function (we can call this function switch). Before I jump to the new stack frame, however, I'd like to be able to grab the return address from switch so when I've (presumably) longjmpd to the new frame, I'd be able to return to the position that initiated the context switch.
I've already gotten some inspiration of how to imitate coroutines using longjmp an setjmp from this post.
If this is possible, it would be a component of my current research, where I am trying to implement a (very rough) proof of concept extension in a compiler. I'd appreciate answers and comments that address the question posed in my first paragraph, only.
Update
To try and make my intention clearer, I wrote up this example in C. It needs to be compiled with -fno-stack-protector. What i want is for the local variables a and b in main to not be next to each other on the stack (1), but rather be separated by a distance specified by the buffer in call. Furthermore, currently this code will return to main twice, while I only want it to do so once (2). I suggest you read the procedures in this order: main, call and change.
If anyone could answer any of the two question posed in the paragraph above, I would be immensely grateful. It does not have to be pretty or portable.
Again, I'd prefer answers to my questions rather than suggestions of better ways to go about things.
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
jmp_buf* buf;
long* retaddr;
int change(void) {
// local variable to use when computing offsets
long a[0];
for(int i = 0; i < 5; i++) a[i]; // same as below, not sure why I need to read this
// save this context
if(setjmp(*buf) == 0) {
return 1;
}
// the following code runs when longjmp was called with *buf
// overwrite this contexts return address with the one used by call
a[2] = *retaddr;
// return, hopefully now to main
return 1;
}
static void* retain;
int call() {
buf = (jmp_buf*)malloc(sizeof(jmp_buf));
retaddr = (long*) malloc(sizeof(long));
long a[0];
for(int i = 0; i < 5; i++) a[i]; // not sure why I need to do this. a[2] reads (nil) otherwise
// store return address
*retaddr = a[2];
// allocate local variables to move the stackpointer
char n[1024];
retain = n; // maybe cheat the optimiser?
// get a jmp_buf from another context
change();
// jump there
longjmp(*buf, 1);
}
// It returns to main twice, I am not sure why
int main(void) {
char a;
call(); // this function should move stackpointer (in this case, 1024 bytes)
char b;
printf("address of a: %p\n", &a);
printf("address of b: %p\n", &b);
return 1;
}

This is possible, it is what multi-tasking schedulers do, e.g. in embedded environments.
It is however extremely environment-specific and would have to dig into the the specifics of the processor it is running on.
Basically, the possible steps are:
Determine the registers which contain the needed information. Pick them by what you need, they are probably different from what the compiler uses on the stack for implementing function calls.
Find out how their content can be stored (most likely specific assembler instructions for each register).
Use them to store all contents contiguosly.
The place to do so is probably allocated already, inside the object describing and administrating the current task.
Consider not using a return address. Instead, when done with the "inserted" task, decide among the multiple task datasets which describe potential tasks to return to. That is the core of scheduling. If the return address is known in advance, then it is very similar to normal function calling. I.e. the idea is to potentially return to a different task than the last one left. That is also the reason why tasks need their own stack in many cases.
By the way, I don't think that pointer arithmetic is the most relevant tool here.
The content of the registers which make the stack frame are in registers, not anywhere in memory which a pointer can point to. (At least in most current systems, C64 staying out of this....).

tl;dr - no.
(On every compiler worth considering): The compiler knows the address of local variables by their offset from either the sp, or a designated saved stack pointer, the frame or base pointer. a might have an address of (sp+1), and b might have an address of (sp+0). If you manage to successfully return to main with the stack pointer lowered by 1024; these will still be known as (sp+1), (sp+0); although they are technically now (sp+1-1024), (sp+0-1024), which means they are no longer a & b.
You could design a language which fixed the local allocation in the way you consider, and that might have some interesting expressiveness, but it isn't C. I doubt any existing compiler could come up with a consistent handling of this. To do so, when it encountered:
char a;
it would have to make an alias of this address at the point it encountered it; say:
add %sp, $0, %r1
sub %sp, $1, %sp
and when it encountered
char b;
add %sp, $0, %r2
sub %sp, $1, %sp
and so on, but one it runs out of free registers, it needs to spill them on the stack; and because it considers the stack to change without notice, it would have to allocate a pointer to this spill area, and keep that stored in a register.
Btw, this is not far removed from the concept of a splayed stack (golang uses these), but generally the granularity is at a function or method boundary, not between two variable definitions.
Interesting idea though.

Can alloca() memory be reallocated?

Memory allocated by malloc can be reallocated with realloc. Is there a similar function for alloca? Reallocating stack memory could be useful when you don't want memory to be allocated on the heap, and you need to allocate variable stack memory multiple times, for example in a library function, where you need dynamic memory, but don't want to allocate on the heap, because the user of the library might use a custom heap allocation strategy. It would look like this:
int main(void) {
float * some_mem = alloca(40 * sizeof(float));
// do something with this memory...
// now we need a different amount of memory, but some_mem still occupies a lot of the stack, so just reallocate it.
// is something like this possible?
some_mem = realloca(some_mem, 50 * sizeof(float));
}
The important thing is that this all happens on the stack. Q: is there a way to reallocate dynamic stack memory?

No: that wouldn't work with a stack as commonly implemented. A variable on the stack occupies a fixed range of addresses. The next variable comes immediately after it, so there's no room to grow. Consider a function like this:
void f(int x) {
int i;
float *a = alloca(40 * sizeof(float));
int k;
…
}
The stack after the function prologue looks something like this:
----------------+-----+-----+-----+-------------------+-----+---------------------
... | ret | x | i | a | k | ...
----------------+-----+-----+-----+-------------------+-----+---------------------
^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
previous frames f's frame free space at the top
There's no room to grow a.
I'm showing a highly simplified example: in the real world, variables end up in registers, variables can be reordered even if they do end up on the stack, etc. But only one variable can be the last one on the stack with room to grow.
So if realloca existed, it could only be applied to the variable that's at the top of the stack. (Or else it would have to move everything else that's on top of it, but that would require updating all existing pointers to those, which is not possible in general.) This would be a very limited mechanism, so support for this feature would have a very small benefit. Supporting it would have a significant cost, because compilers are normally free to put things on the stack in the order they want: this feature would require a new mechanism to let the compiler know that one specific variable must go to the top.
It's possible that some C implementation somewhere has realloca, but it's unlikely given the cost/benefit ratio.
Of course realloca can easily be implemented if alloca does not use a stack allocation strategy. But allocating on the stack is the whole point of alloca. If you want resizable objects, you need a memory management structure with a heap interface, and that's what malloc is for.
As a practical matter, there are several possible approaches to dynamic memory management in a library.
The most common approach is to call malloc, realloc and free when you need them. That's what they're for.
In some environments, it's useful to support custom allocators. You can give the user of the library the option to pass pointers to alternative implementations of malloc, realloc and free. It's useful when you want to write a portable library that needs to be used by code that is itself fully portable. Most of the time, though, users who want to use custom allocators can do it by linking their own malloc and friends. And even that is rarely useful.
If you need code that can work in an environment without dynamic allocation (such as safety-critical environments), then you should not use alloca either. alloca is worse than malloc because it causes unpredictable stack usage and can lead to a stack overflow which won't be detected at all, or which will only be detected by a program crash. If you need a variable (or large) amount of temporary memory in a function, have the user pass a suitably-sized buffer to you.
/** [documentation of the function] …
* working_buffer must point to an array of floats of 3*n elements.
*/
void f(size_t n, float *working_buffer);
Better, if you have the code size budget, pass the array size and verify it.
/** [documentation of the function] …
* working_buffer must point to an array of floats of 3*n elements.
*/
int f(size_t n, float *working_buffer, size_t working_buffer_length)
{
if (working_buffer_length < 3 * n) return -EINVAL;
…
}

The accepted answer has correctly pointed out that usually there is not enough benefit from realloca because allocations are difficult to "grow".
Another issue I see is that these allocations have a life time till the end of the function. What happens when you pass this pointer to another function and call realloca on it there? This function would not be able to change the stack frame of a function deeper on the stack. It also cannot reallocate it in its own frame because the object would be destroyed when it returns, whereas the original object would still have to be alive.
This problem is not there for malloc/realloc because the heap has a global lifetime.
One could argue that the semantics can be defined in such way that a function can be realloced only in the function it was alloc'd in. This greatly reduces the use such a function would have.

How to check Stack Usage when Calculating Ackermann

I'm learning about my system's ability to calculate Ackermann's algorithm both the two and three parameter version. For very small values of m and n, my system will calculate and print results returning from A0 and A1 method calls. However anything higher than 3 or 4 does not return and freezes the terminal I'm using atm. My problem is that I do determine for what values of m and n my machine can compute.
I have tried a few things to catch a stack overflow, for all i know c++ doesn't have a stackoverflowexception I can catch. try-catch blocks don't work. In the below code, I use getrlimit() to find the stack limit, create a address location in main gStackRef. I call checkStack recursively checking the local variable pointer to gStackLimit.
Is there a better way of checking my stack usage in relation to recursive methods? Also I do i check for segment faults? I'll let you know I'm running on a unix terminal.
#include <cstdlib>
#include <iostream>
#define _XOPEN_SOURCE_EXTENDED 1
#include <sys/resource.h>
int getrlimit(int resource, struct rlimit *rlp);
using namespace std;
int * gStackRef;
int gStackLimit;
void checkStack(void);
int main(int argc, char *argv[])
{
int temp = 0;
gStackRef = &temp;
rlimit myl;
getrlimit(RLIMIT_STACK, &myl);
gStackLimit = (myl.rlim_cur / 3 * 8 / 10) ;/* modified for segment fault */
cout << gStackLimit << "\n";
checkStack();
}
void checkStack()
{
int temp = 0;
int* pVariableHere = &temp;
size_t stackUsage = gStackRef - pVariableHere;
printf("Stack usage: %d / %d \n", stackUsage, gStackLimit);
if(stackUsage > gStackLimit) return;
else checkStack();
}

However anything higher than 3 or 4 does not return and freezes the terminal I'm using atm.
That's kind of the point of the Ackermann function. It grows extremely rapidly. For m >= 4 and n >= 3, if you're calculating A(m, n) recursively, I doubt your function will return before you're dead.
I have tried a few things to catch a stack overflow, for all i know c++ doesn't have a stackoverflowexception I can catch.
Of course not. The process is out of stack space. It should be torn down immediately.
Is there a better way of checking my stack usage in relation to recursive methods?
If you have to use recursion, do it manually by creating your own stack data structure that is allocated on the heap instead of in the stack space. Use that to keep track of where you are in the recursion. Push and pop and as you recurse, instead of recursing by nested method calls.
But at the end, you shouldn't be using recursion to calculate Ackermann anyway.

I have tried a few things to catch a stack overflow, for all i know c++ doesn't have a stackoverflowexception I can catch. try-catch blocks don't work. In the below code, I use getrlimit() to find the stack limit, create a address location in main gStackRef. I call checkStack recursively checking the local variable pointer to gStackLimit.
POSIX does not have a "safe" way of detecting a stack overflow. Stack Overflows result in SIGSEGV signals, which you (generally) should not catch because they also are indicative of general segmentation faults, which should crash your program. Windows environments can deal with stack overflows safely, using EXCEPTION_STACK_OVERFLOW -- but in such cases what Windows is doing is merely putting a guard page at the end of the stack and notifying with SEH. If you use up the guard page (after ignoring the SEH exception), then your program gets terminated (just as it would in POSIX-land).
Is there a better way of checking my stack usage in relation to recursive methods? Also I do i check for segment faults? I'll let you know I'm running on a unix terminal.
No. Even what you're doing has undefined behavior. On some machines the stack grows up. On some machines the stack grows down. The compiler may insert any amount of slop space in between two methods. Technically, the compiler could implement things such that there were two separate stacks, located in two completely different memory segments, and still be conformant.
If you want to calculate Ackermann in a stack safe manner, either use an explicit stack structure allocated from the heap, or use dynamic programming.

Determining Stack Space with Visual Studio

I'm programming in C in Visual Studio 2005. I have a multi-threaded program, but that's not especially important here.
How can I determine (approximately) how much stack space my threads use?
The technique I was planning to use is setting the stack memory to some predetermined value, say 0xDEADBEEF, running the program for a long time, pausing the program, and investigating the stack.
How do I read and write stack memory with Visual Studio?
EDIT: See, for example, "How to determine maximum stack usage." That question talks about an embedded system, but here I'm trying to determine the answer on a regular PC.

Windows does not commit the stack memory immediately; instead, it reserves the address space for it, and commits it page-by-page when it is accessed. Read this page for more info.
As a result, stack address space consists of three contiguous regions:
Reserved but uncommitted memory which can be used for stack growth (but was never accessed yet);
Guard page, which was never accessed yet too, and serves to trigger stack growth when accessed;
Committed memory, i.e. stack memory which was ever accessed by the thread.
This allows us to construct a function that obtains stack size (with page size granularity):
static size_t GetStackUsage()
{
MEMORY_BASIC_INFORMATION mbi;
VirtualQuery(&mbi, &mbi, sizeof(mbi));
// now mbi.AllocationBase = reserved stack memory base address
VirtualQuery(mbi.AllocationBase, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe reserved (uncommitted) portion of the stack
// skip it
VirtualQuery((char*)mbi.BaseAddress + mbi.RegionSize, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe the guard page
// skip it
VirtualQuery((char*)mbi.BaseAddress + mbi.RegionSize, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe the committed (i.e. accessed) portion of the stack
return mbi.RegionSize;
}
One thing to consider: CreateThread allows to specify initial stack commit size (via dwStackSize parameter, when STACK_SIZE_PARAM_IS_A_RESERVATION flag is not set). If this parameter is nonzero, our function will return correct value only when stack usage becomes greater than dwStackSize value.

You can make use of information in the Win32 Thread Information Block
When you want in a thread to find out how much stack space it uses you can do something like this:
#include <windows.h>
#include <winnt.h>
#include <intrin.h>
inline NT_TIB* getTib()
{
return (NT_TIB*)__readfsdword( 0x18 );
}
inline size_t get_allocated_stack_size()
{
return (size_t)getTib()->StackBase - (size_t)getTib()->StackLimit;
}
void somewhere_in_your_thread()
{
// ...
size_t sp_value = 0;
_asm { mov [sp_value], esp }
size_t used_stack_size = (size_t)getTib()->StackBase - sp_value;
printf("Number of bytes on stack used by this thread: %u\n",
used_stack_size);
printf("Number of allocated bytes on stack for this thread : %u\n",
get_allocated_stack_size());
// ...
}

The stack doesn't work the way you expect it too. The stack is a linear sequence of pages, the last (top) one of which is marked with a page guard bit. When this page is touched, the guard bit is removed, and the page can be used. For further growth, a new guard page is allocated.
Hence, the answer you want is where the gaurd page is allocated. But the technique you propose would touch the page in question, and as a result it would invalidate the very thing you're trying to measure.
The non-invasive way to determine if a (stack) page has the guard bit is via VirtualQuery().

You can use GetThreadContext() function to determine thread's current stack pointer. Then use VirtualQuery() to find stack base for this pointer. Substracting those two pointers will give you stack size for given thread.