Determining Stack Space with Visual Studio

Determining Stack Space with Visual Studio - c

I'm programming in C in Visual Studio 2005. I have a multi-threaded program, but that's not especially important here.
How can I determine (approximately) how much stack space my threads use?
The technique I was planning to use is setting the stack memory to some predetermined value, say 0xDEADBEEF, running the program for a long time, pausing the program, and investigating the stack.
How do I read and write stack memory with Visual Studio?
EDIT: See, for example, "How to determine maximum stack usage." That question talks about an embedded system, but here I'm trying to determine the answer on a regular PC.

Windows does not commit the stack memory immediately; instead, it reserves the address space for it, and commits it page-by-page when it is accessed. Read this page for more info.
As a result, stack address space consists of three contiguous regions:
Reserved but uncommitted memory which can be used for stack growth (but was never accessed yet);
Guard page, which was never accessed yet too, and serves to trigger stack growth when accessed;
Committed memory, i.e. stack memory which was ever accessed by the thread.
This allows us to construct a function that obtains stack size (with page size granularity):
static size_t GetStackUsage()
{
MEMORY_BASIC_INFORMATION mbi;
VirtualQuery(&mbi, &mbi, sizeof(mbi));
// now mbi.AllocationBase = reserved stack memory base address
VirtualQuery(mbi.AllocationBase, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe reserved (uncommitted) portion of the stack
// skip it
VirtualQuery((char*)mbi.BaseAddress + mbi.RegionSize, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe the guard page
// skip it
VirtualQuery((char*)mbi.BaseAddress + mbi.RegionSize, &mbi, sizeof(mbi));
// now (mbi.BaseAddress, mbi.RegionSize) describe the committed (i.e. accessed) portion of the stack
return mbi.RegionSize;
}
One thing to consider: CreateThread allows to specify initial stack commit size (via dwStackSize parameter, when STACK_SIZE_PARAM_IS_A_RESERVATION flag is not set). If this parameter is nonzero, our function will return correct value only when stack usage becomes greater than dwStackSize value.

You can make use of information in the Win32 Thread Information Block
When you want in a thread to find out how much stack space it uses you can do something like this:
#include <windows.h>
#include <winnt.h>
#include <intrin.h>
inline NT_TIB* getTib()
{
return (NT_TIB*)__readfsdword( 0x18 );
}
inline size_t get_allocated_stack_size()
{
return (size_t)getTib()->StackBase - (size_t)getTib()->StackLimit;
}
void somewhere_in_your_thread()
{
// ...
size_t sp_value = 0;
_asm { mov [sp_value], esp }
size_t used_stack_size = (size_t)getTib()->StackBase - sp_value;
printf("Number of bytes on stack used by this thread: %u\n",
used_stack_size);
printf("Number of allocated bytes on stack for this thread : %u\n",
get_allocated_stack_size());
// ...
}

The stack doesn't work the way you expect it too. The stack is a linear sequence of pages, the last (top) one of which is marked with a page guard bit. When this page is touched, the guard bit is removed, and the page can be used. For further growth, a new guard page is allocated.
Hence, the answer you want is where the gaurd page is allocated. But the technique you propose would touch the page in question, and as a result it would invalidate the very thing you're trying to measure.
The non-invasive way to determine if a (stack) page has the guard bit is via VirtualQuery().

You can use GetThreadContext() function to determine thread's current stack pointer. Then use VirtualQuery() to find stack base for this pointer. Substracting those two pointers will give you stack size for given thread.

Related

How to measure the amount of stack an arbitrary function call uses in C?

Our company bought a proprietary C function: we have a compiled library ProcessData.a and an interface file to call it:
# ProcessData.h
void ProcessData(char* pointer_to_data, int data_len);
We want to use this function on an ARM embedded CPU and we want to know how much stack space it might use.
Question: how to measure the stack usage of an arbitrary function?
What I tried so far is to implement the following helper functions:
static int* stackPointerBeforeCall;
void StartStackMeasurement(void) {
asm ("mov %0, sp" : "=r"(stackPointerBeforeCall));
// For some reason I can't overwrite values immediately below the
// stack pointer. I suspect a return address is placed there.
static int* pointer;
pointer = stackPointerBeforeCall - 4;
// Filling all unused stack space with a fixed constant
while (pointer != &_sstack) {
*pointer = 0xEEEEEEEE;
pointer--;
}
*pointer = 0xEEEEEEEE;
}
void FinishStackMeasurement(void) {
int* lastUnusedAddress = &_sstack;
while (*lastUnusedAddress == 0xEEEEEEEE) {
lastUnusedAddress++;
}
// Printing how many stack bytes a function has used
printf("STACK: %d\n", (stackPointerBeforeCall-lastUnusedAddress)*sizeof(int));
}
And then use them just before and after the function call:
StartStackMeasurement();
ProcessData(array, sizeof(array));
FinishStackMeasurement();
But this seems like a dangerous hack - especially the part where I am subtracting 4 from the stackPointerBeforeCall and overwriting everything below. Is there a better way?

Compile the program and analyze the assembly or machine code for the function in question.  Many functions use the stack in a static manner, and this static size can be reasoned by analysis of the compiled code.  Some functions dynamically allocate stack space based on some computation, usually associated with some input parameter.  In those cases, you'll see different instructions being used to allocate stack space, and will have to work back to reason how the dynamic stack size might be derived.
Of course, this analysis would have to be redone with updates to the function (library).

You can use getrusage which is a function that gets you the resource usage of your software, in particular ru_isrss which is
An integral value expressed the same way, which is the amount of unshared memory used for stack space
(source)
You can then compare it to the stack usage of your program with a mocked call to the library.
However, this will only work if your system has implemented ru_isrss (unlike linux), otherwise the field will be set to 0.

Move current stack frame in C

I was wondering if there would be a convenient way to copy the current stack frame, move it somewhere else, and then 'return' from the function, from the new location?
I have been playing around with setjmp and longjmp while allocating large arrays on the stack to force the stack pointer away. I am familiar with the calling conventions and where arguments to functions end up etc, but I am not extremely experienced with pointer arithmetic.
To describe the end goal in general terms; The ambition is to be able to allocate stack frames and to jump to another stack frame when I call a function (we can call this function switch). Before I jump to the new stack frame, however, I'd like to be able to grab the return address from switch so when I've (presumably) longjmpd to the new frame, I'd be able to return to the position that initiated the context switch.
I've already gotten some inspiration of how to imitate coroutines using longjmp an setjmp from this post.
If this is possible, it would be a component of my current research, where I am trying to implement a (very rough) proof of concept extension in a compiler. I'd appreciate answers and comments that address the question posed in my first paragraph, only.
Update
To try and make my intention clearer, I wrote up this example in C. It needs to be compiled with -fno-stack-protector. What i want is for the local variables a and b in main to not be next to each other on the stack (1), but rather be separated by a distance specified by the buffer in call. Furthermore, currently this code will return to main twice, while I only want it to do so once (2). I suggest you read the procedures in this order: main, call and change.
If anyone could answer any of the two question posed in the paragraph above, I would be immensely grateful. It does not have to be pretty or portable.
Again, I'd prefer answers to my questions rather than suggestions of better ways to go about things.
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
jmp_buf* buf;
long* retaddr;
int change(void) {
// local variable to use when computing offsets
long a[0];
for(int i = 0; i < 5; i++) a[i]; // same as below, not sure why I need to read this
// save this context
if(setjmp(*buf) == 0) {
return 1;
}
// the following code runs when longjmp was called with *buf
// overwrite this contexts return address with the one used by call
a[2] = *retaddr;
// return, hopefully now to main
return 1;
}
static void* retain;
int call() {
buf = (jmp_buf*)malloc(sizeof(jmp_buf));
retaddr = (long*) malloc(sizeof(long));
long a[0];
for(int i = 0; i < 5; i++) a[i]; // not sure why I need to do this. a[2] reads (nil) otherwise
// store return address
*retaddr = a[2];
// allocate local variables to move the stackpointer
char n[1024];
retain = n; // maybe cheat the optimiser?
// get a jmp_buf from another context
change();
// jump there
longjmp(*buf, 1);
}
// It returns to main twice, I am not sure why
int main(void) {
char a;
call(); // this function should move stackpointer (in this case, 1024 bytes)
char b;
printf("address of a: %p\n", &a);
printf("address of b: %p\n", &b);
return 1;
}

This is possible, it is what multi-tasking schedulers do, e.g. in embedded environments.
It is however extremely environment-specific and would have to dig into the the specifics of the processor it is running on.
Basically, the possible steps are:
Determine the registers which contain the needed information. Pick them by what you need, they are probably different from what the compiler uses on the stack for implementing function calls.
Find out how their content can be stored (most likely specific assembler instructions for each register).
Use them to store all contents contiguosly.
The place to do so is probably allocated already, inside the object describing and administrating the current task.
Consider not using a return address. Instead, when done with the "inserted" task, decide among the multiple task datasets which describe potential tasks to return to. That is the core of scheduling. If the return address is known in advance, then it is very similar to normal function calling. I.e. the idea is to potentially return to a different task than the last one left. That is also the reason why tasks need their own stack in many cases.
By the way, I don't think that pointer arithmetic is the most relevant tool here.
The content of the registers which make the stack frame are in registers, not anywhere in memory which a pointer can point to. (At least in most current systems, C64 staying out of this....).

tl;dr - no.
(On every compiler worth considering): The compiler knows the address of local variables by their offset from either the sp, or a designated saved stack pointer, the frame or base pointer. a might have an address of (sp+1), and b might have an address of (sp+0). If you manage to successfully return to main with the stack pointer lowered by 1024; these will still be known as (sp+1), (sp+0); although they are technically now (sp+1-1024), (sp+0-1024), which means they are no longer a & b.
You could design a language which fixed the local allocation in the way you consider, and that might have some interesting expressiveness, but it isn't C. I doubt any existing compiler could come up with a consistent handling of this. To do so, when it encountered:
char a;
it would have to make an alias of this address at the point it encountered it; say:
add %sp, $0, %r1
sub %sp, $1, %sp
and when it encountered
char b;
add %sp, $0, %r2
sub %sp, $1, %sp
and so on, but one it runs out of free registers, it needs to spill them on the stack; and because it considers the stack to change without notice, it would have to allocate a pointer to this spill area, and keep that stored in a register.
Btw, this is not far removed from the concept of a splayed stack (golang uses these), but generally the granularity is at a function or method boundary, not between two variable definitions.
Interesting idea though.

StackOverFlow Delay

While executing the following program in Visual Studio 2012 Console App:
#include <stdio.h>
int main() {
int integer1, integer2, sum;
char str[5];
scanf("%s",str); /* Try to enter 10 chars */
printf("%s\n",str);
printf( "Enter first integer\n" );
scanf( "%d", &integer1 );
printf( "Enter second integer\n" );
scanf( "%d", &integer2 );
sum = integer1 + integer2;
printf( "Sum = %d\n", sum );
return 0;
}
It throws an exception "StackOverFlow" and this is obvious because of the statement:
scanf("%s",str); /* Try to enter 10 chars */
My question is: Why does the program continue the execution (by printing the str string, asking for entering the 2 integers, sum them and print the result also) even though the exception should have happened earlier?

The stack grows down, from high addresses to lower addresses. A CPU register, the stack pointer keeps track of the top of stack - which is in reality at the lowest address, because the stack grows towards lower addresses. The compiler looks at your function (main in this case) and looks to see how much automatic storage it needs, that is, storage for local variables. It generates code to decrement that stack pointer by the amount of local storage needed by your function. When the function gets called the caller pushes on the stack the return address (decrementing the stack pointer) and then branches to the called function, which in turn decrements the stack pointer (creating a stack frame) to make room for local variables.
If a program overflows the local variables (as yours did) it is likely to trash the return address. Since the stack grew down towards lower addresses, writing beyond the stack frame (towards higher addresses) will overwrite older stack frames (your caller, and the caller's caller, etc).
Although main() is the first function to be called in your program, there is already an active stack frame, corresponding to main()'s caller, which is the runtime environment.
Any side effects of trashing the stack (like overwriting the return address) won't be noticed until your function, main() in this case, tries to return. What happens then is anyone's guess. If the return address was overwritten with a value that points to a location on the stack, the CPU will branch there, this is a classic exploit by malicious code that takes advantage of buffer overflows with buffers allocated on the stack.
These links are helpful in understanding stack based buffer overflow:
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow3.html
http://en.wikipedia.org/wiki/Format_string_attack
Recent microprocessors provide a security feature that prevents execution of data, the CPU would raise an exception as soon as your program attempts to return to a corrupted address that points to data (like the stack).
http://en.wikipedia.org/wiki/NX_bit

Because C doesn't check everything (anything?). Your long string has scribbled on the stack, which when the function returns the stack corruption is noticed.
It's worth noting that safe versions of scanf type functions should always be used.

In C, code can't throw exceptions. Also, scanf() doesn't check the stack.
What probably happens is that Visual Studio creates the environment for your program, including setting up the stack. While it does that, it fills the stack with a pattern.
When main() returns, the pattern is checked. Only at that time, the C runtime will notice that you trashed the stack.
Conclusion: Never use the unsafe versions of scanf() and sprintf(). The runtime might catch the error but it will do it too late and even when you get an error message, that won't help you one bit to find out when it happened.

Classical Stack Overflowing?

I am new to this forum and I am still an amateur in programming languages so please be kind with any of my silly mistakes :p
I am programming a recursive function which builds a kd-tree for a search process. I am using the c language on Visual Studio '08. After some seconds of processing, the program execution halts due to an error namely:
Unhandled exception at 0x77063de7 in run_FAST_corner_detection.exe:
0xC00000FD: Stack overflow
Now when the code breaks, there is a green arrow just near the instruction:
kd_node = malloc(sizeof(struct kd_node));
//this function allocates a pointer to a reserved memory of size struct kd_node.
Is this the classical problem of running out of memory?
How can I monitor the stack memory? (I know that this question has been asked repeatedly but honestly I have yet found no good method to do this).

Well, the stack overflow might be due to you calling malloc while deep in the recursion. The call to malloc pushes the return address and perhaps even parameters on the stack and this might be the thing which causes the stack to overflow. Don't do recursions in your code - try to make the code iterative (with loops instead). This is especially true for when the recursion is not bounded.

To monitor stack usage, simply take the address of any local variable (one defined within a function), and compare it against the address of a local variable in your main function (or thread entry function):
int stack_bottom;
int stack-usage () {
int top = 0;
/* Note, stack grows downward through memory, so high - low is .. */
return stack_bottom - (int)&top;
}
....
int main () {
int bottom = 0;
stack_bottom = (int)&bottom;
....
}
To reduce stack usage, either limit recursion, or avoid using large local variables (such as structs, arrays) and don't use alloca. You can replace large local variable with pointers to dynamically allocated heap memory (but don't forget to free it!)

What does the brk() system call do?

According to Linux programmers manual:
brk() and sbrk() change the location of the program break, which
defines the end of the process's data segment.
What does the data segment mean over here? Is it just the data segment or data, BSS, and heap combined?
According to wiki Data segment:
Sometimes the data, BSS, and heap areas are collectively referred to as the "data segment".
I see no reason for changing the size of just the data segment. If it is data, BSS and heap collectively then it makes sense as heap will get more space.
Which brings me to my second question. In all the articles I read so far, author says that heap grows upward and stack grows downward. But what they do not explain is what happens when heap occupies all the space between heap and stack?

In the diagram you posted, the "break"—the address manipulated by brk and sbrk—is the dotted line at the top of the heap.
The documentation you've read describes this as the end of the "data segment" because in traditional (pre-shared-libraries, pre-mmap) Unix the data segment was continuous with the heap; before program start, the kernel would load the "text" and "data" blocks into RAM starting at address zero (actually a little above address zero, so that the NULL pointer genuinely didn't point to anything) and set the break address to the end of the data segment. The first call to malloc would then use sbrk to move the break up and create the heap in between the top of the data segment and the new, higher break address, as shown in the diagram, and subsequent use of malloc would use it to make the heap bigger as necessary.
Meantime, the stack starts at the top of memory and grows down. The stack doesn't need explicit system calls to make it bigger; either it starts off with as much RAM allocated to it as it can ever have (this was the traditional approach) or there is a region of reserved addresses below the stack, to which the kernel automatically allocates RAM when it notices an attempt to write there (this is the modern approach). Either way, there may or may not be a "guard" region at the bottom of the address space that can be used for stack. If this region exists (all modern systems do this) it is permanently unmapped; if either the stack or the heap tries to grow into it, you get a segmentation fault. Traditionally, though, the kernel made no attempt to enforce a boundary; the stack could grow into the heap, or the heap could grow into the stack, and either way they would scribble over each other's data and the program would crash. If you were very lucky it would crash immediately.
I'm not sure where the number 512GB in this diagram comes from. It implies a 64-bit virtual address space, which is inconsistent with the very simple memory map you have there. A real 64-bit address space looks more like this:
Legend: t: text, d: data, b: BSS
This is not remotely to scale, and it shouldn't be interpreted as exactly how any given OS does stuff (after I drew it I discovered that Linux actually puts the executable much closer to address zero than I thought it did, and the shared libraries at surprisingly high addresses). The black regions of this diagram are unmapped -- any access causes an immediate segfault -- and they are gigantic relative to the gray areas. The light-gray regions are the program and its shared libraries (there can be dozens of shared libraries); each has an independent text and data segment (and "bss" segment, which also contains global data but is initialized to all-bits-zero rather than taking up space in the executable or library on disk). The heap is no longer necessarily continous with the executable's data segment -- I drew it that way, but it looks like Linux, at least, doesn't do that. The stack is no longer pegged to the top of the virtual address space, and the distance between the heap and the stack is so enormous that you don't have to worry about crossing it.
The break is still the upper limit of the heap. However, what I didn't show is that there could be dozens of independent allocations of memory off there in the black somewhere, made with mmap instead of brk. (The OS will try to keep these far away from the brk area so they don't collide.)

Minimal runnable example
What does brk( ) system call do?
Asks the kernel to let you you read and write to a contiguous chunk of memory called the heap.
If you don't ask, it might segfault you.
Without brk:
#define _GNU_SOURCE
#include <unistd.h>
int main(void) {
/* Get the first address beyond the end of the heap. */
void *b = sbrk(0);
int *p = (int *)b;
/* May segfault because it is outside of the heap. */
*p = 1;
return 0;
}
With brk:
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
int main(void) {
void *b = sbrk(0);
int *p = (int *)b;
/* Move it 2 ints forward */
brk(p + 2);
/* Use the ints. */
*p = 1;
*(p + 1) = 2;
assert(*p == 1);
assert(*(p + 1) == 2);
/* Deallocate back. */
brk(b);
return 0;
}
GitHub upstream.
The above might not hit a new page and not segfault even without the brk, so here is a more aggressive version that allocates 16MiB and is very likely to segfault without the brk:
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
int main(void) {
void *b;
char *p, *end;
b = sbrk(0);
p = (char *)b;
end = p + 0x1000000;
brk(end);
while (p < end) {
*(p++) = 1;
}
brk(b);
return 0;
}
Tested on Ubuntu 18.04.
Virtual address space visualization
Before brk:
+------+ <-- Heap Start == Heap End
After brk(p + 2):
+------+ <-- Heap Start + 2 * sizof(int) == Heap End
| |
| You can now write your ints
| in this memory area.
| |
+------+ <-- Heap Start
After brk(b):
+------+ <-- Heap Start == Heap End
To better understand address spaces, you should make yourself familiar with paging: How does x86 paging work?.
Why do we need both brk and sbrk?
brk could of course be implemented with sbrk + offset calculations, both exist just for convenience.
In the backend, the Linux kernel v5.0 has a single system call brk that is used to implement both: https://github.com/torvalds/linux/blob/v5.0/arch/x86/entry/syscalls/syscall_64.tbl#L23
12 common brk __x64_sys_brk
Is brk POSIX?
brk used to be POSIX, but it was removed in POSIX 2001, thus the need for _GNU_SOURCE to access the glibc wrapper.
The removal is likely due to the introduction mmap, which is a superset that allows multiple range to be allocated and more allocation options.
I think there is no valid case where you should to use brk instead of malloc or mmap nowadays.
brk vs malloc
brk is one old possibility of implementing malloc.
mmap is the newer stricly more powerful mechanism which likely all POSIX systems currently use to implement malloc. Here is a minimal runnable mmap memory allocation example.
Can I mix brk and malloc?
If your malloc is implemented with brk, I have no idea how that can possibly not blow up things, since brk only manages a single range of memory.
I could not however find anything about it on the glibc docs, e.g.:
https://www.gnu.org/software/libc/manual/html_mono/libc.html#Resizing-the-Data-Segment
Things will likely just work there I suppose since mmap is likely used for malloc.
See also:
What's unsafe/legacy about brk/sbrk?
Why does calling sbrk(0) twice give a different value?
More info
Internally, the kernel decides if the process can have that much memory, and earmarks memory pages for that usage.
This explains how the stack compares to the heap: What is the function of the push / pop instructions used on registers in x86 assembly?

You can use brk and sbrk yourself to avoid the "malloc overhead" everyone's always complaining about. But you can't easily use this method in conjuction with malloc so it's only appropriate when you don't have to free anything. Because you can't. Also, you should avoid any library calls which may use malloc internally. Ie. strlen is probably safe, but fopen probably isn't.
Call sbrk just like you would call malloc. It returns a pointer to the current break and increments the break by that amount.
void *myallocate(int n){
return sbrk(n);
}
While you can't free individual allocations (because there's no malloc-overhead, remember), you can free the entire space by calling brk with the value returned by the first call to sbrk, thus rewinding the brk.
void *memorypool;
void initmemorypool(void){
memorypool = sbrk(0);
}
void resetmemorypool(void){
brk(memorypool);
}
You could even stack these regions, discarding the most recent region by rewinding the break to the region's start.
One more thing ...
sbrk is also useful in code golf because it's 2 characters shorter than malloc.

There is a special designated anonymous private memory mapping (traditionally located just beyond the data/bss, but modern Linux will actually adjust the location with ASLR). In principle it's no better than any other mapping you could create with mmap, but Linux has some optimizations that make it possible to expand the end of this mapping (using the brk syscall) upwards with reduced locking cost relative to what mmap or mremap would incur. This makes it attractive for malloc implementations to use when implementing the main heap.

malloc uses brk system call to allocate memory.
include
int main(void){
char *a = malloc(10);
return 0;
}
run this simple program with strace, it will call brk system.

I can answer your second question. Malloc will fail and return a null pointer. That's why you always check for a null pointer when dynamically allocating memory.

The heap is placed last in the program's data segment. brk() is used to change (expand) the size of the heap. When the heap cannot grow any more any malloc call will fail.

The data segment is the portion of memory that holds all your static data, read in from the executable at launch and usually zero-filled.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight