overcome stack randomization by brute force - c

I'm reading a book that shows how buffer overflow attacks work. One technique to stop them is called stack randomization. Below is quoted from the book:
a persistent attacker can overcome randomization by brute force, repeatedly attempting attacks with different addresses. A common trick is to include a long sequence of nop (pronounced “no op,” short for “no operation”) instructions before the actual exploit code. Executing this instruction has no effect, other than incrementing the program counter to the next instruction. As long as the attacker can guess an address somewhere within this sequence, the program will run through the sequence and then hit the exploit code. If we set up a 256-byte (2^8) nop sled, then the randomization over n = 2^23 can be cracked by enumerating 2^15 = 32,768 starting addresses
I understand the first part, but don't get the second part about enumerating starting addresses. For example, %rsp points to the current start address as the picture below shows (only 8 bytes shown instead of 256 for simplicity)
I think what the author means is: try and guess different addresses into the stack memory where %rsp is pointing. The padding between the return address and %rsp is all nops, and then you overwrite the return address with the guessed address, which very likely points into the padding (nops). But since stack randomization allocates a random amount of space between 0 and n bytes on the stack at the start of a program, we can only say the probability of a successful attack is 2^15/2^23 ≈ 0.39%. How can we say that trying 32,768 (a fixed number of) addresses will crack it? It is the same as tossing a coin: you can only say the probability of getting a head is 0.5; you cannot say you will get a head in the second round, because you might get two tails

Stack randomization allocates a random amount of space between 0 and n bytes on the stack at the start of a program
No, it doesn't allocate. It randomizes where in virtual address space the stack is mapped, leaving other pages unmapped. This is with page granularity.
Guessing wrong will typically result in the victim program segfaulting (or whatever the OS does on an invalid page fault). This is noisy and obvious to any intrusion-detection. And if the program does eventually restart so you can try again, its stack address will be re-randomized, as you suggest.
Wrong guesses that land in valid memory but not your NOP sled will also typically crash soon, on an invalid instruction or something that decodes to an invalid memory access.
So yes, you're right, you can't just enumerate the address space; it's only a probabilistic thing. Trying enough times makes success likely, not guaranteed. And I don't think systematically enumerating the possible ASLR offsets is particularly useful, or any more likely to succeed sooner than guessing the same page every time.
But trying different offsets within a single page is useful because OS-driven stack ASLR only has page granularity.
There is some amount of stack space used by env vars and command line args which will vary from system to system, but tend to be constant across invocations of the same program, I think. Unless the vulnerable function is reached from different call chains, or there's a variable-sized stack allocation in a parent function, in which case the buffer's offset within page could vary every run.
Although of course most processes these days run with non-executable stacks, making direct code-injection impossible for stack buffers. A ROP attack (injecting a return address to an existing sequence of bytes that decode to something interesting) is possible if there are any known static data or code addresses. If not, you have to deal with ASLR of code/data, but you don't get to inject NOP sleds.

Related

buffer overflow exploit: Why does "jmp esp" need to be located in a DLL?

I am trying to understand classical buffer overflow exploits where an input buffer overwrites the stack, the function return address that is saved on the stack and upper memory regions (where you usually place the shell code).
There are many examples of this on the internet and I think I have understood this pretty well:
You put more data in some input buffer that the developer has made a fixed size
Your input overwrites the function arguments and the return address to the calling function on the stack
That address is loaded into EIP when the CPU returns from the function the overflow happened in, and that is what lets you get data you control into the EIP register. (Reading some articles carelessly, you might get the impression you can overwrite CPU registers directly. Of course this is not the case: you can only overwrite the stack, but the CPU will load addresses from the stack into its registers.)
If the exploit is well designed the value loaded into EIP will make the program jump to the beginning of the shell code (see point 5)
The next thing I think I have understood well is the "JMP ESP" mechanism. When replicating the crash in your lab environment you look for a place in memory that contains the "JMP ESP" instruction and you overwrite the EIP (now my wording is not precise, I know...) with that address. That address needs to be identical no matter how often you run this, at what time, which thread is executing your stuff etc., at least for the same OS version and the address must not contain any forbidden bytes. The code will jump to that address (at the same time the stack is reduced by 4 bytes) so "jmp esp" will jump to the next address in my overflow buffer after the place where I had put the value to overwrite EIP with and that is usually the place where the shellcode goes, maybe prepended with NOPs.
Now comes the question.
All articles I've read so far look for the address of the "JMP ESP" instruction in a DLL (which must not be relocatable, not compiled with ASLR, etc.). Why not look in the exe itself for a "jmp esp"? Why does it need to be in a DLL?
I have run the "!mona modules" command in Immunity Debugger and the only module displayed that satisfies all these conditions is the exe itself. When I look in popular exploit databases the address is always in loaded DLLs.
I cannot see any obvious reason for this. The exe can also be located at the same address in memory the same way a DLL can. What is the difference?
Found another resource on this:
As I wrote in a comment before, the addresses of the exe usually contain a zero:
http://resources.infosecinstitute.com/in-depth-seh-exploit-writing-tutorial-using-ollydbg/#commands
A module you can consider using is the main executable itself, which
is quite often not subject to any compiler based exploit protections,
especially when the application has been written by a third party
developer and not Microsoft. Using the main executable has one major
flaw however, in that it almost always starts with a zero byte. This
is a problem because the zero byte is a string terminator in C/C++,
and using zero bytes as part of strings used to overflow a buffer will
often result in the string being terminated at that point, potentially
preventing the buffer from being appropriately overflowed and breaking
the exploit.
In position independent code jump addresses are relative to the program-counter, while in non-relocatable code they are absolute. DLLs in Windows do not generally use position independent code. Exploits that rely on knowing the offset of the executable code in the binary require non-relocatable code.
short answer: the address DOES NOT need to be in a DLL.
long answer:
ANY mapped unprotected executable memory in your process will be executed if the Instruction register is set to its address.
MAPPED: means the memory is mapped to your process by the operating system, some addresses may not be mapped and any and all access will cause operating system memory fault signals to be raised.
EXECUTABLE: mapped memory often has permissions set on it. Sometimes it is enough for the memory to be readable, but on newer processors with NX bits it may be required to be mapped executable.
UNPROTECTED: means that the memory is not mapped as a guard page. Memory pages that are protected by guard pages will raise a processor interrupt; the handling of these depends on your operating system and may be used to implement non-executable pages on processors that do not implement NX bits.
if the memory in your executable file fulfils these demands it CAN be used for your exploit. whether you can FIND this address during runtime is another question and can be very difficult for real world exploits. sticking to non relocatable DLLs and EXEs is a good idea for beginners.
as for your comment:
I am looking again at my example and I think the answer is a lot easier than I originally thought. The exe is loaded at base address 0x004.... and goes up to 0x009.... This simply means that every address will contain a 0x00 byte, which is probably a show-stopper for any C-like program...
trying to overwrite saved addresses with new addresses that contain '\0' characters may cause problems for exploits that use an overrun in string-based functions (e.g. strcpy or gets) to overflow the buffer, because those stop on '\0' characters.
if you can overflow your buffer with logic that is not limited by '\0' characters (e.g. memcpy) you may use those addresses as well.

Why are stackoverflow errors chaotic?

This simple C program rarely terminates at the same call depth:
#include <stdio.h>
#include <stdlib.h>

void recursive(unsigned int rec);

int main(void)
{
    recursive(1);
    return 0;
}

void recursive(unsigned int rec) {
    printf("%u\n", rec);
    recursive(rec + 1);
}
What could be the reasons behind this chaotic behavior?
I am using Fedora (16 GiB RAM, stack size of 8192), and compiled using cc without any options.
EDIT
I am aware that this program will cause a stack overflow
I know that enabling some compiler optimizations will change the behavior and that the program will reach integer overflow.
I am aware that this is undefined behavior, the purpose of this question is to understand/get an overview of the implementation specific internal behaviors that might explain what we observe there.
The question is more, given that on Linux the thread stack size is fixed and given by ulimit -s, what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
EDIT 2
@BlueMoon always sees the same output on his CentOS, while on my Fedora, with a stack of 8M, I see different outputs (last printed integer 261892, or 261845, or 261826, or ...)
Change the printf call to:
printf("%u %p\n", rec, &rec);
This forces gcc to put rec on the stack and gives you its address which is a good indication of what's going on with the stack pointer.
Run your program a few times and note what's going on with the address that's being printed at the end. A few runs on my machine shows this:
261958 0x7fff82d2878c
261778 0x7fffc85f379c
261816 0x7fff4139c78c
261926 0x7fff192bb79c
First thing to note is that the stack address always ends in 78c or 79c. Why is that? We should crash when crossing a page boundary; pages are 0x1000 bytes long and each function eats 0x20 bytes of stack, so the address should end with 00X or 01X. But looking at this closer, we crash in libc. So the stack overflow happens somewhere inside libc; from this we can conclude that calling printf and everything else it calls needs at least 0x78c = 1932 (possibly plus X*4096) bytes of stack to work.
The second question is why does it take a different number of iterations to reach the end of the stack? A hint is in the fact that the addresses we get are different on every run of the program.
1 0x7fff8c4c13ac
1 0x7fff0a88f33c
1 0x7fff8d02fc2c
1 0x7fffbc74fd9c
The position of the stack in memory is randomized. This is done to prevent a whole family of buffer overflow exploits. But since memory allocations, especially at this level, can only be done in multiple of pages (4096 bytes) all initial stack pointers would be aligned at 0x1000. This would reduce the number of random bits in the randomized stack address, so additional randomness is added by just wasting a random amount of bytes at the top of the stack.
The operating system can only account for the amount of memory you use, including the limit on the stack, in whole pages. So even though the stack starts at a random address, the last accessible address on the stack will always be an address ending in 0xfff.
The short answer is: to increase the amount of randomness in the randomized memory layout a bunch of bytes on the top of the stack are deliberately wasted, but the end of the stack has to end on a page boundary.
You won't have the same behaviour between executions because it depends on the current memory available. The more memory you have available, the further you'll go in this recursive function.
Your program runs infinitely as there is no base condition in your recursive function. The stack will grow with each function call and will eventually result in a stack overflow.
If tail-recursion optimization is applied (with option -O2), then the stack overflow will not occur at all; the recursion becomes a loop. Either way, it invokes undefined behavior.
what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
When stack overflow occurs it invokes undefined behavior. Nothing can be said about the result in this case.
Your recursive call is not necessarily going to cause undefined behaviour due to stack overflow (but will due to integer overflow) in practice. An optimizing compiler could simply turn your function into an infinite "loop" with a jump instruction:
void recursive(int rec) {
loop:
    printf("%i\n", rec);
    rec++;
    goto loop;
}
Note that this is going to cause undefined behaviour since it's going to overflow rec (signed int overflow is UB). If rec were an unsigned int, then the code would be valid and, in theory, should run forever.
The above code can cause two issues:
Stack Overflow.
Integer overflow.
Stack Overflow: When a recursive function is called, all its variables are pushed onto the call stack, including its return address. As there is no base condition to terminate the recursion and stack memory is limited, the stack will be exhausted, resulting in a stack overflow. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash.
Note that every time a function exits/returns, all of the variables pushed onto the stack by that function are freed (that is to say, they are deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables. But for a recursive function, the return addresses are still on the stack until the recursion terminates. Moreover, automatic local variables are allocated as a single block and the stack pointer is advanced far enough to account for the sum of their sizes. You may be interested in Recursive Stack in C.
Integer overflow: As every recursive call of recursive() increments rec by 1, there is a chance that integer overflow can occur. For that, your machine would need a huge amount of stack memory, as the range of an unsigned integer is 0 to 4,294,967,295. See the reference here.
There is a gap between the stack segment and the heap segment. Because the size of the heap is variable (it keeps changing during execution), the extent to which your stack can grow before a stack overflow occurs is also variable, and this is the reason why your program rarely terminates at the same call depth.
When a process loads a program from an executable, typically it allocates areas of memory for the code, the stack, the heap, initialised and uninitialised data.
The stack space allocated is typically not that large, (10s of megabytes probably) and so you would imagine that physical RAM exhaustion would not be an issue on a modern system and the stack overflow would always happen at the same depth of recursion.
However, for security reasons, the stack isn't always in the same place. Address Space Layout Randomisation ensures that the base of the stack's location varies between invocations of the program. This means that the program may be able to do more (or fewer) recursions before the top of the stack hits something inaccessible like the program code.
That's my guess as to what is happening, anyway.

How Process Size is determined? [closed]

Closed 10 years ago.
I am very new to these concepts but I want to ask you all a question that is very basic I think, but I am confused, So I am asking it.
The question is...
How is the size of a process determined by the OS?
Let me clarify first: suppose that I have written a C program and I want to know how much memory it is going to take; how can I determine that? Secondly, I know that a process has many sections, like the code section, data section, and BSS. Are the sizes of these predetermined? And how are the sizes of the stack and heap determined? Do the stack and heap sizes also matter when the total size of the process is calculated?
Again, we say that when we load the program, an address space is given to the process (done via base and limit registers and controlled by the MMU, I guess), and when the process tries to access a memory location that is not in its address space we get a segmentation fault. How is it possible for a process to access memory that is not in its address space? According to my understanding, when some buffer overflow happens, an address gets corrupted. When the process then wants to access the corrupted location, we get the segmentation fault. Is there any other kind of address violation?
And thirdly, why does the stack grow downward and the heap upward? Is this the same on all OSes? How does it affect performance? Why can't we have it the other way around?
Please correct me, if I am wrong in any of the statement.
Thanks
Sohrab
When a process is started it gets its own virtual address space. The size of the virtual address space depends on your operating system. In general 32-bit processes get 4 GiB (4 giga binary) addresses and 64-bit processes get 16 EiB (16 exa binary) addresses.
You cannot in any way access anything that is not mapped into your virtual address space as by definition anything that is not mapped there does not have an address for you. You may try to access areas of your virtual address space that are currently not mapped to anything, in which case you get a segfault exception.
Not all of the address space is mapped to something at any given time. Also not all of it may be mapped at all (how much of it may be mapped depends on the processor and the operating system). On current generation intel processors up to 256 TiB of your address space may be mapped. Note that operating systems can limit that further. For example for 32 bit processes (having up to 4 GiB addresses) Windows by default reserves 2 GiB for the system and 2 GiB for the application (but there's a way to make it 1 GiB for the system and 3 GiB for the application).
How much of the address space is being used and how much is mapped changes while the application runs. Operating system specific tools will let you monitor what the currently allocated memory and virtual address space is for an application that is running.
Code section, data section, BSS etc. are terms that refer to different areas of the executable file created by the linker. In general code is separate from static immutable data which is separate from statically allocated but mutable data. Stack and heap are separate from all of the above. Their size is computed by the compiler and the linker. Note that each binary file has its own sections, so any dynamically linked libraries will be mapped into the address space separately, each with its own sections mapped somewhere. Heap and stack, however, are not part of the binary image; there generally is just one stack per process and one heap.
The size of the stack (at least the initial stack) is generally fixed. Compilers and/or linkers generally have some flags you can use to set the size of the stack that you want at runtime. Stacks generally "grow backward" because that's how the processor stack instructions work. Having stacks grow in one direction and the rest grow in the other makes it easier to organize memory in situations where you want both to be unbounded but do not know how much each can grow.
Heap, in general, refers to anything that is not pre-allocated when the process starts. At the lowest level there are several logical operations that relate to heap management (not all are implemented as I describe here in all operating systems).
While the address space is fixed, some OSs keep track of which parts of it are currently claimed by the process. Even if this is not the case, the process itself needs to keep track of it. So the lowest-level operation is to actually decide that a certain region of the address space is going to be used.
The second low level operation is to instruct the OS to map that region to something. This in general can be
some memory that is not swappable
memory that is swappable and mapped to the system swap file
memory that is swappable and mapped to some other file
memory that is swappable and mapped to some other file in read only mode
the same mapping that another virtual address region is mapped to
the same mapping that another virtual address region is mapped to, but in read only mode
the same mapping that another virtual address region is mapped to, but in copy on write mode with the copied data mapped to the default swap file
There may be other combinations I forgot, but those are the main ones.
Of course the total space used really depends on how you define it. RAM currently used is different than address space currently mapped. But as I wrote above, operating system dependent tools should let you find out what is currently happening.
The sections are predetermined by the executable file.
Besides that one, there may be those of any dynamically linked libraries. While the code and constant data of a DLL is supposed to be shared across multiple processes using it and not be counted more than once, its process-specific non-constant data should be accounted for in every process.
Besides, there can be dynamically allocated memory in the process.
Further, if there are multiple threads in the process, each of them will have its own stack.
What's more, there are going to be per-thread, per-process and per-library data structures in the process itself and in the kernel on its behalf (thread-local storage, command line params, handles to various resources, structures for those resources as well and so on and so forth).
It's difficult to calculate the full process size exactly without knowing how everything is implemented. You might get a reasonable estimate, though.
W.r.t. "According to my understanding, when some buffer overflow happens then the address gets corrupted": it's not necessarily true. First of all, the address of what? It depends on what happens to be in the memory near the buffer. If there's an address, it can get overwritten during a buffer overflow. But if there's another buffer nearby that contains a picture of you, the pixels of the picture can get overwritten instead.
You can get segmentation or page faults when trying to access memory for which you don't have necessary permissions (e.g. the kernel portion that's mapped or otherwise present in the process address space). Or it can be a read-only location. Or the location can have no mapping to the physical memory.
It's hard to tell how the location and layout of the stack and heap are going to affect performance without knowing the performance of what we're talking about. You can speculate, but the speculations can turn out to be wrong.
Btw, you should really consider asking separate questions on SO for separate issues.
"How is it possible for a process to access a memory that is not in its address space?"
Given memory protection it's impossible. But it might be attempted. Consider random pointers or access beyond buffers. If you increment any pointer long enough, it almost certainly wanders into an unmapped address range. Simple example:
char *p = "some string";
while (*p++ != 256) /* Always true. Keeps incrementing p until segfault. */
    ;
Simple errors like this are not unheard of, to make an understatement.
I can answer to questions #2 and #3.
Answer #2
When in C you use pointers, you are really using a numerical value that is interpreted as an address into memory (a logical address on modern OSes; see footnotes). You can modify this address at will. If the value points to an address that is not in your address space, you get a segmentation fault.
Consider for instance this scenario: your OS gives your process the address range from 0x01000 to 0x09000. Then:

int *ptr = (int *)0x01000;
printf("%d", ptr[0]);   /* prints sizeof(int) bytes from inside your address space */

int *ptr2 = (int *)0x09100;
printf("%d", ptr2[0]);  /* accessing outside your space: segfault */
Mostly the causes of segfaults, as you pointed out, are the use of NULL pointers (usually address 0x0, but implementation-dependent) or the use of corrupted addresses.
Note that, on Linux i386, the base and limit registers are not used as you may think. They are not per-process limits; they point to two kinds of segments: user space and kernel space.
Answer #3
The stack growth direction is hardware-dependent, not OS-dependent. On i386, assembly instructions like push and pop make the stack grow downward via the stack-related registers. For instance, the stack pointer automatically decreases when you do a push and increases when you do a pop. The OS cannot change this.
Footnotes
In a modern OS, a process uses so-called logical addresses. These addresses are mapped to physical addresses by the OS. To see this, compile this simple program:
#include <stdio.h>

int main()
{
    int a = 10;
    printf("%p\n", &a);
    return 0;
}
If you run this program multiple times (even simultaneously) you may see the same address printed for different instances (at least on systems without address space layout randomization). Of course this is not the real memory address; it is a logical address that will be mapped to a physical address when needed.

Why does this stack overflow so quickly?

So I wrote a toy C program that would intentionally cause a stack overflow, just to play around with the limits of my system:
#include <stdio.h>

int kefladhen(int i) {
    int j = i + 1;
    printf("j is %d\n", j);
    kefladhen(j);
}

int main() {
    printf("Hello!:D\n");
    kefladhen(0);
}
I was surprised to find that the last line printed before a segmentation fault was "j is 174651". Of course the exact number it got to varied a little each time I ran it, but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop. I thought that maybe printf was incurring some overhead, but printf returns before I call kefladhen() recursively so the stack pointer should be back where it was before. I'm storing exactly one int per call, so each stack frame should only be 8 bytes total, right? So 174-thousand odd of them is only about a megabyte and a half of actual memory used, which seems way low to me. What am I misunderstanding here?
...but in general I'm surprised that 174-thousand odd stack frames are enough to exhaust the memory for a process on my 4GB linux laptop...
Note that the stack isn't the general memory pool. The stack is a chunk pre-allocated for the purpose of providing the stack. It could be just 1MB out of those 4GB of memory on the machine. My guess is your stack size is about 1.3MB; that would be enough for 174,651 eight-byte frames (four bytes for return address, four bytes for the int).
I think that the key misunderstanding here is that the stack does not grow dynamically by itself. It is set statically to a relatively small number, but you can change it in runtime (here is a link to an answer explaining how it is done with setrlimit call).
Others have already discussed the size and allocation of the stack. The reason why the "the exact number it got to varied a little each time I ran it" has to do with cache performance on multi-threaded systems.
In general, you can expect the memory pre-allocated for the stack to be page aligned. However, the starting stack pointer will vary from thread-to-thread/process-to-process/task-to-task. This is to help avoid cache line flushes, invalidations and loads. If all the tasks/threads/processes had the same virtual address for the stack pointer, you would expect that there would be more cache collisions whenever a context switch occurs. In an attempt to reduce this likelihood, many OSes will have the starting stack pointer somewhere within the starting stack page, but not necessarily at the very top, or at the same position. Thus, when a context switch occurs and a subsequent stack access occurs, there is ...
a better chance the variable will already be in the cache
a better chance that there will not be a cache collision
Hope this helps.

How will I know when my memory is full?

I'm writing firmware for an Atmel XMEGA microcontroller in C and I think I have filled up the 4 KB of SRAM. As far as I know I only have static/global data and local stack variables (I don't use malloc in my code).
I use a local variable to buffer some pixel data. If I increase the buffer to 51 bytes my display is showing strange results - a buffer of 6 bytes is doing fine. This is why I think my ram is full and the stack is overwriting something.
Creating more free memory is not my problem, because I can just move some static data into flash and only load it when it's needed. What bothers me is the fact that I might never have discovered that the memory was full.
Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory has filled up, instead of letting it overwrite some other data?
It can be very difficult to predict exactly how much stack you'll need (some toolchains can have a go at this if you turn on the right options, but it's only ever a rough guide).
A common way of checking on the state of the stack is to fill it completely with a known value at startup, run the code as hard/long as you can, and then see how much had not been overwritten.
The startup code for your toolchain might even have an option to fill the stack for you.
Unfortunately, although the concepts are very simple: fill the stack with a known value, count the number of those values which remain, the reality of implementing it can require quite a deep understanding of the way your specific tools (particularly the startup code and the linker) work.
Crude ways to check if stack overflow is what's causing your problem are to make all your local arrays 'static' and/or to hugely increase the size of the stack and then see if things work better. These can both be difficult to do on small embedded systems.
"Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory got filled up instead of letting it overwrite some other data?"
I suppose currently you have a memory mapping like (1).
When the stack and/or variable space grow too much, they collide and overwrite each other (*).
Another possibility is a memory mapping like (2).
When stack or variable space exceeds the maximum space, they hit the not mapped addr space (*).
Depending on the controller (I am not sure about AVR family) this causes a reset/trap or similar (= what you desired).
[not mapped addr space][ RAM mapped addr space ][not mapped addr space]
(1) [variables ---> * <--- stack]
(2) *[ <--- stack variables ---> ]*
(arrows indicate growing direction if more variable/stack is used)
Of course it is better to make sure beforehand that RAM is big enough.
Typically the linker is responsible for allocating the memory for code, constants, static data, stacks and heaps. Often you must specify required stack sizes (and available memory) to the linker which will then flag an error if it can't fit everything in.
Note also that if you're dealing with a multithreaded application, then each thread has its own stack, and these are frequently allocated off the heap as the thread starts.
Unless your processor has some hardware checking for stack overflow on it (unlikely), there are a couple of tricks you can use to monitor the stack usage.
Fill the stack with a known marker pattern, and examine the stack memory (as allocated by the linker) to determine how much of the marker remains uncorrupted.
In a timer interrupt (or similar) compare the main thread stack pointer with the base of the stack to check for overflow
Both of these approaches are useful in debugging, but they are not guaranteed to catch all problems and will typically only flag a problem AFTER the stack has already corrupted something else...
Usually your programming tool knows the parameters of the controller, so it should warn you if you use more (without mallocs, static usage is known at compile time).
But you should be careful with the pixel data, because most displays don't have a linear address space.
EDIT: usually you can specify the stack size manually. Leave just enough memory for static variables, and reserve the rest for stack.