Lately I've been studying the linker scripts used in auto-generated STM32 projects, and I'm a little bit confused about how the stack and heap memory segments are defined.
As an example, I've been looking at the files provided in ST's "CubeMX" firmware package for their F0 lines of chips, which have ARM Cortex-M0 cores. I'd paste a whole script if the files' licenses allowed it, but you can download the whole package from ST for free if you're curious. Anyways, here are the parts relevant to my question:
/* Highest address of the user mode stack */
_estack = 0x20001000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
<...>
SECTIONS {
<...>
.bss :
{
<...>
} >RAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
<...>
}
So here's my probably-incorrect understanding of the linker's behavior:
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
The '_Min_Heap_Size' and '_Min_Stack_Size' values seem like they are supposed to define the minimum amount of space that should be dedicated to the stack and heap for the program to use. Programs that allocate a lot of dynamic memory may need more 'Heap' space, and programs that call deeply-nested functions may need more 'Stack' space.
My question is, how is this supposed to work? Are '_Min_x_Size' special labels, or are those names maybe slightly confusing? Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
Also, the space defined for the Stack does not appear to necessarily define a contiguous segment between its start and the '_estack' value defined above. If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'.
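For reference, those numbers can be reproduced with a small host-side C sketch that mimics the location-counter arithmetic of the section. The helper names are made up for the illustration and are not part of ST's files; the constants are the ones from the script excerpt above.

```c
#include <stdint.h>

/* Values copied from the linker script excerpt above. */
#define RAM_ORIGIN      0x20000000u
#define ESTACK          0x20001000u   /* _estack: end of the 4 KB RAM */
#define MIN_HEAP_SIZE   0x200u        /* _Min_Heap_Size  */
#define MIN_STACK_SIZE  0x400u        /* _Min_Stack_Size */

/* Round addr up to the next multiple of align (a power of two),
   like ALIGN() in a linker script. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* End address of ._user_heap_stack, given where .bss ends. */
uint32_t user_heap_stack_end(uint32_t bss_end)
{
    uint32_t dot = align_up(bss_end, 8u);   /* . = ALIGN(8);            */
    dot += MIN_HEAP_SIZE;                   /* . = . + _Min_Heap_Size;  */
    dot += MIN_STACK_SIZE;                  /* . = . + _Min_Stack_Size; */
    return align_up(dot, 8u);               /* . = ALIGN(8);            */
}

/* RAM between the end of the section and _estack. */
uint32_t ram_left_below_estack(uint32_t bss_end)
{
    return ESTACK - user_heap_stack_end(bss_end);
}
```

With nothing else in RAM (bss_end equal to RAM_ORIGIN), user_heap_stack_end() gives 0x20000600 and ram_left_below_estack() gives 0xA00, i.e. 2560 bytes between the end of the section and '_estack'.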
The only explanation I can think of is that the 'Heap' and 'Stack' segments might have no actual meaning, and are only defined as a compile-time safeguard so that the linker throws an error when there is significantly less dynamic memory available than expected. If that's the case, should I think of it as more of a minimum 'Combined Heap/Stack' size?
Or honestly, should I just drop the 'Heap' segment if my application won't use malloc or its ilk? It seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
You ask where to place the stack and the heap. On a uC the answer is not as obvious as @a2f stated, for many reasons.
the stack
First of all, many ARM uCs have two stacks. One is called the Main Stack (MSP) and the second one the Process Stack (PSP). Of course you do not need to use the second one.
Another problem is that a Cortex uC may have (for example STM32F3, many F4, F7, H7) several SRAM blocks. It is up to the developer to decide where to place the stack and the heap.
Where to place the stack?
I would suggest to place MSP at the beginning of the chosen RAM. Why?
If the stack is placed at the end you do not have any control over the stack usage. When the stack overflows it may silently overwrite your variables and the behavior of the program becomes unpredictable. That is not an issue if it is the LED blink thing. But imagine a large machine controller or a car brake computer.
When you place the stack at the beginning of the RAM (by beginning I mean RAM start address + stack size), a hardware exception is generated when the stack overflows. You are in full control of the uC, you can see what caused the problem (for example a damaged sensor flooding the uC with data) and start the emergency routine (for example stop the machine, put the car into service mode, etc.). The stack overflow will not happen undetected.
the Heap.
Dynamic allocation has to be used with caution on uCs. The first problem is possible fragmentation of the available memory, as uCs have very limited resources. Use of dynamically allocated memory has to be considered very carefully, otherwise it can be a source of serious problems. Some time ago the USB HAL library was using dynamic allocation in an interrupt routine: a fraction of a second was sometimes enough to fragment the heap so badly that no further allocation was possible.
Another problem is the wrong implementation of sbrk in most of the available toolchains. The only one I know of with a correct one is the BleedingEdge toolchain maintained by our colleague from this forum, @Freddie Chopin.
The problem is that the implementations assume that the heap and the stack grow towards each other and eventually can meet - which is of course wrong. Another problem is improper use and initialization of the static variables with the addresses of the heap start and end.
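As a rough illustration of the alternative, here is a minimal sketch of an sbrk-style function that treats the heap as a fixed region with its own limit, instead of comparing the break against the stack pointer. It is not the code of any particular toolchain. The region is a plain static array so the sketch also runs on a PC; in a real project the bounds would normally come from linker-script symbols. The names heap_area, HEAP_BYTES and bounded_sbrk are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_BYTES 1024u                 /* hypothetical heap size */

static uint8_t heap_area[HEAP_BYTES];    /* stand-in for a linker-defined region */
static size_t  heap_used;                /* current break as an offset, starts at 0 */

/* Move the break by incr bytes and return the previous break,
   or (void *)-1 if the request would leave the heap region. */
void *bounded_sbrk(ptrdiff_t incr)
{
    size_t old_used = heap_used;

    if (incr >= 0) {
        if ((size_t)incr > HEAP_BYTES - old_used)
            return (void *)-1;           /* would run past the heap limit */
        heap_used = old_used + (size_t)incr;
    } else {
        /* |incr| computed without overflowing for the most negative value */
        size_t dec = (size_t)(-(incr + 1)) + 1u;
        if (dec > old_used)
            return (void *)-1;           /* would go below the heap start */
        heap_used = old_used - dec;
    }
    return &heap_area[old_used];
}
```

Keeping the break as an offset that starts at zero also sidesteps the static-initialization issue mentioned above, since no variable has to be initialized with a heap address before first use.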
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
As the stack here would grow downwards (from high to low addresses), it's actually the start of the stack memory region.
Are '_Min_x_Size' special labels, or are those names maybe slightly confusing?
The thing special about them is that symbols starting with an underscore followed by an uppercase letter are reserved for the implementation. e.g. min_stack_space could clash with user-defined symbols.
Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
That's the minimum size. Both the stack and the heap break may grow.
If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'
It leaves 0xA00 bytes (0x20001000 - 0x20000600), which is more than _Min_Stack_Size. Remember the stack grows downwards here (and often elsewhere as well): it starts at _estack, so that gap is the first RAM it uses.
seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
Not everything is safety-critical. You're free to not use the heap if you don't want/need/are allowed to. (Ok, not that free in the latter)
I'm working on an embedded project with FreeRTOS, where I only use static memory allocation.
Looking at my linker script, I find that the following are taking up RAM space:
.data
.bss
._user_heap_stack
To my knowledge, ._user_heap_stack is used during the linking process to see if there is enough RAM space for the user-specified minimum MSP stack size. Here is a relevant snippet in my linker script:
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
I believe that MSP will always be initialized to point to the end of RAM regardless of _Min_Stack_Size, and decrement from there as data is pushed onto the stack. I see that my startup .S file configures sp as follows:
_estack = 0x20004000; /* end of RAM */
Reset_Handler:
ldr sp, =_estack /* Atollic update: set stack pointer */
As for FreeRTOS tasks, they each have stack space that is statically allocated, so it has nothing to do with _user_heap_stack I think?
My question is, with the RAM allocated to .data, .bss, and _user_heap_stack, I still have some unallocated RAM, so what happens to that RAM? Is it used by anything? Is it ever useful to reserve some free RAM (i.e. non-statically allocated RAM) or is it just wasted? Or perhaps it is just extra space for MSP to use if the main stack ever grows larger in size than what's specified in _Min_Stack_Size?
TL;DR - The remaining RAM is used by the stack.
Or perhaps it is just extra space for MSP to use if the main stack ever grows larger in size than what's specified in _Min_Stack_Size?
Yes, this seems correct. See the last paragraph for more; it is not just the stack that is bigger.
See: this part,
_estack = 0x20004000; /* end of RAM */
Reset_Handler:
ldr sp, =_estack /* Atollic update: set stack pointer */
So at least the BOOT sp will be at the end of RAM.
The part with . = . + _Min_Stack_Size; just makes sure you have a MINIMUM stack or a linker error happens. Your stack is actually bigger and it is used at least at boot. I know nothing about FreeRTOS, but I suspect (Note 1) it is the system stack and you have user stacks. Each mode on the ARM has a separate stack. If FreeRTOS has any memory protection or privilege levels, then you will have multiple stacks. So one task crashing (due to stack overflow, etc.) won't crash the entire system. Just that task's (Note 2) stack is corrupt, and not the one that manages the entire system.
It is a common idiom to have stack and heap together, with the heap growing up and the stack growing down. In this way, the MIN heap size and MIN stack size are imaginary. Eventually they will collide when the size of both is the total size. But things may be okay if the stack goes into the heap's logical space or the heap goes into the stack's logical space, AS long as it is not in use by the other. By space, I mean the constants in your linker file and not the actual in-use values.
Note1: It would kind of be insane to both waste memory and have your RTOS code using the same stack as all tasks. At least it would not be a robust OS.
Note2: By task I mean a schedulable entity. Maybe a process, task, thread, fiber, etc.
I just found out that my decoder library fails to initialize as malloc() fails to allocate memory and returns to the caller with "NULL".
I tried many possible scenarios, with or without casting and referred to a lot of other threads about malloc(), but nothing has worked, until I changed the heap size to 0x00001400, which has apparently solved the problem.
Now, the question is, how can I tell how much heap needed, or left for the program? The datasheet says my MCU has: "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM" Could someone explain to me what that means? Changing that to 0x00002000 (8192 bytes) would lead to dozens of the following error:
Error: L6406E: No space in execution regions with .ANY selector
Isn't 8 KB of RAM a fraction of a fraction of what the device has? Why can't I add more to the heap beyond 0x00001800?
The program size reported by Keil after compilation is:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
Error L6406E means there is not enough RAM on your target for what the linker file asks it to place. There is no magic way to get more RAM; both the stack and the heap use RAM. In your case, though, the device seems to have more than enough memory, but the linker is not aware of it.
My suggestion is to use linker response files with the Keil µVision IDE and update required memory section according to the use..
The linker command (or response) file contains only linker directives. The .OBJ files and .LIB files that are to be linked are not listed in the command file. These are obtained by µVision automatically from your project file.
The best way to start using a linker command file is to have µVision create one for you automatically and then begin making the necessary changes.
To generate a Command File from µVision...
Go to the Project menu and select the Options for Target item.
Click on the L166 Misc or L51 Misc tab to open the miscellaneous linker options.
Check the use linker control file checkbox.
Click on the Create... button. This creates a linker control file.
Click on the Edit... button. This opens the linker control file for editing.
Edit the command file to include the directives you need.
When you create a linker command file, the file created includes the directives you currently have selected.
Regarding malloc() issue you are facing,
The heap size required depends on how much memory the application needs, in particular the memory allocated dynamically using malloc and calloc.
Please note that some C library functions such as printf also use dynamic memory allocation under the hood.
If you are using the keil IDE for compiling your source code then you can increase the heap size by modifying the startup file.
;******************************************************************************
;
; <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
;
;******************************************************************************
Heap EQU 0x00000000
;******************************************************************************
;
; Allocate space for the heap.
;
;******************************************************************************
AREA HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base
HeapMem
SPACE Heap
__heap_limit
;******************************************************************************
If you are using a make environment to build the application, then simply change the heap size in the linker file.
Details regarding same you can get directly from Keil official website, Please check following links,
https://www.keil.com/pack/doc/mw/General/html/mw_using_stack_and_heap.html
http://www.keil.com/forum/11132/heap-size-bss-section/
http://www.keil.com/forum/14201/
BR
Jerry James.
Now, the question is, how can I tell how much heap needed, or left for the program?
That is two separate questions.
The amount of heap needed is generally non-deterministic (one reason for avoiding dynamic memory allocation in most cases in embedded systems with very limited memory) - it depends entirely on the behaviour of your program, and if your program has a memory leak bug, even knowledge of the intended behaviour won't help you.
However, any memory not allocated statically by your application can generally be allocated to the heap, otherwise it will remain unused by the C runtime in any case. In other toolchains, it is common for the linker script to automatically allocate all unused memory to the heap, so that it is as large as possible, but the default script and start-up code generated by Keil's ARM MDK does not do that; and if you make it as large as possible, then modify the code you may have to adjust the allocation each time - so it is easiest during development at least to leave a small margin for additional static data.
The datasheet says my MCU has: "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM" Could someone explain to me what that means?
Another problem is that the ARM MDK C library's malloc() implementation requires a contiguous heap and does not support the addition of arbitrary memory blocks (as far as I have been able to determine in any case), so the 64Kb CCM block cannot be used as heap memory unless the entire heap is allocated there. The memory is in fact segmented as follows:
SRAM1 112 KB
SRAM2 16 KB
CCM 64 KB
BKUPSRAM 4 KB
SRAM 1/2 are contiguous but on separate buses (which can be exploited to support DMA operations without introducing wait-states for example).
The CCM memory cannot be used for DMA or bit-banding, and the default ARM-MDK generated linker script does not map it at all, so to utilise it you must use a custom linker script, and then ensure that any DMA or bit-banded data are explicitly located in one of the other regions. If your heap need not be more than 64 KB you could locate it there, but to do that needs a modification of the start-up assembler code that allocates the heap.
The 4 KB backup SRAM is accessed as a peripheral and is mapped in the peripheral register space.
With respect to determining how much heap remains at run-time, the ARM library provides a somewhat cumbersome __heapstats function. Unfortunately it does not simply return the available free space (it is not quite as simple as that, because heap free space is not on its own particularly useful, since block fragmentation can lead to allocation failure even if cumulatively there is sufficient memory). __heapstats requires a pointer to an fprintf()-like function to output formatted text information on heap state. For example:
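#include <stdio.h>    /* fprintf, stdout */
#include <stdlib.h>   /* __heapstats (ARM C library extension) */

/* Print the C library's heap statistics to stdout. */
void heapinfo(void)
{
    /* __heapstats takes a callback with a void * first parameter, hence the cast */
    typedef int (*__heapprt)(void *, char const *, ...);
    __heapstats( (__heapprt)fprintf, stdout ) ;
}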
Then you might write:
mem = malloc( some_space ) ;
if( mem == NULL )
{
heapinfo() ;
for(;;) ; // wait for watchdog or debugger attach
}
// memory allocated successfully
Given:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
You have used 129920 of the available 131072 bytes (RW-data + ZI-data against the 128 KB of SRAM1 + SRAM2), so could in theory add 1152 bytes to the heap, but you would have to keep changing this as the amount of static data changed as you modified your code. Part of the ZI (zero initialised) data is your heap allocation, everything else is your application stack and static data with no explicit initialiser. The full link map generated by the linker will show what is allocated statically.
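The arithmetic can be written out as a short C sketch. It assumes, as an illustration only, that the default memory map covers just SRAM1 + SRAM2 (112 KB + 16 KB) and that the CCM and backup SRAM are not counted; the macro and function names are made up for the example.

```c
#include <stdint.h>

/* Figures from the Keil build summary quoted above. */
#define RW_DATA     580u
#define ZI_DATA     129340u

/* Assumption: only SRAM1 + SRAM2 are mapped by the default script. */
#define MAPPED_RAM  ((112u + 16u) * 1024u)

/* Statically allocated RAM: initialised data plus zero-initialised data
   (the latter already contains the heap and stack reservations). */
uint32_t ram_used(void)
{
    return RW_DATA + ZI_DATA;
}

/* Bytes still unallocated, i.e. the most the heap could grow by. */
uint32_t ram_spare(void)
{
    return MAPPED_RAM - ram_used();
}
```

Here ram_used() is 129920 and ram_spare() is 1152, which is why growing the heap by 0x800 bytes no longer fits.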
It may be possible to increase heap size by reducing stack allocation. The ARM linker can generate stack usage analysis in the link map (as described here) to help "right-size" your stack. If you have excessive stack allocation, this may help. However stack-overflow errors are even more difficult to detect and debug than memory allocation failure and call by function-pointer and interrupt processing will confound such analysis, so leave a safety margin.
It would perhaps be better to use a customised linker script and modify the heap location in the start-up code to locate the heap in the otherwise unused CCM segment (and be sure you do not use dynamic memory for either DMA or bit-banding). You can then safely create a 64Kb heap assuming you locate nothing else there.
Is there any way to check or prevent stack area from crossing the RAM data (.data or .bss) area in the limited memory (RAM/ROM) embedded systems comprising microcontrollers? There are tools to do that, but they come with very costly license fees like C-STAT and C-RUN in IAR.
You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is to open up the system-specific linker file in which all memory segments have been pre-defined to a default for the given microcontroller. You should have the various RAM segments listed there, de facto standard names are: .stack .data .bss and .heap.
Each such segment will have an address range specified. Change the addresses and you will move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky it uses GNU linker scripts, which is a well-documented, though rather complex standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: you need to check the CPU core manual and see if the stack pointer moves upwards or downwards on your given system. Most common is downwards, but the alternative exists. You should ensure that in the direction that the stack grows, there is no other read/write data segment which it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory where access would cause a CPU hardware interrupt/exception.
Here is an article describing how to do this.
In small micros that do not have the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check a 'threshold' RAM address which you must have initialized to some 'magic' pattern, like 0xAA55.
Once the periodic task sees this memory address change contents, you have a problem!
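A minimal sketch of that check in C might look like the following. The guard word is an ordinary variable here so the code runs anywhere; on a real target it would have to be placed (via the linker script or a compiler-specific attribute) at the address the stack reaches last. The names stack_guard, stack_guard_init and stack_guard_intact are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_GUARD_PATTERN 0xAA55u

/* Stand-in for the 'threshold' word. On a target this must sit at the
   far end of the stack region; placement is toolchain-specific and is
   not shown here. */
static volatile uint16_t stack_guard;

/* Call once at start-up, before the stack can have grown that far. */
void stack_guard_init(void)
{
    stack_guard = STACK_GUARD_PATTERN;
}

/* Call from the periodic task or timer interrupt.
   Returns false once the pattern has been overwritten. */
bool stack_guard_intact(void)
{
    return stack_guard == STACK_GUARD_PATTERN;
}
```

What to do when stack_guard_intact() returns false (log, halt, reset) is an application decision; by then some data may already have been corrupted.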
In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this are constant strings in your code, like the ones used in printf() format strings, for example. This can free a lot of memory for your stack to grow. Check your compiler documentation about how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls - pass them as const references whenever possible (e.g. for structs or classes).
Minimize unnecessary usage of local variables. Look particularly for the large ones, like local buffers for example. Often you can find ways to just remove them, or to use a shared resource instead without compromising your code.
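To illustrate the point about large parameters, here is a small C sketch (C has no references, so a pointer to const plays that role). The frame_t type and both function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame descriptor, used only for this illustration. */
typedef struct {
    uint8_t  pixels[64];
    uint16_t width;
    uint16_t height;
} frame_t;

/* By value: the whole struct is copied for every call,
   typically onto the stack. */
uint32_t sum_by_value(frame_t f)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < sizeof f.pixels; i++)
        sum += f.pixels[i];
    return sum;
}

/* By pointer to const: only an address is passed, and the callee
   cannot modify the caller's data through it. */
uint32_t sum_by_pointer(const frame_t *f)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < sizeof f->pixels; i++)
        sum += f->pixels[i];
    return sum;
}
```

Both functions return the same result; the second one just avoids copying the struct on each call.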
I am trying to understand a few basic concepts regarding the memory layout of an 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for an 8051 MCU we have several types of memories:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata) but it's quite small (30h-7Fh), 80 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origin in the 1970s/early 80s. As such, it has very limited resources. The original version did not (for instance) even have XRAM; that was "patched" on later and requires special (and slow) accesses.
The IRAM is the "main memory". It really does include the stack (yes, there are only a few bytes). The rest is used for global variables ("data" and "bss" section: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same purpose.
Note that with these small MCUs you do not use many local variables (and if, only 8bit types). A clever compiler/linker (I actually used some of these) can allocate local variables statically overlapping - unless there is recursion used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merge blocks.
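A fixed-size block pool of that kind can be sketched in a few lines of portable C. This is only an illustration of the idea, not tuned for any 8051 compiler (no memory-space keywords are used), and the block size, block count and function names are arbitrary choices.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE  16u   /* bytes per block (arbitrary) */
#define POOL_BLOCK_COUNT 8u    /* number of blocks (arbitrary) */

/* A free block stores the link to the next free block in its own
   storage, so the pool needs no extra bookkeeping memory. */
typedef union pool_block {
    union pool_block *next;
    unsigned char     bytes[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t  pool_storage[POOL_BLOCK_COUNT];
static pool_block_t *pool_free_list;

/* Put every block on the free list. Call once before use. */
void pool_init(void)
{
    size_t i;
    pool_free_list = NULL;
    for (i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Take one block, or return NULL if the pool is exhausted. */
void *pool_alloc(void)
{
    pool_block_t *b = pool_free_list;
    if (b != NULL)
        pool_free_list = b->next;
    return b;
}

/* Give a block back. Blocks are never split or merged. */
void pool_free(void *p)
{
    pool_block_t *b = (pool_block_t *)p;
    if (b == NULL)
        return;
    b->next = pool_free_list;
    pool_free_list = b;
}
```

Because every block has the same size, the pool cannot fragment, and both allocation and release take constant time.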
Note that the IRAM includes some special registers which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these functions. Very likely some features require special assembler functions (these might be provided in a header as C-functions just generating the corresponding machine code instruction), called intrinsics.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended Harvard architecture (code and data separated, with XRAM as a third address space).
Regarding your 2nd link: This is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more generalized CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. using an external von Neumann architecture (internally most (if not all) 32+ bitters use a harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set within a very early startup routine. This will set a register to tell it where the stack should start. Typically, you want this in memory that is very fast to access because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses in calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set wherever your startup routine set it to.
There is no particular area that a heap needs to live. If you want it to live in external memory, then it can be set there. You want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From here, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.
Concerning the heap:
There is also a malloc/free pair.
You have to call init_mempool() first, which is mentioned in the compiler documentation but is somewhat unusual.
The code below illustrates this.
However, I have only used it this way and did not try heavy use of malloc/free like you may find in dynamic linked-list management, so I have no idea of the performance you get out of this.
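#include <stdlib.h>   /* Keil C51: init_mempool(), malloc(), free() */

typedef unsigned int u16;   /* 16 bits on C51 */

/* A "large" area in xdata to be used as the heap */
static char xdata heap_mem_pool[1000];

/* A pointer located in data and pointing to something in xdata.
   The size of the pointer is then 2 bytes instead of 3 (the 3rd byte
   stores the area specification: data, idata, xdata).
   The specifiers are not mandatory but welcome. */
char xdata * data shared_memory;

/* Application-specific; hypothetical name */
extern u16 calculate_needed_memory(void);

int use_heap(void)
{
    u16 mem_size_needed;

    /* Must be called once before the first malloc() */
    init_mempool(heap_mem_pool, sizeof(heap_mem_pool));
    //...
    mem_size_needed = calculate_needed_memory();
    shared_memory = malloc(mem_size_needed);
    if (0 == shared_memory) return -1;
    //... use the shared_memory pointer
    /* free it when it is not needed anymore */
    free(shared_memory);
    return 0;
}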
Some additional consequences of the fact that in general no function is reentrant (or only with some effort) on this stackless microcontroller.
I will call "my system" the system I am working on at present: a C8051F040 (Silabs) with the Keil C51 compiler (I have no specific interest in these two companies).
The (function return address) stack is located low in the iram (idata on my system).
If it starts at 30 (dec), it means you have either global or local variables in your code that you requested to be in data RAM (either because you chose a "small" memory model or because you use the keyword data in the variable declaration).
Whenever you call a function, the 2-byte return address of the caller will be saved on this stack (16-bit code space) and that's all: no register saving, no arguments pushed onto the (non-existent) data stack. Your compiler may also limit the function call depth.
Necessary arguments and local variables ( and certainly saved registers ) are placed somewhere in the RAM ( data RAM or XRAM )
So now imagine that you want to use the same innocent function (like memcpy()) both in your interrupt and in your normal infinite loop: it will cause sporadic bugs. Why?
Due to the lack of stack, the compiler must share RAM memory places for arguments, local variables ... between several functions THAT DO NOT BELONG to the same call tree branch
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you were executing e.g the memcpy() in your "normal task", you may corrupt the execution of memcpy() because when going out of the interrupt execution, the pointers dedicated to the copy performed in the normal task will have the (end) value of the copy performed in the interrupt.
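The effect can be reproduced on any machine with a small C model. The sketch below is not C51 code: it imitates a compiler that gives a function fixed storage for its arguments and locals (the s_* variables) and simulates the interrupt with a function pointer that fires once in the middle of the copy. All names are invented for the illustration.

```c
#include <stddef.h>

/* Fixed storage standing in for statically overlaid arguments/locals. */
static char       *s_dst;
static const char *s_src;
static size_t      s_left;

/* Simulated interrupt: if set, it fires once after the next byte. */
static void (*pending_irq)(void);

/* A copy routine whose whole state lives in the fixed storage above. */
void overlay_copy(char *dst, const char *src, size_t n)
{
    s_dst = dst;
    s_src = src;
    s_left = n;
    while (s_left > 0) {
        *s_dst++ = *s_src++;
        s_left--;
        if (pending_irq != NULL) {
            void (*irq)(void) = pending_irq;
            pending_irq = NULL;          /* fire only once */
            irq();                       /* "interrupt" happens here */
        }
    }
}

static char isr_buf[2];

/* The "interrupt handler" uses the same routine for its own copy. */
static void fake_isr(void)
{
    overlay_copy(isr_buf, "XY", 2);
}

/* Copy "abcd" into main_buf with an interrupt arriving mid-copy.
   Returns how many of the four bytes actually arrived. */
size_t demo_interrupted_copy(char main_buf[4])
{
    const char *text = "abcd";
    size_t i, copied = 0;

    for (i = 0; i < 4; i++)
        main_buf[i] = '\0';
    pending_irq = fake_isr;
    overlay_copy(main_buf, text, 4);
    for (i = 0; i < 4; i++)
        if (main_buf[i] == text[i])
            copied++;
    return copied;
}
```

In this model the interrupt's copy completes correctly, but the interrupted copy stops after a single byte, because the shared counter was left at zero by the nested call.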
On my system I get an L15 linker error when the compiler detects that a function is called by more than one independent "branch".
You may make a function reentrant by adding the reentrant keyword, which requires the creation of an emulated stack at the top of the XRAM, for example. I did not test this on my system because I am already short of XRAM, which is only 4 kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
In the standard 8051 uC, the stack occupies the same address space as register bank 1 (08H to 0FH) by default at start-up. That means the stack pointer (SP register) will have a value of 07H at start-up (incremented to 08H when something is PUSHed onto the stack). This probably limits the stack memory to 8 bytes if register bank 2 (starting from 10H) is occupied. If register banks 2 and 3 are not used, even those can be taken up by the stack (08H to 1FH).
If in a given program we need more than 24 bytes (08 to 1FH = 24 bytes) of stack, we can change the SP to point to RAM locations 30 – 7FH. This is done with the instruction “MOV SP, #xx”. This should clarify doubts surrounding the 8051 stack usage.
I'm writing firmware for an Atmel XMEGA microcontroller in C and I think I filled up the 4 KB of SRAM. As far as I know I only have static/global data and local stack variables (I don't use malloc within my code).
I use a local variable to buffer some pixel data. If I increase the buffer to 51 bytes my display is showing strange results - a buffer of 6 bytes is doing fine. This is why I think my ram is full and the stack is overwriting something.
Creating more free memory is not my problem because I can just move some static data into the flash and only load it when its needed. What bothers me is the fact that I could have never discovered that the memory got full.
Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory got filled up instead of letting it overwrite some other data?
It can be very difficult to predict exactly how much stack you'll need (some toolchains can have a go at this if you turn on the right options, but it's only ever a rough guide).
A common way of checking on the state of the stack is to fill it completely with a known value at startup, run the code as hard/long as you can, and then see how much had not been overwritten.
The startup code for your toolchain might even have an option to fill the stack for you.
Unfortunately, although the concepts are very simple: fill the stack with a known value, count the number of those values which remain, the reality of implementing it can require quite a deep understanding of the way your specific tools (particularly the startup code and the linker) work.
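The counting part itself is small. The sketch below shows it in C on a caller-supplied byte region; finding the real bounds of the stack (linker symbols) and painting it early enough (start-up code) are the toolchain-specific parts and are deliberately left out. It assumes a stack that grows downward, and the fill value and function names are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* arbitrary marker value */

/* Fill a stack region with the marker. On a target this has to happen
   before the region is in use, e.g. in the start-up code. */
void stack_paint(uint8_t *low, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        low[i] = STACK_FILL;
}

/* For a stack growing downward towards 'low', count how many marker
   bytes at the low end were never overwritten. */
size_t stack_unused(const uint8_t *low, size_t size)
{
    size_t n = 0;
    while (n < size && low[n] == STACK_FILL)
        n++;
    return n;
}
```

The result is only a lower bound on the headroom seen so far: it reflects the deepest usage during the test run, not the worst case, and a stack frame that happens to contain the marker value can make the count slightly optimistic.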
Crude ways to check if stack overflow is what's causing your problem are to make all your local arrays 'static' and/or to hugely increase the size of the stack and then see if things work better. These can both be difficult to do on small embedded systems.
"Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory got filled up instead of letting it overwrite some other data?"
I suppose currently you have a memory mapping like (1).
When stack and/or variable space grow too much, they collide and overwrite each other (*).
Another possibility is a memory mapping like (2).
When stack or variable space exceeds the maximum space, they hit the not mapped addr space (*).
Depending on the controller (I am not sure about AVR family) this causes a reset/trap or similar (= what you desired).
[not mapped addr space][ RAM mapped addr space ][not mapped addr space]
(1) [variables ---> * <--- stack]
(2) *[ <--- stack variables ---> ]*
(arrows indicate growing direction if more variable/stack is used)
Of course it is better to make sure beforehand that RAM is big enough.
Typically the linker is responsible for allocating the memory for code, constants, static data, stacks and heaps. Often you must specify required stack sizes (and available memory) to the linker which will then flag an error if it can't fit everything in.
Note also that if you're dealing with a multithreaded application, then each thread has its own stack and these are frequently allocated off the heap as the thread starts.
Unless your processor has some hardware checking for stack overflow on it (unlikely), there are a couple of tricks you can use to monitor the stack usage.
Fill the stack with a known marker pattern, and examine the stack memory (as allocated by the linker) to determine how much of the marker remains uncorrupted.
In a timer interrupt (or similar) compare the main thread stack pointer with the base of the stack to check for overflow
Both of these approaches are useful in debugging, but they are not guaranteed to catch all problems and will typically only flag a problem AFTER the stack has already corrupted something else...
Usually your programming tool knows the parameters of the controller, so you should be warned if you used more (without mallocs, it is known at compile time).
But you should be careful with pixeldata, because most displays don't have linear address space.
EDIT: usually you can specify the stack size manually. Leave just enough memory for static variables, and reserve the rest for stack.