Stack and heap confusion for embedded 8051 - c

I am trying to understand a few basic concepts regarding the memory layout of an 8051 MCU. I would be grateful if anyone could give me some clarification.
So, for an 8051 MCU we have several types of memory:
IRAM (idata) - internal RAM, used for the general purpose registers and SFRs
PMEM (code) - program memory, used to store code (flash)
XDATA:
on-chip (data) - cache memory for data (RAM)
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata), but that is quite small (30h-7Fh, i.e. 80 bytes).
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (a heap), will that memory always be reserved in off-chip RAM (xdata), or does it depend on the compiler/optimization?

The 8051 has its origins in the late 1970s/early 1980s. As such, it has very limited resources. The original version did (for instance) not even have XRAM; that was "patched" on later and requires special (and slow) accesses.
The IRAM is the "main memory". It really includes the stack (yes, there are only a few bytes). The rest is used for global variables ("data" and "bss" sections: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same purpose.
Note that with these small MCUs you do not use many local variables (and if you do, only 8-bit types). A clever compiler/linker (I have actually used some of these) can allocate local variables statically, overlapping them - unless recursion is used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merge blocks.
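To make the memory-pool idea concrete, here is a minimal sketch of a fixed-block pool; the block size, block count and function names are made up for illustration. Blocks are handed out and returned whole, so there is nothing to merge and no fragmentation to manage:

#define POOL_BLOCK_SIZE   16
#define POOL_BLOCK_COUNT  8

static unsigned char pool_storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
static unsigned char pool_used[POOL_BLOCK_COUNT];      /* 0 = free, 1 = in use */

void *pool_alloc(void)
{
    unsigned char i;

    for (i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            return pool_storage[i];
        }
    }
    return 0;                                           /* pool exhausted */
}

void pool_free(void *p)
{
    unsigned char i;

    for (i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (p == pool_storage[i]) {
            pool_used[i] = 0;
            return;
        }
    }
}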
Note that the IRAM includes some special registers and locations which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these features. Very likely some features require special assembler functions (these might be provided in a header as C functions which just generate the corresponding machine-code instruction), called intrinsics.
The different memory areas might also require compiler extensions to be used.
You might have a look at sdcc for a suitable compiler.
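For example, SDCC lets you pin each variable to a specific memory area with its __data/__idata/__xdata/__code extensions (Keil C51 has equivalent keywords without the underscores); a short sketch, with the variable names being only illustrative:

__data  unsigned char fast_counter;       /* directly addressable internal RAM    */
__idata unsigned char ring_buffer[32];    /* indirectly addressable internal RAM  */
__xdata unsigned char frame_buffer[512];  /* external (or on-chip) XRAM, slower   */
__code  const unsigned char lookup[4] = { 1, 2, 4, 8 };   /* stored in flash with the code */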
Note also that the 8051 has an extended Harvard architecture (code and data separated, with XRAM as a third area).
Regarding your 2nd link: this is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more general CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc., which use an external von Neumann architecture (internally, most if not all 32-bit-and-larger CPUs use a Harvard architecture).

I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Questions 1 and 2: The stack is most likely set up within a very early startup routine. This sets a register to tell the CPU where the stack should start. Typically, you want this in memory that is very fast to access, because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses for calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is wherever your startup routine set it.
There is no particular area where a heap needs to live. If you want it to live in external memory, it can be set there. If you want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From there, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like instruction RAM - RAM where you would put a subroutine or two to make running them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function-call framework. Actually, for sizes like this, you might have to hand-assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very, very constrained. It also appears that assembly-level coding on this chip would be the norm, as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that uses bank switching as a way to access more RAM. I haven't done that since the Game Boy Color's Z80 chip.

Concerning the heap:
There is also a malloc()/free() pair.
You have to call init_mempool() first; this is described in the compiler documentation but is somewhat uncommon.
The pseudo-code below illustrates this.
However, I have only used it this way and did not try heavy use of malloc/free like you may find in dynamic linked-list management, so I have no idea of the performance you get out of it.
//A "large" place in xdata to be used as heap
static char xdata heap_mem_pool [1000];
//A pointer located in data and pointing to something in xdata
//The size of the pointer is then 2 bytes instead of 3 ( the 3rd byte
//store the area specification data, idata, xdata )
//specifier not mandatory but welcome
char xdata * data shared_memory;
//...
u16 mem_size_needed;
init_mempool (heap_mem_pool, sizeof(heap_mem_pool));
//..
mem_size_needed = calcute_needed_memory();
shared_memory = malloc(mem_size_needed);
if ( 0 == shared_memory ) return -1;
//...use shared_memory pointer
//free if not needed anymore
free(shared_memory);

Here are some additional consequences of the fact that, in general, no function is reentrant (or only with some effort) on this nearly stackless microcontroller.
I will call "my system" the system I am working on at the present time: a C8051F040 (Silicon Labs) with the Keil C51 compiler (I have no specific interest in these two companies).
The (function return address) stack is located low in the IRAM (idata on my system).
If it starts at 0x30, it means you have either global or local variables in your code that you requested to be in data RAM (either because you chose a "small" memory model or because you used the keyword data in the variable declaration).
Whenever you call a function, the 2-byte return address of the caller is saved on this stack (16-bit code space) and that's all: no register saving, no arguments pushed onto the (non-existent) data stack. Your compiler may also limit the function call depth.
Necessary arguments and local variables (and certainly saved registers) are placed somewhere in RAM (data RAM or XRAM).
So now imagine that you want to use the same innocent function (like memcpy()) both in your interrupt and in your normal infinite loop; it will cause sporadic bugs. Why?
Due to the lack of a stack, the compiler must share RAM locations for arguments, local variables, etc. between several functions THAT DO NOT BELONG to the same call tree branch.
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you are executing, e.g., the memcpy() in your "normal task", you may corrupt the execution of that memcpy(), because when returning from the interrupt, the pointers dedicated to the copy performed in the normal task will hold the (end) values of the copy performed in the interrupt.
On my system I get an L15 linker message when the linker detects that a function is called from more than one independent "branch".
You may make a function reentrant by adding the reentrant keyword; this requires the creation of an emulated stack, at the top of the XRAM for example. I did not test it on my system because I am already short of XRAM, which is only 4 kB.
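As a rough sketch of the Keil C51 syntax (check the compiler manual; the emulated reentrant stack also has to be configured in the startup code, and the function names here are made up):

/* normal (non-reentrant) function: arguments and locals are statically overlaid */
unsigned char checksum(unsigned char xdata *buf, unsigned char len);

/* reentrant version: arguments and locals go on the emulated stack, so it may
   safely be called both from the main loop and from an interrupt */
unsigned char checksum_r(unsigned char xdata *buf, unsigned char len) reentrant
{
    unsigned char sum = 0;

    while (len--)
        sum += *buf++;
    return sum;
}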
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS

In the standard 8051 microcontroller, the stack occupies the same address space as register bank 1 (08H to 0FH) by default at startup. That means the stack pointer (SP register) holds the value 07H at startup (incremented to 08H when the stack is first PUSHed). This limits the stack to 8 bytes if register bank 2 (starting at 10H) is occupied. If register banks 2 and 3 are not used, that space can also be taken up by the stack (08H to 1FH).
If a given program needs more than 24 bytes (08H to 1FH = 24 bytes) of stack, we can change SP to point to RAM locations 30H-7FH. This is done with the instruction "MOV SP, #xx". This should clarify doubts surrounding 8051 stack usage.
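Normally the startup code (e.g. Keil's STARTUP.A51) sets SP before main() runs, but just to illustrate the effect of MOV SP,#xx from C, here is a sketch using the SP special function register declared in the standard 8051 header (reg51.h with Keil C51; SDCC ships a similar 8051.h):

#include <reg51.h>      /* declares SP and the other SFRs */

void main(void)
{
    SP = 0x2F;          /* the next PUSH goes to 30H, so the stack now uses 30H-7FH */

    /* ... rest of the application ... */
    while (1) {
    }
}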

Related

How the function calls works in terms of Stack and Code memory?

I have been trying to understand how memory works - what happens step by step inside the execution of an application in terms of memory, especially in embedded systems. More context: C/C++.
Out of the stack, heap, static and code memory of an application, which parts are stored in RAM (volatile memory) and which parts are stored in non-volatile memory? Or, when an application is executed, is the whole application copied to RAM (volatile memory)?
When a function is called, do all the assembly instructions of that function get copied to the stack, or is only memory allocated for the function?
If only memory is allocated for the function at run time, that means the addresses of those variables have to be added to the assembly code of the function - how does that happen?
Who does all this stack memory allocation etc. in an embedded system when we write code for it in C? There is no OS on the MCU to do memory management for us, so who manages this memory allocation during function calls on the MCU?
Out of the stack, heap, static and code memory of an application, which parts are stored in RAM (volatile memory) and which parts are stored in non-volatile memory? Or, when an application is executed, is the whole application copied to RAM (volatile memory)?
When a function is called, do all the assembly instructions of that function get copied to the stack, or is only memory allocated for the function?
If only memory is allocated for the function at run time, that means the addresses of those variables have to be added to the assembly code of the function - how does that happen?
For a comprehensive and correct understanding of code execution in bare-metal (no OS) and OS-based environments (embedded systems), the different memory structures and all their internals, I would recommend you work through this book - Extreme C (author: Kamran Amini), especially these sections:
Chapter 2: From Source to Binary
Chapter 4: Process Memory Structure
Chapter 5: Stack and Heap
In my experience, hearsay and random comments will instead hurt your understanding, and you will barely be able to make sense of it. Consult authentic published content.
Who does all this stack memory allocation etc. in an embedded system when we write code for it in C? There is no OS on the MCU to do memory management for us, so who manages this memory allocation during function calls on the MCU?
The programmer's interface in C is the main.c file:
int main(void)
{
    // User code here
    return 0;
}
But technically, your IDE (Keil uVision or Code Composer Studio, etc.), along with its RTE (run-time environment) component, places boilerplate initialization and de-initialization code (commonly known as MCU startup code) around your main() function during the build (compilation) phase for your particular target hardware, such as a Tiva C TM4C123GH6PM or an STM32F400.
The code inside these startup/initialization files initializes your stack and heap memory, the clock-gating controls of your MCU, sets up the different peripherals (GPIO, serial, I2C, etc.), the interrupt vector table, the PC (program counter), the SP (stack pointer) and a lot of other things.
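As a rough idea of what such a startup file does, here is a simplified, Cortex-M-style sketch. It is only illustrative: the symbol names (_sidata, _sdata, _edata, _sbss, _ebss, _estack) are assumptions that normally come from the linker script, and the real vendor-supplied file is longer and device-specific.

extern unsigned long _sidata, _sdata, _edata, _sbss, _ebss, _estack;
int main(void);

void Reset_Handler(void)
{
    unsigned long *src = &_sidata, *dst = &_sdata;

    while (dst < &_edata)          /* copy initialized globals from flash to RAM (.data) */
        *dst++ = *src++;
    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;                /* zero the uninitialized globals (.bss) */

    main();                        /* hand control to the application */
    while (1) { }                  /* trap here if main() ever returns */
}

/* The vector table gives the core its initial stack pointer and reset vector. */
__attribute__((section(".isr_vector")))
void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,      /* initial SP, loaded by hardware at reset */
    Reset_Handler                  /* first code executed after reset         */
};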
You can open a debug session of a simple program for your target MCU, configure it to 'stop at first assembly instruction/startup', and then step through all of that code until you reach the main.c file.
The findings are fantastic.
in embedded system
I am answering assuming a system without an MMU, i.e. some small microcontroller. Overall, note that the answers depend heavily on the specific configuration of the specific system. In the end, you can store everything in volatile memory if you want, and you can configure microcontrollers that way.
Out of the stack, heap, static and code memory of an application, which parts are stored in RAM (volatile memory) and which parts are stored in non-volatile memory?
There are sections; see for example https://www.embeddedtutor.com/2019/07/memory-layout-executable-in-embedded.html . To be honest it is fairly simple: if something changes at run time, it lives in volatile memory; if something does not change and must persist across power loss, it lives in non-volatile memory. Let's say:
what   | where
-------|------
stack  | volatile
heap   | volatile
static | the initial values of static variables are stored in non-volatile memory; the variables themselves live in volatile memory (i.e. the data segment)
code   | we live on Harvard architectures and the instructions do not change, so they go to non-volatile memory
Or, when an application is executed, is the whole application copied to RAM (volatile memory)?
No, it is not copied.
But, anyway, you can configure a microcontroller to execute instructions from volatile memory, like the STM32 "code in RAM" feature. Even then there is not necessarily a copy (there can be one; you can write code for that), the instructions are simply there. I.e. it all depends on the specific configuration.
If only memory is allocated for the function at run time, that means the addresses of those variables have to be added to the assembly code of the function - how does that happen?
There is a stack pointer that contains the location of the top of the stack. It is incremented or decremented as data is added to or removed from the stack (and note that, for example, on x86 the stack grows towards decreasing memory addresses, i.e. upside down). The functions themselves manage the stack; those functions are generated by a C compiler, and it is the compiler that creates the code which manages the stack.
There are standards that specify how this works on specific architectures - they define the architecture's ABI, for example the x86-64 ABI or the ARM ABI. They specify which registers are used, how the stack is used and how arguments are passed, etc. Compilers then "implement the ABI": they create assembly code that adheres to the rules given in the ABI standard.
Who does all this stack memory allocation etc. in an embedded system when we write code for it in C?
The C code itself manages the memory; the compiler glues it together.
so who manages this memory allocation during function calls on the MCU
The code itself, I guess? (Nowadays) C standard libraries are written in C, and some are open source. Newlib is a popular free C standard library implementation for embedded systems - you can inspect its malloc implementation.
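For instance, on bare metal newlib's malloc() ultimately asks the application for raw memory through sbrk(). Here is a minimal _sbrk sketch under assumed linker symbols (end marks the end of static data, _estack the top of RAM); check your own linker script before using anything like this:

#include <errno.h>
#include <stddef.h>

extern char end;                        /* first address after .bss, defined by the linker */
extern char _estack;                    /* top of RAM (also the initial stack pointer)     */
static char *heap_ptr = &end;

void *_sbrk(ptrdiff_t incr)
{
    char *prev = heap_ptr;

    if (heap_ptr + incr > &_estack) {   /* crude check: do not grow into the stack region */
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_ptr += incr;
    return prev;                        /* newlib's malloc carves its blocks from here */
}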

Way to detect that stack area is not overlapping RAM area during runtime

Is there any way to check or prevent the stack from crossing into the RAM data (.data or .bss) area in memory-limited (RAM/ROM) embedded systems built around microcontrollers? There are tools that do this, but they come with very costly license fees, like C-STAT and C-RUN in IAR.
You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is open up the system-specific linker file in which all memory segments have been pre-defined to defaults for the given microcontroller. You should have the various RAM segments listed there; de facto standard names are .stack, .data, .bss and .heap.
Each such segment will have an address range specified. Change the addresses and you move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky it uses GNU linker scripts, which are a well-documented, though rather complex, standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: you need to check the CPU core manual to see whether the stack pointer moves upwards or downwards on your given system. Most common is downwards, but the alternative exists. You should ensure that, in the direction the stack grows, there is no other read/write data segment which it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory, where an access would cause a CPU hardware interrupt/exception.
Here is an article describing how to do this.
In small micros that do not have the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check a 'threshold' RAM address which you have initialized to some 'magic' pattern, like 0xAA55.
Once the periodic task sees the contents of this memory address change, you have a problem!
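A sketch of that guard-word scheme (the names are illustrative, and placing the guard just beyond the end of the stack region is a job for your linker file or startup code):

#define STACK_GUARD_MAGIC  0xAA55u

/* must be located at the 'threshold' address just past the stack region */
extern volatile unsigned int stack_guard;

void stack_guard_init(void)
{
    stack_guard = STACK_GUARD_MAGIC;
}

/* call this from the periodic task or timed interrupt */
void stack_guard_check(void)
{
    if (stack_guard != STACK_GUARD_MAGIC) {
        /* the stack has reached the guard word: log it, reset, or halt here */
        for (;;) {
        }
    }
}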
In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this are the constant strings in your code, like the ones used as printf() format strings. This can free a lot of memory for your stack to grow into. Check your compiler documentation for how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls - pass them by const reference (e.g. a const pointer, for structs or classes) whenever possible; see the sketch after this list.
Minimize unnecessary use of local variables. Look particularly for large ones, like local buffers. Often you can find ways to simply remove them, or to use a shared resource instead without compromising your code.
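To illustrate the pass-by-reference point (the type and function names below are made up):

typedef struct {
    unsigned char samples[64];
    unsigned char count;
} sample_block_t;

/* wasteful: the whole 65-byte struct is copied onto the stack for every call */
unsigned int average_by_value(sample_block_t block);

/* better: only a (const) pointer is passed, nothing is copied */
unsigned int average_by_ref(const sample_block_t *block)
{
    unsigned int sum = 0;
    unsigned char i;

    for (i = 0; i < block->count; i++)
        sum += block->samples[i];
    return block->count ? sum / block->count : 0;
}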

Beginner's confusion about x86 stack

First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.
I've been told that, conceptually, the stack is like a Coke bottle: the sugar is at the bottom and you fill it up to the top. With this in mind, how does CALL tell the EIP register to "target" the called function if EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "code segment of RAM" (the place where functions are kept) is where the EIP register lives.
Typically, a computer program uses four kinds of memory areas (also called sections or segments):
The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. This would better be called "code" section, but the name has historical reasons.
The data section: This contains variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and popping them from the stack. This is also called "LIFO": last in, first out. This is where the local variables of a function reside. If a function completes, the data is removed from the stack and is lost (basically).
The heap: This is also a dynamic memory region. There are special functions in the programming language which "allocate" (reserve) a piece of this area at the request of the program. Another function is available to return the area to the heap when it is not required anymore. As the data is released explicitly, it can be used to store data which lives longer than a single function call (unlike the stack).
The data for the text and data sections is stored in the program file (on Linux they can be found using objdump, for example; add a . to the names). Stack and heap are not stored anywhere in the file, as they are allocated dynamically (on demand) by the program itself.
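A small example of where things typically end up (simplified; the exact section names vary by toolchain):

#include <stdlib.h>

int counter = 42;            /* initialized global        -> data section            */
static char buffer[128];     /* zero-initialized static   -> bss (also a data area)   */

int square(int x)            /* the machine instructions  -> text section             */
{
    int result = x * x;      /* local variable            -> stack                    */
    return result;
}

int main(void)
{
    int *p = malloc(10 * sizeof *p);   /* the allocated block -> heap */
    buffer[0] = (char)counter;         /* touch the bss array so it is not optimized away */
    free(p);
    return square(counter);
}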
Normally, after the program has been loaded, the remaining memory area is treated as a single large block in which both the stack and the heap are located. They start from opposite ends of that area and grow towards each other. For most architectures the heap grows from low to high memory addresses (ascending) and the stack downwards (descending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt the heap (change foreign data) or vice versa. This may result in any kind of error, depending on how and which data has changed. If the stack gets corrupted, this may result in the program going wild (this is actually one way a trojan might work). Modern operating systems, however, should take measures to detect this situation before it becomes critical.
This is not only true for x86, but also for most other CPU families and operating systems, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS X, iOS, Android (which uses the Linux kernel), DOS. For microcontrollers, there is often no heap (all memory is allocated statically at build time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.
Disclaimer: this is very simplified, so please no comments like "what about bss, const, myspecialarea" ;-). There is also no requirement from the C standard for these areas, specifically not to use a heap or a stack; indeed there are implementations which don't use either. Those are mostly embedded systems with small (8- or 16-bit) MCUs or DSPs. Also, modern architectures use CPU registers instead of the stack to pass parameters and keep local variables; these rules are defined in the Application Binary Interface (ABI) of the target platform.
For the stack, you might read the Wikipedia article. Note the difference in implementation between the data structure "stack" and the "hardware stack" as implemented in a typical (micro)processor.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by a C program in a limited-resource environment of 256 KB of memory (currently I am testing in an emulator).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much memory my C program is using (including globals, local variables [from the perspective of my main function loop], the program code itself, etc.)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is a Hitachi SH2; I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper actions (mark a checkbox, probably) so that the build process (namely, the linker) generates a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data in the map file, and you might need to write a script to calculate exactly what you need, or copy it into Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives it, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program. When the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory: you will see non-zero values where the stack has been used, and the used portion extends until the untouched, all-zero part starts again.
Generally you will have different sections in the generated map file, showing where the data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text ... and so on
with other attributes like Base, Size (hex), Size (dec), etc. for each section.
While at any given time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only part that varies at run time is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is itself a fixed, link-time allocation.
Consequently, all the information you need about memory allocation is probably already provided by your linker. Often (depending on your toolchain and the linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file be generated, and this will give you detailed information. Some linkers can perform stack usage analysis that will give you the worst-case stack usage for any particular function. In a single-threaded environment, the stack usage from main() gives the worst-case overall usage (although interrupt handlers need consideration: the linker is not thread- or interrupt-aware, and while some architectures have separate interrupt stacks, on others the stack is shared).
Although the heap itself is typically a fixed allocation (often all the memory left over after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation it may be useful at run time to know how much memory has been allocated from the heap, as well as the number of allocations, the average allocation size, and the number of free blocks and their sizes. Because dynamic memory allocation is implemented by your system's standard library, any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
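If you do roll your own accounting, one simple approach is to wrap the allocator. This is only a sketch - the wrapper names are made up, and the size header placed in front of each block slightly increases every allocation (and may affect alignment for strict types on some targets):

#include <stdlib.h>

static size_t alloc_count;      /* number of live allocations        */
static size_t bytes_in_use;     /* total bytes currently handed out  */

void *counted_malloc(size_t size)
{
    /* store the size in front of the block so counted_free() can subtract it */
    size_t *p = malloc(size + sizeof(size_t));

    if (p == NULL)
        return NULL;
    *p = size;
    alloc_count++;
    bytes_in_use += size;
    return p + 1;
}

void counted_free(void *ptr)
{
    if (ptr != NULL) {
        size_t *p = (size_t *)ptr - 1;
        bytes_in_use -= *p;
        alloc_count--;
        free(p);
    }
}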
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().

How will I know when my memory is full?

I'm writing firmware for an Atmel XMEGA microcontroller in C and I think I have filled up the 4 KB of SRAM. As far as I know I only have static/global data and local stack variables (I don't use malloc in my code).
I use a local variable to buffer some pixel data. If I increase the buffer to 51 bytes my display shows strange results - a buffer of 6 bytes works fine. This is why I think my RAM is full and the stack is overwriting something.
Creating more free memory is not my problem, because I can just move some static data into flash and only load it when it's needed. What bothers me is the fact that I could never have discovered that the memory was full.
Is it somehow possible to detect (e.g. by resetting the microcontroller) when the memory fills up, instead of letting it overwrite other data?
It can be very difficult to predict exactly how much stack you'll need (some toolchains can have a go at this if you turn on the right options, but it's only ever a rough guide).
A common way of checking on the state of the stack is to fill it completely with a known value at startup, run the code as hard/long as you can, and then see how much had not been overwritten.
The startup code for your toolchain might even have an option to fill the stack for you.
Unfortunately, although the concept is very simple - fill the stack with a known value, then count how many of those values remain - implementing it can require quite a deep understanding of the way your specific tools (particularly the startup code and the linker) work.
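To make the idea concrete, here is a minimal sketch. It assumes the stack region boundaries are exported by your linker script or startup code as __stack_start/__stack_end (those names are assumptions), and that the fill runs as early as possible, before the stack is in heavy use:

#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u

extern uint8_t __stack_start;      /* lowest address of the stack region               */
extern uint8_t __stack_end;        /* just past the highest address (stack grows down)  */

void stack_paint(void)             /* call as early as possible */
{
    uint8_t *p = &__stack_start;

    /* leave a margin so we do not overwrite the frames already in use */
    while (p < &__stack_end - 64)
        *p++ = STACK_FILL;
}

size_t stack_unused_bytes(void)    /* call after the code has run "hard and long" */
{
    const uint8_t *p = &__stack_start;
    size_t n = 0;

    while (p < &__stack_end && *p == STACK_FILL) {
        n++;
        p++;
    }
    return n;                      /* bytes the stack never touched */
}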
Crude ways to check whether stack overflow is what's causing your problem are to make all your local arrays 'static' and/or to hugely increase the size of the stack, and then see if things work better. Both can be difficult to do on small embedded systems.
"Is it somehow possible to dected (e.g.
by reseting the microcontroller) when
the memory got filled up instead of
letting it overwrite some other data?"
I suppose you currently have a memory mapping like (1) below.
When the stack and/or variable space grow too much, they collide and overwrite each other (*).
Another possibility is a memory mapping like (2).
When the stack or variable space exceeds its maximum, it hits the unmapped address space (*).
Depending on the controller (I am not sure about the AVR family), this causes a reset/trap or similar (= what you desired).
    [not mapped addr space][     RAM mapped addr space     ][not mapped addr space]
(1)                        [variables --->     *  <--- stack]
(2)                       *[<--- stack     variables --->   ]*
(the arrows indicate the direction of growth as more variable/stack space is used)
Of course it is better to make sure beforehand that RAM is big enough.
Typically the linker is responsible for allocating the memory for code, constants, static data, stacks and heaps. Often you must specify the required stack sizes (and the available memory) to the linker, which will then flag an error if it can't fit everything in.
Note also that if you're dealing with a multithreaded application, each thread has its own stack, and these are frequently allocated off the heap as the thread starts.
Unless your processor has some hardware checking for stack overflow (unlikely), there are a couple of tricks you can use to monitor stack usage:
Fill the stack with a known marker pattern, and examine the stack memory (as allocated by the linker) to determine how much of the marker remains uncorrupted.
In a timer interrupt (or similar), compare the main thread's stack pointer with the base of the stack to check for overflow (a sketch follows below).
Both of these approaches are useful in debugging, but they are not guaranteed to catch all problems and will typically only flag a problem AFTER the stack has already corrupted something else...
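A sketch of the second trick for the AVR in question: reading the stack pointer is architecture-specific, but avr-libc exposes it as SP in <avr/io.h>; __stack_limit here is an assumed symbol marking the lowest address the stack may legally use.

#include <avr/io.h>
#include <stdint.h>

extern uint8_t __stack_limit;      /* assumed symbol: lowest legal stack address */

void stack_check(void)             /* call this from any periodic timer interrupt */
{
    if (SP < (uint16_t)&__stack_limit) {
        /* the stack has grown past its reserved area: flag it, log it, or reset */
    }
}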
Usually your programming tool knows the parameters of the controller, so you should be warned if you use more (without mallocs, the static usage is known at compile time).
But you should be careful with pixel data, because most displays don't have a linear address space.
EDIT: usually you can specify the stack size manually. Leave just enough memory for the static variables, and reserve the rest for the stack.
