Why do MCU compilers for chips like AVR or ESP (used widely by Arduino) keep all strings in SRAM heap by default? [closed] - c

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 months ago.
There is a common technique in the Arduino world where you can use the PROGMEM macro to keep strings and other similar data in flash memory instead of SRAM, lowering RAM usage at the cost of some performance - https://www.arduino.cc/reference/en/language/variables/utilities/progmem/
Basically, instead of storing these in SRAM, only a reference to a flash address is kept, and the string is loaded from there on the fly in order to save RAM.
But I can't understand why MCU compilers put all strings, including local strings from functions, into heap memory and keep them there all the time in the first place. I also don't understand how a compiler can "store anything in RAM instead of flash" - RAM is volatile, so the compiler can hardly "store" anything there, as it's cleared on every reset. These strings must still be present in the program image stored in flash, so why copy them from flash to RAM on each start of the MCU? I was thinking that maybe the whole program image must be loaded into RAM for execution, but that doesn't make sense either, as these chips use a Harvard architecture and the program is executed from flash already (and most of these chips have much more flash than RAM anyway, so the whole image would never fit into RAM).
While I understand how to use the workarounds that prevent this behaviour, I can't understand why the behaviour exists in the first place. Can someone shed some light on it? Why are all strings loaded into the heap at program start by default? Is that for performance reasons?

The AVR architecture is different from many other common architectures in that the code and data exist in completely different memory spaces (though program memory can be accessed as data, as shown in the PROGMEM documentation page you linked). This is one type of modified Harvard architecture.
Most other architectures that you're likely to use present themselves to the programmer as having code and data in the same memory space. While this is often also implemented as a modified Harvard architecture, they present themselves as a von Neumann architecture with a unified code and data memory space.
On AVR, to make initialized global or static data usable like any other in-memory data, part of the program startup code copies the initialization data from program memory into RAM. This is generally done for program segments with names like .data or .rodata, depending on whether or not the variables in question are const.
Note that, contrary to what you say in your question, this data is not copied to the heap, it's stored in some portion of RAM chosen during program linking.
Using PROGMEM and the associated functions, you can directly access the data stored in the flash memory of the AVR device. This constant data is placed in a segment that won't be copied to RAM on startup, like .progmem.data, and so doesn't have space in RAM reserved for it.
The case of the Xtensa architecture, used by the ESP8266 and some members of the ESP32 family, is completely different. Contrary to what you state in your question, I don't believe that const static or global objects are copied into RAM by default - only those which can be modified are (the .data segment is copied to RAM as initialization, while the .rodata segment is not).

Related

How do function calls work in terms of stack and code memory?

I have been trying to understand how memory works - what happens, step by step, in terms of memory during the execution of an application, especially in an embedded system. The context is C/C++.
Of the stack, heap, static and code memory of an application, which parts are stored in RAM (volatile memory) and which are stored in non-volatile memory? Or, when an application is executed, is the whole application copied to RAM (volatile memory)?
When a function is called, do all the assembly instructions of that function get copied to the stack, or is memory merely allocated for the function?
If memory is only allocated for the function at run time, that means the addresses of those variables have to be added to the assembly code of the function - how does that happen?
Who does all this - stack memory allocation etc. - in an embedded system when we write code for it in C? There is no OS on the MCU to do memory management for us, so who manages this memory allocation during function calls on the MCU?
Of the stack, heap, static and code memory of an application, which parts are stored in RAM (volatile memory) and which are stored in non-volatile memory? Or, when an application is executed, is the whole application copied to RAM (volatile memory)?
When a function is called, do all the assembly instructions of that function get copied to the stack, or is memory merely allocated for the function?
If memory is only allocated for the function at run time, that means the addresses of those variables have to be added to the assembly code of the function - how does that happen?
For a comprehensive and correct understanding of code execution in bare-metal (no OS) and OS-based environments (embedded systems), and of the different memory structures and all their internals, I would recommend this book - Extreme C (by Kamran Amini), especially these sections:
Chapter 2: From Source to Binary
Chapter 4: Process Memory Structure
Chapter 5: Stack and Heap
In my experience, hearsay and random comments will only hurt your understanding, and you will barely be able to make sense of it all. Consult authentic published content.
Who does all this - stack memory allocation etc. - in an embedded system when we write code for it in C? There is no OS on the MCU to do memory management for us, so who manages this memory allocation during function calls on the MCU?
The programmer's interface in C is the main.c file:
int main(void) {
    // User code here
    return 0;
}
But technically, your IDE (Keil uVision, Code Composer Studio, etc.), along with its RTE (run-time environment) component, places boilerplate initialization and de-initialization code (commonly known as MCU startup code) around this main.c file during the build (compilation) phase for your particular target hardware, such as a Tiva C TM4C123GH6PM or an STM32F400.
The code inside these startup/initialization files initializes your stack and heap memory and the clock-gating controls of your MCU, sets up the various peripherals (GPIO, serial, I2C, etc.), the interrupt vector table, the PC (program counter), the SP (stack pointer) and a lot of other things.
You can open a debug session for a simple program on your target MCU, configure it to 'stop at first assembly instruction/startup', and then step through all the code until you reach the main.c file.
The findings are fantastic.
in embedded system
I am answering assuming a system without an MMU - some small microcontroller. Overall, note that the answers depend heavily on the specific configuration of the specific system. In the end, you can store everything in volatile memory if you want, and you can configure microcontrollers that way.
Out of Stack, heap, static and Code memory of a application, which is Stored in RAM or volatile memory and which part is stored in non-volatile memory?
There are sections - see for example https://www.embeddedtutor.com/2019/07/memory-layout-executable-in-embedded.html . TBH it's really simple: if something changes, it's volatile; if something does not change and persists between power losses, it's non-volatile. Let's say:
stack - volatile
heap - volatile
static - the initialization values of static variables are stored in non-volatile memory; the static variables themselves live in volatile memory (i.e. the data segment)
code - we live on Harvard architectures and assembly instructions do not change, so they go in non-volatile memory
Or when a application is executed, the whole application is copied to RAM or volatile memory?
No, it is not copied.
But you can configure a microcontroller to execute instructions from volatile memory - see, for example, STM32 code_in_ram. Still, there is no copy (there can be - you can write code for that...); the instructions are there. I.e. it all depends on the specific configuration.
If only memory is allocated to the function in real time, that means the address of those variable has to be added to the assembly code of the function, how does that happen?
There is a stack pointer that contains the location of the top of the stack. It is incremented or decremented depending on whether data is being added to or removed from the stack (and note that, for example, on x86 the stack grows towards decreasing memory addresses - upside down, as it were). The functions themselves manage the stack; those functions are created by a C compiler, and the C compiler generates the code that manages the stack.
There are standards that specify how this works on specific architectures - these standards define the architecture's ABI, for example the x86-64 ABI or the ARM ABI. They specify which registers are used, how the stack is used, how parameters are passed, etc. Compilers then "implement that ABI" - they create assembly code that adheres to the rules given in the ABI standard.
Who does all this in an embedded system stack memory allocation etc in a embedded system when we write code for it in C?
The C code itself manages memory. The compiler glues it all together.
so who manages this memory allocation during function calls in MCU
The code itself, I guess? C standard libraries are (nowadays) written in C, and some are open source. Newlib is a popular free C standard library implementation for embedded systems - you can inspect its malloc implementation.

Way to detect that stack area is not overlapping RAM area during runtime

Is there any way to check or prevent stack area from crossing the RAM data (.data or .bss) area in the limited memory (RAM/ROM) embedded systems comprising microcontrollers? There are tools to do that, but they come with very costly license fees like C-STAT and C-RUN in IAR.
You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is open the system-specific linker file in which all memory segments have been pre-defined to defaults for the given microcontroller. You should find the various RAM segments listed there; de facto standard names are .stack, .data, .bss and .heap.
Each such segment has an address range specified. Change the addresses and you move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky it uses GNU linker scripts, which are a well-documented, though rather complex, standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: you need to check the CPU core manual and see if the stack pointer moves upwards or downwards on your given system. Most common is downwards, but the alternative exists. You should ensure that in the direction that the stack grows, there is no other read/write data segment which it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory where access would cause a CPU hardware interrupt/exception.
Here is an article describing how to do this.
In small micros that do not have the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check a 'threshold' RAM address which you have initialized to some 'magic' pattern, like 0xAA55.
Once the periodic task sees the contents of this memory address change, you have a problem!
In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this is the constant strings in your code, like the ones used as printf() format strings. This can free up a lot of memory for your stack to grow into. Check your compiler documentation for how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls - pass them as const references whenever possible (e.g. for structs or classes).
Minimize unnecessary usage of local variables. Look particularly for the large ones, like local buffers for example. Often you can find ways to just remove them, or to use a shared resource instead without compromising your code.

Stack and heap confusion for embedded 8051

I am trying to understand a few basic concepts regarding the memory layout of the 8051 MCU architecture. I would be grateful if anyone could give me some clarification.
So, for an 8051 MCU we have several types of memory:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata), but it's quite small (30h-7Fh - 80 bytes).
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origins in the late 1970s/early 1980s. As such, it has very limited resources. The original version did not even have XRAM, for instance; that was "patched" on later and requires special (and slow) accesses.
The IRAM is the "main memory". It really does include the stack (yes, there are only a few bytes). The rest is used for global variables (the "data" and "bss" sections: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same purpose.
Note that with these small MCUs you do not use many local variables (and if you do, only 8-bit types). A clever compiler/linker (I actually used some of these) can statically allocate local variables, overlapping them between functions - unless recursion is used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merge blocks.
Note that the IRAM includes some special registers which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these features. Very likely some features require special assembler functions; these might be provided in a header as C functions (intrinsics) that just generate the corresponding machine instruction.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended Harvard architecture (code and data separated, with XRAM as a third memory space).
Regarding your 2nd link: this is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more general CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc., using an external von Neumann architecture (internally, most (if not all) 32+-bit parts use a Harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set up within a very early startup routine. This will set a register to tell the CPU where the stack should start. Typically, you want this in memory that is very fast to access, because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses of calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set up wherever your startup routine put it.
There is no particular area where a heap needs to live. If you want it to live in external memory, it can be set there. If you want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded malloc() call. From there, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.
Concerning the heap:
There is also a malloc/free pair.
You have to call init_mempool() first, which is indicated in the compiler documentation but is somewhat uncommon.
The pseudo-code below illustrates this.
However, I have only used it this way and did not try heavy use of malloc/free, as you might find in dynamic linked-list management, so I have no idea of the performance you get out of this.
// A "large" area in xdata to be used as the heap
static char xdata heap_mem_pool[1000];

// A pointer located in data, pointing to something in xdata.
// The pointer then takes 2 bytes instead of 3 (the 3rd byte would
// store the memory-area specifier: data, idata or xdata).
// The specifier is not mandatory, but welcome.
char xdata * data shared_memory;
// ...
u16 mem_size_needed;
init_mempool(heap_mem_pool, sizeof(heap_mem_pool));
// ...
mem_size_needed = calculate_needed_memory();
shared_memory = malloc(mem_size_needed);
if (0 == shared_memory) return -1;
// ... use the shared_memory pointer
// free it when it is no longer needed
free(shared_memory);
Some additional consequences follow from the fact that, in general, no function is reentrant (or only with some effort) on this stackless microcontroller.
I will call "my system" the system I am working on at present: a C8051F040 (Silabs) with the Keil C51 compiler (I have no specific interest in these two companies).
The (function return address) stack is located low in the IRAM (idata on my system).
If it starts at 30 (dec), it means you have global or local variables in your code that you requested to be in data RAM (either because you chose the "small" memory model or because you used the data keyword in the variable declaration).
Whenever you call a function, the 2-byte return address of the caller will be saved on this stack (16-bit code space), and that's all: no register saving, no arguments pushed onto the (non-existent) (data) stack. Your compiler may also limit the function call depth.
Necessary arguments and local variables (and certainly saved registers) are placed somewhere in RAM (data RAM or XRAM).
So now imagine that you want to use the same innocent function (like memcpy()) both in your interrupt and in your normal infinite loop; it will cause sporadic bugs. Why?
Due to the lack of a stack, the compiler must share RAM locations for arguments, local variables, etc. between several functions THAT DO NOT BELONG to the same call tree branch.
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you are executing, e.g., the memcpy() in your "normal task", you may corrupt that memcpy()'s execution, because when execution returns from the interrupt, the pointers dedicated to the copy performed in the normal task will hold the (end) values of the copy performed in the interrupt.
On my system I get an L15 linker error when the compiler detects that a function is called from more than one independent "branch".
You may make a function reentrant by adding the reentrant keyword, which requires the creation of an emulated stack at the top of XRAM, for example. I did not test this on my system because I am already short of XRAM, of which there is only 4 kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
In the standard 8051 MCU, the stack occupies the same address space as register bank 1 (08h to 0Fh) by default at startup. That means the stack pointer (SP register) holds the value 07h at startup (incremented to 08h when the stack is first PUSHed). This limits the stack memory to 8 bytes if register bank 2 (starting at 10h) is occupied. If register banks 2 and 3 are not used, the stack can take up that space as well (08h to 1Fh).
If in a given program we need more than 24 bytes (08h to 1Fh = 24 bytes) of stack, we can change SP to point to RAM locations 30h - 7Fh. This is done with the instruction "MOV SP, #xx". This should clarify doubts surrounding 8051 stack usage.

Beginner's confusion about x86 stack

First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.
I've been told that, conceptually, the stack is like a Coke bottle: the sugar is at the bottom and you fill it up to the top. With this in mind, how does the call tell the EIP register to "target" the called function if the EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "code segment of RAM" (the place where functions are kept) is where the EIP register lives.
Typically, a computer program uses four kinds of memory areas (also called sections or segments):
The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. This would better be called "code" section, but the name has historical reasons.
The data section: This contains variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and "popping" them off. This is also called "LIFO": last in, first out. This is where the local variables of a function reside. If a function completes, its data is removed from the stack and is lost (basically).
The heap: This is also a dynamic memory region. There are special function in the programming language which "allocate" (reserve) a piece of this area on request of the program. Another function is available to return this area to the heap if it is not required anymore. As the data is released explicitly, it can be used to store data which lives longer than just a function call (different from the stack).
The data for the text and data sections is stored in the program file (on Linux, for example, they can be found using objdump; add a . to the names). The stack and heap are not stored anywhere in the file, as they are allocated dynamically (on demand) by the program itself.
Normally, after the program has been loaded, the remaining memory area is treated as a single large block in which both the stack and the heap are located. They start from opposite ends of that area and grow towards each other. On most architectures the heap grows from low to high memory addresses (ascending) and the stack downwards (descending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt (change foreign data in) the heap or vice versa. This may result in any kind of error, depending on how/what data has changed. If the stack gets corrupted, the program may go wild (this is actually one way a trojan might work). Modern operating systems, however, should take measures to detect this situation before it becomes critical.
This is not only true for x86, but also for most other CPU families and operating systems, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS X, iOS, Android (which uses the Linux kernel), DOS. On microcontrollers, there is often no heap (all memory is allocated statically at link time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.
Disclaimer: This is very simplified, so please no comments like "what about bss, const, myspecialarea" ;-). There is also no requirement from the C standard for these areas, specifically to use a heap or a stack. Indeed, there are implementations which use neither; these are mostly embedded systems with small (8- or 16-bit) MCUs or DSPs. Modern architectures also use CPU registers instead of the stack to pass parameters and keep local variables; this is defined in the Application Binary Interface of the target platform.
For the stack, you might read the Wikipedia article. Note the difference in implementation between the data structure "stack" and the "hardware stack" as implemented in a typical (micro)processor.

How much memory takes a C program [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am developing an in-memory database as a side project which is supposed to be lightweight. I haven't been programming in C since school and my knowledge of computer architecture is limited...
I am wondering how I can calculate exactly how much memory my program will take, and from which kind of memory (RAM, registers, ...).
The most obvious part is everything I allocate through malloc. Sorry if the following questions are a bit random...
Will global variables be stored in RAM? Does the keyword static (to limit the scope) influence anything?
Are all global variables allocated at the same time, or could they be lazily allocated on first access?
Is the executable loaded into memory? Will an executable of 1 MB take 1 MB during execution?
This subject is a pretty big one so don't hesitate to point me to a book or a website. I guess it's not only about C but more about the computer architecture, the assembly code etc.
I'm assuming typical computing platforms, not embedded systems.
Will global variables be stored in RAM? Does the keyword static (to limit the scope) influence anything?
Global variables will be stored in RAM only if the operating system thinks that's the best use for RAM. Scope has no effect.
Are all global variables allocated at the same time, or could they be lazily allocated on first access?
It depends what you mean by "allocated". Typically virtual memory (address space) is allocated all at once, but physical memory (RAM) is allocated as needed.
Is the executable loaded into memory? Will an executable of 1 MB take 1 MB during execution?
It is mapped into memory at program start. It is actually loaded into physical memory as needed and evicted from physical memory as the OS deems appropriate.
I strongly suspect you are looking for simple answers to very complex questions.
Yes, but that doesn't mean they're all mapped at any given point in time.
Whether they can be lazily allocated depends on what you mean by that. They will all be mapped to virtual addresses up front, but if the program never accesses the variables, the OS might never need to back those addresses with actual physical RAM.
It depends, but most modern desktop/server operating systems will page the code in as needed, I think.
Oops, that's an interesting question, but the answer is, as usual: it depends!
Your questions are heavily implementation-dependent. In old (now outdated) systems there existed the notion of overlays: parts of the code were only loaded into memory when needed. I do not think this is still used with modern virtual memory systems, but it could make sense on embedded systems with limited resources.
Some compilers also have options to determine the size of the stack. That can be decisive for a lightweight program.
And there is an obvious dependency on the architecture: on Unix/Linux you have the ELF vs. a.out formats, with different memory requirements and management; on Windows there is still the old .com format, which can lead to really tiny executables.
