RAM usage AT32UC3B0512 - c

I'm searching for a way to see the RAM usage of my application running on an at32uc3b0512.
avr32-size.exe foo.elf tells me:
text data bss dec hex filename
263498 11780 86524 361802 5854a foo.elf
According to Google, RAM usage is .data + .bss. But .data + .bss is already (11780 + 86524) / 1024 = 96 KB, which would mean that my RAM is full (the AT32UC3B0512 has 96 KB of SRAM). Yet the application works as desired. Am I wrong?

The chip you are using has 96 KB of RAM, and that is also the sum of your .data and .bss sections. This does not mean that all of your RAM is being used up at runtime; it merely shows how the linker has allocated the RAM.
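For illustration (these variables are my own, not from the question), this is roughly what the size numbers count:

#include <stdint.h>

const uint32_t crc_table[256] = { 0 };  /* .rodata/.text: typically lives in flash only      */
uint32_t       baud_rate      = 115200; /* .data: counted in RAM, init value also in flash    */
uint8_t        rx_buffer[4096];         /* .bss:  counted in RAM, zeroed at startup           */

int main(void)
{
    (void)crc_table; (void)baud_rate; (void)rx_buffer;
    return 0;
}

Here avr32-size would report rx_buffer under bss and baud_rate under data; neither column tells you how much stack or heap the program actually consumes at runtime, unless the linker script reserves stack/heap regions as sections, in which case they may show up under bss as well.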

The program on an MCU is usually located in flash. (That is not the case if you have some OS present that loads the program into memory at runtime from somewhere like an SD card, but not all MCUs can do that and I suspect that is not your situation.) The program flash is 512 KB (I am guessing from your IC's part number). The SRAM is used for the C runtime/OS, stack and heap; your chip has 96 KB of it. By "C runtime" I mean the OS-like housekeeping: dynamic allocation, heap, stack and subroutine calls, including the RTL used during compilation and of course the dummy interrupt service routines for unused interrupts.
When you compile the program to ELF/HEX or whatever, the compiler/linker only tells you how big the program code and data are (located in program flash) and how big your static variables are. The rest is unknown until runtime itself. So if you need to know how big a chunk of memory you are really using, you have to extract it at runtime, either through some RTL call that reports the memory status, or by estimating it yourself based on knowledge of what your program does, how much dynamic memory it uses, heap/stack thrashing and usage, recursion depth, etc.
Or you can keep allocating memory until you hit out-of-memory and count how big a chunk you allocated altogether (then release it, of course); the used memory is then roughly 96 KB minus the total you managed to allocate, give or take the allocator's granularity.
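As a rough illustration of that last approach, here is a minimal C sketch (my own, not from the answer above); the block size sets the granularity of the estimate, and the pointer array itself consumes a little RAM while probing:

#include <stdlib.h>

/* Probe the free heap by allocating fixed-size blocks until malloc() fails,
 * then freeing everything again. Granularity = BLOCK bytes. */
#define BLOCK      256u
#define MAX_BLOCKS 1024u    /* covers up to 256 KB; the array costs 4 KB of stack while probing */

size_t probe_free_heap(void)
{
    void  *blocks[MAX_BLOCKS];
    size_t count = 0;

    while (count < MAX_BLOCKS) {
        void *p = malloc(BLOCK);
        if (p == NULL)
            break;                      /* out of memory: stop probing */
        blocks[count++] = p;
    }

    for (size_t i = 0; i < count; i++)  /* release it all again */
        free(blocks[i]);

    return count * BLOCK;               /* ~ free heap, +/- BLOCK and allocator overhead */
}

Note that the result excludes the allocator's own bookkeeping overhead, and it is only meaningful if the stack does not grow into the heap region while probing.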

Related

How can C allocate memory without an OS? (Cases where OS is written in C)

I am following a course where we are writing an OS on a microcontroller.
The OS is written in C and the instructor initializes stack spaces for each thread in the following way.
int32_t TCB_STACK[NUM_OF_THREADS][STACK_SIZE];
How can the memory be allocated if there isn't any OS already running to service this in the first place? Am I missing something?
You don't have to "allocate" anything as such on a bare-metal system; you have full access to all of the physical memory and there's no one else to share it with.
On such a system those arrays end up in a statically allocated region of RAM known as .bss. Where .bss is located and how large it is are determined by the linker script, which decides which parts of the memory to reserve for what: stack, .bss, .data and so on. Similarly, the flash memory for the actual program may be divided into several parts.
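As a sketch (the NUM_OF_THREADS and STACK_SIZE names come from the question; the helper below is mine and purely illustrative), the "allocation" is nothing more than the linker assigning the array an address inside .bss:

#include <stdint.h>

#define NUM_OF_THREADS 3
#define STACK_SIZE     100               /* 32-bit words per thread stack */

/* Reserved at link time in .bss; no allocator is involved, and the startup
 * code simply zeroes this region before main() runs. */
int32_t TCB_STACK[NUM_OF_THREADS][STACK_SIZE];

/* Illustrative helper: on a full-descending stack, each thread's initial
 * stack pointer is simply the top of its slice of the array. */
static inline int32_t *thread_initial_sp(int thread)
{
    return &TCB_STACK[thread][STACK_SIZE];   /* one-past-the-end is a valid C pointer */
}

You can confirm the placement with nm or the map file: the TCB_STACK symbol will appear at an address inside the .bss range defined by the linker script.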

Embedded linker scripts - proper placement of 'stack' and 'heap' regions?

Lately I've been studying the linker scripts used in auto-generated STM32 projects, and I'm a little bit confused about how the stack and heap memory segments are defined.
As an example, I've been looking at the files provided in ST's "CubeMX" firmware package for their F0 lines of chips, which have ARM Cortex-M0 cores. I'd paste a whole script if the files' licenses allowed it, but you can download the whole package from ST for free if you're curious. Anyways, here are the parts relevant to my question:
/* Highest address of the user mode stack */
_estack = 0x20001000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
<...>
SECTIONS {
<...>
.bss :
{
<...>
} >RAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
<...>
}
So here's my probably-incorrect understanding of the linker's behavior:
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
The '_Min_Heap_Size' and '_Min_Stack_Size' values seem like they are supposed to define the minimum amount of space that should be dedicated to the stack and heap for the program to use. Programs that allocate a lot of dynamic memory may need more 'Heap' space, and programs that call deeply-nested functions may need more 'Stack' space.
My question is, how is this supposed to work? Are '_Min_x_Space' special labels, or are those names maybe slightly confusing? Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
Also, the space defined for the Stack does not appear to necessarily define a contiguous segment between its start and the '_estack' value defined above. If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'.
The only explanation I can think of is that the 'Heap' and 'Stack' segments might have no actual meaning, and are only defined as a compile-time safeguard so that the linker throws an error when there is significantly less dynamic memory available than expected. If that's the case, should I think of it as more of a minimum 'Combined Heap/Stack' size?
Or honestly, should I just drop the 'Heap' segment if my application won't use malloc or its ilk? It seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
You ask where to place the stack and the heap. On a uC the answer is not as obvious as @a2f stated, for several reasons.
The stack
First of all, many ARM uCs have two stacks, the main stack and the process stack. Of course you do not need to enable this option.
Another issue is that a Cortex uC may have several SRAM blocks (for example the STM32F3, many F4, F7 and H7 parts). It is up to the developer to decide where to place the stack and the heap.
Where to place the stack?
I would suggest placing the main stack (MSP) at the beginning of the chosen RAM. Why?
If the stack is placed at the end, you do not have any control over stack usage. When the stack overflows it may silently overwrite your variables, and the behaviour of the program becomes unpredictable. That is not an issue if it is a LED-blink toy, but imagine a large machine controller or a car brake computer.
When you place the stack at the beginning of the RAM (by "beginning" I mean RAM start address + stack size), a hardware exception is generated when the stack is about to overflow. You remain in full control of the uC, you can see what caused the problem (for example a damaged sensor flooding the uC with data) and start an emergency routine (stop the machine, put the car into service mode, etc.). The stack overflow will not go undetected.
The heap
Dynamic allocation has to be used with caution on uCs. The first problem is possible fragmentation of the available memory, as uCs have very limited resources. Use of dynamically allocated memory has to be considered very carefully, otherwise it can be a source of serious problems. Some time ago the USB HAL library used dynamic allocation in an interrupt routine; a fraction of a second was sometimes enough to fragment the heap so badly that no further allocation was possible.
Another problem is the wrong implementation of sbrk in most of the available toolchains. The only one I know with a correct one is the bleeding-edge toolchain maintained by our colleague from this forum, @Freddie Chopin.
The problem is that those implementations assume that the heap and the stack grow towards each other and may eventually meet, which is of course wrong. Another problem is improper use and initialization of the static variables holding the addresses of the heap start and end.
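To make the sbrk point concrete, here is a minimal _sbrk() sketch for a newlib-style toolchain that checks against an explicit heap limit instead of assuming the heap may keep growing until it meets the stack. The __heap_start__ and __heap_end__ symbols are assumptions; they would have to be provided by your linker script under whatever names it uses:

#include <errno.h>
#include <stddef.h>

extern char __heap_start__;   /* provided by the linker script (assumed names) */
extern char __heap_end__;

void *_sbrk(ptrdiff_t increment)
{
    static char *heap_break = &__heap_start__;
    char *previous = heap_break;

    /* Refuse the request instead of silently growing towards the stack. */
    if (heap_break + increment > &__heap_end__) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_break += increment;
    return previous;
}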
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
As the stack here would grow downwards (from high to low addresses), it's actually the start of the stack memory region.
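For illustration, this is roughly how _estack ends up in the vector table with a GNU toolchain on a Cortex-M; the handler names are placeholders (their definitions would live in the startup file), and the .isr_vector section name follows the usual ST convention rather than anything shown in the script excerpt above:

#include <stdint.h>

typedef void (*vector_t)(void);

extern uint32_t _estack;            /* defined in the linker script            */
void Reset_Handler(void);           /* placeholder handlers, defined elsewhere */
void Default_Handler(void);

/* The first entry is the initial MSP, i.e. the highest stack address; the CPU
 * loads it from the start of flash at reset and the stack then grows downwards. */
__attribute__((section(".isr_vector"), used))
const vector_t vector_table[] = {
    (vector_t)&_estack,             /* initial stack pointer */
    Reset_Handler,                  /* reset vector          */
    Default_Handler,                /* NMI                   */
    Default_Handler,                /* HardFault             */
};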
Are '_Min_x_Space' special labels, or are those names maybe slightly confusing?
The special thing about them is that symbols starting with an underscore followed by an uppercase letter are reserved for the implementation, so they cannot collide with anything you define; a name like min_stack_space, on the other hand, could clash with user-defined symbols.
Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
That's the minimum size. Both the stack and the heap break may grow.
If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'
It leaves exactly 0x400 bytes, which is _Min_Stack_Size. Remember that the stack grows downwards here (and often elsewhere as well).
seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
Not everything is safety-critical. You're free not to use the heap if you don't want to, don't need to, or aren't allowed to. (OK, not that free in the last case.)

RAM & ROM memory segments

There are different memory segments such as .bss, .text, .data, .rodata, ...
I have not been able to figure out which of them is located in RAM and which in flash memory; many sources mention them under both (RAM and ROM).
Please provide a clear explanation of the memory segments in RAM and flash.
Atmel Studio compiler
ATmega32 platform
Hopefully you understand the typical uses of those section names: .text being code, .rodata read-only data, .data non-zero read/write data (for example global variables that have been initialized at compile time), and .bss read/write data assumed to be zero or uninitialized (global variables that were not initialized).
So .text and .rodata are read-only; they can live in flash or RAM and be used there. .data and .bss are read/write, so they need to be USED in RAM, but to get that information into RAM it has to sit in a non-volatile place while the power is off and then be copied over. So in a microcontroller the .data initial values live in flash, and the bootstrap code copies that data to its home in RAM where the code expects to find it. For .bss you don't need to store all those zeros; you just need the starting address and the number of bytes, and the bootstrap can zero that memory.
So all of them can, and do, live in both. But in the typical use case the read-only sections are USED from flash and the read/write ones are USED from RAM.
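A typical bare-metal startup for that copy/zero step looks roughly like the sketch below. It assumes a GNU-style linker script on a 32-bit MCU; the _sidata/_sdata/_edata/_sbss/_ebss names follow the common GCC/ST convention and are not mandated by anything in this answer. On the 8-bit AVR named in the question, avr-libc's C runtime startup performs the same two steps in assembly before main().

#include <stdint.h>

/* Symbols defined by the linker script; only their addresses are meaningful. */
extern uint32_t _sidata;   /* load address of .data in flash */
extern uint32_t _sdata;    /* start of .data in RAM          */
extern uint32_t _edata;    /* end of .data in RAM            */
extern uint32_t _sbss;     /* start of .bss in RAM           */
extern uint32_t _ebss;     /* end of .bss in RAM             */

extern int main(void);

void Reset_Handler(void)
{
    /* Copy the initial values of .data from flash to their home in RAM. */
    uint32_t *src = &_sidata;
    for (uint32_t *dst = &_sdata; dst < &_edata; )
        *dst++ = *src++;

    /* Zero .bss: only its start address and size are needed, not the zeros. */
    for (uint32_t *dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;

    main();
    for (;;) { }               /* main() should not return on bare metal */
}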
They are located wherever your project's linker script defines them to be located.
Some targets locate and execute code in ROM, while others may copy code from ROM to RAM on start-up and execute from RAM - usually for performance reasons on faster processors. As such .text and .rodata may be located in R/W or R/O memory. However .bss and .data cannot by definition be located in R/O memory.
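As an example of code that is deliberately copied to and executed from RAM, GCC lets you place a function into a dedicated input section; the .ramfunc name and the copy-at-startup behaviour are assumptions about the linker script, not something this answer specifies:

#include <stdint.h>

/* Placed in the .ramfunc input section; the linker script must locate that
 * section in RAM (with its load address in flash) and the startup code must
 * copy it over, exactly like .data. */
__attribute__((section(".ramfunc"), noinline))
void erase_flash_page(volatile uint32_t *flash_ctrl, uint32_t page)
{
    /* Typical use: flash programming code must not execute from the flash
     * bank it is busy erasing. The register write here is only illustrative. */
    *flash_ctrl = page;
}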
ROM cannot be written to, while RAM can.
In a PC, ROM holds the BIOS (Basic Input/Output System), while RAM holds the running programs and their data.
On a PC, ROM is also much smaller than RAM.
ROM is non-volatile (permanent), while RAM is volatile.

When does my microcontroller use my Flash or my RAM?

I am currently developing an embedded application on the Atmel SAML21J microcontroller, which has 256 KB of flash and 40 KB of SRAM. When I program my app onto the MCU, I get the following message:
Program Memory Usage 66428 bytes 24,6 % Full
Data Memory Usage 29112 bytes 71,1 % Full
This seems to mean that even before my code starts running, the RAM is already 71% full.
I would like to know the following things:
what is placed in the RAM, and what is placed in the flash?
can I do something to use more of my flash (which is only 24% full) to save space in the SRAM, and how?
I saw a ".ld" file that specifies the size of my stack: will it leave me more space in the RAM if I make it higher?
in this .ld file, is the memory (flash + SRAM) treated as one single memory space (meaning, for example, that the SRAM addresses start where the flash ends)?
Even though I have read a lot on this subject, it is still unclear to me, and I would really appreciate it if you could enlighten me.
Thanks.
Where things are placed (defined):
The stack (local variables live on the stack), all global variables, and functions marked with a special keyword to run from RAM (for example __ramfunc for IAR) are placed in RAM.
All functions (no matter where they will run from), all constants, and the initialization values of variables are placed in flash. Worth mentioning: on AVR you need the PROGMEM keyword to place a constant in flash (functions don't need that), while on ARM the const keyword is enough.
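For example (a sketch, not from the answer; the table contents are made up):

#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
/* On AVR, PROGMEM keeps the table in flash; it must then be read back
 * through the pgm_read_*() accessors, e.g. pgm_read_byte(&sine_table[i]). */
const uint8_t sine_table[4] PROGMEM = { 0, 90, 180, 255 };
#else
/* On ARM (and most other 32-bit MCUs) const is enough: the linker puts the
 * table in .rodata in flash and it is read directly over the bus. */
const uint8_t sine_table[4] = { 0, 90, 180, 255 };
#endif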
To save RAM space you can (in order of effectiveness):
place big tables and text constants (debug messages too) in flash
merge global buffers (with unions) and use them for different tasks at different times (see the sketch after this list)
reduce the stack size; this risks stack overflow, so you must also reduce function nesting
use bitmasks for global flags instead of whole bytes
If you increase the stack size, you increase RAM usage, since the stack lives in RAM.
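A small sketch of the buffer-merging and bitmask tricks mentioned above (all names and sizes here are invented for illustration):

#include <stdint.h>

/* Two workspaces that are never needed at the same time can share RAM
 * through a union instead of occupying two separate buffers. */
static union {
    uint8_t uart_rx[512];      /* used while receiving a command   */
    uint8_t flash_page[512];   /* used later, while writing a page */
} scratch;

/* Eight boolean flags packed into one byte instead of eight separate bytes. */
static uint8_t sys_flags;
#define FLAG_RX_READY  (1u << 0)
#define FLAG_LOW_BATT  (1u << 1)

static inline void set_flag(uint8_t mask)    { sys_flags |= mask; }
static inline void clear_flag(uint8_t mask)  { sys_flags &= (uint8_t)~mask; }
static inline int  flag_is_set(uint8_t mask) { return (sys_flags & mask) != 0; }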
Flash and RAM have different address ranges, so from the .ld file you can tell where the linker has placed each variable or function:
/* Memories definition in *.ld file */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
ROM (rx) : ORIGIN = 0x8000000, LENGTH = 1024K
}
/* Sections */
SECTIONS
{
/* The program code and other data into ROM memory */
.text :
{
...
} >ROM
}
There we have:
128 KB of RAM, address range [0x20000000, 0x2001FFFF]
1 MB of flash, address range [0x08000000, 0x080FFFFF]
and an example of how the .text section is placed in flash memory.
Then, after the project compiles successfully, you can open the file ./[Release|Debug]/output.map to see where each function and variable ended up:
.text.main 0x08000500 0xa4 src/main.o
0x08000500 main
...
.data 0x20000024 0x124 src/main.o
0x20000024 io_buffer
The function main is placed in flash memory; the global variable io_buffer is placed in RAM.

When is static data (.bss) allocated?

I have been looking into reducing the memory footprint of an application. Following on from a previous question (GDB - can I find large data elements in memory), I have found and removed most of the biggest culprits.
nm --size-sort was invaluable for finding the large items in the .bss section of the executables.
The memory footprint as viewed in pmap has dropped very substantially. But while continuing this work on another system (Ubuntu Pangolin, gcc 4.6.3), I have noticed that the memory footprint of running processes is perfectly reasonable, and certainly much smaller than the .bss size.
Running the code through the debugger, it looks like the biggest symbols from the .bss section are not really allocated until the data is accessed (i.e. I can set an array element of one of the big symbols, and the memory footprint grows by 16 MB).
The .bss section is just zero-initialised, so it is easy to imagine an implementation assigning virtual address space to it, but not actually assigning any real memory until it is used.
Is this a real difference in behaviour, or a difference in reporting between systems?
In Linux, zero-initialized pages are all mapped to the same "zeroed" physical page in memory. Using copy-on-write, a page is copied and re-mapped to a new physical page when you first write to it, which in turn causes the memory footprint of the application to grow. It sounds like this is what is happening, as you suspect. This holds for all Linux distros.
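The effect is easy to demonstrate on Linux with a large .bss array (a self-contained sketch, not from the question):

#include <stdio.h>
#include <string.h>

#define BIG (16 * 1024 * 1024)
static char big_array[BIG];            /* zero-initialized, so it lands in .bss */

static void print_rss(const char *label)
{
    /* VmRSS in /proc/self/status is the resident set size (Linux-specific). */
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    while (f && fgets(line, sizeof line, f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%s %s", label, line);
    if (f)
        fclose(f);
}

int main(void)
{
    print_rss("before touching .bss:");
    memset(big_array, 1, BIG);         /* writing faults in real pages (copy-on-write) */
    print_rss("after touching .bss: ");
    return 0;
}

The second VmRSS value should be roughly 16 MB larger than the first, even though the .bss size reported by nm or size was the same all along.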
