Split section into multiple memory regions - c

I'm developing an application on an ARM Cortex-M microcontroller which has two RAM banks of 64 kB each. The first bank is directly followed by the second bank in the memory map.
The memory banks are currently split into two regions in my linker script. The first region contains the sections .bss and .data. The second bank is used for .heap and .stack, which only take 1 kB each (I'm using a different stack in FreeRTOS, which also manages its own heap).
My problem is that .bss is too large for the first bank. Therefore I'd like to move some of its content to the second bank.
One way to accomplish this would be to create a new section, let's call it .secondbss, which is linked to the second bank. Single variables could then be added to this section using __attribute__((section(".secondbss"))).
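(For reference, that approach would look roughly like this on the linker-script side - a minimal sketch; the RAM1/RAM2 names and the 0x20010000 origin are placeholders, and the startup code would have to zero this region just as it zeroes .bss.)
MEMORY
{
  RAM1 (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
  RAM2 (xrw) : ORIGIN = 0x20010000, LENGTH = 64K
}

SECTIONS
{
  /* ... .data and .bss stay in >RAM1 as before ... */

  .secondbss (NOLOAD) :
  {
    . = ALIGN(4);
    *(.secondbss)     /* everything tagged __attribute__((section(".secondbss"))) */
    . = ALIGN(4);
  } >RAM2
}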
The reasons why I am not using this solution are:
I really want to maintain portability of my source code.
There might be a whole lot of variables that would require this attribute, and I don't want to choose the section for every single variable.
Is there a better solution for this? I have already thought of treating both memories as one region, but I don't know how to prevent the linker from placing data across the boundary between the two banks.
How can I solve my problem without using __attribute__ flags?
Thank you!

For example, you have two banks at 0x20000000 and 0x20010000, and you want to use Bank2 for the heap and the (main) stack. I assume that you have a large .bss because of configTOTAL_HEAP_SIZE in FreeRTOSConfig.h. Now look at the heap sources in FreeRTOS/Source/portable/MemMang/. There are five implementations of pvPortMalloc() that do the memory allocation.
Look at these lines in the heap_X.c that you use:
/* Allocate the memory for the heap. */
#if( configAPPLICATION_ALLOCATED_HEAP == 1 )
/* The application writer has already defined the array used for the RTOS
heap - probably so it can be placed in a special segment or address. */
extern uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
#else
static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
#endif /* configAPPLICATION_ALLOCATED_HEAP */
So you can set configAPPLICATION_ALLOCATED_HEAP to 1 and tell your linker to place ucHeap at 0x20010000.
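Roughly, that could look like this (the .freertos_heap section name and the RAM2 region are placeholders, not FreeRTOS names):
/* FreeRTOSConfig.h */
#define configAPPLICATION_ALLOCATED_HEAP    1

/* in one application source file: provide the array that the heap_x.c
   files declare extern, and tag it so the linker can place it */
uint8_t ucHeap[ configTOTAL_HEAP_SIZE ] __attribute__((section(".freertos_heap")));

/* linker script: map that section into the second bank */
.freertos_heap (NOLOAD) :
{
    *(.freertos_heap)
} >RAM2    /* RAM2 (xrw) : ORIGIN = 0x20010000, LENGTH = 64K */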
Another way is to write a header for each device that defines the heap and stack addresses, and to edit the sources.
For heap_1.c we could make the following changes:
// somewhere in devconfig.h
#define HEAP_ADDR 0x20010000

// in heap_1.c: remove the code related to ucHeap, then replace
//     static uint8_t *pucAlignedHeap = NULL;
// with:
static uint8_t *pucAlignedHeap = (uint8_t *)HEAP_ADDR;
For heap_2.c and heap_4.c, edit the function prvHeapInit() in the same way.
Pay attention to heap_5.c, which includes vPortDefineHeapRegions().
Now pvPortMalloc() will return pointers to memory in Bank2. pvPortMalloc() is used to allocate task stacks, TCBs and user variables. Read the sources. The location of the main stack depends on your device/architecture. For STM32 (ARM), see the vector table or how to change the MSP register.
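For the main stack on a typical GCC/STM32 setup, the usual knob is the _estack symbol in the linker script, which the startup code places in the first vector-table entry. A minimal sketch (the RAM2 region name is a placeholder):
/* linker script */
_estack = ORIGIN(RAM2) + LENGTH(RAM2);    /* top of Bank2, e.g. 0x20020000 */

/* startup file (for reference): the hardware loads the first vector-table
   entry into the MSP at reset, e.g.
       g_pfnVectors:
           .word  _estack
           .word  Reset_Handler
           ...
*/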

Related

Embedded linker scripts - proper placement of 'stack' and 'heap' regions?

Lately I've been studying the linker scripts used in auto-generated STM32 projects, and I'm a little bit confused about how the stack and heap memory segments are defined.
As an example, I've been looking at the files provided in ST's "CubeMX" firmware package for their F0 line of chips, which have ARM Cortex-M0 cores. I'd paste a whole script if the files' licenses allowed it, but you can download the whole package from ST for free if you're curious. Anyways, here are the parts relevant to my question:
/* Highest address of the user mode stack */
_estack = 0x20001000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
<...>
SECTIONS {
  <...>
  .bss :
  {
    <...>
  } >RAM

  /* User_heap_stack section, used to check that there is enough RAM left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM
  <...>
}
So here's my probably-incorrect understanding of the linker's behavior:
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
The '_Min_Heap_Size' and '_Min_Stack_Size' values seem like they are supposed to define the minimum amount of space that should be dedicated to the stack and heap for the program to use. Programs that allocate a lot of dynamic memory may need more 'Heap' space, and programs that call deeply-nested functions may need more 'Stack' space.
My question is, how is this supposed to work? Are the '_Min_x_Size' values special labels, or are those names maybe slightly confusing? Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
Also, the space defined for the Stack does not appear to necessarily define a contiguous segment between its start and the '_estack' value defined above. If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'.
The only explanation I can think of is that the 'Heap' and 'Stack' segments might have no actual meaning, and are only defined as a compile-time safeguard so that the linker throws an error when there is significantly less dynamic memory available than expected. If that's the case, should I think of it as more of a minimum 'Combined Heap/Stack' size?
Or honestly, should I just drop the 'Heap' segment if my application won't use malloc or its ilk? It seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
You ask where to place the stack and the heap. On a uC the answer is not as obvious as @a2f stated, for many reasons.
the stack
First of all, many ARM uCs have two stacks: one is called the Main Stack (MSP) and the second the Process Stack (PSP). Of course you do not need to enable this option.
Another issue is that Cortex uCs may have many SRAM blocks (for example the STM32F3, many F4, F7 and H7 parts). It is up to the developer to decide where to place the stack and the heap.
Where to place the stack?
I would suggest placing the MSP at the beginning of the chosen RAM. Why?
If the stack is placed at the end, you do not have any control over stack usage. When the stack overflows it may silently overwrite your variables, and the behaviour of the program becomes unpredictable. That is not an issue if it is a LED-blink thing, but imagine a large machine controller or a car brake computer.
When you place the stack at the beginning of the RAM (by "beginning" I mean RAM start address + stack size), a hardware exception is generated when the stack is about to overflow. You are in full control of the uC, you can see what caused the problem (for example a damaged sensor flooding the uC with data) and start an emergency routine (for example stop the machine, put the car into service mode, etc.). The stack overflow will not go undetected.
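A minimal linker-script sketch of that layout (names and sizes are placeholders): the stack area is carved out at the very start of RAM and _estack points just above it, so an overflowing stack walks off the bottom of RAM instead of trampling .data/.bss.
_Stack_Size = 0x400;

MEMORY
{
  RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 4K
}

SECTIONS
{
  /* reserve the stack at the very start of RAM */
  ._stack (NOLOAD) :
  {
    . = ALIGN(8);
    . = . + _Stack_Size;
    . = ALIGN(8);
    _estack = .;          /* initial MSP = ORIGIN(RAM) + _Stack_Size */
  } >RAM

  /* .data, .bss and the heap follow above the stack; an overflow now runs
     below ORIGIN(RAM), where (on most parts) nothing is mapped, and faults */
}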
the Heap.
Dynamic allocation has to be used with caution on uCs. The first problem is possible fragmentation of the available memory, as uCs have very limited resources. Use of dynamically allocated memory has to be considered very carefully, otherwise it can be a source of serious problems. Some time ago the USB HAL library was using dynamic allocation in an interrupt routine - a fraction of a second was sometimes enough to fragment the heap badly enough to disallow any further allocation.
Another problem is the wrong implementation of sbrk in most of the available toolchains. The only one I know with a correct one is the bleeding-edge toolchain maintained by our colleague from this forum, @Freddie Chopin.
The problem is that the implementations assume that the heap and the stack grow towards each other and may eventually meet - which is of course wrong. Another problem is the improper use and initialization of the static variables holding the addresses of the heap start and end.
The '_estack' value is set to the end of RAM - this script is for an 'STM32F031K6' chip which has 4KB of RAM starting at 0x20000000. It is used in ST's example vector tables to define the starting stack pointer, so it seems like this is supposed to mark one end of the 'Stack' memory block.
As the stack here would grow downwards (from high to low addresses), it's actually the start of the stack memory region.
Are the '_Min_x_Size' values special labels, or are those names maybe slightly confusing?
The only special thing about them is that symbols starting with an underscore followed by an uppercase letter are reserved for the implementation; a plain name such as min_stack_size could clash with user-defined symbols, so the reserved form is used.
Because it looks like the linker script just appends memory segments of those exact sizes to the RAM without consideration for the program's actual usage.
That's the minimum size. Both the stack and the heap break may grow.
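As a sketch of what "the heap break may grow" means in practice: the end/_end symbol PROVIDEd by the script is typically consumed by a newlib-style _sbrk() along these lines (the real one ships with the toolchain or the Cube-generated syscalls file; this is only an illustration):
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

extern char end;              /* from PROVIDE(end = .): the heap starts here */
extern char _estack;          /* top of RAM, also the initial stack pointer  */
extern char _Min_Stack_Size;  /* linker symbol: its *address* is the value   */

void *_sbrk(ptrdiff_t incr)
{
    static char *heap_break = &end;
    char *limit = &_estack - (uintptr_t)&_Min_Stack_Size;
    char *prev = heap_break;

    if (heap_break + incr > limit) {   /* would run into the reserved stack  */
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_break += incr;
    return prev;
}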
If there is no other RAM used, nm shows that the '_user_heap_stack' section ends at 0x20000600, which leaves a bunch of empty RAM before '_estack'
It leaves exactly 0x400 bytes, which is _Min_Stack_Size. Remember that the stack grows downwards here (and often elsewhere as well).
seems like good practice to avoid dynamic memory allocation in embedded systems when you can, anyways.
Not everything is safety-critical. You're free to not use the heap if you don't want/need/are allowed to. (Ok, not that free in the latter)

XC8 : Cannot define auto array

I defined an auto char array as follows:
char buffer[100];
When I compile it, the compiler returns the following error:
error: (1250) could not find space (100 bytes) for variable _buffer
But when I change it to:
static char buffer[100];
the program compiles successfully.
Note 1: My target device is a PIC16F1829.
Note 2: The compiler version is 1.30.
All PIC16s have RAM banks with 80 bytes of usable RAM per bank. This is described in section 3.2.4 Common RAM in the datasheet.
The problem you are seeing is not related to the size of the stack, but rather to the size of each item that can be allocated on the stack.
On XC8, auto variables cannot individually be larger than one bank of RAM, which means the largest possible auto variable is 80 bytes.
This is described in detail in the XC8 Compilers Users Guide in section 5.5.2.2.3 as follows :
Unlike with non-auto variables, it is not efficient to access auto variables within the compiled stack using the linear memory of enhanced mid-range devices. For all devices, including PIC18 and Enhanced mid-range PIC MCUs, each component of the compiled stack must fit entirely within one bank of data memory on the target device (however, you can have more than one component, each allocated to a different bank). This limits the size of objects within the stack to the maximum free space of the bank in which it is allocated. The more auto variables in the stack; the more restrictive the space is to large objects. Recall that SFRs on mid-range devices are usually present in each data bank, so the maximum amount of GPR available in each bank is typically less than the bank size for these devices.
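In practice the workarounds are the one you already found (make the object non-auto, so it is accessed through linear memory) or keeping every individual auto object within one bank; a minimal sketch:
/* sketch of the two usual workarounds on XC8 / enhanced mid-range parts */

/* 1) take the object off the compiled stack: non-auto objects are accessed
      through linear memory and may be larger than one bank */
static char buffer[100];

/* 2) or keep each individual auto object small enough for one bank
      (at most ~80 bytes of GPR on this part, minus whatever else the
      compiled stack already holds in that bank) */
void process(void)
{
    char head[60];
    char tail[40];
    /* ... */
}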
Yes, and it's not really a stack, because XC8 functions are non-reentrant.
That means that rather than using an actual stack to hold auto variables and parameters, the compiler uses pre-allocated space in RAM (the "compiled stack").
Look at the call tree maps. You may have duplicated allocations: if a function can be called from both an interrupt and normal runtime, it will appear in different call trees and will therefore require separate allocations of memory (i.e. it will take up DOUBLE the space; potentially TRIPLE if you have three call trees).

Stack and heap confusion for embedded 8051

I am trying to understand a few basic concepts regarding the memory layout of the 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for an 8051 MCU we have several types of memory:
IRAM (idata) - used for general purpose registers and SFRs
PMEG (code) - used to store code (FLASH)
XDATA:
on-chip (data) - cache memory for data (RAM)
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata) but it's quite small (30-7Fh)- 79 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origins in the late 1970s/early 1980s. As such, it has very limited resources. The original version did (for instance) not even have XRAM; that was "patched" on later and requires special (and slow) accesses.
The IRAM is the "main memory". It really includes the stack (yes, there are only a few bytes). The rest is used for global variables ("data" and "bss" sections: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same purpose.
Note that with these small MCUs you do not use many local variables (and if you do, only 8-bit types). A clever compiler/linker (I actually used some of these) can allocate local variables statically, overlapping them - unless recursion is used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merge blocks.
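A minimal sketch of such a fixed-block pool in generic C (sizes and names are arbitrary); because blocks are never split or merged, there is no fragmentation:
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE   32u
#define BLOCK_COUNT  8u

static uint8_t pool[BLOCK_COUNT][BLOCK_SIZE];
static uint8_t in_use[BLOCK_COUNT];

void *pool_alloc(void)
{
    uint8_t i;
    for (i = 0; i < BLOCK_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL;                /* pool exhausted */
}

void pool_free(void *p)
{
    uint8_t i;
    for (i = 0; i < BLOCK_COUNT; i++) {
        if (p == pool[i]) {
            in_use[i] = 0;
            return;
        }
    }
}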
Note that the IRAM includes some special locations which can be bit-addressed by the hardware. Normally you would use a specialized compiler which can exploit these features. Very likely some features require special assembler functions (these might be provided in a header as C functions just generating the corresponding machine-code instruction), called intrinsics.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended Harvard architecture (code and data separated, with XRAM as a third area).
Regarding your 2nd link: this is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more general CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. using an external von Neumann architecture (internally most, if not all, 32+ bit CPUs use a Harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set within a very early startup routine. This will set a register to tell it where the stack should start. Typically, you want this in memory that is very fast to access because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses in calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set wherever your startup routine sets it to be.
There is no particular area that a heap needs to live in. If you want it to live in external memory, it can be set there. If you want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From there, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.
Concerning the heap:
There is also a malloc/free pair.
You have to call init_mempool() first; this is indicated in the compiler documentation but it is somewhat uncommon.
The pseudo-code below illustrates this.
However, I have only used it this way and did not try heavy use of malloc/free such as you may find in dynamic linked-list management, so I have no idea of the performance you get out of this.
//A "large" place in xdata to be used as heap
static char xdata heap_mem_pool [1000];

//A pointer located in data and pointing to something in xdata.
//The size of the pointer is then 2 bytes instead of 3 (the 3rd byte
//stores the area specifier: data, idata or xdata).
//The specifier is not mandatory but welcome.
char xdata * data shared_memory;

//...
u16 mem_size_needed;
init_mempool (heap_mem_pool, sizeof(heap_mem_pool));
//...
mem_size_needed = calculate_needed_memory();
shared_memory = malloc(mem_size_needed);
if ( 0 == shared_memory ) return -1;

//...use shared_memory pointer

//free if not needed anymore
free(shared_memory);
Some additional consequences of the fact that, in general, no function is reentrant (or only with some effort) on this stackless microcontroller.
I will call "my system" the system I am working on at the present time: a C8051F040 (SiLabs) with the Keil C51 compiler (I have no specific interest in these two companies).
The (function return address) stack is located low in the IRAM (idata on my system).
If it starts at 30 (dec), it means you have either global or local variables in your code that you requested to be in data RAM (either because you chose a "small" memory model or because you used the keyword data in the variable declaration).
Whenever you call a function, the 2-byte return address of the caller is saved on this stack (16-bit code space) and that's all: no register saving, no arguments pushed onto the (non-existent) (data) stack. Your compiler may also limit the function call depth.
Necessary arguments and local variables (and certainly saved registers) are placed somewhere in the RAM (data RAM or XRAM).
So now imagine that you want to use the same innocent function (like memcpy()) both in your interrupt and in your normal infinite loop; it will cause sporadic bugs. Why?
Due to the lack of a stack, the compiler must share RAM locations for arguments, local variables and so on between several functions THAT DO NOT BELONG to the same call tree branch.
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you were executing, e.g., memcpy() in your "normal task", you may corrupt that execution, because when coming back from the interrupt the pointers dedicated to the copy performed in the normal task will hold the (end) values of the copy performed in the interrupt.
On my system I get an L15 linker error when the compiler detects that a function is called from more than one independent "branch".
You can make a function reentrant by adding the reentrant keyword, which requires the creation of an emulated stack, for example at the top of XRAM. I did not test it on my system because I am already short of XRAM, which is only 4 kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
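For illustration, a reentrant function in Keil C51 is declared with the reentrant keyword after the parameter list (a sketch, not taken from any real project; it costs speed and memory for the simulated stack):
/* Keil C51: the reentrant keyword goes after the parameter list */
void copy_bytes(unsigned char *dst, unsigned char *src,
                unsigned char len) reentrant
{
    while (len--) {
        *dst++ = *src++;
    }
}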
In the standard 8051 uC, the stack occupies the same address space as register bank 1 (08H to 0FH) by default at start-up. That means the stack pointer (SP register) will hold a value of 07H at start-up (incremented to 08H when the stack is PUSHed). This probably limits the stack memory to 8 bytes if register bank 2 (starting from 10H) is occupied. If register banks 2 and 3 are not used, even that space can be taken up by the stack (08H to 1FH).
If in a given program we need more than 24 bytes (08H to 1FH = 24 bytes) of stack, we can change SP to point to RAM locations 30H - 7FH. This is done with the instruction "MOV SP, #xx". This should clarify doubts surrounding 8051 stack usage.

Finding the last variable in __attribute__(section)

I'm currently working on an embedded system, and in order to meet timing constraints I've needed to lock some code in cache. I've placed all the functions that need to be locked in cache into the section MEMORY_CACHEABLE by using the section attribute.
Because the board I'm using sets memory attributes for 1-megabyte chunks, I've made MEMORY_CACHEABLE 1 MB in size.
When it comes to actually locking the code in cache, I need to determine the high address of the code inside MEMORY_CACHEABLE, since it does not occupy the entire memory space and I don't want to lock unused memory in cache.
The way I've been doing this is by using a placeholder in MEMORY_CACHEABLE that is defined in my C code after all of the other functions placed in MEMORY_CACHEABLE. Every time I've debugged, I've confirmed that the placeholder has a higher address than the other functions. I've been using its address as the high address, but it seems a little hacky.
I know there's no standard way of determining the size of a C function at runtime, but is there a more straightforward way to discover the high address of the code in this specific memory section?
Also, I'm cross-compiling using arm-xilinx-eabi-gcc.
Thanks!
You can use a linker script for that. Maybe you are already using one to specify the memory section attributes.
So, just add:
MEMORY_CACHEABLE :
{
BEGIN_MEMORY_CACHEABLE = .;
*(MEMORY_CACHEABLE)
END_MEMORY_CACHEABLE = .;
}
Then in the C code:
extern char BEGIN_MEMORY_CACHEABLE, END_MEMORY_CACHEABLE;
And use &BEGIN_MEMORY_CACHEABLE as a pointer to the beginning and &END_MEMORY_CACHEABLE a pointer to one-past-end of your cacheable memory.
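A usage sketch (lock_code_region() is a placeholder for whichever Xilinx/ARM cache-lockdown routine your project actually uses):
#include <stddef.h>
#include <stdint.h>

extern char BEGIN_MEMORY_CACHEABLE;
extern char END_MEMORY_CACHEABLE;

extern void lock_code_region(uintptr_t start, size_t len);  /* hypothetical */

void lock_cacheable_code(void)
{
    uintptr_t start = (uintptr_t)&BEGIN_MEMORY_CACHEABLE;
    size_t len = (size_t)(&END_MEMORY_CACHEABLE - &BEGIN_MEMORY_CACHEABLE);

    lock_code_region(start, len);   /* lock only the code actually present */
}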

What might be the point in putting a variable exactly in the "STACK" section with __attribute__((section("STACK")))?

In the gcc documentation, one reason is given for using section: mapping to special hardware. But that does not seem to be my case.
I've been given the task of modifying a shared library that we use in our project. It is a Linux library. There are variable declarations in the library that puzzle me. They look (roughly) like this:
static int my_var_1 __attribute__((section("STACK"))) = 0;
Update 1:
There are a dozen variables defined in this way (__attribute__((section("STACK"))))
Update 2:
my_var_1 is not a constant. my_var_1 might be changed in code during initialization:
my_var_1 = atoi(getenv("MY_VAR_1") ? getenv("MY_VAR_1") : "0");
later in the library it is used like this:
inline void do_something() __attribute__((always_inline));
inline void do_something()
{
if (my_var_1)
do_something_else();
}
What might be the point of using __attribute__((section("STACK")))? I understand that section tells the compiler to put a variable in a particular section. However, what might be the point of putting a static int in the "STACK" section specifically?
Update 3
These lines are an excerpt from the output of readelf -t my_lib.so:
[23] .got.plt
PROGBITS 00000000002103f0 00000000000103f0 0
00000000000003a8 0000000000000008 0 8
[0000000000000003]: WRITE, ALLOC
[24] .data
PROGBITS 00000000002107a0 00000000000107a0 0
00000000000000b0 0000000000000000 0 16
[0000000000000003]: WRITE, ALLOC
[25] STACK
PROGBITS 0000000000210860 0000000000010860 0
00000000000860e0 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
[26] .bss
NOBITS 0000000000296940 0000000000096940 0
0000000000000580 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
Update 4
I managed to get information from the author of the shared library.
__attribute__((section("STACK"))) was added because he had not managed to build the library on Solaris, and then he found this workaround. Before the workaround the definition of my_var_1 was:
int my_var_1 = 0;
and everything was OK. Then he changed it, since my_var_1 was in fact needed only in this translation unit:
static int my_var_1 = 0;
After that change he could no longer build the library on Solaris, so he added __attribute__((section("STACK"))) and it somehow helped.
First, the STACK section won't be the stack of any running task.
Putting variables and functions in a specific section allows you to select a memory area for them (via the linker script). On some (mostly embedded) architectures, you want to put often-accessed data in faster memory.
Another possibility: some development post-link script sets the whole STACK section to 1, so a development build always does do_something_else(), while the released software keeps the default value of 0.
Yet another possibility: if there are other variables in the STACK section, the developer may want to keep them close together in memory. All variables in the STACK section will be near each other - maybe a cache optimization?
There may be many reasons and it is difficult to tell without details. Some of the reasons might be:
The section marked STACK is linked at run time to a closely coupled memory with a faster access time than the other RAMs. It makes sense to map the stack to such a RAM to avoid stalls during function calls. Now, if you suddenly had a variable that is accessed a lot and you wanted to map it to the same fast-access RAM, putting it in the same section as the stack makes sense.
The section marked STACK might be mapped to a region of memory that is accessible when other parts of memory are not. For example, boot loaders need to init the memory controller before they can access RAM, but you really want to be able to write the code that does that in C, which requires a stack. So you find some special memory (such as the data cache programmed to write-back mode), map the stack there, and run the code that gets the memory controller working so you can use RAM. Once again, if you happen to have a global variable that also needs to be accessed before RAM is available, you might decide to put it in the STACK section.
A better programmer would have renamed the STACK section to something else if it is not used only for the stack.
In some operating systems, the same region of addressing space is used for every thread's stack; when execution switches between threads, the mapping of that space is changed accordingly. On such systems, every thread will have its own independent set of any static variables located within that region of address space. Putting variables that need to be maintained separately for each thread in such an address range will avoid the need to manually swap them with each task switch.
Another occasional use for forcing variables into a stack area is to add stack sentinels (variables that can be periodically checked to see whether a stack overflow has clobbered them).
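A minimal sketch of such a sentinel (names and the pattern are arbitrary; the linker script has to place the STACK section so the sentinel sits at the end the stack grows toward):
#define STACK_SENTINEL_PATTERN  0xDEADBEEFu

static volatile unsigned int stack_sentinel
        __attribute__((section("STACK"))) = STACK_SENTINEL_PATTERN;

/* call periodically, e.g. from a timer tick or the idle loop */
int stack_overflow_detected(void)
{
    return stack_sentinel != STACK_SENTINEL_PATTERN;   /* clobbered => overflow */
}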
A third use occurs on 8086 and 80286 platforms (probably not so much on later chips in the family): the 8086 and 80286 are limited to efficiently accessing things in four segments without having to reload segment registers. If code needs to do something equivalent to
for (n=0; n<256; n++)
*dest++ = xlat[*src++];
and none of the items can be put in the code segment, being able to force one of the items into the stack segment can make the code much faster. Hand-written assembly code would be required to achieve the speedup, but it can be very large (nearly a factor of two in some real-world situations I've handled on the 8086, and perhaps even greater in some situations on the 80286).
