I'm working on a project that involves getting the CMSIS-RTOS packaging of FreeRTOS working on an STM32F051C6. I'm writing and debugging the code with VisualGDB inside Visual Studio and generating the project code with ST's STM32CubeMX tool. The RTOS is running incredibly well and I'm all smiles. However, when I added a queue and a memory pool to pass messages between tasks, the linker complained that the generated .bss section would not fit in the memory region defined in the linker script. I resolved this by decreasing the heap size in the FreeRTOS configuration header.
I'm a little unhappy about where this may take me when I make the project more complex (more tasks, queues, etc.), since I may have to decrease the heap even further to make the .bss section fit.
So my question is: would a solution be to extend the .bss section into the .data section (the section above it) to allow for more heap and uninitialized data in .bss? After some looking around and experimenting, I found that only about 1% (if not less) of the .data section is actually being used, according to VisualGDB's Memory Explorer window at build time, and it seems crazy to leave all that RAM unused.
In an attempt to do this myself, I looked thoroughly through both the linker script and the startup code, and I could not find where the start and end of .bss are defined. Is it possible to define these boundaries, and if so, how? If not, how does the linker know where these boundaries are on the target chip?
Below are what I think are the relevant sections in the linker script:
.data :
{
. = ALIGN(4);
_sidata = .;
_sdata = _sidata;
PROVIDE(__data_start__ = _sdata);
*(.data)
*(.data*)
. = ALIGN(4);
_edata = .;
PROVIDE(__data_end__ = _edata);
} > SRAM
.bss :
{
. = ALIGN(4);
_sbss = .;
PROVIDE(__bss_start__ = _sbss);
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .;
PROVIDE(__bss_end__ = _ebss);
} > SRAM
PROVIDE(end = .);
...and the startup file:
extern void *_sidata, *_sdata, *_edata;
extern void *_sbss, *_ebss;
extern void SystemInit(void);
extern void __libc_init_array(void);
extern int main(void);
void __attribute__((naked, noreturn)) Reset_Handler()
{
//Normally the CPU sets up the stack pointer from the value in the first entry of the vector table.
//If you encounter problems with accessing stack variables during initialization, ensure the line below is uncommented:
//asm ("ldr sp, =_estack");
void **pSource, **pDest;
for (pSource = &_sidata, pDest = &_sdata; pDest != &_edata; pSource++, pDest++)
*pDest = *pSource;
for (pDest = &_sbss; pDest != &_ebss; pDest++)
*pDest = 0;
SystemInit();
__libc_init_array();
main();
for (;;) ;
}
I don't think you can set the size of the .bss or .data section explicitly; it is determined by your program. You can see how much memory each variable consumes in the map file.
Example:
.bss.xHeap 0x0000000020000690 0xa000 obj/heap_2.o
In my setup FreeRTOS is configured with
#define configTOTAL_HEAP_SIZE ( ( size_t ) ( 40 * 1024 ) )
and I see all 40 KB of it (0xa000 bytes) in my map file.
The linker knows how much memory your MCU has (from the MEMORY regions in the linker script) and tries to place all memory objects into it. If they don't fit, you see an error.
You can't "extend" .bss into the .data section, because the former holds uninitialized (or zero-initialized) statically allocated variables and the latter holds initialized statically allocated variables. If you can get rid of some statically allocated variables (e.g. globals), your .bss and/or .data sections will shrink. 8 KB of RAM is not very much.
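As the answer says, the sizes come from the program rather than from the script: the linker places the input sections one after another and defines _sdata/_edata and _sbss/_ebss wherever the location counter happens to be, and Reset_Handler simply walks between those symbols. The sketch below replays that walk on a host, with two arrays standing in for the flash copy of .data and for SRAM. The array names and sizes are hypothetical, and uint32_t is used instead of the original void * for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins: on the MCU these would be flash and
 * SRAM, and the boundaries would be _sidata, _sdata, _edata, _sbss, _ebss. */
enum { DATA_WORDS = 4, BSS_WORDS = 6 };

static const uint32_t fake_flash_data[DATA_WORDS] = {42u, 7u, 0xDEADBEEFu, 1u};
static uint32_t fake_sram[DATA_WORDS + BSS_WORDS];

/* Same shape as the Reset_Handler loops: copy .data from its load address,
 * then zero .bss. No size is stored anywhere; only boundary pointers. */
static void crt_init(const uint32_t *sidata, uint32_t *sdata, uint32_t *edata,
                     uint32_t *sbss, uint32_t *ebss)
{
    assert(sdata <= edata && sbss <= ebss);
    while (sdata != edata)
        *sdata++ = *sidata++;
    while (sbss != ebss)
        *sbss++ = 0;
}

void simulate_reset(void)
{
    /* Pretend RAM holds garbage after power-up. */
    for (size_t i = 0; i < DATA_WORDS + BSS_WORDS; i++)
        fake_sram[i] = 0xA5A5A5A5u;

    crt_init(fake_flash_data,
             &fake_sram[0], &fake_sram[DATA_WORDS],                       /* .data */
             &fake_sram[DATA_WORDS], &fake_sram[DATA_WORDS + BSS_WORDS]); /* .bss  */
}
```

Adding a global with an initializer moves the _edata boundary; adding one without moves _ebss. Either way the linker only fails when the total no longer fits in the region.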
Maybe you should look for the ._user_heap_stack section (or a similarly named one) in your linker script and optimize the heap (not to be confused with the FreeRTOS heap) and stack size values. For example, if you only use FreeRTOS memory allocation, you can set the heap size to 0, which leaves more space for the .bss section.
From my project:
_Min_Heap_Size = 0; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
._user_heap_stack :
{
. = ALIGN(4);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(4);
} >RAM
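The "will not fit" error boils down to this arithmetic: everything placed in the RAM region (.data, .bss including the FreeRTOS heap array, and the space reserved by _Min_Heap_Size and _Min_Stack_Size) must not exceed the region's LENGTH. A hosted sketch of that check follows; ram_fits() and the numbers in the test are made up for illustration, not taken from the question's map file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to a multiple of 4, like ". = ALIGN(4);" in the script. */
static uint32_t align4(uint32_t n) { return (n + 3u) & ~3u; }

/* Hypothetical helper mirroring the linker's check for one RAM region.
 * Returns true if everything fits and reports the bytes left over. */
static bool ram_fits(uint32_t ram_length, uint32_t data_size, uint32_t bss_size,
                     uint32_t min_heap, uint32_t min_stack, uint32_t *free_bytes)
{
    assert(free_bytes != NULL);
    uint32_t used = align4(data_size) + align4(bss_size)
                  + align4(min_heap + min_stack);   /* ._user_heap_stack */
    *free_bytes = (used <= ram_length) ? ram_length - used : 0u;
    return used <= ram_length;
}
```

Setting _Min_Heap_Size to 0 frees exactly that many bytes for .bss, which is why the answer suggests it when only the FreeRTOS allocator is used.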
I'm currently adding a ".ccmbss" section to my linker script, in which I'm trying to place some uninitialized variables/structs. It is separate from the normal ".bss" section; the key difference is that .ccmbss is located in the core-coupled memory (CCM) RAM, while .bss is located in normal RAM.
When I place a large uninitialized struct (~32 kB) in the .ccmbss section using the C __attribute__((section(".ccmbss"))) designator, I notice that an equally sized region of flash is allocated for it as well. However, when I don't explicitly move it into .ccmbss, it gets allocated to the .bss section (as expected) and doesn't take up any flash space.
Any ideas as to how to guarantee that the ccmbss section does not get an associated flash region of memory?
Here is the key part of the linker file that I am working on:
.ccm : {
. = ALIGN(4);
_siccmram = .;
*(.ccm)
_eiccmram = .;
. = ALIGN(4);
} >ccm AT >app
_ccm_loadaddr = LOADADDR(.ccm);
.ccmbss : {
. = ALIGN(4);
*(.ccmbss)
*(.ccmbss*)
. = ALIGN(4);
_eccmbss = .;
_eccmram = .;
} >ccm
.data : {
_data = .;
*(.data*)
. = ALIGN(4);
_edata = .;
} >ram AT >app
_data_loadaddr = LOADADDR(.data);
.bss : {
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .;
} >ram
So it looks like Eugene and Ian were on the right track, but unfortunately it didn't work for me: the ARM toolchain (arm-none-eabi-gcc) threw an error at the first comma in __attribute__((section(".ccmbss,\"aw\",#nobits#"))): Error: junk at end of line, first unrecognized character is `,'
After spending way too long trying different things to fix this (pragmas, the zero_init specifier for __attribute__((section()))), I found a solution here. The resolution was to rename .ccmbss to .bss.ccm, so that the section defaults to the %nobits section type, which prevents it from taking up ROM space.
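A minimal sketch of the renamed-section approach, assuming GCC on an ELF target: GCC gives any section whose name starts with ".bss." the nobits type, so the array occupies no space in the image. Two things the post doesn't show but the linker script needs: the .ccmbss output section has to collect the new name (for example with *(.bss.ccm*)), and it must stay before .bss, because ld assigns an input section to the first matching pattern and .bss's *(.bss*) would otherwise claim it. Nothing in the script shown zeroes CCM (only _eccmbss is defined), so the sketch clears the buffer explicitly. The CCM_BSS macro, ccm_init() and ccm_get() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ELF__)
/* Name starts with ".bss." so GCC emits it as a nobits section. */
#define CCM_BSS __attribute__((section(".bss.ccm")))
#else
#define CCM_BSS /* non-ELF host: plain static storage */
#endif

#define CCM_BUF_SIZE (32u * 1024u) /* ~32 kB, as in the question */

static uint8_t ccm_buffer[CCM_BUF_SIZE] CCM_BSS;

/* Hypothetical startup hook: the startup code does not clear .ccmbss,
 * so zero it here before first use. */
void ccm_init(void)
{
    memset(ccm_buffer, 0, sizeof ccm_buffer);
}

/* Bounds-checked access to the CCM buffer. */
uint8_t *ccm_get(size_t offset)
{
    assert(offset < CCM_BUF_SIZE);
    return &ccm_buffer[offset];
}
```

Running objdump -h on the resulting ELF should show the section with ALLOC but without CONTENTS/LOAD, which is what keeps it out of flash.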
Thank you for your help!!
I have noticed that in TI's startup_gcc.c examples for the CC3200 (ARM Cortex-M4, i.e. ARMv7E-M), the data section within the application image is copied to a different location. The application image itself is copied from flash to SRAM by the CC3200's internal bootloader and is run from SRAM.
So in my opinion this is a total waste of memory, as the startup code copies the data section to another place in SRAM. Am I missing something? Would removing the copy loop from ResetISR and altering the linker file work fine, so that the program just uses the data within the application image in SRAM?
ResetISR:
uint32_t *pui32Src, *pui32Dest;
pui32Src = &__init_data;
for(pui32Dest = &_data; pui32Dest < &_edata; )
{
*pui32Dest++ = *pui32Src++;
}
Linker:
.text :
{
_text = .;
KEEP(*(.intvecs))
*(.bss.gpCtlTbl)
*(.text*)
*(.ARM.extab* .gnu.linkonce.armextab.*)
. = ALIGN(8);
_etext = .;
} > SRAM
.rodata :
{
*(.rodata*)
} > SRAM
.ARM : {
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} > SRAM
__init_data = .;
.data : AT(__init_data)
{
_data = .;
*(.data*)
. = ALIGN (8);
_edata = .;
} > SRAM
Edited linker script (without the copy loop):
.data :
{
_data = .;
*(.data*)
. = ALIGN (8);
_edata = .;
} > SRAM
This kind of thing is normal when the image is stored in ROM. I would expect __init_data to point to an address in ROM, in which case the loop copies the data from there to RAM.
In your case it appears that everything is already in SRAM, so there is no need to do a copy of the initialized data.
The only question is: how does the internal bootloader know how big the image is and how much to copy? As long as its image size includes the data section, you should be fine to remove the copy loop and the AT(__init_data) clause.
It should be easy to test: define static int x = 42; and then check if (x == 42) { led(on); } or similar.
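A middle ground is to keep the loop but make it a no-op when nothing needs to move. This is a swapped-in alternative, not what TI's startup code does, and it assumes the CC3200 layout where everything already sits in SRAM: once the AT(__init_data) clause is removed, the load address of .data should match its run address, so a src == dst check skips the copy. The linker symbols are replaced by parameters so the sketch runs on a host; init_data() and data_section_ok() are hypothetical names, and data_section_ok() stands in for the answer's led(on) check.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialized data from its load address (src) to its run address
 * range [dst, end). If the image already runs where it was loaded,
 * src == dst and there is nothing to do. */
static void init_data(const uint32_t *src, uint32_t *dst, const uint32_t *end)
{
    assert(dst <= end);
    if (src == dst)
        return;
    while (dst < end)
        *dst++ = *src++;
}

static int x = 42; /* the answer's test variable */

/* Stand-in for "if (x == 42) { led(on); }": 1 if .data was initialized. */
int data_section_ok(void)
{
    return x == 42;
}
```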
I am not aware of the capabilities of the particular processor you are using, but on x86, for example, doing this allows the image to be loaded read-only. The data is then copied to pages that can be written to (on x86 in particular, copy-on-write is generally used for these pages, so that multiple processes can initialize .data from the same memory and pages that are never modified are never copied).
To avoid this step, the image would need its sections padded to page alignment, but people generally prefer the image to be as small as possible while still containing all the needed information.
I am looking at Trezor's bootloader linker script:
/* TREZORv2 bootloader linker script */
ENTRY(reset_handler)
MEMORY {
FLASH (rx) : ORIGIN = 0x08020000, LENGTH = 128K
CCMRAM (wal) : ORIGIN = 0x10000000, LENGTH = 64K
SRAM (wal) : ORIGIN = 0x20000000, LENGTH = 192K
}
main_stack_base = ORIGIN(CCMRAM) + LENGTH(CCMRAM); /* 8-byte aligned full descending stack */
/* used by the startup code to populate variables used by the C code */
data_lma = LOADADDR(.data);
data_vma = ADDR(.data);
data_size = SIZEOF(.data);
/* used by the startup code to wipe memory */
ccmram_start = ORIGIN(CCMRAM);
ccmram_end = ORIGIN(CCMRAM) + LENGTH(CCMRAM);
/* used by the startup code to wipe memory */
sram_start = ORIGIN(SRAM);
sram_end = ORIGIN(SRAM) + LENGTH(SRAM);
_codelen = SIZEOF(.flash) + SIZEOF(.data);
SECTIONS {
.header : ALIGN(4) {
KEEP(*(.header));
} >FLASH AT>FLASH
.flash : ALIGN(512) {
KEEP(*(.vector_table));
. = ALIGN(4);
*(.text*);
. = ALIGN(4);
*(.rodata*);
. = ALIGN(512);
} >FLASH AT>FLASH
.data : ALIGN(4) {
*(.data*);
. = ALIGN(512);
} >CCMRAM AT>FLASH
.bss : ALIGN(4) {
*(.bss*);
. = ALIGN(4);
} >CCMRAM
.stack : ALIGN(8) {
. = 4K; /* this acts as a build time assertion that at least this much memory is available for stack use */
} >CCMRAM
}
It can be found here.
I understand that the code needs to be 32-bit (ALIGN(4)) aligned, because the ARM processor can crash if it tries to access an unaligned address, but I do not understand why the stack alignment is 8 bytes, and furthermore why the hell do you need to waste(?) 512 bytes for alignment of the flash section?!
I would like to understand how the alignment is decided when writing a linker script.
Thank you in advance for your answers!
EDIT:
I think I answered my own question:
1. .flash section:
It is aligned like that because the vector table inside it always needs to be "32-word aligned". This can also be seen in Trezor's boardloader linker script. As you can see, the vector table is 512-byte (4 x 32-word) aligned.
2. .stack section:
According to ARM's own documentation, the stack always needs to be 8-byte aligned (the AAPCS requires 8-byte stack alignment at public interfaces).
P.S. Of course if this is not the case, please correct me.
Okay, since cooperised confirmed my theory, I can now close this question.
Thank you cooperised for the confirmation!
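The numbers can be checked with a little arithmetic. ALIGN(n) rounds the location counter up to the next multiple of n, and on ARMv7-M the vector table must be aligned to a power of two at least as large as the table itself, with a 128-byte (32-word) minimum. That table-size rule, rather than a fixed 4 x 32 words, is the likely source of the 512: an STM32F4-class table has roughly a hundred entries (the exact count depends on the part; 98 is used below as an assumption), which is just under 512 bytes. align_up() and vector_table_alignment() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* What ". = ALIGN(n);" does to the location counter: round addr up to
 * the next multiple of n (n must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t n)
{
    assert(n != 0u && (n & (n - 1u)) == 0u);
    return (addr + n - 1u) & ~(n - 1u);
}

/* Minimum vector table alignment on ARMv7-M: the smallest power of two
 * that is >= the table size in bytes, but never below 128 bytes. */
static uint32_t vector_table_alignment(uint32_t num_vectors)
{
    uint32_t table_bytes = num_vectors * 4u;
    uint32_t align = 128u;
    while (align < table_bytes)
        align <<= 1;
    return align;
}
```

The same align_up() covers the stack case: ALIGN(8) on the .stack section keeps the initial stack pointer 8-byte aligned as the AAPCS requires.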
I'm writing a bare-metal ARM boot loader and am trying to use some internal SRAM as a scratch pad to communicate with the application code. For my needs I don't need to initialise or zero this memory. Using this script I can place my desired variables in that memory just fine.
/**
* Linker script for secondary bootloader.
*
* Allocates the first 1MB of DRAM for its use.
* Scratchpad in internal SRAM.
*/
MEMORY
{
SRAM : o = 0x402F0400, l = 0x0000FC00 /* 63kB available internal SRAM */
DDR0 : o = 0x80000000, l = 1M /* 1MB external DDR Bank 0 */
}
OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
SECTIONS
{
.startcode :
{
__AppBase = .;
. = ALIGN(4);
*init.o (.text)
} >DDR0
.text :
{
. = ALIGN(4);
*(.text*)
*(.rodata*)
} >DDR0
.data :
{
. = ALIGN(4);
*(.data*)
} >DDR0
.bss :
{
. = ALIGN(4);
_bss_start = .;
*(.bss*)
*(COMMON)
_bss_end = .;
} >DDR0
.stack :
{
. = ALIGN(4);
__StackLimit = . ;
*(.stack*)
. = __AppBase + 1M;
__StackTop = .;
} >DDR0
_stack = __StackTop;
.internal_ram :
{
. = ALIGN(4);
*(.internal_ram*)
} >SRAM
}
When using objcopy to create the raw binary, I get huge files. I assume this is because the raw binary starts with the internal SRAM contents (at 0x402F0400) and is then padded all the way up to the code at the start of DDR0 (0x80000000), which is roughly 1 GB of padding. objdump -h shows the .internal_ram section marked with the CONTENTS, LOAD and DATA flags even though the variables placed there are not initialised.
I can clean this up in objcopy using --remove-section=.internal_ram, but it seems there should be a way to get the linker to recognise that the data is not initialised.
Is there a way to mark the section appropriately?
The correct section declaration is:
.internal_ram (NOLOAD) :
{
. = ALIGN(4);
*(.internal_ram*)
} >SRAM
The NOLOAD section attribute is documented, but the documentation speaks in terms of program loaders handling the section at load time. At first this doesn't seem to apply to bare-metal images, but for that purpose objcopy acts like a program loader: it honours the flag settings in the object file and omits the section from the raw image.
The other answer mentions this as well: the key is to make the section NOLOAD so that the data remains uninitialized.
The `(NOLOAD)' directive will mark a section to not be loaded at run time. The linker will process the section normally, but will mark it so that a program loader will not load it into memory.
A quote from Ashley Duncan that you might find useful:
NOLOAD is useful in embedded projects for making sure a block of RAM is not initialised or zeroed. For example if you want the contents of that RAM to not lose its values during a software reset (e.g. if you want to set a variable with the reason you are resetting). Another useful application is to pass information from a boot loader to application without the application startup code overwriting the values of that memory area. Of course in this case both the boot loader and application linker files need to declare the exact same memory area location and size.
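A minimal C sketch of the boot-loader-to-application hand-off described in that quote, assuming both linker scripts map a (NOLOAD) .internal_ram output section to the same SRAM address. Because the region is never initialised, its contents after power-up are arbitrary, so the sketch guards the payload with a magic value. The handoff_t layout, HANDOFF_MAGIC and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ELF__)
#define INTERNAL_RAM __attribute__((section(".internal_ram")))
#else
#define INTERNAL_RAM /* non-ELF host: ordinary static storage */
#endif

#define HANDOFF_MAGIC 0xB007C0DEu /* hypothetical marker value */

typedef struct {
    uint32_t magic;        /* HANDOFF_MAGIC when the payload is valid */
    uint32_t reset_reason; /* hypothetical payload */
} handoff_t;

/* Must resolve to the same address in the boot loader and the application. */
static volatile handoff_t handoff INTERNAL_RAM;

/* Boot loader side: write the payload first, then mark it valid. */
void handoff_write(uint32_t reason)
{
    handoff.reset_reason = reason;
    handoff.magic = HANDOFF_MAGIC;
}

/* Application side: returns 1 and consumes the payload if it is valid. */
int handoff_read(uint32_t *reason)
{
    assert(reason != NULL);
    if (handoff.magic != HANDOFF_MAGIC)
        return 0;
    *reason = handoff.reset_reason;
    handoff.magic = 0u; /* invalidate so a later reset doesn't reuse it */
    return 1;
}
```

On a hosted build the variable is zeroed by the OS, so the first read fails as it would after a cold boot with no message; on the target it is the NOLOAD attribute that keeps the startup code and objcopy from touching it.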
Some more explanation/story can be found here
I'm working on a university project where I'm writing software for an Atmel SAM7S256 microcontroller from the ground up. This is more in depth than other MCUs I've worked with before, as a knowledge of linker scripts and assembly language is necessary this time around.
I've been really scrutinizing example projects for the SAM7S chips in order to fully understand how to start a SAM7/ARM project from scratch. A notable example is Miro Samek's "Building Bare-Metal ARM Systems with GNU" tutorial found here (where the code in this question is from). I've also spent a lot of time reading the linker and assembler documentation from sourceware.org.
I'm quite happy that I understand the following linker script for the most part. There's just one thing involving the location counter that doesn't make sense to me. Below is the linker script provided with the above tutorial:
OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_vectors)
MEMORY { /* memory map of AT91SAM7S64 */
ROM (rx) : ORIGIN = 0x00100000, LENGTH = 64k
RAM (rwx) : ORIGIN = 0x00200000, LENGTH = 16k
}
/* The sizes of the stacks used by the application. NOTE: you need to adjust */
C_STACK_SIZE = 512;
IRQ_STACK_SIZE = 0;
FIQ_STACK_SIZE = 0;
SVC_STACK_SIZE = 0;
ABT_STACK_SIZE = 0;
UND_STACK_SIZE = 0;
/* The size of the heap used by the application. NOTE: you need to adjust */
HEAP_SIZE = 0;
SECTIONS {
.reset : {
*startup.o (.text) /* startup code (ARM vectors and reset handler) */
. = ALIGN(0x4);
} >ROM
.ramvect : { /* used for vectors remapped to RAM */
__ram_start = .;
. = 0x40;
} >RAM
.fastcode : {
__fastcode_load = LOADADDR (.fastcode);
__fastcode_start = .;
*(.glue_7t) *(.glue_7)
*isr.o (.text.*)
*(.text.fastcode)
*(.text.Blinky_dispatch)
/* add other modules here ... */
. = ALIGN (4);
__fastcode_end = .;
} >RAM AT>ROM
.text : {
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
*(.glue_7) /* glue arm to thumb (NOTE: placed already in .fastcode) */
*(.glue_7t)/* glue thumb to arm (NOTE: placed already in .fastcode) */
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* global symbol at end of code */
} >ROM
.preinit_array : {
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(SORT(.preinit_array.*)))
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} >ROM
.init_array : {
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} >ROM
.fini_array : {
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(.fini_array*))
KEEP (*(SORT(.fini_array.*)))
PROVIDE_HIDDEN (__fini_array_end = .);
} >ROM
.data : {
__data_load = LOADADDR (.data);
__data_start = .;
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .;
} >RAM AT>ROM
.bss : {
__bss_start__ = . ;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = .;
} >RAM
PROVIDE ( end = _ebss );
PROVIDE ( _end = _ebss );
PROVIDE ( __end__ = _ebss );
.heap : {
__heap_start__ = . ;
. = . + HEAP_SIZE;
. = ALIGN(4);
__heap_end__ = . ;
} >RAM
.stack : {
__stack_start__ = . ;
. += IRQ_STACK_SIZE;
. = ALIGN (4);
__irq_stack_top__ = . ;
. += FIQ_STACK_SIZE;
. = ALIGN (4);
__fiq_stack_top__ = . ;
. += SVC_STACK_SIZE;
. = ALIGN (4);
__svc_stack_top__ = . ;
. += ABT_STACK_SIZE;
. = ALIGN (4);
__abt_stack_top__ = . ;
. += UND_STACK_SIZE;
. = ALIGN (4);
__und_stack_top__ = . ;
. += C_STACK_SIZE;
. = ALIGN (4);
__c_stack_top__ = . ;
__stack_end__ = .;
} >RAM
/* Remove information from the standard libraries */
/DISCARD/ : {
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
}
Throughout the example (such as in the .ramvect, .fastcode and .stack sections) there are symbol definitions such as __ram_start = .;. These addresses are used by the startup assembly code and initialization C code in order to initialize the correct locations in the MCU's RAM.
What I have a problem understanding is how these symbol definitions result in the correct values being assigned. This does happen (the script is correct); I just don't understand how.
The way I understand it, when you use the location counter within a section, it only contains a relative offset from the virtual memory address (VMA) of the section itself.
So for example, in the line __ram_start = .;, I would expect __ram_start to be assigned a value of 0x0 - as it is assigned the value of the location counter at the very beginning of the .ramvect section. However, for the initialization code to work correctly (which it does), __ram_start must be getting assigned as 0x00200000 (the address for the beginning of RAM).
I would have thought this would only work as intended if the line was instead __ram_start = ABSOLUTE(.); or __ram_start = ADDR(.ramvect);.
The same goes for __fastcode_start and __stack_start__. They can't all be getting defined as address 0x0, otherwise the program wouldn't work. But the documentation linked here seems to suggest that that's what should be happening. Here's the quote from the documentation:
Note: . actually refers to the byte offset from the start of the current containing object. Normally this is the SECTIONS statement, whose start address is 0, hence . can be used as an absolute address. If . is used inside a section description however, it refers to the byte offset from the start of that section, not an absolute address.
So the location counter values during those symbol assignments should be offsets from the corresponding section VMAs, and those "_start" symbols should all be getting set to 0x0, which would break the program.
So obviously I'm missing something. I suppose it could simply be that assigning the location counter value to a symbol (within a section) results in ABSOLUTE() being used by default. But I haven't been able to find a clear explanation anywhere that confirms this.
Thanks in advance if anybody can clear this up.
I think I may have figured out the answer to my own question. I'm not sure I'm right, but it's the first explanation I've been able to think of that actually makes sense. What made me rethink things was this page of the documentation. Particularly this quote:
Addresses and symbols may be section relative, or absolute. A section
relative symbol is relocatable. If you request relocatable output
using the `-r' option, a further link operation may change the value
of a section relative symbol. On the other hand, an absolute symbol
will retain the same value throughout any further link operations.
and this quote:
You can use the builtin function ABSOLUTE to force an expression to be
absolute when it would otherwise be relative. For example, to create
an absolute symbol set to the address of the end of the output section
.data:
SECTIONS
{
.data : { *(.data) _edata = ABSOLUTE(.); }
}
If ABSOLUTE were not used, _edata would be relative to the .data
section.
I had read them before, but this time I saw them from a new perspective.
So I think my misinterpretation was thinking that a symbol, when assigned a relative byte offset address, is simply set to the value of that offset while the base address information is lost.
That was based on this quote from my original question:
Note: . actually refers to the byte offset from the start of the
current containing object. Normally this is the SECTIONS statement,
whose start address is 0, hence . can be used as an absolute address.
If . is used inside a section description however, it refers to the
byte offset from the start of that section, not an absolute address.
Instead, what I now understand to be happening is that the base address information is not lost. The symbol does not simply get assigned the value of the offset from the base address. The symbol still eventually resolves to an absolute address, but only when there's no chance its base address can change.
So where I thought that something like __stack_start__ = . ; had to be changed to __stack_start__ = ABSOLUTE(.) ; (which does work), I now think that is unnecessary. What's more, I understand from the first quote in this response that you can relink an ELF file?
So if I used __stack_start__ = ABSOLUTE(.) ;, ran the linker script to create the ELF executable, then tried to relink it and moved the .stack section somewhere else, the __stack_start__ symbol would still be pointing to the same absolute address from the first link, and thus be incorrect.
This is probably hard to follow, but I've written it as articulately as I could. I suspect I've got close to the right idea, but I still need someone who actually knows about this stuff to confirm or deny this.
The placement of the section is determined by the memory regions after the closing brace (>RAM AT>ROM). So the execution address is in RAM, at 0x00200000 and following, but the load address is in ROM (flash, which starts at 0x00100000). The startup code must copy the .fastcode output section from its load address to its execution address; that's what the symbols are for.
Note that these need not be at address 0, because the AT91SAM7S remaps either RAM or ROM to address 0. Usually it starts up with ROM mapped, and the startup code switches that to RAM.
This question also troubled me. Here is my understanding:
.ramvect : { /* used for vectors remapped to RAM */
__ram_start = .;
. = 0x40;
} >RAM
The above statement tells the linker to place the __ram_start symbol at the location counter, that is, at the start of the .ramvect section.
Since the __ram_start symbol is located at the head of the .ramvect section, when C code takes the address of __ram_start it gets the starting address of the .ramvect section, i.e. its absolute address.
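One way to picture what these answers describe, as a toy model rather than a description of ld's internals: a symbol assigned from "." inside an output section is remembered as (section, offset), and its final value is the section's VMA plus that offset once the section has been placed. With .ramvect at the start of RAM (0x00200000) and padded to 0x40, __ram_start comes out as 0x00200000, and if nothing else sits in between, __fastcode_start lands at 0x00200040. Moving the sections, as a relink might, moves the symbols with them. The types, helpers and the 0x100 size of .fastcode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an output section with a final VMA and a size. */
typedef struct {
    const char *name;
    uint32_t vma;
    uint32_t size;
} out_section;

/* A symbol assigned with "sym = .;" inside a section: kept relative to
 * that section, not as a bare number. */
typedef struct {
    const out_section *section;
    uint32_t offset;
} rel_symbol;

/* Final (absolute) value once the section's VMA is known. */
static uint32_t resolve(rel_symbol s)
{
    assert(s.section != NULL && s.offset <= s.section->size);
    return s.section->vma + s.offset;
}

/* Place sections back to back in one region, roughly like ">RAM". */
static void place(out_section *secs, int count, uint32_t region_origin)
{
    uint32_t addr = region_origin;
    for (int i = 0; i < count; i++) {
        secs[i].vma = addr;
        addr += secs[i].size;
    }
}
```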