Stack initialisation in GNU ARM toolchain

While checking the startup file provided as an example in the GNU ARM toolchain, I couldn't understand one thing.
The code snippets here are taken from the examples included in the GNU ARM Embedded Toolchain files downloaded from the official website. The code compiles and everything seems to be fine.
I am wondering why they wrote this code exactly like that; for example, why are they using the same names?
I am wondering why my linker is not complaining about a multiple definition error for __StackTop and __StackLimit. Here is the relevant part of the file startup_ARMCM0.S:
.syntax unified
.arch armv6-m
.section .stack
.align 3
#ifdef __STACK_SIZE
.equ Stack_Size, __STACK_SIZE
#else
.equ Stack_Size, 0xc00
#endif
.globl __StackTop
.globl __StackLimit
__StackLimit:
.space Stack_Size
.size __StackLimit, . - __StackLimit
__StackTop:
.size __StackTop, . - __StackTop
Yet the linker script defines the same symbols, __StackTop and __StackLimit:
.stack_dummy (COPY):
{
*(.stack*)
} > RAM
/* Set stack top to end of RAM, and stack limit move down by
* size of stack_dummy section */
__StackTop = ORIGIN(RAM) + LENGTH(RAM);
__StackLimit = __StackTop - SIZEOF(.stack_dummy);
PROVIDE(__stack = __StackTop);
While checking the linker documentation, I found the following explanation, given this example:
SECTIONS
{
.text :
{
*(.text)
_etext = .;
PROVIDE(etext = .);
}
}
In this example, if the program defines _etext (with a leading
underscore), the linker will give a multiple definition error. If, on
the other hand, the program defines etext (with no leading
underscore), the linker will silently use the definition in
the program. If the program references etext but does not define
it, the linker will use the definition in the linker script.
Also, when I use readelf -s to check the symbols in the object file assembled from startup_ARMCM0.S (before linking), I can see the symbols __StackTop and __StackLimit with one set of values, whereas after linking they have the values set by the linker script (keeping in mind that the value a linker script assigns to a symbol is actually the symbol's address).

Related

GCC Linker - Locate a section/constant at a specific address within the .text section

I would like to locate a 32-bit constant value at a specific address (0x080017FC) within the .text (code) section.
To be honest, when it comes to modifying the linker script to this extent I'm naïve and feel like I do not have a clue what to do.
I've modified my linker script to contain this new section (.systemid) within the .text section.
.text :
{
. = ALIGN(4);
KEEP(*(.systemid))
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
To ensure it does not get optimized away, I used KEEP.
I then declared my constant in the new section (.systemid). This is where I start to wonder what I am supposed to do. If .systemid were a section on its own, I would have declared the constant as follows:
const uint32_t __attribute__((used, section (".systemid"))) SYSTEM_ID_U32 = 0x11223344;
But since this is a section within a section, should it not be?:
uint32_t __attribute__((used, section (".text.systemid"))) SYSTEM_ID_U32 = 0x11223344;
So the linker places the constant at the beginning of the .text section (0x000001A0). Great, it is inside the .text section, but not at the correct address. I would like to locate the constant at 0x080017FC.
To try and achieve this, I pass the following to the linker:
-Wl,--section-start=.text.systemid=0x080017FC
Again, I'm not sure whether it should be .systemid or .text.systemid.
Either way, it does not locate the constant at 0x080017FC
How do I get my constant to be located at 0x080017FC within the .text (code) section without any overlap errors?
It will not work this way. There is no way I am aware of to place a section at a particular address without problems from the linker if it is part of another section. The linker is quite a simple program and will not rearrange the rest of the memory layout to work around your fixed location.
I use two methods:
Place this ID at the end of the FLASH. You can't do this at the beginning, as that is where the vector table is.
const uint32_t __attribute__((used, section (".systemid"))) SYSTEM_ID_U32 = 0x11223344;
Place it after all other sections in FLASH (it can be the last section definition):
.systemid :
{
. = ORIGIN(FLASH) + LENGTH(FLASH) - 4;
KEEP(*(.systemid))
} >FLASH
or
.systemid ORIGIN(FLASH) + LENGTH(FLASH) - 4:
{
KEEP(*(.systemid))
} >FLASH

Understanding ARM Cortex-M0+ relocation

I'm just getting started with embedded arm development, and there's a snippet of code that's really bugging me:
/* Initialize the relocate segment */
pSrc = &_etext;
pDest = &_srelocate;
if (pSrc != pDest)
{
while (pDest < &_erelocate)
{
*pDest++ = *pSrc++;
}
}
Where _etext and _srelocate are symbols defined in the linker script:
. = ALIGN(4);
_etext = .;
.relocate : AT (_etext)
{
. = ALIGN(4);
_srelocate = .;
*(.ramfunc .ramfunc.*);
*(.data .data.*);
. = ALIGN(4);
_erelocate = .;
} > ram
Where ram is a memory segment whose origin is 0x20000000. The issue as I see it is that _etext is a symbol that marks the end boundary of the .text segment, which is a part of a different memory segment, rom. This means that unless the aforementioned memory segment was 100% full, _etext != _srelocate will always be true. What this means is that we're copying memory beyond the .text section where nothing is defined to live according to the linker script.
To me, this leads to one of three scenarios: A) there is garbage present in rom beyond the .text section, and it gets copied into .relocate (and subsequently .data); B) the memory beyond .text is empty following a chip erase operation prior to device programming, in which case .relocate is zeroed; or C) there is some sleight-of-hand magic happening here where .data values are placed after .text in rom and must be loaded into ram, in which case the comment should be s/relocate/data.
The third scenario seems the most likely, but according to the linker script this can't be true. Can someone shed some light on this?
You're right, it's the third option. The AT() tells the linker to put the .ramfunc and .data sections (both of which are read/write) in the object file starting at _etext. The "> ram" tells the linker to resolve relocations as if the sections were placed in RAM; the ram region is defined using a MEMORY command, as described here: https://sourceware.org/binutils/docs/ld/MEMORY.html#MEMORY The result is that the copy loop moves the data from the read-only area to the read/write area when the program starts up.
Here's a link to the gnu ld documentation where controlling the LMA (load address) is described: https://sourceware.org/binutils/docs/ld/Output-Section-LMA.html

Local and static variables in C (cont'd)

Building on my last question, I'm trying to figure out how the .local and .comm directives work exactly, and in particular how they affect linkage and duration in C.
So I've run the following experiment:
static int value;
which produces the following assembly code (using gcc):
.local value
.comm value,4,4
Initializing it to zero yields the same assembly code (using gcc):
.local value
.comm value,4,4
This sounds logical, because in both cases I would expect the variable to be stored in the bss segment. Moreover, after investigating using ld --verbose, it looks like all .comm variables are indeed placed in the bss segment:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* ... */
}
However, when I initialize my variable to a value other than zero, the compiler defines the variable in the data segment as I would expect, but produces the following output:
.data
.align 4
.type value, #object
.size value, 4
value:
.long 1
Besides the different segments (bss and data respectively), which, thanks to your help previously, I now understand, my variable has been defined with .local and .comm in the first example but not in the second. Could anyone explain the difference between the two outputs?
The .local directive marks a symbol as a local, non-externally-visible symbol, and creates it if it doesn't already exist. It's necessary for 0-initialized local symbols, because .comm declares but does not define symbols. For the 1-initialized variant, the symbol itself (value:) declares the symbol.
Using .local and .comm is essentially a bit of a hack (or at least a shorthand); the alternative would be to place the symbol into .bss explicitly:
.bss
.align 4
.type value, #object
.size value, 4
value:
.zero 4
The Linux kernel zeroes the virtual memory of a process after allocation for security reasons. So the compiler already knows that the memory will be filled with zeros and does an optimization: if a variable is initialized to 0, there's no need to keep space for it in the executable file (the .data section actually takes up space in the ELF executable, whereas the .bss section stores only its length, assuming that its initial contents will be zeros).

What does KEEP mean in a linker script?

The LD manual does not explain what the KEEP command does. Below is a snippet from a third-party linker script that features KEEP. What does the KEEP command do in ld?
SECTIONS
{
.text :
{
. = ALIGN(4);
_text = .;
PROVIDE(stext = .);
KEEP(*(.isr_vector))
KEEP(*(.init))
*(.text .text.*)
*(.rodata .rodata.*)
*(.gnu.linkonce.t.*)
*(.glue_7)
*(.glue_7t)
*(.gcc_except_table)
*(.gnu.linkonce.r.*)
. = ALIGN(4);
_etext = .;
_sidata = _etext;
PROVIDE(etext = .);
_fini = . ;
*(.fini)
} >flash
AFAIK, LD keeps the symbols in the section even if the symbols are not referenced (this matters when linking with --gc-sections).
It is usually used for sections that have some special meaning in the binary startup process, more or less to mark the roots of the dependency tree.
(For Sabuncu below)
Dependency tree:
If you eliminate unused code, you analyze the code and mark all reachable sections (code+global variables + constants).
So you pick a section, mark it as "used" and see what other (unused) sections it references, then you mark those sections as "used", check what they reference, etc.
The sections that are not marked "used" are then redundant and can be eliminated.
Since a section can reference multiple other sections (e.g. one procedure calling three different other ones), if you would draw the result you get a tree.
Roots:
The above principle, however, leaves us with a problem: what is the "first" section that is always used? The first node (root) of the tree, so to speak? This is what KEEP() does: it tells the linker which sections (if available) are the first ones to look at. As a consequence, these are always linked in.
Typically these are sections that are called from the program loader to perform tasks related to dynamic linking (can be optional, and OS/fileformat dependent), and the entry point of the program.
Minimal Linux IA-32 example that illustrates its usage
main.S
.section .text
.global _start
_start:
/* Dummy access so that after will be referenced and kept. */
mov after, %eax
/*mov keep, %eax*/
/* Exit system call. */
mov $1, %eax
/* Take the exit status 4 bytes after before. */
mov $4, %ebx
mov before(%ebx), %ebx
int $0x80
.section .before
before: .long 0
/* TODO why is the `"a"` required? */
.section .keep, "a"
keep: .long 1
.section .after
after: .long 2
link.ld
ENTRY(_start)
SECTIONS
{
. = 0x400000;
.text :
{
*(.text)
*(.before)
KEEP(*(.keep));
*(.keep)
*(.after)
}
}
Compile and run:
as --32 -o main.o main.S
ld --gc-sections -m elf_i386 -o main.out -T link.ld main.o
./main.out
echo $?
Output:
1
If we comment out the KEEP line the output is:
2
If we either:
add a dummy mov keep, %eax
remove --gc-sections
The output goes back to 1.
Tested on Ubuntu 14.04, Binutils 2.25.
Explanation
There is no reference to the symbol keep, and therefore none to its containing section .keep.
Therefore if garbage collection is enabled and we don't use KEEP to make an exception, that section will not be put in the executable.
Since we are adding 4 to the address of before, if the .keep section is not present, then the exit status will be 2, which is the value stored in the following .after section.
TODO: nothing happens if we remove the "a" from .keep, which makes it allocatable. I don't understand why that is so: that section will be put inside the .text segment, which because of its magic name will be allocatable.
Force the linker to keep some specific sections
SECTIONS
{
....
....
*(.rodata .rodata.*)
KEEP(*(SORT(.scattered_array*)));
}

How link the same object into both data and program memory of a DSP?

I need to put the same object into different memory sections. I'm working on a DSP with separate data and program memory. The .text sections are normally stored inside the P-MEM, but I want to store the same code inside the data memory as well. It is possible to copy it at run time, but I think it should also be possible at link time.
This is what I'm looking for, but it's not working, since I could not find a "copy" or "duplicate" instruction that would allow me to put the same code in different sections.
MEMORY
{
/* MAP 1*/
VECS: org=0x00000000 len=0x00000400
PMEM: org=0x00000400 len=0x0000FC00
DMEM: org=0x80000000 len=0x0000F800
DMEM_FT: org=0x8000F800 len=0x00000800
}
SECTIONS
{
vectors > VECS
.text > PMEM /* containing ALL code (also including func1.obj(.text)) */
.bss > DMEM
.cinit > DMEM
.stack > DMEM
.far > DMEM
.switch > DMEM
.data > DMEM
.sysmem > DMEM
.const > DMEM
.cio > DMEM
dmem_mirror:
{
func1.obj(.text)
} > DMEM_FT
}
If I use the linker script above, it clearly puts func1.obj only inside the DMEM_FT section (that's what the linker is supposed to do!), but that is not what I want :-/ . I'm working with the Texas Instruments compiler and linker, but the syntax is the same as for the GCC linker.
A quick look at the GNU ld manual does not give an obvious solution. One possible solution does come to mind. You could do a partial (ld -r) link of func1.obj, sending all the sections except .text to the special section /DISCARD/ and only outputting the .text section to e.g. func1a.obj. Unfortunately, I think you'll see multiple symbol definition errors from the linker when you actually do the final link.

Resources