implict intialized variable not in .bss section - c

[Compiler gcc-arm-8.2-2019.01]
I have a variable in .c:int ep;I expect it is put in .bss section and having 0 initial value, but it is excluded from .bss section after I check the .map file.
If I change the line to:
int ep = 0;
The var is put into .bss section.
My guess is compiler found that the var is a writeonly, so no need initialize its value to 0.
But this var is read by assembly code, also read by another program, which compiler seems not knowing and make the wrong optimize choice.
Any help on this behavior?
excluded from .bss means, in linkscript file I write:
.bss :
{
. = ALIGN(64);
__bss_start = .;
*(.bss)
*(.bss.*)
__bss_end = .;
} > DRAM
but var ep is not between __bss_start and __bss_end, where assembly code use them to do memory clear before jump in C code

Related

GCC Linker - Locate a section/constant at a specific address within the .text section

I would like to locate a 32bit constant value at a specific address (0x080017FC) within the .text (code) section.
To be honest, when it comes to modifying the linker script to this extent I'm naïve and feel like I do not have a clue what to do.
I've modified my linker script to contain this new section (.systemid) within the .text section.
.text :
{
. = ALIGN(4);
KEEP(*(.systemid))
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
To ensure it does not get optimized away, I used KEEP.
I then declared my constant in the new section (.systemid). This is where I start to wonder what am I supposed to do. If .systemid was a section on its own, I would have declared the constant as follows:
const uint32_t __attribute__((used, section (".systemid"))) SYSTEM_ID_U32 = 0x11223344;
But since this is a section within a section, should it not be?:
uint32_t __attribute__((used, section (".text.systemid"))) SYSTEM_ID_U32 = 0x11223344;
So the linker will locate the constant at the beginning of the .text section (0x000001A0). Great, it is inside the text section but not at the correct address. I would like to locate the constant at 0x08001F7C.
To try and achieve this, I pass the following to the linker:
-Wl,--section-start=.text.systemid=0x080017FC
Again I'm not sure if it should be .systemid or .text.systemid
Either way, it does not locate the constant at 0x080017FC
How do I get my constant to be located at 0x080017FC within the .text (code) section without any overlap errors?
It will not work this way. There is no way I am aware of placing section at the particular address without problems from the linker if it is part of another section. Linker is quite a simple program and will not optimize the memory to avoid your location.
I use two methods:
Place this id at the end of the FLASH. You cant do this at the beginning as there is the vector table.
const uint32_t __attribute__((used, section (".systemid"))) SYSTEM_ID_U32 = 0x11223344;
Place after all other sections in FLASH (it can be the last section definition
.systemid :
{
. = ORIGIN(FLASH) + LENGTH(FLASH) - 4;
KEEP(*(.systemid))
} >FLASH
or
.systemid ORIGIN(FLASH) + LENGTH(FLASH) - 4:
{
KEEP(*(.systemid))
} >FLASH

Cannot assign address of variable defined in linker script

I found a solution, although I don't understand what went wrong. Here is the original question. The solution is at the end.
I am following this Raspberry PI OS tutorial with a few tweaks. As the title says, one assignment appears to fail.
Here is my C code:
extern int32_t __end;
static int32_t *arena;
void init() {
arena = &__end;
assert(0 != arena); // fails
...
The assert triggers! Surely the address shouldn't be 0. __end is declared in my linker script:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
.rodata : { *(.rodata) }
.data : { *(.data) }
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start=.;
.bss : { *(.bss) }
__bss_end=.;
/* First usable address for the allocator */
. = ALIGN(4);
__end = .;
}
Investigating in GDB (running it in QEMU):
Thread 1 hit Breakpoint 1, init () at os.c:75
75 arena = &__end;
(gdb) p &__end
$1 = (int32_t *) 0x9440
(gdb) p arena
$2 = (int32_t *) 0x0
(gdb) n
76 assert(0 != arena);
(gdb) p arena
$3 = (int32_t *) 0x0
GDB can find __end but my program cannot?
Here are a few other things I tried:
the tutorial's code works without an issue (implying that QEMU and the ARM compiler are working)
the assertion still fails when running without GDB (implying GDB is not the issue)
I am able to assign 0xccc to arena (implying arena is not the issue)
I am not able to assign &__end to a local variable (implying &__end is the issue).
As requested in the comments, this is how I tried to assign to a local variable:
void* arena2 = (void*)&__end;
assert(0 != arena2);
The assertion fails. In GDB:
Thread 1 hit Breakpoint 1, mem_init () at mem.c:77
77 void* arena2 = (void*)&__end;
(gdb) p arena2
$1 = (void *) 0x13
(gdb) p &__end
$2 = (int32_t *) 0x94a4
(gdb) n
78 assert(0 != arena2);
(gdb) p arena2
$3 = (void *) 0x0
(gdb) p &__end
$4 = (int32_t *) 0x94a4
assert(0 != &__end); succeeds (implying &__end is not the issue?)
N.B. This version of assert is not the same as the one in assert.h, but I don't think it causes the problem. It just checks a condition, prints the condition, and goes to a breakpoint. I can reproduce the issue in GDB with the assert commented out.
N.B.2. I previously included the ARM assembly of the C code in case there was a compiler bug
My solution is to edit the linker script to:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
. = ALIGN(4096);
.rodata : { *(.rodata) }
. = ALIGN(4096);
.data : { *(.data) }
. = ALIGN(4096);
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start = .;
.bss : { *(.bss) }
. = ALIGN(4096);
__bss_end = .;
/* First usable address for the allocator */
. = ALIGN(4096);
__end = .;
}
I don't understand why the additional ALIGNs are important.
The problem you're having here is because the "clear the BSS" loop in boot.S is also clearing some of the compiler-generated data in the ELF file that the C code is using at runtime. Notably, it is accidentally zeroing out the GOT (global offset table) which is in the .got ELF section and which is where the actual address of the __end label has been placed by the linker. So the linker correctly fills in the address in the ELF file, but then the boot.S code zeroes it, and when you try to read it from C then you get zero rather than what you were expecting.
Adding all that alignment in the linker script is probably working around this by coincidentally causing the GOT to not be in the area that gets zeroed.
You can see where the linker has put things by using 'objdump -x myos.elf'. In my test case based on the tutorial you link I see a SYMBOL TABLE which includes among other entries:
000080d4 l .bss 00000004 arena
00000000 l df *ABS* 00000000
000080c8 l O .got.plt 00000000 _GLOBAL_OFFSET_TABLE_
000080d8 g .bss 00000000 __bss_end
0000800c g F .text 00000060 kernel_main
00008000 g .text 00000000 __start
0000806c g .text.boot 00000000 _start
000080d8 g .bss 00000000 __end
00008000 g F .text 0000000c panic
000080c4 g .text.boot 00000000 __bss_start
So you can see that the linker script has set __bss_start to 0x80c4 and __bss_end to 0x80d8, which is a pity because the GOT is at 0x80c4/0x80c8. I think what has happened here is that because you didn't specify explicitly in your linker script where to put the .got and .got.plt sections, the linker has decided to put them after the __bss_start assignment and before the .bss section, so they get covered by the zeroing code.
You can see what the ELF file contents of the .got are with 'objdump --disassemble-all myos.elf', which among other things includes:
Disassembly of section .got:
000080c4 <.got>:
80c4: 000080d8 ldrdeq r8, [r0], -r8 ; <UNPREDICTABLE>
so you can see we have one GOT table entry, whose contents are the address 0x80d8 which is the __end value we want. When the boot.S code zeroes this out your C code reads a 0 rather than the constant it was expecting.
You should probably ensure that the bss start/end are at least 16-aligned, because the boot.S code works via a loop that clears 16 bytes at a time, but I think that if you fix your linker script to explicitly put the .got and .got.plt sections somewhere then you'll find you don't need the 4K alignments everywhere.
FWIW, I diagnosed this using: (1) the QEMU "-d in_asm,cpu,exec,int,unimp,guest_errors -singlestep" options to get a dump of register state and instruction execution and (2) objdump of the ELF file to figure out what the compiler's generated code was actually doing. I had a suspicion this was going to turn out to be either "accidentally zeroed data we shouldn't have" or "failed to include in the image or otherwise initialize data we should have" kind of bug, and so it turned out.
Oh, and the reason GDB was printing the right value for __end when your code wasn't was that GDB could just look directly in the debug/symbol info in the ELF file for the answer; it wasn't doing it by going via the in-memory GOT.

Understanding ARM Cortex-M0+ relocation

I'm just getting started with embedded arm development, and there's a snippet of code that's really bugging me:
/* Initialize the relocate segment */
pSrc = &_etext;
pDest = &_srelocate;
if (pSrc != pDest)
{
while (pDest < &_erelocate)
{
*pDest++ = *pSrc++;
}
}
Where _etext and _srelocate are symbols defined in the linker script:
. = ALIGN(4);
_etext = .;
.relocate : AT (_etext)
{
. = ALIGN(4);
_srelocate = .;
*(.ramfunc .ramfunc.*);
*(.data .data.*);
. = ALIGN(4);
_erelocate = .;
} > ram
Where ram is a memory segment whose origin is 0x20000000. The issue as I see it is that _etext is a symbol that marks the end boundary of the .text segment, which is a part of a different memory segment, rom. This means that unless the aforementioned memory segment was 100% full, _etext != _srelocate will always be true. What this means is that we're copying memory beyond the .text section where nothing is defined to live according to the linker script.
To me, this leads to one of three scenarios, either A) There is garbage present in rom beyond the .text section, and it gets copied into .relocate (and subsequently .data), or B) The memory beyond .text is empty following a chip erase operation prior to device programming, in which case .relocate is zeroed, or C) There is some slight of hand magic happening here where .data values are placed after .text in rom, and must be loaded into ram; in which case the comment should be s/relocate/data.
The third scenario seems the most likely, but according to the linker script this can't be true. Can someone shed some light on this?
You're right, it's the third option. The AT() tells the linker to put the .ramfunc and .data sections (both of which are read/write) in the object file starting at _etext. The "> ram" tells the linker to resolve relocations as if the sections were placed in RAM this is done using a MEMORY command as decribed here: https://sourceware.org/binutils/docs/ld/MEMORY.html#MEMORY The result is that the copy loop moves the data from the read only area to the read/write area when the program starts up.
Here's a link to the gnu ld documentation where controlling the LMA (load address) is described: https://sourceware.org/binutils/docs/ld/Output-Section-LMA.html

How to get the first address of initialized data segment

my program is working on linux using gcc. Through the manual page, I find edata, which represent the first address past the end of the initialized data segment.
But I want know the first address of initialized data segment
How can I get it?
I have tried treating etext as the first address of initialized data segment. Then I got a segment fault when I increase the address and access the variable stored in it. I think some address space between etext and edata was not mapped into virtual memory. Is that right?
That depends on your linker scripts. For example on some platforms you have the symbol __bss_start at the beginning of BSS. It's a symbol without any data associated with it, you can get a pointer to it by extern declaring a variable with that name (only for the sake of taking the address of that variable). For example:
#include <stdio.h>
extern char __bss_start;
int main()
{
printf("%p\n", &__bss_start);
return 0;
}
You find this by looking in the linker script, for example in /usr/lib/ldscripts/elf_x64_64.x:
.data :
{
*(.data .data.* .gnu.linkonce.d.*)
SORT(CONSTRUCTORS)
}
.data1 : { *(.data1) }
_edata = .; PROVIDE (edata = .);
__bss_start = .; /* <<<<< this is what you're looking for /*
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
You can also see the edata you mentioned, but as edata is not reserved for the implementation (the PROVIDE means only to create this symbol if it otherwise isn't used) you should probably use _edata instead.
If you want the address to the start of the data section you could modify the linker script:
__data_start = . ;
.data :
{
*(.data .data.* .gnu.linkonce.d.*)
SORT(CONSTRUCTORS)
}
.data1 : { *(.data1) }
_edata = .; PROVIDE (edata = .);
__bss_start = .; /* <<<<< this is what you're looking for /*
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
You probably want to make a copy of the linker script (look for the right one in /usr/lib/ldscripts, they are different depending on what kind of output you're targeting) and supply it when you compile:
gcc -o execfile source.c -Wl,-T ldscript
Another option if you don't want to modify the linker script could be to use the __executable_start and parse the ELF headers (hoping that the executable is sufficiently linearly mapped)
As for _etext, it is the end of the text section (you can read that in the linker script as well, but I didn't include it in the excerpt), but the text section is followed by rodata, trying to write there is probably going to segfault.
You can use the linux tool size (binutils package in Debian/Ubuntu).
Example
size -A /usr/bin/gcc
results in
/usr/bin/gcc :
section size addr
.interp 28 4194928
.note.ABI-tag 32 4194956
.note.gnu.build-id 36 4194988
.gnu.hash 240 4195024
.dynsym 4008 4195264
.dynstr 2093 4199272
.gnu.version 334 4201366
.gnu.version_r 160 4201704
.rela.dyn 720 4201864
.rela.plt 3240 4202584
.init 14 4205824
.plt 2176 4205840
.text 384124 4208016
.fini 9 4592140
.rodata 303556 4592160
.eh_frame_hdr 8540 4895716
.eh_frame 50388 4904256
.gcc_except_table 264 4954644
.tbss 16 7052632
.init_array 16 7052632
.fini_array 8 7052648
.jcr 8 7052656
.data.rel.ro 3992 7052672
.dynamic 480 7056664
.got 216 7057144
.got.plt 1104 7057384
.data 2520 7058496
.bss 80976 7061024
.gnu_debuglink 12 0
Total 849310

Meaning of arm loader script

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_ram_entry)
SECTIONS
{
. = 0xA0008000;
. = ALIGN(4);
.text : { *(.text) }
. = ALIGN(4);
.rodata : { *(.rodata) }
. = ALIGN(4);
.data : { *(.data) }
. = ALIGN(4);
.got : { *(.got) }
. = ALIGN(4);
.bss : { *(.bss) }
}
I get the output_format, output_arch, entry... maybe meaning that the output will be as elf32-littlearm and so on.
But the Sections part is what I don't get.
this '. =' is the start.
and '. = ALIGN(4)' and .text : { *(.text) } ....
can anybody help me on this T_T
thanks for reading.
Actually, this linker description language is defined in the documentation for ld last I checked. It's not as bad as it looks. Basically, the '.' operator refers to the "current location pointer". So, the line
. = 0xA0008000
says move the location pointer to that value. The next entry, .text, basically says place all text objects into a .text section in the final ELF file starting at the location pointer (which is also adjusted to have a 4-byte (32-bit) alignment.) Note that the first use of the ALIGN is probably redundant since 0xA0008000 is already 32-bit aligned!
The next sections simply instructs the linker to emit the collection of all .rodata, .data, .got and .bss sections from all input objects into their final respective sections of the ELF binary in order, starting at 32-bit aligned addresses.
So the final ELF binary that the linker produces will have those five sections respectively and sequentially. You can see the structure of that final ELF binary using the readelf utility. It's quite useful and helps to make sense of all of this stuff. Usually there is a cross-version of readelf, something like arm-linux-gnueabi-readelf, or whatever prefix was used to generate the compiler/linker you are using. Start with readelf -S to get a summary of the sections that your ELF file contains. Then you can explore from there. Happy reading!
. = 0xA0008000;
I think, but I'm not 100% sure, is where the arm will start executing the binary
. = ALIGN(4);
defines how to align the following instruction.
.text, .data, .rodata, .got and .bss are the sections of the program. text is for instructions, data and rodata for initialized data sections and bss for uninitialized data section. got is global offset table.
.text : { *(.text) }
This copies all the instructions, similar commands are for data and global offset table.

Resources