In the ARM ABI, how are global variables accessed? - arm

I am writing a simple multitasking OS for the ARM Cortex M3. My threads always run using the Process Stack Pointer. I have an application that I inherited and that uses global variables. I am trying to call the functions in that application from my threading code but it is not accessing memory correctly. Are the following statements correct:
Those global variables are accessed via some kind of relative addressing, and that relative address is placed on the Main stack (using MSP)?
My threading code, using PSP, will never be able to access them
I need to switch to MSP when calling these functions, then back to PSP when using my threads?
EDIT: Clarified that this is for a Cortex-M.

Global variables have nothing to do with the stack, and neither do static locals.
The output of the compiler will tell you everything, so just look at it.
Your question is quite vague; you could be asking one of several different questions. I will show some basics and maybe I will get lucky.
Note that this generally has nothing to do with the processor or its mode (ARM, Thumb, x86, whatever). It has much more to do with the toolchain.
If this is too basic and you are asking a more advanced question that is not obvious to me, I will delete or rewrite this, no problem.
Throwaway code is always a good way to figure things out.
flash.s
.thumb
.syntax unified
.word 0x20001000
.word reset
.thumb_func
reset:
bl notmain
b .
notmain.c
unsigned int x;
unsigned int y=5;
void notmain ( void )
{
static unsigned int z=7;
x=++y;
z--;
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00080000, LENGTH = 0x00001000
ram : ORIGIN = 0x20000000, LENGTH = 0x00001000
}
SECTIONS
{
.text : { *(.text) } > rom
.bss : { *(.bss) } > ram
.data : { *(.data) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -c notmain.c -o notmain.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o flash.elf
arm-none-eabi-objdump -D flash.elf > flash.list
arm-none-eabi-objcopy -O binary flash.elf flash.bin
examine
Disassembly of section .text:
00080000 <reset-0x8>:
80000: 20001000 andcs r1, r0, r0
80004: 00080009 andeq r0, r8, r9
00080008 <reset>:
80008: f000 f802 bl 80010 <notmain>
8000c: e7fe b.n 8000c <reset+0x4>
...
00080010 <notmain>:
80010: 4b04 ldr r3, [pc, #16] ; (80024 <notmain+0x14>)
80012: 4905 ldr r1, [pc, #20] ; (80028 <notmain+0x18>)
80014: 681a ldr r2, [r3, #0]
80016: 3201 adds r2, #1
80018: 601a str r2, [r3, #0]
8001a: 600a str r2, [r1, #0]
8001c: 685a ldr r2, [r3, #4]
8001e: 3a01 subs r2, #1
80020: 605a str r2, [r3, #4]
80022: 4770 bx lr
80024: 20000004 andcs r0, r0, r4
80028: 20000000 andcs r0, r0, r0
Disassembly of section .bss:
20000000 <x>:
20000000: 00000000 andeq r0, r0, r0
Disassembly of section .data:
20000004 <y>:
20000004: 00000005 andeq r0, r0, r5
20000008 <z.3645>:
20000008: 00000007 andeq r0, r0, r7
This is the basic case: not relocatable, with absolute addresses.
80010: 4b04 ldr r3, [pc, #16] ; (80024 <notmain+0x14>)
80014: 681a ldr r2, [r3, #0]
80016: 3201 adds r2, #1
80018: 601a str r2, [r3, #0]
80024: 20000004 andcs r0, r0, r4
Disassembly of section .data:
20000004 <y>:
20000004: 00000005 andeq r0, r0, r5
We can see the ++y. r3 gets the address of y, r2 gets the value of y,
r2 is incremented and then saved back to memory.
You can see how x and z are handled as well.
Now this cannot work on an MCU as built, for a couple of reasons. The initial
values at the 0x20000000 addresses will not be there: only what is in non-volatile storage
is there when the chip powers up and comes out of reset. Whether this matters depends on what your real question is.
MEMORY
{
rom : ORIGIN = 0x00080000, LENGTH = 0x00001000
ram : ORIGIN = 0x20000000, LENGTH = 0x00001000
}
SECTIONS
{
.text : { *(.text) } > rom
.bss : { *(.bss) } > ram AT > rom
.data : { *(.data) } > ram AT > rom
}
The program does not change, but the binary does:
00000000 00 10 00 20 09 00 08 00 00 f0 02 f8 fe e7 00 00 |... ............|
00000010 04 4b 05 49 1a 68 01 32 1a 60 0a 60 5a 68 01 3a |.K.I.h.2.`.`Zh.:|
00000020 5a 60 70 47 04 00 00 20 00 00 00 20 05 00 00 00 |Z`pG... ... ....|
00000030 07 00 00 00 |....|
00000034
At 0x2C we see the preload value for y and at 0x30 for z.
The .bss contents are not in the binary at all. Normally you add a lot more
to the linker script to export the addresses of things: the start and end of .data, and the start and size (or end) of .bss. Then a bootstrap copies .data from flash to RAM and zeroes .bss, so that the initialized values are in RAM and reads and writes work.
So if your project, call it an operating system or not, is just one large body of code that is compiled and linked together, without special things like lots of custom sections, then the above is what you are looking at, and the stack is not related to globals. It normally never is.
(MSP/PSP do not work the way the ARM documentation implies. I have yet to see a use case for the second stack pointer, if the processor even has it; not all of them implement it.)
Now if your threads are actually separately built programs that you load at run time, then they live completely in RAM. So
MEMORY
{
rom : ORIGIN = 0x00080000, LENGTH = 0x00001000
ram : ORIGIN = 0x20000000, LENGTH = 0x00001000
}
SECTIONS
{
.text : { *(.text) } > ram
.bss : { *(.bss) } > ram
.data : { *(.data) } > ram
}
and we add -fPIC
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -fPIC -c notmain.c -o notmain.o
Disassembly of section .text:
20000000 <reset-0x8>:
20000000: 20001000 andcs r1, r0, r0
20000004: 20000009 andcs r0, r0, r9
20000008 <reset>:
20000008: f000 f802 bl 20000010 <notmain>
2000000c: e7fe b.n 2000000c <reset+0x4>
...
20000010 <notmain>:
20000010: 4a07 ldr r2, [pc, #28] ; (20000030 <notmain+0x20>)
20000012: 4b08 ldr r3, [pc, #32] ; (20000034 <notmain+0x24>)
20000014: 447a add r2, pc
20000016: 58d1 ldr r1, [r2, r3]
20000018: 680b ldr r3, [r1, #0]
2000001a: 3301 adds r3, #1
2000001c: 600b str r3, [r1, #0]
2000001e: 4906 ldr r1, [pc, #24] ; (20000038 <notmain+0x28>)
20000020: 5852 ldr r2, [r2, r1]
20000022: 6013 str r3, [r2, #0]
20000024: 4a05 ldr r2, [pc, #20] ; (2000003c <notmain+0x2c>)
20000026: 447a add r2, pc
20000028: 6813 ldr r3, [r2, #0]
2000002a: 3b01 subs r3, #1
2000002c: 6013 str r3, [r2, #0]
2000002e: 4770 bx lr
20000030: 00000034 andeq r0, r0, r4, lsr r0
20000034: 00000004 andeq r0, r0, r4
20000038: 00000000 andeq r0, r0, r0
2000003c: 0000001a andeq r0, r0, sl, lsl r0
Disassembly of section .bss:
20000040 <x>:
20000040: 00000000 andeq r0, r0, r0
Disassembly of section .data:
20000044 <z.3645>:
20000044: 00000007 andeq r0, r0, r7
20000048 <y>:
20000048: 00000005 andeq r0, r0, r5
Disassembly of section .got:
2000004c <.got>:
2000004c: 20000040 andcs r0, r0, r0, asr #32
20000050: 20000048 andcs r0, r0, r8, asr #32
Disassembly of section .got.plt:
20000054 <_GLOBAL_OFFSET_TABLE_>:
...
This is because you may need to be able to load the program anywhere in RAM (within rules).
The code is all relative, but the data, because of the nature of compiling and linking, needs some hardcoded addresses. So the tools set up a global offset table (GOT). The location of the GOT is relative to the code; you cannot change that.
20000010: 4a07 ldr r2, [pc, #28] ; (20000030 <notmain+0x20>)
20000012: 4b08 ldr r3, [pc, #32] ; (20000034 <notmain+0x24>)
20000014: 447a add r2, pc
20000016: 58d1 ldr r1, [r2, r3]
20000018: 680b ldr r3, [r1, #0]
2000001a: 3301 adds r3, #1
2000001c: 600b str r3, [r1, #0]
There is your ++y when built position independent.
r2 gets an offset, and r3 gets another offset. r2 is the relative offset from
the code to the GOT (you cannot separate them and move one without
the other; that is not what position independent means), so after the add r2 points to the
GOT. r3 is the offset within the GOT of the entry holding the address of y. r1 gets the address
of y, and from there it is like before: get y in r3, add one, save y to memory.
Now if you were to relocate this to an address other than 0x20000000, your
bootstrap needs to go to the GOT and patch up all the addresses, so you need
linker magic to find where the GOT is, how big it is, and so on. Use the pc to
figure out where you are and then make the adjustments. If the program is loaded at 0x20002000, then you need to add 0x2000 to each of the entries
in the table and it will all just work. (Still no stack involved; the stack is not related.)
A little trick, if you have the space.
Notice that I put .bss before .data, and I have at least one .data item. If you can guarantee that (force a .data item in your bootstrap, for example), you get this:
00000000 00 10 00 20 09 00 00 20 00 f0 02 f8 fe e7 00 00 |... ... ........|
00000010 07 4a 08 4b 7a 44 d1 58 0b 68 01 33 0b 60 06 49 |.J.KzD.X.h.3.`.I|
00000020 52 58 13 60 05 4a 7a 44 13 68 01 3b 13 60 70 47 |RX.`.JzD.h.;.`pG|
00000030 34 00 00 00 04 00 00 00 00 00 00 00 1a 00 00 00 |4...............|
00000040 00 00 00 00 07 00 00 00 05 00 00 00 40 00 00 20 |............@.. |
00000050 48 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 |H.. ............|
00000060
20000040 <x>:
20000040: 00000000 andeq r0, r0, r0
objcopy -O binary pads the binary with zeros for .bss, so .bss arrives already zeroed when the whole image is loaded. If you put .bss last, you cannot assume this works.
I do not know how the code you have uses threads and globals. Does it try to keep variables specific to each thread? If so, does it use static locals up front and then pass their addresses on the stack? Even then, the stack pointer you use does not matter unless you are not using the stack properly in general, and in that case globals are not your problem.
If you start a thread, or any code, on one stack pointer (implying
completely separate stacks in separate memory regions) and then switch, you abandon the stack information the code needs to get in and out of
functions. If you then return from functions after switching stacks, all
of the code would break, not just pointers to static locals that are passed along.
So a minimal example that demonstrates the problem would confirm what is really going on and what your question really is. If you want to use the two stack pointers on a Cortex-M, you need to read up carefully and write some throwaway code examples to see how they work, and then apply that to the code the tools are generating.
Again, if this is too elementary and I am miles away from the real question, I will certainly delete this, no problem.

Related

How to crosscompile for STM32L4 cortex-m4 mcu using Clang/LLVM on Windows

I've been trying to compile a simple application implementing a USB CDC device for an STM32L4 micro using Clang on Windows.
The code was generated by STM32CubeMX, with some minor changes so that it just echoes whatever is sent via the virtual COM port.
Compiling with the arm-none-eabi-gcc toolchain on Ubuntu using the generated makefile works just fine. After flashing to the microcontroller, it does exactly what it's supposed to do. To compile with Clang on Windows, I made a build script (it generates objects from the asm/C sources and links them, using these options:
set TARGET_TRIPE=--target=arm-none-eabi
set ARCH=-march=armv7e-m
set CPU=-mcpu=cortex-m4
set FPU=-mfpu=fpv4-sp-d16 -mfloat-abi=hard
set MCU=%TARGET_TRIPE% %ARCH% %CPU% %FPU% -mthumb -mlittle-endian
set COMMON_FLAGS=-Wall %OPTIMIZATIONS% --sysroot=%SYSROOT% -fdata-sections -ffunction-sections -O0
set C_FLAGS=%MCU% %C_DEFINES% %COMMON_FLAGS% %C_INCLUDES% -c
set ASM_FLAGS=%MCU% %ASM_DEFINES% %COMMON_FLAGS% -x assembler-with-cpp -c
set LD_FLAGS=%MCU% %COMMON_FLAGS% -nostdlib -nostartfiles -fuse-ld=lld -T%LD_SCRIPT% -Wl,-Map=%BUILD_DIR%/%PROJECT_NAME%.map,--cref,--gc-sections %LIBDIRS% %LIBS%
).
Compilation finishes successfully and the firmware image seems fine; however, after flashing it, the microcontroller just does nothing. Not even the external crystal starts up, so the firmware image is obviously faulty. I have no idea why that is. Binaries generated by Clang/GCC. Repo with source.
Edit: The Clang binary image is ~400 MB, which doesn't seem right either.
Edit 2:
This is the linker script I'm using:
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = 0x2000A000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 40K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
}
/* Define output sections */
SECTIONS
{
/* The startup code goes first into FLASH */
.isr_vector :
{
. = ALIGN(8);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(8);
} >FLASH
/* The program code and other data goes into FLASH */
.text :
{
. = ALIGN(8);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(8);
_etext = .; /* define a global symbols at end of code */
} >FLASH
/* Constant data goes into FLASH */
.rodata :
{
. = ALIGN(8);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(8);
} >FLASH
.ARM.extab :
{
. = ALIGN(8);
*(.ARM.extab* .gnu.linkonce.armextab.*)
. = ALIGN(8);
} >FLASH
.ARM : {
. = ALIGN(8);
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
. = ALIGN(8);
} >FLASH
.preinit_array :
{
. = ALIGN(8);
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
. = ALIGN(8);
} >FLASH
.init_array :
{
. = ALIGN(8);
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
. = ALIGN(8);
} >FLASH
.fini_array :
{
. = ALIGN(8);
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
. = ALIGN(8);
} >FLASH
/* used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections goes into RAM, load LMA copy after code */
.data :
{
. = ALIGN(8);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(8);
_edata = .; /* define a global symbol at data end */
} >RAM AT> FLASH
/* Uninitialized data section */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss section */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >RAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
/* Remove information from the standard libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
This is the startup code I am using:
.syntax unified
.cpu cortex-m4
.fpu softvfp
.thumb
.global g_pfnVectors
.global Default_Handler
/* start address for the initialization values of the .data section.
defined in linker script */
.word _sidata
/* start address for the .data section. defined in linker script */
.word _sdata
/* end address for the .data section. defined in linker script */
.word _edata
/* start address for the .bss section. defined in linker script */
.word _sbss
/* end address for the .bss section. defined in linker script */
.word _ebss
.equ BootRAM, 0xF1E0F85F
/**
* @brief This is the code that gets called when the processor first
* starts execution following a reset event. Only the absolutely
* necessary set is performed, after which the application
* supplied main() routine is called.
* @param None
* @retval : None
*/
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
ldr sp, =_estack /* Set stack pointer */
/* Copy the data segment initializers from flash to SRAM */
movs r1, #0
b LoopCopyDataInit
CopyDataInit:
ldr r3, =_sidata
ldr r3, [r3, r1]
str r3, [r0, r1]
adds r1, r1, #4
LoopCopyDataInit:
ldr r0, =_sdata
ldr r3, =_edata
adds r2, r0, r1
cmp r2, r3
bcc CopyDataInit
ldr r2, =_sbss
b LoopFillZerobss
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2], #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system initialization function.*/
bl SystemInit
/* Call CRT entry point */
//bl _mainCRTStartup
bl __libc_init_array
bl main
LoopForever:
b LoopForever
.size Reset_Handler, .-Reset_Handler
/**
* @brief This is the code that gets called when the processor receives an
* unexpected interrupt. This simply enters an infinite loop, preserving
* the system state for examination by a debugger.
*
* @param None
* @retval : None
*/
.section .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
b Infinite_Loop
.size Default_Handler, .-Default_Handler
/******************************************************************************
*
* The minimal vector table for a Cortex-M4. Note that the proper constructs
* must be placed on this to ensure that it ends up at physical address
* 0x0000.0000.
*
******************************************************************************/
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
g_pfnVectors:
.word _estack
.word Reset_Handler
.word NMI_Handler
.word HardFault_Handler
.word MemManage_Handler
.word BusFault_Handler
.word UsageFault_Handler
.word 0
.word 0
.word 0
.word 0
.word SVC_Handler
.word DebugMon_Handler
.word 0
.word PendSV_Handler
.word SysTick_Handler
.word WWDG_IRQHandler
.word PVD_PVM_IRQHandler
.word TAMP_STAMP_IRQHandler
.word RTC_WKUP_IRQHandler
.word FLASH_IRQHandler
.word RCC_IRQHandler
.word EXTI0_IRQHandler
.word EXTI1_IRQHandler
.word EXTI2_IRQHandler
.word EXTI3_IRQHandler
.word EXTI4_IRQHandler
.word DMA1_Channel1_IRQHandler
.word DMA1_Channel2_IRQHandler
.word DMA1_Channel3_IRQHandler
.word DMA1_Channel4_IRQHandler
.word DMA1_Channel5_IRQHandler
.word DMA1_Channel6_IRQHandler
.word DMA1_Channel7_IRQHandler
.word ADC1_2_IRQHandler
.word 0
.word 0
.word 0
.word 0
.word EXTI9_5_IRQHandler
.word TIM1_BRK_TIM15_IRQHandler
.word TIM1_UP_TIM16_IRQHandler
.word TIM1_TRG_COM_IRQHandler
.word TIM1_CC_IRQHandler
.word TIM2_IRQHandler
.word 0
.word 0
.word I2C1_EV_IRQHandler
.word I2C1_ER_IRQHandler
.word I2C2_EV_IRQHandler
.word I2C2_ER_IRQHandler
.word SPI1_IRQHandler
.word SPI2_IRQHandler
.word USART1_IRQHandler
.word USART2_IRQHandler
.word USART3_IRQHandler
.word EXTI15_10_IRQHandler
.word RTC_Alarm_IRQHandler
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word TIM6_IRQHandler
.word 0
.word DMA2_Channel1_IRQHandler
.word DMA2_Channel2_IRQHandler
.word DMA2_Channel3_IRQHandler
.word DMA2_Channel4_IRQHandler
.word DMA2_Channel5_IRQHandler
.word 0
.word 0
.word 0
.word COMP_IRQHandler
.word LPTIM1_IRQHandler
.word LPTIM2_IRQHandler
.word USB_IRQHandler
.word DMA2_Channel6_IRQHandler
.word DMA2_Channel7_IRQHandler
.word LPUART1_IRQHandler
.word QUADSPI_IRQHandler
.word I2C3_EV_IRQHandler
.word I2C3_ER_IRQHandler
.word 0
.word 0
.word 0
.word TSC_IRQHandler
.word 0
.word AES_IRQHandler
.word RNG_IRQHandler
.word FPU_IRQHandler
.word CRS_IRQHandler
/*******************************************************************************
*
* Provide weak aliases for each Exception handler to the Default_Handler.
* As they are weak aliases, any function with the same name will override
* this definition.
*
*******************************************************************************/
.weak NMI_Handler
.thumb_set NMI_Handler,Default_Handler
.weak HardFault_Handler
.thumb_set HardFault_Handler,Default_Handler
.weak MemManage_Handler
.thumb_set MemManage_Handler,Default_Handler
.weak BusFault_Handler
.thumb_set BusFault_Handler,Default_Handler
.weak UsageFault_Handler
.thumb_set UsageFault_Handler,Default_Handler
.weak SVC_Handler
.thumb_set SVC_Handler,Default_Handler
.weak DebugMon_Handler
.thumb_set DebugMon_Handler,Default_Handler
.weak PendSV_Handler
.thumb_set PendSV_Handler,Default_Handler
.weak SysTick_Handler
.thumb_set SysTick_Handler,Default_Handler
.weak WWDG_IRQHandler
.thumb_set WWDG_IRQHandler,Default_Handler
.weak PVD_PVM_IRQHandler
.thumb_set PVD_PVM_IRQHandler,Default_Handler
.weak TAMP_STAMP_IRQHandler
.thumb_set TAMP_STAMP_IRQHandler,Default_Handler
.weak RTC_WKUP_IRQHandler
.thumb_set RTC_WKUP_IRQHandler,Default_Handler
.weak FLASH_IRQHandler
.thumb_set FLASH_IRQHandler,Default_Handler
.weak RCC_IRQHandler
.thumb_set RCC_IRQHandler,Default_Handler
.weak EXTI0_IRQHandler
.thumb_set EXTI0_IRQHandler,Default_Handler
.weak EXTI1_IRQHandler
.thumb_set EXTI1_IRQHandler,Default_Handler
.weak EXTI2_IRQHandler
.thumb_set EXTI2_IRQHandler,Default_Handler
.weak EXTI3_IRQHandler
.thumb_set EXTI3_IRQHandler,Default_Handler
.weak EXTI4_IRQHandler
.thumb_set EXTI4_IRQHandler,Default_Handler
.weak DMA1_Channel1_IRQHandler
.thumb_set DMA1_Channel1_IRQHandler,Default_Handler
.weak DMA1_Channel2_IRQHandler
.thumb_set DMA1_Channel2_IRQHandler,Default_Handler
.weak DMA1_Channel3_IRQHandler
.thumb_set DMA1_Channel3_IRQHandler,Default_Handler
.weak DMA1_Channel4_IRQHandler
.thumb_set DMA1_Channel4_IRQHandler,Default_Handler
.weak DMA1_Channel5_IRQHandler
.thumb_set DMA1_Channel5_IRQHandler,Default_Handler
.weak DMA1_Channel6_IRQHandler
.thumb_set DMA1_Channel6_IRQHandler,Default_Handler
.weak DMA1_Channel7_IRQHandler
.thumb_set DMA1_Channel7_IRQHandler,Default_Handler
.weak ADC1_2_IRQHandler
.thumb_set ADC1_2_IRQHandler,Default_Handler
.weak EXTI9_5_IRQHandler
.thumb_set EXTI9_5_IRQHandler,Default_Handler
.weak TIM1_BRK_TIM15_IRQHandler
.thumb_set TIM1_BRK_TIM15_IRQHandler,Default_Handler
.weak TIM1_UP_TIM16_IRQHandler
.thumb_set TIM1_UP_TIM16_IRQHandler,Default_Handler
.weak TIM1_TRG_COM_IRQHandler
.thumb_set TIM1_TRG_COM_IRQHandler,Default_Handler
.weak TIM1_CC_IRQHandler
.thumb_set TIM1_CC_IRQHandler,Default_Handler
.weak TIM2_IRQHandler
.thumb_set TIM2_IRQHandler,Default_Handler
.weak I2C1_EV_IRQHandler
.thumb_set I2C1_EV_IRQHandler,Default_Handler
.weak I2C1_ER_IRQHandler
.thumb_set I2C1_ER_IRQHandler,Default_Handler
.weak I2C2_EV_IRQHandler
.thumb_set I2C2_EV_IRQHandler,Default_Handler
.weak I2C2_ER_IRQHandler
.thumb_set I2C2_ER_IRQHandler,Default_Handler
.weak SPI1_IRQHandler
.thumb_set SPI1_IRQHandler,Default_Handler
.weak SPI2_IRQHandler
.thumb_set SPI2_IRQHandler,Default_Handler
.weak USART1_IRQHandler
.thumb_set USART1_IRQHandler,Default_Handler
.weak USART2_IRQHandler
.thumb_set USART2_IRQHandler,Default_Handler
.weak USART3_IRQHandler
.thumb_set USART3_IRQHandler,Default_Handler
.weak EXTI15_10_IRQHandler
.thumb_set EXTI15_10_IRQHandler,Default_Handler
.weak RTC_Alarm_IRQHandler
.thumb_set RTC_Alarm_IRQHandler,Default_Handler
.weak TIM6_IRQHandler
.thumb_set TIM6_IRQHandler,Default_Handler
.weak DMA2_Channel1_IRQHandler
.thumb_set DMA2_Channel1_IRQHandler,Default_Handler
.weak DMA2_Channel2_IRQHandler
.thumb_set DMA2_Channel2_IRQHandler,Default_Handler
.weak DMA2_Channel3_IRQHandler
.thumb_set DMA2_Channel3_IRQHandler,Default_Handler
.weak DMA2_Channel4_IRQHandler
.thumb_set DMA2_Channel4_IRQHandler,Default_Handler
.weak DMA2_Channel5_IRQHandler
.thumb_set DMA2_Channel5_IRQHandler,Default_Handler
.weak COMP_IRQHandler
.thumb_set COMP_IRQHandler,Default_Handler
.weak LPTIM1_IRQHandler
.thumb_set LPTIM1_IRQHandler,Default_Handler
.weak LPTIM2_IRQHandler
.thumb_set LPTIM2_IRQHandler,Default_Handler
.weak USB_IRQHandler
.thumb_set USB_IRQHandler,Default_Handler
.weak DMA2_Channel6_IRQHandler
.thumb_set DMA2_Channel6_IRQHandler,Default_Handler
.weak DMA2_Channel7_IRQHandler
.thumb_set DMA2_Channel7_IRQHandler,Default_Handler
.weak LPUART1_IRQHandler
.thumb_set LPUART1_IRQHandler,Default_Handler
.weak QUADSPI_IRQHandler
.thumb_set QUADSPI_IRQHandler,Default_Handler
.weak I2C3_EV_IRQHandler
.thumb_set I2C3_EV_IRQHandler,Default_Handler
.weak I2C3_ER_IRQHandler
.thumb_set I2C3_ER_IRQHandler,Default_Handler
.weak TSC_IRQHandler
.thumb_set TSC_IRQHandler,Default_Handler
.weak AES_IRQHandler
.thumb_set AES_IRQHandler,Default_Handler
.weak RNG_IRQHandler
.thumb_set RNG_IRQHandler,Default_Handler
.weak FPU_IRQHandler
.thumb_set FPU_IRQHandler,Default_Handler
.weak CRS_IRQHandler
.thumb_set CRS_IRQHandler,Default_Handler
Both the startup code and the linker script were generated by STM32CubeMX.
Here is the main function that gets called from the startup code:
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USB_DEVICE_Init();
while (1)
{
}
}
It initializes the peripherals and the USB CDC device.
Disassembly of vector table:
Disassembly of section .isr_vector:
08000000 <g_pfnVectors>:
8000000: 00 a0 adr r0, #0
8000002: 00 20 movs r0, #0
8000004: 81 53 strh r1, [r0, r6]
8000006: 00 08 lsrs r0, r0, #32
8000008: a5 04 lsls r5, r4, #18
800000a: 00 08 lsrs r0, r0, #32
800000c: a7 04 lsls r7, r4, #18
800000e: 00 08 lsrs r0, r0, #32
8000010: ab 04 lsls r3, r5, #18
8000012: 00 08 lsrs r0, r0, #32
8000014: af 04 lsls r7, r5, #18
8000016: 00 08 lsrs r0, r0, #32
8000018: b3 04 lsls r3, r6, #18
800001a: 00 08 lsrs r0, r0, #32
...
800002c: b7 04 lsls r7, r6, #18
800002e: 00 08 lsrs r0, r0, #32
8000030: b9 04 lsls r1, r7, #18
8000032: 00 08 lsrs r0, r0, #32
8000034: 00 00 movs r0, r0
8000036: 00 00 movs r0, r0
8000038: bb 04 lsls r3, r7, #18
800003a: 00 08 lsrs r0, r0, #32
800003c: bd 04 lsls r5, r7, #18
800003e: 00 08 lsrs r0, r0, #32
8000040: d1 53 strh r1, [r2, r7]
8000042: 00 08 lsrs r0, r0, #32
8000044: d1 53 strh r1, [r2, r7]
8000046: 00 08 lsrs r0, r0, #32
8000048: d1 53 strh r1, [r2, r7]
800004a: 00 08 lsrs r0, r0, #32
800004c: d1 53 strh r1, [r2, r7]
800004e: 00 08 lsrs r0, r0, #32
8000050: d1 53 strh r1, [r2, r7]
8000052: 00 08 lsrs r0, r0, #32
8000054: d1 53 strh r1, [r2, r7]
8000056: 00 08 lsrs r0, r0, #32
8000058: d1 53 strh r1, [r2, r7]
800005a: 00 08 lsrs r0, r0, #32
800005c: d1 53 strh r1, [r2, r7]
800005e: 00 08 lsrs r0, r0, #32
8000060: d1 53 strh r1, [r2, r7]
8000062: 00 08 lsrs r0, r0, #32
8000064: d1 53 strh r1, [r2, r7]
8000066: 00 08 lsrs r0, r0, #32
8000068: d1 53 strh r1, [r2, r7]
800006a: 00 08 lsrs r0, r0, #32
800006c: d1 53 strh r1, [r2, r7]
800006e: 00 08 lsrs r0, r0, #32
8000070: d1 53 strh r1, [r2, r7]
8000072: 00 08 lsrs r0, r0, #32
8000074: d1 53 strh r1, [r2, r7]
8000076: 00 08 lsrs r0, r0, #32
8000078: d1 53 strh r1, [r2, r7]
800007a: 00 08 lsrs r0, r0, #32
800007c: d1 53 strh r1, [r2, r7]
800007e: 00 08 lsrs r0, r0, #32
8000080: d1 53 strh r1, [r2, r7]
8000082: 00 08 lsrs r0, r0, #32
8000084: d1 53 strh r1, [r2, r7]
8000086: 00 08 lsrs r0, r0, #32
8000088: d1 53 strh r1, [r2, r7]
800008a: 00 08 lsrs r0, r0, #32
...
800009c: d1 53 strh r1, [r2, r7]
800009e: 00 08 lsrs r0, r0, #32
80000a0: d1 53 strh r1, [r2, r7]
80000a2: 00 08 lsrs r0, r0, #32
80000a4: d1 53 strh r1, [r2, r7]
80000a6: 00 08 lsrs r0, r0, #32
80000a8: d1 53 strh r1, [r2, r7]
80000aa: 00 08 lsrs r0, r0, #32
80000ac: d1 53 strh r1, [r2, r7]
80000ae: 00 08 lsrs r0, r0, #32
80000b0: d1 53 strh r1, [r2, r7]
80000b2: 00 08 lsrs r0, r0, #32
...
80000bc: d1 53 strh r1, [r2, r7]
80000be: 00 08 lsrs r0, r0, #32
80000c0: d1 53 strh r1, [r2, r7]
80000c2: 00 08 lsrs r0, r0, #32
80000c4: d1 53 strh r1, [r2, r7]
80000c6: 00 08 lsrs r0, r0, #32
80000c8: d1 53 strh r1, [r2, r7]
80000ca: 00 08 lsrs r0, r0, #32
80000cc: d1 53 strh r1, [r2, r7]
80000ce: 00 08 lsrs r0, r0, #32
80000d0: d1 53 strh r1, [r2, r7]
80000d2: 00 08 lsrs r0, r0, #32
80000d4: d1 53 strh r1, [r2, r7]
80000d6: 00 08 lsrs r0, r0, #32
80000d8: d1 53 strh r1, [r2, r7]
80000da: 00 08 lsrs r0, r0, #32
80000dc: d1 53 strh r1, [r2, r7]
80000de: 00 08 lsrs r0, r0, #32
80000e0: d1 53 strh r1, [r2, r7]
80000e2: 00 08 lsrs r0, r0, #32
80000e4: d1 53 strh r1, [r2, r7]
80000e6: 00 08 lsrs r0, r0, #32
...
8000118: d1 53 strh r1, [r2, r7]
800011a: 00 08 lsrs r0, r0, #32
800011c: 00 00 movs r0, r0
800011e: 00 00 movs r0, r0
8000120: d1 53 strh r1, [r2, r7]
8000122: 00 08 lsrs r0, r0, #32
8000124: d1 53 strh r1, [r2, r7]
8000126: 00 08 lsrs r0, r0, #32
8000128: d1 53 strh r1, [r2, r7]
800012a: 00 08 lsrs r0, r0, #32
800012c: d1 53 strh r1, [r2, r7]
800012e: 00 08 lsrs r0, r0, #32
8000130: d1 53 strh r1, [r2, r7]
8000132: 00 08 lsrs r0, r0, #32
...
8000140: d1 53 strh r1, [r2, r7]
8000142: 00 08 lsrs r0, r0, #32
8000144: d1 53 strh r1, [r2, r7]
8000146: 00 08 lsrs r0, r0, #32
8000148: d1 53 strh r1, [r2, r7]
800014a: 00 08 lsrs r0, r0, #32
800014c: c7 04 lsls r7, r0, #19
800014e: 00 08 lsrs r0, r0, #32
8000150: d1 53 strh r1, [r2, r7]
8000152: 00 08 lsrs r0, r0, #32
8000154: d1 53 strh r1, [r2, r7]
8000156: 00 08 lsrs r0, r0, #32
8000158: d1 53 strh r1, [r2, r7]
800015a: 00 08 lsrs r0, r0, #32
800015c: d1 53 strh r1, [r2, r7]
800015e: 00 08 lsrs r0, r0, #32
8000160: d1 53 strh r1, [r2, r7]
8000162: 00 08 lsrs r0, r0, #32
8000164: d1 53 strh r1, [r2, r7]
8000166: 00 08 lsrs r0, r0, #32
...
8000174: d1 53 strh r1, [r2, r7]
8000176: 00 08 lsrs r0, r0, #32
8000178: 00 00 movs r0, r0
800017a: 00 00 movs r0, r0
800017c: d1 53 strh r1, [r2, r7]
800017e: 00 08 lsrs r0, r0, #32
8000180: d1 53 strh r1, [r2, r7]
8000182: 00 08 lsrs r0, r0, #32
8000184: d1 53 strh r1, [r2, r7]
8000186: 00 08 lsrs r0, r0, #32
8000188: d1 53 strh r1, [r2, r7]
800018a: 00 08 lsrs r0, r0, #32
800018c: 00 00 movs r0, r0
800018e: 00 00 movs r0, r0
So this inspired me to try clang/llvm again after a few years off...
I am on Linux, not Windows, but you should be able to adapt this to Windows (or of course dual boot Linux, put Linux in a VM, or whatever).
This is derived from the build instructions on the clang/llvm site(s):
rm -rf /opt/llvmv6m
rm -rf llvm-project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_CROSSCOMPILING=True -DCMAKE_INSTALL_PREFIX=/opt/llvmv6m -DLLVM_DEFAULT_TARGET_TRIPLE=armv6m-none-eabi -DLLVM_TARGET_ARCH=ARM -DLLVM_TARGETS_TO_BUILD=ARM -G "Unix Makefiles" ../llvm
make
sudo make install
That is how I built it. Yes, I know yours is a cortex-m4; all the cortex-ms (so far) support armv6-m, and you can easily make this armv7-m. This was an experiment based on those web pages, and interestingly I now don't have to specify the architecture or cpu on the command line. I am curious to know whether this is still a generic clang cross compiler with armv6m as just the default. Anyway...
This is a little more complicated than a simple infinite loop, but it plays with llvm features that you don't get in gnu.
start.s
.thumb
.cpu cortex-m0
.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.thumb_func
reset:
bl notmain
.thumb_func
loop:
b .
notmain.c
unsigned int fun ( void );
unsigned int notmain ( void )
{
return(fun());
}
fun.c
unsigned int fun ( void )
{
return(5);
}
memmap
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Makefile
all :
arm-none-eabi-as start.s -o start.o
clang -O2 -fomit-frame-pointer -c notmain.c -o notmain.o
clang -O2 -fomit-frame-pointer -c fun.c -o fun.o
arm-none-eabi-ld -T memmap start.o notmain.o fun.o -o basic.elf
arm-none-eabi-objdump -D basic.elf > basic.list
clang -fomit-frame-pointer -c -emit-llvm notmain.c -o notmain.bc
clang -fomit-frame-pointer -c -emit-llvm fun.c -o fun.bc
llc $(LLCOPS) notmain.bc -filetype=obj -o notmain.not.o
llc $(LLCOPS) fun.bc -filetype=obj -o fun.not.o
arm-none-eabi-ld -T memmap start.o notmain.not.o fun.not.o -o not.elf
arm-none-eabi-objdump -D not.elf > not.list
llvm-link notmain.bc fun.bc -o notmain.not.bc
opt -O2 notmain.not.bc -o notmain.opt.bc
llc $(LLCOPS) notmain.opt.bc -filetype=obj -o notmain.opt.o
arm-none-eabi-ld -T memmap start.o notmain.opt.o -o opt.elf
arm-none-eabi-objdump -D opt.elf > opt.list
clean:
rm -f *.S
rm -f *.o
rm -f *.list
rm -f *.elf
rm -f *.bc
basic.list
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000015 stmdaeq r0, {r0, r2, r4}
800000c: 08000015 stmdaeq r0, {r0, r2, r4}
08000010 <reset>:
8000010: f000 f802 bl 8000018 <notmain>
08000014 <loop>:
8000014: e7fe b.n 8000014 <loop>
...
08000018 <notmain>:
8000018: b580 push {r7, lr}
800001a: f000 f801 bl 8000020 <fun>
800001e: bd80 pop {r7, pc}
08000020 <fun>:
8000020: 2005 movs r0, #5
8000022: 4770 bx lr
not.list
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000015 stmdaeq r0, {r0, r2, r4}
800000c: 08000015 stmdaeq r0, {r0, r2, r4}
08000010 <reset>:
8000010: f000 f802 bl 8000018 <notmain>
08000014 <loop>:
8000014: e7fe b.n 8000014 <loop>
...
08000018 <notmain>:
8000018: b580 push {r7, lr}
800001a: f000 f801 bl 8000020 <fun>
800001e: bd80 pop {r7, pc}
08000020 <fun>:
8000020: 2005 movs r0, #5
8000022: 4770 bx lr
opt.list
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000015 stmdaeq r0, {r0, r2, r4}
800000c: 08000015 stmdaeq r0, {r0, r2, r4}
08000010 <reset>:
8000010: f000 f802 bl 8000018 <notmain>
08000014 <loop>:
8000014: e7fe b.n 8000014 <loop>
...
08000018 <notmain>:
8000018: b580 push {r7, lr}
800001a: f000 f802 bl 8000022 <fun>
800001e: 2005 movs r0, #5
8000020: bd80 pop {r7, pc}
08000022 <fun>:
8000022: 2005 movs r0, #5
8000024: 4770 bx lr
The fun part here is that you can optimize across files/objects, which you can't do with gnu tools AFAIK. That said, LLVM actually did a really bad job here: notmain still calls fun and then loads 5 into r0 anyway. I am going to have to look into this.
I used gnu's linker and assembler here; I am still not sure how to get around the error I get when trying to build with clang alone.
These examples are all generic enough to run on your processor as shown. There are some key traps to look for with a new project or tool.
08000000 <_start>:
8000000: 20001000
8000004: 08000011
8000008: 08000015
800000c: 08000015
08000010 <reset>:
08000014 <loop>:
For a cortex-m to boot properly and not immediately fail, the vector table entries need to have the lsbit set, starting with reset. In this case reset is at 0x08000010, so the vector table entry needs to be 0x08000011 for that code to be run. That is what we see here, so we won't fail due to that.
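The lsbit rule can be written down as a tiny check. This is only a sketch: the function names thumb_vector and vector_has_thumb_bit are made up for illustration and are not part of any toolchain or vendor header.

```c
#include <stdint.h>

/* Hypothetical helper: form a vector table entry from a handler
   address by setting the least significant bit. */
uint32_t thumb_vector(uint32_t handler_address)
{
    return handler_address | 1u;
}

/* Hypothetical helper: returns 1 if a vector table entry has its
   least significant bit set, 0 otherwise. */
int vector_has_thumb_bit(uint32_t entry)
{
    return (int)(entry & 1u);
}
```

With the addresses from the listing, reset at 0x08000010 gives an entry of 0x08000011 and loop at 0x08000014 gives 0x08000015. Normally the assembler does this for you when the label is marked with .thumb_func, which is why the listing already shows the odd values.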
While some mcus don't have 0x1000 bytes of ram, I assume yours does, so 0x20001000 is an okay starting place for the stack pointer.
From there, this is again not just an infinite loop; it is more complicated, but it should run on your processor and not fail. If you change it to this:
.thumb
.cpu cortex-m0
.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.thumb_func
reset:
mov r0,#0
b reset
.thumb_func
loop:
b .
(Granted, this becomes a gnu tools project, not llvm/clang. What assembler and what linker are you using?)
then with a debugger (stlink plus openocd plus telnet) you can stop and resume and examine r0 to see that it is running.
.thumb
.cpu cortex-m0
.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.thumb_func
reset:
bl notmain
.thumb_func
loop:
b .
.thumb_func
.globl bounce
bounce:
bx lr
void bounce ( unsigned int );
unsigned int notmain ( void )
{
unsigned int ra;
for(ra=0;;ra++) bounce(ra);
return(0);
}
This adds a little clang/llvm to it, and you can see that r0 is changing if you stop/resume.
Some chips look to see whether the vectors are there; if the chip sees 0xFFs it may go into a bootloader. So with the debugger you can also examine 0x00000000 and 0x08000000. Try reset halt on the telnet/openocd command line and then mdw 0 20 to see what the cpu is going to see at address zero, and whether it is your vector table.
If you get past these very simple but very fatal common problems, then you may be dealing with something else. For example, clang doesn't like while(1) loops; maybe they finally fixed that bug, but when I filed it they refused to. So if there is code that uses a while(1) to wait for a status bit to change, maybe that's the problem. I would take baby steps after the above, adding one thing at a time to main as you have been. Try the clocks: perhaps have an infinite loop ((asm) function) that you call after clock init, and see whether clock init runs to completion and returns back to main().
Are you using clang/llvm to build the libraries you are using or are they pre-built for you to use with clang/llvm?
Edit
Did more work, in case the above is relevant to anyone and doesn't get deleted.
I changed the cmake option to
-DLLVM_ENABLE_PROJECTS='clang;lld'
and the linker script to
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
/DISCARD/ : {
*(.ARM.exidx*)
}
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
and maybe another couple of things. Now the disassembly resembles yours too, with the data shown as individual little endian bytes.
Disassembly of section .text:
08000000 _start:
8000000: 00 10 asrs r0, r0, #32
8000002: 00 20 movs r0, #0
8000004: 11 00 movs r1, r2
8000006: 00 08 lsrs r0, r0, #32
8000008: 15 00 movs r5, r2
800000a: 00 08 lsrs r0, r0, #32
800000c: 15 00 movs r5, r2
800000e: 00 08 lsrs r0, r0, #32
08000010 reset:
8000010: 00 f0 02 f8 bl #4
08000014 loop:
8000014: fe e7 b #-4 <loop>
8000016: d4 d4 bmi #-88 <start.c+0x7ffffc2>
08000018 notmain:
8000018: 80 b5 push {r7, lr}
800001a: 00 f0 02 f8 bl #4
800001e: 05 20 movs r0, #5
8000020: 80 bd pop {r7, pc}
08000022 fun:
8000022: 05 20 movs r0, #5
8000024: 70 47 bx lr
Now on to your comment about 400MB.
0x20000000 - 0x08000000 = 0x18000000 = 402653184.
And this is probably your problem, so it sounds like you have some .data.
Let me start a new one:
start.s
.text
/*.syntax unified*/
.cpu cortex-m0
.code 16
.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.thumb_func
reset:
bl notmain
.thumb_func
loop:
b .
notmain.c
unsigned int notmain ( void )
{
return(7);
}
Makefile
all :
clang -c start.s -o start.o
clang -O2 -fomit-frame-pointer -c notmain.c -o notmain.o
ld.lld -T memmap start.o notmain.o -o basic.elf
llvm-objdump -D basic.elf > basic.list
llvm-objcopy -O binary basic.elf basic.bin
clean:
rm -f *.o
rm -f *.list
rm -f *.elf
memmap
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
/DISCARD/ : {
*(.ARM.exidx*)
}
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
and this produces a 28 byte basic.bin file
08000000 _start:
8000000: 00 10 asrs r0, r0, #32
8000002: 00 20 movs r0, #0
8000004: 11 00 movs r1, r2
8000006: 00 08 lsrs r0, r0, #32
8000008: 15 00 movs r5, r2
800000a: 00 08 lsrs r0, r0, #32
800000c: 15 00 movs r5, r2
800000e: 00 08 lsrs r0, r0, #32
08000010 reset:
8000010: 00 f0 02 f8 bl #4
08000014 loop:
8000014: fe e7 b #-4 <loop>
8000016: d4 d4 bmi #-88 <start.c+0x7ffffc2>
08000018 notmain:
8000018: 07 20 movs r0, #7
800001a: 70 47 bx lr
Now let's add .data:
unsigned int x = 5;
unsigned int notmain ( void )
{
return(7);
}
now my basic.bin is 402653188 bytes.
What is going on is that objcopy is making a binary memory image that starts at the first loadable or relevant address and ends with the last one, so
Disassembly of section .text:
08000000 _start:
8000000: 00 10 asrs r0, r0, #32
8000002: 00 20 movs r0, #0
8000004: 11 00 movs r1, r2
8000006: 00 08 lsrs r0, r0, #32
8000008: 15 00 movs r5, r2
800000a: 00 08 lsrs r0, r0, #32
800000c: 15 00 movs r5, r2
800000e: 00 08 lsrs r0, r0, #32
08000010 reset:
8000010: 00 f0 02 f8 bl #4
08000014 loop:
8000014: fe e7 b #-4 <loop>
8000016: d4 d4 bmi #-88 <start.c+0x7ffffc2>
08000018 notmain:
8000018: 07 20 movs r0, #7
800001a: 70 47 bx lr
Disassembly of section .data:
20000000 x:
20000000: 05 00 movs r5, r0
20000002: 00 00 movs r0, r0
the image runs from 0x08000000 to 0x20000003 inclusive:
0x20000004 - 0x08000000 = 402653188, which is exactly the file size.
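The size arithmetic for a flat binary can be sketched in a few lines. The function name flat_image_size is invented for this example, and it assumes the image simply spans from the lowest loaded address up to one past the last loaded byte.

```c
#include <stdint.h>

/* Size in bytes of a flat binary image covering [lowest, highest_end),
   where highest_end is one past the last loaded byte.
   Hypothetical helper, for illustration only. */
uint64_t flat_image_size(uint32_t lowest, uint32_t highest_end)
{
    return (uint64_t)highest_end - (uint64_t)lowest;
}
```

With only .text present, the image covers 0x08000000 up to 0x0800001c, which is the 28 byte file. Once a 4 byte .data item lands at 0x20000000, the end moves to 0x20000004 and the same subtraction gives the roughly 400MB file.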
You cannot load this into your microcontroller, and it wouldn't work anyway: your program needs to be contained in non-volatile memory (flash).
first step:
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
/DISCARD/ : {
*(.ARM.exidx*)
}
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram AT > rom
}
changing it to ram AT rom.
the basic.bin file is 32 bytes now.
00000000 00 10 00 20 11 00 00 08 15 00 00 08 15 00 00 08 |... ............|
00000010 00 f0 02 f8 fe e7 d4 d4 07 20 70 47 05 00 00 00 |......... pG....|
00000020
Disassembly of section .text:
08000000 _start:
8000000: 00 10 asrs r0, r0, #32
8000002: 00 20 movs r0, #0
8000004: 11 00 movs r1, r2
8000006: 00 08 lsrs r0, r0, #32
8000008: 15 00 movs r5, r2
800000a: 00 08 lsrs r0, r0, #32
800000c: 15 00 movs r5, r2
800000e: 00 08 lsrs r0, r0, #32
08000010 reset:
8000010: 00 f0 02 f8 bl #4
08000014 loop:
8000014: fe e7 b #-4 <loop>
8000016: d4 d4 bmi #-88 <start.c+0x7ffffc2>
08000018 notmain:
8000018: 07 20 movs r0, #7
800001a: 70 47 bx lr
Disassembly of section .data:
20000000 x:
20000000: 05 00 movs r5, r0
20000002: 00 00 movs r0, r0
notice the end of the binary file:
70 47 05 00 00 00
it has the last .text item 70 47 then the .data item.
and let the tools do the work for you
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
/DISCARD/ : {
*(.ARM.exidx*)
}
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
__data_rom_start__ = .;
.data : {
__data_start__ = .;
*(.data*)
} > ram AT > rom
__data_end__ = .;
__data_size__ = __data_end__ - __data_start__;
}
basic.bin still currently 32 bytes but
llvm-nm basic.elf
20000004 D __data_end__
0800001c T __data_rom_start__
00000004 A __data_size__
20000000 D __data_start__
08000000 T _start
08000014 t loop
08000018 T notmain
08000010 t reset
20000000 D x
Now we know that the embedded .data starts in flash at address 0x0800001c, is 4 bytes in size, and has a destination in ram of 0x20000000, so the bootstrap code can copy .data from flash to ram before calling the C entry point.
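The copy itself is a short loop. Here is a minimal sketch written as a plain C function so it can be tried on a host; the name copy_data is made up, and on a real target this work is usually done in the assembly bootstrap before any C runs.

```c
#include <stdint.h>

/* Copy initialized data word by word from its load address (flash)
   to its run address (ram). dst_end is one past the last word.
   Hypothetical helper, for illustration only. */
void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* On the target, the call might look like this, using the symbol names
 * from the linker script above (declaring them as arrays is one common
 * convention and is an assumption here):
 *
 *   extern uint32_t __data_rom_start__[], __data_start__[], __data_end__[];
 *   copy_data(__data_rom_start__, __data_start__, __data_end__);
 */
```

The function takes an end pointer rather than a byte count so that it lines up directly with the start/end symbols the linker script exports.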
Now, you have already done all of this, and I assume you knew all of this with respect to linker scripts and bootstrap, but you are getting that 400MB binary, which indicates there is something else leaking into the ram address space.
Examine the disassembly (objdump -D) and/or the readelf and/or nm outputs to find out what is out there, and add it to the linker script along with bootstrap code to copy it.
Adding some .bss
unsigned int x = 5;
unsigned int y;
unsigned int notmain ( void )
{
return(7);
}
From objdump
Disassembly of section .bss:
20000000 y:
...
Disassembly of section .data:
20000004 x:
20000004: 05 00 movs r5, r0
20000006: 00 00 movs r0, r0
From readelf
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x08000000 0x08000000 0x0001c 0x0001c R E 0x1000
LOAD 0x002000 0x20000000 0x20000000 0x00000 0x00004 RW 0x1000
LOAD 0x002004 0x20000004 0x0800001c 0x00004 0x00004 RW 0x1000
The .bss one looks a little scary, but it doesn't end up in the binary, which is again 32 bytes. We also see here that .data is physically in the flash but wants to be in ram, which is what we desire for this type of platform. Maybe from readelf you can find the leak into ram.
Your objcopy -O binary output should fit in flash and contain 100% of your program and data, otherwise it won't work. If you were to extract only the flash part from that 400MByte file, there might be some data items missing that the software expects to be there in order to operate. Or maybe it's some silly string table thing, or some other item that is not really meant for the binary but happens to have a section name not yet handled in the linker script.
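One way to hunt for the leak is to go through the LOAD lines of the readelf output and flag any segment that has bytes in the file but a physical address outside flash. The sketch below encodes that rule; the struct and function names are invented for this example, and the rule itself is my reading of the readelf columns, not something readelf checks for you.

```c
#include <stdint.h>

/* One LOAD program header, using the readelf column names.
   Hypothetical type, for illustration only. */
struct load_segment {
    uint32_t phys_addr;  /* PhysAddr: where the bytes are loaded from */
    uint32_t file_size;  /* FileSiz: bytes actually present in the file */
};

/* Returns 1 if the segment carries file bytes outside [rom_start, rom_end),
   which would stretch a flat binary beyond flash; returns 0 otherwise. */
int leaks_outside_rom(struct load_segment seg,
                      uint32_t rom_start, uint32_t rom_end)
{
    if (seg.file_size == 0u) {
        return 0; /* nothing stored in the file, e.g. .bss */
    }
    if (seg.phys_addr < rom_start) {
        return 1;
    }
    /* 64-bit arithmetic so phys_addr + file_size cannot wrap. */
    return ((uint64_t)seg.phys_addr + seg.file_size) > (uint64_t)rom_end;
}
```

Applied to the three LOAD lines above with rom at 0x08000000..0x08001000, none of them leak: the .bss segment has FileSiz 0, and .data has a PhysAddr of 0x0800001c. A segment with PhysAddr 0x20000000 and a non-zero FileSiz is the kind of entry to look for.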
Sorry, the 400MB should have been obvious to me from the start; it is yet another common linker script trap when prepping a new project. I personally never use .data nor rely on .bss, so I don't have these issues, but I am still very aware of them; your experience may be different. (It's more fun when you have .text at 0 and ram at 0x80000000 or even higher: you get files that are gigabytes in size.)

Difference between position dependent and position independent code?

I understand that the current gcc compilers generate position independent code by default. However, to get an understanding of what position dependent code looks like, I compiled this
int Add(int x, int y) {
return x+y;
}
int Subtract(int x, int y) {
return x-y;
}
int main() {
bool flag = false;
int x=10,y=5,z;
if (flag) {
z = Add(x,y);
}
else {
z = Subtract(x,y);
}
}
as g++ -c check.cpp -no-pie. However, the generated code is identical with or without the -no-pie flag. <main+0x34> looks to be a relative offset.
26: 55 push %rbp
27: 48 89 e5 mov %rsp,%rbp
2a: 48 83 ec 10 sub $0x10,%rsp
2e: c6 45 f3 00 movb $0x0,-0xd(%rbp)
32: c7 45 f4 0a 00 00 00 movl $0xa,-0xc(%rbp)
39: c7 45 f8 05 00 00 00 movl $0x5,-0x8(%rbp)
40: 80 7d f3 00 cmpb $0x0,-0xd(%rbp)
44: 74 14 je 5a <main+0x34>
46: 8b 55 f8 mov -0x8(%rbp),%edx
49: 8b 45 f4 mov -0xc(%rbp),%eax
4c: 89 d6 mov %edx,%esi
4e: 89 c7 mov %eax,%edi
50: e8 00 00 00 00 callq 55 <main+0x2f>
55: 89 45 fc mov %eax,-0x4(%rbp)
58: eb 12 jmp 6c <main+0x46>
5a: 8b 55 f8 mov -0x8(%rbp),%edx
5d: 8b 45 f4 mov -0xc(%rbp),%eax
60: 89 d6 mov %edx,%esi
62: 89 c7 mov %eax,%edi
64: e8 00 00 00 00 callq 69 <main+0x43>
69: 89 45 fc mov %eax,-0x4(%rbp)
6c: b8 00 00 00 00 mov $0x0,%eax
71: c9 leaveq
72: c3 retq
is the objdump in both cases for just main. Am I not using the correct flag, or is the assembly code supposed to be the same for PIC and non-PIC for this code chunk? If it is supposed to be the same, could you please provide a snippet for which it isn't?
You have to access items that are outside the module or section to see a difference.
unsigned int x;
void fun ( void )
{
x = 5;
}
So this reaches from .text into a data section (.bss here).
position dependent.
00000000 <fun>:
0: e3a02005 mov r2, #5
4: e59f3004 ldr r3, [pc, #4] ; 10 <fun+0x10>
8: e5832000 str r2, [r3]
c: e12fff1e bx lr
10: 00000000
position independent
00000000 <fun>:
0: e3a02005 mov r2, #5
4: e59f3010 ldr r3, [pc, #16] ; 1c <fun+0x1c>
8: e59f1010 ldr r1, [pc, #16] ; 20 <fun+0x20>
c: e08f3003 add r3, pc, r3
10: e7933001 ldr r3, [r3, r1]
14: e5832000 str r2, [r3]
18: e12fff1e bx lr
1c: 00000008
20: 00000000
In the first case the linker will fill in the address to the memory location
8: e5832000 str r2, [r3]
c: e12fff1e bx lr
10: 00000000 <--- here
the pc relative addressing from 4: to 10: is within the .text section so dependent or independent are fine.
4: e59f3004 ldr r3, [pc, #4] ; 10 <fun+0x10>
8: e5832000 str r2, [r3]
c: e12fff1e bx lr
10: 00000000
It gets the address of the external entity, filled in by the linker, and then directly accesses the item at that address.
4: e59f3010 ldr r3, [pc, #16] ; 1c <fun+0x1c>
8: e59f1010 ldr r1, [pc, #16] ; 20 <fun+0x20>
c: e08f3003 add r3, pc, r3
10: e7933001 ldr r3, [r3, r1]
14: e5832000 str r2, [r3]
18: e12fff1e bx lr
1c: 00000008
20: 00000000
The position independent case is easier to see once linked (-Ttext=0x1000 -Tdata=0x2000):
00001000 <fun>:
1000: e3a02005 mov r2, #5
1004: e59f3010 ldr r3, [pc, #16] ; 101c <fun+0x1c>
1008: e59f1010 ldr r1, [pc, #16] ; 1020 <fun+0x20>
100c: e08f3003 add r3, pc, r3
1010: e7933001 ldr r3, [r3, r1]
1014: e5832000 str r2, [r3]
1018: e12fff1e bx lr
101c: 00010010
1020: 0000000c
Disassembly of section .got:
00011024 <_GLOBAL_OFFSET_TABLE_>:
...
11030: 00002000
Disassembly of section .bss:
00002000 <x>:
2000: 00000000
(Clearly I should have also specified where the GOT goes.)
While the global offset table and .bss are different sections, once linked they are fixed relative to each other. What position independence gives you is the ability to move .bss (or .data, etc.) relative to .text. Think about the position dependent solution: if .bss were to move and you had, say, 1000 references sprinkled all through the binary, you would have to patch every one of those.
Instead, the global offset table here provides a single location where the address of the variable x lives, and every access to x essentially uses double indirection. It may not be obvious, but a position dependent way to get at a table like this would be for the linker to fill in its absolute address. That would not be independent, and this was compiled to be independent, so pc relative math has to be done to find the global offset table. For this instruction set, when executing the instruction at 0x100c the program counter reads as 0x100c+8.
100c: e08f3003 add r3, pc, r3
So we are adding 0x100C+8+0x00010010 = 0x11024, and then adding 0x0000000c to that, giving 0x11030. That is: compute the address of the GOT, then the offset within it, and THAT location holds the address of the item, 0x2000. You then do the second indirection there to get at the item.
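The address math can be replayed on a host as a quick sanity check. This is just the arithmetic from the listing wrapped in two functions whose names (got_base, got_entry_addr) are made up for illustration.

```c
#include <stdint.h>

/* In this instruction set, reading pc yields the address of the
   executing instruction plus 8. The GOT base is that value plus the
   word loaded from the literal pool. Hypothetical helper. */
uint32_t got_base(uint32_t add_insn_addr, uint32_t literal)
{
    return add_insn_addr + 8u + literal;
}

/* Address of one GOT slot: the base plus the per-variable offset.
   Hypothetical helper. */
uint32_t got_entry_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

Plugging in the values from the linked listing, got_base(0x100c, 0x00010010) is 0x11024 and got_entry_addr(0x11024, 0xc) is 0x11030, the slot that holds 0x2000.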
If you were to place .text at an address other than 0x1000 but not move .bss, that is fine; this will all work so long as the GOT moves to the same relative offset from .text. If you were to leave .text but move .bss, then you have to update the GOT. If you move .bss from 0x2000 to 0x3000, that is a difference of +0x1000, so you go through the GOT and add 0x1000 to each item to cover that difference.
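Patching the table is a one-line loop. The sketch below assumes every entry passed in is a data address that should move by the same amount (it ignores any reserved slots a real GOT may have), and the name relocate_got is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Add delta to each GOT entry, as a loader might after moving the data
   relative to the code. Assumes all entries are data addresses.
   Hypothetical helper, for illustration only. */
void relocate_got(uint32_t *got, size_t entries, uint32_t delta)
{
    for (size_t i = 0; i < entries; i++) {
        got[i] += delta;
    }
}
```

For the example above, an entry of 0x2000 patched with a delta of 0x1000 becomes 0x3000, and none of the code in .text has to be touched.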
Position independence essentially has to do double indirection instead of single indirection (or one more level than would have been needed for position dependent) in order to access distant items, or items whose position is not fixed relative to .text. That means more code and more memory accesses, so it is bigger and slower.
For it to work, .text reaching out to other .text items can't use fixed addresses; it has to use indirect/computed addresses. Likewise, the GOT as used here (by GNU) has to be at a fixed relative position to .text. From there you can move data relative to code and still access it. So you have to have some rules. .text, being code and assumed read only, can't hold this offset table, which needs to be in ram, so it can't simply be built into the .text section.

Are C compilers like gcc smart enough to bit shifts in place?

Unlike assembly code, in C there is no way to bit shift a value in place. To shift the bits in a variable, an assignment must always be performed:
x = x << 3;
Are compilers like gcc smart enough to realize that this is an in-place bit shift and compile it like this:
shl x, 3
or will the compiler put the result in a register first, then move it back into x (which would require two extra unnecessary instructions)?
Any good compiler with optimization turned on will handle bit shifts efficiently.
Compilers will keep small objects in registers when feasible and efficient and will not store them to memory even if you write assignment statements, until they are forced to by circumstances.
Additionally, it is not desirable on typical modern processors to try to shift the bits of a value in memory. Generally, memory hardware does not have any capability to manipulate stored values. To change the value of something in memory, it must be moved to the processor (loaded), changed, and moved back (stored). Whether this is done in one instruction or several is not generally an indication of how fast or efficient it is, because the processor still has to do the individual load, shift, store operations, and the performance of those is highly dependent on the processor model.
Except in exceptional programming situations, you should not be worrying about performance at this level.
what did you see when you tried it? why not just try it?
unsigned int fun ( unsigned int x )
{
return (x<<3);
}
Disassembly of section .text:
00000000 <fun>:
0: e1a00180 lsl r0, r0, #3
4: e12fff1e bx lr
Disassembly of section .text:
00000000 <_fun>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 1d40 0004 mov 4(r5), r0
8: 0cc0 asl r0
a: 0cc0 asl r0
c: 0cc0 asl r0
e: 1585 mov (sp)+, r5
10: 0087 rts pc
Disassembly of section .text:
0000000000000000 <fun>:
0: 531d7000 lsl w0, w0, #3
4: d65f03c0 ret
Disassembly of section .text:
0000000000000000 <fun>:
0: 8d 04 fd 00 00 00 00 lea 0x0(,%rdi,8),%eax
7: c3 retq
00000000 <fun>:
0: 42 18 0c 5c rpt #3 { rlax.w r12 ;
4: 30 41 ret
Disassembly of section .text:
00000000 <fun>:
0: 050e slli x10,x10,0x3
2: 8082 ret
unsigned int x;
void fun ( void )
{
x=x<<3;
}
Disassembly of section .text:
00000000 <fun>:
0: e59f200c ldr r2, [pc, #12] ; 14 <fun+0x14>
4: e5923000 ldr r3, [r2]
8: e1a03183 lsl r3, r3, #3
c: e5823000 str r3, [r2]
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
and so on
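As a side note on the premise of the question, C does have a compound assignment form of the shift, x <<= 3, which means the same thing as x = x << 3. A small sketch (function names made up for illustration):

```c
/* Plain assignment form from the question. */
unsigned int shift_assign(unsigned int x)
{
    x = x << 3;
    return x;
}

/* Compound assignment form: same meaning as x = x << 3. */
unsigned int shift_compound(unsigned int x)
{
    x <<= 3;
    return x;
}
```

Both functions return the same value for the same input. Whether either one ends up as a single instruction still depends on the target and the optimization level, as the listings above show.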

Bare metal programming Raspberry Pi 3.

I was going through some bare metal programming tutorials. While reading about C code execution I learned that we need to set up a C execution environment, such as initializing the stack, zeroing bss, etc.
In some cases you have to copy data into RAM, and need to provide startup code for that as well. The tutorial I was reading says to copy data into RAM.
Now I have two doubts.
If we need to copy data into RAM, then why don't we copy code, i.e. the text segment? If we don't copy the text segment, does it mean code is executed from the SD card itself in the case of the Raspberry Pi 3 (ARM embedded processor)?
When we specify a linker script like the one below, does it say to copy those sections into RAM, or will these sections be mapped to RAM addresses?
Sorry, I am really confused.
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
Any help is appreciated.
vectors.s
.globl _start
_start:
mov sp,#0x8000
bl notmain
b .
notmain.c
unsigned int x;
unsigned int y=0x12345678;
void notmain ( void )
{
x=y+7;
}
memmap
MEMORY
{
bob : ORIGIN = 0x80000000, LENGTH = 0x1000
ted : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ted
.rodata : { *(.rodata*) } > ted
.bss : { *(.bss*) } > ted
.data : { *(.data*) } > ted
}
build
arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c notmain.c -o notmain.o
arm-none-eabi-ld vectors.o notmain.o -T memmap -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf -O binary kernel.img
You can add/remove options, and name it the right kernelX.img (and if you are venturing into 64 bit, then use aarch64-whatever-gcc instead of arm-whatever-gcc).
Looking at the disassembly
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <notmain>
8008: eafffffe b 8008 <_start+0x8>
0000800c <notmain>:
800c: e59f3010 ldr r3, [pc, #16] ; 8024 <notmain+0x18>
8010: e5933000 ldr r3, [r3]
8014: e59f200c ldr r2, [pc, #12] ; 8028 <notmain+0x1c>
8018: e2833007 add r3, r3, #7
801c: e5823000 str r3, [r2]
8020: e12fff1e bx lr
8024: 00008030 andeq r8, r0, r0, lsr r0
8028: 0000802c andeq r8, r0, r12, lsr #32
Disassembly of section .bss:
0000802c <x>:
802c: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008030 <y>:
8030: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
and comparing that to the kernelX.img file
hexdump -C kernel.img
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 10 30 9f e5 |.............0..|
00000010 00 30 93 e5 0c 20 9f e5 07 30 83 e2 00 30 82 e5 |.0... ...0...0..|
00000020 1e ff 2f e1 30 80 00 00 2c 80 00 00 00 00 00 00 |../.0...,.......|
00000030 78 56 34 12 |xV4.|
00000034
Note that because I put .data after .bss in the linker script, they were placed in that order in the image. There are four bytes of zeros (the .bss) between the last word in .text and the 0x12345678 of .data.
If you swap the positions of .bss and .data in the linker script
0000802c <y>:
802c: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Disassembly of section .bss:
00008030 <x>:
8030: 00000000 andeq r0, r0, r0
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 10 30 9f e5 |.............0..|
00000010 00 30 93 e5 0c 20 9f e5 07 30 83 e2 00 30 82 e5 |.0... ...0...0..|
00000020 1e ff 2f e1 2c 80 00 00 30 80 00 00 78 56 34 12 |../.,...0...xV4.|
00000030
Oops, no freebie. Now .bss is not zeroed by the image, and you would need to zero it in your bootstrap (if you have a .bss area and, as a programming style, you assume those items are zero when you first use them).
Okay, so how do you find where .bss is? Well, that is what the tutorial and countless others are showing you.
.globl _start
_start:
mov sp,#0x8000
bl notmain
b .
linker_stuff:
.word hello_world
.word world_hello
MEMORY
{
bob : ORIGIN = 0x80000000, LENGTH = 0x1000
ted : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ted
.rodata : { *(.rodata*) } > ted
.data : { *(.data*) } > ted
hello_world = .;
.bss : { *(.bss*) } > ted
world_hello = .;
}
build and disassemble
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000002 bl 8014 <notmain>
8008: eafffffe b 8008 <_start+0x8>
0000800c <linker_stuff>:
800c: 00008038 andeq r8, r0, r8, lsr r0
8010: 0000803c andeq r8, r0, r12, lsr r0
00008014 <notmain>:
8014: e59f3010 ldr r3, [pc, #16] ; 802c <notmain+0x18>
8018: e5933000 ldr r3, [r3]
801c: e59f200c ldr r2, [pc, #12] ; 8030 <notmain+0x1c>
8020: e2833007 add r3, r3, #7
8024: e5823000 str r3, [r2]
8028: e12fff1e bx lr
802c: 00008034 andeq r8, r0, r4, lsr r0
8030: 00008038 andeq r8, r0, r8, lsr r0
Disassembly of section .data:
00008034 <y>:
8034: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Disassembly of section .bss:
00008038 <x>:
8038: 00000000 andeq r0, r0, r0
So, digging more into toolchain specific stuff, we now know both the start and end of .bss, or we can use math in the linker script to get the size. From that you can write a small loop that zeros that memory (in assembly language of course, chicken and egg, in the bootstrap before you branch to the C entry point of your program).
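The loop itself is tiny. It is shown here in C purely to illustrate the logic on a host; as noted, in a real bootstrap it would be written in assembly before C is usable. The name zero_range is made up for this sketch.

```c
#include <stdint.h>

/* Zero every word in [start, end). Illustration only: on the target
   this runs in the bootstrap, normally in assembly. */
void zero_range(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/* With the linker script above, start and end would correspond to the
 * hello_world and world_hello symbols (0x8038 and 0x803c in the listing).
 */
```

Working with start and end pointers avoids having to compute a length in the linker script at all.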
Now say for some reason you wanted .data at some other address 0x10000000
.globl _start
_start:
mov sp,#0x8000
bl notmain
b .
MEMORY
{
bob : ORIGIN = 0x10000000, LENGTH = 0x1000
ted : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ted
.rodata : { *(.rodata*) } > ted
.bss : { *(.bss*) } > ted
.data : { *(.data*) } > bob
}
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <notmain>
8008: eafffffe b 8008 <_start+0x8>
0000800c <notmain>:
800c: e59f3010 ldr r3, [pc, #16] ; 8024 <notmain+0x18>
8010: e5933000 ldr r3, [r3]
8014: e59f200c ldr r2, [pc, #12] ; 8028 <notmain+0x1c>
8018: e2833007 add r3, r3, #7
801c: e5823000 str r3, [r2]
8020: e12fff1e bx lr
8024: 10000000 andne r0, r0, r0
8028: 0000802c andeq r8, r0, r12, lsr #32
Disassembly of section .bss:
0000802c <x>:
802c: 00000000 andeq r0, r0, r0
Disassembly of section .data:
10000000 <y>:
10000000: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
So what is the kernel.img or -O binary format? It is just a memory image starting at the lowest address (0x8000 in this case) and filled OR PADDED to the highest address, in this case 0x10000003, so it is a 0x10000004-0x8000 byte file.
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 10 30 9f e5 |.............0..|
00000010 00 30 93 e5 0c 20 9f e5 07 30 83 e2 00 30 82 e5 |.0... ...0...0..|
00000020 1e ff 2f e1 00 00 00 10 2c 80 00 00 00 00 00 00 |../.....,.......|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
0fff8000 78 56 34 12 |xV4.|
0fff8004
That is a massive waste of disk space for this program; they padded the hell out of it. Now if for some reason you wanted to do something like this, for various reasons (that generally do not apply to bare metal on the pi), you could do this instead:
MEMORY
{
bob : ORIGIN = 0x10000000, LENGTH = 0x1000
ted : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ted
.rodata : { *(.rodata*) } > ted
.bss : { *(.bss*) } > ted
.data : { *(.data*) } > bob AT > ted
}
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 10 30 9f e5 |.............0..|
00000010 00 30 93 e5 0c 20 9f e5 07 30 83 e2 00 30 82 e5 |.0... ...0...0..|
00000020 1e ff 2f e1 00 00 00 10 2c 80 00 00 00 00 00 00 |../.....,.......|
00000030 78 56 34 12 |xV4.|
00000034
Disassembly of section .bss:
0000802c <x>:
802c: 00000000 andeq r0, r0, r0
Disassembly of section .data:
10000000 <y>:
10000000: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
What it has done is this: the code is compiled and linked for .data at 0x10000000, but the binary that you carry around and load has the .data contents bundled up tight. It is the job of the bootstrap to copy that data to its correct landing spot of 0x10000000, and again you have to use toolchain specific linker scripty stuff.
.globl _start
_start:
mov sp,#0x8000
bl notmain
b .
linker_stuff:
.word data_start
.word data_end
MEMORY
{
bob : ORIGIN = 0x10000000, LENGTH = 0x1000
ted : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ted
.rodata : { *(.rodata*) } > ted
.bss : { *(.bss*) } > ted
data_start = .;
.data : { *(.data*) } > bob AT > ted
data_end = .;
}
0000800c <linker_stuff>:
800c: 00008038 andeq r8, r0, r8, lsr r0
8010: 10000004 andne r0, r0, r4
Clearly that didn't quite work: the first value is the location counter in ted after .bss, while the second is the end of .data in bob, so the two do not describe the same region. You have to do more linker scripty stuff to figure it out.
There is no good reason to need any of this for the raspberry pi. At best, if you have .bss and don't have any .data, and/or you put .bss last and have a lot of it, then you can either take advantage of the toolchain accidentally zero padding and solving the .bss problem for you or, if that makes too big of a binary, you can see above how to find the .bss offset and size and then add the few lines of code to zero it (ultimately costing load time either way, but not costing sd card space).
Where you definitely need to learn such things is on a microcontroller, where the non-volatile memory is treated as read-only flash. If you choose to program with a style that requires .data and/or .bss, and you assume those items are implemented, then you have to do the toolchain specific work to link, then zero and/or copy from non-volatile flash to read/write ram before branching into the first or only C entry point of your application.
I am sure someone could come up with reasons not to pack a pi bare metal binary up nice and neat; there is always an exception. But for now you don't need to worry about those exceptions: put .bss first, then .data, and always make sure you have a .data item even if it is something you never use.

Different Static Global Variables Share the Same Memory Address

Summary
I have several C source files that all declare individual identically named static global variables. My understanding is that the static global variable in each file should be visible only within that file and should not have external linkage applied, but in fact I can see when debugging that the identically named variables share the same memory address.
It is like the static keyword is being ignored and the global variables are being treated as extern instead. Why is this?
Example Code
foo.c:
/* Private variables -----------------------------------*/
static myEnumType myVar = VALUE_A;
/* Exported functions ----------------------------------*/
void someFooFunc(void) {
myVar = VALUE_B;
}
bar.c:
/* Private variables -----------------------------------*/
static myEnumType myVar = VALUE_A;
/* Exported functions ----------------------------------*/
void someBarFunc(void) {
myVar = VALUE_C;
}
baz.c:
/* Private variables -----------------------------------*/
static myEnumType myVar = VALUE_A;
/* Exported functions ----------------------------------*/
void someBazFunc(void) {
myVar = VALUE_D;
}
Debugging Observations
Set breakpoints on the myVar = ... line inside each function.
Call someFooFunc, someBarFunc, and someBazFunc in that order from main.
Inside someFooFunc myVar initially is set to VALUE_A, after stepping over the line it is set to VALUE_B.
Inside someBarFunc myVar is for some reason initially set to VALUE_B before stepping over the line, not VALUE_A as I'd expect, indicating the linker may have merged the separate global variables based on them having an identical name.
The same goes for someBazFunc when it is called.
If I use the debugger to evaluate the value of &myVar when at each breakpoint the same address is given.
Tools & Flags
Toolchain: GNU ARM GCC (6.2 2016q4)
Compiler options:
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mlong-calls -O1 -fmessage-length=0 -fsigned-char -ffunction-sections -fdata-sections -ffreestanding -fno-move-loop-invariants -Wall -Wextra -g3 -DDEBUG -DTRACE -DOS_USE_TRACE_ITM -DSTM32L476xx -I"../include" -I"../system/include" -I"../system/include/cmsis" -I"../system/include/stm32l4xx" -I"../system/include/cmsis/device" -I"../foo/inc" -std=gnu11 -MMD -MP -MF"foo/src/foo.d" -MT"foo/src/foo.o" -c -o "foo/src/foo.o" "../foo/src/foo.c"
Linker options:
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mlong-calls -O1 -fmessage-length=0 -fsigned-char -ffunction-sections -fdata-sections -ffreestanding -fno-move-loop-invariants -Wall -Wextra -g3 -T mem.ld -T libs.ld -T sections.ld -nostartfiles -Xlinker --gc-sections -L"../ldscripts" -Wl,-Map,"myProj.map" --specs=nano.specs -o ...
NOTE: I understand that the OP's target platform is ARM, but I'm still posting an answer in terms of x86. The reason is that I have no ARM backend at hand, and the question is not limited to a particular architecture.
Here's a simple test stand. Note that I'm using int instead of custom enum typedef, since it should not matter at all.
foo.c
static int myVar = 1;
int someFooFunc(void)
{
myVar += 2;
return myVar;
}
bar.c
static int myVar = 1;
int someBarFunc(void)
{
myVar += 3;
return myVar;
}
main.c
#include <stdio.h>
int someFooFunc(void);
int someBarFunc(void);
int main(int argc, char* argv[])
{
printf("%d\n", someFooFunc());
printf("%d\n", someBarFunc());
return 0;
}
I'm compiling it on x86_64 Ubuntu 14.04 with GCC 4.8.4:
$ g++ main.c foo.c bar.c
$ ./a.out
3
4
Obtaining such results effectively means that myVar variables in foo.c and bar.c are different. If you look at the disassembly (by objdump -D ./a.out):
000000000040052d <_Z11someFooFuncv>:
40052d: 55 push %rbp
40052e: 48 89 e5 mov %rsp,%rbp
400531: 8b 05 09 0b 20 00 mov 0x200b09(%rip),%eax # 601040 <_ZL5myVar>
400537: 83 c0 02 add $0x2,%eax
40053a: 89 05 00 0b 20 00 mov %eax,0x200b00(%rip) # 601040 <_ZL5myVar>
400540: 8b 05 fa 0a 20 00 mov 0x200afa(%rip),%eax # 601040 <_ZL5myVar>
400546: 5d pop %rbp
400547: c3 retq
0000000000400548 <_Z11someBarFuncv>:
400548: 55 push %rbp
400549: 48 89 e5 mov %rsp,%rbp
40054c: 8b 05 f2 0a 20 00 mov 0x200af2(%rip),%eax # 601044 <_ZL5myVar>
400552: 83 c0 03 add $0x3,%eax
400555: 89 05 e9 0a 20 00 mov %eax,0x200ae9(%rip) # 601044 <_ZL5myVar>
40055b: 8b 05 e3 0a 20 00 mov 0x200ae3(%rip),%eax # 601044 <_ZL5myVar>
400561: 5d pop %rbp
400562: c3 retq
You can see that the actual addresses of the static variables in the different modules are indeed different: 0x601040 for foo.c and 0x601044 for bar.c. However, both are associated with the same symbol name, _ZL5myVar (the names are mangled because the files were built with g++), which really confuses GDB.
You can double-check that by means of objdump -t ./a.out:
0000000000601040 l O .data 0000000000000004 _ZL5myVar
0000000000601044 l O .data 0000000000000004 _ZL5myVar
Yet again, different addresses, same symbols. How GDB will resolve this conflict is purely implementation-dependent.
I strongly believe this is what is happening in your case as well. To be sure, though, you might want to try these steps in your environment.
so.s (just to make the linker happy)
.globl _start
_start: b _start
one.c
static unsigned int hello = 4;
static unsigned int one = 5;
void fun1 ( void )
{
hello=5;
one=6;
}
two.c
static unsigned int hello = 4;
static unsigned int two = 5;
void fun2 ( void )
{
hello=5;
two=6;
}
three.c
static unsigned int hello = 4;
static unsigned int three = 5;
void fun3 ( void )
{
hello=5;
three=6;
}
First off, if you optimize then this is completely dead code and you should not expect to see any of these variables. The functions are not static, so they don't disappear:
Disassembly of section .text:
08000000 <_start>:
8000000: eafffffe b 8000000 <_start>
08000004 <fun1>:
8000004: e12fff1e bx lr
08000008 <fun2>:
8000008: e12fff1e bx lr
0800000c <fun3>:
800000c: e12fff1e bx lr
If you don't optimize, then:
08000000 <_start>:
8000000: eafffffe b 8000000 <_start>
08000004 <fun1>:
8000004: e52db004 push {r11} ; (str r11, [sp, #-4]!)
8000008: e28db000 add r11, sp, #0
800000c: e59f3020 ldr r3, [pc, #32] ; 8000034 <fun1+0x30>
8000010: e3a02005 mov r2, #5
8000014: e5832000 str r2, [r3]
8000018: e59f3018 ldr r3, [pc, #24] ; 8000038 <fun1+0x34>
800001c: e3a02006 mov r2, #6
8000020: e5832000 str r2, [r3]
8000024: e1a00000 nop ; (mov r0, r0)
8000028: e28bd000 add sp, r11, #0
800002c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
8000030: e12fff1e bx lr
8000034: 20000000 andcs r0, r0, r0
8000038: 20000004 andcs r0, r0, r4
0800003c <fun2>:
800003c: e52db004 push {r11} ; (str r11, [sp, #-4]!)
8000040: e28db000 add r11, sp, #0
8000044: e59f3020 ldr r3, [pc, #32] ; 800006c <fun2+0x30>
8000048: e3a02005 mov r2, #5
800004c: e5832000 str r2, [r3]
8000050: e59f3018 ldr r3, [pc, #24] ; 8000070 <fun2+0x34>
8000054: e3a02006 mov r2, #6
8000058: e5832000 str r2, [r3]
800005c: e1a00000 nop ; (mov r0, r0)
8000060: e28bd000 add sp, r11, #0
8000064: e49db004 pop {r11} ; (ldr r11, [sp], #4)
8000068: e12fff1e bx lr
800006c: 20000008 andcs r0, r0, r8
8000070: 2000000c andcs r0, r0, r12
08000074 <fun3>:
8000074: e52db004 push {r11} ; (str r11, [sp, #-4]!)
8000078: e28db000 add r11, sp, #0
800007c: e59f3020 ldr r3, [pc, #32] ; 80000a4 <fun3+0x30>
8000080: e3a02005 mov r2, #5
8000084: e5832000 str r2, [r3]
8000088: e59f3018 ldr r3, [pc, #24] ; 80000a8 <fun3+0x34>
800008c: e3a02006 mov r2, #6
8000090: e5832000 str r2, [r3]
8000094: e1a00000 nop ; (mov r0, r0)
8000098: e28bd000 add sp, r11, #0
800009c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
80000a0: e12fff1e bx lr
80000a4: 20000010 andcs r0, r0, r0, lsl r0
80000a8: 20000014 andcs r0, r0, r4, lsl r0
Disassembly of section .data:
20000000 <hello>:
20000000: 00000004 andeq r0, r0, r4
20000004 <one>:
20000004: 00000005 andeq r0, r0, r5
20000008 <hello>:
20000008: 00000004 andeq r0, r0, r4
2000000c <two>:
2000000c: 00000005 andeq r0, r0, r5
20000010 <hello>:
20000010: 00000004 andeq r0, r0, r4
There are three hello variables created. (You should notice by now that there is no reason to start up the debugger; this can all be answered by simply examining the compiler and linker output. The debugger just gets in the way.)
800000c: e59f3020 ldr r3, [pc, #32] ; 8000034 <fun1+0x30>
8000034: 20000000 andcs r0, r0, r0
8000044: e59f3020 ldr r3, [pc, #32] ; 800006c <fun2+0x30>
800006c: 20000008 andcs r0, r0, r8
800007c: e59f3020 ldr r3, [pc, #32] ; 80000a4 <fun3+0x30>
80000a4: 20000010 andcs r0, r0, r0, lsl r0
20000000 <hello>:
20000000: 00000004 andeq r0, r0, r4
20000008 <hello>:
20000008: 00000004 andeq r0, r0, r4
20000010 <hello>:
20000010: 00000004 andeq r0, r0, r4
Each function is accessing its own separate version of the static global. They are not combined into one shared global.
The answers thus far have demonstrated that it should work as written, but the actual answer is only in the comments so I will post it as an answer.
What you're seeing is a debugger artifact, not the real situation. In my experience, this should be your first guess for any truly weird observation within the debugger. Verify the observation in the actual running program before going on, e.g. with an old-fashioned debug printf statement.
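A minimal sketch of that kind of printf check, assuming printf output is available on your target (on a Cortex-M that usually means it has been retargeted to a UART or similar, which is not shown here). To keep it in one file, the two myVar objects are block-scope statics standing in for the file-scope statics in foo.c and bar.c, and the functions return the variable's address purely for the check. check_statics_are_distinct is a hypothetical helper, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for foo.c: this myVar is private to someFooFunc. */
int *someFooFunc(void)
{
    static int myVar = 1;
    myVar += 2;
    printf("foo: myVar=%d at %p\n", myVar, (void *)&myVar);
    return &myVar;
}

/* Stand-in for bar.c: same name, but a separate object. */
int *someBarFunc(void)
{
    static int myVar = 1;
    myVar += 3;
    printf("bar: myVar=%d at %p\n", myVar, (void *)&myVar);
    return &myVar;
}

/* Hypothetical helper: fails loudly if the two variables were merged. */
void check_statics_are_distinct(void)
{
    int *foo = someFooFunc();
    int *bar = someBarFunc();
    assert(foo != bar);
}
```

If the printed addresses differ in the running program while the debugger reports a single address for &myVar, the debugger's symbol lookup is the likely culprit rather than the compiler or linker.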

Resources