Enable GPIO on ARM STM32G030K6

I am trying to learn to program the STM32G030K6 by directly manipulating the registers (without relying on CubeMX). My program is intended to set pin PA5 to high.
// Target: STM32G030K6T6
// Goal: Set pin PA5 to high
#include "stm32g0xx.h" // Device header
int main(void)
{
RCC->IOPENR |= 1; // Enable GPIOA Clock
GPIOA->MODER |= 0x400; // Set GPIOA MODE5 to a general purpose output
GPIOA->ODR = 0x20; // Set PA5 high
while(1)
{
}
}
The program does not affect PA5 at all.
I have successfully tested the setup with a CubeMX blink program to prove it is not a hardware issue.
STM32G030K6: Data Sheet
STM32G030K6: Reference Manual

So what I have figured out from you so far is that you bought or acquired this part and put it down on a breakout board. You have applied power and ground, added an LED and resistor, and have an stlink hooked up. You can use CubeMX and make it work, and you are using Keil.
I have made many breakout boards, and I put the LEDs and such on the board because I got tired of wiring up items separately. On the parts I have used you needed to make sure VDD and VDDA were both connected, but on yours they are the same pin, check. VDD and VSS are no doubt fine if you have it working. NRST is pulled up for good measure, although I think that is not required as there is an internal pull-up. BOOT0 did need a pull-down on those parts, but this is an STM32G and you have pointed out that SWCLK and BOOT0 share the same pin. ST sadly is moving away from the on-chip bootloader, or at least it is disabled at the factory:
ST production value: 0xDFFF E1AA
Bit 24 nBOOT_SEL
0: BOOT0 signal is defined by BOOT0 pin value (legacy mode)
1: BOOT0 signal is defined by nBOOT0 option bit
So as shipped, on a new part BOOT0 is not something you can rely on to get into the bootloader and use a uart solution to download code into the flash, nor can you use it to get yourself unbricked while doing this level of work.
So the stlink is connected and you said Keil can talk to the part, so that is all in theory fine and not the problem.
I don't have Keil on hand, but everyone can get a gnu cross compiler or build one from sources.
apt-get install binutils-arm-linux-gnueabi gcc-arm-linux-gnueabi
The code below does not care about arm-none-eabi- vs arm-linux-gnueabi- variations of the cross compiler; it is independent of those differences and just needs the compiler, assembler, and linker.
Now this will probably again get into a personal opinion battle with certain other SO users. Work through the noise. I am specifically avoiding CMSIS; I have seen the implementation, and you should inspect it too. For now you don't want to add that risk to your code: remove it and add it back later as desired. This is my style. It specifically controls the instruction used for each access, and everything about it is based on a lot of experience even though you don't see that; it is designed for the reader to have a high chance of success. Make it your own if/when you get this to work. The side comments, which are my real goal, may also help you examine the binary you are building with your own tool to eliminate common traps.
It is not simply a case of getting the C code in main() right; for bare-metal code to work you need the whole thing from reset on to be right.
Flash based version:
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
#define RCC_BASE 0x40021000
#define RCC_IOPENR (RCC_BASE+0x34)
#define GPIOA_BASE 0x50000000
#define GPIOA_MODER (GPIOA_BASE+0x00)
#define GPIOA_OTYPER (GPIOA_BASE+0x04)
#define GPIOA_BSRR (GPIOA_BASE+0x18)
#define DCOUNT 2000000
int notmain ( void )
{
unsigned int ra;
unsigned int rx;
ra=GET32(RCC_IOPENR);
ra|=1<<0; //enable port a
PUT32(RCC_IOPENR,ra);
ra=GET32(GPIOA_MODER);
ra&=~(3<<(5<<1)); //clear bits 10,11
ra|= (1<<(5<<1)); //set bit 10
PUT32(GPIOA_MODER,ra);
ra=GET32(GPIOA_OTYPER);
ra&=~(1<<5); //clear bit 5
PUT32(GPIOA_OTYPER,ra);
for(rx=0;;rx++)
{
PUT32(GPIOA_BSRR, (1<<(5+ 0)) );
for(ra=0;ra<DCOUNT;ra++) dummy(ra);
PUT32(GPIOA_BSRR, (1<<(5+16)) );
for(ra=0;ra<DCOUNT;ra++) dummy(ra);
}
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.elf -T flash.ld flash.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary
Again you can replace arm-none-eabi with arm-linux-gnueabi if that is what you have/found. This code doesn't care about the differences.
The point here is for the processor to boot:
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000017 stmdaeq r0, {r0, r1, r2, r4}
800000c: 08000017 stmdaeq r0, {r0, r1, r2, r4}
08000010 <reset>:
8000010: f000 f808 bl 8000024 <notmain>
8000014: e7ff b.n 8000016 <hang>
08000016 <hang>:
8000016: e7fe b.n 8000016 <hang>
The application flash starts at 0x08000000 in the ARM memory space, called the Main Flash Memory in the reference manual. Depending on the boot strap settings 0x08000000 will be mirrored at 0x00000000, as documented in the ARM manuals this is where the vector table lives. The first word is a value loaded into the stack pointer on reset, the word at address 0x00000004 (which would be mirrored to 0x08000004) is the reset vector.
The above used the disassembler, so it tries to disassemble those values as instructions. They are values/vectors; ignore the disassembly for that table.
Assuming we can get the tools to put this binary in the flash at the desired location then
08000000 <_start>:
8000000: 20001000 value loaded into sp on reset
8000004: 08000011 reset vector
The reset vector is the address of the code to execute for that exception with the lsbit set to indicate thumb mode; the lsbit is stripped and does not go into the pc. So the reset handler address here is 0x08000010, which is correct:
08000010 <reset>:
8000010: f000 f808 bl 8000024 <notmain>
8000014: e7ff b.n 8000016 <hang>
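As a quick sanity check of the vector table entries discussed above, here is a minimal host-side C sketch (not target code) that splits a Cortex-M vector entry into its thumb bit and the address that is actually fetched. The function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers for inspecting a Cortex-M vector table entry
   as it appears in a disassembly listing or a memory dump. */

/* Bit 0 of a vector entry must be 1 on Cortex-M (thumb state). */
int vector_has_thumb_bit(uint32_t entry)
{
    return (int)(entry & 1u);
}

/* Execution starts at the entry value with bit 0 cleared. */
uint32_t vector_handler_address(uint32_t entry)
{
    return entry & ~1u;
}
```

For the reset vector 0x08000011 from the listing, vector_handler_address gives 0x08000010, the address of reset. An even value in that slot (for example 0x08000010) would suggest the label was not marked as a thumb function.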
You can follow this to notmain. The name of the C entry point is not important; some tools will add extra stuff if they see the label main(). I have not seen one of those for years but continue to do this, also to prove the point that it doesn't matter.
So if this is put in the main flash at arm address 0x08000000 then this code will boot and run up to the C code.
Note sram starts at 0x20000000 and the documentation shows this part has 8 Kbytes of sram, so it has at least 0x1000 bytes to cover this project with extra room.
8000026: 481b ldr r0, [pc, #108] ; (8000094 <notmain+0x70>)
8000028: f7ff fff8 bl 800001c <GET32>
800002c: 2101 movs r1, #1
800002e: 4301 orrs r1, r0
8000030: 4818 ldr r0, [pc, #96] ; (8000094 <notmain+0x70>)
8000032: f7ff fff1 bl 8000018 <PUT32>
...
8000094: 40021034 andmi r1, r2, r4, lsr r0
Be it as I have programmed it or through your program and the CMSIS or HAL headers, you should see 0x40021034 being used in some form. Note that this part of yours is a cortex-m0+, so it only has a limited number of thumb2 extensions. Note that bl is two separate 16-bit instructions that can be spaced apart but are pretty much always found as a pair. The rest of the instructions need to be 16 bit. If you see something.w in the disassembly, or instructions other than bl being 32 (16*2) bits, then that may be a thumb2 instruction that won't run on this processor, and it may be caused by some setting you used when building the code. You can see that with this toolchain I have specifically called out an m0, which is effectively the same as an m0+ from an instruction set perspective (architecture armv6-m). You do not want armv7-m for this chip, it won't work; there are about a hundred or so instructions in armv7-m that won't work on armv6-m based chips.
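A rough way to spot 32-bit encodings when scanning the halfwords of a listing is sketched below in host-side C. It relies on the rule, as I understand it from the ARM architecture manuals, that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit thumb instruction. The function name is invented for this example, and note that armv6-m does support a few 32-bit encodings besides bl (for example msr, mrs and the barrier instructions), so a hit only means "look closer", not "definitely wrong".

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if this halfword starts a 32-bit
   thumb encoding, 0 if it is a complete 16-bit instruction. */
int is_thumb32_first_halfword(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;   /* bits [15:11] */
    return top5 >= 0x1Du;                 /* 0b11101, 0b11110, 0b11111 */
}
```

With the listings in this answer, 0xf000 (first half of bl) and 0xf04f (first half of a mov.w) are flagged, while 0xe7fe (b.n) and 0x481b (ldr) are not.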
The orring of the bit in the io enable register should resemble a read (ldr) from 0x40021034 a modification of the value read and a write (str) to that same address.
Your code as posted would have worked on other STM32 parts as many of them initialize the MODER register (if that part uses that flavor of GPIO peripheral) to zeros for most of the pins which is input. This part documents that most of the pins reset to 0b11 which is analog mode, curious why but whatever.
Reset value:
0xEBFF FFFF for port A
0xFFFF FFFF for other ports
So you can't simply set one of the two bits to change the mode. If the bits started off as 0b00 then setting one would turn it into 0b01, but for this part you can either just clear bit 11 or, better, control both bits and not rely on the reset state: clear the two bits and set one of them, or clear one and set the other.
5<<1 means 5 shifted left by one: 0b101 with a zero shifted in from the right gives 0b1010, which is 0xA, which is 10. This is a visual way to see that I am messing with PA5, since the number 5 is there, while for this register the pin 5 mode settings are bits 10 and 11. 3<<(5<<1) means 3<<10, which is bits 10 and 11. The tilde means invert the whole thing, so 3<<10 is 0x00000C00, inverted you get 0xFFFFF3FF, which ANDed with the MODER value will zero bits 10 and 11. Now OR with 0x00000400 (1<<10) to set bit 10.
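The arithmetic can be checked on a PC before going anywhere near the chip. The following is a small host-side C sketch, not target code; the helper names are invented here, and the only value taken from the documentation is the port A reset value quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the 2-bit mode field of a pin
   from a MODER register value. */
unsigned moder_field(uint32_t moder, unsigned pin)
{
    assert(pin < 16);
    return (moder >> (pin << 1)) & 3u;
}

/* Hypothetical helper: clear both mode bits of the pin, then set
   0b01 (general purpose output), independent of the reset state. */
uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    assert(pin < 16);
    moder &= ~(3u << (pin << 1));   /* zero both bits of the field */
    moder |=  (1u << (pin << 1));   /* set the low bit: 0b01 */
    return moder;
}
```

Starting from the reset value 0xEBFFFFFF, OR-ing in 0x400 alone leaves the PA5 field at 0b11 (analog), while moder_set_output(0xEBFFFFFF, 5) gives 0xEBFFF7FF with the field at 0b01.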
We want the output, at least for now, to be push-pull and not open drain, so even though the reset value is already push-pull, I clear it for good measure. I normally don't bother with the pull-up or other GPIO setup registers; I mess with just these two, MODER and OTYPER, for the STM32 parts that use this GPIO peripheral (you will see that not all STM32 parts use the same IP; the STM32F103 uses a different one, for example, check it out).
So in some way confirm, CMSIS or not, that the code produced is messing with these registers. From the documentation GPIOA starts at 0x50000000, so the registers are at 0x50000000 and 0x50000004.
Because this part has a GPIO BSRR register, which is a nice feature, just use it for now so that you don't accidentally mess with other pins.
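To see why BSRR is safer than writing ODR directly, here is a small host-side C model of what a BSRR write does to the output data, assuming the usual STM32 layout: set bits in the low 16 bits, reset bits in the high 16 bits, with set taking priority if both are given. The function names are invented for this sketch and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: BSRR values that set or reset one pin. */
uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16);
    return 1u << pin;
}

uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16);
    return 1u << (pin + 16);
}

/* Host-side model of the effect of a BSRR write on ODR:
   only the named pins change, every other pin keeps its value. */
uint32_t odr_after_bsrr(uint32_t odr, uint32_t bsrr)
{
    uint32_t set   = bsrr & 0xFFFFu;
    uint32_t reset = bsrr >> 16;
    return ((odr & ~reset) | set) & 0xFFFFu;
}
```

In this model, writing bsrr_set(5) with ODR at 0x00C3 gives 0x00E3, whereas a plain ODR = 0x20 would have cleared the other pins that were high.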
The dummy loop burns time so that in this case the led blinks on and off, you have to tune the DCOUNT based on the clock used for the processor when you get this running not too fast not too slow, just right. Doing it this way with an external function it is no longer dead code ( for(ra=0;ra<DCOUNT;ra++) continue; ) the compiler is forced to build it without using a volatile request.
No the code doesn't actually hit return(0); some compilers are not that smart and complain. (some are that smart and complain that you can't get there, YMMV)
All of these pieces need to be in place to have half a chance of this working. It's not just about a few lines of C code.
With an stlink the Keil tools are fine, and I would hope there is a way to examine memory space. You will want to examine 0x08000000 and compare that to the binary generated by the tool, and hopefully there is a way to examine the output of the tool as well to see what it built, which is easy to do with gnu.
You can use openocd instead of Keil to load and examine things from a command line. It would be something of the form
openocd -f stlink.cfg -f target.cfg
and then in another window
telnet localhost 4444
gdb adds a whole lot more unknowns...
then you can use
mdw 0x08000000 40
in the telnet window to see what is in that main flash, and then compare it to the loadable portion of the binary to see if your program is really there. If your program is not actually there then no matter what you do to the C code it won't make it blink.
There are ways to use openocd to flash parts, but it is very vendor/part specific as they have to add that capability to openocd and you have to have the right version, from memory it is something along the lines of
flash write_image erase notmain.elf
if using a "binary" with address information in it, if you are using a memory image then you need to put the address on that command line 0x08000000
Some st parts come locked or let's say boards like some blue pills where this doesn't work, virgin parts I don't know that I have seen locked, you bought loose parts it appears so they should not be locked.
If you get openocd working and gnu then you could also try using sram without having to have flash support initially.
sram.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
ldr r0,=0x20001000
mov sp,r0
bl notmain
b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
}
This part uses a vector table, but what is about to be described uses the debugger to place and run a program in sram. Sram is volatile, so the program is lost when you reset/reboot, but it provides a way to experiment without having to get flash writing working.
We will tell the debugger to start execution at 0x20000000 so we want there to be an instruction there not a vector table.
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 sram.s -o sram.o
arm-none-eabi-ld -o notmain.elf -T sram.ld sram.o notmain.o
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy notmain.elf notmain.bin -O binary
always inspect your binary on a new project before running
Disassembly of section .text:
20000000 <_start>:
20000000: 4804 ldr r0, [pc, #16] ; (20000014 <dummy+0x2>)
20000002: 4685 mov sp, r0
20000004: f000 f808 bl 20000018 <notmain>
20000008: e7fe b.n 20000008 <_start+0x8>
2000000a <PUT32>:
2000000a: 6001 str r1, [r0, #0]
2000000c: 4770 bx lr
2000000e <GET32>:
2000000e: 6800 ldr r0, [r0, #0]
20000010: 4770 bx lr
20000012 <dummy>:
20000012: 4770 bx lr
20000014: 20001000 andcs r1, r0, r0
20000018 <notmain>:
20000018: b570 push {r4, r5, r6, lr}
and that looks good.
with openocd you can now
reset halt
load_image notmain.elf
resume 0x20000000
to run the program. You might need a path; if you run openocd in the directory where the elf file is, or you copy the elf file to the directory where you launched openocd (not telnet, openocd), then you usually don't need to put a path.
This is in sram not flash so may run faster and may want a larger value in the delay loop.
If you simply want to make the output high or low then just use the desired bsrr line and get rid of the loops. This code as written puts you in a safe infinite loop when you return from notmain, one that will not interfere with the gpio port. As part of your investigation of the binary you are building with your tool, you need to confirm that the while loop you have placed is actually not dead code and was implemented (clang has been known to dead-code this, so others might as well). Some sandboxes also undo stuff when you return from main, so it could be that your code is now fine but exits from main, and the bootstrap undoes what you did to PA5 faster than you can see it.
That's all I can do so far, I have an stm32 cortex-m0+ part with a working openocd config if that helps, this is a different part but the core is the same if there isn't another tap then it should just work but you never know.
Short answer, your moder code wouldn't have worked, otherwise it looked good, but the C code is only part of the story required for success. This long answer highlights the main points that have to be there for success in booting and setting up the led. It is possible that both of us missed an additional enable, I don't have this part specifically so I cannot actually pull one out and run this code on it.

Related

How to get qemu to run an arm thumb binary?

I'm trying to learn the basics of ARM assembly and wrote a fairly simple program to sort an array. I initially assembled it using the armv8-a option and ran the program under qemu while debugging with gdb. This worked fine and the program initialized the array and sorted it as expected.
Ultimately I would like to be able to write some assembly for my Raspberry Pi Pico, which has an ARM Cortex M0+, which I believe uses the armv6-m option. However, when I change the directive in my code, it compiles fine but behaves strangely in that the program counter increments by 4 after every instruction instead of the 2 that I expect for thumb. This is causing my program to not work correctly. I suspect that qemu is trying to run my code as if it were compiled for the full ARM instruction set instead of thumb, but I'm not sure why this is.
I am running on Ubuntu Linux 20.04 LTS, using qemu-arm version 4.2.1 (installed from the package manager). Does the qemu-arm executable only run full ARM binaries? If so, is there another qemu package I can install to run a thumb binary?
Here is my code if it is helpful:
.arch armv6-m
.cpu cortex-m0plus
.syntax unified
.thumb
.data
arr: .skip 4 * 10
len: .word 10
.section .text
.global _start
.align 2
_start:
ldr r0, arr_adr # load the address of the start of the array into register 0
movs r1, #0 # clear the counter register
movs r2, #100
init_loop:
str r2, [r0,r1] # store r2's value to the base address of the array plus the offset stored in r1
subs r2, r2, #10 # subtract 10 from r2
adds r1, #4 # add 4 to the offset (1 word in bytes)
cmp r1, #40 # check if we've reached the end of the array
bne init_loop
movs r1, #0 # clear the offset
out_loop:
mov r3, r1 # set the index of the minimum value to the current array index
mov r4, r1 # set the inner loop index to the outer loop index
in_loop:
ldr r5, [r0,r3] # load the minimum index's value to r5
ldr r6, [r0,r4] # load the inner loop's next value to r6
cmp r6, r5 # compare the two values
bge in_loop_inc # if r6 is greater than or equal to r5, increment and restart loop
mov r3, r4 # set the minimum index to the current index
in_loop_inc:
adds r4, #4
cmp r4, #40 # check if at end of array
blt in_loop
ldr r5, [r0,r3] # load the minimum index value into r5
ldr r6, [r0,r1] # load the current outer loop index value into r6
str r6, [r0,r3] # swap the two values
str r5, [r0,r1]
adds r1, #4 # increment outer loop index
cmp r1, #40 # check if at end of array
blt out_loop
loop:
nop
b loop
arr_adr: .word arr
Thank you for your help!
There are a couple of concepts to disentangle here:
(1) Arm vs Thumb : these are two different instruction sets. Most CPUs support both, some support only one. Both are available simultaneously if the CPU supports both. To simplify a little bit, if you jump to an address with the least significant bit set that means "go to Thumb mode", and jumping to an address with that bit clear means "go to Arm mode". (Interworking is a touch more complicated than that, but that's a good initial mental model.) Note that all Arm instructions are 4 bytes long, but Thumb instructions can be either 2 or 4 bytes long.
(2) A-profile vs M-profile : these are two different families of CPU architecture. M-profile is "microcontrollers"; A-profile is "applications processors", which is "(almost) everything else". M-profile CPUs always support Thumb and only Thumb code. A-profile CPUs support both Arm and Thumb. The Raspberry Pi Pico is a Cortex-M0+, which is M-profile.
(3) QEMU system emulation vs user-mode emulation : these are two different QEMU executables which run guest code in different ways. The system emulation binary (typically qemu-system-arm) runs "bare metal code", eg an entire OS. The guest code has full control and can handle exceptions, write to hardware devices, etc. The user emulation binary (typically qemu-arm) is for running Linux user-space binaries. Guest code is started in unprivileged mode and has access to the usual Linux system calls. For system emulation, which CPU is being emulated depends on what machine type you select with the -M or --machine option. For user-mode emulation, the default CPU is "A-profile with all supported features enabled" (this is --cpu max).
You're currently using qemu-arm which means you get user-mode emulation. This should support Thumb binaries, but unless you pass it a --cpu option it will be using an A-profile CPU. I would also suggest using a newer QEMU for M-profile work, because a lot of M-profile CPU bugs have been fixed since version 4.2. I think 4.2 is also too old to have the Cortex-M0 CPU.
GDB should tell you in the PSR what the T bit is set to -- use that to check whether you're in Thumb mode or Arm mode, rather than looking at how much the PC is incrementing by.
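If you want to decode the T bit by hand from the register value GDB prints, a minimal C sketch follows. It assumes the usual bit positions: bit 5 of the CPSR on A-profile CPUs and bit 24 of the xPSR on M-profile CPUs. The function names are made up for illustration.

```c
#include <stdint.h>

/* A-profile: the T (Thumb state) bit is bit 5 of the CPSR. */
int cpsr_is_thumb(uint32_t cpsr)
{
    return (int)((cpsr >> 5) & 1u);
}

/* M-profile: the T bit is bit 24 of the xPSR (EPSR part). */
int xpsr_is_thumb(uint32_t xpsr)
{
    return (int)((xpsr >> 24) & 1u);
}
```

For example, a CPSR of 0x60000030 has bit 5 set and so indicates Thumb state, while 0x60000010 indicates Arm state.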
There's currently no QEMU system emulation of the Raspberry Pi Pico (though somebody has been doing some experimental work on one). If your assembly is just basic "working with registers and a bit of memory" you can do that with the user-mode emulator. Or you can try the 'microbit' machine model, which is a Cortex-M0 board -- if you're not doing things that are specific to the Pi Pico that might be good enough.
memmap
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
}
strap.s
.cpu cortex-m0
.thumb
.syntax unified
.globl reset_entry
reset_entry:
.word 0x20001000
.word reset
.word hang
.word hang
.word hang
.thumb_func
reset:
ldr r0,=0x40002500
ldr r1,=4
str r1,[r0]
ldr r0,=0x40002008
ldr r1,=1
str r1,[r0]
ldr r0,=0x4000251C
ldr r1,=0x30
ldr r2,=0x37
loop_top:
str r1,[r0]
adds r1,r1,#1
ands r1,r1,r2
b loop_top
.thumb_func
hang:
b hang
build
arm-linux-gnueabi-as --warn --fatal-warnings strap.s -o strap.o
arm-linux-gnueabi-ld strap.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
Check the vector table as a quick check:
Disassembly of section .text:
00000000 <reset_entry>:
0: 20001000 andcs r1, r0, r0
4: 00000015 andeq r0, r0, r5, lsl r0
8: 0000002f andeq r0, r0, pc, lsr #32
c: 0000002f andeq r0, r0, pc, lsr #32
10: 0000002f andeq r0, r0, pc, lsr #32
00000014 <reset>:
14: 4806 ldr r0, [pc, #24] ; (30 <hang+0x2>)
16: 4907 ldr r1, [pc, #28] ; (34 <hang+0x6>)
18: 6001 str r1, [r0, #0]
1a: 4807 ldr r0, [pc, #28] ; (38
Looks good,
run it
qemu-system-arm -M microbit -nographic -kernel notmain.elf
and it will spew out 0123456701234567...until you ctrl-a then x to exit qemu.
Note this binary will not work on a real chip as I am cheating the uart.
You can get your feet wet with this sim. There is also a luminary micro one from the first cortex-m chips and you can limit yourself to armv6m instructions on that platform as well.
qemu and sims like this have very limited value for mcu work since almost all of the work is related to peripherals and pins, and the instruction set is just like the language of a book, French, Russian, English, German, doesn't matter a biology book is a biology book and the book is the goal. The peripherals are specific to the chip (the pico, a specific stm32 chip, a specific TI tiva C chip, etc).

I want to implement R10=R11+0xFFFFE in thumb assembly language

I know the immediate value cannot be more than 12 bits in the case of ADDW. Is there any way to implement this in one instruction?
Let the tools do the work. BTW, immediates in arm (and other ISAs) have been asked about and answered here many times. The gnu assembler will (usually) choose the optimal solution if you use the ldr rX,= pseudo-instruction.
.cpu cortex-m0
.thumb
ldr r0,=0xFFFFE
.align
00000000 <.text>:
0: 4800 ldr r0, [pc, #0] ; (4 <.text+0x4>)
2: 46c0 nop ; (mov r8, r8)
4: 000ffffe strdeq pc, [pc], -lr
enable thumb2 extensions
.cpu cortex-m7
.thumb
.syntax unified
ldr r0,=0xFFFFE
ldr r1,=0x00010001
.align
the other constant was to confirm it is attempting thumb2 extensions.
So no shortcuts: you either need to build the value in parts or do a pc-relative load from nearby. Or just code it with the pseudo-instruction and get what you get.
a similar constant
.cpu cortex-m7
.thumb
.syntax unified
ldr r0,=0xFFFFFFFE
ldr r1,=0x00010001
.align
00000000 <.text>:
0: f06f 0001 mvn.w r0, #1
4: f04f 1101 mov.w r1, #65537 ; 0x10001
could have been done without the pc relative load.
No you cannot implement this with one instruction.
but this:
add r10,r11,#0xFFFFFFFE
you can...if that helps.
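The reason one constant works and the other does not comes down to the thumb2 "modified immediate" encoding. Below is a host-side C sketch of that rule as I understand it from the ARMv7-M architecture manual: a constant is encodable if it is 0x000000XY, 0x00XY00XY, 0xXY00XY00, 0xXYXYXYXY, or an 8-bit value with its top bit set rotated right by 8 to 31. The function names are invented for this example, and the add check also allows the 12-bit ADDW/SUBW forms and lets the assembler swap add for sub of the negated value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n, for n in 1..31. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32u - n));
}

/* Hypothetical check: can v be a thumb2 modified immediate? */
bool t2_modified_imm_ok(uint32_t v)
{
    uint32_t lo = v & 0xFFu;
    uint32_t hi = (v >> 8) & 0xFFu;

    if (v == lo) return true;                          /* 0x000000XY */
    if (v == (lo | (lo << 16))) return true;           /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24))) return true;    /* 0xXY00XY00 */
    if (v == (lo | (lo << 8) | (lo << 16) | (lo << 24)))
        return true;                                   /* 0xXYXYXYXY */

    /* 8-bit value with bit 7 set, rotated right by 8..31:
       undo the rotation and look for 0x80..0xFF. */
    for (unsigned n = 8; n <= 31; n++) {
        uint32_t b = rol32(v, n);
        if (b >= 0x80u && b <= 0xFFu) return true;
    }
    return false;
}

/* Hypothetical check: can rd = rn + imm be done in one add/sub? */
bool add_imm_fits_one_insn(uint32_t imm)
{
    uint32_t neg = 0u - imm;   /* add #imm is the same as sub #-imm */
    return imm <= 0xFFFu || neg <= 0xFFFu ||
           t2_modified_imm_ok(imm) || t2_modified_imm_ok(neg);
}
```

Under these assumptions 0xFFFFE fails every form, 0x00010001 matches the 0x00XY00XY pattern, and 0xFFFFFFFE works for an add because it is the same as subtracting 2.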

Position independent binary for Atmel SAM Cortex-M0+

I am trying to create a position independent binary for a Cortex-M0+ using the ARM GNU toolchain included with Atmel Studio 7 (arm-none-eabi ?). I have looked many places for information on how to do this, but am not successful. This would facilitate creating ping-pong images in low-high Flash memory areas for OTA updates without needing to know or care whether the update was a ping or pong image for that unit.
I have an 8 kB bootloader resident at 0x0000 which I can communicate with over UART and which will jump to 0x6000 (24 kB) after reset if it detects a binary there (i.e. not 0xFFFF erased Flash). This SAM-BA bootloader allows me to dump memory and erase and program Flash with .bin files at a designated address.
In the application project (simple LED blink), doing nothing but adding -section-start=.text=0x6000 to the linker command line results in the LED blink code working after it is programmed at 0x6000 by the bootloader. I see also in the hex file that it starts at 0x6000.
In my attempt to create a position independent binary, I have removed the above linker item, and added the -fPIC flag to the command lines for the compiler, the linker and the assembler. But, I think I still see absolute branch addresses in the disassembly, such as :
28e: d001 beq.n 294
And the result is that the LED blink binary I load at 0x6000 does not execute unless I specifically tell the linker to put it at 0x6000, which defeats the purpose. Note that I do also see what looks like relative branches in other parts of the disassembly :
21c: 4b03 ldr r3, [pc, #12] ; (22c )
21e: 58d3 ldr r3, [r2, r3]
220: 9301 str r3, [sp, #4]
222: 4798 blx r3
The SRAM is always at the same address (0x20000000), I just need to be able to re-position the executable. I have not modified the linker command file, and it does not have section for .got (e.g. (.got) or similar).
Can anyone explain to me the specific changes I need to make to the compiler/assembler/linker flags to create a position independent binary in this setup ? Many thanks in advance.
You need to look more closely at your disassembly. For 0xd001, I get this:
0x00000000: d001 .. BEQ {pc}+0x6 ; 0x6
In your case, the toolchain has tried to be helpful. Clearly, a 16 bit opcode can't encode an absolute address with a 32 bit address space. So you are closer than you think to a solution.
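To make that concrete, here is a small host-side C sketch that decodes the 16-bit thumb conditional branch encoding: the low 8 bits are a signed halfword offset that is added to the address of the instruction plus 4. The function name is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the target of a 16-bit thumb
   conditional branch (encoding 1101 cond imm8) located at addr. */
uint32_t thumb_bcond_target(uint32_t addr, uint16_t insn)
{
    assert((insn & 0xF000u) == 0xD000u);      /* b<cond> encoding */
    assert(((insn >> 8) & 0xFu) < 0xEu);      /* 0xE/0xF are udf/svc */

    int32_t imm8 = insn & 0xFF;
    if (imm8 & 0x80)
        imm8 -= 0x100;                        /* sign-extend imm8 */

    /* The pc reads as addr + 4; the offset is in halfwords. */
    return addr + 4u + (uint32_t)(imm8 * 2);
}
```

The same opcode 0xd001 gives 0x294 when it sits at 0x28e and 0x6 when it sits at 0x0, so the address printed by the disassembler is computed from a relative offset rather than stored in the instruction.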

Why do I get the same address every time I build + disassemble a function inside GDB?

Every time when I disassemble a function, why do I always get the same instruction address and constants' address?
For example, after executing the following commands,
gcc -o hello hello.c -ggdb
gdb hello
(gdb) disassemble main
the dump code would be:
When I quit gdb and re-disassemble the main function, I will get the same result as before. The instruction address and even the address of constants are always the same for each disassemble command in gdb. Why is that? Does the compiled file hello contain certain information about the address of each assembly instruction as well as the constants' addresses?
If you made a position-independent executable (e.g. with gcc -fpie -pie, which is the default for gcc in many recent Linux distros), the kernel would randomize the address it mapped your executable at. (Except when running under GDB: GDB disables ASLR by default even for shared libraries, and for PIE executables.)
But you're making a position-dependent executable, which can take advantage of static addresses being link-time constants (by using them as immediates and so on without needing runtime relocation fixups). e.g. you or the compiler can use mov $msg, %edi (like your code) instead of lea msg(%rip), %rdi (with -fpie).
Regular (position-dependent) executables have their load-address set in the ELF headers: use readelf -a ./a.out to see the ELF metadata.
A non-PIE executable will load at the same address every time even without running it under GDB, at the address specified in the ELF program headers.
(gcc / ld chooses 0x400000 by default on x86-64-linux-elf; you could change this with a linker script). Relocation information for all the static addresses hard-coded into the code + data is not available, so the loader couldn't fix up the addresses even if it wanted to.
e.g. in a simple executable (with only a text segment, not data or bss) I built with -no-pie (which seems to be the default in your gcc):
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000c5 0x00000000000000c5 R E 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
So the ELF headers request that offset 0 in the file be mapped to virtual address 0x0000000000400000. (And the ELF entry point is 0x400080; that's where _start is.) I'm not sure what the relevance of PhysAddr = VirtAddr is; user-space executables don't know and can't easily find out what physical addresses the kernel used for pages of RAM backing their virtual memory, and it can change at any time as pages are swapped in / out.
Note that readelf does line wrapping; note there are two rows of columns headers. The 0x200000 is the Align column for that one LOADed segment.
By default, the GNU toolchain for x86-64 Linux produces position-dependent executables which are mapped at address 0x400000. (position-independent executables will be mapped at 0x55… addresses instead). It is possible to change that by building GCC --enable-default-pie, or by specifying compiler and linker flags.
However, even for a position-independent executable (PIE), the addresses would be constant between GDB runs because GDB disables address space layout randomization by default. GDB does this so that breakpoints at absolute addresses can be re-applied after the program has been started.
There are a variety of executable file formats. Typically, an executable file contains information about several memory sections or segments. Inside the executable, references to memory addresses may be expressed relative to the beginning of a section. The executable also contains a relocation table. The relocation table is a list of those references, including where each one is in the executable, what section it refers to, and what type of reference it is (what field of an instruction it is used in, etc.).
The loader (software that loads your program into memory) reads the executable and writes the sections to memory. In your case, the loader appears to be using the same base addresses for sections every time it runs. After initially putting the sections in memory, the loader reads the relocation table and uses it to fix up all the references to memory by adjusting them based on where each section was loaded into memory. For example, the compiler may write an instruction as, in effect, “Load register 3 from the start of the data section plus 278 bytes.” If the loader puts the data section at address 2000, it will adjust this instruction to use the sum of 2000 and 278, making “Load register 3 from address 2278.”
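The fix-up step can be pictured with a toy model in C. This is only a sketch under simplifying assumptions: the image is an array of 32-bit words, every relocation is a whole word holding a section-relative offset, and there is a single section base. Real relocation formats carry more information per entry, and the names below are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy loader fix-up: for every word listed in the relocation table,
   turn the section-relative offset into an absolute address by
   adding the base address the section was loaded at. */
void apply_relocations(uint32_t *image,
                       const size_t *reloc_index, size_t reloc_count,
                       uint32_t section_base)
{
    for (size_t i = 0; i < reloc_count; i++)
        image[reloc_index[i]] += section_base;
}
```

With the numbers from the paragraph above, a word holding 278 that is listed in the table becomes 2278 once the data section is placed at address 2000; words not listed in the table are left alone.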
Good modern loaders randomize where sections are loaded. They do this because malicious people are sometimes able to exploit bugs in programs to cause them to execute code injected by the attacker. Randomizing section locations prevents the attacker from knowing the address where their code will be injected, which can hinder their ability to prepare the code to be injected. Since your addresses are not changing, it appears your loader does not do this. You may be using an older system.
Some processor architectures and/or loaders support position independent code (PIC). In this case, the form of an instruction may be “Load register 3 from 694 bytes beyond where this instruction is.” In that case, as long as the data is always at the same distance from the instruction, it does not matter where they are in memory. When the process executes the instruction, it will add the address of the instruction to 694, and that will be the address of the data. Another way of implementing PIC-like code is for the loader to provide the addresses of each section to the program, by putting those addresses in registers or fixed locations in memory. Then the program can use those base addresses to do its own address calculations. Since your program has an address built into the code, it does not appear your program is using these methods.
A program that is not intended to actually be executed:
bootstrap
.globl _start
_start:
bl one
b .
first c file
extern unsigned int hello;
unsigned int one ( void )
{
return(hello+5);
}
second c file (being extern forces the compiler to compile the first object in a certain way)
unsigned int hello;
linker script
MEMORY
{
ram : ORIGIN = 0x00001000, LENGTH = 0x4000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
building position dependent
Disassembly of section .text:
00001000 <_start>:
1000: eb000000 bl 1008 <one>
1004: eafffffe b 1004 <_start+0x4>
00001008 <one>:
1008: e59f3008 ldr r3, [pc, #8] ; 1018 <one+0x10>
100c: e5930000 ldr r0, [r3]
1010: e2800005 add r0, r0, #5
1014: e12fff1e bx lr
1018: 0000101c andeq r1, r0, r12, lsl r0
Disassembly of section .bss:
0000101c <hello>:
101c: 00000000 andeq r0, r0, r0
The key here is that at address 0x1018 the compiler had to leave a placeholder for the address of the external item. In the unlinked object it shows up at offset 0x10, below:
00000000 <one>:
0: e59f3008 ldr r3, [pc, #8] ; 10 <one+0x10>
4: e5930000 ldr r0, [r3]
8: e2800005 add r0, r0, #5
c: e12fff1e bx lr
10: 00000000 andeq r0, r0, r0
The linker fills this in at link time. You can see in the linked disassembly above that, when built position dependent, it fills in the absolute address of where to find that item. For this code to work, it must be loaded so that the item actually shows up at that address. It has to be loaded at a specific position or address in memory (basically at address 0x1000). That is what position dependent means.
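The addresses the disassembler prints next to each pc-relative load can be checked by hand. A minimal sketch, assuming ARM (A32) state, where reading pc yields the address of the current instruction plus 8; the function name is made up for illustration:

```c
#include <stdint.h>

/* Address accessed by "ldr rX, [pc, #offset]" in ARM (A32) state.
   The pc value seen by the instruction is its own address plus 8. */
uint32_t arm_pc_relative_address(uint32_t instruction_address, uint32_t offset)
{
    return instruction_address + 8u + offset;
}
```

For the linked code, `arm_pc_relative_address(0x1008, 8)` gives 0x1018, the word the linker filled in with 0x0000101c. For the unlinked object, `arm_pc_relative_address(0x0, 8)` gives 0x10, the placeholder.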
If your toolchain supports position independent (gnu does) then this represents a solution.
Disassembly of section .text:
00001000 <_start>:
1000: eb000000 bl 1008 <one>
1004: eafffffe b 1004 <_start+0x4>
00001008 <one>:
1008: e59f3014 ldr r3, [pc, #20] ; 1024 <one+0x1c>
100c: e59f2014 ldr r2, [pc, #20] ; 1028 <one+0x20>
1010: e08f3003 add r3, pc, r3
1014: e7933002 ldr r3, [r3, r2]
1018: e5930000 ldr r0, [r3]
101c: e2800005 add r0, r0, #5
1020: e12fff1e bx lr
1024: 00000014 andeq r0, r0, r4, lsl r0
1028: 00000000 andeq r0, r0, r0
Disassembly of section .got:
0000102c <.got>:
102c: 0000103c andeq r1, r0, r12, lsr r0
Disassembly of section .got.plt:
00001030 <_GLOBAL_OFFSET_TABLE_>:
...
Disassembly of section .bss:
0000103c <hello>:
103c: 00000000 andeq r0, r0, r0
It has a performance hit of course, but instead of the compiler and linker working together by leaving one location per reference, there is now a table, the global offset table (for this solution), that sits at a known location relative to the code and contains linker-supplied entries.
The program is not position independent yet, it will certainly not work if you load it anywhere. The loader has to patch up the table/solution based on where it wants to place the items. This is far simpler than having a very long list of each of the locations to patch in the first solution, although that would have been a very possible way to do it. A table in the executable (executables contain more than the program and data they have other items of information as you know if you objdump or readelf an elf file) could contain all of those offsets and the loader could patch those up too.
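To make the two halves of this concrete, here is a rough sketch of what the position independent sequence computes and what a loader would still have to patch. The function names are hypothetical, and the model assumes every segment moves by the same amount, which is not the only possibility.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors "add r3, pc, r3" followed by "ldr r3, [r3, r2]": the table slot
   is found relative to the add instruction, so it follows the code. */
uint32_t got_slot_address(uint32_t add_insn_address,
                          uint32_t table_displacement,
                          uint32_t slot_offset)
{
    return add_insn_address + 8u + table_displacement + slot_offset;
}

/* What the loader still has to do: the slots hold link-time addresses,
   so each one is adjusted by how far the program was moved. */
void patch_got(uint32_t *got, size_t entries, uint32_t load_delta)
{
    for (size_t i = 0; i < entries; i++) {
        got[i] += load_delta;
    }
}
```

With the values in the disassembly above, `got_slot_address(0x1010, 0x14, 0)` is 0x102c, the slot holding 0x0000103c. If everything were loaded 0x4000 higher, the slot would be found at 0x502c without any patching, but its contents would need to become 0x503c, which is the job `patch_got` stands in for.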
If your data and bss and other memory sections are fixed relative to .text as I have built here, then a GOT wasn't necessary. The linker could have computed the relative offset to the resource at link time and, along with the compiler, found the item in a position independent way. The binary could then have been loaded just about anywhere (some minimum alignment may have been required) and it would work without any patching. With the gnu solution I think you can move the segments relative to each other.
It is incorrect to state that the kernel will or would always randomize your location if built position independent. It is possible, so long as the toolchain and the loader from the operating system (a completely separate development) work hand in hand; the loader has the opportunity. But that does not in any way mean that every loader does or will. Specific operating systems/distros/versions may have that set as a default, yes, if they come across a binary that is position independent (built in the way that loader expects). It is like saying all mechanics on the planet will use a specific brand and type of oil if you show up in their garage with a specific brand of car. A specific mechanic may always use a specific oil brand and type for a specific car, but that doesn't mean all mechanics will, or perhaps even can, obtain that specific oil brand or type. If that individual business chooses to do so as a policy, then you as a customer can start to form an assumption that that is what you will get (with that assumption then failing when they change their policy).
As far as disassembly you can statically disassemble your project at build time or whenever. If loaded at a different position then there will be an offset to what you are seeing, but the .text code will still be in the same place relative to other code in that segment. If the static disassembly shows a call being 0x104 bytes ahead, then even if loaded somewhere else you should see that relative jump also be 0x104 bytes ahead, the addresses may be different.
Then there is the debugger part of this. For the debugger to work and show the correct information, it also has to be part of the toolchain/loader(/os) team. It has to know the binary was position independent and has to know where it was loaded. Alternatively, the debugger is doing the loading for you and may not use the standard OS loader in the same way that a command line or gui does. So you might still see the binary in the same place every time when using the debugger.
The main bug here was your expectation. First, operating systems like Windows, Linux, etc. desire to use an MMU to allow them to manage memory better: to pick some or many non-linear blocks of physical memory and create a linear area of virtual memory for your program to live in. More importantly, the virtual address space for each separate program can look the same. With an MMU designed for this and an operating system that takes advantage of it, I can have every program load at 0x8000 in virtual address space without the programs interfering with each other. Even with this MMU and operating system and position independent loading, one would hope they are not using physical addresses; they are still creating a virtual address space, just possibly with different load points for each program or each instance of a program. Expecting all operating systems to do this all the time is an expectation problem. And when using a debugger you are not in a stock environment: the program runs differently, can be loaded differently, etc. It is not the same as running without the debugger, so using a debugger also changes what you should expect to see happen. There are two levels of expectation here to deal with.
Use an external component in a very simple program as I made above, and check in the disassembly of the object, as well as of the linked binary, that it has been built position independent. Then try Linux as Peter has indicated and see if it loads in a different place each time. If not, then you need to be looking at superuser SE or searching around for how to use Linux (and/or gdb) to get it to change the load location.

arm disassembly - ADR or SUB?

I started to build an ARM disassembler.
I have the binary "48 00 4F E2"
Ida:
ROM:00000040 48 00 4F E2 ADR R0, sub_0
Qemu:
e24f0048 sub r0, pc, #72
I do not think it's a BE/LE problem because the commands that came before and after would look the same.
What is happening?
so if you just try it...
.word 0x48004FE2
.word 0xe24f0048
gives
0: 48004fe2 stmdami r0, {r1, r5, r6, r7, r8, r9, r10, r11, lr}
4: e24f0048 sub r0, pc, #72 ; 0x48
but since you are writing a disassembler (very good exercise to learn an instruction set BTW, keep going)... the first thing you notice is the condition code. Neither of the results you saw has "mi" associated with it, so, assuming both tools are disassembling ARM in ARM mode, neither of them saw that 4 as the top nibble. The 0xe condition is ALways, so it is not printed.
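The two .word values tried above are the same four bytes assembled in opposite byte orders. A small sketch of that check, with helper names invented for illustration:

```c
#include <stdint.h>

/* Build a 32-bit word from four bytes in memory order, little-endian. */
uint32_t word_from_le(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Same bytes, but treating the first byte as the most significant. */
uint32_t word_from_be(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* The condition field of an ARM (A32) instruction is the top nibble. */
unsigned condition_field(uint32_t instruction)
{
    return (unsigned)(instruction >> 28);
}
```

For the bytes 48 00 4F E2, the little-endian word is 0xE24F0048 with condition 0xE (always), and the big-endian word is 0x48004FE2 with condition 0x4, which is where the "mi" in stmdami comes from.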
you also see in your arm documentation that adr starts with
cccc0010010x1111 or cccc0010100x1111 0xX24F 0xX28F
Without Rn being 1111 it is just a sub. One disassembler chose to honor ADR, which is a pseudo instruction; the other just decoded it as a sub. It may also matter what architecture you specified. The newer ARM ARM shows ADR as supported from armv4t on, but the older ARM ARM (armv4 and armv5 and some armv6) shows no ADR instruction. It just shows the sub,
which starts as
cccc00I0010SNNNN
where cccc is the condition code, I can be a zero in some cases or a 1 as in this case, S says whether to save the flags or not, and NNNN is Rn.
sub(cond)(s) Rd,Rn,shifter operand
you have 0xE24F so that is
sub r0,pc,shifter_operand.
I is a 1, so the shifter operand is an immediate: an 8-bit value of 0x48 with a rotate of 0x0, which gives decimal 72 (hex 0x48).
What is at 0x40-0x48+8 (address 0x0000)? Is that the label sub_0?
It is looking like they both correctly disassembled the instruction.
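The field breakdown above can be turned into a few lines of C. This is a sketch only: it handles just the data-processing immediate form (I = 1), and the struct and function names are made up for illustration rather than taken from any existing disassembler.

```c
#include <stdint.h>

/* Fields of an ARM (A32) data-processing instruction with an immediate. */
struct dp_imm {
    unsigned cond;   /* bits 31:28 */
    unsigned opcode; /* bits 24:21, 0x2 is SUB, 0x4 is ADD */
    unsigned s;      /* bit 20 */
    unsigned rn;     /* bits 19:16 */
    unsigned rd;     /* bits 15:12 */
    uint32_t imm;    /* imm8 rotated right by 2 * rotate field */
};

uint32_t rotate_right(uint32_t value, unsigned amount)
{
    amount &= 31u;
    return amount ? (value >> amount) | (value << (32u - amount)) : value;
}

struct dp_imm decode_dp_imm(uint32_t insn)
{
    struct dp_imm d;
    d.cond = (insn >> 28) & 0xFu;
    d.opcode = (insn >> 21) & 0xFu;
    d.s = (insn >> 20) & 0x1u;
    d.rn = (insn >> 16) & 0xFu;
    d.rd = (insn >> 12) & 0xFu;
    d.imm = rotate_right(insn & 0xFFu, 2u * ((insn >> 8) & 0xFu));
    return d;
}

/* Address produced by "sub rd, pc, #imm" (or add) at insn_address.
   Only meaningful when rn is 15; pc reads as insn_address + 8. */
uint32_t adr_target(uint32_t insn_address, const struct dp_imm *d)
{
    uint32_t pc = insn_address + 8u;
    return d->opcode == 0x2u ? pc - d->imm : pc + d->imm;
}
```

Decoding 0xE24F0048 this way gives cond 0xE, opcode 0x2 (SUB), Rn 15, Rd 0 and an immediate of 72. With the instruction at 0x40, `adr_target` gives 0x40 + 8 - 0x48 = 0, which would match a label at address 0.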
If you are truly making a disassembler, though, then you shouldn't have needed to ask this question, as you had everything you needed in front of you already.

Resources