The RISC-V instruction auipc does rd = (imm << 12) + PC, where rd is the destination register and imm is a 20-bit immediate.
The result of the above instruction depends on the address at which the binary is running. Suppose a system uses a bootloader to boot a firmware image. In that case, the initial PC for the firmware image will be different from 0x0. This is reflected in the linker script by something like:
.text :
{
_text = .;
*(.text)
_etext = .;
} > FW_IMG
where FW_IMG is something like:
FW_IMG (rx): ORIGIN = 2048, LENGTH = 2304
My question is, how can this work?
I mean, suppose a 32-bit CPU, and that the 4th instruction the compiler generates is an auipc. Suppose the FW image is to be placed at address 0x20000000; then the PC of that auipc will be 0x20000000 + 12 (the 4th instruction, at byte offset 12). Will the compiler be aware of this so that it generates the right values for the above auipc instruction?
EDIT
A good example of this is la. la is a pseudo-instruction that will be expanded to an auipc and an addi. If the compiler generates code to load a symbol, depending on where the image is to be located at runtime, the generated instructions will be different.
EDIT 2
I have tried building the same image with two completely different linker scripts, with an la as the first instruction. The generated auipc instructions are indeed different in each case, and they calculate the right address.
The only explanation I can find for this is that, somehow, the assembler generates auipc 'placeholders' and then the linker fills them in with the right values.
Let us ask the toolchain.
so.c
unsigned int x;
unsigned int y=5;
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int a )
{
x=a+y;
return(more_fun(x)+3);
}
start.s
.globl more_fun
more_fun:
j .
so.ld
MEMORY
{
mem0 : ORIGIN = 0x00003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x00004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x00005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x00006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
.got : { *(.got*) } > mem0
}
There is no need at this point for this to be a functioning program.
position dependent
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 6799 lui x15,0x6
3006: 0007a783 lw x15,0(x15) # 6000 <y>
300a: 1141 addi x2,x2,-16
300c: c606 sw x1,12(x2)
300e: 953e add x10,x10,x15
3010: 6795 lui x15,0x5
3012: 00a7a023 sw x10,0(x15) # 5000 <x>
3016: 37ed jal 3000 <more_fun>
3018: 40b2 lw x1,12(x2)
301a: 050d addi x10,x10,3
301c: 0141 addi x2,x2,16
301e: 8082 ret
Disassembly of section .sbss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .sdata:
00006000 <y>:
6000: 0005 c.nop 1
position independent
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 00000797 auipc x15,0x0
3008: 02c7a783 lw x15,44(x15) # 3030 <_GLOBAL_OFFSET_TABLE_+0x8>
300c: 439c lw x15,0(x15)
300e: 1141 addi x2,x2,-16
3010: c606 sw x1,12(x2)
3012: 953e add x10,x10,x15
3014: 00000797 auipc x15,0x0
3018: 0187a783 lw x15,24(x15) # 302c <_GLOBAL_OFFSET_TABLE_+0x4>
301c: c388 sw x10,0(x15)
301e: 37cd jal 3000 <more_fun>
3020: 40b2 lw x1,12(x2)
3022: 050d addi x10,x10,3
3024: 0141 addi x2,x2,16
3026: 8082 ret
Disassembly of section .bss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .data:
00006000 <y>:
6000: 0005 c.nop 1
...
Disassembly of section .got:
00003028 <_GLOBAL_OFFSET_TABLE_>:
3028: 0000 unimp
302a: 0000 unimp
302c: 5000 lw x8,32(x8)
302e: 0000 unimp
3030: 6000 flw f8,0(x8)
3032: 0000 unimp
3034: ffff .2byte 0xffff
3036: ffff .2byte 0xffff
3038: 0000 unimp
...
AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.
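A tiny C model of that description can help when reading the listings in this answer. This is a sketch for RV32 only; `auipc_result` is a made-up helper name, and the immediate is taken exactly as objdump prints it (the 20-bit value, e.g. 0x3 in `auipc x15,0x3`).

```c
#include <stdint.h>

/* Model of RV32 AUIPC: rd = pc + (imm20 << 12).
 * imm20 is the 20-bit U-immediate as objdump prints it. Placing it in
 * bits 31:12 of a 32-bit value and adding with unsigned wrap-around
 * gives the same result as the hardware add on RV32. */
uint32_t auipc_result(uint32_t pc, uint32_t imm20)
{
    uint32_t offset = (imm20 & 0xFFFFFu) << 12; /* low 12 bits are zero */
    return pc + offset;                         /* wraps modulo 2^32 */
}
```

For example, the `auipc x15,0x3` at 0x3004 in the next listing gives 0x6004, and the `auipc x15,0x30003` at 0x10003004 further down gives 0x40006004.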
In this case I put the .got in the same memory region as .text (mem0), so no major adjustment is needed here: get to the .got, then use the .got to get to the data.
MEMORY
{
mem0 : ORIGIN = 0x00003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x00004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x00005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x00006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
}
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 00003797 auipc x15,0x3
3008: 0087a783 lw x15,8(x15) # 600c <_GLOBAL_OFFSET_TABLE_+0x8>
300c: 439c lw x15,0(x15)
300e: 1141 addi x2,x2,-16
3010: c606 sw x1,12(x2)
3012: 953e add x10,x10,x15
3014: 00003797 auipc x15,0x3
3018: ff47a783 lw x15,-12(x15) # 6008 <_GLOBAL_OFFSET_TABLE_+0x4>
301c: c388 sw x10,0(x15)
301e: 37cd jal 3000 <more_fun>
3020: 40b2 lw x1,12(x2)
3022: 050d addi x10,x10,3
3024: 0141 addi x2,x2,16
3026: 8082 ret
Disassembly of section .bss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .data:
00006000 <y>:
6000: 0005 c.nop 1
...
Disassembly of section .got:
00006004 <_GLOBAL_OFFSET_TABLE_>:
6004: 0000 unimp
6006: 0000 unimp
6008: 5000 lw x8,32(x8)
600a: 0000 unimp
600c: 6000 flw f8,0(x8)
...
When .got is not specified, the linker apparently tacked it onto the end of .data. But it is all good: you add 0x3000 to 0x3000 to get to 0x6000, and the lw offset picks out the right .got entry from there.
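The split between the auipc immediate and the 12-bit signed offset of the following lw/addi/jalr is where the extra work happens. Here is a sketch of that arithmetic in C, assuming RV32 (`split_pcrel` is a hypothetical name, not a binutils function). Because the low part is sign-extended, 0x800 is added before taking the upper 20 bits, so the high part rounds up whenever the low part comes out negative.

```c
#include <stdint.h>

/* Split the distance from pc to target into the auipc immediate (hi20)
 * and the signed 12-bit offset of the following lw/addi/sw/jalr (lo12),
 * so that target == pc + (hi20 << 12) + lo12 (modulo 2^32). */
void split_pcrel(uint32_t pc, uint32_t target, uint32_t *hi20, int32_t *lo12)
{
    uint32_t delta = target - pc;                   /* modulo 2^32 */
    *hi20 = (delta + 0x800u) >> 12;                 /* round so lo12 fits in -2048..2047 */
    uint32_t lo = (delta - (*hi20 << 12)) & 0xFFFu; /* what is left over */
    *lo12 = (int32_t)lo - ((lo & 0x800u) ? 0x1000 : 0); /* sign-extend 12 bits */
}
```

Plugging in the second access above (auipc at 0x3014, .got entry at 0x6008) gives hi20 = 0x3 and lo12 = -12, which is exactly the `auipc x15,0x3` / `lw x15,-12(x15)` pair in the listing. The `la x6,world` case later (auipc at 0x8, target 0x10004004) comes out as 0x10004 and -4 the same way.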
The call to more_fun is a pc-relative offset.
The jump and link (JAL) instruction uses the J-type format, where the J-immediate encodes a signed offset in multiples of 2 bytes. The offset is sign-extended and added to the address of the jump instruction to form the jump target address. Jumps can therefore target a ±1 MiB range. JAL stores the address of the instruction following the jump (pc+4) into register rd. The standard software calling convention uses x1 as the return address register and x5 as an alternate link register.
So until the program gets very big (or you play linker games to make function calls far apart) that all works.
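A quick way to check whether a plain jal can still reach its target is to compute the byte offset and compare it with the J-immediate range quoted above. This is only a sketch, assuming RV32 with 32-bit wrap-around addresses; `jal_reachable` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a single RV32 JAL at pc can reach target. The J-immediate is
 * a signed 21-bit byte offset with bit 0 always zero, so the reachable
 * range is -2^20 .. 2^20 - 2 bytes around the jal itself. */
bool jal_reachable(uint32_t pc, uint32_t target)
{
    uint32_t diff = target - pc;   /* modulo 2^32, like the hardware add */
    int64_t offset = (diff & 0x80000000u) ? (int64_t)diff - 0x100000000LL
                                          : (int64_t)diff;
    return (offset % 2 == 0) && offset >= -(1LL << 20) && offset <= (1LL << 20) - 2;
}
```

`jal_reachable(0x3016, 0x3000)` is true (the `jal 3000 <more_fun>` above), while a call from around address 0x10 to 0x10000000 is not, which is consistent with the toolchain emitting an auipc/jalr pair for that case further down.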
Here is the thing about position independence: think of the binary as a blob. If you load the binary above at 0x3000, then .data is at 0x6000, 0x3000 bytes away. But if you load it at 0x20003000, then .data is at 0x20006000, which is still 0x3000 bytes away.
But you have to update the .got:
6008: 0x20005000
600c: 0x20006000
But that is the whole point. You isolate the address of every global (or group of them) and put it in a table. Then, if you want to relocate the program elsewhere, you (or the program's loader) have to find and change the entries in the .got; in this case, add 0x20000000 to all of them. Then the code all works.
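The fix-up itself is just a loop. Here is a minimal sketch in C, assuming the caller already knows where the .got entries that hold addresses start, how many there are, and the load delta (load address minus link address); `relocate_got` and its parameters are hypothetical names. Reserved entries, such as the zero first word in the dumps above, should be left out of the range.

```c
#include <stddef.h>
#include <stdint.h>

/* Relocate .got entries that were linked for one address and loaded at
 * another. got points at the first entry to patch, count is the number
 * of entries, and delta = load address - link address. */
void relocate_got(uint32_t *got, size_t count, uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        got[i] += delta;   /* each entry is an absolute address */
    }
}
```

For the example above, patching the two entries {0x5000, 0x6000} with a delta of 0x20000000 gives {0x20005000, 0x20006000}.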
Now consider a bootloader situation, where you are probably not an operating system parsing an ELF file:
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem0
.bss : { *(.bss*) } > mem0
.data : { *(.data*) } > mem0
}
Disassembly of section .text:
00000000 <more_fun>:
0: 0000006f j 0 <more_fun>
00000004 <fun>:
4: 00000797 auipc x15,0x0
8: 0347a783 lw x15,52(x15) # 38 <_GLOBAL_OFFSET_TABLE_+0x8>
c: 439c lw x15,0(x15)
e: 1141 addi x2,x2,-16
10: c606 sw x1,12(x2)
12: 953e add x10,x10,x15
14: 00000797 auipc x15,0x0
18: 0207a783 lw x15,32(x15) # 34 <_GLOBAL_OFFSET_TABLE_+0x4>
1c: c388 sw x10,0(x15)
1e: 37cd jal 0 <more_fun>
20: 40b2 lw x1,12(x2)
22: 050d addi x10,x10,3
24: 0141 addi x2,x2,16
26: 8082 ret
Disassembly of section .bss:
00000028 <x>:
28: 0000 unimp
...
Disassembly of section .data:
0000002c <y>:
2c: 0005 c.nop 1
...
Disassembly of section .got:
00000030 <_GLOBAL_OFFSET_TABLE_>:
30: 0000 unimp
32: 0000 unimp
34: 0028 addi x10,x2,8
36: 0000 unimp
38: 002c addi x11,x2,8
...
In your bootstrap you would use auipc x15,0 to get the PC, then use normal (linker plus programming) techniques to get the offset to and the size of the .got, and make the adjustment to each entry yourself before running any code that relies on the .got to find the data.
Could the toolchain do this without a got?
Sure, but...
mem0 : ORIGIN = 0x10000000, LENGTH = 0x1000
Disassembly of section .text:
10000000 <more_fun>:
10000000: 0000006f j 10000000 <more_fun>
10000004 <fun>:
10000004: 00000797 auipc x15,0x0
10000008: 0287a783 lw x15,40(x15) # 1000002c <y>
1000000c: 97aa add x15,x15,x10
1000000e: 1141 addi x2,x2,-16
10000010: 853e mv x10,x15
10000012: c606 sw x1,12(x2)
10000014: 00000717 auipc x14,0x0
10000018: 00f72a23 sw x15,20(x14) # 10000028 <x>
1000001c: 37d5 jal 10000000 <more_fun>
1000001e: 40b2 lw x1,12(x2)
10000020: 050d addi x10,x10,3
10000022: 0141 addi x2,x2,16
10000024: 8082 ret
Disassembly of section .bss:
10000028 <x>:
10000028: 0000 unimp
...
Disassembly of section .data:
1000002c <y>:
1000002c: 0005 c.nop 1
...
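But this
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
created an optimization I did not want (presumably linker relaxation: with everything within 2 KiB of address zero, the addresses fit in the 12-bit signed offset of an x0-relative load/store, so the auipc went away).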
Disassembly of section .text:
00000000 <more_fun>:
0: 0000006f j 0 <more_fun>
00000004 <fun>:
4: 02402783 lw x15,36(x0) # 24 <y>
8: 97aa add x15,x15,x10
a: 1141 addi x2,x2,-16
c: 853e mv x10,x15
e: c606 sw x1,12(x2)
10: 02f02023 sw x15,32(x0) # 20 <x>
14: 37f5 jal 0 <more_fun>
16: 40b2 lw x1,12(x2)
18: 050d addi x10,x10,3
1a: 0141 addi x2,x2,16
1c: 8082 ret
Disassembly of section .bss:
00000020 <x>:
20: 0000 unimp
...
Disassembly of section .data:
00000024 <y>:
24: 0005 c.nop 1
...
I wanted this position independence
10000004: 00000797 auipc x15,0x0
10000008: 0287a783 lw x15,40(x15) # 1000002c <y>
but despite asking for position independence, I got this, which is position dependent:
4: 02402783 lw x15,36(x0) # 24 <y>
-fpic vs -fpie. You probably want -fpie to make life much easier, but as shown, you need to know your tools. The tools know how to do it, but we seem to be able to trip them up.
This one bothered me and delayed even writing this answer.
MEMORY
{
mem0 : ORIGIN = 0x10003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x20004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x30005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x40006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
}
Disassembly of section .text:
10003000 <more_fun>:
10003000: 0000006f j 10003000 <more_fun>
10003004 <fun>:
10003004: 30003797 auipc x15,0x30003
10003008: 0087a783 lw x15,8(x15) # 4000600c <_GLOBAL_OFFSET_TABLE_+0x8>
1000300c: 439c lw x15,0(x15)
1000300e: 1141 addi x2,x2,-16
10003010: c606 sw x1,12(x2)
10003012: 953e add x10,x10,x15
10003014: 30003797 auipc x15,0x30003
10003018: ff47a783 lw x15,-12(x15) # 40006008 <_GLOBAL_OFFSET_TABLE_+0x4>
1000301c: c388 sw x10,0(x15)
1000301e: 37cd jal 10003000 <more_fun>
10003020: 40b2 lw x1,12(x2)
10003022: 050d addi x10,x10,3
10003024: 0141 addi x2,x2,16
10003026: 8082 ret
Disassembly of section .bss:
30005000 <x>:
30005000: 0000 unimp
...
Disassembly of section .data:
40006000 <y>:
40006000: 0005 c.nop 1
...
Disassembly of section .got:
40006004 <_GLOBAL_OFFSET_TABLE_>:
40006004: 0000 unimp
40006006: 0000 unimp
40006008: 5000 lw x8,32(x8)
4000600a: 3000 fld f8,32(x8)
4000600c: 6000 flw f8,0(x8)
4000600e: 4000 lw x8,0(x8)
LOL, I thought this was completely broken, but now I see: because I used the disassembler, it split each 32-bit .got entry into two 16-bit values, so the entries actually point at 0x40006000 and 0x30005000... whew.
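Put differently, RISC-V is little-endian and `objdump -D` decoded the .got as 16-bit compressed instructions, so each 32-bit entry is the first halfword (low 16 bits) followed by the second (high 16 bits). Here is a trivial sketch of putting the halves back together; `got_entry` is a hypothetical helper name.

```c
#include <stdint.h>

/* Rebuild a little-endian 32-bit word from the two 16-bit halves that
 * objdump -D shows on consecutive lines (lower address first). */
uint32_t got_entry(uint16_t first_half, uint16_t second_half)
{
    return (uint32_t)first_half | ((uint32_t)second_half << 16);
}
```

`got_entry(0x5000, 0x3000)` is 0x30005000 (x) and `got_entry(0x6000, 0x4000)` is 0x40006000 (y).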
And just to confirm:
.section .mfun
.globl more_fun
more_fun:
j .
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
mem1 : ORIGIN = 0x20004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x30005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x40006000, LENGTH = 0x1000
mem4 : ORIGIN = 0x10000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem0
.bss : { *(.bss*) } > mem0
.data : { *(.data*) } > mem0
.mfun : { *(.mfun*) } > mem4
}
Disassembly of section .text:
00000000 <fun>:
0: 02402783 lw x15,36(x0) # 24 <y>
4: 97aa add x15,x15,x10
6: 1141 addi x2,x2,-16
8: 853e mv x10,x15
a: c606 sw x1,12(x2)
c: 02f02023 sw x15,32(x0) # 20 <x>
10: 10000097 auipc x1,0x10000
14: ff0080e7 jalr -16(x1) # 10000000 <more_fun>
18: 40b2 lw x1,12(x2)
1a: 050d addi x10,x10,3
1c: 0141 addi x2,x2,16
1e: 8082 ret
Disassembly of section .bss:
00000020 <x>:
20: 0000 unimp
...
Disassembly of section .data:
00000024 <y>:
24: 0005 c.nop 1
...
Disassembly of section .mfun:
10000000 <more_fun>:
10000000: 0000006f j 10000000 <more_fun>
For -fpie that works fine, and -fpic does not produce anything different despite its different assumptions.
la x5,hello
la x6,world
.data
hello: .word 0x1
world: .word 0x2
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
mem1 : ORIGIN = 0x10004000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.data : { *(.data*) } > mem1
}
Disassembly of section .text:
00000000 <.text>:
0: 10004297 auipc x5,0x10004
4: 00028293 mv x5,x5
8: 10004317 auipc x6,0x10004
c: ffc30313 addi x6,x6,-4 # 10004004 <world>
Disassembly of section .data:
10004000 <hello>:
10004000: 0001 .2byte 0x1
...
10004004 <world>:
10004004: 0002 .2byte 0x2
...
or
Disassembly of section .text:
00000000 <.text>:
0: 10004297 auipc x5,0x10004
4: 00c2a283 lw x5,12(x5) # 1000400c <_GLOBAL_OFFSET_TABLE_+0x4>
8: 10004317 auipc x6,0x10004
c: 00832303 lw x6,8(x6) # 10004010 <_GLOBAL_OFFSET_TABLE_+0x8>
Disassembly of section .data:
10004000 <hello>:
10004000: 0001 .2byte 0x1
...
10004004 <world>:
10004004: 0002 .2byte 0x2
...
Disassembly of section .got:
10004008 <_GLOBAL_OFFSET_TABLE_>:
10004008: 0000 .2byte 0x0
1000400a: 0000 .2byte 0x0
1000400c: 4000 .2byte 0x4000
1000400e: 1000 .2byte 0x1000
10004010: 4004 .2byte 0x4004
10004012: 1000 .2byte 0x1000
Which of the two you get depends on how you build that assembly language file.
Do I expect llvm to work exactly the same? Nope, I would personally go through the exercises before attempting to use that tool.
In general the toolchain (compiler, assembler, linker) works together; it pretty much has to. The compiler, or even the assembler, generates what it can with what it sees for that one object, or within one optimization domain. Then the linker does its job, which, depending on the ISA, may mean modifying individual instructions or filling in addresses or offsets in a pool or similar to resolve all the externals. Segment locations count as externals too, as they are not known at compile/assemble time. Beyond that you can get into link-time optimization, or LLVM's IR (bitcode) optimization between the frontend and backend, which you can play with.
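To make "the linker may modify individual instructions" concrete for RISC-V: for a direct pc-relative reference, the assembler leaves the auipc and the following lw/addi/jalr with zero immediates plus a pair of relocations (R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I in the RISC-V ELF psABI), and the linker fills in the immediate fields once it knows both addresses. Here is a simplified sketch in C, assuming RV32 and ignoring relaxation and the rest of the relocation machinery; `patch_pcrel_pair` is a hypothetical name.

```c
#include <stdint.h>

/* Fill the immediates of an auipc (U-type, imm in bits 31:12) and the
 * following I-type instruction such as lw/addi/jalr (imm in bits 31:20)
 * so that together they reach target. auipc_pc is the auipc's address. */
void patch_pcrel_pair(uint32_t *auipc_insn, uint32_t *itype_insn,
                      uint32_t auipc_pc, uint32_t target)
{
    uint32_t delta = target - auipc_pc;              /* modulo 2^32 */
    uint32_t hi20 = (delta + 0x800u) >> 12;          /* rounds for the signed lo part */
    uint32_t lo12 = (delta - (hi20 << 12)) & 0xFFFu; /* 12-bit two's complement */

    *auipc_insn = (*auipc_insn & 0x00000FFFu) | (hi20 << 12); /* keep rd/opcode */
    *itype_insn = (*itype_insn & 0x000FFFFFu) | (lo12 << 20); /* keep rs1/funct3/rd/opcode */
}
```

Starting from the placeholders `0x00000797` (auipc x15,0x0) and `0x0007a783` (lw x15,0(x15)), with the auipc at 0x3014 and the target at 0x6008, this produces `0x00003797` and `0xff47a783`, the same words as in the listing where the .got was tacked onto .data.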
You have to know which items must stay at fixed pc-relative distances from each other, and from that, which items can move. Take .text relative to .data: you can move .text and not .data, move both, or move .data without moving .text, but for some of those cases the distance from .text to .got has to stay fixed. That is under your control.
If this is a bootloader situation, then the loaded program is going into RAM, not split across flash/ROM and RAM, so you can lump it all into one memory space and not have a .got, or you can break it up and do the extra work, etc.
The concept and construction are similar for other instruction sets too. The specific details vary, but the tools have to work together to generate the right instructions, the right EXTRA instructions, or a .pool or similar, so that the linker can patch it all together by modifying instructions or pool/table data.
The RISC-V documents are about the worst I have seen in my career: the information we need seems to be there, but the organization and the ability to find things are dreadful.
AUIPC is basically how we do (big) pc-relative work in RISC-V. Because the low 12 bits of the AUIPC offset are zeroed, the upper part and the 12-bit offset in the following instruction(s) simply add together, so neither we nor the linker have to do extra work to combine them. And as with most things, you let the tools do the address work; you do not want to be counting instructions/bytes between things. That address work is sometimes done by the compiler, sometimes the assembler, sometimes the linker, or a combination.
(I did this .got thing just a day or two ago here, and the tools were combining some data to make fewer entries in the .got, which is obviously a good thing. Could you imagine a program with a lot of globals or static locals? Position independence already adds enough overhead to the binary/data, but that would be... wow.)
I am developing a freestanding application for an ARM Cortex-M microcontroller and while researching the structure of an S-Record file I found that I have some kind of misunderstanding in how the addresses are represented in the S-Record format.
I have a variable defined in my source code like so:
uint32_t g_ip_address = IP_ADDRESS(10, 1, 0, 56); // in LE: 0x3800010A
When I run objdump I see that the variable ends up in the .data section at address 0x1ffe01c4:
$ arm-none-eabi-objdump -t application.elf | grep g_ip_address
1ffe01c4 g O .data 00000004 g_ip_address
This makes sense, given that the memory section of my linker script looks like this and .data is going to RAM:
MEMORY
{
FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 0x0200000 /* 2M */
RAM (rwx) : ORIGIN = 0x1FFE0000, LENGTH = 0x00A0000 /* 640K */
}
However, when I check through the srec file, I'm finding that the address for the record is not 0x1FFE0000. It's 0x0005F570, which seems to put it in the FLASH section (spaces added for clarity).
S315 0005F570 00000000 3800010A 000010180000000014
Is there an implicit offset encoded in a different record entry? How does objcopy get this new address? Is this value being encoded into a function in some way (some pre-main initialization of variables, perhaps)?
Ultimately, my goal is to be able to parse the srec file and patch the IP address value to create a new srec file. Is the idiomatic way of doing something like this simply to create a struct that hardcodes some leading magic number sequence that can be detected in the file?
flash.s
.cpu cortex-m0
.thumb
.word 0x00002000
.word reset
.thumb_func
reset:
b reset
.data
.word 0x11223344
.bss
.word 0x00000000
.word 0x00000000
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram AT > rom
.data : { *(.data*) } > ram AT > rom
}
build it
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy --srec-forceS3 so.elf -O srec so.srec
arm-none-eabi-objcopy -O binary so.elf so.bin
cat so.list
08000000 <reset-0x8>:
8000000: 00002000 andeq r2, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: e7fe b.n 8000008 <reset>
Disassembly of section .bss:
20000000 <.bss>:
...
Disassembly of section .data:
20000008 <.data>:
20000008: 11223344 ; <UNDEFINED> instruction: 0x11223344
cat so.srec
S00A0000736F2E7372656338
S30F080000000020000009000008FEE7D2
S3090800000A443322113A
S70508000000F2
arm-none-eabi-readelf -l so.elf
Elf file type is EXEC (Executable file)
Entry point 0x8000000
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000094 0x08000000 0x08000000 0x0000a 0x0000a R E 0x2
LOAD 0x000000 0x20000000 0x0800000a 0x00000 0x00008 RW 0x1
LOAD 0x00009e 0x20000008 0x0800000a 0x00004 0x00004 RW 0x1
Section to Segment mapping:
Segment Sections...
00 .text
01 .bss
02 .data
hexdump -C so.bin
00000000 00 20 00 00 09 00 00 08 fe e7 44 33 22 11 |. ........D3".|
0000000e
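The S-record (and the .bin) places .data at its load address (PhysAddr 0x0800000a in the program headers), right after .text in flash, not at its run address 0x20000008; the bootstrap is expected to copy it from flash to RAM. .bss is not normally exposed as is: you complicate your linker script to add start and end symbols so that you can zero that range in your bootstrap. For .data you can clearly see what is going on with the standard binutils tools.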
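Since the goal in the question is to patch a value and emit a new S-record file, here is a sketch in C of the two pieces that involves: validating a record's checksum and rebuilding an S3 record after changing its data. `srec_checksum_ok` and `srec_make_s3` are hypothetical helpers, checked only against the so.srec records above. Note that the address to patch is the load address in flash (0x0800000A here), not the .data run address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse the two hex digits at p. */
static unsigned hex_byte(const char *p)
{
    unsigned v = 0;
    sscanf(p, "%2x", &v);
    return v;
}

/* Return 1 if an S-record line (without newline) has a valid checksum:
 * the ones' complement of the low byte of the sum of the count,
 * address and data bytes. */
int srec_checksum_ok(const char *rec)
{
    size_t len = strlen(rec);
    if (len < 6 || rec[0] != 'S' || len % 2 != 0)
        return 0;
    unsigned sum = 0;
    for (size_t i = 2; i + 2 < len; i += 2)   /* skip "Sn", stop before checksum */
        sum += hex_byte(rec + i);
    return ((~sum) & 0xFFu) == hex_byte(rec + len - 2);
}

/* Write an S3 record (32-bit address) for n data bytes into out, which
 * needs room for 12 + 2*n + 3 chars including the terminator. The count
 * byte covers the 4 address bytes, the data and the checksum, so n must
 * stay below 251. */
void srec_make_s3(uint32_t addr, const uint8_t *data, size_t n, char *out)
{
    unsigned count = (unsigned)(4 + n + 1);
    unsigned sum = count;
    out += sprintf(out, "S3%02X%08lX", count, (unsigned long)addr);
    for (int shift = 24; shift >= 0; shift -= 8)
        sum += (addr >> shift) & 0xFFu;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
        out += sprintf(out, "%02X", (unsigned)data[i]);
    }
    sprintf(out, "%02X", (~sum) & 0xFFu);
}
```

Rebuilding the .data record from this example with the bytes 44 33 22 11 at 0x0800000A reproduces `S3090800000A443322113A`; changing the bytes and regenerating the line is how a patched file would get a correct checksum.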
You have not provided enough of your code (and linker script), nor a minimal example that demonstrates the problem, so this is about as far as this can go.
I'm trying to write a bare-metal blink program in C for a Nucleo-64 STM32F401RE board.
However, while debugging (it doesn't blink yet), I found an odd address that I could not explain. This is the output of the relevant part of the disassembly:
blink.elf: file format elf32-littlearm
Disassembly of section .text:
08000000 <isr_vector_table>:
8000000: 20018000 andcs r8, r1, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <Reset_Handler>:
8000008: b480 push {r7}
800000a: af00 add r7, sp, #0
800000c: bf00 nop
800000e: 46bd mov sp, r7
8000010: bc80 pop {r7}
8000012: 4770 bx lr
Disassembly of section .ARM.attributes:
00000000 <.ARM.attributes>:
0: 00002d41 andeq r2, r0, r1, asr #26
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 00000023 andeq r0, r0, r3, lsr #32
10: 2d453705 stclcs 7, cr3, [r5, #-20] ; 0xffffffec
14: 0d06004d stceq 0, cr0, [r6, #-308] ; 0xfffffecc
18: 02094d07 andeq r4, r9, #448 ; 0x1c0
1c: 01140412 tsteq r4, r2, lsl r4
20: 03170115 tsteq r7, #1073741829 ; 0x40000005
24: 01190118 tsteq r9, r8, lsl r1
28: 061e011a ; <UNDEFINED> instruction: 0x061e011a
2c: Address 0x0000002c is out of bounds.
The Reset_Handler function itself is at the right address, but when I use its name as a pointer in the code, it points one address further! Here is the corresponding code:
extern int _stack_top; // bigger Memory Adress
void Reset_Handler (void);
__attribute__((section(".isr_vector"))) int* isr_vector_table[] = {
(int*)&_stack_top,
(int*)Reset_Handler
};
void Reset_Handler (void) {
}
And the linker script I used, which is basically the same as the one used in most tutorials:
OUTPUT_ARCH(arm)
OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
ENTRY(Reset_Handler)
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 96K
}
_stack_top = ORIGIN(SRAM)+LENGTH(SRAM);
SECTIONS
{
.text :
{
. = ALIGN(4);
*(.isr_vector)
*(.text*)
*(.glue_7)
*(.glue_7t)
*(.eh_frame)
KEEP(*(.init))
KEEP(*(.fini))
. = ALIGN(4);
_etext = .;
} > FLASH
.rodata :
{
. = ALIGN(4);
*(.rodata*)
. = ALIGN(4);
} > FLASH
.ARM.extab :
{
*(.ARM.extab* .gnu.linkonce.armextab.*)
} >FLASH
.ARM :
{
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} >FLASH
.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} >FLASH
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} >FLASH
.fini_array :
{
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(.fini_array*))
KEEP (*(SORT(.fini_array.*)))
PROVIDE_HIDDEN (__fini_array_end = .);
} >FLASH
. = ALIGN(4);
_sidata = LOADADDR(.data);
.data :
{
. = ALIGN(4);
_sdata = .;
*(.data*)
. = ALIGN(4);
_edata = .;
} > SRAM AT > FLASH
.bss :
{
. = ALIGN(4);
_sbss = .;
__bss_start__ = _sbss;
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .;
__bss_end__ = _ebss;
} > SRAM
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
So why is the address stored in isr_vector_table 08000009 and not 08000008?
The only way I have found so far to change it to the right value was hardcoding the value, or defining an extra section for Reset_Handler so that I could use its address as another extern value like _stack_top.
Here are the commands I used for compilation, in case they are needed to find an answer:
cd C:/bare_metal
arm-none-eabi-gcc.exe -g main.c -o blink.elf -Wall -T STM32F4.ld -mcpu=cortex-m4 -mthumb --specs=nosys.specs -nostdlib -O0
arm-none-eabi-objdump.exe -D blink.elf
From the Programming Manual PM0214 of STM32F4:
Vector table
The vector table contains the reset value of the stack
pointer, and the start addresses, also called exception vectors, for
all exception handlers. Figure 11 on page 39 shows the order of the
exception vectors in the vector table. The least-significant bit of
each vector must be 1, indicating that the exception handler is Thumb
code.
So LSb = 1 indicates that the instruction pointed to by that vector is a Thumb instruction. Cortex-M cores support only the Thumb instruction set. The toolchain knows that, and sets LSb = 1 automatically for Thumb function addresses. If you somehow manage to make it 0, it won't work.
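In code, reading such a vector entry means treating bit 0 as the Thumb state bit and clearing it to get the address of the handler's first instruction. Here is a small sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a Cortex-M vector table entry is the Thumb state bit and must
 * be 1; the handler code itself starts at the entry with bit 0 cleared. */
bool vector_entry_is_thumb(uint32_t entry)
{
    return (entry & 1u) != 0;
}

uint32_t vector_entry_code_address(uint32_t entry)
{
    return entry & ~1u;
}
```

0x08000009 from the question is a valid Thumb entry whose code starts at 0x08000008, which is where objdump shows Reset_Handler.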
I am running C code on an ARM Cortex-M3 (STM32L152C-Discovery board), but I observed that the function call from main is not getting pushed onto the stack. I have analyzed the generated asm and it looks OK to me. To understand better, please look at the asm generated for the C code here:
main.elf: file format elf32-littlearm
SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
00000028 g F .text 000000e0 SystemClock_Config
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
00000010 g F .text 00000016 main
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* #retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]
}
the execution loops around 0x1a, 0x1c, 0x1e, 0x20 in PC register.
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.
A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
Sure, it is a somewhat useless program, but it demonstrates booting and calling functions. The function address does not show up on the stack: when you do a call (bl), r14 gets the return address and r15 gets the address to branch to. If you have nested functions, like centry calling fun, then you need to preserve the return address however you choose, typically by saving it on the stack. (main() is not a special function name here; you can call your C entry point whatever you want so long as your bootstrap matches.) r4 is pushed just to keep the stack aligned on a 64-bit boundary per the ABI.
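To see where a bl goes and what ends up in r14, the 32-bit Thumb BL encoding can be decoded by hand. Here is a sketch in C, assuming the BL (T1) encoding from the ARMv6-M/ARMv7-M architecture manuals (first halfword 11110 S imm10, second halfword 11 J1 1 J2 imm11); `thumb_bl_target` and `thumb_bl_return` are hypothetical helper names.

```c
#include <stdint.h>

/* Decode the target of a 32-bit Thumb BL at address addr.
 * hw1/hw2 are the two halfwords as objdump prints them ("f000 f805").
 * Offset = SignExtend(S:I1:I2:imm10:imm11:'0'), I1 = NOT(J1 XOR S),
 * I2 = NOT(J2 XOR S); it is relative to addr + 4 (the Thumb PC). */
uint32_t thumb_bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1 = (hw2 >> 13) & 1u;
    uint32_t j2 = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1 = (~(j1 ^ s)) & 1u;
    uint32_t i2 = (~(j2 ^ s)) & 1u;
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    if (s)
        imm |= 0xFE000000u;  /* sign-extend the 25-bit offset */
    return addr + 4u + imm;  /* wraps modulo 2^32 */
}

/* BL leaves the address of the next instruction in r14 (lr), with bit 0
 * set because the caller is Thumb code. */
uint32_t thumb_bl_return(uint32_t addr)
{
    return (addr + 4u) | 1u;
}
```

Applied to the question's `f000 f805` at 0x1a, this gives 0x28, so that bl does target SystemClock_Config. For the bl at 0x8 in my example, the return address is 0xd, matching the 0x0000000D pushed in the simulation.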
For your system you would normally set the linker script for 0x08000000 (STM32).
What we are missing from you is the beginning of your binary. Can you do a hexdump of the memory image/binary showing the handful of bytes before main, including the first few instructions of main?
If a bare-metal program doesn't do the simplest boot steps right, the first thing to do is examine the binary where the entry point or vector table is (depending on the architecture) and check that you built it right.
In this case, in my example, this is a Cortex-M, so the stack pointer initialization value (if you choose to use it) is at 0x00000000; you can put anything there and then simply write over the sp if you want, your choice. Address 0x00000004 is the reset vector, which is the address of the code that handles reset, with the lsbit set to indicate Thumb mode.
so 0x00000008|1 = 0x00000009.
If you don't have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I don't remember if 0x00000000 works for an STM32; in theory it should, but it depends on how you are loading the flash and what mode/state the chip is in at that time.
you might need to link for 0x08000000 and at a minimum if nothing else changed
0x2000xxxx
0x08000011
as the first two words in your binary/memory image.
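Here is a hedged sketch of that check in C, reading the first two little-endian words of a raw binary (for example the output of objcopy -O binary). The address ranges are parameters you fill in from your own linker script; `check_vector_table` and `read_le32` are hypothetical helpers, and the SP check assumes you rely on the hardware-loaded stack pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Sanity-check the start of a Cortex-M image: word 0 is the initial SP,
 * expected in [ram_start, ram_end] (it is commonly the end of RAM);
 * word 1 is the reset vector, which must be odd (Thumb) and point into
 * [flash_start, flash_end). */
bool check_vector_table(const uint8_t *image, size_t size,
                        uint32_t flash_start, uint32_t flash_end,
                        uint32_t ram_start, uint32_t ram_end)
{
    if (size < 8)
        return false;
    uint32_t sp = read_le32(image);
    uint32_t reset = read_le32(image + 4);
    bool sp_ok = sp >= ram_start && sp <= ram_end;
    bool thumb_ok = (reset & 1u) != 0;
    uint32_t code = reset & ~1u;
    bool reset_ok = code >= flash_start && code < flash_end;
    return sp_ok && thumb_ok && reset_ok;
}
```

With the first eight bytes of my example (00 10 00 20 09 00 00 00), flash 0x0 to 0x1000 and RAM 0x20000000 to 0x20001000, the check passes; an even reset vector fails it.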
EDIT
Note that you can make a single binary that can be entered either via the vector table or from a bootloader:
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
This places a branch (a bl, so that it fills the whole 32-bit slot) where the stack pointer value would go, then loads the stack pointer later.
Or use a branch
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
Your application is missing an interrupt table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly because those instructions cannot be interpreted as valid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.
I am trying to compile FreeRTOS for the Raspberry Pi 2. These are the commands I have tried so far:
arm-none-eabi-gcc -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard test.c -o test.o
arm-none-eabi-as -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard startup.s -o startup.o
arm-none-eabi-ld test.o startup.o -static -Map kernel7.map -o target.elf -T raspberrypi.ld
The first two work fine. However, the last one fails with the following error:
startup.o: In function `_start':
(.init+0x0): multiple definition of `_start'
test.o::(.text+0x6c): first defined here
startup.o: In function `swi_handler':
(.init+0x28): undefined reference to `vPortYieldProcessor'
startup.o: In function `irq_handler':
(.init+0x38): undefined reference to `vFreeRTOS_ISR'
startup.o: In function `zero_loop':
(.init+0xcc): undefined reference to `rpi_cpu_irq_disable'
This is the corresponding code:
test.c:
#include <stdio.h>
void exit(int code)
{
while(1)
;
}
int main(void)
{
return 0;
}
startup.s:
.extern system_init
.extern __bss_start
.extern __bss_end
.extern vFreeRTOS_ISR
.extern vPortYieldProcessor
.extern rpi_cpu_irq_disable
.extern main
.section .init
.globl _start
;;
_start:
;# All the following instruction should be read as:
;# Load the address at symbol into the program counter.
ldr pc,reset_handler ;# Processor Reset handler -- we will have to force this on the raspi!
;# Because this is the first instruction executed, of course it causes an immediate branch into reset!
ldr pc,undefined_handler ;# Undefined instruction handler -- processors that don't have thumb can emulate thumb!
ldr pc,swi_handler ;# Software interrupt / TRAP (SVC) -- system SVC handler for switching to kernel mode.
ldr pc,prefetch_handler ;# Prefetch/abort handler.
ldr pc,data_handler ;# Data abort handler.
ldr pc,unused_handler ;# -- Historical from 26-bit addressing ARMs -- was invalid address handler.
ldr pc,irq_handler ;# IRQ handler
ldr pc,fiq_handler ;# Fast interrupt handler.
;# Here we create an exception address table! This means that reset/hang/irq can be absolute addresses
reset_handler: .word reset
undefined_handler: .word undefined_instruction
swi_handler: .word vPortYieldProcessor
prefetch_handler: .word prefetch_abort
data_handler: .word data_abort
unused_handler: .word unused
irq_handler: .word vFreeRTOS_ISR
fiq_handler: .word fiq
reset:
/* Disable IRQ & FIQ */
cpsid if
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continueBoot
overHyped: /* Get out of HYP mode */
ldr r1, =continueBoot
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;# CPSR_MODE_MASK
orr r1, r1, #0x13 ;# CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
continueBoot:
;# In the reset handler, we need to copy our interrupt vector table to 0x0000, it's currently at 0x8000
mov r0,#0x8000 ;# Store the source pointer
mov r1,#0x0000 ;# Store the destination pointer.
;# Here we copy the branching instructions
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load multiple values from indexed address. ; Auto-increment R0
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store multiple values from the indexed address. ; Auto-increment R1
;# So the branches get the correct address we also need to copy our vector table!
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load from 4*n of regs (8) as R0 is now incremented.
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store this extra set of data.
;# Set up the various STACK pointers for different CPU modes
;# (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD2
msr cpsr_c,r0
mov sp,#0x8000
;# (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD1
msr cpsr_c,r0
mov sp,#0x4000
;# (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD3
msr cpsr_c,r0
mov sp,#0x8000000
ldr r0, =__bss_start
ldr r1, =__bss_end
mov r2, #0
zero_loop:
cmp r0,r1
it lt
strlt r2,[r0], #4
blt zero_loop
bl rpi_cpu_irq_disable
;# mov sp,#0x1000000
b main ;# We're ready?? Let's start main execution!
.section .text
undefined_instruction:
b undefined_instruction
prefetch_abort:
b prefetch_abort
data_abort:
b data_abort
unused:
b unused
fiq:
b fiq
hang:
b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl dummy
dummy:
bx lr
raspberrypi.ld:
/**
* BlueThunder Linker Script for the raspberry Pi!
*
*
*
**/
MEMORY
{
RESERVED (r) : ORIGIN = 0x00000000, LENGTH = 32K
INIT_RAM (rwx) : ORIGIN = 0x00008000, LENGTH = 32K
RAM (rwx) : ORIGIN = 0x00010000, LENGTH = 128M
}
ENTRY(_start)
SECTIONS {
/*
* Our init section allows us to place the bootstrap code at address 0x8000
*
* This is where the Graphics processor forces the ARM to start execution.
* However the interrupt vector code remains at 0x0000, and so we must copy the correct
* branch instructions to 0x0000 - 0x001C in order to get the processor to handle interrupts.
*
*/
.init : {
KEEP(*(.init))
} > INIT_RAM = 0
.module_entries : {
__module_entries_start = .;
KEEP(*(.module_entries))
KEEP(*(.module_entries.*))
__module_entries_end = .;
__module_entries_size = SIZEOF(.module_entries);
} > INIT_RAM
/**
* This is the main code section, it is essentially of unlimited size (128 MB).
*
**/
.text : {
*(.text)
} > RAM
/*
* Next we put the data.
*/
.data : {
*(.data)
} > RAM
.bss :
{
__bss_start = .;
*(.bss)
*(.bss.*)
__bss_end = .;
} > RAM
/*
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > RAM
__exidx_end = .;
*/
/**
* Place HEAP here???
**/
PROVIDE(__HEAP_START = __bss_end );
/**
* Stack starts at the top of the RAM, and moves down!
**/
_estack = ORIGIN(RAM) + LENGTH(RAM);
}
As you can see, test.c doesn't contain an entry point called _start, and neither does its compiled assembly. Only startup.s does.
Any ideas about how I could solve this?
EDIT: all of the code used can be found here: https://github.com/jameswalmsley/RaspberryPi-FreeRTOS