How can instructions such as the RISC-V auipc work when the FW image is to be placed at some random address? - linker

The RISC-V instruction auipc does rd = (imm << 12) + PC, where rd is the destination register and imm is a 20-bit immediate.
The result of the above instruction will vary depending on the address at which the binary is running. Let's suppose a system uses a bootloader to boot a firmware image. In that case, the initial PC for the firmware image will be different from 0x0. This fact will be reflected in the linker script by doing something like:
.text :
{
_text = .;
*(.text)
_etext = .;
} > FW_IMG
where FW_IMG is something like:
FW_IMG (rx): ORIGIN = 2048, LENGTH = 2304
My question is, how can this work?
I mean, let's suppose a 32-bit CPU, and that the 4th instruction the compiler generates is an auipc. Let's suppose that the FW image is to be placed at address 0x20000000; then the PC will be 0x20000000 + 12 (the 4th instruction, with 4-byte instructions). Will the compiler be aware of this so that it generates the right values etc. for the above auipc instruction?
EDIT
A good example of this is la. la is a pseudo-instruction that will be expanded to an auipc and an addi. If the compiler generates code to load a symbol, depending on where the image is to be located at runtime, the generated instructions will be different.
EDIT 2
I have tried building the same image with 2 completely different linker scripts, with an la as the first instruction. The generated auipc instructions are indeed different in each case, and they calculate the right address.
The only explanation I find to this is that, somehow, the assembler generates auipc 'placeholders' and then the linker fills them with the right values.

Let us ask the toolchain.
so.c
unsigned int x;
unsigned int y=5;
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int a )
{
x=a+y;
return(more_fun(x)+3);
}
start.s
.globl more_fun
more_fun:
j .
so.ld
MEMORY
{
mem0 : ORIGIN = 0x00003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x00004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x00005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x00006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
.got : { *(.got*) } > mem0
}
no reason at this time for this to be an actually functioning program.
position dependent
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 6799 lui x15,0x6
3006: 0007a783 lw x15,0(x15) # 6000 <y>
300a: 1141 addi x2,x2,-16
300c: c606 sw x1,12(x2)
300e: 953e add x10,x10,x15
3010: 6795 lui x15,0x5
3012: 00a7a023 sw x10,0(x15) # 5000 <x>
3016: 37ed jal 3000 <more_fun>
3018: 40b2 lw x1,12(x2)
301a: 050d addi x10,x10,3
301c: 0141 addi x2,x2,16
301e: 8082 ret
Disassembly of section .sbss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .sdata:
00006000 <y>:
6000: 0005 c.nop 1
position independent
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 00000797 auipc x15,0x0
3008: 02c7a783 lw x15,44(x15) # 3030 <_GLOBAL_OFFSET_TABLE_+0x8>
300c: 439c lw x15,0(x15)
300e: 1141 addi x2,x2,-16
3010: c606 sw x1,12(x2)
3012: 953e add x10,x10,x15
3014: 00000797 auipc x15,0x0
3018: 0187a783 lw x15,24(x15) # 302c <_GLOBAL_OFFSET_TABLE_+0x4>
301c: c388 sw x10,0(x15)
301e: 37cd jal 3000 <more_fun>
3020: 40b2 lw x1,12(x2)
3022: 050d addi x10,x10,3
3024: 0141 addi x2,x2,16
3026: 8082 ret
Disassembly of section .bss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .data:
00006000 <y>:
6000: 0005 c.nop 1
...
Disassembly of section .got:
00003028 <_GLOBAL_OFFSET_TABLE_>:
3028: 0000 unimp
302a: 0000 unimp
302c: 5000 lw x8,32(x8)
302e: 0000 unimp
3030: 6000 flw f8,0(x8)
3032: 0000 unimp
3034: ffff .2byte 0xffff
3036: ffff .2byte 0xffff
3038: 0000 unimp
...
AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.
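As a rough illustration of the quoted definition, the AUIPC result can be modeled in C. This is only a sketch for RV32; the function names auipc_rd and auipc_result are made up for this example, and the instruction words are taken from the disassembly listings in this answer.

```c
#include <stdint.h>

/* Destination register number: bits 11:7 of the instruction word. */
unsigned auipc_rd(uint32_t insn)
{
    return (insn >> 7) & 0x1Fu;
}

/* Value AUIPC writes to rd on RV32. The 20-bit U-immediate sits in
   bits 31:12 of the instruction word, so masking off the low 12 bits
   yields (imm << 12) directly. That offset is added to the address of
   the AUIPC instruction itself. */
uint32_t auipc_result(uint32_t pc, uint32_t insn)
{
    return pc + (insn & 0xFFFFF000u);
}
```

With the listing above, auipc_result(0x3004, 0x00000797) is 0x3004, and the following lw adds 44 to reach the got entry at 0x3030.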
In this case I put the .got in the same memory region as .text, so no major adjustment is needed here. Get to the got, use the got to get to the data.
MEMORY
{
mem0 : ORIGIN = 0x00003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x00004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x00005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x00006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
}
Disassembly of section .text:
00003000 <more_fun>:
3000: 0000006f j 3000 <more_fun>
00003004 <fun>:
3004: 00003797 auipc x15,0x3
3008: 0087a783 lw x15,8(x15) # 600c <_GLOBAL_OFFSET_TABLE_+0x8>
300c: 439c lw x15,0(x15)
300e: 1141 addi x2,x2,-16
3010: c606 sw x1,12(x2)
3012: 953e add x10,x10,x15
3014: 00003797 auipc x15,0x3
3018: ff47a783 lw x15,-12(x15) # 6008 <_GLOBAL_OFFSET_TABLE_+0x4>
301c: c388 sw x10,0(x15)
301e: 37cd jal 3000 <more_fun>
3020: 40b2 lw x1,12(x2)
3022: 050d addi x10,x10,3
3024: 0141 addi x2,x2,16
3026: 8082 ret
Disassembly of section .bss:
00005000 <x>:
5000: 0000 unimp
...
Disassembly of section .data:
00006000 <y>:
6000: 0005 c.nop 1
...
Disassembly of section .got:
00006004 <_GLOBAL_OFFSET_TABLE_>:
6004: 0000 unimp
6006: 0000 unimp
6008: 5000 lw x8,32(x8)
600a: 0000 unimp
600c: 6000 flw f8,0(x8)
...
It tacked the .got on after .data if not specified, apparently. But it is all good: the first auipc adds 0x3000 to its pc (0x3004) to get 0x6004, and the lw offset of 8 reaches the got entry at 0x600c.
The call to more_fun is a pc-relative offset.
The jump and link (JAL) instruction uses the J-type format, where the J-immediate encodes a signed offset in multiples of 2 bytes. The offset is sign-extended and added to the address of the jump instruction to form the jump target address. Jumps can therefore target a ±1 MiB range. JAL stores the address of the instruction following the jump (pc+4) into register rd. The standard software calling convention uses x1 as the return address register and x5 as an alternate link register.
So until the program gets very big (or you play linker games to make function calls far apart) that all works.
Here is the thing about position independence...Think of it as the binary is a blob. If you load the binary above at 0x3000 then .data is at 0x6000, 0x3000 bytes away. But if you load at 0x20003000 then .data is at 0x20006000, which is still 0x3000 bytes away.
But, you have to update the got
600c: 0x20006000
But that is the whole point. You isolate the address of every global (or group of them) and put it in a table. Then if you want to relocate the program elsewhere, you or the loader of the program has to find and change the entries in the got. In this case add 0x20000000 to all of them. Then the code all works.
In a bootloader situation you are probably not an operating system parsing an elf file.
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem0
.bss : { *(.bss*) } > mem0
.data : { *(.data*) } > mem0
}
Disassembly of section .text:
00000000 <more_fun>:
0: 0000006f j 0 <more_fun>
00000004 <fun>:
4: 00000797 auipc x15,0x0
8: 0347a783 lw x15,52(x15) # 38 <_GLOBAL_OFFSET_TABLE_+0x8>
c: 439c lw x15,0(x15)
e: 1141 addi x2,x2,-16
10: c606 sw x1,12(x2)
12: 953e add x10,x10,x15
14: 00000797 auipc x15,0x0
18: 0207a783 lw x15,32(x15) # 34 <_GLOBAL_OFFSET_TABLE_+0x4>
1c: c388 sw x10,0(x15)
1e: 37cd jal 0 <more_fun>
20: 40b2 lw x1,12(x2)
22: 050d addi x10,x10,3
24: 0141 addi x2,x2,16
26: 8082 ret
Disassembly of section .bss:
00000028 <x>:
28: 0000 unimp
...
Disassembly of section .data:
0000002c <y>:
2c: 0005 c.nop 1
...
Disassembly of section .got:
00000030 <_GLOBAL_OFFSET_TABLE_>:
30: 0000 unimp
32: 0000 unimp
34: 0028 addi x10,x2,8
36: 0000 unimp
38: 002c addi x11,x2,8
...
In your bootstrap you would auipc x15,0 to get the pc then you would use normal (linker plus programming) techniques to get the offset to and size of the got. And you would make the adjustment to each entry yourself before running code that relies on the .got to find the data.
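The adjustment described above might look something like this in C. It is a minimal sketch, not what any particular bootloader does: the names load_delta and got_relocate are hypothetical, and it assumes the caller already knows where the real address entries of the .got start and how many there are (for example from symbols defined in the linker script), skipping any reserved entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Difference between where the image actually runs and where it was
   linked to run. Hypothetical helper for this sketch. */
uint32_t load_delta(uint32_t runtime_addr, uint32_t link_addr)
{
    return runtime_addr - link_addr;
}

/* Add the load offset to each 32-bit address entry in the table.
   Assumes every entry in [got, got + entries) is a link-time address. */
void got_relocate(uint32_t *got, size_t entries, uint32_t delta)
{
    for (size_t i = 0; i < entries; i++) {
        got[i] += delta;
    }
}
```

With the earlier example, entries 0x5000 and 0x6000 and a delta of 0x20000000 become 0x20005000 and 0x20006000. This has to run before any code that goes through the .got to find its data.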
Could the toolchain do this without a got?
Sure, but...
mem0 : ORIGIN = 0x10000000, LENGTH = 0x1000
Disassembly of section .text:
10000000 <more_fun>:
10000000: 0000006f j 10000000 <more_fun>
10000004 <fun>:
10000004: 00000797 auipc x15,0x0
10000008: 0287a783 lw x15,40(x15) # 1000002c <y>
1000000c: 97aa add x15,x15,x10
1000000e: 1141 addi x2,x2,-16
10000010: 853e mv x10,x15
10000012: c606 sw x1,12(x2)
10000014: 00000717 auipc x14,0x0
10000018: 00f72a23 sw x15,20(x14) # 10000028 <x>
1000001c: 37d5 jal 10000000 <more_fun>
1000001e: 40b2 lw x1,12(x2)
10000020: 050d addi x10,x10,3
10000022: 0141 addi x2,x2,16
10000024: 8082 ret
Disassembly of section .bss:
10000028 <x>:
10000028: 0000 unimp
...
Disassembly of section .data:
1000002c <y>:
1000002c: 0005 c.nop 1
...
this
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
created an optimization I did not want.
Disassembly of section .text:
00000000 <more_fun>:
0: 0000006f j 0 <more_fun>
00000004 <fun>:
4: 02402783 lw x15,36(x0) # 24 <y>
8: 97aa add x15,x15,x10
a: 1141 addi x2,x2,-16
c: 853e mv x10,x15
e: c606 sw x1,12(x2)
10: 02f02023 sw x15,32(x0) # 20 <x>
14: 37f5 jal 0 <more_fun>
16: 40b2 lw x1,12(x2)
18: 050d addi x10,x10,3
1a: 0141 addi x2,x2,16
1c: 8082 ret
Disassembly of section .bss:
00000020 <x>:
20: 0000 unimp
...
Disassembly of section .data:
00000024 <y>:
24: 0005 c.nop 1
...
I wanted this position independence
10000004: 00000797 auipc x15,0x0
10000008: 0287a783 lw x15,40(x15) # 1000002c <y>
but despite asking for position independence I got this which is position dependent.
4: 02402783 lw x15,36(x0) # 24 <y>
-fpic vs -fpie. You probably want -fpie to make life much easier, but as shown you need to know the tools. The tools know how to do it but we seem to be able to trip them up.
This one bothered me and delayed even writing this answer.
MEMORY
{
mem0 : ORIGIN = 0x10003000, LENGTH = 0x1000
mem1 : ORIGIN = 0x20004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x30005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x40006000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem1
.bss : { *(.bss*) } > mem2
.data : { *(.data*) } > mem3
}
Disassembly of section .text:
10003000 <more_fun>:
10003000: 0000006f j 10003000 <more_fun>
10003004 <fun>:
10003004: 30003797 auipc x15,0x30003
10003008: 0087a783 lw x15,8(x15) # 4000600c <_GLOBAL_OFFSET_TABLE_+0x8>
1000300c: 439c lw x15,0(x15)
1000300e: 1141 addi x2,x2,-16
10003010: c606 sw x1,12(x2)
10003012: 953e add x10,x10,x15
10003014: 30003797 auipc x15,0x30003
10003018: ff47a783 lw x15,-12(x15) # 40006008 <_GLOBAL_OFFSET_TABLE_+0x4>
1000301c: c388 sw x10,0(x15)
1000301e: 37cd jal 10003000 <more_fun>
10003020: 40b2 lw x1,12(x2)
10003022: 050d addi x10,x10,3
10003024: 0141 addi x2,x2,16
10003026: 8082 ret
Disassembly of section .bss:
30005000 <x>:
30005000: 0000 unimp
...
Disassembly of section .data:
40006000 <y>:
40006000: 0005 c.nop 1
...
Disassembly of section .got:
40006004 <_GLOBAL_OFFSET_TABLE_>:
40006004: 0000 unimp
40006006: 0000 unimp
40006008: 5000 lw x8,32(x8)
4000600a: 3000 fld f8,32(x8)
4000600c: 6000 flw f8,0(x8)
4000600e: 4000 lw x8,0(x8)
LOL I thought this was completely broken, but now I see....Because I used the disassembler it broke it into 16 bit values so it is actually going to 0x40006000 and 0x30005000...whew
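To read those got entries correctly, the two 16-bit values the disassembler printed have to be joined back into one little-endian 32-bit word, lower address first. A tiny sketch (join_halfwords is a name invented here):

```c
#include <stdint.h>

/* Rebuild a little-endian 32-bit word from the two 16-bit values the
   disassembler printed: 'low' is the one at the lower address. */
uint32_t join_halfwords(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}
```

So 5000 followed by 3000 is 0x30005000 (x), and 6000 followed by 4000 is 0x40006000 (y).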
And just to confirm:
.section .mfun
.globl more_fun
more_fun:
j .
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
mem1 : ORIGIN = 0x20004000, LENGTH = 0x1000
mem2 : ORIGIN = 0x30005000, LENGTH = 0x1000
mem3 : ORIGIN = 0x40006000, LENGTH = 0x1000
mem4 : ORIGIN = 0x10000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.rodata : { *(.rodata*) } > mem0
.bss : { *(.bss*) } > mem0
.data : { *(.data*) } > mem0
.mfun : { *(.mfun*) } > mem4
}
Disassembly of section .text:
00000000 <fun>:
0: 02402783 lw x15,36(x0) # 24 <y>
4: 97aa add x15,x15,x10
6: 1141 addi x2,x2,-16
8: 853e mv x10,x15
a: c606 sw x1,12(x2)
c: 02f02023 sw x15,32(x0) # 20 <x>
10: 10000097 auipc x1,0x10000
14: ff0080e7 jalr -16(x1) # 10000000 <more_fun>
18: 40b2 lw x1,12(x2)
1a: 050d addi x10,x10,3
1c: 0141 addi x2,x2,16
1e: 8082 ret
Disassembly of section .bss:
00000020 <x>:
20: 0000 unimp
...
Disassembly of section .data:
00000024 <y>:
24: 0005 c.nop 1
...
Disassembly of section .mfun:
10000000 <more_fun>:
10000000: 0000006f j 10000000 <more_fun>
for fpie that works fine...and fpic does not change it based on different assumptions.
la x5,hello
la x6,world
.data
hello: .word 0x1
world: .word 0x2
MEMORY
{
mem0 : ORIGIN = 0x00000000, LENGTH = 0x1000
mem1 : ORIGIN = 0x10004000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > mem0
.data : { *(.data*) } > mem1
}
Disassembly of section .text:
00000000 <.text>:
0: 10004297 auipc x5,0x10004
4: 00028293 mv x5,x5
8: 10004317 auipc x6,0x10004
c: ffc30313 addi x6,x6,-4 # 10004004 <world>
Disassembly of section .data:
10004000 <hello>:
10004000: 0001 .2byte 0x1
...
10004004 <world>:
10004004: 0002 .2byte 0x2
...
or
Disassembly of section .text:
00000000 <.text>:
0: 10004297 auipc x5,0x10004
4: 00c2a283 lw x5,12(x5) # 1000400c <_GLOBAL_OFFSET_TABLE_+0x4>
8: 10004317 auipc x6,0x10004
c: 00832303 lw x6,8(x6) # 10004010 <_GLOBAL_OFFSET_TABLE_+0x8>
Disassembly of section .data:
10004000 <hello>:
10004000: 0001 .2byte 0x1
...
10004004 <world>:
10004004: 0002 .2byte 0x2
...
Disassembly of section .got:
10004008 <_GLOBAL_OFFSET_TABLE_>:
10004008: 0000 .2byte 0x0
1000400a: 0000 .2byte 0x0
1000400c: 4000 .2byte 0x4000
1000400e: 1000 .2byte 0x1000
10004010: 4004 .2byte 0x4004
10004012: 1000 .2byte 0x1000
Depending on how you build it from that assembly language file.
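The numbers in those auipc/addi and auipc/lw pairs can be reproduced by hand. The sketch below shows the usual way a 32-bit pc-relative offset is split into an upper 20-bit part for auipc and a signed 12-bit part for the following instruction; the function names are made up for this example, and this is my reading of what the tools compute, not code from the toolchain.

```c
#include <stdint.h>

/* Upper 20 bits for AUIPC. Adding 0x800 first rounds the offset to the
   nearest multiple of 4096, which compensates for the low part being
   sign-extended by the following instruction. */
uint32_t pcrel_hi20(uint32_t pc, uint32_t target)
{
    uint32_t offset = target - pc;
    return (offset + 0x800u) >> 12;
}

/* Signed 12-bit remainder for the following ADDI or LW. Note that pc
   is the address of the AUIPC, not of the second instruction. */
int32_t pcrel_lo12(uint32_t pc, uint32_t target)
{
    uint32_t offset = target - pc;
    uint32_t hi = (offset + 0x800u) & 0xFFFFF000u;
    return (int32_t)(offset - hi);
}
```

For la x6,world at address 0x8 with world at 0x10004004 this gives 0x10004 and -4, matching the auipc x6,0x10004 and addi x6,x6,-4 in the listing.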
Do I expect llvm to work exactly the same? Nope, I would personally go through the exercises before attempting to use that tool.
In general the tools in the toolchain (compiler, assembler, linker) work together; they pretty much have to. The compiler or even the assembler will generate what it can with what it sees for that one object, or within one optimization domain. Then the linker does its job, which depending on the ISA may mean modifying individual instructions or filling in addresses or offsets in a pool or similar to resolve all the externals. Segment locations count as external too, as they are not known at compile/assemble time. But then you can get into link time optimization, or llvm has bytecode optimization between the frontend and backend that you can play with.
You have to know what items have to be pc-relative to each other, and from that what items can move. Take .text relative to .data, for example: you can move the .text and not move the .data, or move both, or move .data without moving the .text. But the distance from .text to .got has to be fixed for some of those situations, and that is under your control.
If this is a bootloader situation then the loaded program is going into ram not some flash/rom and some ram so you can lump it all into one memory space and not have a .got or you can break it up and do the extra work, etc etc.
The concept and construction is similar for other instruction sets too, the specific details may vary, but the tools have to work together generating the right instructions, right EXTRA instructions, or .pool or other so that the linker can patch it all together modifying instructions or pool/table data.
The risc-v documents are about the worst I have seen in my career, the information we need seems to be there, but the organization and ability to find things is dreadful.
AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.
This is basically how we do (big) pc relative work in risc-v. The lower bits being zeroed out save having to do that ourselves or the linker having to do extra work with the offset in the following instruction(s). And as with most things you let the tools do the address work, you do not want to be counting instructions/bytes between things. And that address work is sometimes the compiler sometimes the assembler and sometimes the linker or a combination.
(I just did this .got thing yesterday or the day before here, and the tools were combining some data to make fewer entries in the .got, which is obviously a good thing. Could you imagine a program with a lot of globals or static locals? Position independence already adds enough overhead to the binary/data, but that would be...wow)

Related

Can't run a no-op function in qemu

I am using xpack qemu arm which is a fork of qemu with support for STM32 boards.
I am trying to run a simple program to get myself started.
I have my linker script
ENTRY(Reset_Handler)
MEMORY
{
FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 0x08000000
RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 0x20000000
}
SECTIONS
{
. = ORIGIN(FLASH);
.text :
{
LONG(ORIGIN(RAM) + LENGTH(RAM)) /* set the SP initial value */
LONG(Reset_Handler) /* set the PC initial value */
*(.text)
}
}
my assembly file
.section .text
.global Reset_Handler
Reset_Handler:
BL main
BL .
and a c function, main
void main () {
return;
}
When I assemble, compile, and link, the generated memory contents are
00000000 <main-0x8>:
0: 40000000 .word 0x40000000
4: 00000020 .word 0x00000020
00000008 <main>:
void main () {
8: e52db004 push {fp} ; (str fp, [sp, #-4]!)
c: e28db000 add fp, sp, #0
return;
10: e1a00000 nop ; (mov r0, r0)
14: e24bd000 sub sp, fp, #0
18: e49db004 pop {fp} ; (ldr fp, [sp], #4)
1c: e12fff1e bx lr
00000020 <Reset_Handler>:
.section .text
.global Reset_Handler
Reset_Handler:
BL main
20: ebfffff8 bl 8 <main>
BL .
24: ebfffffe bl 24 <Reset_Handler+0x4>
I am using a STM32F407VG MCU, the docs state that
After this startup delay is over, the CPU fetches the top-of-stack value from address
0x0000 0000, then starts code execution from the boot memory starting from 0x0000 0004.
Thus, I store the initial value of the stack pointer 0x40000000 in memory location 0x00000000 and the initial value of the program counter in memory location 0x00000004
I start qemu like so
qemu-system-gnuarmeclipse -mcu STM32F407VG -machine STM32F4-Discovery -image myfile.elf -nographic --verbose --verbose -no-reboot -S
And I can see that the SP and PC registers (R13 and R15, respectively) are set to the expected values:
R00=00000000 R01=00000000 R02=00000000 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=40000000 R14=00000000 R15=00000020
PSR=40000153 -Z-- A svc32
FPSCR: 00000000
So, following the memory mapping output, the program should flow like so:
1. PC is set to 0x20, which runs BL 8 <main>.
2. This branches to memory location 0x8, which is the start of the main function; it also saves the return address in LR.
3. This function should perform a no-op, pushing and popping FP to/from the stack.
4. The function should return to the address in LR (which was previously saved).
5. The next instruction should loop forever (24: ebfffffe bl 24 <Reset_Handler+0x4>).
However, I run this, and I get the following error:
(qemu) Bad ram pointer 0x4
I am a little lost on what this error means. Am I missing something in my setup?
ORIGIN = 0x00000000
The memory is aliased to 0 by the hardware, but the real address is not zero.
Your linker script has to use the correct FLASH address, not the boot-time alias:
0x08000000
I would suggest using the ST-provided linker scripts, as you do not seem to fully understand the chip's documentation.

Where do addresses in S-Record files come from?

I am developing a freestanding application for an ARM Cortex-M microcontroller and while researching the structure of an S-Record file I found that I have some kind of misunderstanding in how the addresses are represented in the S-Record format.
I have a variable defined in my source code like so:
uint32_t g_ip_address = IP_ADDRESS(10, 1, 0, 56); // in LE: 0x3800010A
When I run objdump I see that the variable ends up in the .data section at address 0x1ffe01c4:
$ arm-none-eabi-objdump -t application.elf | grep g_ip_address
1ffe01c4 g O .data 00000004 g_ip_address
This makes sense, given that the memory section of my linker script looks like this and .data is going to RAM:
MEMORY
{
FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 0x0200000 /* 2M */
RAM (rwx) : ORIGIN = 0x1FFE0000, LENGTH = 0x00A0000 /* 640K */
}
However, when I check through the srec file, I'm finding that the address for the record is not in the RAM range that starts at 0x1FFE0000. It's 0x0005F570, which seems to put it in the FLASH section (spaces added for clarity).
S315 0005F570 00000000 3800010A 000010180000000014
Is there an implicit offset encoded in a different record entry? How does objcopy get this new address? Is this value being encoded into a function in some way (some pre-main initialization of variables perhaps)?
Ultimately, my goal is to be able to parse the srec file and patch the IP address value to create a new srec file. Is the idiomatic way of doing something like this simply to create a struct that hardcodes some leading magic number sequence that can be detected in the file?
flash.s
.cpu cortex-m0
.thumb
.word 0x00002000
.word reset
.thumb_func
reset:
b reset
.data
.word 0x11223344
.bss
.word 0x00000000
.word 0x00000000
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram AT > rom
.data : { *(.data*) } > ram AT > rom
}
build it
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy --srec-forceS3 so.elf -O srec so.srec
arm-none-eabi-objcopy -O binary so.elf so.bin
cat so.list
08000000 <reset-0x8>:
8000000: 00002000 andeq r2, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: e7fe b.n 8000008 <reset>
Disassembly of section .bss:
20000000 <.bss>:
...
Disassembly of section .data:
20000008 <.data>:
20000008: 11223344 ; <UNDEFINED> instruction: 0x11223344
cat so.srec
S00A0000736F2E7372656338
S30F080000000020000009000008FEE7D2
S3090800000A443322113A
S70508000000F2
arm-none-eabi-readelf -l so.elf
Elf file type is EXEC (Executable file)
Entry point 0x8000000
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000094 0x08000000 0x08000000 0x0000a 0x0000a R E 0x2
LOAD 0x000000 0x20000000 0x0800000a 0x00000 0x00008 RW 0x1
LOAD 0x00009e 0x20000008 0x0800000a 0x00004 0x00004 RW 0x1
Section to Segment mapping:
Segment Sections...
00 .text
01 .bss
02 .data
hexdump -C so.bin
00000000 00 20 00 00 09 00 00 08 fe e7 44 33 22 11 |. ........D3".|
0000000e
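bss is not normally exposed as is; you complicate your linker script to add beginning and end points so you can then zero that range in your bootstrap. For .data you can clearly see what is going on with the standard binutils tools: readelf shows .data with a run-time address (VirtAddr) of 0x20000008 in ram but a load address (PhysAddr) of 0x0800000a in rom, and the srec and binary files carry the load address. The bootstrap is expected to copy .data from rom to ram before the program uses it.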
You have not provided enough of your code (and linker script), nor a minimal example that demonstrates the problem, so this is about as far as this can go.
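On the patching goal: whatever way the value is located, each S-record ends with a checksum byte, so a record whose data bytes are changed needs its checksum recomputed. The checksum is the ones' complement of the low byte of the sum of the count, address, and data bytes. A minimal sketch in C follows; srec_checksum and hex_nibble are names invented for this example, and it only handles a single record given as a string with no trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Convert one ASCII hex digit to its value, or -1 if it is not one. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Compute the checksum a record should carry, from its count, address
   and data bytes. Returns 0..255, or -1 if the record is malformed. */
int srec_checksum(const char *rec)
{
    size_t len = strlen(rec);
    if (len < 6 || rec[0] != 'S')
        return -1;

    int hi = hex_nibble(rec[2]);
    int lo = hex_nibble(rec[3]);
    if (hi < 0 || lo < 0)
        return -1;

    /* The count byte covers address + data + checksum. */
    unsigned count = (unsigned)(hi * 16 + lo);
    if (count < 1 || len < 4 + 2 * (size_t)count)
        return -1;

    unsigned sum = count;
    for (unsigned i = 0; i + 1 < count; i++) { /* skip the checksum byte */
        hi = hex_nibble(rec[4 + 2 * i]);
        lo = hex_nibble(rec[5 + 2 * i]);
        if (hi < 0 || lo < 0)
            return -1;
        sum += (unsigned)(hi * 16 + lo);
    }
    return (int)(~sum & 0xFFu);
}
```

For the .data record in so.srec above, S3090800000A443322113A, this returns 0x3A, which matches the last byte of the record.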

MIPS ASSEMBLY How to use if's and else's translation from C to MIPS Assembly

So i have this code in C that i have to translate into assembly
int a[10]={0,1,2,3,4,5,6,7,8,9};
int i, j, k;
i = 1;
goto abc;
def:
j = 1;
k = 4;
goto ghi;
i = 2;
abc:
goto def;
ghi:
if (i==j){
a[2] = a[3];
}else{
a[2] = a[4];
}
while(k>0){
a[k] = 7;
k = k - 1;
}
if((i>k) && (i<10)){
if((k==6) || (j>=i)){
a[9] = 400;
}else{
a[9] = 500;
}
}
switch(j){
case 0: a[6] = 4; break;
case 1: a[6] = 5; break;
case 2: a[6] = 6; break;
case 3: a[6] = 7; break;
}
I have been able to turn only the goto parts into assembly because i don't know how to turn the if's else's and switch commands into MIPS assembly
Here what i have done so far
.text
main:
li $t0, 1
lw $t0, variableI ## i = 1
j abc ## goto abc
def:
li $t0, 1
lw $t0, variableJ ## j=1
li $t0, 4
lw $t0, variableK ## k=4
j ghi ## goto ghi
li $t0, 2
lw $t0, variableI ## i = 2
abc:
j def ## goto def
ghi:
.data
variableI: .word
variableJ: .word
variableK: .word
vetorA: .word 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
It isn't necessary to translate all code for me just need a good explanation of how to use if's and else's since i haven't found any really good explanation online.
The instruction set manual is online and will cover all of it. Compilers will show you as well.
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c )
{
unsigned int r;
r = 0;
if(a==5)
{
r=6;
}
return(r);
}
mips-elf-gcc -O2 -c -fno-delayed-branch so.c -o so.o
mips-elf-objdump -d so.o
so.o: file format elf32-bigmips
Disassembly of section .text:
00000000 <fun>:
0: 24020005 li $2,5
4: 10820004 beq $4,$2,18 <fun+0x18>
8: 00000000 nop
c: 00001025 move $2,$0
10: 03e00008 jr $31
14: 00000000 nop
18: 24020006 li $2,6
1c: 03e00008 jr $31
20: 00000000 nop
equals and not equals are easy, bne, beq...
it is often common for the compiler to generate the opposite something like this in this case
return reg = 0;
compare input reg with 5
if NOT equal then branch to skip
return reg = 6
skip:
return
But not always, gcc did this
0: 24020005 li $2,5
4: 10820004 beq $4,$2,18 <fun+0x18>
8: 00000000 nop
r2 is the return register; load it with a 5, using it as a temporary register for now in order to do the comparison. Compare the input reg (r4) with 5; if equal then branch to address 18. If not equal keep going.
If not equal then
c: 00001025 move $2,$0
10: 03e00008 jr $31
14: 00000000 nop
put a 0 in the return register r2 and return. this path is the equivalent of
r = 0;
return(r);
If it is equal to 5 then
18: 24020006 li $2,6
1c: 03e00008 jr $31
20: 00000000 nop
put the 6 in r2 and return
For this example C code the compiler is very brute force about the solution, other than not incurring more branches to have a single exit point (not-atypical).
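One way to get from C to assembly by hand is to first rewrite the C so that every if is just "if (condition) goto label", since that form maps almost line for line onto beq/bne and j. The sketch below does that for the function above (using the inverted-branch form) and for the if/else from the question. The function names and the MIPS mnemonics in the comments are only illustrative.

```c
/* if (a == 5) r = 6;  written with an inverted test and a skip label */
unsigned int fun_goto(unsigned int a)
{
    unsigned int r = 0;
    if (a != 5) goto skip;      /* bne  a, 5, skip */
    r = 6;
skip:
    return r;
}

/* if (i == j) a[2] = a[3]; else a[2] = a[4]; */
void if_else_goto(int *a, int i, int j)
{
    if (i != j) goto else_part; /* bne  i, j, else_part */
    a[2] = a[3];                /* the "then" body */
    goto done;                  /* j    done */
else_part:
    a[2] = a[4];                /* the "else" body */
done:
    return;
}
```

Each goto line becomes one branch or jump, and each label becomes a label in the assembly file.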
unsigned int fun ( unsigned int a, unsigned int b )
{
unsigned int r;
r = a;
switch(b)
{
case 5: r+=3; break;
case 7: r^=3; break;
}
return(r);
}
sometimes you will see the tools generate a jump table, depends on the architecture and other factors, in this case that is not going to happen but if you had switch(b&3) and then the four possible cases, I tried and gcc did not make a jump table.
But what is a switch really? In this case and in all cases it is no more than: if (the argument in the parens, in this case just b by itself) b is 5, else if b is 7, and so on. A switch is an if-then-else tree and often that is all a compiler will bother to implement.
00000000 <fun>:
0: 24030005 li $3,5
4: 00801025 move $2,$4
8: 10a30009 beq $5,$3,30 <fun+0x30>
c: 00000000 nop
10: 24030007 li $3,7
14: 14a30004 bne $5,$3,28 <fun+0x28>
18: 00000000 nop
1c: 38820003 xori $2,$4,0x3
20: 03e00008 jr $31
24: 00000000 nop
28: 03e00008 jr $31
2c: 00000000 nop
30: 24820003 addiu $2,$4,3
34: 03e00008 jr $31
38: 00000000 nop
r = a
4: 00801025 move $2,$4
if b == 5 then
0: 24030005 li $3,5
8: 10a30009 beq $5,$3,30 <fun+0x30>
c: 00000000 nop
r+=3 and return
30: 24820003 addiu $2,$4,3
34: 03e00008 jr $31
38: 00000000 nop
(else) if b==7
10: 24030007 li $3,7
14: 14a30004 bne $5,$3,28 <fun+0x28>
18: 00000000 nop
r ^=3 and return
1c: 38820003 xori $2,$4,0x3
20: 03e00008 jr $31
24: 00000000 nop
else return (r was set to a up front)
28: 03e00008 jr $31
2c: 00000000 nop
So it is a simple if-then-else tree, with one non-brute-force, out-of-order wrinkle: the r = a was placed in the middle of the if b == 5 test, so the two are mixed together.
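The same tree can be written back in C with gotos, roughly following the order of the compiler output above. This is only a sketch (fun_switch_goto is a made-up name), but it shows that the switch needs nothing beyond compare-and-branch.

```c
unsigned int fun_switch_goto(unsigned int a, unsigned int b)
{
    unsigned int r = a;         /* move  $2,$4 */
    if (b == 5) goto case5;     /* beq   b, 5, case5 */
    if (b != 7) goto done;      /* bne   b, 7, done */
    r ^= 3;                     /* case 7 */
    goto done;
case5:
    r += 3;                     /* case 5 */
done:
    return r;
}
```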
unsigned int fun ( unsigned int a )
{
unsigned int r;
r = 0;
if(a<5)
{
r = 3;
}
return(r);
}
00000000 <fun>:
0: 2c840005 sltiu $4,$4,5
4: 14800004 bnez $4,18 <fun+0x18>
8: 00000000 nop
c: 00001025 move $2,$0
10: 03e00008 jr $31
14: 00000000 nop
18: 24020003 li $2,3
1c: 03e00008 jr $31
20: 00000000 nop
because of how mips works or at least the mips I have compiled for
set if less than unsigned
0: 2c840005 sltiu $4,$4,5
Because of the way the C was written, we can discard the a variable once the less-than question is answered, so the result overwrites r4: if r4 (a) is less than 5 then "set" r4 (non-zero), otherwise clear it.
so now less than becomes an equal or not equal question
if(r4 is not equal to zero) combined with the above means if a is less than five then
4: 14800004 bnez $4,18 <fun+0x18>
8: 00000000 nop
return 3
18: 24020003 li $2,3
1c: 03e00008 jr $31
20: 00000000 nop
else return 0
c: 00001025 move $2,$0
10: 03e00008 jr $31
14: 00000000 nop
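The set-then-branch idea can also be mimicked in C: compute the comparison into a temporary, then branch on whether the temporary is zero. The sketch below does that for the function above and applies the same goto style to the while loop from the question. Names are invented for the example, and the mnemonics in the comments are only a guide.

```c
/* if (a < 5) r = 3; else r = 0; */
unsigned int less_than_five(unsigned int a)
{
    unsigned int t = (a < 5);   /* sltiu t, a, 5   (t is 1 or 0) */
    if (t != 0) goto is_less;   /* bnez  t, is_less */
    return 0;
is_less:
    return 3;
}

/* while (k > 0) { a[k] = 7; k = k - 1; } */
void while_goto(int *a, int k)
{
loop:
    if (k <= 0) goto done;      /* test at the top, e.g. blez k, done */
    a[k] = 7;
    k = k - 1;
    goto loop;                  /* j loop */
done:
    return;
}
```

A while loop is just an if with a jump back to the test at the end of the body.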
Mips doesn't like/use flags like most other instruction sets. Most others you would have a compare instruction which does a subtract and sets a bunch of flags but does not store the subtraction to a register, then subsequent instructions will use those flags. mips uses a philosophy of no flags so do the comparison and act on it in the same instruction. So for example another architecture:
00000000 <fun>:
0: e3500004 cmp r0, #4
4: 93a00003 movls r0, #3
8: 83a00000 movhi r0, #0
c: e12fff1e bx lr
Compare a with 4, then if lower or same r = 3, if higher r = 0, and return. This is in a way also very atypical, in that the instruction set shown here (arm) allows for per-instruction conditional execution, rather than only branches being conditional and mov and add and such not.
And this is more typical
00000000 <_fun>:
0: 0a00 clr r0
2: 2d97 0002 0004 cmp 2(sp), $4
8: 8301 blos c <_fun+0xc>
a: 0087 rts pc
c: 15c0 0003 mov $3, r0
10: 0087 rts pc
and this is almost brute force one to one
r = 0
0: 0a00 clr r0
compare a (passed in on the stack) with 4
2: 2d97 0002 0004 cmp 2(sp), $4
branch if lower or same to address c:
8: 8301 blos c <_fun+0xc>
if 5 or higher then return (with r = 0)
a: 0087 rts pc
if less than 5 (less than or equal to 4) then return 3
c: 15c0 0003 mov $3, r0
10: 0087 rts pc
(destination is on the right here mov $3,r0 means r0 = 3. $ means constant here)
Never thought I would use this knowledge again! Hope this helps.
It can help to look at a reference sheet and understand what the instructions are doing. Generally if and else can be simulated with "branch if equal" (beq) or "branch if not equal" (bne). Take a look at the below reference for those instructions.
Also, the while and switch use the same instruction - just think about how the code should flow.
Check out this reference for some more details on how to use the instructions, and others that I haven't mentioned. I will mention that most of the instructions should use registers, loading/saving words, branching, basic jumping, math, etc.
https://inst.eecs.berkeley.edu/~cs61c/resources/MIPS_help.html

C function from main is not pushing on stack in arm

I am executing C code for arm cortex-m3 for stm32l152C-discovery board but I observed that the function call from main is not getting pushed into the stack. I have analyzed the asm code of this source but I find it is OK. To understand better, please look the asm code generated for C code here:
main.elf: file format elf32-littlearm
SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
00000028 g F .text 000000e0 SystemClock_Config
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
00000010 g F .text 00000016 main
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* #retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]
}
Execution loops around 0x1a, 0x1c, 0x1e, 0x20 in the PC register:
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.
A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
Sure, it is a somewhat useless program, but it demonstrates booting and calling functions. The function address does not show up on the stack: when you do a call (bl), r14 gets the return address and r15 gets the address to branch to. If you have nested functions, like centry calling fun, then you need to preserve the return address however you choose, typically by saving it on the stack. (main() is not an important function name for the C entry point; you can call your entry point whatever you want so long as your bootstrap matches.) r4 is being pushed just to keep the stack aligned on a 64-bit boundary per the ABI.
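To make the bl behaviour concrete, here is a small sketch that decodes the two halfwords of a Thumb bl and computes the value left in r14. It assumes the classic two-halfword encoding (11110 plus the high offset bits, then 11111 plus the low offset bits), which matches the f000 f802 and f000 f805 words in the listings here; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Branch target of a Thumb BL made of two halfwords (hw1 at addr, hw2 at addr+2).
   Assumes the classic encoding: hw1 = 11110 + offset[22:12], hw2 = 11111 + offset[11:1]. */
uint32_t thumb_bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    assert((hw1 & 0xF800u) == 0xF000u && (hw2 & 0xF800u) == 0xF800u);
    uint32_t offset = ((uint32_t)(hw1 & 0x7FFu) << 12) |
                      ((uint32_t)(hw2 & 0x7FFu) << 1);
    if (offset & 0x400000u)        /* sign-extend the 23-bit offset */
        offset |= 0xFF800000u;
    return addr + 4u + offset;     /* PC reads as instruction address + 4 */
}

/* Value BL leaves in r14 (lr): address of the next instruction, bit 0 set for Thumb. */
uint32_t thumb_bl_link(uint32_t addr)
{
    return (addr + 4u) | 1u;
}
```

For the bl at 0x08 this gives a target of 0x10 (centry) and a link value of 0x0D, which is the 0x0000000D the simulation shows being pushed at 0x20000FFC. The same decoding applied to the f000 f805 at 0x1a in the question gives 0x28, the address of SystemClock_Config.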
For your system you would normally set the linker script for 0x08000000 (stm32).
What we are missing from you is the beginning of your binary. Can you do a hexdump of the memory image/binary showing the handful of bytes before main, including the first few instructions of main?
If a bare metal program doesn't do the simplest boot steps right, the first thing to do is examine the binary where the entry point or vector table is (depending on the architecture) and see that you built it right.
In my example this is a cortex-m, so the stack pointer initialization value (if you choose to use it) is at 0x00000000. You can put anything there and then simply write over the sp if you want, your choice. Then address 0x00000004 is the reset vector, which is the address of the code that handles the reset, with the lsbit set to indicate thumb mode.
So 0x00000008|1 = 0x00000009.
If you don't have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I don't remember if 0x00000000 works for an stm. In theory it should, but it depends on how you are loading the flash and what mode/state the chip is in at that time.
You might need to link for 0x08000000 and, at a minimum, if nothing else changed, have
0x2000xxxx
0x08000011
as the first two words in your binary/memory image.
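The check described above can be sketched in C. This is only an illustration: the helper names are hypothetical, the image is assumed to be little-endian, and "plausible stack pointer" is taken to mean the 0x2000xxxx pattern mentioned above rather than any exact RAM size for a particular part.

```c
#include <assert.h>
#include <stdint.h>

/* Reset vector entry for a Thumb handler: its address with bit 0 set. */
uint32_t reset_vector_for(uint32_t handler_addr)
{
    assert((handler_addr & 1u) == 0);   /* instruction addresses are halfword aligned */
    return handler_addr | 1u;
}

/* Read a little-endian 32-bit word from a byte image. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the first two words look like the start of a vector table:
   word 0 has the form 0x2000xxxx (assumed SRAM range) and word 1 is odd
   (Thumb bit set). Returns 0 otherwise. */
int vector_table_looks_sane(const uint8_t *image)
{
    uint32_t initial_sp = read_le32(image);
    uint32_t reset      = read_le32(image + 4);
    return (initial_sp >> 16) == 0x2000u && (reset & 1u) == 1u;
}
```

With these, a handler at 0x00000008 gives 0x00000009, one at 0x00000010 gives 0x00000011, and one at 0x08000010 gives 0x08000011. An image whose first bytes are instructions rather than these two words would fail the check.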
EDIT
Note that you can make a single binary that can be entered either through the vector table or from a bootloader:
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
This places a branch (well, a bl, to fill the space) in the stack address spot and loads the stack pointer later.
Or use a branch:
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
Your application is missing an interrupt vector table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly because those instructions cannot be interpreted as valid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.

Floating point variable's value missing in data memory

I am a master's student currently doing my summer final project, which is about designing a MIPS processor with an FPU and implementing it in an FPGA.
The instructions that I'm going to implement depend on the cross-compiler I'm using. So, from a hardware designer's point of view, I started the project by first looking at the instructions the compiler can generate.
For the integer design (the main core design), I wrote some C code. Here, for example, is a simple one:
int main ()
{
int a,b,c;
a=1;
b=2;
c=a+2;
}
For this simple addition, the compiler gives the following assembly code (I only post the assembly for main, because I do not plan to run an operating system on my MIPS):
00400168 <main>:
400168: 27bdffe8 addiu sp,sp,-24
40016c: afbe0010 sw s8,16(sp)
400170: 03a0f021 move s8,sp
400174: 24020001 li v0,1
400178: afc20008 sw v0,8(s8)
40017c: 24020002 li v0,2
400180: afc20004 sw v0,4(s8)
400184: 8fc20008 lw v0,8(s8)
400188: 00000000 nop
40018c: 20420002 addi v0,v0,2
400190: afc20000 sw v0,0(s8)
400194: 03c0e821 move sp,s8
400198: 8fbe0010 lw s8,16(sp)
40019c: 27bd0018 addiu sp,sp,24
4001a0: 03e00008 jr ra
I like to understand the assembly code, because it helps me understand the MIPS architecture better, and based on the instruction order I can design a hazard detection unit that matches the compiler.
We can see from these 4 instructions:
400174: 24020001 li v0,1
400178: afc20008 sw v0,8(s8)
40017c: 24020002 li v0,2
400180: afc20004 sw v0,4(s8)
that the compiler loads 1 and 2 into the variables a and b.
I can understand the integer assembly code without problems.
OK, let's go to the floating point unit. In the same way, I wrote a very similar C program:
Floating point testing C code
void main ()
{
float a,b,c;
a=1;
b=2;
c=a+b;
}
Now the assembly code is quite different:
00400168 <main>:
400168: 27bdffe8 addiu sp,sp,-24
40016c: afbe0010 sw s8,16(sp)
400170: 03a0f021 move s8,sp
400174: c7808004 lwc1 $f0,-32764(gp)
400178: 00000000 nop
40017c: e7c00008 swc1 $f0,8(s8)
400180: c7808008 lwc1 $f0,-32760(gp)
400184: 00000000 nop
400188: e7c00004 swc1 $f0,4(s8)
40018c: c7c20008 lwc1 $f2,8(s8)
400190: c7c00004 lwc1 $f0,4(s8)
400194: 00000000 nop
400198: 46001000 add.s $f0,$f2,$f0
40019c: e7c00000 swc1 $f0,0(s8)
4001a0: 03c0e821 move sp,s8
4001a4: 8fbe0010 lw s8,16(sp)
4001a8: 27bd0018 addiu sp,sp,24
4001ac: 03e00008 jr ra
4001b0: 00000000 nop
Unlike the previous code, these 6 instructions look like the program loads the variables' values from data memory instead of using the li instruction:
400174: c7808004 lwc1 $f0,-32764(gp)
400178: 00000000 nop
40017c: e7c00008 swc1 $f0,8(s8)
400180: c7808008 lwc1 $f0,-32760(gp)
400184: 00000000 nop
400188: e7c00004 swc1 $f0,4(s8)
Here comes the problem: I just cannot figure out what values are stored at -32764(gp) and -32760(gp), because there are no SW instructions that store data to those addresses.
Here is the full assembly code generated by the compiler:
floatadd: file format elf32-bigmips
Disassembly of section .init:
00400018 <_init>:
400018: 27bdffe0 addiu sp,sp,-32
40001c: afbf0014 sw ra,20(sp)
400020: 0c10003a jal 4000e8 <frame_dummy>
400024: 00000000 nop
400028: 0c10006d jal 4001b4 <__do_global_ctors_aux>
40002c: 00000000 nop
400030: 8fbf0014 lw ra,20(sp)
400034: 27bd0020 addiu sp,sp,32
400038: 03e00008 jr ra
40003c: 00000000 nop
Disassembly of section .text:
00400040 <_ftext>:
400040: 27bdffe0 addiu sp,sp,-32
400044: afb10014 sw s1,20(sp)
400048: 3c110040 lui s1,0x40
40004c: 9222126c lbu v0,4716(s1)
400050: afbf0018 sw ra,24(sp)
400054: 14400019 bnez v0,4000bc <_ftext+0x7c>
400058: afb00010 sw s0,16(sp)
40005c: 3c100040 lui s0,0x40
400060: 8e021260 lw v0,4704(s0)
400064: 00000000 nop
400068: 8c430000 lw v1,0(v0)
40006c: 00000000 nop
400070: 10600009 beqz v1,400098 <_ftext+0x58>
400074: 24420004 addiu v0,v0,4
400078: 0060f809 jalr v1
40007c: ae021260 sw v0,4704(s0)
400080: 8e021260 lw v0,4704(s0)
400084: 00000000 nop
400088: 8c430000 lw v1,0(v0)
40008c: 00000000 nop
400090: 1460fff9 bnez v1,400078 <_ftext+0x38>
400094: 24420004 addiu v0,v0,4
400098: 3c020000 lui v0,0x0
40009c: 24420000 addiu v0,v0,0
4000a0: 10400005 beqz v0,4000b8 <_ftext+0x78>
4000a4: 24020001 li v0,1
4000a8: 3c040040 lui a0,0x40
4000ac: 0c000000 jal 0 <_init-0x400018>
4000b0: 24840244 addiu a0,a0,580
4000b4: 24020001 li v0,1
4000b8: a222126c sb v0,4716(s1)
4000bc: 8fbf0018 lw ra,24(sp)
4000c0: 8fb10014 lw s1,20(sp)
4000c4: 8fb00010 lw s0,16(sp)
4000c8: 03e00008 jr ra
4000cc: 27bd0020 addiu sp,sp,32
004000d0 <call___do_global_dtors_aux>:
4000d0: 27bdffe8 addiu sp,sp,-24
4000d4: afbf0010 sw ra,16(sp)
4000d8: 8fbf0010 lw ra,16(sp)
4000dc: 00000000 nop
4000e0: 03e00008 jr ra
4000e4: 27bd0018 addiu sp,sp,24
004000e8 <frame_dummy>:
4000e8: 3c020000 lui v0,0x0
4000ec: 27bdffe8 addiu sp,sp,-24
4000f0: 3c040040 lui a0,0x40
4000f4: 3c050040 lui a1,0x40
4000f8: 24420000 addiu v0,v0,0
4000fc: afbf0010 sw ra,16(sp)
400100: 24840244 addiu a0,a0,580
400104: 10400003 beqz v0,400114 <frame_dummy+0x2c>
400108: 24a51270 addiu a1,a1,4720
40010c: 0c000000 jal 0 <_init-0x400018>
400110: 00000000 nop
400114: 3c040040 lui a0,0x40
400118: 8c831258 lw v1,4696(a0)
40011c: 3c020000 lui v0,0x0
400120: 10600007 beqz v1,400140 <frame_dummy+0x58>
400124: 24590000 addiu t9,v0,0
400128: 24841258 addiu a0,a0,4696
40012c: 13200004 beqz t9,400140 <frame_dummy+0x58>
400130: 00000000 nop
400134: 8fbf0010 lw ra,16(sp)
400138: 03200008 jr t9
40013c: 27bd0018 addiu sp,sp,24
400140: 8fbf0010 lw ra,16(sp)
400144: 00000000 nop
400148: 03e00008 jr ra
40014c: 27bd0018 addiu sp,sp,24
00400150 <call_frame_dummy>:
400150: 27bdffe8 addiu sp,sp,-24
400154: afbf0010 sw ra,16(sp)
400158: 8fbf0010 lw ra,16(sp)
40015c: 00000000 nop
400160: 03e00008 jr ra
400164: 27bd0018 addiu sp,sp,24
00400168 <main>:
400168: 27bdffe8 addiu sp,sp,-24
40016c: afbe0010 sw s8,16(sp)
400170: 03a0f021 move s8,sp
400174: c7808004 lwc1 $f0,-32764(gp)
400178: 00000000 nop
40017c: e7c00008 swc1 $f0,8(s8)
400180: c7808008 lwc1 $f0,-32760(gp)
400184: 00000000 nop
400188: e7c00004 swc1 $f0,4(s8)
40018c: c7c20008 lwc1 $f2,8(s8)
400190: c7c00004 lwc1 $f0,4(s8)
400194: 00000000 nop
400198: 46001000 add.s $f0,$f2,$f0
40019c: e7c00000 swc1 $f0,0(s8)
4001a0: 03c0e821 move sp,s8
4001a4: 8fbe0010 lw s8,16(sp)
4001a8: 27bd0018 addiu sp,sp,24
4001ac: 03e00008 jr ra
4001b0: 00000000 nop
004001b4 <__do_global_ctors_aux>:
4001b4: 3c020040 lui v0,0x40
4001b8: 2442124c addiu v0,v0,4684
4001bc: 8c44fffc lw a0,-4(v0)
4001c0: 27bdffe0 addiu sp,sp,-32
4001c4: 2403ffff li v1,-1
4001c8: afb00010 sw s0,16(sp)
4001cc: afbf0018 sw ra,24(sp)
4001d0: afb10014 sw s1,20(sp)
4001d4: 10830008 beq a0,v1,4001f8 <__do_global_ctors_aux+0x44>
4001d8: 2450fffc addiu s0,v0,-4
4001dc: 2411ffff li s1,-1
4001e0: 0080f809 jalr a0
4001e4: 2610fffc addiu s0,s0,-4
4001e8: 8e040000 lw a0,0(s0)
4001ec: 00000000 nop
4001f0: 1491fffb bne a0,s1,4001e0 <__do_global_ctors_aux+0x2c>
4001f4: 00000000 nop
4001f8: 8fbf0018 lw ra,24(sp)
4001fc: 8fb10014 lw s1,20(sp)
400200: 8fb00010 lw s0,16(sp)
400204: 03e00008 jr ra
400208: 27bd0020 addiu sp,sp,32
0040020c <call___do_global_ctors_aux>:
40020c: 27bdffe8 addiu sp,sp,-24
400210: afbf0010 sw ra,16(sp)
400214: 8fbf0010 lw ra,16(sp)
400218: 00000000 nop
40021c: 03e00008 jr ra
400220: 27bd0018 addiu sp,sp,24
Disassembly of section .fini:
00400224 <_fini>:
400224: 27bdffe0 addiu sp,sp,-32
400228: afbf0014 sw ra,20(sp)
40022c: 0c100010 jal 400040 <_ftext>
400230: 00000000 nop
400234: 8fbf0014 lw ra,20(sp)
400238: 27bd0020 addiu sp,sp,32
40023c: 03e00008 jr ra
400240: 00000000 nop
I am not good at MIPS assembly. Can someone explain where the floating point variables' values 1 and 2 are?
About your question
ELF executables can have one or more sections filled with the static data (string, floating point numbers, numbers, whatever) used by the program.
These sections are loaded into memory by the loader with the rest of the program, thereby avoiding intermixing code and data and reducing the code size.
For the ELF on MIPS systems you should refer to this where there is this nice picture:
As you can see $gp is used to address the section .sdata and .sbss, where the initial s stands for small.
All these efforts are taken to minimize code size as by using $gp the compiler can generate 16 bit offsets (versus the 32 bit ones normally used).
Since the offset is signed, $gp is placed in the middle of a (at most) 64 KiB region formed by .sdata + .sbss.
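The signed 16-bit offset can be seen by decoding one of the lwc1 words from the question's listing, for example 0xc7808004. The sketch below splits an I-type instruction word into its fields; the struct and function names are made up for illustration, and the $gp value used when trying it out is a made-up number, since the actual value of $gp is not shown in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a MIPS I-type instruction word. */
struct itype {
    unsigned opcode;   /* bits 31..26 */
    unsigned rs;       /* bits 25..21, base register */
    unsigned rt;       /* bits 20..16, ft for lwc1/swc1 */
    int32_t  offset;   /* bits 15..0, sign-extended */
};

struct itype decode_itype(uint32_t word)
{
    struct itype f;
    uint32_t imm = word & 0xFFFFu;
    f.opcode = word >> 26;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    f.offset = (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
    return f;
}

/* Effective address of a $gp-relative access, given some value for $gp. */
uint32_t gp_relative_address(uint32_t gp, struct itype f)
{
    assert(f.rs == 28);               /* register 28 is $gp */
    return gp + (uint32_t)f.offset;   /* wraps correctly for negative offsets */
}
```

Decoding 0xc7808004 gives opcode 0x31 (lwc1), base register 28 ($gp), ft 0 and offset -32764, so the load reads a word a little above the bottom of the 64 KiB window centred on $gp.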
Your floating point values are not encoded directly in the instructions because FP instructions do not take immediates; instead they are saved into a read-only section and loaded from there.
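As for what those stored words would look like: assuming IEEE-754 single precision, the constants 1 and 2 have the bit patterns 0x3F800000 and 0x40000000. The small helper below (a hypothetical name, for illustration only) shows how to obtain the pattern of any float; the dump of the data section itself is not shown in the question, so this is what one would expect to find there rather than a confirmed reading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float as it sits in memory, assuming IEEE-754 single precision. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);   /* float must be 32 bits wide */
    memcpy(&bits, &value, sizeof bits);    /* copy the raw bytes, no conversion */
    return bits;
}
```

Dumping the section contents (for example with objdump -s) and looking for these two words near the $gp-relative addresses is one way to confirm where the compiler placed them.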
About your purpose
Why in the end do you care about this?
If your goal is to design an implementation of the MIPS ISA, just pick the specific ISA (MIPS32 I? MIPS32 IV? MIPS 64?), get the documents, get the whole picture and implement a microarchitecture for it.
If an instruction is valid according to your chosen ISA, then your implementation must be able to execute it. Don't worry about what the compilers are doing; they are grown up and can take care of themselves. And in the end, if the code you are executing is broken, who cares? As long as it is valid.
These will help you:
MIPS32™ Architecture For Programmers Volume I: Introduction to the MIPS32™ Architecture
MIPS32™ Architecture For Programmers Volume II: The MIPS32™ Instruction Set

Resources