Beagleboard Qemu baremetal with UEFI

Beagleboard Qemu baremetal with UEFI - arm

I am trying to boot a freertos app from UEFI on Qemu
When i run the app from uboot, using the below commands it runs without any errors
fatload mmc 0 80300000 rtosdemo.bin
go 0x80300000
An uefi application loads the elf file at 0x80300000 and then I tried two options.
My boot.s file is below
`start:
_start:
_mainCRTStartup:
ldr r0, .LC6
msr CPSR_c, #MODE_UND|I_BIT|F_BIT /* Undefined Instruction */
mov sp, r0
sub r0, r0, #UND_STACK_SIZE
msr CPSR_c, #MODE_ABT|I_BIT|F_BIT /* Abort Mode */
mov sp, r0
...
`
Disassembly file
`
80300000 <_undf-0x20>:
80300000: ea001424 b 80305098 <start>
80300004: e59ff014 ldr pc, [pc, #20] ; 80300020 <_undf>
80300008: e59ff014 ldr pc, [pc, #20] ; 80300024 <_swi>
8030000c: e59ff014 ldr pc, [pc, #20] ; 80300028 <_pabt>
80300010: e59ff014 ldr pc, [pc, #20] ; 8030002c <_dabt>
...........
80305098 <start>:
80305098: e59f00f4 ldr r0, [pc, #244] ; 80305194 <endless_loop+0x18>
8030509c: e321f0db msr CPSR_c, #219 ; 0xdb
803050a0: e1a0d000 mov sp, r0
803050a4: e2400004 sub r0, r0, #4
`
use goto 0x80305098 which is the entry point addr specified in the elf file. Now it jumps to ldr r0, .. instruction but after that it just seems to be jumping some where in the middle of some function rather than stepping into msr instruction.
Since in uboot its jumping to 0x80300000, I tried by jumping to that addr, now it goes to instruction b 80305098 <start>, but after that instruction instead of jumping to 80305098 it just goes to the next instruction ldr pc, [pc, #20].
So any ideas on where I am going wrong?
EDIT:
I updated boot.s to
start:
_start:
_mainCRTStartup:
.thumb
thumb_entry_point:
blx arm_entry_point
.arm
arm_entry_point:
ldr r0, .LC6
msr CPSR_c, #MODE_UND|I_BIT|F_BIT /* Undefined Instruction Mode */
mov sp, r0
Now it works fine.

This is ARM code, but it sounds very much like it's being jumped to in Thumb state. The word e59f00f4 will be interpreted in Thumb as lsls r4, r6, #3; b 0x80304bde (if I've got my address maths right), which seems consistent with "jumping somewhere in the middle of some function". You can verify by checking bit 5 of the CPSR (assuming you're not in user mode) - if it's set, you've come in in Thumb state.
If that is the case, then the 'proper' solution probably involves making the UEFI loader application clever enough to do the right kind of interworking branch, but a quick and easy hack would be to place a shim somewhere just for the initial entry, something like:
.thumb
thumb_entry_point:
blx arm_entry_point
.arm
arm_entry_point:
b start

Related

In house bootloader ARM cortex M4 NRF52 chip

I am working on making a bootloader for a side project.
I have read in a hex file, verified the checksum and stored everything in flash with a corresponding address with an offset of 0x4000. I am having issues jumping to my application. I have read, searched and tried alot of different things such as the code here.
http://www.keil.com/support/docs/3913.htm
my current code is this;
int binary_exec(void * Address){
int i;
__disable_irq();
// Disable IRQs
for (i = 0; i < 8; i ++) NVIC->ICER[i] = 0xFFFFFFFF;
// Clear pending IRQs
for (i = 0; i < 8; i ++) NVIC->ICPR[i] = 0xFFFFFFFF;
// -- Modify vector table location
// Barriars
__DSB();
__ISB();
// Change the vector table
SCB->VTOR = ((uint32_t)0x4000 & 0x1ffff80);
// Barriars
__DSB();
__ISB();
__enable_irq();
// -- Load Stack & PC
binExec(Address);
return 0;
}
__asm void binexec(uint32_t *address)
{
mov r1, r0
ldr r0, [r1, #4]
ldr sp, [r1]
blx r0"
}
This just jumps to a random location and does not do anything. I have manually added the address to the PC using keil's register window and it jumps straight to my application but I have not found a way to do it using code. Any ideas? Thank you in advance.
Also the second to last line of the hex file there is the start linear address record:
http://www.keil.com/support/docs/1584.htm
does anyone know what to do with this line?
Thank you,
Eric Micallef

This is what I am talking about can you show us some fragments that look like this, this is an entire application just doesnt do much...
20004000 <_start>:
20004000: 20008000
20004004: 20004049
20004008: 2000404f
2000400c: 2000404f
20004010: 2000404f
20004014: 2000404f
20004018: 2000404f
2000401c: 2000404f
20004020: 2000404f
20004024: 2000404f
20004028: 2000404f
2000402c: 2000404f
20004030: 2000404f
20004034: 2000404f
20004038: 2000404f
2000403c: 20004055
20004040: 2000404f
20004044: 2000404f
20004048 <reset>:
20004048: f000 f806 bl 20004058 <notmain>
2000404c: e7ff b.n 2000404e <hang>
2000404e <hang>:
2000404e: e7fe b.n 2000404e <hang>
20004050 <dummy>:
20004050: 4770 bx lr
...
20004054 <systick_handler>:
20004054: 4770 bx lr
20004056: bf00 nop
20004058 <notmain>:
20004058: b510 push {r4, lr}
2000405a: 2400 movs r4, #0
2000405c: 4620 mov r0, r4
2000405e: 3401 adds r4, #1
20004060: f7ff fff6 bl 20004050 <dummy>
20004064: 2c64 cmp r4, #100 ; 0x64
20004066: d1f9 bne.n 2000405c <notmain+0x4>
20004068: 2000 movs r0, #0
2000406a: bd10 pop {r4, pc}
offset 0x00 is the stack pointer
20004000: 20008000
offset 0x04 is the reset vector or the entry point to this program
20004004: 20004049
I filled in the unused ones so they land in an infinite loop
20004008: 2000404f
and tossed in a different one just to show
2000403c: 20004055
In this case the VTOR would be set to 0x2004000 I would read 0x20004049 from 0x20004004 and then BX to that address.
so my binexec would be fed the address 0x20004000 and I would do something like this
ldr r1,[r0]
mov sp,r1
ldr r2,[r0,#4]
bx r2
If I wanted to fake a reset into that code. a thumb approach with thumb2 I assume you can ldr sp,[r0], I dont hand code thumb2 so dont have those memorized, and there are different thumb2 sets of extensions, as well as different syntax options in gas.
Now if you were not going to support interrupts, or for other reasons (might carry some binary code in your flash that you want to perform better and you copy that from flash to ram then use it in ram) you could download to ram an application that simply has its first instruction at the entry point, no vector table:
20004000 <_start>:
20004000: f000 f804 bl 2000400c <notmain>
20004004: e7ff b.n 20004006 <hang>
20004006 <hang>:
20004006: e7fe b.n 20004006 <hang>
20004008 <dummy>:
20004008: 4770 bx lr
...
2000400c <notmain>:
2000400c: b510 push {r4, lr}
2000400e: 2400 movs r4, #0
20004010: 4620 mov r0, r4
20004012: 3401 adds r4, #1
20004014: f7ff fff8 bl 20004008 <dummy>
20004018: 2c64 cmp r4, #100 ; 0x64
2000401a: d1f9 bne.n 20004010 <notmain+0x4>
2000401c: 2000 movs r0, #0
2000401e: bd10 pop {r4, pc}
In this case it would need to be agreed that the downloaded program is built for 0x20004000, you would download the data to that address, but when you want to run it you would instead do this
.globl binexec
binexec:
bx r0
in C
binexec(0x20004000|1);
or
.globl binexec
binexec:
orr r0,#1
bx r0
just to be safe(r).
In both cases you need to build your binaries right if you want them to run, both have to be linked for the target address, in particular the vector table approach, thus the question, can you show us an example vector table from one of your downloaded, programs, even the first few words might suffice...

EXC_BAD_ACCESS when executing an arm blx rx

Here is the c-source code line which crashes on an armv7:
ret = fnPtr (param1, param2);
In the debugger, fnPtr has an address of 0x04216c00. When I disassemble at the pc where it's pointing at the statement above, here is what I get:
0x18918e: movw r0, #0x73c
0x189192: movt r0, #0x1
0x189196: add r0, r2
0x189198: ldr r0, [r0]
0x18919a: str r0, [sp, #0x20]
0x18919c: ldr r0, [sp, #0x20]
0x18919e: ldr r1, [sp, #0x28]
0x1891a0: ldr r2, [sp, #0x2c]
0x1891a2: str r0, [sp, #0x14]
0x1891a4: mov r0, r1
0x1891a6: mov r1, r2
0x1891a8: ldr r2, [sp, #0x14]
0x1891aa: blx r2
Now, when I disassemble the memory at address $r2 (=0x4216c00), I get what is seemingly valid code that should be executed without any problem:
(lldb) disassemble -s 0x4216c00 -C 10
0x4216c00: push {r4, r5, r6, r7, lr}
0x4216c04: add r7, sp, #0xc
0x4216c08: push {r8, r10, r11}
0x4216c0c: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
0x4216c10: sub r7, r7, #0x280
0x4216c14: mov r6, r0
0x4216c18: bx r1
0x4216c1c: add r7, r7, #0x280
Yet what really happens is this:
EXC_BAD_ACCESS (code=2, address=0x4216c00)
Can anyone explain what is wrong and why the address is considered illegal?
Full disclosure: I am no assembly expert. The code compiled and linked is all c-code. Compiler is clang.

Check the value of r2 before calling executing blx instruction. It might be odd, telling the cpu that address is in thumb mode however from the listing it looks like in arm mode.
Try forcing clang to only arm mode by -mno-thumb to test this.

The EXC_BAD_ACCESS exception has two bits of data in it, the first is the "kern_return_t" number describing the access failure, and the second is the address accessed. In your case the code is 2, which means (from /usr/include/mach/kern_return.h):
#define KERN_PROTECTION_FAILURE 2
/* Specified memory is valid, but does not permit the
* required forms of access.
*/
Not sure why this is happening, sounds like you are trying to execute code that doesn't have the execute permission set. What does:
(lldb) image lookup -va 0x4216c00
say?
BTW, the exception types are in /usr/include/mach/exception_types.h, and if the codes have machine specific meanings, those will be in, e.g. /usr/include/mach/i386/exception.h) For ARM info you may have to look in the header in the Xcode SDK.

ARM Cortex-M3 startup file

I am modifying a startup file for an ARM Cortex-M3 microcontroller. Everything works fine so far, but I have a question regarding the need of using assembler code to perform the zero-filling of the BSS block.
By default the reset interrupt in the startup file looks as follows:
// Zero fill the bss segment.
__asm( " ldr r0, =_bss\n"
" ldr r1, =_ebss\n"
" mov r2, #0\n"
" .thumb_func\n"
" zero_loop:\n"
" cmp r0, r1\n"
" it lt\n"
" strlt r2, [r0], #4\n"
" blt zero_loop"
);
Using that code everything works as expected. However if I change the previous code for the following it stops working:
// Zero fill the bss segment.
for(pui32Dest = &_bss; pui32Dest < &_ebss; )
{
*pui32Dest++ = 0;
}
In principle both codes should do the same (fill the BSS with zeros), but the second one does not work for some reason that I fail to understand. I belive that the .thumb_func directive must play a role here, but I am not very familiar with ARM assembler. Any ideas or directions to help me understanding? Thanks!
Edit: By the way, the code to initialize the data segment (e.g. copy from Flash to RAM) is as follows and works just fine.
// Copy the data segment initializers from flash to SRAM.
pui32Src = &_etext;
for(pui32Dest = &_data; pui32Dest < &_edata; )
{
*pui32Dest++ = *pui32Src++;
}
Edit: Added the dissasembled code for both functions.
Assembly for the first looks like:
2003bc: 4806 ldr r0, [pc, #24] ; (2003d8 <zero_loop+0x14>)
2003be: 4907 ldr r1, [pc, #28] ; (2003dc <zero_loop+0x18>)
2003c0: f04f 0200 mov.w r2, #0
002003c4 <zero_loop>:
2003c4: 4288 cmp r0, r1
2003c6: bfb8 it lt
2003c8: f840 2b04 strlt.w r2, [r0], #4
2003cc: dbfa blt.n 2003c4 <zero_loop>
Assembly for the second looks like:
2003bc: f645 5318 movw r3, #23832 ; 0x5d18
2003c0: f2c2 0300 movt r3, #8192 ; 0x2000
2003c4: 9300 str r3, [sp, #0]
2003c6: e004 b.n 2003d2 <ResetISR+0x6e>
2003c8: 9b00 ldr r3, [sp, #0]
2003ca: 1d1a adds r2, r3, #4
2003cc: 9200 str r2, [sp, #0]
2003ce: 2200 movs r2, #0
2003d0: 601a str r2, [r3, #0]
2003d2: 9a00 ldr r2, [sp, #0]
2003d4: f644 033c movw r3, #18492 ; 0x483c
2003d8: f2c2 0300 movt r3, #8192 ; 0x2000
2003dc: 429a cmp r2, r3
2003de: d3f3 bcc.n 2003c8 <ResetISR+0x64>

If the initial stack is in the .bss section as suggested, you can see from the disassembly why the C code fails - it's loading the current pointer from the stack, saving the incremented pointer back to the stack, zeroing the location, then reloading the incremented pointer for the next iteration. If you zero the contents of the stack while using them, Bad Things happen.
In this case, turning on optimisation might fix it (a smart compiler should generate pretty much the same as the assembly code if it actually tries). More generally, though, it's probably safer to consider sticking with assembly code when doing things like this that would normally be done at a level below the C runtime environment - bootstrapping a C environment from C code which expects that environment to exist already is risky at best, since you can only hope the code doesn't attempt to use anything that's not yet set up.
After a quick look around (I'm not overly familiar with the specifics of Cortex-M development), it seems an alternative/additional solution might be adjusting the linker script to move the stack somewhere else.

ARM bootloader: Interrupt Vector Table Understanding

The code following is the first part of u-boot to define interrupt vector table, and my question is how every line will be used. I understand the first 2 lines which is the starting point and the first instruction to implement: reset, and we define reset below. But when will we use these instructions below? According to System.map, every instruction has a fixed address, so _fiq is at 0x0000001C, when we want to execute fiq, we will copy this address into pc and then execute,right? But in which way can we jump to this instruction: ldr pc, _fiq? It's realised by hardware or software? Hope I make myself understood correctly.
>.globl _start
>_start:b reset
> ldr pc, _undefined_instruction
> ldr pc, _software_interrupt
> ldr pc, _prefetch_abort
> ldr pc, _data_abort
> ldr pc, _not_used
> ldr pc, _irq
> ldr pc, _fiq
>_undefined_instruction: .word undefined_instruction
>_software_interrupt: .word software_interrupt
>_prefetch_abort: .word prefetch_abort
>_data_abort: .word data_abort
>_not_used: .word not_used
>_irq: .word irq
>_fiq: .word fiq

If you understand reset then you understand all of them.
When the processor is reset then hardware sets the pc to 0x0000 and starts executing by fetching the instruction at 0x0000. When an undefined instruction is executed or tries to be executed the hardware responds by setting the pc to 0x0004 and starts executing the instruction at 0x0004. irq interrupt, the hardware finishes the instruction it is executing starts executing the instruction at address 0x0018. and so on.
00000000 <_start>:
0: ea00000d b 3c <reset>
4: e59ff014 ldr pc, [pc, #20] ; 20 <_undefined_instruction>
8: e59ff014 ldr pc, [pc, #20] ; 24 <_software_interrupt>
c: e59ff014 ldr pc, [pc, #20] ; 28 <_prefetch_abort>
10: e59ff014 ldr pc, [pc, #20] ; 2c <_data_abort>
14: e59ff014 ldr pc, [pc, #20] ; 30 <_not_used>
18: e59ff014 ldr pc, [pc, #20] ; 34 <_irq>
1c: e59ff014 ldr pc, [pc, #20] ; 38 <_fiq>
00000020 <_undefined_instruction>:
20: 00000000 andeq r0, r0, r0
00000024 <_software_interrupt>:
24: 00000000 andeq r0, r0, r0
00000028 <_prefetch_abort>:
28: 00000000 andeq r0, r0, r0
0000002c <_data_abort>:
2c: 00000000 andeq r0, r0, r0
00000030 <_not_used>:
30: 00000000 andeq r0, r0, r0
00000034 <_irq>:
34: 00000000 andeq r0, r0, r0
00000038 <_fiq>:
38: 00000000 andeq r0, r0, r0
Now of course in addition to changing the pc and starting execution from these addresses. The hardware will save the state of the machine, switch processor modes if necessary and then start executing at the new address from the vector table.
Our job as programmers is to build the binary such that the instructions we want to be run for each of these instructions is at the right address. The hardware provides one word, one instruction for each location. Now if you never expect to ever have any of these exceptions, you dont have to have a branch at address zero for example you can just have your program start, there is nothing magic about the memory at these addresses. If you expect to have these exceptions, then you have two choices for instructions that are one word and can jump out of the way of the exception that follows. One is a branch the other is a load pc. There are pros and cons to each.

When the hardware takes an exception, the program counter (PC) is automatically set to the address of the relevant exception vector and the processor begins executing instructions from that address. When the processor comes out of reset, the PC is automatically set to base+0. An undefined instruction sets the PC to base+4, etc. The base address of the vector table (base) is either 0x00000000, 0xFFFF0000, or VBAR depending on the processor and configuration. Note that this provides limited flexibility in where the vector table gets placed and you'll need to consult the ARM documentation in conjunction with the reference manual for the device that you are using to get the right value to be used.
The layout of the table (4 bytes per exception) makes it necessary to immediately branch from the vector to the actual exception handler. The reasons for the LDR PC, label approach are twofold - because a PC-relative branch is limited to (24 << 2) bits (+/-32MB) using B would constrain the layout of the code in memory somewhat; by loading an absolute address the handler can be located anywhere in memory. Secondly it makes it very simple to change exception handlers at runtime, by simply writing a different address to that location, rather than having to assemble and hotpatch a branch instruction.
There's little value to having a remappable reset vector in this way, however, which is why you tend to see that one implemented as a simple branch to skip over the rest of the vectors to the real entry point code.

Why does my SWI instruction hang? (BeagleBone Black, ARM Cortex-A8 cpu)

I'm starting to write a toy OS for the BeagleBone Black, which uses an ARM Cortex-A8-based TI Sitara AM3359 SoC and the U-Boot bootloader. I've got a simple standalone hello world app writing to UART0 that I can load through U-Boot so far, and now I'm trying to move on to interrupt handlers, but I can't get SWI to do anything but hang the device.
According to the AM335x TRM (starting on page 4099, if you're interested), the interrupt vector table is mapped in ROM at 0x20000. The ROM SWI handler branches to RAM at 0x4030ce08, which branches to the address stored at 0x4030ce28. (Initially, this is a unique dead loop at 0x20084.)
My code sets up all the ARM processor modes' SP to their own areas at the top of RAM, and enables interrupts in the CPSR, then executes an SWI instruction, which always hangs. (Perhaps jumping to some dead-loop instruction?) I've looked at a bunch of samples, and read whatever documentation I could find, and I don't see what I'm missing.
Currently my only interaction with the board is over serial connection on UART0 with my linux box. U-Boot initializes UART0, and allows loading of the binary over the serial connection.
Here's the relevant assembly:
.arm
.section ".text.boot"
.equ usr_mode, 0x10
.equ fiq_mode, 0x11
.equ irq_mode, 0x12
.equ svc_mode, 0x13
.equ abt_mode, 0x17
.equ und_mode, 0x1b
.equ sys_mode, 0x1f
.equ swi_vector, 0x4030ce28
.equ rom_swi_b_addr, 0x20008
.equ rom_swi_addr, 0x20028
.equ ram_swi_b_addr, 0x4030CE08
.equ ram_swi_addr, 0x4030CE28
.macro setup_mode mode, stackpointer
mrs r0, cpsr
mov r1, r0
and r1, r1, #0x1f
bic r0, r0, #0x1f
orr r0, r0, #\mode
msr cpsr_csfx, r0
ldr sp, =\stackpointer
bic r0, r0, #0x1f
orr r0, r0, r1
msr cpsr_csfx, r0
.endm
.macro disable_interrupts
mrs r0, cpsr
orr r0, r0, #0x80
msr cpsr_c, r0
.endm
.macro enable_interrupts
mrs r0, cpsr
bic r0, r0, #0x80
msr cpsr_c, r0
.endm
.global _start
_start:
// Initial SP
ldr r3, =_C_STACK_TOP
mov sp, r3
// Set up all the modes' stacks
setup_mode fiq_mode, _FIQ_STACK_TOP
setup_mode irq_mode, _IRQ_STACK_TOP
setup_mode svc_mode, _SVC_STACK_TOP
setup_mode abt_mode, _ABT_STACK_TOP
setup_mode und_mode, _UND_STACK_TOP
setup_mode sys_mode, _C_STACK_TOP
// Clear out BSS
ldr r0, =_bss_start
ldr r1, =_bss_end
mov r5, #0
mov r6, #0
mov r7, #0
mov r8, #0
b _clear_bss_check$
_clear_bss$:
stmia r0!, {r5-r8}
_clear_bss_check$:
cmp r0, r1
blo _clear_bss$
// Load our SWI handler's address into
// the vector table
ldr r0, =_swi_handler
ldr r1, =swi_vector
str r0, [r1]
// Debug-print out these SWI addresses
ldr r0, =rom_swi_b_addr
bl print_mem
ldr r0, =rom_swi_addr
bl print_mem
ldr r0, =ram_swi_b_addr
bl print_mem
ldr r0, =ram_swi_addr
bl print_mem
enable_interrupts
swi_call$:
swi #0xCC
bl kernel_main
b _reset
.global _swi_handler
_swi_handler:
// Get the SWI parameter into r0
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
// Save lr onto the stack
stmfd sp!, {lr}
bl print_uint32
ldmfd sp!, {pc}
Those debugging prints produce the expected values:
00020008: e59ff018
00020028: 4030ce08
4030ce08: e59ff018
4030ce28: 80200164
(According to objdump, 0x80200164 is indeed _swi_handler. 0xe59ff018 is the instruction "ldr pc, [pc, #0x20]".)
What am I missing? It seems like this should work.

The firmware on the board changes the ARM execution mode and the locations of
the vector tables associated with the various modes. In my own case (a bare-metal
snippet code executed at Privilege Level 1 and launched by BBB's uBoot) the active vector table is at address 0x9f74b000.
In general, you might use something like the following function to locate the
active vector table:
static inline unsigned int *get_vectors_address(void)
{
unsigned int v;
/* read SCTLR */
__asm__ __volatile__("mrc p15, 0, %0, c1, c0, 0\n"
: "=r" (v) : : );
if (v & (1<<13))
return (unsigned int *) 0xffff0000;
/* read VBAR */
__asm__ __volatile__("mrc p15, 0, %0, c12, c0, 0\n"
: "=r" (v) : : );
return (unsigned int *) v;
}

change
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
stmfd sp!, {lr}
bl print_uint32
ldmfd sp!, {pc}
to
stmfd sp!, {r0-r3, r12, lr}
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
bl print_uint32
ldmfd sp!, {r0-r3, r12, pc}^
PS: You don't restore SPSR into CPSR of interrupted task AND you also scratch registers which are not banked by the cpu mode switch.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight