I'm writing an amateur operating system for ARM-based devices and currently trying to make it working in QEMU's versatilepb (ARM926EJ-S).
The problem arrives when I try to implement syscalls to my kernel. The idea is pretty simple: to implement system calls via SVC (SWI) instruction. So applications work in user mode, and to call a kernel function, they do SVC <code> instruction, so ARM processor switches to supervisor mode and calls the appropriate SVC handler.
But the problem is that when I call __asm__("SVC #0x08");, the device just resets and calls RESET_HANDLER, so it looks like the emulator just reboots.
I spent a few hours already to figure out what is the problem, but still got no idea.
Here is the code of ivt.s (the initial code with handlers):
.global __RESET
__RESET:
B RESET_HANDLER /* Reset */
B . /* Undefined */
B SWI_HANDLER /* SWI */
B . /* Prefetch Abort */
B . /* Data Abort */
B . /* reserved */
B . /* IRQ */
B . /* FIQ */
RESET_HANDLER:
MSR CPSR_c, 0x13 /* Supervisor mode */
LDR SP, =stack_top
MSR CPSR_c, 0x10 /* User mode */
LDR SP, =usr_stack_top
BL usermode_function
B .
SWI_HANDLER:
PUSH {LR}
BL syscall
POP {LR}
MOVS PC, LR
This is how I make the syscall:
void usermode_function() {
__asm__("SVC #0x00"); // Make syscall
}
And syscall implementation:
void syscall() {
// NEVER CALLED
__asm__("PUSH {r0-r7}");
__asm__("POP {r0-r7}");
}
But the code under SWI_HANDLER even never invoked.
I really even don't know how to ask the question, since it looks like I'm missing some very basic information in my mind.
So what could be the problem? Which information I should provide to make you able to help me?
Here is also the linker script:
ENTRY(__RESET)
SECTIONS
{
. = 0x10000;
.ivt . : { ivt.o(.text) }
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4KB of stack memory */
stack_top = .;
. = . + 0x100;
usr_stack_top = .;
}
Many thanks to #Jester and #old_timer, the problem is solved.
The problem was not with code, but with linker script. I have put my vector table at 0x10000, as you can see in the linker script, but it should be placed at 0x0. So SVC was not handled properly because the handler was placed in a wrong place.
When I changed the base address in my ld script and tried to load the firmware as ELF, everything starts to work perfectly.
You solved it one way but I'll still write my answer.
Very bare bare metal example...
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstrings ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x20);
}
static void hexstring ( unsigned int d )
{
hexstrings(d);
uart_send(0x0D);
uart_send(0x0A);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
hexstring(GETPC());
for(ra=0;ra<0x20;ra+=4)
{
hexstrings(ra);
hexstring(GET32(ra));
}
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
Build
arm-linux-gnueabi-as --warn --fatal-warnings -march=armv5t strap.s -o strap.o
arm-linux-gnueabi-gcc -c -Wall -O2 -nostdlib -nostartfiles -ffreestanding -march=armv5t notmain.c -o notmain.o
arm-linux-gnueabi-ld strap.o notmain.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy notmain.elf -O binary notmain.bin
Execute
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.bin
Output
12345678
0001003C
00000000 E3A00000
00000004 E59F1004
00000008 E59F2004
0000000C E59FF004
00000010 00000183
00000014 00000100
00000018 00010000
0000001C 00000000
Examine, assemble disassemble
.word 0xE3A00000
.word 0xE59F1004
.word 0xE59F2004
.word 0xE59FF004
.word 0x00000183
.word 0x00000100
.word 0x00010000
.word 0x00000000
0: e3a00000 mov r0, #0
4: e59f1004 ldr r1, [pc, #4] ; 10 <.text+0x10>
8: e59f2004 ldr r2, [pc, #4] ; 14 <.text+0x14>
c: e59ff004 ldr pc, [pc, #4] ; 18 <.text+0x18>
10: 00000183 andeq r0, r0, r3, lsl #3
14: 00000100 andeq r0, r0, r0, lsl #2
18: 00010000 andeq r0, r1, r0
1c: 00000000 andeq r0, r0, r0
So you can see that they are basically launching a Linux kernel the ATAGS/dtb is in ram at 0x100 perhaps. And they jump to 0x10000. 0001003C being the pc shown by the program as loaded with that command line using the -O binary version was loaded at 0x10000 and executed there. If you were to have an swi event then you would execute starting with the ldr r2 instruction and land on the rest handler in your code.
(Note incidentally that qemu doesn't properly model uarts, at least so far as I have found so you don't have to initialize them you don't have to wait for the tx buffer to be empty you just jam bytes into the tx buffer and they come out).
If you run the elf without changing the linker script
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0001003C
00000000 00000000
00000004 00000000
00000008 00000000
0000000C 00000000
00000010 00000000
00000014 00000000
00000018 00000000
0000001C 00000000
Interesting it loads and runs at 0x10000 which is what it was linked for but doesn't bother to setup for coming out of reset at 0x00000000 and/or this is that linker issue that makes for bad elf files and it padded with zeros which is
1c: 00000000 andeq r0, r0, r0
So it could have executed from 0x00000000 to 0x10000 and run into our code.
If we change the linker script
ram : ORIGIN = 0x00000000, LENGTH = 32K
Run the elf not the bin
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.elf
12345678
0000003C
00000000 EA000002
00000004 EA000006
00000008 EA000006
0000000C EA000004
00000010 E321F013
00000014 E3A0D801
00000018 E321F010
0000001C E3A0DA09
as expected.
Now for the swi.
strap.s
.globl _start
_start:
b reset
b hang
b swi_handler
b hang
reset:
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
hexstring(0x12345678);
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
Run the elf, output is
12345678
11223344
12345678
as desired. But you could have also done this
strap.s
.globl _start
_start:
ldr pc,reset_addr
ldr pc,hang_addr
ldr pc,swi_handler_addr
ldr pc,hang_addr
reset_addr: .word reset
hang_addr: .word hang
swi_handler_addr: .word swi_handler
reset:
mov r0,#0x10000
mov r1,#0x00000
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
ldmia r0!,{r2,r3,r4,r5}
stmia r1!,{r2,r3,r4,r5}
msr cpsr_c, 0x13 /* Supervisor mode */
mov sp,#0x10000
msr cpsr_c, 0x10 /* User mode */
mov sp,#0x9000
bl notmain
hang:
b hang
swi_handler:
push {r0,r1,r2,r3,r4,lr}
bl handler
pop {r0,r1,r2,r3,r4,lr}
movs pc,lr
.globl GETPC
GETPC:
mov r0,pc
bx lr
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl do_swi
do_swi:
svc #0x08
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
unsigned int GETPC ( void );
void do_swi ( void );
#define UART_BASE 0x101F1000
#define UARTDR (UART_BASE+0x000)
static void uart_send ( unsigned int x )
{
PUT32(UARTDR,x);
}
static void hexstring ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
void handler ( void )
{
hexstring(0x11223344);
}
int notmain ( void )
{
unsigned int ra;
hexstring(0x12345678);
for(ra=0x10000;ra<0x10020;ra+=4) hexstring(GET32(ra));
for(ra=0x00000;ra<0x00020;ra+=4) hexstring(GET32(ra));
do_swi();
hexstring(0x12345678);
return(0);
}
memmap
MEMORY
{
ram : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
And now both the elf and the binary image versions work. I let the toolchain do the work for me:
00010010 <reset_addr>:
10010: 0001001c
00010014 <hang_addr>:
10014: 00010048
00010018 <swi_handler_addr>:
10018: 0001004c
The ldr pc, is position independent. I copy the four entries plus the four (well three) addresses so that 0x00000 matches 0x10000 and now the exception table (it is not a vector table btw) works.
With newer arm processors you could instead set VTOR to 0x10000 and it would use the one built into the binary, no copying necessary. Or as you solved just build and run your program from 0x00000 and there you go. I wanted to show the alternatives as well as how to figure out (by cheating, you have to love uarts in qemu) what qemu is doing and where it is loading without having to use a debugger.
Related
I am trying to generate exceptions like Bus Fault, Usage Fault on ARM Cortex-M3. My code for enable exceptions:
void EnableExceptions(void)
{
UINT32 uReg = SCB->SHCSR;
uReg |= 0x00070000;
SCB->SHCSR = uReg;
//Set Configurable Fault Status Register
SCB->CFSR = 0x0367E7C3;
//Set to 1 DIV_0_TRP register
SCB->CCR |= 0x00000010;
//Set priorities of fault handlers
NVIC_SetPriority(MemoryManagement_IRQn, 0x01);
NVIC_SetPriority(BusFault_IRQn, 0x01);
NVIC_SetPriority(UsageFault_IRQn, 0x01);
}
void UsageFault_Handler(void){
//handle
//I've set a breakpoint but system does not hit
}
void BusFault_Handler(void){
//handle
//I've set a breakpoint but system does not hit
}
I tried to generate division by zero exception and saw the variables value as "Infinity". However system does not generate any exception on keeps running. Also tried to generate Bus Fault exception and same thing happens.
Also when I comment out to EnableExceptions function system works correctly. What is wrong with my code? Does ARM handle these kind of errors inside of the microprocessor?
Cortex-M devices use the Thumb-2 instruction set exclusively, ARM uses the least significant bit of the the branch/jump/call address to determine whether the target is Thumb or ARM code, since the Cortex-M cannot run ARM code, you can generate an BusFault exception by creating a jump to an even address.
int dummy(){ volatile x = 0 ; return x ; }
int main()
{
typedef void (*fn_t)();
fn_t foo = (fn_t)(((char*)dummy) - 1) ;
foo() ;
}
The following will also work, since the call will fail before any instructions are executed, so it does not need to point to any valid code.
int main()
{
typedef void (*fn_t)();
fn_t foo = (fn_t)(0x8004000) ;
foo() ;
}
You can generate a usage fault by forcing an integer divide by zero:
int main()
{
volatile int x = 0 ;
volatile int y = 1 / x ;
}
From your comment question: How to generate any exception.
Here is one from the documentation:
Encoding T1 All versions of the Thumb instruction set.
SVC<c> #<imm8>
...
Exceptions
SVCall.
Which I can find by searching for SVCall.
Exceptions are well documented by ARM, there are ways to cause the exceptions you listed without having to break the bus (requiring a sim an fpga or creating your own silicon), you already know the search terms for the document to find busfault and usagefault.
How ARM handles these (internally or not) is documented. Internally in this case means a lockup or not if you look, otherwise they execute the fault handler (unless of course there is a fault fetching the fault handler).
Most you can create in C without resorting to assembly language instructions, but you have to be careful that it is generating what you think it is generating:
void fun ( void )
{
int x = 3;
int y = 0;
int z = x / y;
}
Disassembly of section .text:
00000000 <fun>:
0: 4770 bx lr
Instead you want something that actually generates the instruction that can cause the fault:
int fun0 ( int x, int y )
{
return(x/y);
}
void fun1 ( void )
{
fun0(3,0);
}
00000000 <fun0>:
0: fb90 f0f1 sdiv r0, r0, r1
4: 4770 bx lr
6: bf00 nop
00000008 <fun1>:
8: 4770 bx lr
but as shown you have to be careful about where and how you call it. In this case the call was done in the same file so the optimizer had visibility to see that this is now dead code and optimized it out, so a test like this would fail to generate a fault for multiple reasons.
This is why the OP needs to provide a complete minimal example the reason why faults are not being seen is not the processor. But the software and/or test code.
Edit
A complete minimal example, everything you need but a gnu toolchain (no . This is on a stm32 blue pill an STM32F103...
flash.s
.cpu cortex-m3
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset /* 1 Reset */
.word hang /* 2 NMI */
.word hang /* 3 HardFault */
.word hang /* 4 MemManage */
.word hang /* 5 BusFault */
.word usagefault /* 6 UsageFault */
.word hang /* 7 Reserved */
.word hang /* 8 Reserved */
.word hang /* 9 Reserved */
.word hang /*10 Reserved */
.word hang /*11 SVCall */
.word hang /*12 DebugMonitor */
.word hang /*13 Reserved */
.word hang /*14 PendSV */
.word hang /*15 SysTick */
.word hang /* External interrupt 1 */
.word hang /* External interrupt 2 */
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.thumb_func
.globl dosvc
dosvc:
svc 1
.thumb_func
.globl hop
hop:
bx r0
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
fun.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
void hop ( unsigned int );
#define GPIOCBASE 0x40011000
#define RCCBASE 0x40021000
#define SHCSR 0xE000ED24
void usagefault ( void )
{
unsigned int ra;
while(1)
{
PUT32(GPIOCBASE+0x10,1<<(13+0));
for(ra=0;ra<100000;ra++) dummy(ra);
PUT32(GPIOCBASE+0x10,1<<(13+16));
for(ra=0;ra<100000;ra++) dummy(ra);
}
}
int notmain ( void )
{
unsigned int ra;
ra=GET32(SHCSR);
ra|=1<<18; //usagefault
PUT32(SHCSR,ra);
ra=GET32(RCCBASE+0x18);
ra|=1<<4; //enable port c
PUT32(RCCBASE+0x18,ra);
ra=GET32(GPIOCBASE+0x04);
ra&=(~(3<<20)); //PC13
ra|= (1<<20) ; //PC13
ra&=(~(3<<22)); //PC13
ra|= (0<<22) ; //PC13
PUT32(GPIOCBASE+0x04,ra);
PUT32(GPIOCBASE+0x10,1<<(13+0));
for(ra=0;ra<200000;ra++) dummy(ra);
PUT32(GPIOCBASE+0x10,1<<(13+16));
for(ra=0;ra<200000;ra++) dummy(ra);
ra=GET32(0x08000004);
ra&=(~1);
hop(ra);
return(0);
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m3 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m3 -mthumb -c so.c -o so.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy -O binary so.elf so.bin
All of those command line options are not required arm-linux-gnueabi- and other flavors of gnu toolchains work just fine from several versions back to the present as I use them as a compiler, assembler and linker and don't mess with library or other stuff that varies from one flavor to another.
UsageFault The UsageFault fault handles non-memory related faults
caused by instruction execution.
A number of different situations cause usage faults, including:
• Undefined Instruction.
• Invalid state on instruction execution.
• Error on exception return.
• Attempting to access a disabled or unavailable coprocessor.
The following can cause usage faults when the processor is configured to
report them:
• A word or halfword memory accesses to an unaligned address.
• Division by zero.
Software can disable this fault. If it does, a UsageFault escalates to HardFault. UsageFault has a configurable priority.
...
Instruction execution with EPSR.T set to 0 causes the invalid state UsageFault
So the test here branches to an arm address vs a thumb address and this causes a usagefault. (Can read up about the BX instruction the psr.t bit how and when it gets changed, etc in the documentation as well)
Backing up this is the stm32 blue pill. There is an led on PC13, the code enables usagefault, configures PC13 as an output, blinks it once so we see the program started then if it hits the usagefault handler then it blinks forever.
ra&=(~1);
If you comment this out then it keeps branching to reset which does everything again one slow blink and you see that repeat forever.
Before running naturally you check the build to see that it has a chance of working:
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000
8000004: 08000049
8000008: 0800004f
800000c: 0800004f
8000010: 0800004f
8000014: 0800004f
8000018: 08000061
800001c: 0800004f
8000020: 0800004f
8000024: 0800004f
8000028: 0800004f
800002c: 0800004f
8000030: 0800004f
8000034: 0800004f
8000038: 0800004f
800003c: 0800004f
8000040: 0800004f
8000044: 0800004f
08000048 <reset>:
8000048: f000 f82a bl 80000a0 <notmain>
800004c: e7ff b.n 800004e <hang>
0800004e <hang>:
800004e: e7fe b.n 800004e <hang>
...
08000060 <usagefault>:
8000060: b570 push {r4, r5, r6, lr}
The vector table is correct the right vectors point to the right places.
0xE000ED28 CFSR RW 0x00000000
The HFSR is upper bits of the CFSR
> halt
target halted due to debug-request, current mode: Handler UsageFault
xPSR: 0x81000006 pc: 0x0800008a msp: 0x20000fc0
> mdw 0xE000ED28
0xe000ed28: 00020000
And that bit is
INVSTATE, bit[1]
0 EPSR.T bit and EPSR.IT bits are valid for instruction execution.
1 Instruction executed with invalid EPSR.T or EPSR.IT field.
Now
Using the CCR, see Configuration and Control Register, CCR on page B3-604, software can enable or disable:
• Divide by zero faults, alignment faults and some features of processor operation.
• BusFaults at priority -1 and higher.
The reset value of the CCR is IMPLEMENTATION DEFINED so it might just be enabled for you or not, likely have to look at the Cortex-m3 TRM or just read it:
> mdw 0xE000ED14
0xe000ed14: 00000000
so its zeros on mine.
So add fun.c:
unsigned int fun ( unsigned int x, unsigned int y)
{
return(x/y);
}
Change so.c:
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
unsigned int fun ( unsigned int, unsigned int);
#define GPIOCBASE 0x40011000
#define RCCBASE 0x40021000
#define SHCSR 0xE000ED24
#define CCR 0xE000ED14
void usagefault ( void )
{
unsigned int ra;
while(1)
{
PUT32(GPIOCBASE+0x10,1<<(13+0));
for(ra=0;ra<100000;ra++) dummy(ra);
PUT32(GPIOCBASE+0x10,1<<(13+16));
for(ra=0;ra<100000;ra++) dummy(ra);
}
}
int notmain ( void )
{
unsigned int ra;
ra=GET32(SHCSR);
ra|=1<<18; //usagefault
PUT32(SHCSR,ra);
ra=GET32(CCR);
ra|=1<<4; //div by zero
PUT32(CCR,ra);
ra=GET32(RCCBASE+0x18);
ra|=1<<4; //enable port c
PUT32(RCCBASE+0x18,ra);
ra=GET32(GPIOCBASE+0x04);
ra&=(~(3<<20)); //PC13
ra|= (1<<20) ; //PC13
ra&=(~(3<<22)); //PC13
ra|= (0<<22) ; //PC13
PUT32(GPIOCBASE+0x04,ra);
PUT32(GPIOCBASE+0x10,1<<(13+0));
for(ra=0;ra<200000;ra++) dummy(ra);
PUT32(GPIOCBASE+0x10,1<<(13+16));
for(ra=0;ra<200000;ra++) dummy(ra);
fun(3,0);
return(0);
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m3 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m3 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m3 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o so.o fun.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy -O binary so.elf so.bin
confirm there is actually a divide instruction that we are going to hit
800011e: 2100 movs r1, #0
8000120: 2003 movs r0, #3
8000122: f000 f80f bl 8000144 <fun>
...
08000144 <fun>:
8000144: fbb0 f0f1 udiv r0, r0, r1
8000148: 4770 bx lr
load and run and the handler is called.
target halted due to debug-request, current mode: Handler UsageFault
xPSR: 0x81000006 pc: 0x08000086 msp: 0x20000fc0
> mdw 0xE000ED28
0xe000ed28: 02000000
and that indicates it was a divide by zero.
So everything you needed to know/do really was in the documentation, one document.
99.999% of bare-metal programming is reading or doing experiments to validate what was read, almost none of the work is writing the final application, it is but a tiny part of the job.
Before you can get to bare-metal programming you have to have mastered the toolchain otherwise nothing will work. Mastering the toolchain can be done without any target hardware, using free tools so that is just a matter of sitting down and doing it.
As far as you're trying to do a floating point divide by zero on a core that doesn't have hardware divide by zero you need to look at the soft float, for example libgcc:
ARM_FUNC_START divsf3
ARM_FUNC_ALIAS aeabi_fdiv divsf3
CFI_START_FUNCTION
# Mask out exponents, trap any zero/denormal/INF/NAN.
mov ip, #0xff
ands r2, ip, r0, lsr #23
do_it ne, tt
COND(and,s,ne) r3, ip, r1, lsr #23
teqne r2, ip
teqne r3, ip
beq LSYM(Ldv_s)
LSYM(Ldv_x):
...
# Division by 0x1p*: let''s shortcut a lot of code.
LSYM(Ldv_1):
and ip, ip, #0x80000000
orr r0, ip, r0, lsr #9
adds r2, r2, #127
do_it gt, tt
COND(rsb,s,gt) r3, r2, #255
and so on
which should have been visible in the disassembly, I don't off-hand see a forced exception (intentional undefined instruction, swi/svc or anything like that). This is only one possible library and now that I think about it this looks like arm not thumb, so would have to go looking for that, easier to just look at the disassembly.
Based on your comment and if I read the other question again I assume because it didn't raise an exception the correct result of a divide by zero is a properly signed infinity. but if you switch to a cortex-m4 or m7 then you might be able to trigger a hardware exception, but....read the documentation to find out.
Edit 2
void fun ( void )
{
int a = 3;
int b = 0;
volatile int c = a/b;
}
fun.c:6:18: warning: unused variable ‘c’ [-Wunused-variable]
6 | volatile int c = a/b;
| ^
08000140 <fun>:
8000140: deff udf #255 ; 0xff
8000142: bf00 nop
> halt
target halted due to debug-request, current mode: Handler UsageFault
xPSR: 0x01000006 pc: 0x08000076 msp: 0x20000fc0
> mdw 0xE000ED28
0xe000ed28: 00010000
and that bit means
The processor has attempted to execute an undefined instruction
So volatile failed to produce the desired result using gcc
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
arm-linux-gnueabi-gcc --version
arm-linux-gnueabi-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
produces
Disassembly of section .text:
00000000 <fun>:
0: deff udf #255 ; 0xff
2: bf00 nop
as well (and you can godbolt your way through others).
Yes a fault was produced and it was the usage fault but this would have been yet another Stack Overflow question as to why I didn't get a divide by zero. Blindly using volatile to force a divide doesn't work.
Making all three volatile
void fun ( void )
{
volatile int a = 3;
volatile int b = 0;
volatile int c = a/b;
}
Disassembly of section .text:
00000000 <fun>:
0: 2203 movs r2, #3
2: 2300 movs r3, #0
4: b084 sub sp, #16
6: 9201 str r2, [sp, #4]
8: 9302 str r3, [sp, #8]
a: 9b01 ldr r3, [sp, #4]
c: 9a02 ldr r2, [sp, #8]
e: fb93 f3f2 sdiv r3, r3, r2
12: 9303 str r3, [sp, #12]
14: b004 add sp, #16
16: 4770 bx lr
will generate the desired fault.
and with no optimizations
00000000 <fun>:
0: b480 push {r7}
2: b085 sub sp, #20
4: af00 add r7, sp, #0
6: 2303 movs r3, #3
8: 60fb str r3, [r7, #12]
a: 2300 movs r3, #0
c: 60bb str r3, [r7, #8]
e: 68fa ldr r2, [r7, #12]
10: 68bb ldr r3, [r7, #8]
12: fb92 f3f3 sdiv r3, r2, r3
16: 607b str r3, [r7, #4]
18: bf00 nop
1a: 3714 adds r7, #20
1c: 46bd mov sp, r7
1e: bc80 pop {r7}
20: 4770 bx lr
will also generate the desired fault
So master the language first (read read read), then master the toolchain second (read read read) then bare-metal programming (read read read). It is all about reading, not about coding. As shown above even with decades of experience at this level, you can't completely predict what the tools will generate; you have to just try it, but most important because you figured it out for one tool one time one day on one machine no reason to get too broad in your assumptions. Have to try it and examine what the compiler produces, repeat the process until you get the desired effect. Push comes to shove just write a few lines of asm and be done with it.
You weren't seeing faults because you weren't generating any and/or weren't trapping them or both. The list of possible reasons why is long based on the code provided, but these examples, that you should have no problem porting to your platform, should demonstrate your hardware works too, and then you can sort out why your software didn't by connecting the dots between code that does and code that doesn't. All I did was follow the documentation, and examine the output of the compiler, once I had the minimum number of things enabled, the fault handler was called. Without those enabled the usage fault was not triggered.
BusFault, HardFault, MemmanageFault, UsageFault, SVC Call , NMI those are internal exception for arm cortex-M microprocessors.
it depends really from which processor you are using, but let's suppose you are having cortex-m3:
By default all fault are mapped to hardfault handler unless you
enable them explicitly to get mapped to their own handler
Set bits: USGFAULTENA, BUSFAULTENA, MEMFAULTENA in system handler
control and state register => those fault each one will be mapped to
its proper handler
To generate a fault you can try :
Access not mapped memory area => this generate Busfault
Execute unrecognized instruction => UsageFault
Explictly trigger one of those fault by setting one of those bits:
USGFAULTACT, BUSFAULTACT, MEMFAULTACT in systen handler control and
status register => this will generate an exception for sure
Please for more details refer to : https://developer.arm.com/documentation/dui0552/a/cortex-m3-peripherals/system-control-block/system-handler-control-and-state-register?lang=en
In order to combine .c and assembly, I want to pass start address of my .c code, and program microcontroller to know that its program starts at that address. As I am writing my startup file in assembly, I need to pass .c code starting address to assembly, and then to write this address to the specific memory region of microcontroller ( so the microcontroller can start execution on this address after RESET)
Trying to create a project for stm32f103 in Keil with this structure:
Some .c file, for example main.c (for the main part of the program).
Startup file in assembly language. Which gets the adress of entry to the function written in some .c file, to be passed to Reset_Handler
Scatter file, written in this way:
LR_IROM1 0x08000000 0x00010000 { ; load region size_region
ER_IROM1 0x08000000 0x00010000 { ; load address = execution address
*.o (RESET, +First) ; RESET is code section with I.V.T.
* (InRoot$$Sections)
.ANY (+RO)
.ANY (+XO)
}
RW_IRAM1 0x20000000 0x00005000 { ; RW data
.ANY (+RW +ZI)
}
}
The problem is passing the entry point to the .c function. Reset_Handler, which needs .c entry point(starting adress) passed by __main, looks like this:
Reset_Handler PROC
EXPORT Reset_Handler [WEAK]
IMPORT __main
LDR R0, =__main
BX R0
ENDP
bout entry point __main, as a answer for one assembly raleted question was written:
__main() is the compiler supplied entry point for your C code. It is not the main() function you write, but performs initialisation for the
standard library, static data, the heap before calling your `main()'
function.
So, how to get this entry point in my assembly file?
Edit>> If somebody is interested in solution for KEIL, here it is, its all that simple!
Simple assembly startup.s file:
AREA STACK, NOINIT, READWRITE
SPACE 0x400
Stack_top
AREA RESET, DATA, READONLY
dcd Stack_top
dcd Reset_Handler
EXPORT _InitMC
IMPORT notmain
AREA PROGRAM, CODE, READONLY
Reset_Handler PROC
bl notmain
ENDP
_InitMC PROC ;start of the assembly procedure
Loop
b Loop ;infinite loop
ENDP
END
Simple c file:
extern int _InitMC();
int notmain(void) {
_InitMC();
return 0;
}
Linker is the same as the one mentioned above.
Project build was successful.
Using the gnu toolchain for example:
Bootstrap:
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word loop
.word loop
.word loop
.thumb_func
reset:
bl notmain
b loop
.thumb_func
loop: b .
.align
.thumb_func
.globl fun
fun:
bx lr
.end
C entry point (function name is not relevant, sometimes using main() adds garbage, depends on the compiler/toolchain)
void fun ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<1000;ra++) fun(ra);
return(0);
}
Linker script
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Build
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m0 -march=armv6-m -c so.c -o so.thumb.o
arm-none-eabi-ld -o so.thumb.elf -T flash.ld flash.o so.thumb.o
arm-none-eabi-objdump -D so.thumb.elf > so.thumb.list
arm-none-eabi-objcopy so.thumb.elf so.thumb.bin -O binary
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c so.c -o so.thumb2.o
arm-none-eabi-ld -o so.thumb2.elf -T flash.ld flash.o so.thumb2.o
arm-none-eabi-objdump -D so.thumb2.elf > so.thumb2.list
arm-none-eabi-objcopy so.thumb2.elf so.thumb2.bin -O binary
Result (all thumb versions)
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000
8000004: 08000015
8000008: 0800001b
800000c: 0800001b
8000010: 0800001b
08000014 <reset>:
8000014: f000 f804 bl 8000020 <notmain>
8000018: e7ff b.n 800001a <loop>
0800001a <loop>:
800001a: e7fe b.n 800001a <loop>
0800001c <fun>:
800001c: 4770 bx lr
800001e: 46c0 nop ; (mov r8, r8)
08000020 <notmain>:
8000020: b570 push {r4, r5, r6, lr}
8000022: 25fa movs r5, #250 ; 0xfa
8000024: 2400 movs r4, #0
8000026: 00ad lsls r5, r5, #2
8000028: 0020 movs r0, r4
800002a: 3401 adds r4, #1
800002c: f7ff fff6 bl 800001c <fun>
8000030: 42ac cmp r4, r5
8000032: d1f9 bne.n 8000028 <notmain+0x8>
8000034: 2000 movs r0, #0
8000036: bd70 pop {r4, r5, r6, pc}
Of course this has to be placed in flash at the right place with some tool.
The vector table is mapped by logic to 0x00000000 in the stm32 family.
08000000 <_start>:
8000000: 20001000
8000004: 08000015 <---- reset ORR 1
And in this minimal code the reset handler calls the C code the C code messes around and returns. Technically a fully functional program for most stm32s (change the stack init to a smaller value for those with less ram say 0x20000400 and it should work anywhere by using -mthumb by itself (armv4t) or adding the cortex-m0. well okay not the armv8ms they can technically not support all of armv6m but the one in the field I know about does.
I don't have Kiel so don't know how to translate to that, but it shouldn't be much of a stretch, just syntax.
I am executing C code for arm cortex-m3 for stm32l152C-discovery board but I observed that the function call from main is not getting pushed into the stack. I have analyzed the asm code of this source but I find it is OK. To understand better, please look the asm code generated for C code here:
main.elf: file format elf32-littlearm
*SYMBOL TABLE:
00000010 l d .text 00000000 .text
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 l d .debug_frame 00000000 .debug_frame
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000 clock.c
20004ffc g .text 00000000 _STACKTOP
**00000028 g F .text 000000e0 SystemClock_Config**
20000000 g .text 00000000 _DATA_BEGIN
20000000 g .text 00000000 _HEAP
**00000010 g F .text 00000016 main**
20000000 g .text 00000000 _BSS_END
00000108 g .text 00000000 _DATAI_BEGIN
20000000 g .text 00000000 _BSS_BEGIN
00000108 g .text 00000000 _DATAI_END
20000000 g .text 00000000 _DATA_END
Disassembly of section .text:
00000010 <main>:
#define LL_GPIO_MODE_OUTPUT 1
void SystemInit() ;
int main()
{
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
int i = 0;
16: 2300 movs r3, #0
18: 607b str r3, [r7, #4]
SystemClock_Config();
**1a: f000 f805 bl 28 <SystemClock_Config>
for(;;)
i++;
1e: 687b ldr r3, [r7, #4]
20: 3301 adds r3, #1**
22: 607b str r3, [r7, #4]
24: e7fb b.n 1e <main+0xe>
}
00000028 <SystemClock_Config>:
* PLLDIV = 3
* Flash Latency(WS) = 1
* #retval None
*/
void SystemClock_Config(void)
{
28: b480 push {r7}
2a: af00 add r7, sp, #0
SET_BIT(FLASH->ACR, FLASH_ACR_ACC64);
2c: 4a33 ldr r2, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
2e: 4b33 ldr r3, [pc, #204] ; (fc <SystemClock_Config+0xd4>)
30: 681b ldr r3, [r3, #0]
32: f043 0304 orr.w r3, r3, #4
36: 6013 str r3, [r2, #0]
MODIFY_REG(FLASH->ACR, FLASH_ACR_LATENCY, LL_FLASH_LATENCY_1);
38: 4a30 ldr r2, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3a: 4b30 ldr r3, [pc, #192] ; (fc <SystemClock_Config+0xd4>)
3c: 681b ldr r3, [r3, #0]
3e: f043 0301 orr.w r3, r3, #1
42: 6013 str r3, [r2, #0]*
}
the execution loops around 0x1a, 0x1c, 0x1e, 0x20 in PC register.
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
halted: PC: 0x0000001a
halted: PC: 0x0000001c
halted: PC: 0x0000001e
halted: PC: 0x00000020
It should jump to 0x28 (SystemClock_Config) at 0x1a.
A very simple completely working example:
vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl centry
done: b done
so.c
unsigned int fun ( unsigned int );
unsigned int centry ( void )
{
return(fun(5)+1);
}
fun.c
unsigned int fun ( unsigned int x )
{
return(x+1);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c so.c -o so.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c fun.c -o fun.o
arm-none-eabi-ld -o so.elf -T flash.ld vectors.o so.o fun.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
the whole program
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000009 andeq r0, r0, r9
00000008 <reset>:
8: f000 f802 bl 10 <centry>
0000000c <done>:
c: e7fe b.n c <done>
...
00000010 <centry>:
10: b510 push {r4, lr}
12: 2005 movs r0, #5
14: f000 f802 bl 1c <fun>
18: 3001 adds r0, #1
1a: bd10 pop {r4, pc}
0000001c <fun>:
1c: 3001 adds r0, #1
1e: 4770 bx lr
a simulation of the program:
read32(0x00000000)=0x20001000
read32(0x00000004)=0x00000009
--- 0x00000008: 0xF000
--- 0x0000000A: 0xF802 bl 0x0000000F
--- 0x00000010: 0xB510 push {r4,lr}
write32(0x20000FF8,0x00000000)
write32(0x20000FFC,0x0000000D)
--- 0x00000012: 0x2005 movs r0,#0x05
--- 0x00000014: 0xF000
--- 0x00000016: 0xF802 bl 0x0000001B
--- 0x0000001C: 0x3001 adds r0,#0x01
--- 0x0000001E: 0x4770 bx r14
--- 0x00000018: 0x3001 adds r0,#0x01
--- 0x0000001A: 0xBD10 pop {r4,pc}
read32(0x20000FF8)=0x00000000
read32(0x20000FFC)=0x0000000D
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
--- 0x0000000C: 0xE7FE b 0x0000000B
sure it is a somewhat useless program but it demonstrates booting and calling functions (the function address does not show up on the stack, when you do a call (bl) the r14 gets the return address and r15 gets the address to branch to. if you have nested functions like centry (the C entry point main() is not an important function name you can call your entry point whatever you want so long as your bootstrap matches) calling fun, then you need to preserve the return address however you choose, typically save it on the stack. r4 is being pushed just to keep the stack aligned on a 64 bit boundary per the abi.
for your system you would set the linker script for 0x08000000 normally (stm32).
What we are missing from you is the beginning of your binary, can you do a hexdump of the memory image/binary showing the handfuls of byte before main including the first few instructions of main?
If a bare metal program doesnt do the simplest boot steps right, the first thing you do is to examine the binary where the entry point or vector table is depending on the architecture and see that you built it right.
In this case in my example this is a cortex-m so the stack pointer initialization value (if you choose to use it) is at 0x00000000, you can put anything there and then simply write over the sp if you want, your choice...then address 0x00000004 is the reset vector which is the address of the code to handle the reset with the lsbit set to indicate thumb mode.
so 0x00000008|1 = 0x00000009.
If you dont have
0x2000xxxx
0x00000011
then your processor is not going to boot right. I am so much in the habit of using 0x08000000 that I dont remember if 0x00000000 works for an stm, it in theory should...but depends on how you are loading the flash and what mode/state the chip is in at that time.
you might need to link for 0x08000000 and at a minimum if nothing else changed
0x2000xxxx
0x08000011
as the first two word in your binary/memory image.
EDIT
note you can make a single binary that can be entered both with a vector or a bootloader
.thumb
.thumb_func
.global _start
_start:
bl reset
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
placing a branch (well bl to fill the space) in the stack address spot then loading the stack pointer later.
Or use a branch
.thumb
.thumb_func
.global _start
_start:
b reset
nop
.word _start
reset:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
Your application is missing an interrupt table. As a result, the processor is reading instructions as interrupt vectors, and faulting repeatedly as those instructions cannot be interpreted as invalid addresses.
Use the support files from the STM32L1xx standard peripheral library to generate an appropriate linker script and interrupt table.
I am trying to compile freeRTOS for raspberry pi 2. Those are the commands I tried so far:
arm-none-eabi-gcc -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard test.c -o test.o
arm-none-eabi-as -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard startup.s -o startup.o
arm-none-eabi-ld test.o startup.o -static -Map kernel7.map -o
target.elf -T raspberrypi.ld
The two upper ones do work fine. However the last one doesn't, it gives me the following error:
startup.o: In function _start':
(.init+0x0): multiple definition of_start'
test.o::(.text+0x6c): first defined here
startup.o: In function swi_handler':
(.init+0x28): undefined reference tovPortYieldProcessor'
startup.o: In function irq_handler':
(.init+0x38): undefined reference tovFreeRTOS_ISR'
startup.o: In function zero_loop':
(.init+0xcc): undefined reference torpi_cpu_irq_disable'
This is the corresponding code:
test.c:
#include <stdio.h>
void exit(int code)
{
while(1)
;
}
int main(void)
{
return 0;
}
startup.s:
.extern system_init
.extern __bss_start
.extern __bss_end
.extern vFreeRTOS_ISR
.extern vPortYieldProcessor
.extern rpi_cpu_irq_disable
.extern main
.section .init
.globl _start
;;
_start:
;# All the following instruction should be read as:
;# Load the address at symbol into the program counter.
ldr pc,reset_handler ;# Processor Reset handler -- we will have to force this on the raspi!
;# Because this is the first instruction executed, of cause it causes an immediate branch into reset!
ldr pc,undefined_handler ;# Undefined instruction handler -- processors that don't have thumb can emulate thumb!
ldr pc,swi_handler ;# Software interrupt / TRAP (SVC) -- system SVC handler for switching to kernel mode.
ldr pc,prefetch_handler ;# Prefetch/abort handler.
ldr pc,data_handler ;# Data abort handler/
ldr pc,unused_handler ;# -- Historical from 26-bit addressing ARMs -- was invalid address handler.
ldr pc,irq_handler ;# IRQ handler
ldr pc,fiq_handler ;# Fast interrupt handler.
;# Here we create an exception address table! This means that reset/hang/irq can be absolute addresses
reset_handler: .word reset
undefined_handler: .word undefined_instruction
swi_handler: .word vPortYieldProcessor
prefetch_handler: .word prefetch_abort
data_handler: .word data_abort
unused_handler: .word unused
irq_handler: .word vFreeRTOS_ISR
fiq_handler: .word fiq
reset:
/* Disable IRQ & FIQ */
cpsid if
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continueBoot
overHyped: /* Get out of HYP mode */
ldr r1, =continueBoot
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;# CPSR_MODE_MASK
orr r1, r1, #0x13 ;# CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
continueBoot:
;# In the reset handler, we need to copy our interrupt vector table to 0x0000, its currently at 0x8000
mov r0,#0x8000 ;# Store the source pointer
mov r1,#0x0000 ;# Store the destination pointer.
;# Here we copy the branching instructions
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load multiple values from indexed address. ; Auto-increment R0
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store multiple values from the indexed address. ; Auto-increment R1
;# So the branches get the correct address we also need to copy our vector table!
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load from 4*n of regs (8) as R0 is now incremented.
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store this extra set of data.
;# Set up the various STACK pointers for different CPU modes
;# (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD2
msr cpsr_c,r0
mov sp,#0x8000
;# (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD1
msr cpsr_c,r0
mov sp,#0x4000
;# (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD3
msr cpsr_c,r0
mov sp,#0x8000000
ldr r0, =__bss_start
ldr r1, =__bss_end
mov r2, #0
zero_loop:
cmp r0,r1
it lt
strlt r2,[r0], #4
blt zero_loop
bl rpi_cpu_irq_disable
;# mov sp,#0x1000000
b main ;# We're ready?? Lets start main execution!
.section .text
undefined_instruction:
b undefined_instruction
prefetch_abort:
b prefetch_abort
data_abort:
b data_abort
unused:
b unused
fiq:
b fiq
hang:
b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl dummy
dummy:
bx lr
raspberrypi.ld:
/**
* BlueThunder Linker Script for the raspberry Pi!
*
*
*
**/
MEMORY
{
RESERVED (r) : ORIGIN = 0x00000000, LENGTH = 32K
INIT_RAM (rwx) : ORIGIN = 0x00008000, LENGTH = 32K
RAM (rwx) : ORIGIN = 0x00010000, LENGTH = 128M
}
ENTRY(_start)
SECTIONS {
/*
* Our init section allows us to place the bootstrap code at address 0x8000
*
* This is where the Graphics processor forces the ARM to start execution.
* However the interrupt vector code remains at 0x0000, and so we must copy the correct
* branch instructions to 0x0000 - 0x001C in order to get the processor to handle interrupts.
*
*/
.init : {
KEEP(*(.init))
} > INIT_RAM = 0
.module_entries : {
__module_entries_start = .;
KEEP(*(.module_entries))
KEEP(*(.module_entries.*))
__module_entries_end = .;
__module_entries_size = SIZEOF(.module_entries);
} > INIT_RAM
/**
* This is the main code section, it is essentially of unlimited size. (128Mb).
*
**/
.text : {
*(.text)
} > RAM
/*
* Next we put the data.
*/
.data : {
*(.data)
} > RAM
.bss :
{
__bss_start = .;
*(.bss)
*(.bss.*)
__bss_end = .;
} > RAM
/*
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > RAM
__exidx_end = .;
*/
/**
* Place HEAP here???
**/
PROVIDE(__HEAP_START = __bss_end );
/**
* Stack starts at the top of the RAM, and moves down!
**/
_estack = ORIGIN(RAM) + LENGTH(RAM);
}
As you can see test.c doesn't contain an entry point called _start, neither does it have one in its assembly compiled form. Only startup.s does.
Any idea's about how I could solve my current issue?
EDIT: all the code if needed used can be found here:https://github.com/jameswalmsley/RaspberryPi-FreeRTOS
I'm trying to make my own code from the scratch for an interrupt program. I'm using zynq7000 which consists of two ARM cortex A9 processors.
I load the program written in C using SDK alongwith the the FSBL and bit file generated in PlanAhead, into the Flash Memory. When my program starts running the processor enters the user mode. In user mode IRQ and FIQ are disabled. I'm trying to enter the supervisor mode using the SWI instruction to enable the IRQ and FIQ interrupts. When I debug, it shows that it encounters the SIGTRAP when I call the SWI instruction.
So, how can I make my own code in C, which can enable the interrupts and run my ISR, even after the processor has booted(FSBL) and started?
Thanks
vectors.s
.globl _start
_start:
ldr pc,_reset_handler
ldr pc,_undefined_handler
ldr pc,_swi_handler
b hang
b hang
b hang
b hang
b hang
_reset_handler: .word reset
_undefined_handler: .word hang
_swi_handler: .word swi
reset:
mov sp,#0xD6000000
add sp,sp,#0xC000
bl notmain
hang: b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
swi:
ldr r0,[lr,#-4]
mov r1,#0x00FFFFFF
and r0,r0,r1
movs pc,lr
.globl switest
switest:
stmdb r13!,{lr}
swi #0x23
ldmia r13!,{lr}
bx lr
linker script
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 0x4000
}
SECTIONS
{
.text : { *(.text*) } > ram
}
notmain.c
extern void uart_init ( void );
extern void uart_putc ( unsigned char );
extern unsigned int switest ( void );
void hexstrings ( unsigned int d )
{
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_putc(rc);
if(rb==0) break;
}
uart_putc(0x20);
}
void hexstring ( unsigned int d )
{
hexstrings(d);
uart_putc(0x0D);
uart_putc(0x0A);
}
int notmain ( void )
{
uart_init();
hexstring(0x12345678);
hexstring(switest());
return(0);
}
The above are relevant parts from a minimal swi working example. Note that this only works for svc/swi instructions in arm mode not thumb nor thumb2, if you want to make a generic works in all modes, I recommend you dont bother with the svc immediate and instead pass something in a register.
This example does not enable interrupts it just shows how to make an svc/swi call and how to implement the exception handler. This knowledge has been well documented in the arm architectural reference manuals for many many years now.