Related
I am trying to apply a type of side channel attack I read about in this paper that tries to infer execution state from differences in IRQ latencies on a MCU with a cortex M4 processor. The attack carefully interrupts instructions that occur right after a branch and measures the interrupt latency. When different branches have instructions of different lengths, you can look at the interrupt latency to determine in which of these branches the interrupt occurred and leak some of the program state.
I wrote a simple function that I want to attack in the way described above. I am using the SysTick timer to generate the interrupt at the correct point in time. To get an initial good value for the interrupt timer I used GDB to stop the program at the target line to see the SysTick value at that time.
I implemented a very simple interrupt handler that
loads the SysTick timer value from memory
subtracts this value from the reload value to get the elapsed time since interrupt (i.e. the IRQ latency)
clears the interrupt and
void __attribute__((interrupt("IRQ"))) SysTick_Handler(void)
{
/* USER CODE BEGIN SysTick_IRQn 0 */
SysTick->CTRL &= 0xfffffffe; // disable SysTick (~SysTick_CTRL_ENABLE_Msk)
*timer_value = SysTick->VAL; // capture counter value (as quickly as possible)
*timer_value = SysTick->LOAD - *timer_value; // subtract it from reload value to get IRQ latency
SysTick->VAL = 0; // reset initial value
}
However I find that I always get the same IRQ latency, regardless of the instruction that was interrupted. I expect the interrupt latency to be longer when a longer instruction is interrupted.
This is the function I wrote to test the attack
extern uint32_t *timer_value;
int sample_function(int *a, int *b){
/*
* function description -- store the smallest of the two value in a, if MEASURE_CYCLESS defined return the number
* of clock cycles that have been elapsed since the timer has been started
* r0 contains pointer to a
* r1 contains pointer to b
*/
__asm volatile(
/* push working registers */
"PUSH {r4-r8} \n"
/* move counter into r8 */
"MOV r8, #10 \n"
/* begin loop */
"begin_loop: \n"
/* decrement counter variable*/
"SUB r8, r8, #1 \n"
/* if counter variable not equal to 0, jump back to start of loop */
"CMP r8, #0 \n"
/* if r8 not equal to 0, jump back to begin of loop*/
"BNE begin_loop \n"
/* load a into r2 */
"LDR r2, [r0] \n"
/* load b into r3 */
"LDR r3, [r1] \n"
/* store a-b in r4, setting status flags -- if result is 0 Z flag is set */
"SUBS r4, r2, r3 \n"
/* if a-b positive, a is larger otherwise, b is larger (assuming a not equal to b) */
"BPL a_larger \n"
#ifdef SPY
/* load address of (*timer_value) into r4 -- use of LDR pseudo-instruction places constant in a literal pool*/
"LDR r4, =timer_value \n"
/* Load (*timer_value) into r4 */
"LDR r4, [r4] \n"
/* load address of Systick VAL into r5 */
"LDR r5, =0xe000e018 \n"
/* Load value at address stored in R5 (= Systick Val) */
"LDR r5, [r5] \n"
/* Move Systick Val into adress stored at r4 (= *timer_value = address of timer_value)*/
"STR r5, [r4] \n"
#endif
"NOP \n"
/*instruction that gets interrupted -- swap value*/
"STR r2, [r1] \n"
/* load value at this address into r0 (return value) */
"STR r3, [r0] \n"
"B end \n"
"a_larger: \n"
"MOV r0, #0 \n" // instruction that gets interrupted
"end: POP {r4-r8}"
); // pop working registers
}
Note, the section of code in the #define block is used to automatically determine a good timer reload value (instead of using GDB), but I'm currently not using the value I obtained this way.
I also have an empty loop in there to delay the instruction that is meant to be interrupted a bit.
The instruction that gets interrupted is the instruction right after the #define block. When I remove the NOP instruction I still get the same interrupt latency. When I increase or decrease the timer value (to interrupt some cycles earlier or later) I also still get the same IRQ latency.
Am I missing something here? Is there some behavior I do not know about?
Also, is it important to use the attribute __attribute__((interrupt("IRQ")) for an interrupt handler?
This is what I was thinking and commenting on.
bootstrap
.thumb_func
reset:
bl notmain
ldr r4,=0xE000E018
ldr r0,=0xE000E010
mov r1,#7
str r1,[r0]
b hang
.thumb_func
hang:
nop
nop
nop
nop
nop
nop
nop
b hang
setup uart and systick
void notmain ( void )
{
uart_init();
hexstring(0x12345678);
PUT32(STK_CSR,4);
PUT32(STK_RVR,0xF40000);
PUT32(STK_CVR,0x00000000);
//PUT32(STK_CSR,7);
}
event handler
.thumb_func
.globl systick_handler
systick_handler:
ldr r0,[r4]
ldr r5,[sp,#0x18]
push {r0,lr}
bl hexstrings
mov r0,r5
bl hexstring
pop {r0,pc}
grab the timer and address of interrupted instruction and print them out.
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 0800005C
00F3FFF4 0800005E
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 08000050
08000050 <hang>:
8000050: bf00 nop
8000052: bf00 nop
8000054: bf00 nop
8000056: bf00 nop
8000058: bf00 nop
800005a: bf00 nop
800005c: bf00 nop
800005e: e7f7 b.n 8000050 <hang>
From ARM's documentation.
Interrupt Latency
There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.
And that last line we can perhaps see happening here. You can try various instructions, but this architecture has the ability to restart the long duration instructions (reads and push/pop, multiply, and such). I think to see much of a latency difference you may need to create bus or shared resource contention (vs instructions)
Also systick is an exception not an interrupt, so there may be some differences with respect to latency.
Ok so I am trying to ditch energia texas instruments arduino style ide and I have used IAR for coding a Tiva C series development board where I was able to use pointers to memory locations to perform specific things like toggling led for example. I have had a hard time doing the same on a dev board running a MSP430FR5994 mcu, I know the memory address of the green led pin to be PORT 1 PIN 1 OR P1.1 on the board. I also have included the msp430.h header file for an api to the board from my ide. What I don't understand is why when in debug my code is changing the value of the correct registers to the correct numbers but it is not altering the board. I have also verified that it is connected to the board as it will not proceed to debug with it unplugged. My direct questions are this: 1 I should be able to alter memory locations with no headerfiles or any special api's as long as I know the specific addresses correct? 2 I did not see anything about clock gating in the data sheet and in debug I can see those registers changing values so is there something other than setting the pin direction and value that I need to do?( the default pin function is generic gpio I checked so I left that register alone. Any ideas or pointing out obvious errors in my approach would be very helpful thanks. In the code below I used the header file names as I could not get the direct pointers to work. Also I was confused by the data sheet as the base address for port 1 was written as 0200H which is 5 hex numbers when I was expecting 4 since the chip is 16bit system? I assumed with the offsets it meant 0x202H etc am I incorrect in this assumption?
registers during debugging image
ti.com/lit/ds/symlink/msp430fr5994.pdf (datasheet port 1 mem locations page 130)
#include <msp430.h>
/**
* main.c
*/
int main(void)
{
WDTCTL = WDTPW | WDTHOLD; // stop watchdog timer
while(1){
int i ;
int j ;
P1DIR = 2;
//*((unsigned int *)0x204Hu) = 2;
P1OUT = 2;
//*((unsigned int *)0x202Hu)= 2;
for( i = 0; i< 2 ; i++){}
P1OUT = 0;
//*((unsigned int *)0x202Hu)= 0;
for (j = 0 ; j< 2; j++){}
}
return 0;
}
See section 12.3.1 of the MSP430FR59xx User's Guide.
After a BOR reset, all port pins are high-impedance with Schmitt
triggers and their module functions disabled to prevent any cross
currents. The application must initialize all port pins including
unused ones (Section 12.3.2) as input high impedance, input with
pulldown, input with pullup, output high, or output low according to
the application needs by configuring PxDIR, PxREN, PxOUT, and PxIES
accordingly. This initialization takes effect as soon as the LOCKLPM5
bit in the PM5CTL register (described in the PMM chapter) is cleared;
until then, the I/Os remain in their high-impedance state with Schmitt
trigger inputs disabled.
And here is the example blinky code provided in the MSP430FR599x Code Examples available to download from here.
#include <msp430.h>
int main(void)
{
WDTCTL = WDTPW | WDTHOLD; // Stop WDT
// Configure GPIO
P1OUT &= ~BIT0; // Clear P1.0 output latch for a defined power-on state
P1DIR |= BIT0; // Set P1.0 to output direction
PM5CTL0 &= ~LOCKLPM5; // Disable the GPIO power-on default high-impedance mode
// to activate previously configured port settings
while(1)
{
P1OUT ^= BIT0; // Toggle LED
__delay_cycles(100000);
}
}
You probably need to add that PM5CTL0 &= ~LOCKLPM5; line to your code.
And then single-step through your code in the debugger to observe the LED. Because if you let your code run at full speed the delay loops are way too short to observe the LED flash with your eye.
This is derived from one of my examples; I don't have a card handy to test it on, but it should just work or be close.
startup.s
.word hang /* 0xFFE0 */
.word hang /* 0xFFE2 */
.word hang /* 0xFFE4 */
.word hang /* 0xFFE6 */
.word hang /* 0xFFE8 */
.word hang /* 0xFFEA */
.word hang /* 0xFFEC */
.word hang /* 0xFFEE */
.word hang /* 0xFFF0 */
.word hang /* 0xFFF2 */
.word hang /* 0xFFF4 */
.word hang /* 0xFFF6 */
.word hang /* 0xFFF8 */
.word hang /* 0xFFFA */
.word hang /* 0xFFFC */
.word reset /* 0xFFFE */
reset.s
.global reset
reset:
mov #0x03FF,r1
call #notmain
jmp hang
.global hang
hang:
jmp hang
.globl dummy
dummy:
ret
so.c
void dummy ( unsigned short );
#define WDTCTL (*((volatile unsigned short *)0x015C))
#define P1OUT (*((volatile unsigned short *)0x0202))
#define P1DIR (*((volatile unsigned short *)0x0204))
void notmain ( void )
{
unsigned short ra;
WDTCTL = 0x5A80;
P1DIR|=0x02;
while(1)
{
P1OUT |= 0x0002;
for(ra=0;ra<10000;ra++) dummy(ra);
P1OUT &= 0xFFFD;
for(ra=0;ra<10000;ra++) dummy(ra);
}
}
memmap
MEMORY
{
rom : ORIGIN = 0xC000, LENGTH = 0xFFE0-0xC000
ram : ORIGIN = 0x1C00, LENGTH = 0x2C00-0x1C00
vect : ORIGIN = 0xFFE0, LENGTH = 0x20
}
SECTIONS
{
VECTORS : { startup.o } > vect
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
build
msp430-gcc -Wall -O2 -c so.c -o so.o
msp430-ld -T memmap reset.o so.o startup.o -o so.elf
msp430-objdump -D so.elf > so.list
msp430-objcopy -O ihex so.elf out.hex
examine output
so.elf: file format elf32-msp430
Disassembly of section VECTORS:
0000ffe0 <VECTORS>:
ffe0: 0a c0 bic r0, r10
ffe2: 0a c0 bic r0, r10
ffe4: 0a c0 bic r0, r10
ffe6: 0a c0 bic r0, r10
ffe8: 0a c0 bic r0, r10
ffea: 0a c0 bic r0, r10
ffec: 0a c0 bic r0, r10
ffee: 0a c0 bic r0, r10
fff0: 0a c0 bic r0, r10
fff2: 0a c0 bic r0, r10
fff4: 0a c0 bic r0, r10
fff6: 0a c0 bic r0, r10
fff8: 0a c0 bic r0, r10
fffa: 0a c0 bic r0, r10
fffc: 0a c0 bic r0, r10
fffe: 00 c0 bic r0, r0
Disassembly of section .text:
0000c000 <reset>:
c000: 31 40 ff 03 mov #1023, r1 ;#0x03ff
c004: b0 12 0e c0 call #0xc00e
c008: 00 3c jmp $+2 ;abs 0xc00a
0000c00a <hang>:
c00a: ff 3f jmp $+0 ;abs 0xc00a
0000c00c <dummy>:
c00c: 30 41 ret
0000c00e <notmain>:
c00e: 0b 12 push r11
c010: b2 40 80 5a mov #23168, &0x015c ;#0x5a80
c014: 5c 01
c016: a2 d3 04 02 bis #2, &0x0204 ;r3 As==10
c01a: a2 d3 02 02 bis #2, &0x0202 ;r3 As==10
c01e: 0b 43 clr r11
c020: 0f 4b mov r11, r15
c022: b0 12 0c c0 call #0xc00c
c026: 1b 53 inc r11
c028: 3b 90 10 27 cmp #10000, r11 ;#0x2710
c02c: f9 23 jnz $-12 ;abs 0xc020
c02e: b2 f0 fd ff and #-3, &0x0202 ;#0xfffd
c032: 02 02
c034: 0b 43 clr r11
c036: 0f 4b mov r11, r15
c038: b0 12 0c c0 call #0xc00c
c03c: 1b 53 inc r11
c03e: 3b 90 10 27 cmp #10000, r11 ;#0x2710
c042: f9 23 jnz $-12 ;abs 0xc036
c044: ea 3f jmp $-42 ;abs 0xc01a
looks fine the vector table is there and points to the right place, etc.
10,000 might not be enough to see the led blink.
From the datasheet it appears that 0x202 is P1OUT and 0x204 is P1DIR
And you have to get it programmed into the board. I use mspdebug for the boards I have but that program may have stopped working on the eval boards from TI a while ago. And mspdebug supported the Intel hex format. So use objcopy for other formats.
If you were not wanting to use gnu tools then you still have to deal with the vector table and the bootstrap in front of the C code if you are trying to get away from someone's sandbox and do your own thing.
You are on the right path it may be as simple as your delay loops are way way too small and or getting optimized out since they are dead code as written.
If you rely on .bss or .data being initialized then you have more work to do in the linker script and bootstrap. I don't, so don't have that problem, actually wondering why .data was in this linker script...
I've matched addresses to the datasheet for your part, increasing the odds of success. If you use an external function and pass the loop variable to that function (the external can be C or asm, doesn't matter) then the optimizer won't remove it as dead code. That or add volatile on the loop variable and check the disassembly to see that it wasn't removed.
I am trying to compile freeRTOS for raspberry pi 2. Those are the commands I tried so far:
arm-none-eabi-gcc -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard test.c -o test.o
arm-none-eabi-as -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4
-mfloat-abi=hard startup.s -o startup.o
arm-none-eabi-ld test.o startup.o -static -Map kernel7.map -o
target.elf -T raspberrypi.ld
The two upper ones do work fine. However the last one doesn't, it gives me the following error:
startup.o: In function _start':
(.init+0x0): multiple definition of_start'
test.o::(.text+0x6c): first defined here
startup.o: In function swi_handler':
(.init+0x28): undefined reference tovPortYieldProcessor'
startup.o: In function irq_handler':
(.init+0x38): undefined reference tovFreeRTOS_ISR'
startup.o: In function zero_loop':
(.init+0xcc): undefined reference torpi_cpu_irq_disable'
This is the corresponding code:
test.c:
#include <stdio.h>
void exit(int code)
{
while(1)
;
}
int main(void)
{
return 0;
}
startup.s:
.extern system_init
.extern __bss_start
.extern __bss_end
.extern vFreeRTOS_ISR
.extern vPortYieldProcessor
.extern rpi_cpu_irq_disable
.extern main
.section .init
.globl _start
;;
_start:
;# All the following instruction should be read as:
;# Load the address at symbol into the program counter.
ldr pc,reset_handler ;# Processor Reset handler -- we will have to force this on the raspi!
;# Because this is the first instruction executed, of cause it causes an immediate branch into reset!
ldr pc,undefined_handler ;# Undefined instruction handler -- processors that don't have thumb can emulate thumb!
ldr pc,swi_handler ;# Software interrupt / TRAP (SVC) -- system SVC handler for switching to kernel mode.
ldr pc,prefetch_handler ;# Prefetch/abort handler.
ldr pc,data_handler ;# Data abort handler/
ldr pc,unused_handler ;# -- Historical from 26-bit addressing ARMs -- was invalid address handler.
ldr pc,irq_handler ;# IRQ handler
ldr pc,fiq_handler ;# Fast interrupt handler.
;# Here we create an exception address table! This means that reset/hang/irq can be absolute addresses
reset_handler: .word reset
undefined_handler: .word undefined_instruction
swi_handler: .word vPortYieldProcessor
prefetch_handler: .word prefetch_abort
data_handler: .word data_abort
unused_handler: .word unused
irq_handler: .word vFreeRTOS_ISR
fiq_handler: .word fiq
reset:
/* Disable IRQ & FIQ */
cpsid if
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continueBoot
overHyped: /* Get out of HYP mode */
ldr r1, =continueBoot
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;# CPSR_MODE_MASK
orr r1, r1, #0x13 ;# CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
continueBoot:
;# In the reset handler, we need to copy our interrupt vector table to 0x0000, its currently at 0x8000
mov r0,#0x8000 ;# Store the source pointer
mov r1,#0x0000 ;# Store the destination pointer.
;# Here we copy the branching instructions
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load multiple values from indexed address. ; Auto-increment R0
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store multiple values from the indexed address. ; Auto-increment R1
;# So the branches get the correct address we also need to copy our vector table!
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load from 4*n of regs (8) as R0 is now incremented.
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store this extra set of data.
;# Set up the various STACK pointers for different CPU modes
;# (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD2
msr cpsr_c,r0
mov sp,#0x8000
;# (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD1
msr cpsr_c,r0
mov sp,#0x4000
;# (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD3
msr cpsr_c,r0
mov sp,#0x8000000
ldr r0, =__bss_start
ldr r1, =__bss_end
mov r2, #0
zero_loop:
cmp r0,r1
it lt
strlt r2,[r0], #4
blt zero_loop
bl rpi_cpu_irq_disable
;# mov sp,#0x1000000
b main ;# We're ready?? Lets start main execution!
.section .text
undefined_instruction:
b undefined_instruction
prefetch_abort:
b prefetch_abort
data_abort:
b data_abort
unused:
b unused
fiq:
b fiq
hang:
b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl dummy
dummy:
bx lr
raspberrypi.ld:
/**
* BlueThunder Linker Script for the raspberry Pi!
*
*
*
**/
MEMORY
{
RESERVED (r) : ORIGIN = 0x00000000, LENGTH = 32K
INIT_RAM (rwx) : ORIGIN = 0x00008000, LENGTH = 32K
RAM (rwx) : ORIGIN = 0x00010000, LENGTH = 128M
}
ENTRY(_start)
SECTIONS {
/*
* Our init section allows us to place the bootstrap code at address 0x8000
*
* This is where the Graphics processor forces the ARM to start execution.
* However the interrupt vector code remains at 0x0000, and so we must copy the correct
* branch instructions to 0x0000 - 0x001C in order to get the processor to handle interrupts.
*
*/
.init : {
KEEP(*(.init))
} > INIT_RAM = 0
.module_entries : {
__module_entries_start = .;
KEEP(*(.module_entries))
KEEP(*(.module_entries.*))
__module_entries_end = .;
__module_entries_size = SIZEOF(.module_entries);
} > INIT_RAM
/**
* This is the main code section, it is essentially of unlimited size. (128Mb).
*
**/
.text : {
*(.text)
} > RAM
/*
* Next we put the data.
*/
.data : {
*(.data)
} > RAM
.bss :
{
__bss_start = .;
*(.bss)
*(.bss.*)
__bss_end = .;
} > RAM
/*
__exidx_start = .;
.ARM.exidx :
{
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > RAM
__exidx_end = .;
*/
/**
* Place HEAP here???
**/
PROVIDE(__HEAP_START = __bss_end );
/**
* Stack starts at the top of the RAM, and moves down!
**/
_estack = ORIGIN(RAM) + LENGTH(RAM);
}
As you can see test.c doesn't contain an entry point called _start, neither does it have one in its assembly compiled form. Only startup.s does.
Any idea's about how I could solve my current issue?
EDIT: all the code if needed used can be found here:https://github.com/jameswalmsley/RaspberryPi-FreeRTOS
I try to program Cortex-A9 in a bare metal fashion. I use the 'hello world' code from:
https://github.com/tukl-msd/gem5.bare-metal which works. However, I'm not able to get interrupts working. When I create an Interrupt with Interrupt e.g. #47 my software doesn't jump in the ISR function. What I am missing? Do I have to do some more initialization?
Startup Code:
.section INTERRUPT_VECTOR, "x"
.global _Reset
_Reset:
B Reset_Handler /* Reset */
B . /* Undefined */
B . /* SWI */
B . /* Prefetch Abort */
B . /* Data Abort */
B . /* reserved */
B irq_handler /* IRQ */
B irq_handler /* FIQ */
// Some Definitions for GIC:
.equ GIC_DIST, 0x10041000
.equ GIC_CPU , 0x10040000
// GIC Definitions for CPU interface
.equ ICCICR , 0x00
.equ ICCPMR , 0x04
.equ ICCEOIR , 0x10
.equ ICCIAR , 0x0C
// GIC Definitions for Distributor interface
.equ ICDDCR , 0x00
.equ ICDISER , 0x100
.equ ICDIPTR , 0x800
// Other Definitions
.equ USR_MODE , 0x10
GIC_dist_base : .word 0 // address of GIC distributor
GIC_cpu_base : .word 0 // address of GIC CPU interface
Reset_Handler:
LDR sp, =stack_top
// Enable Interrupts on CPU Side:
MRS r1, cpsr // get the cpsr.
BIC r1, r1, #0x80 // enable IRQ (ORR to disable).
MSR cpsr_c, r1 // copy it back, control field bit update.
// Configure GIC:
BL IC_init
// Branch to C code
BL main
B .
// Initialize GIC
.global GIC_init
IC_init:
stmfd sp!,{lr}
// Read GIC base from Configuration Base Address Register
// And use it to initialize GIC_dist_base and GIC_cpu_base
//mrc p15, 4, r0, c15, c0, 0
//add r2, r0, #GIC_DIST // Calculate address
ldr r2, =GIC_DIST
ldr r1, =GIC_dist_base
str r2,[r1] // Store address of GIC distributor
//add r2, r0, #GIC_CPU // Calculate address
ldr r2, =GIC_CPU
ldr r1, =GIC_cpu_base
str r2,[r1] // Store address of GIC CPU interface
// Register (ICCPMR) to enable interrutps of all priorities
ldr r1,=0xFFFF
ldr r2,=GIC_dist_base
str r1,[r2,#ICCPMR]
// Set the enable bit in the CPU interface control register
// ICCICR, allowing CPU(s) to receive interrupts
mov r1,#1
str r1,[r2,#ICCICR]
// Set the enable bit in the distributor control register
// ICDDCR, allowing interrpupts to be generated
ldr r2,=GIC_dist_base
ldr r2,[r2] // Nase address of distributor
mov r1, #1
str r1,[r2,#ICDDCR]
ldmfd sp!,{pc}
//config_interrupt (int ID , int CPU);
.global config_interrupt
config_interrupt:
stmfd sp!,{r4-r5, lr}
// Cinfigure the distributor interrupt set-enable registers (ICDISERn)
// enable the intterupt
// reg_offset = (M/32)*4 (shift and clear some bits)
// value = 1 << (N mod 32);
ldr r2,=GIC_dist_base
ldr r2,[r2] // Read GIC distributor base address
add r2,r2,#ICDISER // r2 <- base address of ICDSER regs
lsr r4,r0,#3 // clculate reg_offset
bic r4,r4,#3 // r4 <- reg_offset
add r4,r2,r4 // r4 <- address of ICDISERn
// Create a bit mask
and r2,r0,#0x1F // r2 <- N mod 32
mov r5,#1 // need to set one bit
lsl r2,r5,r2 // r2 <- value
// Using address in r4 and value in r2 to set the correct bit in the GIC register
ldr r3,[r4] // read ICDISERn
orr r3, r3, r2 // set the enable bit
str r3,[r4] // store the new register value
// Configure the distributor interrupt processor targets register (ICDIPTRn)
// select target CPU(s)
// reg_offset = (N/4)*4 (clear 2 bottom bits)
// index = N mod 4;
ldr r2,=GIC_dist_base
ldr r2,[r2] // Read GIC distributor base address
add r2,r2, #ICDIPTR // base address of ICDIPTR regs
bic r4,r0,#3 // r4 <- reg_offset
add r4,r2,r4 // r4 <- address of ICDIPTRn
// Get the address of th ebyte wihtih ICDIPTRn
and r2,r0,#0x3 // r2 <- index
add r4,r2,r4 // r4 <- byte address to be set
strb r1,[r4]
ldmfd sp!, {r4-r5, lr}
// int get_inLerrupt_number();
// Get the interrupt ID for the current interrupt. This should be called al the
// beginning of ISR. It also changes the state of the interrupt from pending to
// active, which helps to prevent other CPUs from trying to handle it.
.global get_interrupt_number
get_intterrupt_number:
// Read the JCCIAR from the CPU Interface
ldr r0,=GIC_cpu_base
ldr r0,[r0]
ldr r0,[r0,#ICCIAR]
mov pc,lr
// void end_of_interrupt (int ID);
// Notify the GIC that the interrupt has been processed. The state goes from
// active to inactive, or it goes from active and pending to pending.
.global end_of_interrupt
end_of_interrupt:
ldr r1,=GIC_cpu_base
ldr r1,[r1]
str r0,[r1,#ICCEOIR]
mov pc, lr
// IRQ Handler that calls the ISR function in C
.global irq_handler
irq_handler:
stmfd sp!,{r0-r7, lr}
// Call Interrupt Service Routine in C:
bl ISR
ldmfd sp!, {r0-r7, lr}
// Must substract 4 from lr
subs pc, lr, #4
Linker Script:
ENTRY(_Reset)
SECTIONS
{
. = 0x0;
.text : {
boot.o (INTERRUPT_VECTOR)
*(.text)
}
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4kB of stack memory */
stack_top = .;
PROVIDE (end = .) ;
}
Main C Program:
#include <stdio.h>
extern "C" void config_interrupt(int, int);
volatile unsigned int * const SHADOW = (unsigned int *)0x1000a000;
void sendShadow(unsigned int s)
{
*SHADOW = s;
}
int main(void)
{
config_interrupt(47,0);
unsigned int r = 1337;
while (1)
{
printf("Hello World! %d\n", r);
sendShadow(1);
}
}
void ISR(void)
{
printf("ISR");
}
I'm starting to write a toy OS for the BeagleBone Black, which uses an ARM Cortex-A8-based TI Sitara AM3359 SoC and the U-Boot bootloader. I've got a simple standalone hello world app writing to UART0 that I can load through U-Boot so far, and now I'm trying to move on to interrupt handlers, but I can't get SWI to do anything but hang the device.
According to the AM335x TRM (starting on page 4099, if you're interested), the interrupt vector table is mapped in ROM at 0x20000. The ROM SWI handler branches to RAM at 0x4030ce08, which branches to the address stored at 0x4030ce28. (Initially, this is a unique dead loop at 0x20084.)
My code sets up all the ARM processor modes' SP to their own areas at the top of RAM, and enables interrupts in the CPSR, then executes an SWI instruction, which always hangs. (Perhaps jumping to some dead-loop instruction?) I've looked at a bunch of samples, and read whatever documentation I could find, and I don't see what I'm missing.
Currently my only interaction with the board is over serial connection on UART0 with my linux box. U-Boot initializes UART0, and allows loading of the binary over the serial connection.
Here's the relevant assembly:
.arm
.section ".text.boot"
.equ usr_mode, 0x10
.equ fiq_mode, 0x11
.equ irq_mode, 0x12
.equ svc_mode, 0x13
.equ abt_mode, 0x17
.equ und_mode, 0x1b
.equ sys_mode, 0x1f
.equ swi_vector, 0x4030ce28
.equ rom_swi_b_addr, 0x20008
.equ rom_swi_addr, 0x20028
.equ ram_swi_b_addr, 0x4030CE08
.equ ram_swi_addr, 0x4030CE28
.macro setup_mode mode, stackpointer
mrs r0, cpsr
mov r1, r0
and r1, r1, #0x1f
bic r0, r0, #0x1f
orr r0, r0, #\mode
msr cpsr_csfx, r0
ldr sp, =\stackpointer
bic r0, r0, #0x1f
orr r0, r0, r1
msr cpsr_csfx, r0
.endm
.macro disable_interrupts
mrs r0, cpsr
orr r0, r0, #0x80
msr cpsr_c, r0
.endm
.macro enable_interrupts
mrs r0, cpsr
bic r0, r0, #0x80
msr cpsr_c, r0
.endm
.global _start
_start:
// Initial SP
ldr r3, =_C_STACK_TOP
mov sp, r3
// Set up all the modes' stacks
setup_mode fiq_mode, _FIQ_STACK_TOP
setup_mode irq_mode, _IRQ_STACK_TOP
setup_mode svc_mode, _SVC_STACK_TOP
setup_mode abt_mode, _ABT_STACK_TOP
setup_mode und_mode, _UND_STACK_TOP
setup_mode sys_mode, _C_STACK_TOP
// Clear out BSS
ldr r0, =_bss_start
ldr r1, =_bss_end
mov r5, #0
mov r6, #0
mov r7, #0
mov r8, #0
b _clear_bss_check$
_clear_bss$:
stmia r0!, {r5-r8}
_clear_bss_check$:
cmp r0, r1
blo _clear_bss$
// Load our SWI handler's address into
// the vector table
ldr r0, =_swi_handler
ldr r1, =swi_vector
str r0, [r1]
// Debug-print out these SWI addresses
ldr r0, =rom_swi_b_addr
bl print_mem
ldr r0, =rom_swi_addr
bl print_mem
ldr r0, =ram_swi_b_addr
bl print_mem
ldr r0, =ram_swi_addr
bl print_mem
enable_interrupts
swi_call$:
swi #0xCC
bl kernel_main
b _reset
.global _swi_handler
_swi_handler:
// Get the SWI parameter into r0
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
// Save lr onto the stack
stmfd sp!, {lr}
bl print_uint32
ldmfd sp!, {pc}
Those debugging prints produce the expected values:
00020008: e59ff018
00020028: 4030ce08
4030ce08: e59ff018
4030ce28: 80200164
(According to objdump, 0x80200164 is indeed _swi_handler. 0xe59ff018 is the instruction "ldr pc, [pc, #0x20]".)
What am I missing? It seems like this should work.
The firmware on the board changes the ARM execution mode and the locations of
the vector tables associated with the various modes. In my own case (a bare-metal
snippet code executed at Privilege Level 1 and launched by BBB's uBoot) the active vector table is at address 0x9f74b000.
In general, you might use something like the following function to locate the
active vector table:
static inline unsigned int *get_vectors_address(void)
{
unsigned int v;
/* read SCTLR */
__asm__ __volatile__("mrc p15, 0, %0, c1, c0, 0\n"
: "=r" (v) : : );
if (v & (1<<13))
return (unsigned int *) 0xffff0000;
/* read VBAR */
__asm__ __volatile__("mrc p15, 0, %0, c12, c0, 0\n"
: "=r" (v) : : );
return (unsigned int *) v;
}
change
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
stmfd sp!, {lr}
bl print_uint32
ldmfd sp!, {pc}
to
stmfd sp!, {r0-r3, r12, lr}
ldr r0, [lr, #-4]
bic r0, r0, #0xff000000
bl print_uint32
ldmfd sp!, {r0-r3, r12, pc}^
PS: You don't restore SPSR into CPSR of interrupted task AND you also scratch registers which are not banked by the cpu mode switch.