AM335x FreeRTOS port, unable to handle IRQs and SWI

I'm currently trying to port FreeRTOS to the TI AM335x processor, best known for being used on the BeagleBone boards. I am able to boot, drive GPIOs and set up a compare-match timer to generate the system ticks. If I disable interrupts, I can see the interrupt flag get set the expected amount of time after the timer is started. If I enable interrupts, my application dies after that same amount of time. The application also dies if I try to yield from a task, i.e. when the SWI handler is invoked. This makes me believe that the vector table is unavailable or incorrectly set up. The ROM exception vectors for SWI and IRQ contain 4030CE08h and 4030CE18h, which are addresses in RAM that in turn perform some branching. The TRM says:
User code can redirect any exception to a custom handler either by writing its address to the appropriate location from 4030CE24h to 4030CE3Ch or by overriding the branch (load into PC) instruction between addresses from 4030CE04h to 4030CE1Ch.
My vIRQHandler function address is therefore written to 4030CE38h. One would hope this was enough, but sadly it is not. I suspect that something is wrong in my boot.s file, but my assembly has never been that great and I'm struggling to understand the code. The boot.s file and the rest of the project were started from an OMAP3 port.
Boot.s:
.section .startup,"ax"
.code 32
.align 0
b _start /* reset - _start */
ldr pc, _undf /* undefined - _undf */
ldr pc, _swi /* SWI - _swi */
ldr pc, _pabt /* program abort - _pabt */
ldr pc, _dabt /* data abort - _dabt */
nop /* reserved */
ldr pc, _irq /* IRQ - read the VIC */
ldr pc, _fiq /* FIQ - _fiq */
_undf: .word 0x4030CE24 /* undefined */
_swi: .word 0x4030CE28 /* SWI */
_pabt: .word 0x4030CE2C /* program abort */
_dabt: .word 0x4030CE30 /* data abort */
_irq: .word 0x4030CE38
_fiq: .word 0x4030CE3C /* FIQ */
The branch to _start sets up a stack for each mode and clears the bss; I'm not sure whether that is relevant. This is the code that seems relevant to me, and I have updated the words to fit the AM335x instead of the OMAP3.
Setting the IRQ handler:
#define E_IRQ (*(REG32 (0x4030CE38)))
....
/* Setup interrupt handler */
E_IRQ = ( long ) vIRQHandler;
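The slot addresses used above follow a simple pattern: the TRM range 4030CE24h to 4030CE3Ch holds one 32-bit handler address per exception, in vector-table order. A minimal sketch of that arithmetic, where the enum names and the helper ram_vector_slot are hypothetical and the base address is only inferred from the quoted range:

```c
#include <stdint.h>

/* Base inferred from the TRM range quoted above: exception 1 (undefined)
 * lives at 4030CE24h, so exception 0 would sit at 4030CE20h. Assumption. */
#define RAM_VECTOR_SLOT_BASE 0x4030CE20u

/* ARM exception numbers in vector-table order (hypothetical names). */
enum arm_exception {
    EXC_RESET = 0, EXC_UNDEF = 1, EXC_SWI = 2, EXC_PABT = 3,
    EXC_DABT = 4, EXC_RESERVED = 5, EXC_IRQ = 6, EXC_FIQ = 7
};

/* Address of the RAM word that holds the handler address for `exc`. */
uint32_t ram_vector_slot(enum arm_exception exc)
{
    return RAM_VECTOR_SLOT_BASE + 4u * (uint32_t)exc;
}
```

With this, ram_vector_slot(EXC_IRQ) gives the same 0x4030CE38 that E_IRQ points at, and ram_vector_slot(EXC_SWI) gives the 0x4030CE28 used for _swi in boot.s.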
If anyone has any pointers for an assembly newbie, they would be much appreciated, because I'm completely stuck :)

U-Boot had moved the exception vector table. However, instead of recompiling U-Boot, I just reset the exception vector table in my own startup code.
Added this right before branching to main:
/* Set V=0 in CP15 SCTLR register - for VBAR to point to vector */
mrc p15, 0, r0, c1, c0, 0 /* Read CP15 SCTLR register */
bic r0, #(1 << 13) /* V = 0 */
mcr p15, 0, r0, c1, c0, 0 /* Write CP15 SCTLR register */
/* Set vector address in CP15 VBAR register */
ldr r0, =_vector_table
mcr p15, 0, r0, c12, c0, 0 /* Set VBAR */
bl main
And put in the _vector_table label at the start of my exception vector table:
.section .startup,"ax"
.code 32
.align 0
_vector_table: b _start /* reset - _start */
ldr pc, _undf /* undefined - _undf */
ldr pc, _swi /* SWI - _swi */
ldr pc, _pabt /* program abort - _pabt */
ldr pc, _dabt /* data abort - _dabt */
nop /* reserved */
ldr pc, _irq /* IRQ - read the VIC */
ldr pc, _fiq /* FIQ - _fiq */
Now all the exceptions get redirected to my code. Hopefully this will help anyone in the same situation that I was in :)
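For readers less comfortable with the CP15 assembly, the two register updates can be expressed as plain bit arithmetic. This is only a host-side sketch: the helper names are hypothetical, and the actual register access still has to be done with mrc/mcr as in the answer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_V_BIT (1u << 13)  /* 1 = high vectors, 0 = vectors at VBAR */

/* Value to write back to SCTLR so that VBAR selects the vector base. */
uint32_t sctlr_select_vbar(uint32_t sctlr)
{
    return sctlr & ~SCTLR_V_BIT;
}

/* VBAR bits [4:0] are reserved, so the table must be 32-byte aligned. */
bool vbar_address_ok(uint32_t table_addr)
{
    return (table_addr & 0x1Fu) == 0u;
}
```

The alignment check is worth running against the address of _vector_table in the map file: `.align 0` on its own requests no alignment, so the 32-byte alignment has to come from where the linker script places the .startup section.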

Related

Why is the IRQ latency in my ARM interrupt handler always the same, regardless of the instruction that is being interrupted?

I am trying to apply a type of side-channel attack I read about in this paper, which tries to infer execution state from differences in IRQ latencies on an MCU with a Cortex-M4 processor. The attack carefully interrupts instructions that occur right after a branch and measures the interrupt latency. When different branches have instructions of different lengths, you can look at the interrupt latency to determine in which of these branches the interrupt occurred and leak some of the program state.
I wrote a simple function that I want to attack in the way described above. I am using the SysTick timer to generate the interrupt at the correct point in time. To get an initial good value for the interrupt timer I used GDB to stop the program at the target line to see the SysTick value at that time.
I implemented a very simple interrupt handler that loads the SysTick timer value from memory, subtracts this value from the reload value to get the elapsed time since the interrupt (i.e. the IRQ latency), and clears the interrupt:
void __attribute__((interrupt("IRQ"))) SysTick_Handler(void)
{
/* USER CODE BEGIN SysTick_IRQn 0 */
SysTick->CTRL &= 0xfffffffe; // disable SysTick (~SysTick_CTRL_ENABLE_Msk)
*timer_value = SysTick->VAL; // capture counter value (as quickly as possible)
*timer_value = SysTick->LOAD - *timer_value; // subtract it from reload value to get IRQ latency
SysTick->VAL = 0; // reset initial value
}
However I find that I always get the same IRQ latency, regardless of the instruction that was interrupted. I expect the interrupt latency to be longer when a longer instruction is interrupted.
This is the function I wrote to test the attack
extern uint32_t *timer_value;
int sample_function(int *a, int *b){
/*
* function description -- store the smallest of the two value in a, if MEASURE_CYCLESS defined return the number
* of clock cycles that have been elapsed since the timer has been started
* r0 contains pointer to a
* r1 contains pointer to b
*/
__asm volatile(
/* push working registers */
"PUSH {r4-r8} \n"
/* move counter into r8 */
"MOV r8, #10 \n"
/* begin loop */
"begin_loop: \n"
/* decrement counter variable*/
"SUB r8, r8, #1 \n"
/* if counter variable not equal to 0, jump back to start of loop */
"CMP r8, #0 \n"
/* if r8 not equal to 0, jump back to begin of loop*/
"BNE begin_loop \n"
/* load a into r2 */
"LDR r2, [r0] \n"
/* load b into r3 */
"LDR r3, [r1] \n"
/* store a-b in r4, setting status flags -- if result is 0 Z flag is set */
"SUBS r4, r2, r3 \n"
/* if a-b positive, a is larger otherwise, b is larger (assuming a not equal to b) */
"BPL a_larger \n"
#ifdef SPY
/* load address of (*timer_value) into r4 -- use of LDR pseudo-instruction places constant in a literal pool*/
"LDR r4, =timer_value \n"
/* Load (*timer_value) into r4 */
"LDR r4, [r4] \n"
/* load address of Systick VAL into r5 */
"LDR r5, =0xe000e018 \n"
/* Load value at address stored in R5 (= Systick Val) */
"LDR r5, [r5] \n"
/* Move Systick Val into adress stored at r4 (= *timer_value = address of timer_value)*/
"STR r5, [r4] \n"
#endif
"NOP \n"
/*instruction that gets interrupted -- swap value*/
"STR r2, [r1] \n"
/* load value at this address into r0 (return value) */
"STR r3, [r0] \n"
"B end \n"
"a_larger: \n"
"MOV r0, #0 \n" // instruction that gets interrupted
"end: POP {r4-r8}"
); // pop working registers
}
Note, the section of code in the #ifdef block is used to automatically determine a good timer reload value (instead of using GDB), but I'm currently not using the value I obtained this way.
I also have an empty loop in there to delay the instruction that is meant to be interrupted a bit.
The instruction that gets interrupted is the instruction right after the #ifdef block. When I remove the NOP instruction I still get the same interrupt latency. When I increase or decrease the timer value (to interrupt some cycles earlier or later) I also still get the same IRQ latency.
Am I missing something here? Is there some behavior I do not know about?
Also, is it important to use the attribute __attribute__((interrupt("IRQ"))) for an interrupt handler?

This is what I was thinking and commenting on.
bootstrap
.thumb_func
reset:
bl notmain
ldr r4,=0xE000E018
ldr r0,=0xE000E010
mov r1,#7
str r1,[r0]
b hang
.thumb_func
hang:
nop
nop
nop
nop
nop
nop
nop
b hang
setup uart and systick
void notmain ( void )
{
uart_init();
hexstring(0x12345678);
PUT32(STK_CSR,4);
PUT32(STK_RVR,0xF40000);
PUT32(STK_CVR,0x00000000);
//PUT32(STK_CSR,7);
}
event handler
.thumb_func
.globl systick_handler
systick_handler:
ldr r0,[r4]
ldr r5,[sp,#0x18]
push {r0,lr}
bl hexstrings
mov r0,r5
bl hexstring
pop {r0,pc}
grab the timer and address of interrupted instruction and print them out.
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 0800005C
00F3FFF4 0800005E
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 08000050
08000050 <hang>:
8000050: bf00 nop
8000052: bf00 nop
8000054: bf00 nop
8000056: bf00 nop
8000058: bf00 nop
800005a: bf00 nop
800005c: bf00 nop
800005e: e7f7 b.n 8000050 <hang>
From ARM's documentation.
Interrupt Latency
There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.
And that last line we can perhaps see happening here. You can try various instructions, but this architecture has the ability to restart the long duration instructions (reads and push/pop, multiply, and such). I think to see much of a latency difference you may need to create bus or shared resource contention (vs instructions)
Also, SysTick is a system exception rather than an external interrupt, so there may be some differences with respect to latency.
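The constant 00F3FFF4 in the output can be converted into a cycle count. SysTick is a 24-bit down-counter, so the cycles elapsed since the reload are the reload value minus the captured value. A small sketch of that arithmetic (the helper name systick_elapsed is hypothetical):

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu  /* SysTick is a 24-bit down-counter */

/* Cycles elapsed since the counter was (re)loaded with `reload`. */
uint32_t systick_elapsed(uint32_t reload, uint32_t current)
{
    return (reload - current) & SYSTICK_MASK;
}
```

With the reload value 0xF40000 from notmain and the captured 0x00F3FFF4, this gives 12 cycles for every interrupted address, which lines up with the twelve-cycle figure in ARM's text. Note that the capture happens in the first load of the handler, so the number is only an approximation of the pure entry latency.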

How to initialize the core Timer in an ARM Cortex-A7

Currently, I'm developing an operating system for the Raspberry Pi 2 as the final project for my university degree. Right now I'm having serious trouble creating a simple timer that raises an interrupt every second, because the documentation provided by ARM doesn't make clear how to initialize that module.
I read the Architecture Reference Manual; it's under ARM architecture/Reference manuals/ARMv7-AR
Can someone explain the process of initializing a core timer?
Here is what I have tried so far:
In my C file
_local_timer_init();
// ROUTING IRQ
*(volatile uint32_t*)CORE0_L_TIMER_INT_CTL = 0x8;
In my assembly file
.globl _local_timer_init
/*
THIS STEPS APPLIES IN A SYSTEM WHERE THERE IS NOT VIRTUALIZATION SUPPORT
(I think so)
1. Look into CNTKCTL register if you need
2. Look into CNTP_CTL or CNTH_CTL or CNTV_CTL to enable or disable
the corresponding timer (bit 0)
3. You have to set the compare value for the corresponding timer
CNTP_CVAL, CNTH_CVAL, CNTV_CVAL if needed
4. It should be in boot.S but you have to initialize the counter
frequency register, CNTFRQ
5. Putting the corresponding TVAL register to a right value
6. Routing the IRQ and enabling IRQ of the corresponding core
*/
_local_timer_init:
// ENABLING TIMER
mov r0, #1
mcr p15, #0, r0, c14, c3, #1 //Write to CNTV_CTL
// SETTING FREQUENCY TIMER
//we don't need this right now
// SETTING TVAL REGISTER (virtual)
mrc p15, #0, r0, c14, c0, #0 //we obtain CNTFRQ
mcr p15, #0, r0, c14, c3, #0 //Write to CNTV_TVAL
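As a rough illustration of the values involved in the routine above: writing CNTFRQ into CNTV_TVAL asks for a one-second countdown, and the control register has three defined bits. The sketch below only does the arithmetic on the host; the helper names are hypothetical, and the 19.2 MHz figure in the usage note is an assumption about the board, not something stated in the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL / CNTP_CTL bit layout (ARMv7-A Generic Timer). */
#define CNT_CTL_ENABLE  (1u << 0)  /* timer enabled */
#define CNT_CTL_IMASK   (1u << 1)  /* 1 = interrupt output masked */
#define CNT_CTL_ISTATUS (1u << 2)  /* read-only: timer condition met */

/* TVAL value for a period of `ms` milliseconds at `cntfrq_hz` ticks/s. */
uint32_t timer_tval_for_ms(uint32_t cntfrq_hz, uint32_t ms)
{
    return (uint32_t)(((uint64_t)cntfrq_hz * ms) / 1000u);
}

/* True when the timer is enabled, unmasked and its condition is met. */
bool timer_irq_asserted(uint32_t ctl)
{
    return (ctl & CNT_CTL_ENABLE) && !(ctl & CNT_CTL_IMASK)
        && (ctl & CNT_CTL_ISTATUS);
}
```

Assuming CNTFRQ reads 19200000, timer_tval_for_ms(19200000, 1000) returns the same value the assembly writes. The timer keeps its interrupt asserted while the condition holds, so a periodic tick would need TVAL to be rewritten in the IRQ handler.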
I also created my own assembly handler for IRQ exceptions, shown below. Maybe the problem is here; I really don't know. Is this the correct way to handle an IRQ exception?
irq_s_handler:
/*Mode: PL1 irq */
srsda sp!, #0x12 //we stores the spsr and lr at the address contained in sp of the mode irq
/*
It is necessary to switch to supervisor mode and store some registers
into it's stack for having support for nested exceptions
*/
push {r0-r12}
bl irq_c_handler
pop {r0-r12}
rfeib sp! //we do the inverse operation of srsdb
subs pc, lr, #4 //we adjust the appropiate value considered

Recover from Hard Fault on Cortex M0+

Until now I had a Hard fault handler in C that I defined in the vector table:
.sect ".intvecs"
.word _top_of_main_stack
.word _c_int00
.word NMI
.word Hard_Fault
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
.word Reserved
....
....
....
One of our tests triggers a hard fault (on purpose) by writing to a non-existent address. Once the test is done, the handler returns to the calling function and the Cortex recovers from the fault. It is worth mentioning that the handler does not take any arguments.
Now I'm in the phase of writing a real handler.
I created a struct for the stack frame so we can print PC, LR, and xPSR in case of a fault:
typedef struct
{
int R0 ;
int R1 ;
int R2 ;
int R3 ;
int R12 ;
int LR ;
int ReturnAddress ;
int xPSR ;
} InterruptStackFrame_t ;
My hard fault handler in C is defined:
void Hard_Fault(InterruptStackFrame_t* p_stack_frame)
{
// Write to external memory that I can read from outside
/* prints a message containing information about stack frame:
* p_stack_frame->LR, p_stack_frame->PC, p_stack_frame->xPSR,
* (uint32_t)p_stack_frame (SP)
*/
}
I created an assembly function:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP ; store pointer to stack frame
BL Hard_Fault ; go to C function handler
POP {R0-R7} ; pop out all stack frame
MOV PC, R5 ; jump to LR that was in the stack frame (the calling function before the fault)
.endasmfunc
This is the right time to say that I don't have an OS, so I do not have to check bit[2] of LR because I definitely know that I use MSP and not PSP.
The program compiles and runs properly and I used JTAG to ensure that all registers restore to the wanted values.
When executing the last command (MOV PC, R5) the PC returns to the correct address, but at some point, the debugger indicates that the M0 is locked in a hard fault and cannot recover.
I do not understand the difference between using a C function as a handler or an assembly function that calls a C function.
Does anyone know what is the problem?
Eventually, I will use an assert function that halts the processor, but I want that to be optional and under my control.

To explain "old_timer"'s comment:
When entering an exception or interrupt handler on the Cortex the LR register has a special value.
Normally you return from the exception handler by simply jumping to that value (by writing that value to the PC register).
The Cortex CPU will then automatically pop all the registers from the stack and it will reset the interrupt logic.
When directly jumping to the PC stored on the stack however you will destroy some registers and you don't restore the interrupt logic.
Therefore this is not a good idea.
Instead I'd do something like this:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
B Hard_Fault
EDIT
Using the B instruction may not work because the "distance" allowed for the B instruction is more limited than for the BL instruction.
However there are two possibilities you could use (unfortunately I'm not sure if these will definitely work).
The first one will return to the address that had been passed in the LR register when entering the assembler handler:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
PUSH {LR}
BL Hard_Fault
POP {PC}
The second one will indirectly do the jump:
.thumbfunc _hard_fault_wrapper
_hard_fault_wrapper: .asmfunc
MRS R0, MSP
LDR R1, =Hard_Fault
MOV PC, R1
EDIT 2
You cannot use LR because it holds EXC_RETURN value. ... You have to read the LR from stack and you must clean the stack from the stack frame, because the interrupted program doesn't know that a frame was stored.
According to the Cortex M3 manual you must exit from an exception handler by writing one of the three EXC_RETURN values to the PC register.
If you simply jump to the LR value stored in the stack frame you remain in the exception handler!
If something stupid happens during the program the CPU will assume that an exception happened inside the exception handler and it hangs.
I assume that the Cortex M0 works the same way as the M3 in this point.
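To make the EXC_RETURN point concrete, here is a small host-side sketch of how the LR value seen on handler entry can be recognised and decoded. The three constants are the architectural values for cores without an FPU; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three EXC_RETURN values for ARMv6-M / ARMv7-M without FPU. */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u  /* return to handler mode */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u  /* return to thread mode, MSP */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu  /* return to thread mode, PSP */

/* True if `lr` is one of the magic values rather than a code address. */
bool is_exc_return(uint32_t lr)
{
    return lr == EXC_RETURN_HANDLER_MSP || lr == EXC_RETURN_THREAD_MSP
        || lr == EXC_RETURN_THREAD_PSP;
}

/* Bit 2 selects which stack holds the exception frame. */
bool exc_return_uses_psp(uint32_t lr)
{
    return (lr & (1u << 2)) != 0u;
}
```

In the setup described in the question (no OS, MSP only) the handler would see 0xFFFFFFF9 in LR, and writing that value to PC is what ends the exception.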
If you want to modify some CPU register during the exception handler you can modify the stack frame. The CPU will automatically pop all registers from the stack frame when you are writing the EXC_RETURN value to the PC register.
If you want to modify one of the registers not present in the stack frame (such as R5) you can directly modify it in the exception handler.
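One way to use this for the test case in the question is to change the stacked return address instead of jumping to it, so that the normal exception return resumes after the faulting instruction. The following is a sketch under assumptions: the stacked return address is taken to point at the faulting instruction, the caller supplies that instruction's first halfword, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers stacked by hardware on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, return_address, xpsr;
} exception_frame_t;

/* Size in bytes of the Thumb instruction whose first halfword is `hw`:
 * top five bits 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding. */
uint32_t thumb_insn_size(uint16_t hw)
{
    return ((hw >> 11) >= 0x1Du) ? 4u : 2u;
}

/* Advance the stacked PC so exception return resumes after the fault. */
void frame_skip_instruction(exception_frame_t *frame, uint16_t first_halfword)
{
    frame->return_address += thumb_insn_size(first_halfword);
}
```

The handler would call frame_skip_instruction on the pointer obtained from MSP and then return normally, leaving the unstacking to the hardware. The return address sits at offset 0x18 in the frame, which is the same offset the SysTick example earlier on this page reads with ldr r5,[sp,#0x18].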
And this shows another problem of your interrupt handler:
The instruction POP {R0-R7} will set registers R4 to R7 to values that do not match the program that has been interrupted. R12 will also be destroyed depending on the C code. This means that in the program being interrupted these four registers suddenly change while the program is not prepared for that!

Microblaze interrupt example with freeRTOS giving multiple definition of _interrupt_handler

I have created a simple example program with the Xilinx SDK that uses FreeRTOS, and I am running into an issue that seems quite unexpected. I want to fire a software interrupt, so I have set up the code this way:
void software_test( void ) __attribute__((interrupt_handler));
void software_test( void )
{
// clear the interrupt
*((volatile uint32_t *) 0x4120000C) = 0x80;
interrupt_occurred++;
}
When I try to compile it complains about:
\interrupt_example_bsp\microblaze_0\libsrc\freertos823_xilinx_v1_1\src/portasm.S:288: multiple definition of `_interrupt_handler'
./src/freertos_hello_world.o:\Debug/../src/freertos_hello_world.c:130: first defined here
I checked portasm.S and it has the following code in it:
.global _interrupt_handler
... bunch more unrelated code here
.text
.align 4
_interrupt_handler:
portSAVE_CONTEXT
/* Stack the return address. */
swi r14, r1, portR14_OFFSET
/* Switch to the ISR stack. */
lwi r1, r0, pulISRStack
/* The parameter to the interrupt handler. */
ori r5, r0, configINTERRUPT_CONTROLLER_TO_USE
/* Execute any pending interrupts. */
bralid r15, XIntc_DeviceInterruptHandler
or r0, r0, r0
/* See if a new task should be selected to execute. */
lwi r18, r0, ulTaskSwitchRequested
or r18, r18, r0
/* If ulTaskSwitchRequested is already zero, then jump straight to
restoring the task that is already in the Running state. */
beqi r18, task_switch_not_requested
/* Set ulTaskSwitchRequested back to zero as a task switch is about to be
performed. */
swi r0, r0, ulTaskSwitchRequested
/* ulTaskSwitchRequested was not 0 when tested. Select the next task to
execute. */
bralid r15, vTaskSwitchContext
or r0, r0, r0
... bunch more code here
I am unclear how to fix this. Has anyone else encountered it?
Any help is greatly appreciated. Thanks in advance.

Here is some information on implementing a Microblaze ISR using FreeRTOS: http://www.freertos.org/RTOS-Xilinx-Microblaze-KC705.html#implementing_an_ISR

ARM M3: Using 'extra' space in GPIO peripheral memory map? Can you do this?

I'm trying to understand someone's code, and it reads from an address in the GPIO region (0x4002 0000 - 0x4002 03FF) that is higher than the GPIO registers (they only go up to offset 0x24).
Can you use all the extra space above 0x4002 0024 and below 0x4002 03FF? What would happen if this space is read from?
EDIT:
I totally forgot I could just post the code. I bolded the line that causes me headaches:
R0 = 0x15
PUSH {R3,LR} ;
ADD.W R0, R0, R0,LSL#1 ;
MOV GPIO_Port_A_Address, #0x40020000
LSLS R0, R0, #2 ;
ADDS R2, GPIO_Port_A_Address, R0 ;
LDRB R2, [R2,#4] ;
MOVS R1, #1 ;
LSL.W R1, R1, R2 ;
LDR R0, [GPIO_Port_A_Address,R0] ;
UXTH R1, R1 ;
BL sub_8001ED8 ;
MOVS R0, #0 ;
POP {R3,PC} ;

As a start, there are other GPIO ports at every multiple of 0x400 from 0x40020000 to 0x400223FF, and beyond those but still in your range there are the CRC peripheral, RCC, and Flash controller. The relevant memory map is on page 50 of RM0033 (Rev 3, an old version, so the page number is probably wrong).
0x40023C00 - 0x40023FFF Flash interface register
0x40023800 - 0x40023BFF RCC
0x40023000 - 0x400233FF CRC
0x40022000 - 0x400223FF GPIOI
0x40021C00 - 0x40021FFF GPIOH
0x40021800 - 0x40021BFF GPIOG
0x40021400 - 0x400217FF GPIOF
0x40021000 - 0x400213FF GPIOE
0x40020C00 - 0x40020FFF GPIOD
0x40020800 - 0x40020BFF GPIOC
0x40020400 - 0x400207FF GPIOB
0x40020000 - 0x400203FF GPIOA
The code you have posted, as best I have been able to calculate, does access some unimplemented addresses (0x40020100, 0x400200FC), so I'm not sure what's going on there, or if I have miscalculated. In testing on an STM32F207, I can confirm that you can read and write to this without getting a fault, but the registers are unimplemented and always read as zero.
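The two addresses can be checked by replaying the arithmetic of the listing in C. This sketch assumes the initial R0 = 0x15 shown in the question and takes offset 0x24 as the last implemented GPIO register, as the question states; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPIOA_BASE    0x40020000u
#define GPIO_LAST_REG 0x24u  /* last GPIO register offset, per the question */

/* ADD.W R0,R0,R0,LSL#1 then LSLS R0,R0,#2: (i + 2i) * 4 = i * 12. */
uint32_t table_offset(uint32_t index)
{
    return (index + (index << 1)) << 2;
}

/* Address read by LDR R0, [GPIO_Port_A_Address, R0]. */
uint32_t ldr_address(uint32_t index)
{
    return GPIOA_BASE + table_offset(index);
}

/* Address read by LDRB R2, [R2, #4]. */
uint32_t ldrb_address(uint32_t index)
{
    return ldr_address(index) + 4u;
}

/* True if `addr` falls on an implemented GPIOA register. */
bool is_gpioa_register(uint32_t addr)
{
    return addr >= GPIOA_BASE && (addr - GPIOA_BASE) <= GPIO_LAST_REG;
}
```

For index 0x15 the offset is 0xFC, so the word load hits 0x400200FC and the byte load hits 0x40020100, both above the last GPIOA register. The multiply-by-12 pattern looks like indexing an array of 12-byte records, which may hint that the disassembler guessed the base address wrongly, but that is speculation.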
It would be a really bad idea to use peripheral registers as general purpose memory. Not every bit will be R/W, not all addresses may be implemented, and that's not even getting into the fact that you'll be configuring hardware based on application data and not correct register values. The range you've specified includes the flash controller and RCC, both of which are vital to the operation of the microcontroller.
If you are out of memory, there are some memory spaces that you may be able to use as general purpose if they are not already used for another purpose. The STM32F2's have a 4 kB backup SRAM that can be used, though there is some setup required to make it R/W. The USB peripheral(s) also has some RAM built in for endpoint buffers. If you aren't using USB, you could abuse some of this memory, and you could configure the USB peripheral so there aren't any bad side effects.
