I'm trying to add some logic at boundaries between userspace and kernelspace particularly on the ARM architecture.
One such boundary appears to be the vector_swi routine implemented in arch/arm/kernel/entry-common.S. Right now, I have most of my code written in a C function which I would like to call somewhere at the start of vector_swi.
Thus, I did the following:
ENTRY(vector_swi)
sub sp, sp, #S_FRAME_SIZE
stmia sp, {r0 - r12} @ Calling r0 - r12
ARM( add r8, sp, #S_PC )
ARM( stmdb r8, {sp, lr}^ ) @ Calling sp, lr
THUMB( mov r8, sp )
THUMB( store_user_sp_lr r8, r10, S_SP ) @ calling sp, lr
mrs r8, spsr @ called from non-FIQ mode, so ok.
str lr, [sp, #S_PC] @ Save calling PC
str r8, [sp, #S_PSR] @ Save CPSR
str r0, [sp, #S_OLD_R0] @ Save OLD_R0
zero_fp
#ifdef CONFIG_BTM_BOUNDARIES
bl btm_entering_kernelspace @ <--- My function
#endif
When the contents of my function are as follows everything works fine:
static int btm_enabled = 0;
asmlinkage inline void btm_entering_kernelspace(void)
{
int cpu;
int freq;
struct acpu_level *level;
if(!btm_enabled) {
return;
}
cpu = smp_processor_id();
freq = acpuclk_krait_get_rate(cpu);
(void) cpu;
(void) freq;
(void) level;
}
However, when I add some additional code, the kernel enters into a crash-reboot loop.
static int btm_enabled = 0;
asmlinkage inline void btm_entering_kernelspace(void)
{
int cpu;
int freq;
struct acpu_level *level;
if(!btm_enabled) {
return;
}
cpu = smp_processor_id();
freq = acpuclk_krait_get_rate(cpu);
(void) cpu;
(void) freq;
(void) level;
// --------- Added code ----------
for (level = drv.acpu_freq_tbl; level->speed.khz != 0; level++) {
if(level->speed.khz == freq) {
break;
}
}
}
Although the first instinct is to blame the logic of the added code, please note that none of it should ever execute since btm_enabled is 0.
I have double-checked and triple-checked to make sure btm_enabled is 0 by adding a sysfs entry to print out the value of the variable (with the added code removed).
Could someone explain what is going on here or what I'm doing wrong?
The first version will probably compile to just a return instruction as it has no side effect. The second needs to load btm_enabled and in the process overwrites one or two system call arguments.
When calling a C function from assembly language you need to ensure that registers that may be modified do not contain needed information.
To solve your specific problem, you could update your code to read:
#ifdef CONFIG_BTM_BOUNDARIES
stmdb sp!, {r0-r3, r12, lr} @ <--- New instruction
bl btm_entering_kernelspace @ <--- My function
ldmia sp!, {r0-r3, r12, lr} @ <--- New instruction
#endif
The new instructions push registers r0-r3, r12 and lr onto the stack and restore them after your function call. These are the only registers a C function is allowed to modify. Saving r12 is unnecessary here because its value is not used, but doing so keeps the stack 8-byte aligned as required by the ABI.
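To make the clobbering concrete, here is a toy model in plain C rather than real kernel code. The Regs struct, callee_model, call_with_save and r0_survives are all invented for illustration: the struct stands in for the CPU registers, callee_model stands in for any compiled C function (which may overwrite r0-r3, r12 and lr), and call_with_save mimics the stmdb/ldmia pair.

```c
/* Toy register file: r[0]..r[15], with r[14] playing the role of lr. */
typedef struct { unsigned r[16]; } Regs;

/* Registers a C function may overwrite under the ARM calling convention. */
static const int scratch[] = { 0, 1, 2, 3, 12, 14 };
enum { NSCRATCH = sizeof scratch / sizeof scratch[0] };

/* Hypothetical stand-in for a compiled C function: it trashes every
 * scratch register and leaves all the others alone. */
static void callee_model(Regs *cpu)
{
    for (int i = 0; i < NSCRATCH; i++)
        cpu->r[scratch[i]] = 0xBADu;
}

/* Mimics: stmdb sp!, {r0-r3, r12, lr} ; bl callee ; ldmia sp!, {...} */
void call_with_save(Regs *cpu)
{
    unsigned saved[NSCRATCH];
    for (int i = 0; i < NSCRATCH; i++)
        saved[i] = cpu->r[scratch[i]];
    callee_model(cpu);
    for (int i = 0; i < NSCRATCH; i++)
        cpu->r[scratch[i]] = saved[i];
}

/* Returns 1 if the system call argument held in r0 survives the call. */
int r0_survives(int save_registers)
{
    Regs cpu = { { 0 } };
    cpu.r[0] = 42u;             /* pretend r0 holds a syscall argument */
    if (save_registers)
        call_with_save(&cpu);
    else
        callee_model(&cpu);     /* bare "bl" with no save/restore */
    return cpu.r[0] == 42u;
}
```

In this model r0_survives(0) reports that the argument was lost, which corresponds to the crash-reboot loop, while r0_survives(1) reports that it was preserved.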
I am new to embedded C, and I recently watched some videos about the volatile qualifier. They all mention the same scenarios for using it:
when reading or writing a variable in an ISR (interrupt service routine)
in an RTOS application or multithreaded code (which is not my case)
memory-mapped I/O (which is also not my case)
My question is why my code does not get stuck in the whiletest() function below
when my UART receives data and then triggers the void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) interrupt callback.
int test;
int main(void)
{
test = 0;
MX_GPIO_Init();
MX_USART1_UART_Init();
HAL_UART_Receive_IT(&huart1, (uint8_t *)&ch, 1);
while (1)
{
Delay(500);
printf("the main is runing\r\n");
whiletest();
}
}
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
if(huart->Instance == USART1)
{
test = 1;
HAL_UART_Receive_IT(&huart1, (uint8_t *)&ch, 1);
}
}
void whiletest(void)
{
int count =0;
while(!test){
count++;
printf("%d\r\n",count);
Delay(2000);
}
}
I use the Keil IDE and STM32CubeIDE. I learned that the compiler may optimize some instructions away at the -O2 or -O3 optimization level. Therefore, I chose -O2 as the build option, but it seems to have no effect on my code. The compiler does not optimize away the load instruction in the while loop and cache the test value 0 from the main function, as the videos on YouTube teach. It is confusing. In what situations am I supposed to use the volatile qualifier while keeping my code optimized (-O2 or -O3)?
Note: I am using an STM32H743ZI (Cortex-M7).
volatile informs the compiler that an object is prone to side effects. It means that the object can be changed by something that is not in the program's execution path.
As you never call the interrupt routine directly, the compiler assumes that the test variable will never be 1. You need to tell it (volatile does that) that the variable may change anyway.
Example:
volatile int test;
void interruptHandler(void)
{
test = 1;
}
void foo(void)
{
while(!test);
LED_On();
}
The compiler knows that test can change somehow and always reads it in the while loop:
foo:
push {r4, lr}
ldr r2, .L10
.L6:
ldr r3, [r2] //compiler reads the value of the test from the memory as it knows that it can change.
cmp r3, #0
beq .L6
bl LED_On
pop {r4, lr}
bx lr
.L10:
.word .LANCHOR0
test:
Without volatile, the compiler reads test only once and assumes it will stay zero from then on:
foo:
ldr r3, .L10
ldr r3, [r3]
cmp r3, #0
bne .L6
.L7:
b .L7 //dead loop here
.L6:
push {r4, lr}
bl LED_On
pop {r4, lr}
bx lr
.L10:
.word .LANCHOR0
test:
In your code, you have to use volatile if the object is changed by something that is not in the program's execution path.
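The same rule can be tried on a desktop machine, where a signal handler plays the role of the ISR. This is only a sketch under that assumption: fake_isr and wait_for_flag are made-up names, and SIGINT raised by the program itself stands in for the UART interrupt. Standard C guarantees that raise() does not return until the handler has run, so the loop below exits immediately.

```c
#include <signal.h>

/* Written by the handler, read by the main path: must be volatile.
 * sig_atomic_t is the type the C standard guarantees can be written
 * safely from a signal handler. */
static volatile sig_atomic_t test_flag = 0;

/* Stand-in for the interrupt routine; never called directly by our code. */
static void fake_isr(int signo)
{
    (void)signo;
    test_flag = 1;
}

/* Returns 1 once the handler has set the flag, or -1 on setup failure. */
int wait_for_flag(void)
{
    test_flag = 0;
    if (signal(SIGINT, fake_isr) == SIG_ERR)
        return -1;
    raise(SIGINT);              /* handler runs before raise() returns */
    while (!test_flag)
        ;                       /* volatile forces a fresh read each pass */
    signal(SIGINT, SIG_DFL);    /* restore the default disposition */
    return (int)test_flag;
}
```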
The compiler may only optimize (change) code if the optimized code behaves as if the optimizer did nothing.
In your case you are calling two functions (Delay and printf) in your while loop. The compiler has no visibility of what these functions do since they are defined in a separate compilation unit. The compiler must therefore assume that they may change the value of the global variable test, and so it cannot optimize out the check of test. Remove the function calls and the compiler may well optimize out the check.
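A small sketch of why that assumption is necessary: a called function really can change a global. Here rx_flag plays the role of test and Delay_stub is a hypothetical replacement for Delay that sets the flag on its third call. In a real project Delay would live in another file, so the compiler could not rule this behaviour out.

```c
int rx_flag = 0;                /* plays the role of the global "test" */
static int delay_calls = 0;

/* Hypothetical Delay replacement: flips the flag on the third call. */
void Delay_stub(void)
{
    if (++delay_calls == 3)
        rx_flag = 1;
}

/* Same shape as whiletest() in the question; returns the iteration count. */
int whiletest(void)
{
    int count = 0;
    while (!rx_flag) {          /* must be re-read after every call */
        count++;
        Delay_stub();
    }
    return count;
}
```

If the compiler cached rx_flag in a register across the call, this loop would never terminate, which would not match the behaviour of the unoptimized program.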
I'm trying to implement some "OSEK-Services" on an arm7tdmi-s using gcc arm. Unfortunately, turning up the optimization level results in "wrong" code generation. The main thing I don't understand is that the compiler seems to ignore the procedure call standard, e.g. passing parameters to a function by moving them into registers r0-r3. I understand that function calls can be inlined, but the parameters still need to be in the registers to perform the system call.
Consider the following code to demonstrate my problem:
unsigned SysCall(unsigned param)
{
volatile unsigned ret_val;
__asm __volatile
(
"swi 0 \n\t" /* perform SystemCall */
"mov %[v], r0 \n\t" /* move the result into ret_val */
: [v]"=r"(ret_val)
:: "r0"
);
return ret_val; /* return the result */
}
int main()
{
unsigned retCode;
retCode = SysCall(5); // expect retCode to be 6 when returning back to usermode
}
I wrote the Top-Level software interrupt handler in assembly as follows:
.type SWIHandler, %function
.global SWIHandler
SWIHandler:
stmfd sp! , {r0-r2, lr} @ save regs
ldr r0 , [lr, #-4] @ load sysCall instruction and extract sysCall number
bic r0 , #0xff000000
ldr r3 , =DispatchTable @ load dispatchTable
ldr r3 , [r3, r0, LSL #2] @ load sysCall address into r3
ldmia sp, {r0-r2} @ load parameters into r0-r2
mov lr, pc
bx r3
stmia sp ,{r0-r2} @ store the result back on the stack
ldr lr, [sp, #12] @ restore return address
ldmfd sp! , {r0-r2, lr} @ load result into register
movs pc , lr @ back to next instruction after swi 0
The dispatch table looks like this:
DispatchTable:
.word activateTaskService
.word getTaskStateService
The SystemCall function looks like this:
unsigned activateTaskService(unsigned tID)
{
return tID + 1; /* only for demonstration */
}
Running without optimization, everything works fine and the parameters are in the registers as expected.
See the following code compiled with -O0:
00000424 <main>:
424: e92d4800 push {fp, lr}
428: e28db004 add fp, sp, #4
42c: e24dd008 sub sp, sp, #8
430: e3a00005 mov r0, #5 #move param into r0
434: ebffffe1 bl 3c0 <SysCall>
000003c0 <SysCall>:
3c0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
3c4: e28db000 add fp, sp, #0
3c8: e24dd014 sub sp, sp, #20
3cc: e50b0010 str r0, [fp, #-16]
3d0: ef000000 svc 0x00000000
3d4: e1a02000 mov r2, r0
3d8: e50b2008 str r2, [fp, #-8]
3dc: e51b3008 ldr r3, [fp, #-8]
3e0: e1a00003 mov r0, r3
3e4: e24bd000 sub sp, fp, #0
3e8: e49db004 pop {fp} ; (ldr fp, [sp], #4)
3ec: e12fff1e bx lr
Compiling the same code with -O3 results in the following assembly code:
00000778 <main>:
778: e24dd008 sub sp, sp, #8
77c: ef000000 svc 0x00000000 #Inline SystemCall without passing params into r0
780: e1a02000 mov r2, r0
784: e3a00000 mov r0, #0
788: e58d2004 str r2, [sp, #4]
78c: e59d3004 ldr r3, [sp, #4]
790: e28dd008 add sp, sp, #8
794: e12fff1e bx lr
Notice how the system call gets inlined without assigning the value 5 to r0.
My first approach is to move those values manually into the registers by adapting the function SysCall from above as follows:
unsigned SysCall(volatile unsigned p1)
{
volatile unsigned ret_val;
__asm __volatile
(
"mov r0, %[p1] \n\t"
"swi 0 \n\t"
"mov %[v], r0 \n\t"
: [v]"=r"(ret_val)
: [p1]"r"(p1)
: "r0"
);
return ret_val;
}
It seems to work in this minimal example, but I'm not sure whether this is the best practice. Why does the compiler think it can omit the parameters when inlining the function? Does anybody have suggestions on whether this approach is okay or what should be done differently?
Thank you in advance
A function call in C source code does not instruct the compiler to call the function according to the ABI. It instructs the compiler to call the function according to the model in the C standard, which means the compiler must pass the arguments to the function in a way of its choosing and execute the function in a way that has the same observable effects as defined in the C standard.
Those observable effects do not include setting any processor registers. When a C compiler inlines a function, it is not required to set any particular processor registers. If it calls a function using an ABI for external calls, then it would have to set registers. Inline calls do not need to obey the ABI.
So merely putting your system request inside a function built of C source code does not guarantee that any registers will be set.
For ARM, what you should do is define register variables assigned to the required register(s) and use those as input and output to the assembly instructions:
unsigned SysCall(unsigned param)
{
register unsigned Parameter __asm__("r0") = param;
register unsigned Result __asm__("r0");
__asm__ volatile
(
"swi 0"
: "=r" (Result)
: "r" (Parameter)
: // "memory" // if any inputs are pointers
);
return Result;
}
(This is a major kludge by GCC; it is ugly, and the documentation is poor. But see also https://stackoverflow.com/tags/inline-assembly/info for some links. GCC for some ISAs has convenient specific-register constraints you can use instead of r, but not for ARM.) The register variables do not need to be volatile; the compiler knows they will be used as input and output for the assembly instructions.
The asm statement itself should be volatile if it has side effects other than producing a return value. (e.g. getpid() doesn't need to be volatile.)
A non-volatile asm statement with outputs can be optimized away if the output is unused, or hoisted out of loops if it's used with the same input (like a pure function call). This is almost never what you want for a system call.
You also need a "memory" clobber if any of the inputs are pointers to memory that the kernel will read or modify. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details (and a way to use a dummy memory input or output to avoid a "memory" clobber.)
A "memory" clobber on mmap/munmap or other system calls that affect what memory means would also be wise; you don't want the compiler to decide to do a store after munmap instead of before.
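As a rough sketch of the pointer case, the snippet below uses the GNU C extended asm syntax with an empty template, so it builds on any GCC or Clang target. The names fake_syscall_reading and send_byte are invented, and the empty template is only a placeholder for a real "swi 0" issued through register variables as in SysCall above.

```c
/* Placeholder for a system call that lets the kernel read *buf.
 * Only the pointer value is an input operand; the "memory" clobber is
 * what tells the compiler that the pointed-to bytes may be read too. */
static inline void fake_syscall_reading(const char *buf)
{
    __asm__ volatile("" : : "r"(buf) : "memory");
}

/* Writes one byte into a local buffer and hands it to the "kernel". */
int send_byte(char c)
{
    char buf[1];
    buf[0] = c;                 /* store must happen before the asm */
    fake_syscall_reading(buf);
    return buf[0];
}
```

Without the clobber, the compiler would be entitled to treat the store to buf as dead or to delay it past the asm statement, because an "r" input only describes the pointer value, not the memory behind it.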
Our current project includes FreeRTOS, and I added --use_frame_pointer to Keil uVision's ARMGCC compiler option. But after loading the firmware onto the STM32F104 chip and running it, it crashed. Without --use_frame_pointer, everything is OK.
The hard fault handler shows that faultStackAddress is 0x40FFFFDC, which points to a reserved area. Does anyone have any idea what causes this error? Thanks a lot.
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
}
I stepped through the code line by line, and the crash happened in the function below in FreeRTOS's port.c after I called vTaskDelete(NULL);
void vPortYieldFromISR( void )
{
/* Set a PendSV to request a context switch. */
portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;
}
But it seems this is not the root cause, because the crash still happened when I deleted vTaskDelete(NULL).
[update on Jan 8] sample code
#include "FreeRTOSConfig.h"
#include "FreeRTOS.h"
#include "task.h"
#include <stm32f10x.h>
void crashTask(void *param)
{
unsigned int i = 0;
/* halt the hardware. */
while(1)
{
i += 1;
}
vTaskDelete(NULL);
}
void testCrashTask()
{
xTaskCreate(crashTask, (const signed char *)"crashTask", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
}
void Hard_Fault_Handler(unsigned int *faultStackAddress);
/* The fault handler implementation calls a function called Hard_Fault_Handler(). */
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(unsigned int *faultStackAddress)
{
int i = 0;
while(1)
{
i += 1;
}
}
void nvicInit(void)
{
NVIC_PriorityGroupConfig(NVIC_PriorityGroup_4);
#ifdef VECT_TAB_RAM
NVIC_SetVectorTable(NVIC_VectTab_RAM, 0x0);
#else
NVIC_SetVectorTable(NVIC_VectTab_FLASH, 0x0);
#endif
}
int main()
{
nvicInit();
testCrashTask();
vTaskStartScheduler();
}
/* For now, the stack depth of IDLE has 88 left. if want add func to here,
you should increase it. */
void vApplicationIdleHook(void)
{ /* ATTENTION: all funcs called within here, must not be blocked */
//workerProbe();
}
void debugSendTraceInfo(unsigned int taskNbr)
{
}
When the crash happened, the Keil MDK IDE reported the fault information below in HardFault_Handler. I looked up the STKERR error, which mainly means that the stack pointer is corrupted. But I really have no idea why it is corrupted. Without --use_frame_pointer, everything works OK.
[update on Jan 13]
I investigated further. It seems the crash is caused by FreeRTOS's default TimerTask. If I comment out xTimerCreateTimerTask() in the vTaskStartScheduler() function (tasks.c), the crash does not happen.
Another odd thing is that if I debug it, step into the TimerTask's portYIELD_WITHIN_API() function call, and then resume the application, it does not crash. So my guess is that this might be timing-dependent, but I could not find the root cause.
Any thoughts? Thanks.
I ran into a similar problem in my project. It looks like armcc --use_frame_pointer tends to generate broken function epilogues. An example of generated code:
; function prologue
stmdb sp!, {r3, r4, r5, r6, r7, r8, r9, r10, r11, lr}
add.w r11, sp, #36
; ... actual function code ...
; function epilogue
mov sp, r11
; <--- imagine an interrupt happening here
sub sp, #36
ldmia.w sp!, {r3, r4, r5, r6, r7, r8, r9, r10, r11, pc}
This code actually seems to break the constraint from AAPCS section 5.2.1.1:
A process may only access (for reading or writing) the closed interval of the entire stack delimited by [SP, stack-base – 1] (where SP is the value of register r13).
Now, on Cortex-M3, when an exception/interrupt arrives, a partial register set is automatically pushed onto the current process' stack before jumping into the exception handler. If an exception is raised between the mov and the sub, that partial register set will overwrite the registers stored by the function prologue's stmdb instruction, thus corrupting the state of the caller function.
Unfortunately, there doesn't seem to be any easy solution. None of the optimization settings fixes this code, even though it looks easy to fix (by coercing it into sub sp, r11, #36). It seems that --use_frame_pointer is too broken to work on Cortex-M3 with multi-threaded code, at least on ARMCC 5.05u1; I didn't have the chance to check other versions.
If using a different compiler is an option for you, arm-none-eabi-gcc -fno-omit-frame-pointer seems to emit saner function epilogues, though.
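The epilogue race described in this answer can be replayed with a small host-side simulation. It is only a model under simplifying assumptions: the stack is an array of words, the function body and its locals are omitted, and the names stack, simulate_exception and clobbered_words are invented for illustration.

```c
#include <stdint.h>

#define STACK_WORDS 32
#define SAVED_REGS  10          /* r3-r11 and lr, as in the prologue */
#define EXC_FRAME   8           /* r0-r3, r12, lr, pc, xPSR */

static uint32_t stack[STACK_WORDS];
static int sp;                  /* index of the top word, full-descending */

/* Hardware exception entry and exit: push 8 words below sp, then pop. */
static void simulate_exception(void)
{
    for (int i = 0; i < EXC_FRAME; i++)
        stack[--sp] = 0xDEADBEEFu;
    sp += EXC_FRAME;
}

/* Returns how many saved registers an exception overwrites when it
 * arrives in the middle of the function epilogue. */
int clobbered_words(int split_epilogue)
{
    int clobbered = 0;

    sp = STACK_WORDS;
    for (int i = 0; i < SAVED_REGS; i++)        /* stmdb sp!, {...} */
        stack[--sp] = 0x1000u + (uint32_t)i;
    int fp = sp + (SAVED_REGS - 1);             /* add.w r11, sp, #36 */

    if (split_epilogue) {
        sp = fp;                                /* mov sp, r11 */
        simulate_exception();                   /* interrupt lands here */
        sp -= SAVED_REGS - 1;                   /* sub sp, #36 */
    } else {
        sp = fp - (SAVED_REGS - 1);             /* sub sp, r11, #36 */
        simulate_exception();
    }

    for (int i = 0; i < SAVED_REGS; i++)        /* what ldmia would load */
        if (stack[sp + i] == 0xDEADBEEFu)
            clobbered++;
    return clobbered;
}
```

In this model the two-instruction epilogue loses 8 of the 10 saved words, while the single-instruction form loses none, because sp never points above live data.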
Suppose I am given as input to a function foo some pointer *pL that points to a pointer to a struct that has a pointer field next in it. I know this is weird, but all I want to implement in assembly is the line of code with the ** around it:
typedef struct CELL *LIST;
struct CELL {
int element;
LIST next;
};
void foo(LIST *pL){
**(*pL)->next = NULL;**
}
How do I implement this in ARM assembly? The issue comes from having nested statements when I want to store, such as:
.irrelevant header junk
foo:
MOV R1, #0
STR R1, [[R0,#0],#4] @ This is gibberish, but [R0,#0] is to dereference and the #4 is to offset that.
The sequence would be similar to:
... ; r0 = LIST *pL = CELL **ppC (ptr2ptr2cell)
ldr r0,[r0] ; r0 = CELL *pC (ptr2cell)
mov r1,#0 ; r1 = NULL
str r1,[r0,#4] ; (*pL)->next = pC->next = (*pC).next = NULL
The correct sequence would be (assuming ARM ABI and LIST *pL is in R0),
.global foo
foo:
ldr r0, [r0] @ get *pL to R0
mov r1, #0 @ set R1 to zero.
str r1, [r0, #4] @ set (*pL)->next = NULL;
bx lr @ return
You can swap the first two assembler statements, but it is generally better to interleave ALU with load/store for performance. With the ARM ABI, you can use R0-R3 without saving. foo() above should be callable from 'C' with most ARM compilers.
The 'C' might be simplified to,
void foo(struct CELL **pL)
{
(*pL)->next = NULL;
}
if I understand correctly.
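For comparison, here is the same operation in C, split into the two steps the assembly performs. The helpers next_offset and foo_clears_next are made up for this sketch. Note that the #4 offset in the assembly assumes a 32-bit target with 4-byte int and pointers; on a typical 64-bit host the offset of next is 8.

```c
#include <stddef.h>

typedef struct CELL *LIST;
struct CELL {
    int element;
    LIST next;
};

void foo(LIST *pL)
{
    struct CELL *pC = *pL;      /* ldr r0, [r0]     : load the cell pointer */
    pC->next = NULL;            /* str r1, [r0, #4] : store into the field  */
}

/* Byte offset of next within the cell (4 on 32-bit ARM). */
size_t next_offset(void)
{
    return offsetof(struct CELL, next);
}

/* Builds a two-cell list and checks that foo() cuts it after the head. */
int foo_clears_next(void)
{
    struct CELL tail = { 2, NULL };
    struct CELL head = { 1, &tail };
    LIST list = &head;

    foo(&list);
    return head.next == NULL && list == &head && tail.element == 2;
}
```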
I'm working on writing a program running on Cortex-m3.
At first I wrote an assembly file which executes 'svc'.
svc:
svc 0
bx lr
I decided to use gcc's inline asm, so I wrote it as follows, but the svc function was not inlined.
__attribute__((naked))
int svc(int no, ...)
{
(void)no;
asm("svc 0\n\tbx lr");
}
int f() {
return svc(0,1,2);
}
------------------ generated assembly ------------------
svc:
svc 0
bx lr
f:
mov r0, #0
mov r1, #1
mov r2, #2
b svc
I guess it's not inlined since it is naked, so I dropped the naked attribute and wrote like this.
int svc(int __no, ...)
{
register int no asm("r0") = __no;
register int ret asm("r0");
asm("svc 0" : "=r"(ret) : "r"(no));
return ret;
}
------------------ generated assembly ------------------
svc:
stmfd sp!, {r0, r1, r2, r3}
ldr r0, [sp]
add sp, sp, #16
svc 0
bx lr
f:
mov r0, #0 // missing instructions setting r1 and r2
svc 0
bx lr
Although I don't know why gcc adds some unnecessary stack operations, svc is good. The problem is that svc is not inlined properly, the variadic parameters were dropped.
Is there any svc primitive in gcc? If gcc does not have one, how do I write the right one?
Have a look at the syntax that is used in core_cmFunc.h which is supplied as part of the ARM CMSIS for the Cortex-M family. Here's an example that writes a value to the Priority Mask Register:
__attribute__ ((always_inline)) static inline void __set_PRIMASK(uint32_t priMask)
{
__ASM volatile ("MSR primask, %0"::"r" (priMask));
}
However, creating a variadic function like this sounds difficult.
You can use a macro like this.
#define __svc(sNum) __asm volatile("SVC %0" ::"M" (sNum))
And use it just like any compiler-primitive function, __svc(2);.
Since it is just a macro, it will only generate the provided instruction.