Is there any gcc compiler primitive for "svc"?

I'm writing a program that runs on a Cortex-M3.
At first I wrote an assembly file which executes 'svc'.
svc:
svc 0
bx lr
I decided to use gcc's inline asm, so I wrote it as follows, but the svc function was not inlined.
__attribute__((naked))
int svc(int no, ...)
{
(void)no;
asm("svc 0\n\tbx lr");
}
int f() {
return svc(0,1,2);
}
------------------ generated assembly ------------------
svc:
svc 0
bx lr
f:
mov r0, #0
mov r1, #1
mov r2, #2
b svc
I guess it's not inlined since it is naked, so I dropped the naked attribute and wrote like this.
int svc(int __no, ...)
{
register int no asm("r0") = __no;
register int ret asm("r0");
asm("svc 0" : "=r"(ret) : "r"(no));
return ret;
}
------------------ generated assembly ------------------
svc:
stmfd sp!, {r0, r1, r2, r3}
ldr r0, [sp]
add sp, sp, #16
svc 0
bx lr
f:
mov r0, #0 // missing instructions setting r1 and r2
svc 0
bx lr
I don't know why gcc adds the unnecessary stack operations, but svc itself is fine. The problem is in f: when svc is inlined there, the variadic arguments are dropped.
Is there any svc primitive in gcc? If gcc does not have one, how do I write the right one?

Have a look at the syntax that is used in core_cmFunc.h which is supplied as part of the ARM CMSIS for the Cortex-M family. Here's an example that writes a value to the Priority Mask Register:
__attribute__ ((always_inline)) static inline void __set_PRIMASK(uint32_t priMask)
{
__ASM volatile ("MSR primask, %0"::"r" (priMask));
}
However, creating a variadic function like this sounds difficult.

You can use a macro like this.
#define __svc(sNum) __asm volatile("SVC %0" ::"M" (sNum))
And use it just like any compiler-primitive function, __svc(2);.
Since it is just a macro, it will only generate the provided instruction.
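For the original problem (arguments lost once the call is inlined), a fixed-arity wrapper tends to be easier to get right than a variadic one. The following is only a minimal sketch, assuming the handler expects up to three arguments in r0-r2 and leaves its result in r0. The name svc3 is hypothetical, and the non-ARM branch is just a stand-in so the file also compiles on a host.

```c
#include <stdint.h>

#if defined(__arm__)
/* Pin each argument to the register the handler is assumed to read.
 * "+r"(r0) marks r0 as both input and output, so the compiler cannot
 * drop the argument setup when this function is inlined. */
static inline __attribute__((always_inline))
int32_t svc3(int32_t a0, int32_t a1, int32_t a2)
{
    register int32_t r0 __asm__("r0") = a0;
    register int32_t r1 __asm__("r1") = a1;
    register int32_t r2 __asm__("r2") = a2;

    __asm__ volatile("svc 0"
                     : "+r"(r0)
                     : "r"(r1), "r"(r2)
                     : "memory");
    return r0;
}
#else
/* Host stand-in (assumption): pretends the handler sums its arguments. */
static inline int32_t svc3(int32_t a0, int32_t a1, int32_t a2)
{
    return a0 + a1 + a2;
}
#endif
```

With this, a call such as svc3(0, 1, 2) should keep the three register moves in front of the svc instruction even when inlined. On Cortex-M the handler would normally read the arguments from the stacked exception frame rather than from the live registers.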

Related

Does using the `__irq` specifier for a function pointer declaration do anything?

I have to do a bit of embedded programming for a project and am learning by looking at some other projects. I found the following code that declares the vector table:
typedef void (*const vect_t)(void) __irq;
vect_t vector_table[]
__attribute__ ((section("vectors"))) = {
(vect_t) (RAM_BASE + RAM_SIZE),
(vect_t) Reset_Handler,
// ...
};
The reset handler is declared as follows:
void Reset_Handler(void) {
// ... no interesting
}
I read up on __irq and the ARM compiler docs state the following:
The compiler generates function entry and exit sequences suitable for
use in an interrupt handler when this attribute is present.
I'm guessing that vect_t is supposed to be a pointer to void functions that take no arguments, that are suitable to be used as interrupt handlers. This seems strange to me, as __irq should just be a compiler hint for the implementation, but not something that contributes to the type of a function (like arguments or return type do).
My assumption is that __irq should have been used on Reset_Handler (and on all other interrupt handlers) and not in the type definition. Is this correct?
Please note that I am not asking what __irq does. I understand that this is not part of the C standard and that it is an ARM compiler extension. I also understand that the code that is produced when using it depends on the CPU architecture.
Generally speaking, interrupt service routines (ISR) use different instructions for returning. A normal function just uses a "return from subroutine" instruction which pops the stack according to the calling convention. ISRs are however not called by the program but by hardware, so they often have a different calling convention. In order to generate these special instructions correctly, you need some non-standard interrupt syntax.
The code is an interrupt vector table, so the type definition is correct. However, in case the ISR is declared as a plain function without any special keywords void Reset_Handler(void), then this won't work. The incorrect cast here (vect_t) Reset_Handler will ensure that this function is called upon interrupt, but it will not return from that function correctly - likely crashing.
My assumption is that __irq should have been used on Reset_Handler (and on all other interrupt handlers) and not in the type definition. Is this correct?
It should be in the vector table and in the ISR function definition both.
Using gcc for example (attributes/directives/pragmas etc are specific to a tool not to the C language)
struct interrupt_frame;
__attribute__ ((interrupt))
void x (struct interrupt_frame *frame)
{
}
void y ( void )
{
}
Using a generic aarch32 type arm target:
Disassembly of section .text:
00000000 <x>:
0: e25ef004 subs pc, lr, #4
00000004 <y>:
4: e12fff1e bx lr
Now let's complicate this further
struct interrupt_frame;
unsigned int k;
__attribute__ ((interrupt))
void x (struct interrupt_frame *frame)
{
k=5;
}
void y ( void )
{
k=5;
}
00000000 <x>:
0: e92d000c push {r2, r3}
4: e3a02005 mov r2, #5
8: e59f3008 ldr r3, [pc, #8] ; 18 <x+0x18>
c: e5832000 str r2, [r3]
10: e8bd000c pop {r2, r3}
14: e25ef004 subs pc, lr, #4
0000001c <y>:
1c: e3a02005 mov r2, #5
20: e59f3004 ldr r3, [pc, #4] ; 2c <y+0x10>
24: e5832000 str r2, [r3]
28: e12fff1e bx lr
An interrupt handler must preserve every register it touches, whereas for a regular function the calling convention dictates which registers the function is free to clobber. This example shows the primary reason for the directive: preserve the state and use the specific return-from-interrupt instruction.
The cortex-m architectures (armv6-m, 7-m and 8-m) were designed so that you can put C functions directly in the vector table without any asm wrapper around them (the hardware takes care of both preserving state and the special return). The compiler therefore generates the same code either way; the attribute basically has no effect on that target:
00000000 <x>:
0: 2205 movs r2, #5
2: 4b01 ldr r3, [pc, #4] ; (8 <x+0x8>)
4: 601a str r2, [r3, #0]
6: 4770 bx lr
0000000c <y>:
c: 2205 movs r2, #5
e: 4b01 ldr r3, [pc, #4] ; (14 <y+0x8>)
10: 601a str r2, [r3, #0]
12: 4770 bx lr
One last note: you do not return from the reset vector, so there is no reason for cortex-m to even bother with an attribute/directive like this for the reset vector. On no architecture should you return from the reset vector if it is truly a bare-metal vector table (as opposed to using the same scheme for a general application entry point running on an OS, or for code called by a bootloader, where you certainly can return).
Other architectures do not tend to lump reset in with the list of "interrupts" or "exceptions"; reset is reset. ARM docs and code tend to treat it like any other exception, so you still have to remember to think of it differently.
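To illustrate the Cortex-M case described above, here is a rough sketch of a vector table that holds plain C functions with no interrupt attribute. RAM_BASE, RAM_SIZE, Default_Handler and the reset_seen flag are assumptions made for the example; a real reset handler would set up memory and never return.

```c
#include <stdint.h>

/* Hypothetical memory layout; real values come from the device datasheet. */
#define RAM_BASE 0x20000000u
#define RAM_SIZE 0x00005000u

typedef void (*vect_t)(void);

static volatile int reset_seen;          /* lets a host build observe the call */

void Reset_Handler(void)   { reset_seen = 1; }   /* illustration only */
void Default_Handler(void) { }

/* Entry 0 is the initial stack pointer, entry 1 the reset handler.
 * The section name is only applied when building for ARM. */
#if defined(__arm__)
__attribute__((section("vectors"), used))
#endif
const vect_t vector_table[] = {
    (vect_t)(uintptr_t)(RAM_BASE + RAM_SIZE),
    Reset_Handler,
    Default_Handler,                     /* NMI */
    Default_Handler,                     /* HardFault */
};
```

No cast is needed for the handlers here because their type already matches vect_t, which is one way to avoid the mismatch discussed above.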

gcc arm optimizes away parameters before System Call

I'm trying to implement some "OSEK-Services" on an arm7tdmi-s using gcc arm. Unfortunately, turning up the optimization level results in "wrong" code generation. The main thing I don't understand is that the compiler seems to ignore the procedure call standard, e.g. passing parameters to a function by moving them into registers r0-r3. I understand that function calls can be inlined, but the parameters still need to be in the registers to perform the system call.
Consider the following code to demonstrate my problem:
unsigned SysCall(unsigned param)
{
volatile unsigned ret_val;
__asm __volatile
(
"swi 0 \n\t" /* perform SystemCall */
"mov %[v], r0 \n\t" /* move the result into ret_val */
: [v]"=r"(ret_val)
:: "r0"
);
return ret_val; /* return the result */
}
int main()
{
unsigned retCode;
retCode = SysCall(5); // expect retCode to be 6 when returning back to usermode
}
I wrote the Top-Level software interrupt handler in assembly as follows:
.type SWIHandler, %function
.global SWIHandler
SWIHandler:
stmfd sp! , {r0-r2, lr} @ save regs
ldr r0 , [lr, #-4] @ load sysCall instruction and extract sysCall number
bic r0 , #0xff000000
ldr r3 , =DispatchTable @ load dispatchTable
ldr r3 , [r3, r0, LSL #2] @ load sysCall address into r3
ldmia sp, {r0-r2} @ load parameters into r0-r2
mov lr, pc
bx r3
stmia sp ,{r0-r2} @ store the result back on the stack
ldr lr, [sp, #12] @ restore return address
ldmfd sp! , {r0-r2, lr} @ load result into register
movs pc , lr @ back to next instruction after swi 0
The dispatch table looks like this:
DispatchTable:
.word activateTaskService
.word getTaskStateService
The SystemCall function looks like this:
unsigned activateTaskService(unsigned tID)
{
return tID + 1; /* only for demonstration */
}
Without optimization everything works fine and the parameters are in the registers as expected.
See the following code compiled with -O0:
00000424 <main>:
424: e92d4800 push {fp, lr}
428: e28db004 add fp, sp, #4
42c: e24dd008 sub sp, sp, #8
430: e3a00005 mov r0, #5 #move param into r0
434: ebffffe1 bl 3c0 <SysCall>
000003c0 <SysCall>:
3c0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
3c4: e28db000 add fp, sp, #0
3c8: e24dd014 sub sp, sp, #20
3cc: e50b0010 str r0, [fp, #-16]
3d0: ef000000 svc 0x00000000
3d4: e1a02000 mov r2, r0
3d8: e50b2008 str r2, [fp, #-8]
3dc: e51b3008 ldr r3, [fp, #-8]
3e0: e1a00003 mov r0, r3
3e4: e24bd000 sub sp, fp, #0
3e8: e49db004 pop {fp} ; (ldr fp, [sp], #4)
3ec: e12fff1e bx lr
Compiling the same code with -O3 results in the following assembly code:
00000778 <main>:
778: e24dd008 sub sp, sp, #8
77c: ef000000 svc 0x00000000 #Inline SystemCall without passing params into r0
780: e1a02000 mov r2, r0
784: e3a00000 mov r0, #0
788: e58d2004 str r2, [sp, #4]
78c: e59d3004 ldr r3, [sp, #4]
790: e28dd008 add sp, sp, #8
794: e12fff1e bx lr
Notice how the system call gets inlined without assigning the value 5 to r0.
My first approach is to move those values manually into the registers by adapting the function SysCall from above as follows:
unsigned SysCall(volatile unsigned p1)
{
volatile unsigned ret_val;
__asm __volatile
(
"mov r0, %[p1] \n\t"
"swi 0 \n\t"
"mov %[v], r0 \n\t"
: [v]"=r"(ret_val)
: [p1]"r"(p1)
: "r0"
);
return ret_val;
}
It seems to work in this minimal example, but I'm not sure whether this is the best practice. Why does the compiler think it can omit the parameters when inlining the function? Does anybody have suggestions on whether this approach is okay or what should be done differently?
Thank you in advance
A function call in C source code does not instruct the compiler to call the function according to the ABI. It instructs the compiler to call the function according to the model in the C standard, which means the compiler must pass the arguments to the function in a way of its choosing and execute the function in a way that has the same observable effects as defined in the C standard.
Those observable effects do not include setting any processor registers. When a C compiler inlines a function, it is not required to set any particular processor registers. If it calls a function using an ABI for external calls, then it would have to set registers. Inline calls do not need to obey the ABI.
So merely putting your system request inside a function built of C source code does not guarantee that any registers will be set.
For ARM, what you should do is define register variables assigned to the required register(s) and use those as input and output to the assembly instructions:
unsigned SysCall(unsigned param)
{
register unsigned Parameter __asm__("r0") = param;
register unsigned Result __asm__("r0");
__asm__ volatile
(
"swi 0"
: "=r" (Result)
: "r" (Parameter)
: // "memory" // if any inputs are pointers
);
return Result;
}
(This is a major kludge by GCC; it is ugly, and the documentation is poor. But see also https://stackoverflow.com/tags/inline-assembly/info for some links. GCC for some ISAs has convenient specific-register constraints you can use instead of r, but not for ARM.) The register variables do not need to be volatile; the compiler knows they will be used as input and output for the assembly instructions.
The asm statement itself should be volatile if it has side effects other than producing a return value. (e.g. getpid() doesn't need to be volatile.)
A non-volatile asm statement with outputs can be optimized away if the output is unused, or hoisted out of loops if it's used with the same input (like a pure function call). This is almost never what you want for a system call.
You also need a "memory" clobber if any of the inputs are pointers to memory that the kernel will read or modify. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details (and a way to use a dummy memory input or output to avoid a "memory" clobber.)
A "memory" clobber on mmap/munmap or other system calls that affect what memory means would also be wise; you don't want the compiler to decide to do a store after munmap instead of before.
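As a sketch of the pointer case, the wrapper below passes a buffer to a made-up service that fills it. The service number in "swi 2", the name sys_fill_buffer and the host fallback are all assumptions for illustration. The extra r3 and ip clobbers assume a handler like the one in the question, which saves only r0-r2 and lr before calling a C function.

```c
#include <stddef.h>

#if defined(__arm__)
static inline long sys_fill_buffer(unsigned char *buf, size_t len)
{
    register long r0 __asm__("r0") = (long)buf;   /* in: pointer, out: result */
    register long r1 __asm__("r1") = (long)len;

    __asm__ volatile("swi 2"                      /* hypothetical service number */
                     : "+r"(r0)
                     : "r"(r1)
                     : "r3", "ip", "memory");     /* kernel writes through buf */
    return r0;
}
#else
/* Host stand-in (assumption): mimics a kernel that fills the buffer
 * with 0xAA and returns the number of bytes written. */
static inline long sys_fill_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xAA;
    return (long)len;
}
#endif
```

The "memory" clobber tells the compiler that the contents of buf may have changed, so values cached in registers before the call are reloaded afterwards.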

Different gcc assembly when using designated initializers

I was checking some gcc generated assembly for ARM and noticed that I get strange results if I use designated initializers:
E.g. if I have this code:
struct test
{
int x;
int y;
};
__attribute__((noinline))
struct test get_struct_1(void)
{
struct test x;
x.x = 123456780;
x.y = 123456781;
return x;
}
__attribute__((noinline))
struct test get_struct_2(void)
{
return (struct test){ .x = 123456780, .y = 123456781 };
}
I get the following output with gcc -O2 -std=c11 for ARM (ARM GCC 6.3.0):
get_struct_1:
ldr r1, .L2
ldr r2, .L2+4
stm r0, {r1, r2}
bx lr
.L2:
.word 123456780
.word 123456781
get_struct_2: // <--- what is happening here
mov r3, r0
ldr r2, .L5
ldm r2, {r0, r1}
stm r3, {r0, r1}
mov r0, r3
bx lr
.L5:
.word .LANCHOR0
I can see the constants for the first function, but I don't understand how get_struct_2 works.
If I compile for x86, both functions just load the same single 64-bit value in a single instruction.
get_struct_1:
movabs rax, 530242836987890956
ret
get_struct_2:
movabs rax, 530242836987890956
ret
Am I provoking some undefined behavior, or is this .LANCHOR0 somehow related to these constants?
Looks like gcc shoots itself in the foot with an extra level of indirection after merging the loads of the constants into an ldm.
No idea why, but pretty obviously a missed optimization bug.
x86-64 is easy to optimize for; the entire 8-byte constant can go in one immediate. But ARM often uses PC-relative loads for constants that are too big for one immediate.
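Both functions are well-defined C and return the same value, so the difference is purely a code generation matter. The check below is a small sketch of that; pack_test is a hypothetical helper that packs the two fields the way the x86-64 output does (x in the low 32 bits, y in the high 32 bits), which is where the movabs constant comes from.

```c
#include <stdint.h>

struct test
{
    int x;
    int y;
};

struct test get_struct_1(void)
{
    struct test x;
    x.x = 123456780;
    x.y = 123456781;
    return x;
}

struct test get_struct_2(void)
{
    return (struct test){ .x = 123456780, .y = 123456781 };
}

/* Hypothetical helper: combine both fields into one 64-bit value,
 * x in the low half and y in the high half. */
uint64_t pack_test(struct test t)
{
    return ((uint64_t)(uint32_t)t.y << 32) | (uint32_t)t.x;
}
```

Since 123456781 * 2^32 + 123456780 = 530242836987890956, packing either result gives exactly the immediate seen in the x86 listing.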

ARM assembly calling a function with registers as parameters using C

I have the following ARM assembly code:
mov r0, SP
mov r1, LR
bl func
Is there a way of calling the function func using C code? something like func(SP, LR)
Thanks!
Depends on what exactly you want to do and what compiler you use.
With gcc something like this could work:
extern void func(void*, void*);
void foo()
{
int dummy[4];
func(&dummy, __builtin_return_address(0));
}
This might not always give you the exact stack pointer, though. As per godbolt it produces the following assembly code:
foo():
push {lr}
sub sp, sp, #20
mov r1, lr
mov r0, sp
bl func(void*, void*)
add sp, sp, #20
ldr pc, [sp], #4
Use output registers to place LR and SP in variables:
void *lr, *sp;
asm ("mov %0, sp" : "=r" (sp));
asm ("mov %0, lr" : "=r" (lr));
func(sp, lr); /* r0 = SP, r1 = LR, matching the assembly in the question */
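Putting the pieces together, a sketch along these lines passes SP first and LR second, as in the original assembly. The local func here is only a stand-in for the real function, call_func_with_sp_lr is a hypothetical name, and the non-ARM branch uses GCC builtins as an approximation so the code also builds on a host. Note that lr is only guaranteed to hold the return address if the compiler has not reused it yet.

```c
#include <stddef.h>

static void *seen_sp, *seen_lr;

/* Stand-in for the real func: just records what it was given. */
static void func(void *sp, void *lr)
{
    seen_sp = sp;
    seen_lr = lr;
}

__attribute__((noinline))
void call_func_with_sp_lr(void)
{
    void *sp, *lr;
#if defined(__arm__)
    __asm__ volatile("mov %0, sp" : "=r"(sp));
    __asm__ volatile("mov %0, lr" : "=r"(lr));
#else
    /* Host approximation: frame address instead of the exact SP. */
    sp = __builtin_frame_address(0);
    lr = __builtin_return_address(0);
#endif
    func(sp, lr);
}
```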

Inline Assembly: Passing pointers to a function and using it in that function in assembly

I'm using ARM/Cortex-A8 processor platform.
I have a simple function to which I pass two pointers. The pointers are then used inside that function, which contains only my inline assembly code. The only reason for doing this is performance.
void function(unsigned char *input, unsigned char *output)
{
// What are the assembly instructions to use these two pointers here?
// I will inline the assembly instructions here
}
int main(void)
{
unsigned char input[1000], output[1000];
function(input, output);
return 0;
}
Thanks
Assuming you're using a normal ARM ABI, those two parameters will be passed in R0 and R1. Here is a quick example showing how to copy the bytes from the input buffer to the output buffer (gcc syntax):
.text
.globl _function
_function:
mov r2, #0 // initialize loop counter
loop:
ldrb r3, [r0, r2] // load r3 with input[r2]
strb r3, [r1, r2] // store r3 to output[r2]
add r2, r2, #1 // increment loop counter
cmp r2, #1000 // test loop counter
bne loop
mov pc, lr
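If the code has to stay inside a C function rather than in a separate assembly file, the pointers can be handed to the asm statement as operands instead of relying on r0 and r1. This is a sketch only, assuming GCC extended asm and ARM or Thumb-2 encodings (it uses post-indexed ldrb/strb). The name copy_bytes is hypothetical, and the non-ARM branch is a plain C fallback so the function also builds on a host.

```c
#include <stddef.h>

void copy_bytes(const unsigned char *input, unsigned char *output, size_t n)
{
    if (n == 0)
        return;                      /* the asm loop assumes at least one byte */
#if defined(__arm__)
    unsigned tmp;
    __asm__ volatile(
        "1:\n\t"
        "ldrb %[t], [%[in]], #1\n\t"   /* tmp = *input++  */
        "strb %[t], [%[out]], #1\n\t"  /* *output++ = tmp */
        "subs %[n], %[n], #1\n\t"      /* n--, sets flags */
        "bne 1b"
        : [in] "+r"(input), [out] "+r"(output), [n] "+r"(n), [t] "=&r"(tmp)
        :
        : "cc", "memory");
#else
    for (size_t i = 0; i < n; i++)
        output[i] = input[i];
#endif
}
```

Letting the compiler pick the registers through named operands keeps the function valid when it is inlined, which a hard-coded r0/r1 version would not be.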
