What's the role of __irq in ARM System Programming?

I understand that __irq is used to define an interrupt service routine (ISR) for the ARM7 (ARMv4) architecture. But what changes does it make to the function?
As per ARM Information Center:
The __irq keyword enables a C or C++ function to be used as an interrupt routine.
__irq is a function qualifier. It affects the type of the function.
What kind of special treatment does the ARM compiler give to routines defined with the __irq function qualifier?

The compiler modifies the function's entry and exit sequences. This means adjusting lr, restoring the interrupted processor mode on return (CPSR is reloaded from SPSR), and saving and restoring registers that an ordinary function may clobber without saving (normally r0-r3 and r12). Here is a short example:
void func()
{
...
}
Generated Assembler:
/* void func() */
stmfd sp!, {r4, lr}
...
ldmfd sp!, {r4, lr}
bx lr
Same function as IRQ:
/* void __attribute__ ((interrupt ("IRQ"))) func() */
sub lr, lr, #4
stmfd sp!, {r0, r1, r2, r3, r4, r5, ip, lr}
...
ldmfd sp!, {r0, r1, r2, r3, r4, r5, ip, pc}^
And as FIQ:
/* void __attribute__ ((interrupt ("FIQ"))) func() */
sub lr, lr, #4
stmfd sp!, {r0, r1, r2, r3, r4, lr}
...
ldmfd sp!, {r0, r1, r2, r3, r4, pc}^
Note that the exact register list also depends on some external parameters such as the ABI.

From the GCC manual:
The compiler generates function entry and exit sequences suitable for use in an interrupt handler when this attribute is present.
I believe armcc does the same. You can use objdump to see the difference in the generated binary.
From the page you referenced:
All corrupted registers except floating-point registers are preserved, not only those that are normally preserved under the AAPCS. The default AAPCS mode must be used.
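As a rough sketch of how such a handler might be declared in source, the GCC attribute from the listings above can be hidden behind a macro. The names IRQ_HANDLER, timer_isr and tick_count are made up for illustration. On non-ARM builds the macro expands to nothing, so the function compiles as ordinary C and only the ARM build gets the special entry and exit code.

```c
/* Hypothetical macro: expands to GCC's interrupt attribute on ARM targets
 * and to nothing elsewhere, so the same file also builds on a host PC. */
#if defined(__GNUC__) && defined(__arm__)
#define IRQ_HANDLER __attribute__((interrupt("IRQ")))
#else
#define IRQ_HANDLER
#endif

/* Hypothetical counter shared with the main loop, hence volatile. */
volatile unsigned tick_count = 0;

/* On ARM the attribute makes the compiler emit the IRQ entry/exit
 * sequence (lr adjustment, extra register saves, exception return). */
IRQ_HANDLER void timer_isr(void)
{
    tick_count++;
}
```

Comparing the objdump output of this function with and without the attribute should show the same kind of difference as the listings above.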

Related

gcc arm optimizes away parameters before System Call

I'm trying to implement some "OSEK-Services" on an arm7tdmi-s using gcc arm. Unfortunately, turning up the optimization level results in "wrong" code generation. The main thing I don't understand is that the compiler seems to ignore the procedure call standard, i.e. passing parameters to a function by moving them into registers r0-r3. I understand that function calls can be inlined, but the parameters still need to be in the registers to perform the system call.
Consider the following code to demonstrate my problem:
unsigned SysCall(unsigned param)
{
volatile unsigned ret_val;
__asm __volatile
(
"swi 0 \n\t" /* perform SystemCall */
"mov %[v], r0 \n\t" /* move the result into ret_val */
: [v]"=r"(ret_val)
:: "r0"
);
return ret_val; /* return the result */
}
int main()
{
unsigned retCode;
retCode = SysCall(5); // expect retCode to be 6 when returning back to usermode
}
I wrote the Top-Level software interrupt handler in assembly as follows:
.type SWIHandler, %function
.global SWIHandler
SWIHandler:
stmfd sp!, {r0-r2, lr} @ save regs
ldr r0, [lr, #-4] @ load sysCall instruction and extract sysCall number
bic r0, #0xff000000
ldr r3, =DispatchTable @ load dispatchTable
ldr r3, [r3, r0, LSL #2] @ load sysCall address into r3
ldmia sp, {r0-r2} @ load parameters into r0-r2
mov lr, pc
bx r3
stmia sp, {r0-r2} @ store the result back on the stack
ldr lr, [sp, #12] @ restore return address
ldmfd sp!, {r0-r2, lr} @ load result into register
movs pc, lr @ back to next instruction after swi 0
The dispatch table looks like this:
DispatchTable:
.word activateTaskService
.word getTaskStateService
The SystemCall function looks like this:
unsigned activateTaskService(unsigned tID)
{
return tID + 1; /* only for demonstration */
}
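For reference, the dispatch logic of the assembly handler can be sketched in portable C. This is only a model for reasoning about the handler, not a replacement for it. The names swi_number and dispatch_swi, the body of get_task_state_service, and the out-of-range policy are assumptions that do not come from the code above.

```c
#include <stdint.h>

typedef unsigned (*service_fn)(unsigned);

/* Same demonstration behaviour as activateTaskService above. */
unsigned activate_task_service(unsigned tid) { return tid + 1; }

/* Placeholder body: the real service is not shown in the post. */
unsigned get_task_state_service(unsigned tid) { return tid; }

static const service_fn dispatch_table[] = {
    activate_task_service,
    get_task_state_service,
};

/* An ARM-state SWI/SVC instruction keeps its number in the low 24 bits,
 * which is what "bic r0, #0xff000000" extracts. */
uint32_t swi_number(uint32_t instruction)
{
    return instruction & 0x00FFFFFFu;
}

/* Look up the service and call it with the parameter that was in r0. */
unsigned dispatch_swi(uint32_t instruction, unsigned param)
{
    uint32_t n = swi_number(instruction);
    if (n >= sizeof dispatch_table / sizeof dispatch_table[0])
        return 0; /* hypothetical policy for unknown service numbers */
    return dispatch_table[n](param);
}
```

With the encoding 0xef000000 seen in the disassembly (svc 0), dispatch_swi(0xef000000, 5) selects the first table entry and returns 6, which is the value main expects.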
Without optimization everything works fine and the parameters are in the registers as expected.
See the following code compiled with -O0:
00000424 <main>:
424: e92d4800 push {fp, lr}
428: e28db004 add fp, sp, #4
42c: e24dd008 sub sp, sp, #8
430: e3a00005 mov r0, #5 #move param into r0
434: ebffffe1 bl 3c0 <SysCall>
000003c0 <SysCall>:
3c0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
3c4: e28db000 add fp, sp, #0
3c8: e24dd014 sub sp, sp, #20
3cc: e50b0010 str r0, [fp, #-16]
3d0: ef000000 svc 0x00000000
3d4: e1a02000 mov r2, r0
3d8: e50b2008 str r2, [fp, #-8]
3dc: e51b3008 ldr r3, [fp, #-8]
3e0: e1a00003 mov r0, r3
3e4: e24bd000 sub sp, fp, #0
3e8: e49db004 pop {fp} ; (ldr fp, [sp], #4)
3ec: e12fff1e bx lr
Compiling the same code with -O3 results in the following assembly code:
00000778 <main>:
778: e24dd008 sub sp, sp, #8
77c: ef000000 svc 0x00000000 #Inline SystemCall without passing params into r0
780: e1a02000 mov r2, r0
784: e3a00000 mov r0, #0
788: e58d2004 str r2, [sp, #4]
78c: e59d3004 ldr r3, [sp, #4]
790: e28dd008 add sp, sp, #8
794: e12fff1e bx lr
Notice how the system call gets inlined without the value 5 being assigned to r0.
My first approach is to move those values manually into the registers by adapting the function SysCall from above as follows:
unsigned SysCall(volatile unsigned p1)
{
volatile unsigned ret_val;
__asm __volatile
(
"mov r0, %[p1] \n\t"
"swi 0 \n\t"
"mov %[v], r0 \n\t"
: [v]"=r"(ret_val)
: [p1]"r"(p1)
: "r0"
);
return ret_val;
}
It seems to work in this minimal example, but I'm not sure whether this is best practice. Why does the compiler think it can omit the parameters when inlining the function? Does anybody have suggestions on whether this approach is okay or what should be done differently?
Thank you in advance
A function call in C source code does not instruct the compiler to call the function according to the ABI. It instructs the compiler to call the function according to the model in the C standard, which means the compiler must pass the arguments to the function in a way of its choosing and execute the function in a way that has the same observable effects as defined in the C standard.
Those observable effects do not include setting any processor registers. When a C compiler inlines a function, it is not required to set any particular processor registers. If it calls a function using an ABI for external calls, then it would have to set registers. Inline calls do not need to obey the ABI.
So merely putting your system request inside a function built of C source code does not guarantee that any registers will be set.
For ARM, what you should do is define register variables assigned to the required register(s) and use those as input and output to the assembly instructions:
unsigned SysCall(unsigned param)
{
register unsigned Parameter __asm__("r0") = param;
register unsigned Result __asm__("r0");
__asm__ volatile
(
"swi 0"
: "=r" (Result)
: "r" (Parameter)
: // "memory" // if any inputs are pointers
);
return Result;
}
(This is a major kludge by GCC; it is ugly, and the documentation is poor. But see also https://stackoverflow.com/tags/inline-assembly/info for some links. GCC for some ISAs has convenient specific-register constraints you can use instead of r, but not for ARM.) The register variables do not need to be volatile; the compiler knows they will be used as input and output for the assembly instructions.
The asm statement itself should be volatile if it has side effects other than producing a return value. (e.g. getpid() doesn't need to be volatile.)
A non-volatile asm statement with outputs can be optimized away if the output is unused, or hoisted out of loops if it's used with the same input (like a pure function call). This is almost never what you want for a system call.
You also need a "memory" clobber if any of the inputs are pointers to memory that the kernel will read or modify. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details (and a way to use a dummy memory input or output to avoid a "memory" clobber.)
A "memory" clobber on mmap/munmap or other system calls that affect what memory means would also be wise; you don't want the compiler to decide to do a store after munmap instead of before.

arm7tdmi assembly explanation + crash debugging

I'm currently investigating a crash in code compiled with gcc 4.2.1 for the arm7tdmi architecture (I could use 4.9.3 if needed). I'm using an LPC2387 and I'm getting watchdog (wdog) resets. Instead of wdog resets I'm using wdog interrupts, so when it would otherwise reset, execution enters my handler, which saves the state and prints a full memory dump (64k only). So basically I know the registers before the wdog reset and have a stack showing the whole call history.
On the stack I can see lots of references to the end of the function, and I see many instructions as data in that memory region, which I think is the reason for the halt and the subsequent wdog interrupt. Any ideas what might be happening?
I guess one cause could be dereferencing a function pointer, but my function seems quite straightforward. It touches many hardware registers (interrupt, peripheral enable/disable).
Like this:
2015/05/27 04:45:30: addr: 4000BF2C value:7FE00390 -->this is "svcvc 0x00e00390" according to gcc 4.2.1 and ".word 0x7fe00390" according to 4.9.3.
Also at the end of the function I see this in gcc 4.9.3
191d4: e89d6ff8 ldm sp, {r3, r4, r5, r6, r7, r8, r9, sl, fp, sp, lr}
191d8: e12fff1e bx lr
191dc: 7fe00390 .word 0x7fe00390
191e0: 40000044 .word 0x40000044
191e4: 00064de5 .word 0x00064de5
191e8: 00064dfb .word 0x00064dfb
191ec: 4000107c .word 0x4000107c
191f0: e0028000 .word 0xe0028000
191f4: e01fc000 .word 0xe01fc000
191f8: 40001084 .word 0x40001084
191fc: 4000113c .word 0x4000113c
19200: 3800b010 .word 0x3800b010
19204: 40002a78 .word 0x40002a78
19208: 40002ab4 .word 0x40002ab4
1920c: 40002aa0 .word 0x40002aa0
19210: 40001080 .word 0x40001080
19214: 400001a9 .word 0x400001a9
19218: e002c000 .word 0xe002c000
1921c: 40001134 .word 0x40001134
19220: 00064e0e .word 0x00064e0e
It used to look like this on gcc 4.2.1:
1953c: 7fe00390 svcvc 0x00e00390
19540: 40000044 andmi r0, r0, r4, asr #32
19544: 0006d74c andeq sp, r6, ip, asr #14
19548: 0006d764 andeq sp, r6, r4, ror #14
1954c: 400012d0 ldrmid r1, [r0], -r0
19550: e0028000 and r8, r2, r0
19554: e01fc000 ands ip, pc, r0
19558: 40001390 mulmi r0, r0, r3
1955c: 40001394 mulmi r0, r4, r3
19560: e002c040 and ip, r2, r0, asr #32
19564: 40002e54 andmi r2, r0, r4, asr lr
19568: e002c068 and ip, r2, r8, rrx
1956c: e002c000 and ip, r2, r0
19570: 40002e90 mulmi r0, r0, lr
19574: e002c02c and ip, r2, ip, lsr #32
19578: 3fffc000 svccc 0x00ffc000
1957c: 40002e7c andmi r2, r0, ip, ror lr
19580: 3fffc0a0 svccc 0x00ffc0a0
19584: 400012d4 ldrmid r1, [r0], -r4
19588: 400001a1 andmi r0, r0, r1, lsr #3
1958c: 400012d8 ldrmid r1, [r0], -r8
19590: 0006d778 andeq sp, r6, r8, ror r7
Can someone explain to me what is at the end of the function? What are the .word regions? Why would I see pointers to this area on the stack?
Thanks,
Peter
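The bytes after the end of a function's code are usually data, not instructions: a literal pool holding the constants and addresses that the function loads with PC-relative LDR instructions. That is why the 4.9.3 listing shows them as .word, while the 4.2.1 listing decodes the same values as nonsensical instructions.
e.g. if I have void *somePtr = (void *)0xABCDEF12; then you will typically get an LDR instruction that loads the value from the pool into a register and, assuming little-endian operation, you'll see the byte sequence 12 EF CD AB in hex at that pool location.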
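To make the byte order concrete, here is a small sketch that computes which byte of a 32-bit literal ends up at each offset in little-endian memory. The function name le32_byte is made up for illustration.

```c
#include <stdint.h>

/* Returns the byte stored at address (base + offset) when a 32-bit value
 * is written to little-endian memory. Valid offsets are 0..3. */
uint8_t le32_byte(uint32_t value, unsigned offset)
{
    return (uint8_t)((value >> (8u * offset)) & 0xFFu);
}
```

For 0xABCDEF12 this gives 12, EF, CD, AB for offsets 0 to 3. The same rule applies to the first pool entry in the question: the word 0x7fe00390 appears in a byte-wise memory dump as 90 03 E0 7F.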

Optimize C or assembly code in size for Cortex-M0

I need to reduce the code bloat for the Cortex-M0 microprocessor.
At startup the ROM data has to be copied to the RAM data once. Therefore I have this piece of code:
void __startup( void ){
extern unsigned int __data_init_start;
extern unsigned int __data_start;
extern unsigned int __data_end;
// copy .data section from flash to ram
s = & __data_init_start;
d = & __data_start;
e = & __data_end;
while( d != e ){
*d++ = *s++;
}
}
The assembly code that is generated by the compiler looks like this:
ldr r1, .L10+8
ldr r2, .L10+12
sub r0, r1, r2
lsr r3, r0, #2
add r3, r3, #1
lsl r1, r3, #2
mov r3, #0
.L4:
add r3, r3, #4
cmp r3, r1
beq .L9
.L5:
ldr r4, .L10+16
add r0, r2, r3
add r4, r3, r4
sub r4, r4, #4
ldr r4, [r4]
sub r0, r0, #4
str r4, [r0]
b .L4
How can I optimize this code so the code size is at minimum?
The compiler (or you!) does not realize that the range to copy is end - start. There seems to be some unnecessary shuffling of data going on: the two adds and the sub in the loop. Also, it seems to me the compiler makes sure that the size of the copy is a multiple of 4. An obvious optimization, then, is to make sure it is in advance! Below I assume it is (if not, the bne will fail and happily keep on copying and trample all over your memory).
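Using my decade-old ARM assembler knowledge (yes, that is a major disclaimer), and loads and stores that post-increment their pointer, I think the following short snippet is what it can be condensed to. From 18 instructions down to 8, not too bad. If it works. Note that the Cortex-M0 only executes Thumb code, which has no post-indexed ldr/str, so the loop uses ldmia/stmia with writeback for the same load-and-increment effect. The addresses are loaded with the "=" form so that the registers hold the symbol addresses rather than the words stored there.
ldr r1, =__data_init_start @ source: initial values in flash
ldr r2, =__data_start @ destination: start of .data in RAM
ldr r3, =__data_end
subs r4, r3, r2 @ number of bytes to copy
.L1:
ldmia r1!, {r3} @ safe to re-use r3 here
stmia r2!, {r3}
subs r4, r4, #4
bne .L1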
Maybe the compiler assumes that a write through an unsigned int * may change the value of an unsigned int * object (i.e. it does not take advantage of type-based aliasing rules).
In that case the code is inefficient because e is a global variable, and the generated code must take into account that writing to *d may change the value of e.
Making at least e a local should solve this problem (most compilers know that a local whose address is never taken cannot be aliased from a C point of view).
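A minimal sketch of that suggestion, assuming the section boundaries are passed in as arguments (the function name copy_words and its parameter names are made up for illustration). Because the three pointers are locals whose addresses are never taken, a store through dst cannot change dst_end or src, so the compiler is free to keep all of them in registers for the whole loop.

```c
/* Copies words from src to [dst, dst_end). The range must be a whole
 * number of words, otherwise dst never equals dst_end. */
void copy_words(unsigned int *dst, const unsigned int *dst_end,
                const unsigned int *src)
{
    while (dst != dst_end) {
        *dst++ = *src++;
    }
}
```

In the startup code this would be called as copy_words(&__data_start, &__data_end, &__data_init_start), using the linker symbols from the question.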

EXC_BAD_ACCESS when executing an arm blx rx

Here is the c-source code line which crashes on an armv7:
ret = fnPtr (param1, param2);
In the debugger, fnPtr has an address of 0x04216c00. When I disassemble at the pc where it's pointing at the statement above, here is what I get:
0x18918e: movw r0, #0x73c
0x189192: movt r0, #0x1
0x189196: add r0, r2
0x189198: ldr r0, [r0]
0x18919a: str r0, [sp, #0x20]
0x18919c: ldr r0, [sp, #0x20]
0x18919e: ldr r1, [sp, #0x28]
0x1891a0: ldr r2, [sp, #0x2c]
0x1891a2: str r0, [sp, #0x14]
0x1891a4: mov r0, r1
0x1891a6: mov r1, r2
0x1891a8: ldr r2, [sp, #0x14]
0x1891aa: blx r2
Now, when I disassemble the memory at address $r2 (=0x4216c00), I get what is seemingly valid code that should be executed without any problem:
(lldb) disassemble -s 0x4216c00 -C 10
0x4216c00: push {r4, r5, r6, r7, lr}
0x4216c04: add r7, sp, #0xc
0x4216c08: push {r8, r10, r11}
0x4216c0c: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
0x4216c10: sub r7, r7, #0x280
0x4216c14: mov r6, r0
0x4216c18: bx r1
0x4216c1c: add r7, r7, #0x280
Yet what really happens is this:
EXC_BAD_ACCESS (code=2, address=0x4216c00)
Can anyone explain what is wrong and why the address is considered illegal?
Full disclosure: I am no assembly expert. The code compiled and linked is all c-code. Compiler is clang.
Check the value of r2 just before the blx instruction executes. If it is odd, that tells the CPU that the target is Thumb code, whereas the listing looks like ARM code.
Try forcing clang to generate only ARM code with -mno-thumb to test this.
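The check described above can be written down as a small sketch. For bx and blx with a register operand, bit 0 of the target value selects the instruction set (1 for Thumb, 0 for ARM) and is not part of the instruction address. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a bx/blx to this register value would switch to Thumb state. */
bool targets_thumb(uint32_t branch_target)
{
    return (branch_target & 1u) != 0u;
}

/* Address of the first instruction actually fetched after the branch. */
uint32_t instruction_address(uint32_t branch_target)
{
    return branch_target & ~1u;
}
```

With the value quoted in the question (0x04216c00), bit 0 is clear, so the branch would enter ARM state. That matches the 4-byte instruction spacing in the lldb listing, but it is still worth confirming the actual register contents at the blx.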
The EXC_BAD_ACCESS exception has two bits of data in it, the first is the "kern_return_t" number describing the access failure, and the second is the address accessed. In your case the code is 2, which means (from /usr/include/mach/kern_return.h):
#define KERN_PROTECTION_FAILURE 2
/* Specified memory is valid, but does not permit the
* required forms of access.
*/
I'm not sure why this is happening; it sounds like you are trying to execute code in memory that doesn't have the execute permission set. What does:
(lldb) image lookup -va 0x4216c00
say?
BTW, the exception types are in /usr/include/mach/exception_types.h, and if the codes have machine-specific meanings, those will be in, e.g., /usr/include/mach/i386/exception.h. For ARM info you may have to look in the header in the Xcode SDK.

Why does gcc save r4 in arm FIQ interrupt handlers?

Consider the following C code:
extern void dummy(void);
void foo1(void) __attribute__(( interrupt("IRQ") ));
void foo2(void) __attribute__(( interrupt("FIQ") ));
void foo1() {
dummy();
return;
}
void foo2() {
dummy();
return;
}
The code produced by arm gnueabi gcc is basically this:
foo1:
sub lr, lr, #4
stmfd sp!, {r0, r1, r2, r3, ip, lr}
bl dummy
ldmfd sp!, {r0, r1, r2, r3, ip, pc}^
foo2:
sub lr, lr, #4
stmfd sp!, {r0, r1, r2, r3, r4, lr}
bl dummy
ldmfd sp!, {r0, r1, r2, r3, r4, pc}^
The code for foo1 does not hold any surprises. r0-r3 and ip are saved, because the call to dummy may change their value. Also, after correcting lr, it is pushed and popped into pc in the end. This is fairly standard.
However, the code for foo2 is surprising. Saving the value of ip is not required, as it is a banked register. But that gcc saves r4 is surprising.
So why does gcc save r4? I don't see any reason to do that, since the call to dummy will not corrupt this register.
I suspect it does so to ensure the 8-byte stack alignment required by the EABI. Which register is used does not matter, it could be r12 or anything else: it is only pushed to provide the extra 4 bytes of adjustment.
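The arithmetic behind that suspicion can be sketched as follows. This models the alignment rule only and is not GCC's actual implementation. The function name and constants are made up for illustration.

```c
#define REGISTER_SIZE_BYTES 4u
#define EABI_STACK_ALIGN_BYTES 8u

/* Number of extra registers that must be pushed so that the total size
 * of the register save area is a multiple of the stack alignment. */
unsigned padding_registers(unsigned saved_registers)
{
    unsigned bytes = saved_registers * REGISTER_SIZE_BYTES;
    unsigned remainder = bytes % EABI_STACK_ALIGN_BYTES;
    if (remainder == 0u)
        return 0u;
    return (EABI_STACK_ALIGN_BYTES - remainder) / REGISTER_SIZE_BYTES;
}
```

For foo1 the save list is r0-r3, ip and lr: 6 registers, 24 bytes, no padding needed. For foo2, ip is banked in FIQ mode, which leaves r0-r3 and lr: 5 registers, 20 bytes. One more register (here r4) brings the total to 24 bytes.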
