How to convert a while loop with arrays in C to ARM

I'm currently learning ARM and I have to convert a particular C snippet into ARM code to test on the machine. However, I don't know how to declare an array in ARM assembly, and I couldn't find any relevant resources that explain it.
i = 0
while (i < 0xB) {
x[i] = i*2;
}
Many thanks in advance

Since there is no answer to this question yet, I'm just going to paste the OP's code into Godbolt.
I'm taking the liberty of wrapping the given code in a function, like so:
void assign(int num) {
int x[10];
int i = 0;
while(i < 0xB) {
x[i] = i*2;
}
}
The compiler used to translate this is ARM gcc 5.4.1
Translated Code:
assign(int):
str fp, [sp, #-4]!
add fp, sp, #0
sub sp, sp, #60
str r0, [fp, #-56]
mov r3, #0
str r3, [fp, #-8]
.L3:
ldr r3, [fp, #-8]
cmp r3, #10
bgt .L4
ldr r3, [fp, #-8]
mov r2, r3, asl #1
ldr r3, [fp, #-8]
mov r3, r3, asl #2
sub r1, fp, #4
add r3, r1, r3
str r2, [r3, #-44]
b .L3
.L4:
mov r0, r0 # nop
sub sp, fp, #0
ldr fp, [sp], #4
bx lr
Source: https://godbolt.org/g/5Tv3gk

Just write the code in C and let the compiler translate it to ARM assembly. Then inspect the generated code.
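Note that the snippet as posted never increments i, which is why the generated loop ends in an unconditional b .L3, and that the bound 0xB (11) is larger than the 10-element array. Below is a minimal sketch of a version better suited as compiler input. The 11-element bound and passing the array in as a parameter are my assumptions, not part of the question:

```c
#define N 0xB   /* 11 elements, matching the loop bound in the question */

/* Store 2*i into x[i] for i = 0 .. N-1.
 * x must point to at least N ints (an assumption of this sketch). */
void assign(int *x)
{
    int i = 0;
    while (i < N) {
        x[i] = i * 2;
        i++;        /* the increment that the original snippet lacks */
    }
}
```

In the generated assembly the array is just a block of memory: x[i] becomes a store to base + 4*i, as in the mov r3, r3, asl #2 / add / str sequence in the listing above.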

Related

Exact behaviour of -mno-unaligned-access

I'll start with the question, and will follow it by an example.
The description of this flag in ARM Compiler armclang Reference Guide Version 6.4 (link) says:
If unaligned access is disabled, words in packed data structures are accessed one byte at a time.
As you can see in the following example, after the 1-byte access at 1e0 there is an (aligned) word access at 1e2. From the description above, I would expect the byte-wise form of access used at 1e0 to be used for the remaining bytes of M[1].A as well. I would like an exact description of the behaviour when this flag is set: does it always behave as in this example? That is, will it still use word accesses at aligned addresses, even within packed structs?
Example: for this code,
typedef struct __attribute__((packed, aligned(1))) MyStruct{
int A;
short B;
char C;
} MyStruct_t;
int main(void) {
MyStruct_t M[2];
int D, E;
M[0].A = 0xffffffff;
M[1].A = 0xeeeeeeee;
D = M[0].A;
E = M[1].A;
D = E;
return 0 ;
}
compiled with -mno-unaligned-access as follows (using the MCUXpresso IDE):
arm-none-eabi-gcc -nostdlib -Xlinker -Map="m7_experiments.map" -Xlinker --cref -Xlinker --gc-sections -Xlinker -print-memory-usage -mcpu=cortex-m7 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mthumb -T "m7_experiments_Debug.ld" -o "m7_experiments.axf" $(OBJS) $(USER_OBJS) $(LIBS)
I'm getting the following machine code:
000001b0 <main>:
1b0: b480 push {r7}
1b2: b087 sub sp, #28
1b4: af00 add r7, sp, #0
1b6: f04f 33ff mov.w r3, #4294967295 ; 0xffffffff
1ba: 603b str r3, [r7, #0]
1bc: 2300 movs r3, #0
1be: f063 0311 orn r3, r3, #17
1c2: 71fb strb r3, [r7, #7]
1c4: 2300 movs r3, #0
1c6: f063 0311 orn r3, r3, #17
1ca: 723b strb r3, [r7, #8]
1cc: 2300 movs r3, #0
1ce: f063 0311 orn r3, r3, #17
1d2: 727b strb r3, [r7, #9]
1d4: 2300 movs r3, #0
1d6: f063 0311 orn r3, r3, #17
1da: 72bb strb r3, [r7, #10]
1dc: 683b ldr r3, [r7, #0]
1de: 617b str r3, [r7, #20]
1e0: 79fb ldrb r3, [r7, #7]
1e2: 68ba ldr r2, [r7, #8]
1e4: f022 427f bic.w r2, r2, #4278190080 ; 0xff000000
1e8: 0212 lsls r2, r2, #8
1ea: 4313 orrs r3, r2
1ec: 613b str r3, [r7, #16]
1ee: 693b ldr r3, [r7, #16]
1f0: 617b str r3, [r7, #20]
1f2: 2300 movs r3, #0
1f4: 4618 mov r0, r3
1f6: 371c adds r7, #28
1f8: 46bd mov sp, r7
1fa: f85d 7b04 ldr.w r7, [sp], #4
1fe: 4770 bx lr
EDIT: with the complementary flag -munaligned-access we get what would be expected in this case:
000001b0 <main>:
1b0: b480 push {r7}
1b2: b087 sub sp, #28
1b4: af00 add r7, sp, #0
1b6: f04f 33ff mov.w r3, #4294967295 ; 0xffffffff
1ba: 603b str r3, [r7, #0]
1bc: 2300 movs r3, #0
1be: f063 0311 orn r3, r3, #17
1c2: 71fb strb r3, [r7, #7]
1c4: 2300 movs r3, #0
1c6: f063 0311 orn r3, r3, #17
1ca: 723b strb r3, [r7, #8]
1cc: 2300 movs r3, #0
1ce: f063 0311 orn r3, r3, #17
1d2: 727b strb r3, [r7, #9]
1d4: 2300 movs r3, #0
1d6: f063 0311 orn r3, r3, #17
1da: 72bb strb r3, [r7, #10]
1dc: 683b ldr r3, [r7, #0]
1de: 617b str r3, [r7, #20]
1e0: f8d7 3007 ldr.w r3, [r7, #7]
1e4: 613b str r3, [r7, #16]
1e6: 693b ldr r3, [r7, #16]
1e8: 617b str r3, [r7, #20]
1ea: 2300 movs r3, #0
1ec: 4618 mov r0, r3
1ee: 371c adds r7, #28
1f0: 46bd mov sp, r7
1f2: f85d 7b04 ldr.w r7, [sp], #4
1f6: 4770 bx lr
The behaviour here is because even though the type is packed and potentially misaligned, the compiler knows that any instance of it on the stack must be aligned, and so aligned members of it can be accessed using word sized reads and writes.
If you access the packed struct through a pointer then the compiler doesn't know its alignment, and so the behaviour is very different.
I have not been able to reproduce this exact behaviour on godbolt because it doesn't have your version of armclang, but look at this example compiled with gcc 11:
typedef struct __attribute__((packed, aligned(1))) MyStruct{
int A;
short B;
char C;
} MyStruct_t;
int main(void) {
MyStruct_t M[2];
int D, E;
M[0].A = 0xffffffff;
M[1].A = 0xeeeeeeee;
D = M[0].A;
E = M[1].A;
D = E;
return 0 ;
}
int fn(MyStruct_t *M) {
int D, E;
M[0].A = 0xffffffff;
M[1].A = 0xeeeeeeee;
D = M[0].A;
E = M[1].A;
D = E;
return 0 ;
}
The same lines which use a str in the first function use four strb in the second.
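When only a pointer is available, one common way to stay correct regardless of alignment is to copy the bytes out with memcpy and let the compiler pick the access width for the target. A minimal sketch; the helper name load_u32_unaligned is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address of unknown alignment.
 * memcpy makes no alignment assumption, so the compiler is free to
 * emit byte loads or a single word load, depending on the target
 * and on flags such as -mno-unaligned-access. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```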
Your struct is naturally aligned (and packed). Try this instead.
{
unsigned char C;
unsigned short B;
unsigned int A;
}
GCC appears to default to byte accesses, while clang (the LLVM one; I don't know what armclang is, and I neither have it nor tried it) defaults to unaligned accesses.
I found that GCC always did stores a byte at a time, while its loads varied with the command-line option. With clang, both stores and loads depended on the command-line option.
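To see why the reordered layout makes a difference, it helps to compute the member offsets. The sketch below assumes GCC or clang (for __attribute__((packed))) and the usual ARM sizes of 2 bytes for short and 4 bytes for int; the struct and function names are mine:

```c
#include <stddef.h>

/* Layout from the question: A at 0, B at 4, C at 6, size 7. */
struct __attribute__((packed)) Original {
    int A;
    short B;
    char C;
};

/* Layout suggested above: C at 0, B at 1, A at 3, so both B and A
 * are misaligned even in the first array element. */
struct __attribute__((packed)) Reordered {
    unsigned char C;
    unsigned short B;
    unsigned int A;
};

/* Byte offset of member A within element idx of an array of each struct. */
size_t original_A_offset(size_t idx)
{
    return idx * sizeof(struct Original) + offsetof(struct Original, A);
}

size_t reordered_A_offset(size_t idx)
{
    return idx * sizeof(struct Reordered) + offsetof(struct Reordered, A);
}
```

With the original layout, M[1].A starts at offset 7, which matches the ldrb at [r7, #7] followed by the word load at [r7, #8] in the question's listing.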
The quote from the link is already flawed because it mentions only words, not halfwords or doublewords (or floats of any kind). In any case you are correct: if the item is aligned, the flag does not force it to be broken into byte accesses, so the statement does not match (LLVM) clang. I do not know why one would want it to force byte accesses for aligned items as in your example. It would make sense to have a flag that avoids unaligned accesses (as the name of the flag implies) and keeps aligned ones.
The quote is also badly written, as it implies the software might know whether unaligned accesses are disabled. There is no check in the generated code to see if the processor is set to block unaligned accesses.
You can contact ARM and see if you can get them to fix the web page.
Something like this would already be significantly better: "The flag causes packed structs to not generate unaligned accesses."
Another option: "The flag causes accesses to packed structs to generate only aligned accesses."

Why does armclang not use VCVT instruction for efficient integer to float conversion?

I need to convert an integer value into a float value on a Cortex-M4 with FPU; for example:
float convert(int n) {
return (float) n;
}
armclang compiler translates this to:
push {r11, lr}
mov r11, sp
sub sp, sp, #8
str r0, [sp, #4]
ldr r0, [sp, #4]
bl __aeabi_i2f
mov sp, r11
pop {r11, lr}
bx lr
(Godbolt Link: https://godbolt.org/z/K59xGq78W)
The conversion from int to float is made by calling the library routine __aeabi_i2f which is much less efficient than using the FPU instruction VCVT.
For example, GCC makes use of VCVT:
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
ldr r3, [r7, #4]
vmov s15, r3 # int
vcvt.f32.s32 s15, s15
vmov.f32 s0, s15
adds r7, r7, #12
mov sp, r7
ldr r7, [sp], #4
bx lr
(https://godbolt.org/z/Pdv3nEMYq)
Is there a way to tell armclang to use the VCVT instruction?
Use the option -march=armv7+fp to tell the compiler to generate code for a machine with an FPU.
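One way to check whether the chosen options actually enable hardware floating point is the ACLE feature macro __ARM_FP, which the compiler predefines when it targets an FPU (bit 2, value 0x4, indicates single precision). A small sketch; the function name has_hw_single_fp is hypothetical:

```c
/* Returns 1 if the compiler was told that the target has
 * single-precision floating-point hardware, 0 otherwise.
 * __ARM_FP is an ACLE macro; it is simply undefined on non-ARM
 * targets or when no FPU is selected. */
int has_hw_single_fp(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;
#endif
}

/* The conversion from the question. With an FPU enabled this can
 * compile to vcvt.f32.s32; without one it becomes a library call. */
float convert(int n)
{
    return (float) n;
}
```

If has_hw_single_fp() returns 0 in a build meant for the Cortex-M4 FPU, the FPU-related options are not reaching the compiler.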
Godbolt

ARM64 Backtrace from link register

I am currently trying to get a backtrace based on the stack pointer and link register on an ARM64 device, using a C program.
Below is an example from objdump.
bar() calls foo() with 240444: ebfffd68 bl 23f9ec <foo##Base>
I can get the link register (lr), derive 23f9ec from it, and save it to the backtrace list as the last routine.
My question: given the assembly code below and the current routine 0023f9ec <foo##Base>, how can I calculate, in C, the previous routine, which is 0023fe14 <bar##Base>?
Here is my implementation, but it gets the wrong previous lr:
int bt(void** backtrace, int max_size) {
unsigned long* sp = __get_SP();
unsigned long* ra = __get_LR();
int* funcbase = (int*)(int)&bt;
int spofft = (short)((*funcbase));
sp = (char*)sp-spofft;
unsigned long* wra = (unsigned long*)ra;
int spofft;
int depth = 0;
while(ra) {
wra = ra;
while((*wra >> 16) != 0xe92d) {
wra--;
}
if(wra == 0)
return 0;
spofft = (short)(*wra & 0xffff);
if(depth < max_size)
backtrace[depth] = ra;
else
break;
ra =(unsigned long *)((unsigned long)ra + spofft);
sp =(unsigned long *)((unsigned long)sp + spofft);
depth++;
}
return 1;
}
0023f9ec <foo##Base>:
23f9ec: e92d42f3 push {r0, r1, r4, r5, r6, r7, r9, lr}
23f9f0: e1a09001 mov r9, r1
23f9f4: e1a07000 mov r7, r0
23f9f8: ebfffff9 bl 23f9e4 <__get_SP##Base>
23f9fc: e59f4060 ldr r4, [pc, #96] ; 23fa64 <foo##Base+0x78>
23fa00: e08f4004 add r4, pc, r4
23fa04: e1a05000 mov r5, r0
23fa08: ebfffff3 bl 23f9dc <__get_LR##Base>
23fa0c: e59f3054 ldr r3, [pc, #84] ; 23fa68 <foo##Base+0x7c>
23fa10: e3002256 movw r2, #598 ; 0x256
23fa14: e59f1050 ldr r1, [pc, #80] ; 23fa6c <foo##Base+0x80>
23fa18: e7943003 ldr r3, [r4, r3]
23fa1c: e08f1001 add r1, pc, r1
23fa20: e5934000 ldr r4, [r3]
23fa24: e1a03005 mov r3, r5
23fa28: e6bf4074 sxth r4, r4
23fa2c: e58d4004 str r4, [sp, #4]
23fa30: e1a06000 mov r6, r0
23fa34: e58d0000 str r0, [sp]
23fa38: e59f0030 ldr r0, [pc, #48] ; 23fa70 <foo##Base+0x84>
23fa3c: e08f0000 add r0, pc, r0
23fa40: ebfd456d bl 190ffc <printf#plt>
23fa44: e1a03009 mov r3, r9
23fa48: e1a02007 mov r2, r7
23fa4c: e1a01006 mov r1, r6
23fa50: e0640005 rsb r0, r4, r5
23fa54: ebffff70 bl 23f81c <get_prev_sp_ra2##Base>
23fa58: e3a00000 mov r0, #0
23fa5c: e28dd008 add sp, sp, #8
23fa60: e8bd82f0 pop {r4, r5, r6, r7, r9, pc}
23fa64: 003d5be0 eorseq r5, sp, r0, ror #23
23fa68: 000026c8 andeq r2, r0, r8, asr #13
23fa6c: 002b7ba6 eoreq r7, fp, r6, lsr #23
23fa70: 002b73e5 eoreq r7, fp, r5, ror #7
0023fe14 <bar##Base>:
23fe14: e92d4ef0 push {r4, r5, r6, r7, r9, sl, fp, lr}
23fe18: e24dde16 sub sp, sp, #352 ; 0x160
23fe1c: e59f76a8 ldr r7, [pc, #1704] ; 2404cc <bar##Base+0x6b8>
23fe20: e1a04000 mov r4, r0
23fe24: e59f66a4 ldr r6, [pc, #1700] ; 2404d0 <bar##Base+0x6bc>
23fe28: e1a03000 mov r3, r0
23fe2c: e59f26a0 ldr r2, [pc, #1696] ; 2404d4 <bar##Base+0x6c0>
23fe30: e08f7007 add r7, pc, r7
23fe34: e08f6006 add r6, pc, r6
23fe38: e3a00000 mov r0, #0
23fe3c: e08f2002 add r2, pc, r2
23fe40: e1a05001 mov r5, r1
23fe44: e3a01003 mov r1, #3
23fe48: e59f9688 ldr r9, [pc, #1672] ; 2404d8 <bar##Base+0x6c4>
.....................................................................
24043c: e3a0100f mov r1, #15
240440: e1a0000a mov r0, sl
240444: ebfffd68 bl 23f9ec <foo##Base>
240448: e59f2108 ldr r2, [pc, #264] ; 240558 <bar##Base+0x744>
24044c: e3a01003 mov r1, #3
240450: e08f2002 add r2, pc, r2
240454: e1a05000 mov r5, r0
240458: e1a03000 mov r3, r0
24045c: e3a00000 mov r0, #0
I don't think there's an easy way to do this.
Normally the register ABI of any operating system contains a "frame pointer" register. For example, on Apple's armv7 ABI, this is r7:
0x10006fc0 b0b5 push {r4, r5, r7, lr}
0x10006fc2 02af add r7, sp, 8
0x10006fc4 0448 ldr r0, [0x10006fd8]
0x10006fc6 d0e90c45 ldrd r4, r5, [r0, 0x30]
0x10006fca 0020 movs r0, 0
0x10006fcc fff7a6ff bl 0x10006f1c
0x10006fd0 0019 adds r0, r0, r4
0x10006fd2 6941 adcs r1, r5
0x10006fd4 b0bd pop {r4, r5, r7, pc}
If you dereference r7 there, you get to a pair of pointers, the second of which is lr, and the first of which is the r7 of the calling function, allowing you to repeat this process until you reach the bottom of the stack.
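The frame-pointer walk described above can be sketched as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, the code must have been compiled with frame pointers, and the outermost frame is assumed to be terminated by a NULL frame pointer:

```c
#include <stddef.h>

/* Frame record as described above: the saved frame pointer of the
 * caller, followed by the saved return address (lr). */
struct frame_record {
    struct frame_record *caller_fp;
    void *return_addr;
};

/* Follow the chain starting at fp, storing up to max return addresses
 * in out. Stops at a NULL frame pointer (an assumption about how the
 * outermost frame is terminated). Returns the number stored. */
int walk_frames(const struct frame_record *fp, void **out, int max)
{
    int depth = 0;
    while (fp != NULL && depth < max) {
        out[depth++] = fp->return_addr;
        fp = fp->caller_fp;
    }
    return depth;
}
```

On a real target the starting pointer would come from the frame-pointer register, for example via __builtin_frame_address(0) in GCC and clang.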
Judging by the assembly you posted, the codebase you're looking at doesn't have that. This means that the only way to obtain the return address is the same way that the code itself does: step forward through each instruction and parse/interpret them until you reach something that loads into pc. This is of course imperfect, since there may be functions in your call stack that do not ever return, but there's not much you can do about that.
It may be tempting to search backwards instead, and while you can do a heuristic approach and probably reach quite reasonable results with it, that is even less reliable than searching forward, since you have absolutely no way of telling whether you arrived at address X by stepping forward from the previous instruction or by explicitly jumping there from somewhere else.
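If you do try the heuristic scan anyway, note that in the listings above the low 16 bits of a push encoding are a register bitmask, not a stack offset, so they cannot be added to sp or lr directly. A minimal decoding sketch for the A32 encoding shown in the question, assuming condition code AL (the 0xe prefix); the function names are hypothetical:

```c
#include <stdint.h>

/* A32 "push {reglist}" is STMDB sp!, {reglist}. With condition AL its
 * top 16 bits are 0xe92d and the low 16 bits are a register bitmask
 * (bit n set means rn is saved). */
int arm_is_push(uint32_t insn)
{
    return (insn >> 16) == 0xe92du;
}

/* Number of bytes a push subtracts from sp: 4 per saved register. */
int arm_push_bytes(uint32_t insn)
{
    uint32_t reglist = insn & 0xffffu;
    int count = 0;
    while (reglist) {
        count += (int)(reglist & 1u);
        reglist >>= 1;
    }
    return 4 * count;
}

/* Non-zero if lr (r14) is in the register list. */
int arm_push_saves_lr(uint32_t insn)
{
    return (insn & (1u << 14)) != 0;
}
```

For foo's prologue (e92d42f3) this gives 32 bytes with lr saved. Because registers are stored in ascending order, lr occupies the highest word of that block. Any later adjustment, such as the sub sp, sp, #352 in bar, has to be accounted for separately.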

Issues with ARMv7-A bare metal call stack [duplicate]

I'm trying to get a small ARM kernel up and running on QEMU (Versatile Express for Cortex-A15). Currently it simply sets sp to the top of a small stack and sends a single character to UART0.
_start.arm:
.set stack_size, 0x10000
.comm stack, stack_size
.global _start
_start:
ldr sp, =stack+stack_size
bl start
1:
b 1b
.size _start, . - _start
start.c:
/* UART_0 is a struct overlaid on 0x1c090000 */
void printChar(char c)
{
while (UART_0->flags & TRANSMIT_FULL);
UART_0->data = c;
}
void start()
{
while (UART_0->flags & TRANSMIT_FULL);
UART_0->data = 'A';
printChar('a');
}
From GDB, I know that execution progresses through _start into start and successfully sends 'A' to UART_0. printChar gets called and completes, but doesn't seem to print anything to the serial port. When running without GDB, the kernel repeatedly prints 'A', though I'm not sure whether this is the processor resetting or jumping incorrectly.
From objdump:
Disassembly of section .stub:
00010000 <_start>:
10000: e59fd004 ldr sp, [pc, #4] ; 1000c <__STACK_SIZE+0xc>
10004: eb000016 bl 10064 <start>
10008: eafffffe b 10008 <_start+0x8>
1000c: 000200d0 .word 0x000200d0
Disassembly of section .text:
00010010 <printChar>:
10010: e52db004 push {fp} ; (str fp, [sp, #-4]!)
10014: e28db000 add fp, sp, #0
10018: e24dd00c sub sp, sp, #12
1001c: e1a03000 mov r3, r0
10020: e54b3005 strb r3, [fp, #-5]
10024: e1a00000 nop ; (mov r0, r0)
10028: e3a03000 mov r3, #0
1002c: e3413c09 movt r3, #7177 ; 0x1c09
10030: e1d331ba ldrh r3, [r3, #26]
10034: e6ff3073 uxth r3, r3
10038: e2033020 and r3, r3, #32
1003c: e3530000 cmp r3, #0
10040: 1afffff8 bne 10028 <printChar+0x18>
10044: e3a03000 mov r3, #0
10048: e3413c09 movt r3, #7177 ; 0x1c09
1004c: e55b2005 ldrb r2, [fp, #-5]
10050: e6ff2072 uxth r2, r2
10054: e1c320b2 strh r2, [r3, #2]
10058: e24bd000 sub sp, fp, #0
1005c: e49db004 pop {fp} ; (ldr fp, [sp], #4)
10060: e12fff1e bx lr
00010064 <start>:
10064: e52db008 str fp, [sp, #-8]!
10068: e58de004 str lr, [sp, #4]
1006c: e28db004 add fp, sp, #4
10070: e1a00000 nop ; (mov r0, r0)
10074: e3a03000 mov r3, #0
10078: e3413c09 movt r3, #7177 ; 0x1c09
1007c: e1d331ba ldrh r3, [r3, #26]
10080: e6ff3073 uxth r3, r3
10084: e2033020 and r3, r3, #32
10088: e3530000 cmp r3, #0
1008c: 1afffff8 bne 10074 <start+0x10>
10090: e3a03000 mov r3, #0
10094: e3413c09 movt r3, #7177 ; 0x1c09
10098: e5d32002 ldrb r2, [r3, #2]
1009c: e3a02000 mov r2, #0
100a0: e3822041 orr r2, r2, #65 ; 0x41
100a4: e5c32002 strb r2, [r3, #2]
100a8: e5d32003 ldrb r2, [r3, #3]
100ac: e3a02000 mov r2, #0
100b0: e5c32003 strb r2, [r3, #3]
100b4: e3a00061 mov r0, #97 ; 0x61
100b8: ebffffd4 bl 10010 <printChar>
100bc: e24bd004 sub sp, fp, #4
100c0: e59db000 ldr fp, [sp]
100c4: e28dd004 add sp, sp, #4
100c8: e49df004 pop {pc} ; (ldr pc, [sp], #4)
000100cc <UART_0>:
100cc: 1c090000 ....
I may have missed something, but I don't see where you have enabled interrupts, or where you poll to see whether you can send the next character. If you have enabled interrupts and set up the UART hardware correctly, your driver may have a bug. If you have not set up the UART hardware correctly, it may not be generating interrupts, or it may not be handling the FIFO correctly, or there may be any number of other problems.
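For comparison, a polled transmit routine usually looks like the sketch below. It assumes UART0 is an ARM PL011 (as on QEMU's Versatile Express models) and uses the PL011 register layout: data register at offset 0x00, flag register at 0x18, transmit-FIFO-full as bit 5. The struct and function names are mine and are not necessarily the same as the UART_0 struct in the question:

```c
#include <stddef.h>
#include <stdint.h>

/* First few PL011 registers; all are accessed as 32-bit words. */
struct pl011_regs {
    volatile uint32_t dr;           /* 0x00: data register */
    volatile uint32_t rsr_ecr;      /* 0x04: receive status / error clear */
    volatile uint32_t reserved[4];  /* 0x08 - 0x14 */
    volatile uint32_t fr;           /* 0x18: flag register */
};

#define PL011_FR_TXFF (1u << 5)     /* transmit FIFO full */

/* Busy-wait until the transmit FIFO has room, then write one character. */
void uart_putc(struct pl011_regs *uart, char c)
{
    while (uart->fr & PL011_FR_TXFF) {
        /* spin */
    }
    uart->dr = (uint32_t)(unsigned char)c;
}
```

On the target, the pointer would be the UART base address cast to struct pl011_regs * (0x1c090000 in the question). Taking it as a parameter keeps the routine testable against an ordinary struct in memory.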

ARM C code disassembly confusing

I have the code below:
struct inner{
uint32_t a;
uint32_t b;
};
struct outer{
struct inner *in;
};
void test_func(struct outer *o)
{
printh(o->in->b);
}
printh simply displays the value in hex format.
The code is compiled with the following flags: -DSMP -marm -mcpu=cortex-a15
The assembly generated is as follows:
f0001cc0 <test_func>:
f0001cc0: e92d4800 push {fp, lr}
f0001cc4: e28db004 add fp, sp, #4
f0001cc8: e24dd008 sub sp, sp, #8
f0001ccc: e50b0008 str r0, [fp, #-8]
f0001cd0: e51b3008 ldr r3, [fp, #-8]
f0001cd4: e5933000 ldr r3, [r3]
f0001cd8: e5933004 ldr r3, [r3, #4]
f0001cdc: e1a00003 mov r0, r3
f0001ce0: ebfffb04 bl f00008f8 <printh>
f0001ce4: e24bd004 sub sp, fp, #4
f0001ce8: e8bd8800 pop {fp, pc}
With this code I get a data abort at f0001cd8, because r3 was loaded with 0 at f0001cd4. But r3 was correctly loaded with the address of o at f0001cd0.
All I have is one simple line. I don't understand why the following instruction is generated:
f0001cd4: e5933000 ldr r3, [r3]
Because of this I am getting a data abort.
f0001cc0 <test_func>:
f0001cc0: e92d4800 push {fp, lr} ;\
f0001cc4: e28db004 add fp, sp, #4 ;> create stack frame
f0001cc8: e24dd008 sub sp, sp, #8 ;/
f0001ccc: e50b0008 str r0, [fp, #-8] ; save first arg (o) in stack
f0001cd0: e51b3008 ldr r3, [fp, #-8] ; load o
f0001cd4: e5933000 ldr r3, [r3] ; load o->in
f0001cd8: e5933004 ldr r3, [r3, #4] ; load o->in->b
f0001cdc: e1a00003 mov r0, r3 ; use as first arg to next fn
f0001ce0: ebfffb04 bl f00008f8 <printh> ; call printh
f0001ce4: e24bd004 sub sp, fp, #4 ;\
f0001ce8: e8bd8800 pop {fp, pc} ;/ destroy stack frame
The code above (obviously compiled without optimization) first loads o->in and then o->in->b. o->in comes back as 0, which means it is a null pointer: you never pointed it at a valid struct inner (for example, by allocating memory for it).
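A defensive version of the access, as a minimal sketch. The helper read_inner_b is hypothetical and only illustrates checking each pointer in the chain before dereferencing it:

```c
#include <stddef.h>
#include <stdint.h>

struct inner {
    uint32_t a;
    uint32_t b;
};

struct outer {
    struct inner *in;
};

/* Copy o->in->b into *out and return 0, or return -1 if either
 * pointer in the chain is NULL instead of dereferencing it. */
int read_inner_b(const struct outer *o, uint32_t *out)
{
    if (o == NULL || o->in == NULL)
        return -1;
    *out = o->in->b;    /* b lives at offset 4, hence ldr r3, [r3, #4] */
    return 0;
}
```

The caller still has to make o->in point at real storage before calling test_func, for example a static struct inner or memory obtained from an allocator.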

Resources