Execution does not go out of the loop in ARM - c

I want to print in ARM assembly language a given number in decimal in hexadecimal. I'm doing the function that does the conversion and the printing. So far the conversion works but the printing not at all.
It does only print a char at a time and it's not at all what I want, I want a special format of output such that I have 0x and 8 digits.
I wrote a function printf using the given function I had, called _writec that is working but only printing a char at a time. So I wrote a loop until I get the end of string function but here it seems that it doesn't care.
I've followed the execution step-by-step using gdb and it suddenly crash for no appearing reason. When r0 contain 0 it should go to .end according to my beq but it does not.
ARM Code:
.global _print_hex
_print_hex:
push {lr}
#According to .c algorithm : r0 = dec; r1 = quotient;
# r2 = temp; r3 = i ; r4 = j
mov fp, sp
sub sp, sp, #100 # 100 times size of char
mov r1, r0
mov r3, #0
_while:
cmp r1, #0
bne _computing
ldr r0, =.hex_0x
bl _printf
mov r4, #8
_for:
cmp r4, #0
bge _printing
ldr r0, =.endline
bl _printf
mov sp, fp
pop {pc}
_computing:
and r2, r1, #0xF
cmp r2, #10
blt .temp_less_10
add r2, #7
.temp_less_10:
add r2, #48
strb r2, [sp, r3]
add r3, #1
lsr r1, #4
b _while
_printing:
ldrb r0, [sp,r4]
bl _writec
sub r4, #1
b _for
_printf:
push {r0, r1, r2, r3, lr}
mov r1, r0
mov r2, #0
.loop:
ldrb r0, [r1,r2]
cmp r0, #0
beq .end
bl _writec
add r2, #1
b .loop
.end:
pop {r0, r1, r2, r3, lr}
bx lr
.hex_0x:
.asciz "0x"
.align 4
.endline:
.asciz "\n"
.align 4
C code (that I tried to translate):
void dec_to_hex(int dec){
int quotient, i, temp;
char hex[100];
quotient = dec;
i = 0;
while (quotient != 0){
temp = quotient % 16;
if (temp < 10){
temp += 48; // it goes in the ascii table between 48 and 57 that correspond to [0..9]
} else {
temp += 55; //it goes in the first cap letters from 65 to 70 [A..F]
}
hex[i]=(char)temp;
i++;
quotient /= 16;
}
printf("0x");
for(int j=i; j>=0; j--){
printf("%c", hex[j]);
}
printf("\n");
}
Here is the code of _writec :
/*
* Sends a character to the terminal through UART0
* The character is given in r0.
* IF the TX FIFO is full, this function awaits
* until there is room to send the given character.
*/
.align 2
.global _writec
.type _writec,%function
.func _writec,_writec
_writec:
push {r0,r1,r2,r3,lr}
mov r1, r0
mov r3, #1
lsl r3, #5 // TXFF = (1<<5)
ldr r0,[pc]
b .TXWAIT
.word UART0
.TXWAIT:
ldr r2, [r0,#0x18] // flags at offset 0x18
and r2, r2, r3 // TX FIFO Full set, so wait
cmp r2,#0
bne .TXWAIT
strb r1, [r0,#0x00] // TX at offset 0x00
pop {r0,r1,r2,r3,pc}
.size _writec, .-_writec
.endfunc
So in ARM when debugging it crashed at my first call of _printf and when I comment all the call to _printf it does print the result but not as the desired format. I only got the hex value.

Related

ARM assembly routine, test, that accepts two arguments, r0 and r1, calls bit_pos for each argument and then adds the results

I've been attempting to write some assembly code that should be equivalent to this C code,
int test( unsigned int v1, unsigned int v2 )
{
int res1, res2;
res1 = bit_pos( v1 );
res1 = bit_pos( v2 );
return res1 + res2;
}
Test should accept two arguments, r0 and r1, calls bit_pos for each argument and then adds the results.
My current progress is the following :-
.arch armv4
.syntax unified
.arm
.text
.align 2
.type bit_pos, %function
.global bit_pos
bit_pos:
mov r1,r0
mov r0, #1
top:
cmp r1,#0
beq done
add r0,r0,#1
lsr r1, #1
b top
done:
mov pc, lr
.align 2
.type test, %function
.global test
test:
push {r0, r1, r2, lr}
mov r0, #0x80000000
bl bit_pos
mov r1, r0
mov r0, #0x00000001
bl bit_pos
mov r2, r0
pop {r0, r1, r2, lr}
mov pc, lr
I tried multiple attempts for the test function but the results always fail to pass, the issue is in the test function but the bit_pos is performing properly.
I need to pass the following test cases
checking 5 20 res=6
checking 1 0 res=-1
checking 175 100000 res=21
but currently I'm failing with
checking 5 20 res=5
got 5, expected 6
I'm really horrible at assembly but I promise I given this my best shot, It's been 5 hours and I'm tired.
My current attempt
test:
push {r4, lr}
mov r4, r0
bl bit_pos
bl bit_pos
add r0, r4
pop {r4, lr}
mov pc, lr
checking 5 20 res=6
checking 1 0 res=0
got 0, expected -1
Test failed

How to do the if or in assembly?

I am trying to translate the following code to assembly!
int i = 1
int a = 3
int b;
if(i == 1 || a == 3)
b = 95;
else
b = 0;
I am confused about the part where I have to use or in the if statement. Do you guys have any suggestions?
ldr r0, [r13, #0] //i = 1
ldr r1, [r13, #4] //a = 3
mov r2, #1 //put 1 in r2
mov r3, #3 //put 3 in r3
cmp r0, r2 //compare i and 1
orr r1, r3 //or a and 3
bgt else //if false branch to else
ldr r4 #0 // put 0 in r4
str r4, [r13, #8] //store it at location 208 with r13
b endif //branch to else if if true
ldr r5 #95 //put 95 on r5
str r5 [r13, #12] //store 95 on location 212 with r13
So far I have this!
Honestly looks wrong! So you can roast me I am here to learn so please teach me! :)
I don't recognize the assembly language. But the pseudo-code would be:
compare i with 1
if true, jump to if
compare a with 3
if true, jump to if
else:
store 0 in b
jump to endif
if:
store 95 in b
endif:
This also implements the short-circuiting of ||, since a == 3 is only tested if i == 1 fails.
Writing assembly code by hand will quickly get out of hand and become goto spaghetti and a modern compiler does a better job optimizing, with that said sometimes you want to write a few lines of assembler for some other reason.
I don't think you must load the constants into registers first and the cost of assigning a value to a register is low compared to conditional branching.
My approach here would be
Store the value of one of the braces (95) in one register.
Compare a with 1 and 3 and branch if equal
Overwrite the register with 0 if branch was not taken
Store the contents of the register in area of variable b
begin:
mov r5, #95 // The value
ldr r0, [r13, #0] //i = 1
ldr r1, [r13, #4] //a = 3
cmp r0, #1 //compare i and 1
beq else
cmp r1,#3 // compare a and 3
beq else //if false branch to else
mov r5,#0 // Clear r5
else:
str r5, [r13, #8] //store it at location 208 with r13
Edit: I Found this cheat sheet It looks like there are conditional variants of the mov instruction. Then the code could be written like this: Without any jumps at all.
begin:
mov r5, #95 // The value
mov r6,#0 // Clear r6
ldr r0, [r13, #0] // i = 1
ldr r1, [r13, #4] // a = 3
cmp r0, #1 // compare i and 1
moveq r5,r6 // conditional move
cmp r1,#3 // compare a and 3
moveq r5,r6
str r5, [r13, #8] // store it at location 208 with r13
If i equals 1 then branch to code that assigns 95 to b. Otherwise, if a is 3, branch to that same code. Otherwise, assign 0 to b and branch to just after the other assignment.
I find the comments in another answer interesting and disturbing at the same time. And more interesting that that answer did not simply ask a compiler.
int fun ( int i, int a )
{
int b;
if(i == 1 || a == 3)
b = 95;
else
b = 0;
return b;
}
00000000 <fun>:
0: e3510003 cmp r1, #3
4: 13500001 cmpne r0, #1
8: 03a0005f moveq r0, #95 ; 0x5f
c: 13a00000 movne r0, #0
10: e12fff1e bx lr
so that means
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
cmp r0, #1 //compare i with 1, interested in equal or not
cmpne r1, #3 //if not equal then test a with 3, interested in equal or not
moveq r5, #95 //if either of the two were equal set b = 95
movne r5, #0 //if neither of the two were equal set b = 0
which is this machine code
0: e59d0000 ldr r0, [sp]
4: e59d1004 ldr r1, [sp, #4]
8: e3500001 cmp r0, #1
c: 13510003 cmpne r1, #3
10: 03a0505f moveq r5, #95 ; 0x5f
14: 13a05000 movne r5, #0
As shown in the ARM documentation, start with the ARM Architectural Reference Manual for ARMv5 to get your feet wet with the basic 32 bit ARM instructions (and base (all thumb variants) thumb instructions). Notice in that documentation that the first nibble describes the condition code and all instructions can be conditionally executed (to avoid branches for if-then-else type things).
0: e3a0505f mov r5, #95 ; 0x5f
4: 03a0505f moveq r5, #95 ; 0x5f
8: 13a0505f movne r5, #95 ; 0x5f
c: e3500001 cmp r0, #1
10: 03500001 cmpeq r0, #1
14: 13500001 cmpne r0, #1
18: c3500001 cmpgt r0, #1
1c: b3500001 cmplt r0, #1
See how the first 4 bits change but the other 28 do not? A feature you see in ARM instruction sets specifically and not necessarily in others. Some others have similar features though.
Not heard of a32 instruction set, so it is not clear which of the handful or more of the arm instruction sets you are using. The above works on armv4t through armv7-a. But tell a modern compiler to build for armv7-a it is likely going to build thumb first then arm only if you can force it. See the ARM Architectural Reference Manual for armv7-ar (it also shows all the way back to armv4t each instruction indicating which architectures are supported).
This is arm code as well that runs on some arm processors:
0: 2903 cmp r1, #3 compare a with 3
2: bf18 it ne these two
4: 2801 cmpne r0, #1 do an if not equal then compare i with 1
6: bf0c ite eq these three do a
8: 205f moveq r0, #95 ; 0x5f if either are equal b = 95
a: 2000 movne r0, #0 else b = 0
c: 4770 bx lr
e: bf00 nop
(just to show that it matters very much which specific instruction set a question is asking about and for ARM which of the ARM instruction sets)
You are basically wanting to do a
if i == 1 set the z flag
else if a == 3 set the z flag
if the z flag is set (from either of the above) b = 95
else b = 0
There are many basic ways to do this and Simson's answer is a clean straightforward approach that saves a branch or two.
mov r5,#95
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
// if i == 1
cmp r0,#1
bne skip
// or if a == 3
cmp r1,#3
bne skip
// else
mov r5,#0 //neither were equal
skip:
str r5, [r13, #12]
I was focused on that answer, but looking at yours did you mean to place the result in two different places based on the result?
ldr r4 #0 // put 0 in r4
str r4, [r13, #8] //store it at location 208 with r13
ldr r5 #95 //put 95 on r5
str r5 [r13, #12] //store 95 on location 212 with r13
That breaks Simson's answer. And mine above.
Most folks would start with this, easy to read and follow, brute force straight from the high level code.
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
// if i == 1
check_i:
cmp r0,#1
bne check_a
b one_equal //folks will forget to do this one
check_a:
// or if a == 3
cmp r1,#3
beq one_equal
bne neither_equal //or just fall through
// else
neither_equal:
mov r4,#0
str r4, [r13, #8]
b the_end //many folks forget this branch
one_equal:
mov r5,#95
str r5, [r13, #12]
the_end:
Or something like it which can then be shortened slightly into this, some folks would start with something like this:
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
// if i == 1
cmp r0,#1
beq one_equal
// or if a == 3
cmp r1,#3
beq one_equal
// else
neither_equal:
mov r4,#0
str r4, [r13, #8]
b the_end //many folks forget this one
one_equal:
mov r5,#95
str r5, [r13, #12]
the_end:
Here is where you start to go off the rails
cmp r0, r2 //this is a valid starting point
orr r1, r3 //orr is a logical or, not an if this "or" that
// so we are confused by what you are doing here
bgt else //you are wanting to know if it is equal or not, not if greater
// than
It does not get any better after that
If you really meant the result in two different places then:
Still get the variables into registers from the stack
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
This still does an if this is equal or that is equal
cmp r0, #1 //is i == 1?
cmpne r1, #3 //if not then is a == 3?
You end up here with z set if either one is equal or z clear if neither are equal
moveq r4,#95 //one or the other is equal
streq r4,[r13, #8] //one or the other is equal
movne r5,#0 //neither are equal
strne r5,[r13, #12] //neither are equal
Final result:
ldr r0, [sp, #0] //i
ldr r1, [sp, #4] //a
cmp r0, #1 //is i == 1?
cmpne r1, #3 //if not then is a == 3
moveq r4,#95 //one or the other is equal
streq r4,[r13, #8] //one or the other is equal
movne r5,#0 //neither are equal
strne r5,[r13, #12] //neither are equal
It assembles fine, so the syntax is good
0: e59d0000 ldr r0, [sp]
4: e59d1004 ldr r1, [sp, #4]
8: e3500001 cmp r0, #1
c: 13510003 cmpne r1, #3
10: 03a0405f moveq r4, #95 ; 0x5f
14: 058d4008 streq r4, [sp, #8]
18: 13a05000 movne r5, #0
1c: 158d500c strne r5, [sp, #12]
I have edited this so many times I hope I did not leave any mistakes...I will get beat up for it if I did I am sure...Before doing any assembly language you need the proper documentation. In this case you want one of the ARM Architectural Reference Manuals, likely the oldest one which is directly derived from the printed versions before they distributed pdfs. The armv5 manual.
In general you will see a compiler will do the opposite and jump over
if(x==1)
{
y = 5;
}
cmp r0,#1
bne skip //C code is equal so branch if not
mov r1,#5
skip:
If you had if ((i==1)&&(a==3)) you would also want to look at the opposite, skip over if (i!=1) skip over if (a!=3) having the two paths skip to a common label.
But in the case of an this OR that you kind of want to have two paths land in the same place by branching to a common label and then have it fall through to the else code if neither are true. By doing the as written comparison if i == 1 branch to label, of a == 3 branch to label.

how is _Bool implemented?

What is it that makes value in a variable of type _Bool 1 , even when we assign a value greater than 1 to it.
For ex:
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
This would print 1, 1. Trying to understand what is it that makes a variable of size Byte act as a single bit and when assigned a value > 1 which has LSB 0 still get converted to 1.
What is it that makes value in a variable of type _Bool 1 , even when we assign a value greater than 1 to it. The compiler does.
For example on ARM (arm-none-eabi-gcc):
#include "stdio.h"
#include "stdbool.h"
int main()
{
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
return 0;
}
compiles to:
.LC0:
.ascii "%x , %lu\000"
main:
stmfd sp!, {fp, lr}
add fp, sp, #4
sub sp, sp, #8
mov r3, #1
strb r3, [fp, #-5]
ldrb r3, [fp, #-5] # zero_extendqisi2
mov r2, #1
mov r1, r3
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
sub sp, fp, #4
ldmfd sp!, {fp, lr}
bx lr
.L3:
.word .LC0
you can see in the instruction mov r3, #1 that the compiler directly converts the initialisation value 10 to 1 as specified by the standard .

ARM Assembly Arrays

I am trying to figure out how arrays work in ARM assembly, but I am just overwhelmed. I want to initialize an array of size 20 to 0, 1, 2 and so on.
A[0] = 0
A[1] = 1
I can't even figure out how to print what I have to see if I did it correctly. This is what I have so far:
.data
.balign 4 # Memory location divisible by 4
string: .asciz "a[%d] = %d\n"
a: .skip 80 # allocates 20
.text
.global main
.extern printf
main:
push {ip, lr} # return address + dummy register
ldr r1, =a # set r1 to index point of array
mov r2, #0 # index r2 = 0
loop:
cmp r2, #20 # 20 elements?
beq end # Leave loop if 20 elements
add r3, r1, r2, LSL #2 # r3 = r1 + (r2*4)
str r2, [r3] # r3 = r2
add r2, r2, #1 # r2 = r2 + 1
b loop # branch to next loop iteration
print:
push {lr} # store return address
ldr r0, =string # format
bl printf # c printf
pop {pc} # return address
ARM confuses me enough as it is, I don't know what i'm doing wrong. If anyone could help me better understand how this works that would be much appreciated.
This might help down the line for others who want to know about how to allocate memory for array in arm assembly language
here is a simple example to add corresponding array elements and store in the third array.
.global _start
_start:
MOV R0, #5
LDR R1,=first_array # loading the address of first_array[0]
LDR R2,=second_array # loading the address of second_array[0]
LDR R7,=final_array # loading the address of final_array[0]
MOV R3,#5 # len of array
MOV R4,#0 # to store sum
check:
cmp R3,#1 # like condition in for loop for i>1
BNE loop # if R3 is not equal to 1 jump to the loop label
B _exit # else exit
loop:
LDR R5,[R1],#4 # loading the values and storing in registers and base register gets updated automatically R1 = R1 + 4
LDR R6,[R2],#4 # similarly
add R4,R5,R6
STR R4,[R7],#4 # storing the values back to the final array
SUB R3,R3,#1 # decrment value just like i-- in for loop
B check
_exit:
LDR R7,=final_array # before exiting checking the values stored
LDR R1, [R7] # R1 = 60
LDR R2, [R7,#4] # R2 = 80
LDR R3, [R7,#8] # R3 = 100
LDR R4, [R7,#12] # R4 = 120
MOV R7, #1 # terminate syscall, 1
SWI 0 # execute syscall
.data
first_array: .word 10,20,30,40
second_array: .word 50,60,70,80
final_array: .word 0,0,0,0,0
as mentioned your printf has problems, you can use the toolchain itself to see what the calling convention is, and then conform to that.
#include <stdio.h>
unsigned int a,b;
void notmain ( void )
{
printf("a[%d] = %d\n",a,b);
}
giving
00001008 <notmain>:
1008: e59f2010 ldr r2, [pc, #16] ; 1020 <notmain+0x18>
100c: e59f3010 ldr r3, [pc, #16] ; 1024 <notmain+0x1c>
1010: e5921000 ldr r1, [r2]
1014: e59f000c ldr r0, [pc, #12] ; 1028 <notmain+0x20>
1018: e5932000 ldr r2, [r3]
101c: eafffff8 b 1004 <printf>
1020: 0000903c andeq r9, r0, ip, lsr r0
1024: 00009038 andeq r9, r0, r8, lsr r0
1028: 0000102c andeq r1, r0, ip, lsr #32
Disassembly of section .rodata:
0000102c <.rodata>:
102c: 64255b61 strtvs r5, [r5], #-2913 ; 0xb61
1030: 203d205d eorscs r2, sp, sp, asr r0
1034: 000a6425 andeq r6, sl, r5, lsr #8
Disassembly of section .bss:
00009038 <b>:
9038: 00000000 andeq r0, r0, r0
0000903c <a>:
903c:
the calling convention is generally first parameter in r0, second in r1, third in r2 up to r3 then use the stack. There are many exceptions to this, but we can see here that the compiler which normally works fine with a printf call, wants the address of the format string in r0. the value of a then the value of b in r1 and r2 respectively.
Your printf has the string in r0, but a printf call with that format string needs three parameters.
The code above used a tail optimization and branch to printf rather than called it and returned from. The arm convention these days prefers the stack to be aligned on 64 bit boundaries, so you can put some register, you dont necessarily care to preserve on the push/pop in order to keep that alignment
push {r3,lr}
...
pop {r3,pc}
It certainly wont hurt you to do this, it may or may not hurt to not do it depending on what downstream assumes.
Your setup and loop should function just fine assuming that r1 (label a) is a word aligned address. Which it may or may not be if you mess with your string, should put a first then the string or put another alignment statement before a to insure the array is aligned. There are instruction set features that can simply the code, but it appears functional as is.

LPC810 - why won't machine code execute from an array of uint8_t in flash?

I'm writing embedded C/assembler code for the NXP LPC810 microcontroller (just a hobby project).
I have a function fn. I also have an exact copy of that function's machine code in an array of uint8_t. (I have checked the hex file.)
I create a function pointer fnptr, with the same type as fn and point it at the array, using a cast.
It all cross-compiles without warnings.
When the MCU executes fn it works correctly.
When the MCU executes fnptr it crashes (I can't see any debug, as there are only 8 pins, all in use).
The code is position independent.
The array has the correct 4 byte alignment.
fn is in the .text section of the elf file.
The array is forced into the .text section of the elf file (still in flash, not RAM).
I have assumed that there is no NX-like functionality on such a basic Coretex M0+ MCU. (Cortex M3 and M4 do have some form of read-only memory protection for code.)
Are there other reasons why the machine code in the array does not work?
Update:
Here is the code:
#include "stdio.h"
#include "serial.h"
extern "C" void SysTick_Handler() {
// generate an interrupt for delay
}
void delay(int millis) {
while (--millis >= 0) {
__WFI(); // wait for SysTick interrupt
}
}
extern "C" int fn(int a, int b) {
return a + b;
}
/* arm-none-eabi-objdump -d firmware.elf
00000162 <fn>:
162: 1840 adds r0, r0, r1
164: 4770 bx lr
166: 46c0 nop ; (mov r8, r8)
*/
extern "C" const uint8_t machine_code[6] __attribute__((aligned (4))) __attribute__((section (".text"))) = {
0x40,0x18,
0x70,0x47,
0xc0,0x46
};
int main() {
LPC_SWM->PINASSIGN0 = 0xFFFFFF04UL;
serial.init(LPC_USART0, 115200);
SysTick_Config(12000000/1000); // 1ms ticks
int(*fnptr)(int a, int b) = (int(*)(int, int))machine_code;
for (int a = 0; ; a++) {
int c = fnptr(a, 1000000);
printf("Hello world2 %d.\n", c);
delay(1000);
}
}
And here is the disassembled output from arm-none-eabi-objdump -D -Mforce-thumb firmware.elf:
00000162 <fn>:
162: 1840 adds r0, r0, r1
164: 4770 bx lr
166: 46c0 nop ; (mov r8, r8)
00000168 <machine_code>:
168: 1840 adds r0, r0, r1
16a: 4770 bx lr
16c: 46c0 nop ; (mov r8, r8)
16e: 46c0 nop ; (mov r8, r8)
00000170 <main>:
...
I amended the code to call the original fn though a function pointer too, in order to be able to generate working and non-working assembly code that was hopefully near-identical.
machine_code has become much longer, as I am now using no optimisation (-O0).
#include "stdio.h"
#include "serial.h"
extern "C" void SysTick_Handler() {
// generate an interrupt for delay
}
void delay(int millis) {
while (--millis >= 0) {
__WFI(); // wait for SysTick interrupt
}
}
extern "C" int fn(int a, int b) {
return a + b;
}
/*
000002bc <fn>:
2bc: b580 push {r7, lr}
2be: b082 sub sp, #8
2c0: af00 add r7, sp, #0
2c2: 6078 str r0, [r7, #4]
2c4: 6039 str r1, [r7, #0]
2c6: 687a ldr r2, [r7, #4]
2c8: 683b ldr r3, [r7, #0]
2ca: 18d3 adds r3, r2, r3
2cc: 1c18 adds r0, r3, #0
2ce: 46bd mov sp, r7
2d0: b002 add sp, #8
2d2: bd80 pop {r7, pc}
*/
extern "C" const uint8_t machine_code[24] __attribute__((aligned (4))) __attribute__((section (".text"))) = {
0x80,0xb5,
0x82,0xb0,
0x00,0xaf,
0x78,0x60,
0x39,0x60,
0x7a,0x68,
0x3b,0x68,
0xd3,0x18,
0x18,0x1c,
0xbd,0x46,
0x02,0xb0,
0x80,0xbd
};
int main() {
LPC_SWM->PINASSIGN0 = 0xFFFFFF04UL;
serial.init(LPC_USART0, 115200);
SysTick_Config(12000000/1000); // 1ms ticks
int(*fnptr)(int a, int b) = (int(*)(int, int))fn;
//int(*fnptr)(int a, int b) = (int(*)(int, int))machine_code;
for (int a = 0; ; a++) {
int c = fnptr(a, 1000000);
printf("Hello world2 %d.\n", c);
delay(1000);
}
}
I compiled the code above, generating firmware.fn.elf and firmware.machinecode.elf by uncommenting //int(*fnptr)(int a, int b) = (int(*)(int, int))machine_code; (and commenting-out the line above).
The first code (fn) worked, the second code (machine_code) crashed.
fn's text and the code at machine_code are identical:
000002bc <fn>:
2bc: b580 push {r7, lr}
2be: b082 sub sp, #8
2c0: af00 add r7, sp, #0
2c2: 6078 str r0, [r7, #4]
2c4: 6039 str r1, [r7, #0]
2c6: 687a ldr r2, [r7, #4]
2c8: 683b ldr r3, [r7, #0]
2ca: 18d3 adds r3, r2, r3
2cc: 1c18 adds r0, r3, #0
2ce: 46bd mov sp, r7
2d0: b002 add sp, #8
2d2: bd80 pop {r7, pc}
000002d4 <machine_code>:
2d4: b580 push {r7, lr}
2d6: b082 sub sp, #8
2d8: af00 add r7, sp, #0
2da: 6078 str r0, [r7, #4]
2dc: 6039 str r1, [r7, #0]
2de: 687a ldr r2, [r7, #4]
2e0: 683b ldr r3, [r7, #0]
2e2: 18d3 adds r3, r2, r3
2e4: 1c18 adds r0, r3, #0
2e6: 46bd mov sp, r7
2e8: b002 add sp, #8
2ea: bd80 pop {r7, pc}
000002ec <main>:
...
The only difference in the calling code is the location of the code called:
$ diff firmware.fn.bin.xxd firmware.machine_code.bin.xxd
54c54
< 0000350: 0040 0640 e02e 0000 bd02 0000 4042 0f00 .#.#........#B..
---
> 0000350: 0040 0640 e02e 0000 d402 0000 4042 0f00 .#.#........#B..
The second address d402 is the address of the machine_code array.
Curiously, the first address bd02 is a little-endian odd number (d is odd in hex).
The address of fn is 02bc (bc02 in big endian), so the pointer to fn is not the address of fn, but the address of fn plus one (or with the low bit set).
Changing the code to:
...
int main() {
LPC_SWM->PINASSIGN0 = 0xFFFFFF04UL;
serial.init(LPC_USART0, 115200);
SysTick_Config(12000000/1000); // 1ms ticks
//int(*fnptr)(int a, int b) = (int(*)(int, int))fn;
int machine_code_addr_low_bit_set = (int)machine_code | 1;
int(*fnptr)(int a, int b) = (int(*)(int, int))machine_code_addr_low_bit_set;
for (int a = 0; ; a++) {
int c = fnptr(a, 1000000);
printf("Hello world2 %d.\n", c);
delay(1000);
}
}
Makes it work.
Googling, I found:
The mechanism for switching makes use of the fact that all instructions must be (at least) halfword-aligned, which means that bit[0] of the branch target address is redundant. Therefore this bit can be re-used to indicate the target instruction set at that address. Bit[0] cleared to 0 means ARM and bit[0] set to 1 means Thumb.
on http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka12545.html
tl;dr
You need to set the low bit on function pointers when executing data as code on ARM Thumb.

Resources