I am interested in converting a Fibonacci sequence code in C++ into ARM assembly language. The code in C++ is as follows:
#include <iostream>
using namespace std;
int main()
{
int range, first = 0 , second = 1, fibonacci;
cout << "Enter range for the Fibonacci Sequence" << endl;
cin >> range;
for (int i = 0; i < range; i++)
{
if (i <=1)
{
fibonacci = i;
}
else
{
fibonacci = first and second;
first = second;
second = fibonacci;
}
}
cout << fibonacci << endl;
return 0;
}
My attempt at converting this to assembly is as follows:
ldr r0, =0x00000000 ;loads 0 in r0
ldr r1, =0x00000001 ;loads 1 into r1
ldr r2, =0x00000002 ;loads 2 into r2, this will be the equivalent of 'n' in C++ code,
but I will force the value of 'n' when writing this code
ldr r3, =0x00000000 ;r3 will be used as a counter in the loop
;r4 will be used as 'fibonacci'
loop:
cmp r3, #2 ;Compares r3 with a value of 0
it lt
movlt r4, r3 ;If r3 is less than #0, r4 will equal r3. This means r4 will only ever be
0 or 1.
it eq ;If r3 is equal to 2, run through these instructions
addeq r4, r0, r1
moveq r0,r1
mov r1, r4
adds r3, r3, #1 ;Increases the counter by one
it gt ;Similarly, if r3 is greater than 2, run though these instructions
addgt r4, r0, r1
movgt r0, r1
mov r1, r4
adds r3, r3, #1
I'm not entirely sure if that is how you do if statements in Assembly, but that will be a secondary concern for me at this point. What I am more interested in, is how I can incorporate an if statement in order to test for the initial condition where the 'counter' is compared to the 'range'. If counter < range, then it should go into the main body of the code where the fibonacci statement will be iterated. It will then continue to loop until counter = range.
I am not sure how to do the following:
cmp r3, r2
;If r3 < r2
{
<code>
}
;else, stop
Also, in order for this to loop correctly, am I able to add:
cmp r3, r2
bne loop
So that the loop iterates until r3 = r2?
Thanks in advance :)
It's not wise to put if-statements inside a loop. Get rid of it.
An optimized(kinda) standalone Fibonacci function should be like this:
unsigned int fib(unsigned int n)
{
unsigned int first = 0;
unsigned int second = 1;
unsigned int temp;
if (n > 47) return 0xffffffff; // overflow check
if (n < 2) return n;
n -= 1;
while (1)
{
n -= 1;
if (n == 0) return second;
temp = first + second;
first = second;
second = temp
}
}
Much like factorial, optimizing Fibonacci sequence is somewhat nonsense in real world computing, because they exceed the 32-bit barrier really soon: It's 12 with factorial and 47 with Fibonacci.
If you really need them, you are served the best with very short lookup tables.
If you need this function fully implemented for larger values:
https://www.nayuki.io/page/fast-fibonacci-algorithms
Last but not least, here is the function above in assembly:
cmp r0, #47 // r0 is n
movhi r0, #-1 // overflow check
bxhi lr
cmp r0, #2
bxlo lr
sub r2, r0, #1 // r2 is the counter now
mov r1, #0 // r1 is first
mov r0, #1 // r0 is second
loop:
subs r2, r2, #1 // n -= 1
add r12, r0, r1 // temp = first + second
mov r1, r0 // first = second
bxeq lr // return second when condition is met
mov r0, r12 // second = temp
b loop
Please note that the last bxeq lr can be placed immediately after subs which might seem more logical, but with the multiple issuing capability of the Cortex series in mind, it's better in this order.
It might be not exactly the answer you were looking for, but keep this in mind: A single if statement inside a loop can seriously cripple the performance - a nested one even more.
And there are almost always ways avoiding these. You just have to look for them.
Conditionals compile to conditional jumps in almost all assembly language:
if (condition)
..iftrue..
else
..iffalse..
becomes
eval condition
conditional_jump_if_true truelabel
..iffalse..
unconditional_jump endlabel
truelabel:
..iftrue..
endlabel:
or the other way around (exchange false and true).
ARM supports conditional execution to eliminate these jumps when compiling the innermost conditionals: http://www.davespace.co.uk/arm/introduction-to-arm/conditional.html
IT... is a Thumb-2 instruction: http://en.wikipedia.org/wiki/ARM_architecture#Thumb-2 to support unified assemblies. See http://www.keil.com/support/man/docs/armasm/armasm_BABJGFDD.htm for more details.
Your code for looping (cmp and bne) is fine.
In general, try to rewrite your code using gotos instead of cycles, and else parts.
else can remain only at the deepest nesting level.
Then you can convert this semi-assembly code to assembly much more easily.
HTH
Related
I have written a code in keil in which there is a loop containing a multiplication which can be done either by shift left by 1 or MUL command. However, both are executed only in the 1st iteration and in the next ones, compiler executes the code, while nothing happens.
Is there someone who knows the probable reason?
Thanks
Here is the code of the loop:
loop ADD r9, r9, #1
;MUL r8, r7, r11 ;double the X
LSL r12, r7, #1
CMP r12, r4 ;CMP the X with P
BHI G1
BLS G0
it was solved. it occured because this is a consecutive shift and it should be shifted by 1 at each iteration. the written code copies the source into destination and then, shift by 1. So it is not modified after the 1st iteration. the solution is modifying r7 by LSL r7, #1. This is shifted r7 by 1 position at all iterations.
I have this program that i have to write in arm assembly to find the smallest element in an array. Normally this is a pretty easy thing to do in every programming language, but i just can't get my head around what i'm doing wrong in arm assembly. I'm a beginner in arm but i know my way around c. So I wrote the algorithm on how to find the smallest number in an array in c like this.
int minarray = arr[0];
for (int i =0; i < len; i++){
if (arr[i] < minarray){
minarray = arr[i];
}
It's easy and nothing special really.
Now i tried taking over the algorithm in arm almost the same. There are two things that have already been programmed from the beginning. The address of the first element is stored in register r0. The length of the array is stored in register r1. In the end, the smallest element must be stored back in register r0. Here is what i did:
This is almost the same algorithm as the one in c. First i load the first element into a new register r4. Now the first element is the smallest. Then once again, i load the first element in r8. I compare those two, if r8 <= r4, then copy the content of r8 to r4. After that (because i'm working with numbers of 32 bits) i add 4bytes to r0 to get on to the next element of the array. After that i subtract 1 from the array length to loop through the array until its below 0 to stop the program.
The feedback i'm getting from my testing function that was given to us to check if our program works says that it works partly. It says that it works for short arrays and arrays of length 0 but not for long arrays. I'm honestly lost. I think i'm making a really dumb mistake but i just cannot find it and i've been stuck at this easy problem for 3 days now but everything i have tried did not work or as i said, only worked "partly". I would really appreciate if someone could help me out.
This is the feedback that i get:
✗ min works with other numbers
✗ min works with a long array
✓ min works with a short array
✓ min tolerates size = 0
(x is for "it does not work", ✓ is for "it works")
So you see what i'm saying? i just do not understand how to implement the fact that its supposed to work with a longer array.
I'm not very good at ARM assembly by to my understanding R4 is expected to keep the value of minimum. R8 is used to keep the most recently fetched value from the input array.
The minimum is updated with this instruction:
MOVLE r8, r4
But it actually updated R8, not R4.
Try:
MOVLE r4, r8
EDIT
Other issue is using incorrect branch instruction:
SUBS r1, r1, #1
BPL loop1
works like:
r1 = r1 - 1
if (r1 >= 0) goto loop1;
For R1 equal to 1 the loop is exectured twice.
r1 = 1
... do stuff
r1 = r1 - 1 // r1 is 0 now
if (r1 >= 0) goto loop1; // 0>=0 TRUE!
... do stuff, overflow the input by indexing at `[r0 + 4]`
r1 = r1 - 1 // r1 is -1
if (r1 >= 0) goto loop1; // -1 >= 0 FALSE
// exit function
To fix it use branching only when input is non-zero.
BNE loop1
Coding in C use the correct types
You do not have to iterate from the index 0 only 1
int foo(const int *arr, size_t len)
{
int minarray = arr[0];
for (size_t i = 1; i < len; i++)
{
if (arr[i] < minarray)
{
minarray = arr[i];
}
}
return minarray;
}
And it generates this code:
foo:
mov r3, r0
subs r1, r1, #1
ldr r0, [r3], #4
beq .L1
.L3:
ldr r2, [r3], #4
cmp r0, r2
it ge
movge r0, r2
subs r1, r1, #1
bne .L3
.L1:
bx lr
I have a C code in my mind which I want to implement in ARM Programming Language.
The C code I have in my mind is something of this sort:
int a;
scanf("%d",&a);
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
What I have tried:
//arm equivalent of taking input to reg r0
//check for first condition
cmp r0,#1
moveq r0,#1
//if false
movne r0,#2
//check for second condition
cmp r0,#0
moveq r0,#1
Is this the correct way of implementing it?
Your code is broken for a=0 - single step through it in your head, or in a debugger, to see what happens.
Given this specific condition, it's equivalent to (unsigned)a <= 1U (because negative integer convert to huge unsigned values). You can do a single cmp and movls / movhi. Compilers already spot this optimization; here's how to ask a compiler to make asm for you so you can learn the tricks clever humans programmed into them:
int foo(int a) {
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
return a;
}
With ARM GCC10 -O3 -marm on the Godbolt compiler explorer:
foo:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
See How to remove "noise" from GCC/clang assembly output? for more about making functions that will have useful asm output. In this case, r0 is the first arg-passing register in the calling convention, and also the return-value register.
I also included another C version using if (a <= 1U) to show that it compiles to the same asm. (1U is an unsigned constant, so C integer promotion rules implicitly convert a to unsigned so the types match for the <= operator. You don't need to explicitly do (unsigned)a <= 1U.)
General case: not a single range
For a case like a==0 || a==3 that isn't a single range-check, you can predicate a 2nd cmp. (Godbolt)
foo:
cmp r0, #3 # sets Z if a was 3
cmpne r0, #0 # leaves Z unmodified if it was already set, else sets it according to a == 0
moveq r0, #1
movne r0, #2
bx lr
You can similarly chain && like a==3 && b==4, or for checks like a >= 3 && a <= 7 you can sub / cmp, using the same unsigned-compare trick as the 0 or 1 range check after sub maps a values into the 0..n range. See the Godbolt link for that.
No that does not work.
cmp r0,#1 is it a one
moveq r0,#1 yes, make it a one again?
movne r0,#2 otherwise make it a 2, what if it was a zero to start, now it is a 2
cmp r0,#0 at this point it is either a 1 or a 2 you forced it so it cannot be zero, what it started off is is now lost.
moveq r0,#1
You have the right concept but need to order things better.
following that line of thinking though
maybe use another register
x = 2;
if(a==0) x = 1;
if(a==1) x = 1;
a = x;
Ponder this
if(a==0) a = 1;
if(a!=1) a = 2;
Or as everyone else is going to say ask the compiler.
because of the or, test OR test, generically they need to be done separately the false condition of the first test does not mean the else condition you have to then do the other test before declaring false. But if true you need to hop over everything and not fall into the second test because that might (in this case will) be false...
As Peter points out you can use unsigned less than or equal and greater than conditions (even though in C it is a signed int, bits is bits).
LS Unsigned lower or same
HI Unsigned higher
Depending the ARM instruction sets is can be:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
or
cmp r0, #1
ite ls
movls r0, #1
movhi r0, #2
bx lr
Am I smarter than you? NO I simply use the compiler to compile the C code.
https://godbolt.org/z/dqxv64Eb9
I'm having trouble with an ARMv7 assembly function I'm trying to write. The function is below.
.global maxF32
// float maxF32(const float x[], uint32_t count)
// returns the maximum value in the array (x) containing count entries
/* R0 = float x[], R1 = uint32_t count */
maxF32:
CMP R1, #0 # SET FLAGS OF COUNT
BEQ maxF32_end # IF COUNT == 0, GO TO END
MOV R2, R0 # COPY R0 INTO R2
MOV R0, #0 # ZERO OUT R0
MOV R3, #0 # ZERO OUT R3
MOV S0, R0 # ZERO OUT S0
MOV S1, R0 # ZERO OUT S1
MOV S2, R0 # ZERO OUT S2
B maxF32_loop # BRANCH TO LOOP
maxF32_loop:
CMP R1, #0 # CHECK IF COUNT == 0
BEQ maxF32_end # IF SO, GO TO END
VLDR.F32 S0, [R2] # LOAD CURRENT ELEMENT INTO S0
ADD R2, R2, #4 # INCREMENT VECTOR POINTER
VLDR.F32 S1, [R2] # LOAD NEXT ELEMENT INTO S1
SUB R1, R1, #1 # DECREMENT COUNT
CMP S0, S1 # SET FLAGS; CHECK S0 - S1
BMI maxF32_update # IF RESULT IS NEGATIVE (S1 > S0), GO TO UPDATE GREATEST
B maxF32_loop # REPEAT LOOP
maxF32_update:
VMOV R3, R1, S1 # MOVE GREATER INTO GREATEST REGISTER
B maxF32_loop # GO BACK TO LOOP
maxF32_end:
MOV R0, R3 # COPY GREATEST INTO R0
BX LR # RETURN
I test with the following C code:
extern float maxF32(const float x[], uint32_t count);
#define COUNT 3
float x[] = {1.2, 1.3, 1.4};
float result;
result = maxF32(x, COUNT);
printf("result is %lf\n", result);
Using gdb, right around the second to last or last loop through the array, I get a "bus error" right around the CMP S0, S1 line. I sometimes also get an "expected comma" at MOV R3, R1, S1, which confuses me as I don't know where I would need an extra comma.
Last note, I am coding on a Raspberry Pi Model 3 B+.
Thanks for any and all help.
On a count of "3" you compare s0 to s1 3 times.... meaning you look for four elements in the array. (With n elements you really should only have n-1 comparisons)
The logic...
decrement count
cmp element0 element1 ... loop
decrement count
compare element1 element2 ... loop
decrement count
compare element2 with 'bus error' -- nothing there.
Moving SUB R1, R1, #1 to the first line after maxF32_loop: probably fixes it as it puts the decrement before the check for count being 0.
Need to convert the following C code into ARM assembly subroutine:
int power(int x, unsigned int n)
{
int y;
if (n == 0)
return 1;
if (n & 1)
return x * power(x, n - 1);
else
{ y = power(x, n >> 1);
return y * y;
}
}
Here's what I have so far but cant figure out how to get the link register to increment after each return (keeps looping back to the same point)
pow CMP r0, #0
MOVEQ r0, #1
BXEQ lr
TST r0, #1
BEQ skip
SUB r0, r0, #1
BL pow
MUL r0, r1, r0
BX lr
skip LSR r0, #1
BL pow
MUL r3, r0, r3
BX lr
The BL instruction does not automatically push or pop anything from the stack. This saves a memory access. It's the way it works with RISC processors (in part because they offer 30 general purpose registers.)
STR lr, [sp, #-4]! ; "PUSH lr"
BL pow
LDR lr, [sp], #4 ; "POP lr"
If you repeat a BL call, then you want to STR/LDR on the stack outside of the loop.