I'm writing an ARM assembly program in which I have to get the bits of a register. My idea is to do a logical shift left and then look at the value of the C flag. The problem is that if the bit shifted out is 0, the C flag is unchanged.
So how do I set C to 0?
My program contains:
lsl r1, r1, #1
bcs bit_1
b bit_0
Maybe there is a better solution.
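The shift-and-test idea can be modelled in C as a rough sketch. It assumes the flag-setting form of the shift (lsls in unified ARM syntax; the plain lsl form leaves the flags alone), and the function names shift_out_msb and bits_msb_first are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models "lsls r1, r1, #1": returns the bit shifted out of bit 31
   (the value the C flag would hold) and stores the shifted value back. */
unsigned shift_out_msb(uint32_t *reg)
{
    assert(reg != 0);
    unsigned carry = (unsigned)(*reg >> 31);
    *reg <<= 1;
    return carry;
}

/* Collects the 32 bits of value, most significant first, into out[0..31]. */
void bits_msb_first(uint32_t value, unsigned out[32])
{
    for (int i = 0; i < 32; i++)
        out[i] = shift_out_msb(&value);
}
```

Each call yields either 0 or 1, so the "bit was 0" case is reported just as reliably as the "bit was 1" case.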
In ARM assembly language, the ADCS instruction adds with the carry flag C and sets the condition flags.
The CMP instruction also sets the condition flags, so the carry produced by ADCS is overwritten.
How can I solve this?
This is my code; it implements a BCD adder on r0 and r1:
ldr r8, =#0
ldr r9, =#15
adds r7, r8, #0
ADDLOOP:
and r4, r0, r9
and r5, r1, r9
adcs r6, r4, r5
orr r7, r6, r7
add r8, r8, #1
mov r9, r9, lsl #4
cmp r8, #3
bgt ADDEND
bl ADDLOOP
ADDEND:
mov r0, r7
I tried to save the state of the condition flags, but I don't know how to do it.
To save/restore the Carry flag, you could create a 0/1 integer in a register (perhaps with adc reg, zeroed_reg, #0?), then next iteration cmp reg, #1 or rsbs reg, reg, #1 to set the carry flag from it.
ARM can't materialize C as an integer 0/1 with a single instruction without any setup; compilers normally use movcs r0, #1 / movcc r0, #0 when not in a loop (Godbolt), but in a loop you'd probably want to zero a register once outside the loop instead of using two instructions predicated on carry-set / carry-clear.
Loop without modifying C
Use teq r8, #4 / bne ADDLOOP as the loop branch, like the bottom of a do{}while(r8 != 4).
Or count down from 4 with tst r8,r8 / bne ADDLOOP, using sub r8, #1 instead of add.
TEQ updates N and Z but not the C or V flags (unless you use a shifted source operand, in which case it can update C). According to the docs, unlike cmp, it sets flags like eors. The eq / ne conditions work the same: subtraction and XOR both produce zero when the inputs are equal, and non-zero in every other case. Greater / less conditions wouldn't be meaningful after teq anyway.
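The claim that eq / ne behave the same after a subtraction and after an XOR can be checked with a small C model. This is only a sketch of the Z flag; the helper names z_after_cmp and z_after_teq are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Z flag after "cmp a, b": set when a - b is zero. */
bool z_after_cmp(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) == 0;
}

/* Z flag after "teq a, b": set when a XOR b is zero. */
bool z_after_teq(uint32_t a, uint32_t b)
{
    return (a ^ b) == 0;
}
```

Both helpers return true exactly when a == b, which is why bne after teq r8, #4 ends the loop on the same iteration as bne after cmp r8, #4 would.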
This is what optimized BigInt code like GMP does, for example in its mpn_add_n function (source) which adds two bigint inputs (arrays of 32-bit chunks).
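As a rough picture of what such a multi-word add computes, here is a C sketch. It is not GMP's code: add_limbs is a hypothetical name, and the carry is kept in an ordinary integer between iterations, which is the job the untouched C flag does in the asm loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers stored least significant limb first.
   Returns the carry out of the top limb (0 or 1). */
uint32_t add_limbs(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Like adcs: limb + limb + carry-in, computed in 64 bits. */
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        dst[i] = (uint32_t)sum;
        carry = (uint32_t)(sum >> 32);   /* carry-out for the next limb */
    }
    return carry;
}
```

The loop counter and the loop condition never touch carry here, which mirrors a loop branch built from teq rather than cmp.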
IDK why you were jumping forwards over a bl (branch-and-link), which sets lr as a return address. Don't do that; structure your asm loops like a do{}while() because it's more efficient, especially when the trip-count is known to be non-zero so you don't have to worry about running the loop zero times in some cases.
There are cbz/cbnz instructions (docs) that jump on a register being zero or non-zero without affecting flags, but they can only jump forwards (out of the loop, past an unconditional branch). They're also only available in Thumb mode, unlike teq which was probably specifically designed to give ARM an efficient way to write BigInt loops.
BCD adding
Your algorithm has bugs; you need base-10 carry, like 0x05 + 0x06 = 0x11 not 0x0b in packed BCD.
And even the binary Carry flag isn't set by something like 0x0005000 + 0x0007000; there's no carry-out from the high bit, only into the next nibble. Also, adc adds the carry-in at the bottom of the register, not at the nibble your mask isolated.
So maybe you need to do something like subtract 0x000a000 from the sum (for that example shift position), because that will carry-out. (ARM sets C as a !borrow on subtraction, so maybe rsb reverse-subtract or swap the operands.)
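To make the base-10 carry concrete, here is a digit-by-digit packed-BCD add in C. It is a sketch, not a translation of the asm above: bcd_add is a hypothetical name, it assumes both inputs contain only valid digits (0 to 9 in every nibble), it handles 8 digits in a 32-bit word, and it drops any carry out of the top digit.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two packed-BCD values, 8 digits each, one nibble at a time. */
uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    uint32_t carry = 0;                  /* decimal carry into the next digit */
    for (int shift = 0; shift < 32; shift += 4) {
        uint32_t digit = ((a >> shift) & 0xFu) + ((b >> shift) & 0xFu) + carry;
        carry = digit > 9;               /* base-10 carry, not the binary C flag */
        if (carry)
            digit -= 10;                 /* the subtract-10 correction step */
        result |= digit << shift;
    }
    return result;
}
```

With this, 0x05 + 0x06 gives 0x11, and 0x999999 + 0x1 ripples the carry through every digit to give 0x1000000.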
NEON should make it possible to unpack to 8-bit elements (mask odd/even and interleave) and do all nibbles in parallel, but carry propagation is a problem; ARM doesn't have an efficient way to branch on SIMD vector conditions (unlike x86 pmovmskb). Just byte-shifting the vector and adding could generate further carries, as with 999999 + 1.
IDK if this can be cut down effectively with the same techniques hardware uses, like carry-select or carry-lookahead, but for 4-bit BCD digits with SIMD elements instead of single bits with hardware full-adders.
It's not worth doing for binary bigint because you can work in 32 or 64-bit chunks with the carry flag to help, but maybe there's something to gain when primitive hardware operations only do 4 bits at a time.
I have some C code in mind that I want to implement in ARM assembly.
The C code is something like this:
int a;
scanf("%d",&a);
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
What I have tried:
//arm equivalent of taking input to reg r0
//check for first condition
cmp r0,#1
moveq r0,#1
//if false
movne r0,#2
//check for second condition
cmp r0,#0
moveq r0,#1
Is this the correct way of implementing it?
Your code is broken for a=0 - single step through it in your head, or in a debugger, to see what happens.
Given this specific condition, it's equivalent to (unsigned)a <= 1U (because negative integers convert to huge unsigned values). You can do a single cmp and movls / movhi. Compilers already spot this optimization; here's how to ask a compiler to make asm for you so you can learn the tricks clever humans programmed into them:
int foo(int a) {
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
return a;
}
With ARM GCC10 -O3 -marm on the Godbolt compiler explorer:
foo:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
See How to remove "noise" from GCC/clang assembly output? for more about making functions that will have useful asm output. In this case, r0 is the first arg-passing register in the calling convention, and also the return-value register.
I also included another C version using if (a <= 1U) to show that it compiles to the same asm. (1U is an unsigned constant, so C integer promotion rules implicitly convert a to unsigned so the types match for the <= operator. You don't need to explicitly do (unsigned)a <= 1U.)
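A sketch of what such a second version might look like next to the original, so the two can be compared directly. The names foo_or and foo_unsigned are hypothetical; only the conditions come from the text above.

```c
#include <assert.h>

/* The original form: two equality tests joined by ||. */
int foo_or(int a)
{
    return (a == 0 || a == 1) ? 1 : 2;
}

/* The range-check form: a is converted to unsigned for the comparison,
   so negative values become huge and fail the test. */
int foo_unsigned(int a)
{
    return (a <= 1U) ? 1 : 2;
}
```

Both return 1 for 0 and 1, and 2 for everything else, including negative inputs such as -1.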
General case: not a single range
For a case like a==0 || a==3 that isn't a single range-check, you can predicate a 2nd cmp. (Godbolt)
foo:
cmp r0, #3 # sets Z if a was 3
cmpne r0, #0 # leaves Z unmodified if it was already set, else sets it according to a == 0
moveq r0, #1
movne r0, #2
bx lr
You can similarly chain && like a==3 && b==4, or for checks like a >= 3 && a <= 7 you can sub / cmp, using the same unsigned-compare trick as the 0 or 1 range check after sub maps a values into the 0..n range. See the Godbolt link for that.
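The sub / cmp range check can be written out in C as follows. This is a sketch with hypothetical function names; the subtraction is done in unsigned arithmetic so that values below 3 wrap around to large numbers instead of overflowing.

```c
#include <assert.h>

/* Straightforward form: two signed comparisons. */
int in_range_plain(int a)
{
    return a >= 3 && a <= 7;
}

/* Trick form: subtract the lower bound, then one unsigned compare
   against the width of the range (7 - 3 = 4). */
int in_range_trick(int a)
{
    return (unsigned)a - 3u <= 4u;
}
```

Values 3..7 map to 0..4 and pass; 2 maps to a huge unsigned value and fails, just like any value above 7.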
No, that does not work.
cmp r0,#1 // is it a one?
moveq r0,#1 // yes, make it a one again?
movne r0,#2 // otherwise make it a 2; what if it was a zero to start? now it is a 2
cmp r0,#0 // at this point it is either a 1 or a 2; you forced it, so it cannot be zero. What it started off as is now lost.
moveq r0,#1
You have the right concept but need to order things better.
Following that line of thinking, though, maybe use another register:
x = 2;
if(a==0) x = 1;
if(a==1) x = 1;
a = x;
Ponder this
if(a==0) a = 1;
if(a!=1) a = 2;
Or as everyone else is going to say ask the compiler.
Because of the OR (test OR test), the two tests generally need to be done separately. The false condition of the first test does not mean the else branch: you then have to do the other test before declaring false. But if the first test is true, you need to hop over everything and not fall into the second test, because that might (in this case will) be false.
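The variants discussed so far can be put side by side in C to see which ones agree. This is a sketch with hypothetical function names; asker_version models the posted instruction sequence step by step, using a saved comparison result in place of the flags.

```c
#include <assert.h>

/* What the C code in the question asks for. */
int wanted(int a) { return (a == 0 || a == 1) ? 1 : 2; }

/* The "use another register" variant. */
int with_temp(int a)
{
    int x = 2;
    if (a == 0) x = 1;
    if (a == 1) x = 1;
    return x;
}

/* The "ponder this" variant, reusing a. */
int in_place(int a)
{
    if (a == 0) a = 1;
    if (a != 1) a = 2;
    return a;
}

/* Model of the posted asm: cmp #1 / moveq / movne / cmp #0 / moveq. */
int asker_version(int a)
{
    int was_one = (a == 1);     /* cmp r0,#1 */
    if (was_one) a = 1;         /* moveq r0,#1 */
    if (!was_one) a = 2;        /* movne r0,#2 */
    if (a == 0) a = 1;          /* cmp r0,#0 / moveq r0,#1: never true here */
    return a;
}
```

Only asker_version disagrees with the others, and only for an input of 0, where it returns 2 instead of 1.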
As Peter points out you can use unsigned less than or equal and greater than conditions (even though in C it is a signed int, bits is bits).
LS Unsigned lower or same
HI Unsigned higher
Depending on the ARM instruction set, it can be:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
or
cmp r0, #1
ite ls
movls r0, #1
movhi r0, #2
bx lr
Am I smarter than you? No, I simply use the compiler to compile the C code.
https://godbolt.org/z/dqxv64Eb9
I'm new to arm programming and I am trying to understand what the following code does:
.macro set_bit reg_addr bit
ldr r4, =\reg_addr
ldr r5, [r4]
orr r5, #(1 << \bit)
str r5, [r4]
.endm
In particular I am confused about the orr r5, #(1 << \bit) part. I understand that orr stands for logical OR, but I am not sure what that means in the given context. I think #(1 << \bit) checks whether 1 is greater than the given bit, so it will return true or false, but I am not sure what the orr command will do.
You are right that the ORR instruction performs a logical OR operation. In the context of this question it is used in the following format:
ORR {Register}, {Constant}
The constant here is (1 << \bit), which means 1 shifted left by \bit positions; << is a shift, not a comparison. Here bit is a number between 0 and 31 that decides which bit needs to be set: ORing r5 with this one-bit mask sets that bit and leaves all the others unchanged. The assembler evaluates the shift expression when it expands the macro, so the instruction itself just carries the resulting constant.
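The whole macro is a read-modify-write, which can be sketched in C like this. The function is only a model of the macro; the volatile pointer stands in for a memory-mapped register and is an assumption about how reg_addr is used.

```c
#include <assert.h>
#include <stdint.h>

/* C model of set_bit: load the word, OR in a one-bit mask, store it back. */
void set_bit(volatile uint32_t *reg_addr, unsigned bit)
{
    assert(bit < 32);
    uint32_t value = *reg_addr;       /* ldr r5, [r4] */
    value |= UINT32_C(1) << bit;      /* orr r5, #(1 << bit) */
    *reg_addr = value;                /* str r5, [r4] */
}
```

For example, starting from 0x10 and setting bit 3 gives 0x18; setting bit 3 again changes nothing, because OR with a bit that is already 1 leaves it at 1.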
So my problem is one I thought was rather simple, and I have an algorithm, but I can't seem to make it work using Thumb-2 instructions.
Anyway, I need to reverse the bits of r0, and I thought the easiest way to do this would be to logically shift the number right into a temporary register and then shift that left into the result register. However, LSL and LSR don't seem to allow you to store the shifted-out bit into the most significant or least significant bit of another register (while also shifting the bits of that register). Is there some part of the instruction I am misunderstanding?
This is my ARM reference:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cjacbgca.html
The bit being shifted out can be copied into the C bit (carry flag) if you use the S suffix ("set flags"). The RRX instruction shifts right by one and uses C to set bit 31 of the result. So you can probably do something like:
; 32 iterations
MOV R2, #32
; init result
MOV R1, #0
loop
; copy R0[31] into C and shift R0 to left
LSLS R0, R0, #1
; shift R1 to right and copy C into R1[31]
RRX R1, R1
; decrement loop counter
SUBS R2, #1
BNE loop
; copy result back to R0
MOV R0, R1
Note that this is a pretty slow way of reversing bits. If RBIT is available, you should use it, otherwise check some bit twiddling tricks.
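For comparison, here is a C sketch of both approaches: a direct model of the 32-iteration loop above, and one common bit-twiddling alternative that swaps progressively larger groups. The function names are hypothetical, and the swap version is just one well-known trick, not necessarily the one the answer had in mind.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the LSLS / RRX loop: move one bit per iteration via a carry. */
uint32_t reverse_bits_loop(uint32_t x)
{
    uint32_t result = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t carry = x >> 31;                 /* LSLS: bit 31 goes to C */
        x <<= 1;
        result = (result >> 1) | (carry << 31);   /* RRX: C enters at bit 31 */
    }
    return result;
}

/* Swap adjacent bits, then pairs, nibbles, bytes, and finally halfwords. */
uint32_t reverse_bits_swap(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}
```

The swap version needs five steps instead of 32 iterations, which is where the speed difference comes from.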
How about using the rbit instruction? My copy of the ARMARM shows it having a Thumb-2 Encoding in ARMv6T2 and above.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjgdid.html
Using the -S flag in gcc, I created an assembly file from my C code in order to better understand how memory is used. Here is some assembly from the top of the main function:
main:
mov r1, r4 ; FP = SP
add #2, r4 ; FP += 2
add #llo(-14), r1 ; SP -= 14 ?
mov #llo(-16), r15 ; ???
add r4, r15 ; r15 += FP
add #4, r15
Comments were placed by me as I tried to dissect what is happening. My question is about the use of the #llo macro, how the memory on the stack is being used, and lastly what is going into r15.
For context, I have variables, including a structure, placed on the stack at the beginning of main that take up 14 bytes (7 16-bit words). What I don't understand is what the #llo macro is and what r15 is being used for. I know r4 is the frame pointer and r1 is the stack pointer.
The llo macro returns the lower 16 bits of its argument. I guess the compiler needs it to avoid overflow when using a negative number (or the compiler is lazy).
It looks like the code computes the location of some object in R15. It's hard to tell with only part of the code... Also, if R4 isn't used further in the function, this code could be optimized a lot.
The line add #llo(-14), r1 allocates space on the stack.
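Why adding the low 16 bits of -14 allocates 14 bytes can be seen from a small C model of 16-bit register arithmetic. This is a sketch only: llo here is a hypothetical C stand-in for the assembler macro, and add16 models a 16-bit add that wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the llo macro: keep only the lower 16 bits. */
uint16_t llo(int32_t value)
{
    return (uint16_t)((uint32_t)value & 0xFFFFu);
}

/* 16-bit register add: the result wraps modulo 65536. */
uint16_t add16(uint16_t reg, uint16_t imm)
{
    return (uint16_t)(reg + imm);
}
```

llo(-14) is 0xFFF2, and adding 0xFFF2 to a 16-bit stack pointer of 0x0400 wraps around to 0x03F2, which is 14 bytes lower.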
It would be interesting to see what other compilers do with code like this (gcc for the MSP430 isn't really state of the art).