I'm new to arm programming and I am trying to understand what the following code does:
.macro set_bit reg_addr bit
ldr r4, =\reg_addr
ldr r5, [r4]
orr r5, #(1 << \bit)
str r5, [r4]
.endm
In particular I am confused about the orr r5,#(1<< \bit) part, I understand that orr stands for logical orr but I am not sure what that means in the given context. I think #(1<<\bit) is seeing if 1 is greater than the given bit, so that will return a true or false statement but I am not sure what the orr command will do.
You are right about the ORR instruction is used for performing logical OR operation. The instruction is used in the following format in the context of this question-
ORR {Register} {Constant}
Now, the constant here is (1 <<\bit) which basically means to left shift 1 by the \bit amount. Here bit is a number between 0-31 which decides which bit needs to be set. The ARM instruction set allows the constant to have such shifting operations as well.
Related
In arm assembly language, the instruction ADCS will add with condition flags C and set condition flags.
And the CMP instruction do the same things, so the condition flags will be recovered.
How can I solve it ?
This is my code, it is doing BCD adder with r0 and r1 :
ldr r8, =#0
ldr r9, =#15
adds r7, r8, #0
ADDLOOP:
and r4, r0, r9
and r5, r1, r9
adcs r6, r4, r5
orr r7, r6, r7
add r8, r8, #1
mov r9, r9, lsl #4
cmp r8, #3
bgt ADDEND
bl ADDLOOP
ADDEND:
mov r0, r7
I tried to save the state of condition flags, but I don't know how to do.
To save/restore the Carry flag, you could create a 0/1 integer in a register (perhaps with adc reg, zeroed_reg, #0?), then next iteration cmp reg, #1 or rsbs reg, reg, #1 to set the carry flag from it.
ARM can't materialize C as an integer 0/1 with a single instruction without any setup; compilers normally use movcs r0, #1 / movcc r0, #0 when not in a loop (Godbolt), but in a loop you'd probably want to zero a register once outside the loop instead of using two instructions predicated on carry-set / carry-clear.
Loop without modifying C
Use teq r8, #4 / bne ADDLOOP as the loop branch, like the bottom of a do{}while(r8 != 4).
Or count down from 4 with tst r8,r8 / bne ADDLOOP, using sub r8, #1 instead of add.
TEQ updates N and Z but not C or V flags. (Unless you use a shifted source operand, then it can update C). docs - unlike cmp, it sets flags like eors. The eq / ne conditions work the same: subtraction and XOR both produce zero when the inputs are equal, and non-zero in every other case. But teq doesn't even set C or V flags, and greater / less wouldn't be meaningful anyway.
This is what optimized BigInt code like GMP does, for example in its mpn_add_n function (source) which adds two bigint inputs (arrays of 32-bit chunks).
IDK why you were jumping forwards over a bl (branch-and-link) which sets lr as a return address. Don't do that, structure your asm loops like a do{}while() because it's more efficient, especially when the trip-count is known to be non-zero so you don't have to worry about running the loop zero times in some cases.
There are cbz/cbnz instructions (docs) that jump on a register being zero or non-zero without affecting flags, but they can only jump forwards (out of the loop, past an unconditional branch). They're also only available in Thumb mode, unlike teq which was probably specifically designed to give ARM an efficient way to write BigInt loops.
BCD adding
Your algorithm has bugs; you need base-10 carry, like 0x05 + 0x06 = 0x11 not 0x0b in packed BCD.
And even the binary Carry flag isn't set by something like 0x0005000 + 0x0007000; there's no carry-out from the high bit, only into the next nibble. Also, adc adds the carry-in at the bottom of the register, not at nibble your mask isolated.
So maybe you need to do something like subtract 0x000a000 from the sum (for that example shift position), because that will carry-out. (ARM sets C as a !borrow on subtraction, so maybe rsb reverse-subtract or swap the operands.)
NEON should make it possible to unpack to 8-bit elements (mask odd/even and interleave) and do all nibbles in parallel, but carry propagation is a problem; ARM doesn't have an efficient way to branch on SIMD vector conditions (unlike x86 pmovmskb). Just byte-shifting the vector and adding could generate further carries, as with 999999 + 1.
IDK if this can be cut down effectively with the same techniques hardware uses, like carry-select or carry-lookahead, but for 4-bit BCD digits with SIMD elements instead of single bits with hardware full-adders.
It's not worth doing for binary bigint because you can work in 32 or 64-bit chunks with the carry flag to help, but maybe there's something to gain when primitive hardware operations only do 4 bits at a time.
I understand how the SBC instruction in ARM works.
But, I don't seem to understand how it will be useful, as the intended answer is always less by 1.
Example:
MOV r1, #0x88
MOV r2, #0x44
SUB r3, r1, r2
SBC r4, r1, r2
After this operation, r3 has 0x44 (correct) and r4 has 0x43 (incorrect).
I don't see in which case SBC is a more relevant operation than SUB.
Thanks.
This operation is a substration that adds the carry (PSTATE.C) to the result:
r4 = r1 - r2 - (1-CPSR.C)
CPSR.NZCV has been set by a previous operation that sets flags (For example CMP orADDS).
This type of operation can be useful for large integer additions.
For example, in Aarch32 if you want to calculate a 64-bit addition, you add the 32-bit bottom bits (ADDS) then use ADDC to do the top 32-bit with carry propagation.
Environment: GCC 4.7.3 (arm-none-eabi-gcc) for ARM Cortex m4f. Bare-metal (actually MQX RTOS, but here that's irrelevant). The CPU is in Thumb state.
Here's a disassembler listing of some code I'm looking at:
//.label flash_command
// ...
while(!(FTFE_FSTAT & FTFE_FSTAT_CCIF_MASK)) {}
// Compiles to:
12: bf00 nop
14: f04f 0300 mov.w r3, #0
18: f2c4 0302 movt r3, #16386 ; 0x4002
1c: 781b ldrb r3, [r3, #0]
1e: b2db uxtb r3, r3
20: b2db uxtb r3, r3
22: b25b sxtb r3, r3
24: 2b00 cmp r3, #0
26: daf5 bge.n 14 <flash_command+0x14>
The constants (after expending macros, etc.) are:
address of FTFE_FSTAT is 0x40020000u
FTFE_FSTAT_CCIF_MASK is 0x80u
This is compiled with NO optimization (-O0), so GCC shouldn't be doing anything fancy... and yet, I don't get this code. Post-answer edit: Never assume this. My problem was getting a false sense of security from turning off optimization.
I've read that "uxtb r3,r3" is a common way of truncating a 32-bit value. Why would you want to truncate it twice and then sign-extend? And how in the world is this equivalent to the bit-masking operation in the C-code?
What am I missing here?
Edit: Types of the thing involved:
So the actual macro expansion of FTFE_FSTAT comes down to
((((FTFE_MemMapPtr)0x40020000u))->FSTAT)
where the struct is defined as
/** FTFE - Peripheral register structure */
typedef struct FTFE_MemMap {
uint8_t FSTAT; /**< Flash Status Register, offset: 0x0 */
uint8_t FCNFG; /**< Flash Configuration Register, offset: 0x1 */
//... a bunch of other uint_8
} volatile *FTFE_MemMapPtr;
The two uxtb instructions are the compiler being stupid, they should be optimized out if you turn on optimization. The sxtb is the compiler being brilliant, using a trick that you wouldn't expect in unoptimized code.
The first uxtb is due to the fact that you loaded a byte from memory. The compiler is zeroing the other 24 bits of register r3, so that the byte value fills the entire register.
The second uxtb is due to the fact that you're ANDing with an 8-bit value. The compiler realizes that the upper 24-bits of the result will always be zero, so it's using uxtb to clear the upper 24-bits.
Neither of the uxtb instructions does anything useful, because the sxtb instruction overwrites the upper 24 bits of r3 anyways. The optimizer should realize that and remove them when you compile with optimizations enabled.
The sxtb instruction takes the one bit you care about 0x80 and moves it into the sign bit of register r3. That way, if bit 0x80 is set, then r3 becomes a negative number. So now the compiler can compare with 0 to determine whether the bit was set. If the bit was not set then the bge instruction branches back to the top of the while loop.
So my problem is one I though was rather simple and I have an algorithm, but I can't seem to make it work using thumb-2 instructions.
Amway, I need to reverse the bits of r0, and I thought the easiest way to do this would be to Logically shift the number right into a temporary register and then shift that left into the result register. However LSL, LSR don't seem to allow you to store the shifted bit that is lost to the Most significant bit or least significant bit(while also shifting the bits of that register). Is there some part of the instruction I am miss understanding.
This is my ARM reference:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cjacbgca.html
The bit being shifted out can be copied into the C bit (carry flag) if you use the S suffix ("set flags"). And the RRX instruction uses C to set the bit 31 of the result. So you can probably do something like:
; 32 iterations
MOV R2, #32
; init result
MOV R1, #0
loop
; copy R0[31] into C and shift R0 to left
LSLS R0, R0, #1
; shift R1 to right and copy C into R1[31]
RRX R1, R1
; decrement loop counter
SUBS R2, #1
BNE loop
; copy result back to R0
MOV R0, R1
Note that this is a pretty slow way of reversing bits. If RBIT is available, you should use it, otherwise check some bit twiddling tricks.
How about using the rbit instruction? My copy of the ARMARM shows it having a Thumb-2 Encoding in ARMv6T2 and above.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjgdid.html
I'm currently messing around with an LPC 2378 which has an application board attached. On the fan there are several components, 2 of which are a fan and a heater.
If bits 6 and 7 of port 4 are connected to the fan (motor controller), the following code will turn on the fan:
FanOn
STMFD r13!,{r0,r5,r14} ; Push r0, r5 and LR
LDR R5, =FIO4PIN ; Address of FIO4PIN
LDR r0, [r5] ; Read current Port4
ORR r0, r0, #0x80
STR r0, [r5] ; Output
LDMFD r13!,{r0,r5,r14} ; Pop r0, r5 and LR
mov pc, r14 ; Put link register back into PC
How can I rewrite this block of code to turn on a heater connected to bit 5 of port 4, (Setting the bit to 1 will turn it on, setting it to 0 will turn it off).
Edit after answered question:
;==============================================================================
; Turn Heater On
;==============================================================================
heaterOn
STMFD r13!,{r0,r5,r14} ; Push r0, r5 and LR
LDR R5, =FIO4PIN ; Address of FIO4PIN
LDR r0, [r5] ; Read current Port4
ORR r0, r0, #0x20
STR r0, [r5] ; Output
LDMFD r13!,{r0,r5,r14} ; Pop r0, r5 and LR
mov pc, r14 ; Put link register back into PC
;==============================================================================
; Turn The Heater Off
;==============================================================================
heaterOff
STMFD r13!,{r0,r5,r14} ; Push r0, r5 and LR
LDR R5, =FIO4PIN ; Address of FIO4PIN
LDR r0, [r5] ; Read current Port4
AND r0, r0, #0xDF
STR r0, [r5] ; Output
LDMFD r13!,{r0,r5,r14} ; Pop r0, r5 and LR
mov pc, r14 ; Put link register back into PC
As best as I understand the code, the fan is connected only to bit 7 (if bits are numerated from 0).
The following line is responsible for turning the fan-bit on:
ORR r0, r0, #0x80
You are setting all the bits that are 1 in the "mask" to 1. Since the mask is 0x80, that is 1000 0000 in binary, it only turns on bit 7.
Now, if you need to turn on the heater instead of the fan, and you have to set bit 5 instead of 7 (on the same port), you only need to change the mask in that line. New mask should be 0010 0000 binary, that is 0x20 in hexa, so the new code should be:
ORR r0, r0, #0x20
Also, if you want to turn the heater off at some point later, you do it by unsetting only bit 5, by anding with a mask that has 1s everywhere except on bit 5. If the mnemonic for bitwise and is BIC, the line would be:
BIC r0, r0, 0xDF
Now, I have not done any assembly in months but if I am not very mistaken, the code snippet you gave is actually a subroutine. You would call it from you main functionality with something like call to the FanOn address. And, to me, it seems that the subroutine is nice in a way that it preserves all the registers it uses, e.g. it pushes them to a stack in the first line and recovers them in the end.
So, to re-use the code, you could just write a new subroutine for turning the heater on, and one for turning each thing off if you want, and only change the label/subroutine name for each one, e.g. FanOff, HeaterOn...
Since all of them preserve all the registers, you can use them sequentially without worries.
The ORR instruction turns ON a bit, the #0x80 constant determines the bit(s) (in this case, only bit 7 is turned on). To turn OFF the bit, you will need an AND instruction and compute the appropriate mask (e.g., to turn OFF bit 7, you would AND with constant #0x7F). The appropriate constants for bit 5 are #0x20 and #0xDF.