MOVS with LSL Carry Flag - arm

I'm working through the ARM System Developer's Guide and I am wondering about one of the examples in the book:
cpsr = nzcvqiFt_USER // capital indicates flag is set
r0 = 0x00000000
r1 = 0x80000004
MOVS r0, r1, LSL #1
cpsr = nzCvqiFt_USER // capital indicates flag is set
r0 = 0x00000008
r1 = 0x80000004
I understand that the 8 is carried over and the C flag is set due to this carry based on the logical shift left. Why is the 4 in r1 not shifted as well? Is the MOVS operation only moving the carried 8?

Because your result ends up in r0.
So, what happens is:
carry = r1[31] = 1
r0 = r1 << 1
That's why r0 becomes 0x00000008: the MSB of r1 (bit 31) got shifted out into the carry, and bit 2 (2^2 = 4) got shifted to bit 3 (2^3 = 8). r1 is only read as the source operand, so it keeps its value.
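To make the two steps above concrete, here is a minimal C sketch that models what MOVS with an LSL shift does to the destination and the carry flag. The names shift_result and movs_lsl are made up for illustration, and the model only covers shift amounts from 1 to 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of a flag-setting shift: the shifted value and the carry-out. */
typedef struct {
    uint32_t value;
    bool carry;
} shift_result;

/* Hypothetical helper modelling MOVS Rd, Rm, LSL #shift for shift in 1..31.
   The carry-out is the last bit shifted out, i.e. bit (32 - shift) of rm.
   rm itself is passed by value, just as r1 is only read by the instruction. */
shift_result movs_lsl(uint32_t rm, unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    shift_result r;
    r.carry = ((rm >> (32u - shift)) & 1u) != 0;
    r.value = (uint32_t)(rm << shift);
    return r;
}
```

With rm = 0x80000004 and shift = 1 this yields value 0x00000008 and carry set, matching the book's example.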


Compare two 64 bit variables on 32 bit microcontroller

I have the following issue: I have two 64 bit variables and they have to be compared as quick as possible, my Microcontroller is only 32bit.
My thoughts are that it is necessary to divide 64 bit variable into two 32 bit variables, like this
uint64_t var = 0xAAFFFFFFABCDELL;
hiPart = (uint32_t)((var & 0xFFFFFFFF00000000LL) >> 32);
loPart = (uint32_t)(var & 0xFFFFFFFFLL);
and then to compare hiParts and loParts, but I am sure that this approach is slow and there is much better solution
The first rule should be: write your program so that it is readable to a human.
When in doubt, don't assume anything, but measure it. Let's see what godbolt gives us.
#include <stdint.h>
#include <stdbool.h>
bool foo(uint64_t a, uint64_t b) {
return a == b;
}
bool foo2(uint64_t a, uint64_t b) {
uint32_t ahiPart = (uint32_t)((a & 0xFFFFFFFF00000000ULL) >> 32);
uint32_t aloPart = (uint32_t)(a & 0xFFFFFFFFULL);
uint32_t bhiPart = (uint32_t)((b & 0xFFFFFFFF00000000ULL) >> 32);
uint32_t bloPart = (uint32_t)(b & 0xFFFFFFFFULL);
return ahiPart == bhiPart && aloPart == bloPart;
}
foo:
eor r1, r1, r3
eor r0, r0, r2
orr r0, r0, r1
rsbs r1, r0, #0
adc r0, r0, r1
bx lr
foo2:
eor r1, r1, r3
eor r0, r0, r2
orr r0, r0, r1
rsbs r1, r0, #0
adc r0, r0, r1
bx lr
As you can see, they result in exactly the same assembly code, so you decide which one is less error-prone and easier to read.
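For readers wondering what that assembly actually computes, here is a rough C rendering of it. This is only a sketch: the name eq64_xor is made up, and the register comments assume the usual little-endian AAPCS convention (a in r0/r1, b in r2/r3, low word first). The final rsbs/adc pair turns "the combined difference is zero" into 1 and anything else into 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the eor/eor/orr/rsbs/adc sequence. */
bool eq64_xor(uint64_t a, uint64_t b)
{
    uint32_t hi = (uint32_t)(a >> 32) ^ (uint32_t)(b >> 32); /* eor r1, r1, r3 */
    uint32_t lo = (uint32_t)a ^ (uint32_t)b;                 /* eor r0, r0, r2 */
    uint32_t diff = lo | hi;                                 /* orr r0, r0, r1 */
    /* rsbs r1, r0, #0 sets C only when diff == 0; adc r0, r0, r1 then
       computes diff + (-diff) + C, which is just C. */
    return diff == 0;
}
```

Both halves are XORed and merged, so a single test against zero decides equality without any branch.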
Some years ago there was a time when you needed tricks to outsmart the compiler. But in 99.999% of cases the compiler will be smarter than you.
And your variables are unsigned. So use ULL instead of LL.
The fastest way is to let the compiler do it. Most compilers are much better than humans at micro-optimization.
uint64_t var = …, other_var = …;
if (var == other_var) …
There aren't many ways to go about it. Under the hood, the compiler will arrange to load the upper 32 bits and the lower 32 bits of each variable into registers, and compare the two registers that contain the upper 32 bits and the two registers that contain the lower 32 bits. The assembly code might look something like this:
load 32 bits from &var into r0
load 32 bits from &other_var into r1
if r0 != r1: goto different
load 32 bits from &var + 4 into r2
load 32 bits from &other_var + 4 into r3
if r2 != r3: goto different
// code for if-equal
different:
// code for if-not-equal
Here are some things the compiler knows better than you:
Which registers to use, based on the needs of the surrounding code.
Whether to reuse the same registers to compare the upper and lower parts, or to use different registers.
Whether to process one part and then the other (as above), or to load one variable then the other. The best order depends on the pressure on registers and on the memory access times and pipelining of the particular processor model.
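If you work with a union you can compare the hi and lo parts without any extra calculations. Note that the anonymous struct needs C11 (or a compiler extension), and that putting loPart first matches a little-endian target: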
typedef union
{
struct
{
uint32_t loPart;
uint32_t hiPart;
};
uint64_t complete;
}uint64T;
uint64T var = { .complete = 0xAAFFFFFFABCDEULL };

What is the use of SBC instruction in arm?

I understand how the SBC instruction in ARM works.
But, I don't seem to understand how it will be useful, as the intended answer is always less by 1.
Example:
MOV r1, #0x88
MOV r2, #0x44
SUB r3, r1, r2
SBC r4, r1, r2
After this operation, r3 has 0x44 (correct) and r4 has 0x43 (incorrect).
I don't see in which case SBC is a more relevant operation than SUB.
Thanks.
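This operation is a subtraction that also subtracts a borrow, which is the inverse of the carry flag (CPSR.C):
r4 = r1 - r2 - (1-CPSR.C)
CPSR.NZCV has been set by a previous operation that sets flags (for example CMP, SUBS or ADDS). In your example r4 is 0x43 only because C happened to be clear when SBC executed.
This type of operation is useful for arithmetic on integers wider than one register.
For example, in AArch32, if you want to calculate a 64-bit addition, you add the bottom 32 bits (ADDS) and then use ADC to add the top 32 bits with carry propagation. Likewise, for a 64-bit subtraction you subtract the bottom 32 bits (SUBS) and then use SBC on the top 32 bits so that the borrow is propagated.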
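As a sketch of that idea, the C model below applies the formula r1 - r2 - (1 - C) and chains it after a flag-setting subtraction to subtract two 64-bit values in 32-bit halves. The names alu_result, subs32, sbc32 and sub64 are invented for this illustration; on ARM, the carry flag after a subtraction is 1 when no borrow occurred.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value plus carry flag, as produced by a flag-setting instruction. */
typedef struct {
    uint32_t value;
    bool carry;
} alu_result;

/* Models SUBS: carry is set when a >= b, i.e. when there is no borrow. */
alu_result subs32(uint32_t a, uint32_t b)
{
    alu_result r = { (uint32_t)(a - b), a >= b };
    return r;
}

/* Models SBC: a - b - (1 - carry_in). */
uint32_t sbc32(uint32_t a, uint32_t b, bool carry_in)
{
    return (uint32_t)(a - b - (carry_in ? 0u : 1u));
}

/* 64-bit subtraction built from the two 32-bit steps. */
uint64_t sub64(uint64_t a, uint64_t b)
{
    alu_result lo = subs32((uint32_t)a, (uint32_t)b);                        /* SUBS */
    uint32_t hi = sbc32((uint32_t)(a >> 32), (uint32_t)(b >> 32), lo.carry); /* SBC  */
    return ((uint64_t)hi << 32) | lo.value;
}
```

With the carry set, sbc32(0x88, 0x44, true) gives 0x44, the same as SUB. With the carry clear it gives 0x43, which is exactly the borrow the upper word needs when the lower word wrapped around.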

Logical Orr ARM

I'm new to arm programming and I am trying to understand what the following code does:
.macro set_bit reg_addr bit
ldr r4, =\reg_addr
ldr r5, [r4]
orr r5, #(1 << \bit)
str r5, [r4]
.endm
In particular I am confused about the orr r5,#(1<< \bit) part, I understand that orr stands for logical orr but I am not sure what that means in the given context. I think #(1<<\bit) is seeing if 1 is greater than the given bit, so that will return a true or false statement but I am not sure what the orr command will do.
You are right that the ORR instruction performs a logical (bitwise) OR operation. In the context of this question it is used in the following format:
ORR {Register}, {Constant}
Now, the constant here is (1 << \bit), which means 1 shifted left by \bit positions, not a comparison. Here bit is a number between 0 and 31 which decides which bit needs to be set. The assembler evaluates this constant expression at assembly time, so the instruction only sees the resulting immediate value (for example 0x8 when bit is 3).
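In C terms, the macro is a plain read-modify-write of a memory-mapped register. The function below is a hypothetical counterpart written only to illustrate the three instructions; the name set_bit is reused from the macro, and the pointer parameter stands in for reg_addr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C counterpart of the set_bit assembler macro. */
void set_bit(volatile uint32_t *reg_addr, unsigned bit)
{
    assert(bit < 32);
    uint32_t value = *reg_addr;      /* ldr r5, [r4]          */
    value |= (uint32_t)1 << bit;     /* orr r5, #(1 << \bit)  */
    *reg_addr = value;               /* str r5, [r4]          */
}
```

ORing with a mask that has a single 1 forces that bit to 1 and leaves every other bit unchanged, so calling it twice with the same bit has no further effect.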

How does this disassembly correspond to the given C code?

Environment: GCC 4.7.3 (arm-none-eabi-gcc) for ARM Cortex-M4F. Bare-metal (actually MQX RTOS, but here that's irrelevant). The CPU is in Thumb state.
Here's a disassembler listing of some code I'm looking at:
//.label flash_command
// ...
while(!(FTFE_FSTAT & FTFE_FSTAT_CCIF_MASK)) {}
// Compiles to:
12: bf00 nop
14: f04f 0300 mov.w r3, #0
18: f2c4 0302 movt r3, #16386 ; 0x4002
1c: 781b ldrb r3, [r3, #0]
1e: b2db uxtb r3, r3
20: b2db uxtb r3, r3
22: b25b sxtb r3, r3
24: 2b00 cmp r3, #0
26: daf5 bge.n 14 <flash_command+0x14>
The constants (after expanding macros, etc.) are:
address of FTFE_FSTAT is 0x40020000u
FTFE_FSTAT_CCIF_MASK is 0x80u
This is compiled with NO optimization (-O0), so GCC shouldn't be doing anything fancy... and yet, I don't get this code. Post-answer edit: Never assume this. My problem was getting a false sense of security from turning off optimization.
I've read that "uxtb r3,r3" is a common way of truncating a 32-bit value. Why would you want to truncate it twice and then sign-extend? And how in the world is this equivalent to the bit-masking operation in the C-code?
What am I missing here?
Edit: Types of the thing involved:
So the actual macro expansion of FTFE_FSTAT comes down to
((((FTFE_MemMapPtr)0x40020000u))->FSTAT)
where the struct is defined as
/** FTFE - Peripheral register structure */
typedef struct FTFE_MemMap {
uint8_t FSTAT; /**< Flash Status Register, offset: 0x0 */
uint8_t FCNFG; /**< Flash Configuration Register, offset: 0x1 */
//... a bunch of other uint8_t
} volatile *FTFE_MemMapPtr;
The two uxtb instructions are the compiler being stupid; they should be optimized out if you turn on optimization. The sxtb is the compiler being brilliant, using a trick that you wouldn't expect in unoptimized code.
The first uxtb is due to the fact that you loaded a byte from memory. The compiler is zeroing the other 24 bits of register r3, so that the byte value fills the entire register.
The second uxtb is due to the fact that you're ANDing with an 8-bit value. The compiler realizes that the upper 24-bits of the result will always be zero, so it's using uxtb to clear the upper 24-bits.
Neither of the uxtb instructions does anything useful, because the sxtb instruction overwrites the upper 24 bits of r3 anyway. The optimizer should realize that and remove them when you compile with optimizations enabled.
The sxtb instruction takes the one bit you care about, 0x80, and copies it into the sign bit of register r3. That way, if bit 0x80 is set, r3 becomes a negative number. So now the compiler can compare with 0 to determine whether the bit was set. If the bit was not set, r3 is greater than or equal to 0 and the bge instruction branches back to the top of the while loop.
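The equivalence behind the sxtb trick can be written down in C. The two functions below are illustrative only (their names are invented): one tests the mask the way the source code does, the other reinterprets the byte as signed and checks its sign, which is what sxtb followed by cmp/bge amounts to. Converting an out-of-range value to int8_t is implementation-defined in ISO C; the sketch assumes GCC's documented behaviour of reducing the value modulo 2^8.

```c
#include <stdbool.h>
#include <stdint.h>

/* The test as written in the C source: is bit 7 (mask 0x80) set? */
bool ccif_set_mask(uint8_t fstat)
{
    return (fstat & 0x80u) != 0;
}

/* The test as compiled: sign-extend the byte and check for a negative value.
   Assumes the uint8_t -> int8_t conversion wraps (GCC behaviour). */
bool ccif_set_sign(uint8_t fstat)
{
    return (int8_t)fstat < 0;
}
```

Under that assumption, both functions agree for all 256 possible byte values, which is why the compiler is free to pick the second form.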

Reversing bits in a register Thumb-2

So my problem is one I thought was rather simple, and I have an algorithm, but I can't seem to make it work using Thumb-2 instructions.
Anyway, I need to reverse the bits of r0, and I thought the easiest way to do this would be to logically shift the number right into a temporary register and then shift that left into the result register. However, LSL and LSR don't seem to allow you to store the bit that is shifted out into the most significant or least significant bit of another register (while also shifting the bits of that register). Is there some part of the instruction I am misunderstanding?
This is my ARM reference:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cjacbgca.html
The bit being shifted out can be copied into the C bit (carry flag) if you use the S suffix ("set flags"). And the RRX instruction uses C to set the bit 31 of the result. So you can probably do something like:
; 32 iterations
MOV R2, #32
; init result
MOV R1, #0
loop
; copy R0[31] into C and shift R0 to left
LSLS R0, R0, #1
; shift R1 to right and copy C into R1[31]
RRX R1, R1
; decrement loop counter
SUBS R2, #1
BNE loop
; copy result back to R0
MOV R0, R1
Note that this is a pretty slow way of reversing bits. If RBIT is available, you should use it, otherwise check some bit twiddling tricks.
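For comparison, here is a C sketch of both approaches: a loop that mirrors the LSLS/RRX sequence above one bit at a time, and one common bit-twiddling variant that swaps progressively larger groups of bits. The function names are made up for this example, and the swap version is just one of several known tricks, not necessarily the one the answer had in mind.

```c
#include <stdint.h>

/* Bit-by-bit version mirroring the LSLS / RRX loop. */
uint32_t reverse_bits_loop(uint32_t x)
{
    uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        uint32_t carry = x >> 31;                /* LSLS: bit 31 goes to C   */
        x <<= 1;
        result = (result >> 1) | (carry << 31);  /* RRX: C enters at bit 31  */
    }
    return result;
}

/* Swap-based version: exchange neighbours, then pairs, nibbles, bytes, halves. */
uint32_t reverse_bits_swap(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    x = (x >> 16) | (x << 16);
    return x;
}
```

The loop needs 32 iterations, while the swap version finishes in five steps regardless of the input.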
How about using the rbit instruction? My copy of the ARMARM shows it having a Thumb-2 Encoding in ARMv6T2 and above.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjgdid.html
