How can I represent this if statement as an ARM instruction? - arm

if (R1 > R2 || R3 > R4)
R1 = R1 + 1
I'm having trouble figuring out what to do with multiple conditions. Would the instructions look something like this?
CMP R1, R2
CMPNE R3, R4
ADDEQ R1, R1, #1

if (R1 > R2 || R3 > R4)
R1 = R1 + 1
The OR condition makes this tricky because it needs short-circuit evaluation. What you can do is make the second comparison conditional, so that it is evaluated only if r1 is NOT greater than r2, and then have the add instruction test the flags left by whichever comparison ran last. Note that HI and LS are unsigned conditions; if the registers hold signed values, use GT and LE instead.
cmp r1, r2 ; compare r1 and r2
cmpls r3, r4 ; if lower or same, compare r3 and r4
addhi r1, r1, #1 ; if higher, add 1
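To convince yourself that the three-instruction sequence really matches the C condition, here is a rough C model of it. This is only a sketch: the names flags_t, cmp_flags, cond_hi, predicated_or and reference_or are made up for illustration, and only the C and Z flags are modeled because those are the ones HI and LS look at.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags that matter for HI/LS after CMP a, b: C = no borrow, Z = equal. */
typedef struct { bool c; bool z; } flags_t;

static flags_t cmp_flags(uint32_t a, uint32_t b) {
    flags_t f = { a >= b, a == b };
    return f;
}

/* HI (unsigned higher) means C set and Z clear; LS is its inverse. */
static bool cond_hi(flags_t f) { return f.c && !f.z; }

/* Model of: cmp r1, r2 / cmpls r3, r4 / addhi r1, r1, #1 */
uint32_t predicated_or(uint32_t r1, uint32_t r2, uint32_t r3, uint32_t r4) {
    flags_t f = cmp_flags(r1, r2);              /* cmp   r1, r2 */
    if (!cond_hi(f)) f = cmp_flags(r3, r4);     /* cmpls r3, r4 */
    if (cond_hi(f)) r1 += 1;                    /* addhi r1, r1, #1 */
    return r1;
}

/* The original if statement, with unsigned operands. */
uint32_t reference_or(uint32_t r1, uint32_t r2, uint32_t r3, uint32_t r4) {
    if (r1 > r2 || r3 > r4) r1 = r1 + 1;
    return r1;
}
```

Calling both functions with the same inputs should always give the same result. If the first comparison already says "higher", the second one is skipped and the add sees the flags from the first, which is exactly the short circuit.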

Related

ARM assembly program would not store the result in register 0

I need to write an ARM assembly program that "iteratively" sums up each integer element of array_D multiplied by 4, plus the next element, looping until the end of the array, which is signaled by 0. In each iteration, the current element x of array_D is replaced with the new result (x * 4 + [next element]). For instance, for array_D = [2020, -97, 2441, -11, 0], the final result stored in r0 should be (2020 * 4 - 97) + (-97 * 4 + 2441) + (2441 * 4 - 11) + (-11 * 4 + 0) = 19745, with array_D updated to [7983, 2053, 9753, -44, 0]. When it reaches the end of the array, the program exits the loop and terminates with the sum stored in register r0. Here is the program I've written:
# File : simple2.s------------------------
.data
array_D: .word 2020, -97, 2441, -11, 0
.text
.global main
main:
LDR r1,=array_D # load base addr. of array_D into r1
MOV r2, #0 # r2 as the array pointer
loop:
LDR r3, [r1,r2] # r3 as the array element
LSL r3, r3, #2 # multiply r3 by 4
MOV r4, r2 # copy r2 to r4
ADD r2, r2, #4 # r2 points to next element
LDR r5, [r1,r2] # r5 as the next element
ADD r3, r3, r5 # add r5 to r3
ADD r0, r0, r3 # sum the new element to r0 (change this to ADD r6, r6, r3 and it worked)
STR r3, [r1,r4] # store the new value to the array_D
CMP r5, #0 # test if the next element is 0
BNE loop # loop if not 0
SWI 0x11
.end
However, this program leaves zero in r0 instead of the final result (register view 1). When I used r6 to hold the result, it worked (register view 2), but the assignment requires the result to be in r0. What is wrong with my program?
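For checking the expected numbers independently of the simulator, a small C reference of the same loop may help. This is a sketch, not a diagnosis: sum_and_update is a made-up name, the array is assumed to hold at least one element before the 0 terminator (same as the assembly), and total is set to 0 explicitly, whereas the listing never writes r0 before accumulating into it (which may or may not be related to the symptom).

```c
#include <stddef.h>
#include <stdint.h>

/* Walks a 0-terminated array, replaces each element x with x * 4 + next,
   and returns the sum of the replaced values. */
int32_t sum_and_update(int32_t *a) {
    int32_t total = 0;              /* plays the role of r0, zeroed explicitly */
    int32_t next;
    size_t i = 0;
    do {
        next = a[i + 1];            /* r5: next element, read before the store */
        a[i] = a[i] * 4 + next;     /* r3: x * 4 + next, stored back */
        total += a[i];
        i++;
    } while (next != 0);            /* stop once the terminator has been used */
    return total;
}
```

With array_D = [2020, -97, 2441, -11, 0] this returns 19745 and leaves [7983, 2053, 9753, -44, 0] in the array, matching the values in the assignment text.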
register view 1
register view 2

How to write a If, Else if in assembly

I need to convert the following C code to its equivalent in Assembly. I've only taken a few classes on assembly and don't really have a grasp of the language yet.
int x = 45;
int y = 27;
while (x != y) {
if (x > y)
x = x - y;
else if (y > x)
y = y - x;
}
return x; // Sends exit code containing GCD
I have written what I believe will work and am about to use a debugger to find the inevitable flaws, but I wanted to ask whether I am headed in the right direction with the if/else statement.
.global _start
_start
mov R1, #45 #R1 = 45
mov R2, #27 #R2 = 27
loopTop:
cmp R1, R2
beq allDone
bge R1,R2
sub R1,R1,R0
bge R2,R1
sub R2,R2,R1
b loopTop
allDone:
SWI R1
Any help/tips would be greatly appreciated!
Thanks!
Here's your solution:
mov R1, #45 #x
mov R2, #27 #y
loop:
cmp R1, R2 #compare R1 with R2
bne label1 #is R1 not equal to R2? "Branch if Not Equal" to label1, else...
#<replace with the return R1 statement here>
label1:
cmp R1, R2
bgt sub_xy #"Branch on Greater Than"
cmp R1, R2
blt sub_yx #"Branch on Less Than"
sub_xy:
sub R1, R1, R2 #R1 = R1 - R2
b loop #branch to loop
sub_yx:
sub R2, R2, R1 #R2 = R2 - R1
b loop
Came up with this as a newb x86_64 programmer trying to code for ARM for the first time.
EDIT: Corrections/ explanations are in the comments.
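One thing worth noting about the structure: once x == y has been ruled out, "x > y" has only one alternative, so a plain else is enough, and in the assembly a single cmp per iteration can feed all the branches because branches do not change the flags. A minimal C sketch of that shape follows (gcd_by_subtraction is a made-up name, and both inputs are assumed to be positive, otherwise the loop would never end):

```c
#include <assert.h>

/* Subtraction-based GCD with one comparison outcome driving each iteration. */
unsigned gcd_by_subtraction(unsigned x, unsigned y) {
    assert(x > 0 && y > 0);     /* assumption: zero would loop forever */
    while (x != y) {            /* cmp R1, R2 ; beq done */
        if (x > y)
            x -= y;             /* taken when R1 > R2 */
        else
            y -= x;             /* otherwise R2 > R1, since equality was excluded */
    }
    return x;
}
```

For the values in the question, gcd_by_subtraction(45, 27) gives 9.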

How to understand why an ARM exception happens?

I'm trying to understand the cause of an ARM exception that I am encountering.
It happens randomly during system startup and can show up in a few different ways.
One of the simplest is the following:
0x8004e810 in ti_sysbios_family_arm_a8_intcps_Hwi_vectors ()
#0 0x8004e810 in ti_sysbios_family_arm_a8_intcps_Hwi_vectors ()
#1 0x80002f04 in ti_sysbios_family_arm_exc_Exception_excHandlerDataAsm(int0_t) ()
at /home/rnd_share/sysbios/bios_6_51_00_15/packages/ti/sysbios/family/arm/exc/Exception_asm_gnu.asm:103
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
r0 0x20000197 536871319
r1 0x20000197 536871319
r2 0x20000197 536871319
r3 0x20000197 536871319
r4 0x20000197 536871319
r5 0x6 6
r6 0x80000024 2147483684
r7 0x80007a0c 2147514892
r8 0x8004f0a8 2147807400
r9 0x80041340 2147750720
r10 0x80040a3c 2147748412
r11 0xffffffff 4294967295
r12 0x20000197 536871319
sp 0x7fffff88 0x7fffff88
lr 0x80002f04 2147495684
pc 0x8004e810 0x8004e810 <ti_sysbios_family_arm_a8_intcps_Hwi_vectors+16>
cpsr 0x20000197 536871319
PC = 8004E810, CPSR = 20000197 (ABORT mode, ARM IRQ dis.)
R0 = 20000197, R1 = 20000197, R2 = 20000197, R3 = 20000197
R4 = 20000197, R5 = 00000006, R6 = 80000024, R7 = 80007A0C
USR: R8 =8004F0A8, R9 =80041340, R10=80040A3C, R11 =FFFFFFFF, R12 =20000197
R13=80212590, R14=80040A3C
FIQ: R8 =AEE1D6FA, R9 =C07BA930, R10=1B0B137A, R11 =7EC3F1DF, R12 =2000019F
R13=80065CF8, R14=00000000, SPSR=00000000
SVC: R13=4030CB20, R14=00022071, SPSR=00000000
ABT: R13=7FFFFF88, R14=80002F04, SPSR=20000197
IRQ: R13=F4ADFD8A, R14=80041020, SPSR=8000011F
UND: R13=80085CF8, R14=ED0F7EF1, SPSR=00000000
(gdb) frame
#0 0x8004e810 in ti_sysbios_family_arm_a8_intcps_Hwi_vectors ()
(gdb) frame 1
#1 0x80002f04 in ti_sysbios_family_arm_exc_Exception_excHandlerDataAsm(int0_t) ()
at /home/rnd_share/sysbios/bios_6_51_00_15/packages/ti/sysbios/family/arm/exc/Exception_asm_gnu.asm:103
103 mrc p15, #0, r12, c5, c0, #0 # read DFSR into r12
(gdb) list
98 .func ti_sysbios_family_arm_exc_Exception_excHandlerDataAsm__I
99
100 ti_sysbios_family_arm_exc_Exception_excHandlerDataAsm__I:
101 stmfd sp!, {r0-r12} # save r4-r12 while we're at it
102
103 mrc p15, #0, r12, c5, c0, #0 # read DFSR into r12
104 stmfd sp!, {r12} # save DFSR
105 mrc p15, #0, r12, c5, c0, #1 # read IFSR into r12
106 stmfd sp!, {r12} # save DFSR
107 mrc p15, #0, r12, c6, c0, #0 # read DFAR into r12
(gdb) monitor cp15 6 0 0 0
Reading CP15 register (6,0,0,0 = 0x7FFFFF54)
My understanding is that an exception was already in progress, which can be seen in frame 1.
It tries to save registers onto the stack:
101 stmfd sp!, {r0-r12} # save r4-r12 while we're at it
But the stack pointer was incorrect:
ABT: R13=7FFFFF88
There are two things I don't understand:
What could cause such SP values in the ABT and IRQ contexts?
What is actually in frame 0? In other words, how did the Cortex react to a data abort while already inside an exception handler?
The device usually starts normally; this happens roughly 3 times per 10 boots. It never happens when starting from the debugger, only with a release build and only when started from the bootloader.
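When reading dumps like the one above, it can help to decode the CPSR/SPSR values field by field instead of by eye. The following is a small sketch using the standard ARMv7-A bit positions; psr_fields, psr_mode_name and decode_psr are made-up names for illustration and are not part of SYS/BIOS or gdb.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mode_bits;     /* M[4:0] */
    bool thumb;             /* T, bit 5 */
    bool fiq_masked;        /* F, bit 6 */
    bool irq_masked;        /* I, bit 7 */
    bool abort_masked;      /* A, bit 8: asynchronous aborts masked */
    bool n, z, c, v;        /* condition flags, bits 31..28 */
} psr_fields;

/* Maps the mode field to the usual short name. */
const char *psr_mode_name(uint32_t mode_bits) {
    switch (mode_bits) {
    case 0x10: return "USR";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x16: return "MON";
    case 0x17: return "ABT";
    case 0x1B: return "UND";
    case 0x1F: return "SYS";
    default:   return "unknown";
    }
}

/* Splits a CPSR or SPSR value into its main fields. */
psr_fields decode_psr(uint32_t psr) {
    psr_fields f;
    f.mode_bits    = psr & 0x1Fu;
    f.thumb        = (psr >> 5) & 1u;
    f.fiq_masked   = (psr >> 6) & 1u;
    f.irq_masked   = (psr >> 7) & 1u;
    f.abort_masked = (psr >> 8) & 1u;
    f.n = (psr >> 31) & 1u;
    f.z = (psr >> 30) & 1u;
    f.c = (psr >> 29) & 1u;
    f.v = (psr >> 28) & 1u;
    return f;
}
```

For the CPSR value 0x20000197 this gives ABT mode, ARM state, IRQs masked and the C flag set, which agrees with the debugger's own "ABORT mode, ARM IRQ dis." annotation. The IRQ bank's SPSR value 0x8000011F decodes to SYS mode with IRQs not masked.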
Two weeks later...
The boot procedure is as follows:
1. The 2nd stage bootloader loads the application into memory.
2. The 2nd stage bootloader jumps to the application start.
3. The main function of the application is entered.
It turns out that statically initialized values of the application are sometimes correct after step 1 of booting but corrupted by step 3. In other words, the application image is corrupted.
The caches were not being flushed correctly between steps 1 and 2.
Disabling the caches in the 2nd stage bootloader made the problem go away completely.
Now I need to fix it properly.

PMU counters in ARM11

I am programming a Raspberry Pi Model B (ARM1176) bare metal, in assembly and C. I need to count the clock cycles used to execute a piece of assembly code.
I am using the following code for PMU counter:
mov r0,#1
MCR p15, 0, r0, c15, c12, 0 ; Write Performance Monitor Control Register
/* Reset Cycle Counter */
mov r0,#5
MCR p15, 0, r0, c15, c12, 0 ; Write Performance Monitor Control Register
/* Measure */
MRC p15, 0, r0, c15, c12, 1 # Read Cycle Counter Register
<MY CODES>
MRC p15, 0, r1, c15, c12, 1 # Read Cycle Counter Register
With this, if I have
add r3,#3
in place of my code, I get r1=8 and r0=0, which seems correct since the ARM11 has 8 pipeline stages and it takes 8 clock cycles to execute it.
But when I add more instructions I get ridiculous results, for example with
add r3,#3
add r4,#1
I get r0=0, r1=97/96/94 (the result in r1 should also be constant!)
I am using the UART to see the register values in minicom.
Okay, seeing the same thing, that is very interesting.
# nop
.globl test
test:
mov r0,#1
MCR p15, 0, r0, c15, c12, 0
mov r0,#5
MCR p15, 0, r0, c15, c12, 0
MRC p15, 0, r0, c15, c12, 1
add r3,#3
add r2,#1
MRC p15, 0, r1, c15, c12, 1
sub r0,r1,r0
bx lr
I am calling this from C, so if I mucked with r4 in the code under test I would have to save it on the stack; I used r2 instead. Without the add r2 line the return value was 8; with the add r2 line the return value was 0x68, then 0x65. Note this is on a Pi Zero, so some clocks are a little faster than yours.
Remember this is running from DRAM, and DRAM is painfully slow, so you may be seeing some of that.
Initial alignment of the code:
00008024 <test>:
8024: e3a00001 mov r0, #1
8028: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
802c: e3a00005 mov r0, #5
8030: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
8034: ee1f0f3c mrc 15, 0, r0, cr15, cr12, {1}
8038: e2833003 add r3, r3, #3
803c: e2822001 add r2, r2, #1
8040: ee1f1f3c mrc 15, 0, r1, cr15, cr12, {1}
8044: e0410000 sub r0, r1, r0
8048: e12fff1e bx lr
Yep, if I uncomment the nop in front of .globl test and comment out the add r2, I only have the add r3 as the code under test, but the nop pushes the alignment of the whole block of code. With the add r3 and no nop I get 8 counts; with the add r3 and the nop I get 0x67 counts.
So I think this is just a case of measuring the fetch. I have not enabled the ARM cache, but there may be a deeper cache or an MMU or something else in the way, since this RAM is shared between the ARM and the GPU.
If I go one step further and, with the nop uncommented, have both the add r3 and the add r2, it is 0x69 counts, basically on par with or barely longer than one instruction, so we forced a fetch in there.
So in my case, if I add more nops so that the initial read of the counter is aligned on an 8-word boundary, and I have the two instructions being measured:
00008030 <test>:
8030: e3a00001 mov r0, #1
8034: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
8038: e3a00005 mov r0, #5
803c: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
8040: ee1f0f3c mrc 15, 0, r0, cr15, cr12, {1}
8044: e2833003 add r3, r3, #3
8048: e2822001 add r2, r2, #1
804c: ee1f1f3c mrc 15, 0, r1, cr15, cr12, {1}
8050: e0410000 sub r0, r1, r0
8054: e12fff1e bx lr
I get a count of 8. I put a third instruction in there (an add r3 and two add r2s): still a count of 8.
If I go back to this layout, where at least part of it is in a different fetch line:
00008024 <test>:
8024: e3a00001 mov r0, #1
8028: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
802c: e3a00005 mov r0, #5
8030: ee0f0f1c mcr 15, 0, r0, cr15, cr12, {0}
8034: ee1f0f3c mrc 15, 0, r0, cr15, cr12, {1}
8038: e2833003 add r3, r3, #3
803c: e2822001 add r2, r2, #1
8040: ee1f1f3c mrc 15, 0, r1, cr15, cr12, {1}
8044: e0410000 sub r0, r1, r0
8048: e12fff1e bx lr
And I do three runs without changing anything, then enable the L1 instruction cache and do three more runs, I get
00000068
0000001D
0000001D
0000001F
00000008
00000008
So I think you are dealing with DRAM, which is slow, plus fetch lines, cache misses and hits, and the resulting cache line fetches.
If you were expecting to see the number of clocks it took to execute an instruction, you won't: you don't have zero-wait-state memory unless you can keep the entire code under test in the L1 cache.
I don't think there is on-chip SRAM that you can use for this kind of thing on this chip/board; you are going to end up hitting DRAM, and that DRAM is shared with the GPU. So program execution time is not expected to be deterministic, and as with your computer or phone, the CPU is not the bottleneck and has not been for a long time; it is sitting around waiting to be fed data or instructions.
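To see why the two layouts above behave differently, you can work out which fetch lines the measured region touches. Here is a small sketch of that arithmetic; LINE_BYTES, line_of and lines_spanned are made-up names, and the 32-byte line size is an assumption taken from the "8 word boundary" observation (8 words of 4 bytes).

```c
#include <stdint.h>

#define LINE_BYTES 32u  /* assumed line size: 8 words of 4 bytes */

/* Index of the fetch line that contains addr. */
uint32_t line_of(uint32_t addr) {
    return addr / LINE_BYTES;
}

/* Number of distinct lines touched by the instructions from first to last
   (inclusive), where both are addresses of 4-byte ARM instructions. */
uint32_t lines_spanned(uint32_t first, uint32_t last) {
    return line_of(last + 3u) - line_of(first) + 1u;
}
```

In the layout starting at 0x8024, the region from the first counter read (0x8034) to the second (0x8040) gives lines_spanned(0x8034, 0x8040) == 2, so the second read sits in a new line that has to be fetched. In the layout starting at 0x8030, lines_spanned(0x8040, 0x804c) == 1, which lines up with the constant count of 8.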

GCC asm inline constraints, conflicting register allocation

I've made some ARM-inline assembler code.
Looking in Semaphore.s, I see that gcc is using register r3 for two different variables: "success" and "change". I wonder if there is a problem with my constraints?
First most relevant code lines:
asm inline:
"1: MVN %[success], #0 # success=TRUE=~FALSE\n\t"
"LDREX %[value], %[signal] # try to get exclusive access\n\t"
"ADDS %[newValue], %[value], %[change] # new value = value + change\n\t"
constraints:
: [signal] "+m" (signal), [success] "=r" (success), [locked] "=r" (locked), [newValue] "=r" (newValue), [value] "=r" (value)
: [borderValue] "r" (borderValue), [change] "r" (change)
: "cc"
symbol file:
1: MVN r3, #0 # success=TRUE=~FALSE
LDREX r0, [r7, #12] # try to get exclusive access
ADDS r1, r0, r3 # new value = value + change
More source and generated symbol is below.
BOOLEAN Semaphore_exclusiveChange (INT32U * signal, INT32S change, INT32U borderValue)
{
BOOLEAN success;
INT32U locked;// exclusive status
INT32U newValue;
INT32U value;
asm (
"1: MVN %[success], #0 # success=TRUE=~FALSE\n\t"
"LDREX %[value], %[signal] # new to get exclusive access\n\t"
"ADDS %[newValue], %[value], %[change] # new value = value + change\n\t"
"ITE MI # if (new value<0) \n\t"
" SUBSMI %[newValue], %[newValue] # (new value<0): new value=0, set zero flag \n\t"
"# else\n\t"
" CMPPL %[newValue], %[borderValue] # (new value>=0): if new value > border value \n\t"
"\n\t# zero flag is either: new value=0 or =bordervalue\n\t"
"ITE HI # if new signal level > border value \n\t" //
" MOVHI %[success], #0 # fail to raise signal, success=FALSE \n\t"
"\t# else\n\t"
" MOVLS %[value], %[newValue] # use new value \n\t" // ok
"STREX %[locked], %[value], %[signal] # new exclusive store of value\n\t"
"TST %[locked],%[locked] # is locked? \n\t"
"IT NE # if locked \n\t"
"BNE 1b # try again\n\t"
"DMB # memory barrier\n\t" //
: [signal] "+m" (signal), [success] "=r" (success), [locked] "=r" (locked), [newValue] "=r" (newValue), [value] "=r" (value)
: [borderValue] "r" (borderValue), [change] "r" (change)
: "cc" );
return success;
}
Relevant text from symbol file:
Semaphore_exclusiveChange:
.LFB2:
.loc 1 10 0
# args = 0, pretend = 0, frame = 32
# frame_needed = 1, uses_anonymous_args = 0
# link register save eliminated.
push {r7}
.LCFI0:
sub sp, sp, #36
.LCFI1:
add r7, sp, #0
.LCFI2:
str r0, [r7, #12]
str r1, [r7, #8]
str r2, [r7, #4]
.loc 1 16 0
ldr r2, [r7, #4]
ldr r3, [r7, #8]
# 16 "../drivers/Semaphore.c" 1
1: MVN r3, #0 # success=TRUE=~FALSE
LDREX r0, [r7, #12] # new to get exclusive access
ADDS r1, r0, r3 # new value = value + change
ITE MI # if (new value<0)
SUBSMI r1, r1 # (new value<0): new value=0, set zero flag
# else
CMPPL r1, r2 # (new value>=0): if new value > border value
# zero flag is either: new value=0 or =bordervalue
ITE HI # if new signal level > border value
MOVHI r3, #0 # fail to raise signal, success=FALSE
# else
MOVLS r0, r1 # use new value
STREX r2, r0, [r7, #12] # new exclusive store of value
TST r2,r2 # is locked?
IT NE # if locked
BNE 1b # try again
DMB # memory barrier
# 0 "" 2
.thumb
strb r3, [r7, #19]
str r2, [r7, #20]
str r1, [r7, #24]
str r0, [r7, #28]
.loc 1 38 0
ldrb r3, [r7, #19] # zero_extendqisi2
.loc 1 39 0
mov r0, r3
add r7, r7, #36
mov sp, r7
pop {r7}
bx lr
You need to constrain "success" further with '&':
: [signal] "+m" (signal), [success] "=&r" (success), [locked] "=r" (locked), [newValue] "=r" (newValue), [value] "=r" (value)
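which marks it as an 'early clobber'. Otherwise the compiler assumes that all outputs are written only after all inputs have been consumed, and it is free to assign the same register to an output and an input. The same applies to any other output that is written before the last input is read; in the listing, r2 is likewise shared by "locked" and "borderValue", and borderValue is read again when the loop retries.
If you have an "input/output" value, you need either the "+" modifier on the output or a matching constraint (a digit such as "0" on the input that ties it to that output operand).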
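As a stripped-down illustration of the early-clobber rule, consider the following sketch. It is not the semaphore code from the question: add_with_flag is a made-up helper, the asm path assumes GCC-style inline asm targeting ARM or Thumb-2, and the #else branch is only a portable fallback so that the snippet also builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b and sets *ok to all ones. */
uint32_t add_with_flag(uint32_t a, uint32_t b, uint32_t *ok) {
    uint32_t sum, flag;
#if defined(__arm__)
    __asm__ (
        "mvn %[flag], #0\n\t"           /* writes flag BEFORE a and b are read */
        "add %[sum], %[a], %[b]\n\t"    /* inputs are consumed here */
        : [flag] "=&r" (flag),          /* early clobber: never shares a register with a or b */
          [sum]  "=r"  (sum)            /* written by the last instruction, so plain "=r" is fine */
        : [a] "r" (a), [b] "r" (b));
#else
    flag = ~UINT32_C(0);                /* fallback for non-ARM builds */
    sum = a + b;
#endif
    *ok = flag;
    return sum;
}
```

Without the '&' on flag, the compiler would be allowed to put flag in the same register as a or b, and the mvn would then destroy that input before the add reads it, which is the same effect as the r3 clash in the question.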

Resources