ARM Branch Prediction based on code - arm

so I'm trying to study for this test and one of the things on the study guide gives us some ARM Code and tells us to fill in a branch prediction table based on how the code runs.
I can understand how I'm supposed to do it, but with branch prediction the actual outcome comes into play, too, and I can't figure out the outcome (per cycle) from the code. The code is:
MOV r0, #4
B1 MOV r2, #5; Branch 1
SUB r2, r2, r0
B2 SUBS r2, r2, #1; Branch 2
BNE B2
SUBS r0, r0, #1
BNE B1
What's confusing me is the BNE statements. Usually when I see one of those conditional statements there has been a CMP statement earlier in the code, and that way I can know whether to take the branch or not. But I see no compare statement anywhere in this code, so I don't know how to determine if I should take a branch or not.

The SUBS performs a subtraction and a comparison with zero of the result in one instruction. The BNE conditional branches use the condition flags that were set by these SUBS instructions

Related

What would be a reason to use ADDS instruction instead of the ADD instruction in ARM assembly?

My course notes always use ADDS and SUBS in their ARM code snippets, instead of the ADD and SUB as I would expect. Here's one such snippet for example:
__asm void my_capitalize(char *str)
{
cap_loop
LDRB r1, [r0] // Load byte into r1 from memory pointed to by r0 (str pointer)
CMP r1, #'a'-1 // compare it with the character before 'a'
BLS cap_skip // If byte is lower or same, then skip this byte
CMP r1, #'z' // Compare it with the 'z' character
BHI cap_skip // If it is higher, then skip this byte
SUBS r1,#32 // Else subtract out difference to capitalize it
STRB r1, [r0] // Store the capitalized byte back in memory
cap_skip
ADDS r0, r0, #1 // Increment str pointer
CMP r1, #0 // Was the byte 0?
BNE cap_loop // If not, repeat the loop
BX lr // Else return from subroutine
}
This simple code for example converts all lowercase English in a string to uppercase. What I do not understand in this code is why they are not using ADD and SUB commands instead of ADDS and SUBS currently being used. The ADDS and SUBS command, afaik, update the APSR flags NZCV, for later use. However, as you can see in the above snippet, the updated values are not being utilized. Is there any other utility of this command then?
Arithmetic instructions (ADD, SUB, etc) don't modify the status flag, unlike comparison instructions (CMP,TEQ) which update the condition flags by default. However, adding the S to the arithmetic instructions(ADDS, SUBS, etc) will update the condition flags according to the result of the operation. That is the only point of using the S for the arithmetic instructions, so if the cf are not going to be checked, there is no reason to use ADDS instead of ADD.
There are more codes to append to the instruction (link), in order to achieve different purposes, such as CC (the conditional flag C=0), hence:
ADDCC: do the operation if the carry status bit is set to 0.
ADDCCS: do the operation if the carry status bit is set to 0 and afterwards, update the status flags (if C=1, the status flags are not overwritten).
From the cycles point of view, there is no difference between updating the conditional flags or not. Considering an ARMv6-M as example, ADDS and ADD will take 1 cycle.
Discard the use of ADD might look like a lazy choice, since ADD is quite useful for some cases. Going further, consider these examples:
SUBS r0, r0, #1
ADDS r0, r0, #2
BNE go_wherever
and
SUBS r0, r0, #1
ADD r0, r0, #2
BNE go_wherever
may yield different behaviours.
As old_timer has pointed out, the UAL becomes quite relevant on this topic. Talking about the unified language, the preferred syntax is ADDS, instead of ADD (link). So the OP's code is absolutely fine (even recommended) if the purpose is to be assembled for Thumb and/or ARM (using UAL).
ADD without the flag update is not available on some cortex-ms. If you look at the arm documentation for the instruction set (always a good idea when doing assembly language) for general purpose use cases that is not available until a thumb2 extension on armv7-m (cortex-m3, cortex-m4, cortex-m7). The cortex-m0 and cortex-m0+ and generally wide compatibility code (which would use armv4t or armv6-m) doesn't have an add without flags option. So perhaps that is why.
The other reason may be to get the 16-bit instruction not the 32, but but that is a slippery slope as it gets even more into assemblers and their syntax (syntax is defined by the assembler, the program that processes assembly language, not the target). For example not syntax unified gas:
.thumb
add r1,r2,r3
Disassembly of section .text:
00000000 <.text>:
0: 18d1 adds r1, r2, r3
The disassembler knows reality but the assembler doesn't:
so.s: Assembler messages:
so.s:2: Error: instruction not supported in Thumb16 mode -- `adds r1,r2,r3'
but
.syntax unified
.thumb
adds r1,r2,r3
add r1,r2,r3
Disassembly of section .text:
00000000 <.text>:
0: 18d1 adds r1, r2, r3
2: eb02 0103 add.w r1, r2, r3
So not slippery in this case, but with the unified syntax you start to get into blahw, blah.w, blah, type syntax and have to spin back around to check to see that the instructions you wanted are being generated. Non-unified has its own games as well, and of course all of this is assembler-specific.
I suspect they were either going with the only choice they had, or were using the smaller and more compatible instruction, especially if this were a class or text, the more compatible the better.

How to write at least two `area`s in ARM assembly?

I'm trying to write an area which defines my data on the RAM, and an area for my code. I tried to do that, but I just can't make it work. I also tried using EXPORT and IMPORT but I couldn't solve various errors while using them.
AREA HEAP, READWRITE, ALIGN=3
MYSTR DCB "JaVid",0
AREA RESET, CODE, READONLY
;IMPORT MYSTR
ENTRY
ADR R0, MYSTR ;STRING POINTER
NEXT LDRB R1, [R0] ;CHARACTER HOLDER
CMP R1, #'a'
BLT OK
CMP R1, #'z'
BGT OK
;WE NEED TO SWITCH
SUB R1, #'a'-'A'
OK STRB R1, [R0], #1
B NEXT
END
Would you please give me an example of how it's done?
I suspect you may be having problems because you cannot use ADR to refer to a symbol in another AREA. See this section of the fine manual
You probably want to use LDR r0, =MYSTR instead.
Also, your loop does not seem to terminate.
[Note: It would be helpful it you showed the exact error message you are getting and the command lines you are using with the code above.]

Is shift operation running in separated instruction in thumb ISA?

ARM instructions may utilize barrel shifter in its second source operand (see below assembly listed), which is part of the data process instruction so save one instruction to do shifting. I am wondering could thumb instruction utilize barrel shift in DP instructions? Or should it separate the shift operation into an independent instruction? I am asking this since thumb may not has sufficient space in the instruction to code barrel shifter.
mov r0, r1, LSL #1
That example's not great, since it's an alternate form of the canonical lsl r0, r1, #1, which does have a 16-bit Thumb encoding (albeit with flag-setting restrictions).
An alternative ARM instruction such as add r0, r0, r1, lsl #1 would indeed have to be done as two Thumb instructions because as you say there just isn't room to squeeze both operations into 16 bits (hence also why you're limited to r0-r7 so registers can be encoded in 3 bits rather than 4).
Thumb-2, on the other hand, generally does have 32-bit encodings for immediate shifts of operands, so on ARMv6T2 and later architectures you can encode add r0, r0, r1, lsl #1 as a single instruction.
The register-shifted register form, however, (e.g. add r0, r0, r1, lsl r2) isn't available even in Thumb-2 - you'd have to do that in 2 Thumb instructions like so:
lsl r1, r2
add r0, r1
Note that unlike the ARM instruction this sequence changes the value in r1 - if you wanted to preserve that as well you'd need an intermediate register and an extra mov instruction (or a Thumb-2 3-register lsl) - failing that the last resort would be to bx to ARM code.

Cortex m3 first instruction execution

I am using Sourcery CodeBench Lite 2012.03-56 compiler and gdb suite with texane gdb server.
Today I wanted to try FreeRTOS demo example for cheap STM32VLDISCOVERY board, I copied all the source files needed, compiled without errors but the example didn't work. I fired up debugger and noticed that example fails when it is trying to dereference pointer to GPIO registers. Global array variable which contains pointers to GPIO registers:
GPIO_TypeDef* GPIO_PORT[LEDn] = {LED3_GPIO_PORT, LED4_GPIO_PORT};
was not properly initialized and was filled with some random values. I checked preprocessor defines LED3_GPIO_PORT and LED3_GPIO_PORT and they were valid.
After some researching where the problem might be I looked at the start-up file provided for trueSTUDIO found in CMSIS lib. Original startup_stm32f10x_md_vl.S file:
.section .text.Reset_Handler
.weak Reset_Handler
.type Reset_Handler, %function
Reset_Handler:
/* Copy the data segment initializers from flash to SRAM */
movs r1, #0
b LoopCopyDataInit
CopyDataInit:
ldr r3, =_sidata
ldr r3, [r3, r1]
str r3, [r0, r1]
adds r1, r1, #4
LoopCopyDataInit:
ldr r0, =_sdata
ldr r3, =_edata
adds r2, r0, r1
cmp r2, r3
bcc CopyDataInit
ldr r2, =_sbss
b LoopFillZerobss
...
During the debugging I noticed that register r1 is never initialized to zero by first instruction movs r1, #0. Register r1 is used as a counter in loop so when the execution reaches loop LoopCopyDataInit it never enters the loop since register r1 is loaded with some garbage data from previous execution. As the result of this the startup code never initializes the .data section.
When I placed two nop instructions before movs r1, #0 instruction then register r1 was initialized to 0 and the example began to work:
Modified part of startup_stm32f10x_md_vl.S file:
/* Copy the data segment initializers from flash to SRAM */
nop
nop
movs r1, #0
b LoopCopyDataInit
This is disassembly of relevant parts of final code:
Disassembly of section .isr_vector:
08000000 <g_pfnVectors>:
8000000: 20002000 andcs r2, r0, r0
8000004: 08000961 stmdaeq r0, {r0, r5, r6, r8, fp}
...
Disassembly of section .text:
...
8000960 <Reset_Handler>:
8000960: 2100 movs r1, #0
8000962: f000 b804 b.w 800096e <LoopCopyDataInit>
08000966 <CopyDataInit>:
8000966: 4b0d ldr r3, [pc, #52] ; (800099c <LoopFillZerobss+0x16>)
8000968: 585b ldr r3, [r3, r1]
800096a: 5043 str r3, [r0, r1]
800096c: 3104 adds r1, #4
As you can see the ISR vector table is properly pointing to Reset_Handler address. So, what is happening? Why the first instruction movs r1, #0 was never executed in original start-up code?
EDIT:
The original code works when I power off the board and power it back on again. I can reset the MCU multiple times and it works. When I start gdb-server then the code doesn't work, even after reset. I have to power cycle it again to work. I guess this is some debugger weirdness going on.
NOTE:
I had a look what start-up code are other people using with this MCU and they either disable interrupts or load SP register with a linker defined value which is in both cases redundant. If they got hit by this odd behaviour they would not notice it, ever.
Sounds like a bug in your debugger. Probably it sets a breakpoint on the first instruction and either skips it completely or somehow reexecuting it doesn't work properly. The issue could be complicated by the fact that it's a reset vector, maybe it's just not possible to reliably stop at the first instruction. Since the NOPs help, I'd recommend leaving them in place while you're developing your program.
However, there is an alternative solution. Since it's unlikely that you'll need to modify the array, you don't really need it in writable section. To have the compiler put the array in the flash, usually it's enough to declare it as const:
GPIO_TypeDef* const GPIO_PORT[LEDn] = {LED3_GPIO_PORT, LED4_GPIO_PORT};
Nothing is hopping out right away as to what could be wrong there. First, how are you debugging this code? Are you attaching a debugger and then issuing a reset to the processor via JTAG? I would try putting b Reset_Handler in there right after your Reset_Handler: label as your first instruction, flash it on, turn on the board, then connect the JTAG just so you can minimize any possible weirdness from the debugger. Then set your PC to that mov instruction and see if it works. Is a bootloader or boot ROM launching this code? It could be something weird going on with the instruction or data cache.

ARM v7 ORRS mnemonic without operand2

What does:
ORRS R1, R3 do?
Is it just R1 |= R3, possibly setting the N- or Z flags?
This might be blatantly obvious, but I haven't found any documentation that describes the ORR mnemonic without operand2.
Feedback is greatly appreciated.
-b.
It's a Thumb instruction. In a 16-bit Thumb opcode you can only fit the two operands, you don't get the extra operand2.
As you guessed, it does R1 |= R3. The S flag's presence indicates this instruction is part of an If-Then block (what Thumb has instead of proper ARM conditional execution); ORR and ORRS generate the same opcode and differ only by context.
In ARM assembly, instructions where the destination register (Rd) is the same as the source register (Rn) can omit Rd. So these are the same and will assemble to exactly the same instruction.
orrs r1, r1, r3
orrs r1, r3
add r1, r1, #5
add r1, #5
and so on.
It may be in the armv7m trm or its own trm, but they have a, I dont remember the term unified, assembly language, so
add r0,r3
will assemble for both arm and thumb, for thumb it is as is for arm the above equates to add r0,r0,r3. For thumb you dont get the three register option, but the functionality is implied r0=r0+r3.

Resources