I'm working on a STM32l475 micro-controller which runs a Cortex-M4 processor and ARM/Thumb instruction sets. I see (from objdump) that there are beq.n and bne.n instructions generated in the binary of an ARM program (I added the -mthumb flag when compile the program). However, I don't find these branch instructions in the latest ARMv7-M manual.
Can anyone tell me the reason? And what are the instructions available in the manual that are equivalent to these two branch instructions?
beq and bne are conditional branches; in other words, they are conditional versions of the unconditional branch b. eq and ne are two different condition codes; they are described in section A7.3. beq means branch if equal and bne means branch if not equal.
The b branch instruction has two different encodings in Thumb mode. The encoding you're seeing is probably encoding T1 described in section A7.7.12:
B<c> <label>
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 [-<cond>--] [--------imm8---------]
In this encoding, the condition code (like eq or ne) is encoded directly into the instruction, in bits 8-11. The disassembly from objdump displays the condition code in place of <c> above. So using the condition code table in section A7.3, you would encode beq as 11010000[imm8].
Related
My C compiler (GCC) is producing this code which I don't think is optimal
8000500: 2b1c cmp r3, #28
8000502: bfd4 ite le
8000504: f841 3c70 strle.w r3, [r1, #-112]
8000508: f841 0c70 strgt.w r0, [r1, #-112]
It seems to me that the compiler could happily omit ITE LE instruction as the two stores following it use the LE and GT flags from the CMP instruction so only one will actuall be performed. The ITE instruction means that only one of the STRs will be tested and performed so the time should be equal, but it is using an extra word of instruction memory.
Any opinions on this ?
In Thumb mode, the instruction opcodes (other than branch instructions) don't have any space for conditional execution. In Thumb1, this meant that one simply had to use branches to skip instructions if necessary.
In Thumb2 mode, the IT instruction was added, which adds the conditional execution capability, without embedding it into the instruction opcodes themselves. In your case, the le condition part of the strle.w instruction is not embedded in the opcode f841 3c70, but is actually inferred from the preceding ite le instruction by the disassembler. If you use a hex editor to change the ite le instruction to something else, the strle.w and strgt.w will both suddenly disassemble into plain str.w.
See the other linked answer, https://stackoverflow.com/a/26001101, for more details.
The unified assembler syntax, which supports A32 and T32 targets, has added some confusion here. What is being shown in the disassembly is more verbose than what is encoded in the opcodes.
Your ITE instruction is very much a thumb instruction set placeholder, it defines an IT block which spans the following two instructions (and being thumb, those two instructions are not individually conditional). From a micro-architecture/timing point of view, it is only necessary to execute one instruction (but you shouldn't assume that this folding always takes place).
The strle/strgt syntax could be used on it's own for a T32 target, where the IT block is not necessary since the instruction set has a dedicated condition code field.
In order to write (or disassemble) code which can be used by both A32 and T32 assemblers, what you have here is both approaches to conditional execution written together. This has the advantage that the same assembly routine can be more portable (even if the resulting code is not identical - optimisations in the target cpu will also be different).
With T32, the combination of an it and a single 16 bit instruction matches the instruction density of the equivalent A32 instruction, if more than one conditional instruction can be combined, there is an overall win.
I'm just getting started with the ARM architecture on my Nucleo STM32F303RE, and I'm trying to understand how the instructions are encoded.
I have running a simple LED-blinking program, and the first few disassembled application instructions are:
08000188: push {lr}
0800018a: sub sp, #12
235 __initialize_hardware_early ();
0800018c: bl 0x80005b8 <__initialize_hardware_early>
These instructions resolve to the following in the hex file (displayed weird in Eclipse -- each 32-bit word is in MSB order, but Eclipse doesn't seem to know it... but that's for another topic):
address 0x08000188: B083B500 FA14F000
Using the ARM Architecture Ref Manual, I've confirmed the first 2 instructions, push (0xB500) and sub (0xB083). But I can't make any sense out of the "bl" instruction.
The hex instruction is 0xFA14F000. The Ref Manual says it breaks down like this:
31.28 27 26 25 24 23............0
cond 1 0 1 L signed_immed_24
The first "F" (0xF......) makes sense: all conditions are set (ALways).
The "A" doesn't make sense though, since the L bit should be set (1011). Shouldn't it be 0xFB......?
And the signed_immed_24 doesn't make sense, either. The ref manual says:
- start with 0x14F000
- sign extend to 30 bits (signed 2's-complement), giving 0x0014F000
- shift left to form 32-bit value, giving 0x0053C000
- add to the PC, which is the current instruction + 8, giving 0x0800018c + 8 + 0x0053C000, or 0x0853C194.
So I get a branch address of 0x0853C194, but the disassembly shows 0x080005B8.
What am I missing?
Thanks!
-Eric
bl is two, separate, 16 bit instructions. The armv5 (and older) ARM ARM does a better job of documenting them.
111HHoffset11
From the ARM ARM
The first Thumb instruction has H == 10 and supplies the high part of
the branch offset. This instruction sets up for the subroutine call
and is shared between the BL and BLX forms.
The second Thumb instruction has H == 11 (for BL) or H == 01 (for
BLX). It supplies the low part of the branch offset and causes the
subroutine call to take place.
0xFA14 0xF000
0xF000 is the first instruction upper offset is zeros
0xFA14 is the second instruction offset is 0x214
If starting at 0x0800018c then it is 0x0800018C + 4 + (0x0000214<<1) = 0x080005B8. The 4 is the two instructions head for the current PC. And the offset is units of (16 bit) instructions.
I guess the armv7-m ARM ARM covers it as well, but is harder to read, and apparently features were added. But they do not affect you with this branch link.
The ARMv5 ARM ARM does a better job of describing what happens as well. you can certaily take these two separate instructions and move them apart
.byte 0x00,0xF0
nop
nop
nop
nop
nop
.byte 0x14,0xFA
and it will branch to the same offset (relative to the second instruction). Maybe the broke that in some cores, but I know in some (after armv5) it works.
Task 1: Write the corresponding ARM assembly representation for the following instructions:
11101001_000111000001000000010000
11100100_110100111000000000011001
10010010_111110100100000011111101
11100001_000000010010000011111010
00010001_101011101011011111001100
Task 2: Write the instruction code for the following instructions:
STMFA R13!, {R1, R3, R5-R11}
LDR R11, [R3, R5, LSL #2]
MOVMI R6, #1536
LDR R1, [R0, #4]!
EORS R3, R5, R10, RRX
I have zero experience with this material and the professor has left us students out to dry. Basically I've found the various methods for decoding these instructions but I have three major doubts still.
I don't have any idea on how to get started on decoding binary to ARM Instructions which is the first part of the homework.
I can't find some of these suffixes for example on EORS what is the S? Is it the set condition bit? Is it set to 1 when there is an S in front of the instruction?
I don't what to make of having multiple registers in one instruction line. Example:
EORS R3,R5,R10,RRx
I don't understand what's going on there with so many registers.
Any nudge in the right direction is greatly appreciated. Also I have searched the ARM manual, they're not very helpful for someone with no understanding of what they're looking for. They do have the majority of instructions for coding and decoding but have little explanation for the things I asked above.
If you have the ARM v7 A+R architecture manual (DDI0406C) there is a good table-based decode/disassembly description in chapter A5. You start at table A5.1 and and depending on the value of different bits in the instruction word it refers to more and more specific tables leading to the instruction.
As an example, consider the following instruction:
0001 0101 1001 1111 0000 0000 0000 1000
According to the first table it is an unsigned load/store instruction since the condition is not 1111 and op1 is 010. The encoding of this is further expanded in A5.3
From this section we see that A=0, op1=11001, Rn=1111 (PC), and B=0. This implies that the instruction is LDR(literal). Checking the page describing this instruction and remembering that cond=0001 we see that the instruction isLDRNE R0, [PC, #4].
To do the reverse procedure you look up the instruction in the alphabetical list of instructions and follow the pattern.
Looking at a different part of (one of) the ARM architectural reference manuals (not the cortex-m (armv6m armv7m) you want the ARMv5 one or the ARMv7-AR one) going to look at a thumb instruction, but the ARM instructions work the same way and are a couple chapters prior.
it says thumb instruction set as the section/chapter, then shortly into that is a table that shows thumb instruction set encoding or you can just search for that. one if them is called Add/subtract register, there are a lot of hardcoded ones and zeros up front then bit 9 is opc, then rm, rn and rd bits.
arm toolchains are easy to come by for windows mac and linux or can easily build from sources (just need binutils). assembling this
.thumb
add r1,r2,r3
add r1,r2,r4
add r1,r2,r5
add r1,r2,r6
add r1,r2,r7
then disassembling gives
00000000 <.text>:
0: 18d1 adds r1, r2, r3
2: 1911 adds r1, r2, r4
4: 1951 adds r1, r2, r5
6: 1991 adds r1, r2, r6
8: 19d1 adds r1, r2, r7
from that chart in the ARM ARM the add register starts with hardcoded bits 000110 the instructions above start with 0x18 or 0x19 which both start with the 6 bits 000110 (00011000 or 00011001). In the alphabetical list of thumb instructions we look for the add instructions. find the three register one and in this case it has 7 bits which happen to match the bits we are decoding 0001100 so we are on the right track. the the last 9 bits are three sets of three
0x1951 is 0001100101010001 or 0001100 101 010 001, the last nine represent r5, r2, and r1. Looking at the syntax part of the instruction it shows add rd, rn, rm but the machine code has rm, rn, rd so we take the machine code and rearrange per the syntax and get add r1,r2,r5. Not bad it matches, now unfortunately the s here is confusing, this thumb instruction doesnt have an s bit there wasnt room so this one always updates the flags, so the nature of this instruction and toolchain and how I used it requires the add without the s on the assembly end and disassembles with the s. confusing, sorry. when using arm instructions the letter s works as expected.
Just repeat this with the ARM instructions in either direction. The immediate values are going to be the most challenging part of this. Or not depends on the specific instruction and immediate encoding.
I am working on a software-based implementation of ARM processor in C.
Given an ARM data processing instruction:
instruction = 0xE3A01808; 1110 0 0 1 1101 0 0000 0001 1000 00001000
Which translates to: MOV r0,#8; shifted by 8 bits.
How to check whether the 8 bit shift is right or left shift?
With ARM 12-bit modified immediate constants, there is no shift, in any direction - it's a rotation, specifically, <7:0> rotated right by 2*<11:8>. Thus the encoding 0x808 represents 8 ROR (2*8), meaning 0xE3A01808 disassembles to mov, r1, #0x80000.
(Note that the canonical encoding of a modified immediate constant is the one with the smallest rotation, so mov, r1, #0x80000 would assemble to 0xE3A01702, i.e. 2 ROR 14, rather than 8 ROR 161).
As for implementing bitwise rotation in C, to solve that there's either compiler intrinsics or the standard shift-part-in-each-direction idiom x>>n | x<<(32-n).
[1] To get a specific encoding, UAL assembly allows an immediate syntax with the constant and rotation specified separately, i.e. mov r1, #8, 16. For full detail, this is all spelled out in the ARM ARM (section A5.2.4 in the v7 issue C I have here) - essentially, the choice of encodings permits a little funny business with flags in certain situations.
I'm not sure if this is what you're referring to, but here's some documentation that seems relevant:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0040d/ch05s05s01.html
The 'WITHDRAWN' pasted over the docs doesn't inspire much confidence.
It would seem to suggest a rotate right. Which plays with how I remember the barrel shifter being addressed on arm more generally (There's a ROR operation for instance, see http://www.davespace.co.uk/arm/introduction-to-arm/barrel-shifter.html)
How to differentiate arm instruction and thumb instruction? For example:
add r1, r2, r3 ;add r2 and r3, then store the result in r1 register
How does the above instruction work in terms of arm and thumb instruction?
Well go to infocenter.arm.com and get the architectural reference manual for the architecture in question, or just get the ARMv7 manual (not the -M but the -A or -R) which will include all instruction codings to date from ARMv4 to ARMv7 including thumb and the most mature thumb2 extensions. (you might need multiple architectural reference manuals and/or technical reference manuals as the encoding of instructions is hit or miss in the arm manuals)
Under thumb instructions look at the register based ADD instruction, there is one encoding with three registers Encoding T1 which is listed as all thumb variants (ARMv4T to the present(ARMv4T, ARMv5, ARMv6, ARMv7 and likely ARMv8))
bits 15 to 9 are 0b0001100 three bits of rm, three bits of rn and three bits of rd (typically thumb instructions are limited to r0-r7 needing three bits to encode, thumb2 extensions and a few special thumb instructions allow higher numbered registers (four bits of encoding)).
The instruction is listed as ADDS rd,rn,rm in the description, the S means save flags which is from the parent ARM instruction from which the thumb instruction was derived, for ARM instructions you have the choice to modify flags or not, thumb instructions you do not (thumb2 has a way to control this but it has limitations (for the add instruction)).
ADDS rd,rn,rm
0001100 rm rn rd
So ADDS r1,r2,r3 would be this chunk of bits
0001100 011 010 001 = 0001100011010001 = 0001 1000 1101 0001 = 0x18D1
looking at the ADD instruction in ARM mode you start with a condition field, as you have written your question this is an ALWAYS or a 1110 pattern (always execute) also as you have written your question you wrote add not adds so not saving flags so the s bit is zero in the encoding
so add rd,rn,shifter operand we start off with the bit pattern 0b111000I01000 then four
bits for rn four for rm and 11 for the shifter operand. Yes that is an I an bit position 25 not a one. the I is part of the shifter operand encoding
Now go to the section of the manual that describes the shifter operand encoding. the encoding that is just a register rm is bit 25 (the I bit) is zero, and 11 to 4 are zero with 3 to 0 being rm so add rd,rn,rm
1110 00 0 01000 rn rd 00000 000 rm
1110 00 0 01000 0001 0010 00000 000 0011 = 1110 0000 1000 0001 0010 0000 0000 0011 = 0xE0812003
Now we can test this, take this program
add r1,r2,r3
.thumb
add r1,r2,r3
call it add.s assemble then disassemble
arm-none-eabi-as add.s -o add.o
arm-none-eabi-objdump -D add.o
and get
Disassembly of section .text:
00000000 <.text>:
0: e0821003 add r1, r2, r3
4: 18d1 adds r1, r2, r3
which matches the hand encoding.
Now if you are trying to disassemble a chunk of bytes that you dont know what kind they are, that is a different story, this can be very difficult at best, ideally you want to disassemble the whole binary by following the execution and mode changes (which you might not be able to figure out without simulating the execution). One clue is that ARM instructions generally use the ALways condition which is 0xE at the beginning of the instruction so if you see lots of 32 bit words in the form 0xExxxxxxx those are likely arm instructions and not data and not thumb instructions. Pure thumb will have a not so typical pattern say 0x6xxx and 0x7xxx but also a mixture of all other starting values. Thumb2 extensions can start on either halfword boundary and will have a more distinctive start pattern for 32 bit words but because they are mixed with non-thumb2 extensions and not always aligned on 32 bit boundaries thumb with or without thumb2 extensions are not so easy to visually isolate from data only the ARM instructions are easy to visually isolate.
In practice, no reason to compile a library as arm unless you intentionally choose to make everything harder.
Switching between the arm and thumb modes takes some nanoseconds, it's hardware backed, and is by the way much faster than switching between the kernel and user mode you usually experience.
If you ask me why the whole set of Google's libraries is arm, I'll tell you that there's absolutely no reason albeit that they should keep the things backward compatible and consistent.