Disassembly of a mixed ARM/Thumb2 ELF file - arm

I'm trying to disassemble an ELF executable which I compiled using arm-linux-gnueabihf to target Thumb-2. However, the ARM instruction encoding is confusing me while I debug my disassembler. Let's consider the following instruction:
mov.w fp, #0
which I disassembled using objdump and Hopper as a Thumb-2 instruction. The instruction appears in memory as 4ff0000b, which means that it's actually 0b00f04f (little endian). Therefore, the binary encoding of the instruction is:
0000 1011 0000 0000 1111 0000 0100 1111
According to the ARM Architecture Reference Manual, it seems like ALL 32-bit Thumb-2 instructions should start with 111[10|01|11]. Therefore, the above encoding doesn't correspond to any Thumb-2 instruction. Further, it doesn't match any of the encodings found in section A8.8.102 (page 484).
Am I missing something?

I think you're missing the subtle distinction that wide Thumb-2 encodings are not 32-bit words like ARM encodings, they are a pair of 16-bit halfwords (note the bit numbering above the ARM ARM encoding diagram). Thus whilst the halfwords themselves are little-endian, they are still stored in 'normal' order relative to each other. If the bytes in memory are 4ff0000b, then the actual instruction encoded is f04f 0b00.
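To make the halfword ordering concrete, here is a minimal sketch (Python, not part of the original answer) that reads the four bytes as two little-endian halfwords in memory order rather than as one 32-bit word:

import struct

raw = bytes.fromhex("4ff0000b")          # the bytes as they appear in memory
hw1, hw2 = struct.unpack("<2H", raw)     # two little-endian halfwords, in order
print(f"{hw1:04x} {hw2:04x}")            # f04f 0b00  ->  mov.w fp, #0

# bits [15:11] of the first halfword are 0b11110, so it begins a 32-bit
# Thumb-2 encoding and hw2 is its second half
print(bin(hw1 >> 11))                    # 0b11110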

Thumb-2 is an extension to the Thumb instruction set: formerly undefined instructions, some of which are now defined. ARM is a completely different instruction set. If the toolchain has not left you clues as to what code is Thumb vs ARM, then the only way to figure it out is to start with an assumption at an entry point and disassemble in execution order from there, and even then you might not figure out some of the code.
You cannot distinguish ARM instructions from Thumb or Thumb+Thumb-2 extensions simply by bit pattern. Also remember that ARM instructions are aligned on 4-byte boundaries where Thumb instructions are aligned on 2-byte boundaries, and a 32-bit Thumb-2 encoding doesn't have to sit within a single 4-byte boundary, making this all that much more fun. (Thumb+Thumb-2 is a variable-length instruction set made from multiples of 16-bit values.)
If all of your code is Thumb and there are no ARM instructions in there, then you still have the problem you would have with any variable-length instruction set, and to do it right you have to walk the code in execution order. For example, it would not be hard to embed a data value in .text that looks like the first half of a Thumb-2 extension, followed by a real Thumb-2 extension, causing your disassembler to go off the rails. This is the elementary variable-word-length disassembly problem (and an elementary way to defeat simple disassemblers).
Take 16-bit halfwords A, B, C, D.
If C + D are a Thumb-2 instruction (which is known by decoding C), A is, say, a Thumb instruction, and B is a data value which resembles the first half of a Thumb-2 extension, then linearly decoding through memory: A is the Thumb instruction, B and C are decoded as a Thumb-2 extension, and D, which is actually the second half of a Thumb-2 extension, is now decoded as the first 16 bits of an instruction. All bets are off as to how that decodes, or whether it causes all or many of the following instructions to be decoded wrong.
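As a minimal sketch (Python, with hypothetical halfword values chosen to illustrate exactly that layout: A is a 16-bit nop, B is a data value that looks like the first half of a 32-bit encoding, and C,D is the real mov.w fp, #0 from the question), a naive linear sweep goes wrong like this:

def is_first_of_32bit(hw):
    # a halfword starts a 32-bit Thumb-2 encoding when bits [15:11]
    # are 0b11101, 0b11110 or 0b11111
    return (hw >> 11) in (0b11101, 0b11110, 0b11111)

stream = [0xBF00, 0xF000, 0xF04F, 0x0B00]    # A, B, C, D

i = 0
while i < len(stream):
    if is_first_of_32bit(stream[i]) and i + 1 < len(stream):
        print(f"32-bit: {stream[i]:04x} {stream[i+1]:04x}")
        i += 2
    else:
        print(f"16-bit: {stream[i]:04x}")
        i += 1

# the linear sweep prints:
#   16-bit: bf00        (A, correct)
#   32-bit: f000 f04f   (B+C, wrong: B was data)
#   16-bit: 0b00        (D, wrong: alone it decodes as 'lsrs r0, r0, #12')
# whereas in execution order C pairs with D as 'mov.w fp, #0'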
So start off looking to see if the ELF tells you something. If not, then you have to make passes through the code in execution order (you have to make an assumption as to an entry point), following all the possible branches and linear execution, to mark 16-bit sections as the first or an additional block of an instruction. The unmarked blocks cannot necessarily be determined as instruction vs data, and care must be taken.
And yes, it is possible to play other games to defeat disassemblers, such as intentionally branching into the second half of a Thumb-2 instruction which is hand-crafted so that its second half is a valid Thumb instruction or the beginning of another Thumb-2 one.
With fixed-length instruction sets like ARM and MIPS, you can decode linearly; some data decodes as strange or undefined instructions, but your disassembler doesn't go off the rails and fail to do its job. With variable-length instruction sets, disassembly at best is just a guess... the only way to truly decode is to execute the instructions the same way the processor would.

Related

Detecting Thumb-2 instruction and location of PC offset

I'm kinda new to ARM and I am trying to understand how instructions are interpreted/executed:
From what I know, ARM is quite simple since every instruction takes up 4 bytes and is aligned on 4 bytes as well.
The problem comes with Thumb-2, where instructions can be either 16 or 32 bits long. I've read that to determine whether the current instruction is 16 or 32 bits long, the processor reads a word (32 bits) and evaluates bits [15:11] of the first halfword. If those bits are 0b11101/0b11110/0b11111, then that halfword is the first halfword of a 32-bit instruction; otherwise it's a 16-bit instruction (I don't quite get why those specific bits determine that). So an example would be:
0x4000 16-bit
0x4002 32-bit
0x4006 16-bit
0x4008 16-bit
0x400a 32-bit
Then the processor should grab from 0x4000 to 0x4004, evaluate the first halfword (0x4000 to 0x4002), and if the instruction is 16 bits then it just jumps to the next halfword and repeats the process; but if the halfword indicates a 32-bit instruction then it skips the next halfword and executes that 32-bit instruction?
Also, I'm confused about where the PC points in Thumb-2; is it still two instructions ahead?
Most of us don't/won't know exactly how it is implemented in the logic (and there are various cores, so each could be different). But what used to be undefined instructions became Thumb-2 extensions: a couple dozen in ARMv6-M, then something like 150 new ones in ARMv7-M.
Think of the processor fetching 16-bit instructions, and sometimes it runs across a variable-length one. It is just like other variable-length processors: the x86 will look at the one-byte instruction, then based on that it may or may not need to look at the next byte, and so on until it has resolved the whole instruction. Same here: it looks at a halfword, determines if it has everything it needs, and if not it grabs the next halfword for the rest of the information.
0x4000 16-bit
0x4002 32-bit
0x4006 16-bit
0x4008 16-bit
0x400a 32-bit
The processor grabs 0x4000, sees it has what it needs, executes. The processor grabs 0x4002, sees it needs another halfword, grabs 0x4004, executes. The processor grabs 0x4006, has what it needs, executes. Grabs 0x4008, has what it needs, executes. Grabs 0x400A, sees it needs another halfword, grabs 0x400C, executes.
Those bit patterns were formerly undefined instructions; now they are part of the definition of a variable-length instruction. Just like instructions that start with 0b010000 are data-processing instructions and you have to look at other bits to determine whether it is an ADD or an XOR, these bit patterns define Thumb-2 extensions, and then other bits in those two halfwords define what the full instruction is.
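A small sketch of that walk (Python, with made-up halfword values; only bits [15:11] of each first halfword matter for the 16/32-bit decision) reproduces the layout from the question:

def starts_32bit(hw):
    # bits [15:11] equal to 0b11101, 0b11110 or 0b11111 -> 32-bit encoding
    return (hw >> 11) in (0b11101, 0b11110, 0b11111)

mem = {                                   # address -> halfword (hypothetical)
    0x4000: 0xBF00,                       # 16-bit
    0x4002: 0xF04F, 0x4004: 0x0B00,       # 32-bit
    0x4006: 0xBF00,                       # 16-bit
    0x4008: 0xBF00,                       # 16-bit
    0x400A: 0xF000, 0x400C: 0xB802,       # 32-bit
}

pc = 0x4000
while pc <= 0x400A:
    size = 4 if starts_32bit(mem[pc]) else 2
    print(f"{pc:#06x} {size * 8}-bit")
    pc += size

# prints: 0x4000 16-bit, 0x4002 32-bit, 0x4006 16-bit, 0x4008 16-bit, 0x400a 32-bit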
Why these bit patterns? You can think of it as arbitrary if you want; for every instruction set someone (or some group) sat down and decided what bit patterns were going to mean what, and it is no different here. There was room in the instruction-set space with certain patterns, so those were used. It is not uncommon to add instructions later in the life of a processor family; take x86 for example. Plus many others: for an 8-bitter like x86 or 6502 or whatever, you can either consume an unused 8-bit instruction/opcode as your next new instruction, or you take that formerly unused byte/opcode and expand it into many more. For example, a byte/opcode that was unused now means "look at the next byte"; that next byte could be up to 256 new instructions, or it could simply supplement the first byte, specifying registers or operations, etc. No different here: down the road ARM extended the Thumb instruction set. Some percentage of the instruction is consumed indicating that this is a variable-length instruction, but of those 32 bits there still remain quite a few bits to allow for a larger instruction with more options (while losing the one-to-one relationship between Thumb and ARM instructions; all Thumb instructions, as opposed to Thumb-2 extensions, map directly onto a full-sized ARM instruction).
Each core is different; they don't all fetch a word at a time. Thumb-2 extensions don't have to be aligned, so a whole Thumb-2 instruction won't necessarily fit in an aligned word fetch for the processors that do word fetches. Think of the (pre)fetcher and decoder as two separate things, since they are. Functionally, the decoder takes 16 bits at a time in Thumb mode; how is it specifically implemented? I don't know. Do they wait for two halfwords to be ready before decoding the first? I don't know. Is every implementation the same? I don't know; I would expect not. As far as fetching goes, they are not all the same, as you can see in the ARM documentation, and for at least one core, if not more, the chip vendor can choose the fetch width at compile time.
If you are coming from, for example, a MIPS-based textbook and trying to understand other processors, this can be confusing. Understand that those textbooks and terms are for understanding and vocabulary; real pipelines are not generally that textbook depth, and you don't generally fetch whole instructions at a time (the x86 does not fetch one byte at a time, it fetches MANY instructions at a time). RISC-V has an even worse problem than ARM and MIPS, as you can have 16-bit compressed instructions, 32-bit instructions, and 64-bit instructions; the 32-bit instructions do not have to be aligned on a RISC-V (nor the 64-bit ones), so fetching 32 bits at a time doesn't get you a whole instruction. The fetcher is separate from the decoder; once enough is there, the decoder can complete.
I want to say that in Thumb state the PC is two halfwords ahead (whether or not the instruction is a Thumb-2 extension), so pc+4; it should be easy to verify though.
Disassembly of section .text:
00000000 <hello-0xe>:
0: e005 b.n e <hello>
2: bf00 nop
4: bf00 nop
6: f000 b802 b.w e <hello>
a: bf00 nop
c: bf00 nop
0000000e <hello>:
e: bf00 nop
Yes, so two Thumb-sized halfwords ahead (pc+4) in both cases. It would be significantly more complicated if it were literally two instructions ahead, which is how it is often remembered from when everything was 16 bits: then it would be sometimes pc+4, sometimes pc+6, and sometimes pc+8, and the logic would have to decode two instructions just to know how the PC was offset for the first of them. Sticking with pc+4, as it has always been for Thumb mode, is the sane way to do it.
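As a quick check (a Python sketch, not part of the original answer), decoding the 16-bit branch from the listing with the pc+4 rule lands exactly on <hello>:

# 'b.n' at address 0x0, encoding 0xE005 (T2 encoding: 11100 imm11)
insn = 0xE005
offset = (insn & 0x7FF) << 1          # halfword-aligned offset
if offset & 0x800:                    # sign-extend the 12-bit result
    offset -= 0x1000
print(hex(0x0 + 4 + offset))          # 0xe  ->  matches '0: e005 b.n e <hello>'

# the 32-bit 'b.w' at 0x6 works the same way: its decoded offset of 4
# is added to 0x6 + 4, which also lands on 0xe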

ARM/Thumb interworking confusion regarding Thumb-2

I've been going through ARM ISA related documentation for a while, and so far I believe that I've got a good understanding of the basics of ARM/Thumb interworking. I'll quickly summarize that in the following:
Instructions can be either 4-byte aligned (ARM) or 2-byte aligned (Thumb).
Thumb and ARM instructions reside in separate regions, i.e. they are not intermixed without an explicit processor state change.
A state change can happen upon executing any of bx, blx, ldm, ldr. Choosing between ARM and Thumb depends on the value of the least significant bit of the target address, which can be 0 or 1 respectively.
The current state of the processor can be either ARM or Thumb. That depends on the state of bit 5 (the T bit) of the CPSR.
Rules for state change can be summarized in a figure (not reproduced here) taken from this paper.
However, Thumb-2 instructions have confused me a bit. For instance, let's inspect the encoding of the ADC instruction, which can be found in section A8.8.2 of the ARMv7-A/R reference manual. Basically, the same instruction has 3 distinct encodings: 16-bit (Thumb), 32-bit (Thumb-2), and 32-bit (ARM).
Here are my questions:
Do the 32-bit Thumb-2 instructions execute in ARM or Thumb mode of the processor? (I'm assuming it's the latter but I'm not sure.)
Some resources mention that ARM/Thumb instructions can be "freely" intermixed in Thumb-2. Does that mean an explicit state change using bx, blx, ldm or ldr doesn't need to happen?
Final note: this is the closest question to mine; however, I'm focusing on interworking.
Choosing a mode
so far I believe that I've got a good understanding for the basics of ARM/Thumb interworking.
Well, that is useful, but it is really part of an older story. Originally, there were only the 32-bit ARM instructions (1980s to mid-1990s). Then ARM made a mode that was like a compression front end, expanding strictly 16-bit opcodes to 32 bits. This was Thumb mode (mid-1990s to ~2005). Then ARM came out with thumb2 (which is somewhat nebulous), mainly typified by a mix of both 16-bit and 32-bit instructions (~2005 to current).
The concept of interworking is only useful for a CPU with thumb (old) and ARM functions. If you have a thumb2 CPU and a good compiler with normal memory (1+ wait states), then the thumb2 is almost always the best choice.
Thumb2 intermixing
In a thumb2-capable processor, you do not need interworking! I.e., you don't change modes. You can use the 16-bit Thumb encodings, and if you ask for a mnemonic where that is not possible, the assembler emits a 32-bit version. The Cortex-M CPUs only have a thumb2 mode (really Thumb mode with instruction extensions).
Disassembling
There are not really three types of opcodes but two with one extension.
Original 32 bit ARM opcodes.
16 bit only thumb encodings.
the thumb2 extension with all thumb opcodes plus more.
As the Thumb opcodes are more dense, it is not possible to do all types of operations. So the Thumb ADC is limited compared to the ARM one. However, for most instructions ARM Holdings updated thumb2 (the only mode in the CPU is Thumb; thumb2 is extra instructions/opcodes) to have all the capabilities of the ARM-mode ADC.
There are discussions on recognizing the mode in a binary elsewhere. Assuming the code is not trying to obfuscate and people made rational choices, you will only have two types of disassembly.
ARM 32 bit
thumb2
A thumb2 disassembler should work with pure thumb code. Most people do not use interworking. If they do, a large part of the binary will be thumb mode, with a small performance critical section in ARM mode.
A difficulty with thumb2 is that the mixed 16/32-bit encodings can lead a disassembler to misinterpret an instruction stream if it starts decoding in the middle of a 32-bit encoding.
Final note, this is the closest question to mine, however, I'm focusing on interworking.
Interworking makes no sense on a thumb2 CPU. Since your question is tagged disassembly, I tried to answer with that focus, versus the other question, which is mainly about what the modes are. For ELF disassembly, the disassembler should have no trouble locating major function entry points and should be able to disassemble without many issues.
Do the 32-bit Thumb-2 instructions execute in ARM or Thumb mode of the processor?
Thumb-2 instructions are accessible as were Thumb instructions when the processor is in Thumb state, that is, the T bit in the CPSR is 1 and the J bit in the CPSR is 0. (source)
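As a tiny illustration (a Python sketch; the bit positions are T = bit 5 and J = bit 24 of the CPSR, per the ARMv7-A/R manual), the instruction-set state can be read off a CPSR value like this:

def isa_state(cpsr):
    t = (cpsr >> 5) & 1                # T bit
    j = (cpsr >> 24) & 1               # J bit
    return {(0, 0): "ARM", (0, 1): "Thumb",
            (1, 0): "Jazelle", (1, 1): "ThumbEE"}[(j, t)]

print(isa_state(0x00000030))           # T=1, J=0 -> "Thumb"
# both 16-bit Thumb and 32-bit Thumb-2 encodings execute in this one state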
Some resources mention that ARM/Thumb instructions can be "freely" intermixed in thumb-2. Does that mean explicit state change using bx, blx, ldm or ldr doesn't need to happen?
No state change needs to happen, since Thumb-2 instructions and ordinary Thumb instructions execute in the same state. As for how this fits with the instruction encoding, the ARM Architecture Reference Manual: Thumb-2 Supplement says this:
The new 32-bit Thumb instructions are added in the space previously occupied by the Thumb BL and BLX instructions. This is made possible by treating the BL and BLX instructions as 32-bit instructions, instead of treating them as two 16-bit instructions.

Why is PC still 32 bits in Thumb mode on ARM processor?

As Thumb mode can reference only 16 bits of the general registers (r1-r14), why is the PC (r15) still 32-bit? Will b Label in this mode update all 32 bits of the PC, or just the lower 16 bits?
Despite the question being based on an entirely incorrect premise, there is a cheeky sort-of-answer to the title question taken literally - being in Thumb mode means you're on a v4T or later architecture which means you must have a full 32 bits of PC, rather than the pre-v3 architectures where it was only 26 bits.
I'm not sure where you got the idea that Thumb operates on 16 bits of a register - the restriction with most Thumb instructions is that they can only access the "low registers" r0-r7, due to limited encoding space. There are 3 different "32-bit"s at play in ARM - the register width, the address space (which wasn't always the case, as mentioned above), and the size of the fixed-width instruction encoding. Thumb only changes the third one. One result of this size reduction is that most Thumb encodings only have 3 bits per operand to encode register numbers - only a handful of instructions can spare the extra bits to encode the "high registers" r8-r15.
In operation, Thumb instructions are no different to the equivalent subset of ARM instructions - Thumb is just an alternative fetch/decode stage on the front of the same pipeline, after all.
Thumb (v1) instructions are 16 bits in size, but what does that have to do with the size of the registers? (Nothing.) The registers, including r15, are 32 bits all the time. Branches with immediate values are simply offsets, not absolute addresses.
This is all clearly documented in the ARM Architecture Reference Manuals at infocenter.arm.com.

Are all of the ARM opcodes 1 byte?

In the x86 architecture there are one-, two- and three-byte opcodes. What about ARM? For example, when I disassemble a binary, can I always take the first byte as the opcode?
No. There are now a few ARM instruction sets. In the primary ARM instruction set, all instructions are 32 bits, and the decoding of the instruction is not isolated to one field within the instruction. The Thumb instruction set is based on 16-bit instructions; same answer though, depending on the instruction the decoding is in different places. Then there are the Thumb-2 extensions to the Thumb instruction set; same answer. A Thumb-2 instruction starts with one of the formerly undefined Thumb encodings and then adds more bits to distinguish which instruction it is.
Get one of the ARM Architecture Reference Manuals from infocenter.arm.com to see all of this. There are a couple of nice diagrams which show how this all works: the most significant bits are used to divide the instructions into different types, and then, depending on the type, the rest of the bits are decoded.
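As a rough illustration of that top-down decode (a simplified Python sketch; the real decode has more special cases, so see the encoding diagrams in the manual), the classic 32-bit ARM encodings split on bits [27:25] after the condition field:

CLASSES = {
    0b000: "data processing (register)",
    0b001: "data processing (immediate)",
    0b010: "load/store (immediate offset)",
    0b011: "load/store (register offset) / media",
    0b100: "load/store multiple",
    0b101: "branch / branch with link",
    0b110: "coprocessor load/store",
    0b111: "coprocessor / supervisor call",
}

def arm_class(insn):
    cond = insn >> 28                        # bits [31:28]: condition
    return cond, CLASSES[(insn >> 25) & 7]   # bits [27:25]: rough class

print(arm_class(0xE0910002))   # ADDS r0, r1, r2 -> (0xe, 'data processing (register)')
print(arm_class(0xEAFFFFFE))   # B .             -> (0xe, 'branch / branch with link')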
x86 comes from the 8 bit world where the designs tended to use one memory location(/access) (at the time that was a byte) for the more commonly used instructions, then have some opcodes multiplex into a second byte. Also immediates were added in their entirety. It was okay then, but is inefficient today. x86 should be the last instruction set you learn if ever, not representative of a good instruction set, you will need to unlearn a number of things to move forward.
Wikipedia has a number of instruction sets on the wiki page itself; for others it often has a direct link. For IP vendors like ARM and MIPS, as well as many chip vendors, you can go right to their site and get the documentation.
ARM instructions don't really have a concept of 'opcodes'. I don't know the x86 instruction set very well, but what you describe sounds like 6502 machine code where some bytes identify which instruction to execute and others specify data that the instruction uses.
Ignoring Thumb, all ARM instructions are four bytes long. The operands used by the instruction are contained within those four bytes.

What is the ARM Thumb Instruction set?

under "The Thumb instruction set" in section 1-34 of "ARM11TechnicalRefManual" it said that:
"The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions.Thumb instructions are 16 bits long,and have a corresponding 32-bit ARM instruction that has the same effect on processor model."
can any one explain more about this especially second sentence and say how does processor perform it?
The ARM processor has 2 instruction sets: the traditional ARM set, where the instructions are all 32 bits long, and the more condensed Thumb set, where most common instructions are 16 bits long (and some are 32 bits long). Which instruction set to run can be chosen by the developer, and only one set can be active (i.e. once the processor is switched to Thumb mode, all instructions will be decoded as Thumb instead of ARM).
Although they are different instruction sets, they share similar functionality, and can be represented using the same assembly language. For example, the instruction
ADDS R0, R1, R2
can be compiled to ARM (E0910002 / 11100000 10010001 00000000 00000010) or Thumb (1888 / 00011000 10001000). Of course, they perform the same function (add r1 and r2 and store the result to r0), even if they have different encodings. This is the meaning of "Thumb instructions are 16 bits long, and have a corresponding 32-bit ARM instruction that has the same effect on the processor model".
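To make that concrete, a short sketch (Python, with the field layouts taken from the data-processing and ADD-register encoding diagrams) pulls the register fields out of both encodings quoted above:

def decode_arm_dp(insn):
    # classic ARM data-processing layout: cond | 00 I opcode S | Rn | Rd | operand2
    return dict(opcode=(insn >> 21) & 0xF, S=(insn >> 20) & 1,
                Rn=(insn >> 16) & 0xF, Rd=(insn >> 12) & 0xF, Rm=insn & 0xF)

def decode_thumb_add_reg(insn):
    # 16-bit Thumb ADD (register): 0001100 | Rm | Rn | Rd
    return dict(Rm=(insn >> 6) & 7, Rn=(insn >> 3) & 7, Rd=insn & 7)

print(decode_arm_dp(0xE0910002))     # opcode 0b0100 (ADD), S=1, Rn=r1, Rd=r0, Rm=r2
print(decode_thumb_add_reg(0x1888))  # Rm=r2, Rn=r1, Rd=r0 -- the same operation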
Every* instruction in the Thumb encoding also has a corresponding encoding in ARM, which is what is meant by the "subset" sentence.
*: Not strictly true; there is no "IT" instruction in ARM, although ARM doesn't need "IT" anyway (it will be ignored by the assembler).
