Long consecutive number of "add" opcodes after disassembling - disassembly

After disassembling PE32 files with radare2 to extract their opcodes, I noticed that near the beginning of the output I get a long consecutive run of "add" opcodes. Does anyone know the reason for that and why it is there?

Related

Detecting and extracting opcode sequences

May I get an explanation about what opcode sequences are and how to find them in PE32 files?
I am trying to extract them from PE32 files.
what opcode sequences are
A CPU instruction is composed of one or more bytes, and each of those bytes has a different meaning.
An opcode (operation code) is the part of an instruction that defines the behavior of the instruction itself (as in, this instruction is an 'ADD', or an 'XOR', a NOP, etc.).
For x86 / x64 CPUs (IA-32 and IA-32e in Intel lingo), an instruction consists of at least an opcode (1 to 3 bytes), but can come with several other bytes (various prefixes, ModR/M, SIB, displacement and immediate) depending on its encoding.
Opcode is often used as a synonym for "instruction" (since the opcode defines the behavior of the instruction); therefore, when you have multiple instructions, you have an opcode sequence (which is a bit of a misnomer since it's really an instruction sequence, unless all instructions in the sequence are composed only of opcodes).
how to find them in PE32 files?
As instructions can be multiple bytes long, you can't just start at a random location in the .text section (which, for a PE file, contains the executable code of the program). There's a specific location in the PE file - called the "entry point" - which defines the start of the program.
The entry point for a PE file is given by the AddressOfEntryPoint member of the IMAGE_OPTIONAL_HEADER structure (part of the PE header structures). Note that this member is an RVA, not a "full" VA.
From there you know you are at the start of an instruction. You can start disassembling / counting instructions from this point, following the encoding rules for instructions (these rules are explained at great length in the Intel and AMD manuals).
Most instructions are "fall-through", which means that once an instruction has executed, the next one to execute is the one that follows it (this seems obvious, but!). The trick is that when there's a non-fall-through instruction, you must know what this instruction does in order to continue your disassembly (e.g. it might jump somewhere, go to a specific handler, etc.).
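As a rough illustration of the entry-point lookup described above, here is a minimal Python sketch that reads AddressOfEntryPoint from the raw bytes of a PE32 file. The function name `address_of_entry_point` is made up for this example, and the sketch assumes a well-formed PE32 (not PE32+) image. It returns the RVA only; mapping the RVA to a file offset via the section table is not shown.

```python
import struct

def address_of_entry_point(pe):
    """Return the AddressOfEntryPoint RVA from raw PE32 bytes (hypothetical helper)."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the PE signature) lives at offset 0x3C of the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte COFF file header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic != 0x10B:
        raise ValueError("not a PE32 optional header")
    # AddressOfEntryPoint sits 16 bytes into the optional header.
    (rva,) = struct.unpack_from("<I", pe, opt + 16)
    return rva
```

Typical use would be `address_of_entry_point(open(path, "rb").read())`, where `path` is whatever file you are inspecting.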
Use the radare2 library; it can extract opcode sequences very quickly.

Detecting Thumb-2 instruction and location of PC offset

I'm fairly new to ARM and I am trying to understand how instructions are interpreted/executed:
From what I know, ARM is quite simple, since every instruction takes up 4 bytes and is aligned to 4 bytes as well.
The problem comes with Thumb-2, where instructions can be either 16 or 32 bits long. I've read that to determine whether the current instruction is 16 or 32 bits long, the processor reads a word (32 bits) and evaluates bits [15:11] of the first halfword. If those bits are 0b11101, 0b11110 or 0b11111, then that halfword is the first halfword of a 32-bit instruction; otherwise it's a 16-bit instruction (I don't quite get why those specific bits determine that). So an example should be:
0x4000 16-bit
0x4002 32-bit
0x4006 16-bit
0x4008 16-bit
0x400a 32-bit
Then the processor should grab 0x4000 to 0x4004 and evaluate the first halfword (0x4000 to 0x4002). If the instruction is 16-bit, it just moves on to the next halfword and repeats the process, but if the halfword indicates a 32-bit instruction, it skips the next halfword and executes that 32-bit instruction?
Also, I'm confused about where the PC points in Thumb-2. Is it still two instructions ahead?
Most of us don't and won't know exactly how it is implemented in the logic (and there are various cores, so each could be different). But what used to be undefined instructions became Thumb-2 extensions: a couple dozen in ARMv6-M, then something like 150 new ones in ARMv7-M.
Think of the processor fetching 16-bit instructions, and sometimes it runs across a variable-length one. It works just like other variable-length processors: an x86 looks at the first byte of an instruction, then based on that it may or may not need to look at the next byte, and so on until it has resolved the whole instruction. Same here: it looks at a halfword and determines whether it has everything it needs; if not, it grabs the next halfword for the rest of the information.
0x4000 16-bit
0x4002 32-bit
0x4006 16-bit
0x4008 16-bit
0x400a 32-bit
The processor grabs 0x4000, sees it has what it needs, executes. It grabs 0x4002, sees it needs another halfword, grabs 0x4004, executes. It grabs 0x4006, has what it needs, executes. It grabs 0x4008, has what it needs, executes. It grabs 0x400A, sees it needs another halfword, grabs 0x400C, executes.
Those bit patterns were formerly undefined instructions; now they are part of the definition of a variable-length instruction. It is just like instructions that start with 0b010000 being data-processing instructions: to determine whether one is an add or an xor, you have to look at other bits. These bit patterns mark Thumb-2 extensions, and other bits in the two halfwords define what the full instruction is.
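The halfword-by-halfword walk above can be sketched in a few lines of Python. This models only the length decision (bits [15:11] of the first halfword), not how any real core fetches or decodes. The function names and the halfword values in `stream` are chosen for illustration: 0xbf00 is the 16-bit nop and 0xf000 0xb802 is a 32-bit b.w encoding.

```python
def is_32bit_first_halfword(hw):
    """True if the 16-bit value hw is the first halfword of a 32-bit Thumb-2 encoding."""
    return (hw >> 11) in (0b11101, 0b11110, 0b11111)

def instruction_starts(halfwords, base=0):
    """Walk a halfword stream linearly; return (address, size_in_bytes) per instruction."""
    out = []
    i = 0
    while i < len(halfwords):
        size = 4 if is_32bit_first_halfword(halfwords[i]) else 2
        out.append((base + 2 * i, size))
        i += size // 2  # advance by one or two halfwords
    return out

# Illustrative stream reproducing the 16/32/16/16/32 layout from the example.
stream = [0xbf00, 0xf000, 0xb802, 0xbf00, 0xbf00, 0xf000, 0xb802]
layout = instruction_starts(stream, base=0x4000)
for addr, size in layout:
    print(hex(addr), f"{size * 8}-bit")
```

With this stream the walk reports instruction starts at 0x4000, 0x4002, 0x4006, 0x4008 and 0x400a, matching the layout in the question.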
Why these bit patterns? You can think of it as arbitrary if you want. For every instruction set, someone (or some group) sat down and decided which bit patterns were going to mean what; it's no different here. There was room in the instruction set space with certain patterns, so those were used. It is not uncommon to add instructions later in the life of a processor family; take x86 for example, plus many others. For a processor with 8-bit opcodes like x86 or 6502, you can either consume an unused 8-bit opcode as your next new instruction, or you can take that formerly unused opcode and expand it into many more: that byte now means "look at the next byte", and the next byte could select up to 256 new instructions, or it could simply supplement the first byte by specifying registers, operations, etc. It's no different here: down the road ARM extended the Thumb instruction set. Some portion of the instruction is consumed to indicate that this is a variable-length instruction, but of those 32 bits there still remain quite a few bits to allow for a larger instruction with more options. (The cost is losing the one-to-one relationship between Thumb and ARM instructions: all Thumb instructions, not counting the Thumb-2 extensions, map directly onto a full-sized ARM instruction.)
Each core is different; they don't all fetch a word at a time. Thumb-2 extensions don't have to be word-aligned, so a whole Thumb-2 instruction won't necessarily fit in an aligned word fetch on the processors that do word fetches. Think of the (pre)fetcher and the decoder as two separate things, since they are. Functionally, the decoder takes 16 bits at a time in Thumb mode. How is it specifically implemented? I don't know. Do they wait for two halfwords to be ready before decoding the first? I don't know. Is every implementation the same? I don't know, but I would expect not. As far as fetching goes, they are not the same, as you can see in the ARM documentation, and I think for at least one core, if not more, the chip vendor can choose at compile time.
If you are coming from, for example, a MIPS-based textbook and trying to understand other processors, this can be confusing. Understand that those textbooks and their terms are there for understanding and vocabulary: pipelines in general do not have that depth, and in general you don't fetch exactly one whole instruction at a time (an x86 does not fetch one byte at a time, it fetches MANY instructions at a time). RISC-V has an even worse problem than ARM and MIPS, as you can have 16-bit compressed instructions, 32-bit instructions and 64-bit instructions, and the 32-bit instructions do not have to be aligned (nor do the 64-bit ones), so fetching 32 bits at a time doesn't get you a whole instruction. The fetcher is separate from the decoder; once enough is there, the decoder can complete.
I want to say that in Thumb the PC is two ahead (regardless of whether a Thumb-2 extension is involved), so pc+4. It should be easy to figure out, though.
Disassembly of section .text:
00000000 <hello-0xe>:
0: e005 b.n e <hello>
2: bf00 nop
4: bf00 nop
6: f000 b802 b.w e <hello>
a: bf00 nop
c: bf00 nop
0000000e <hello>:
e: bf00 nop
Yes, so two Thumb-sized halfwords ahead (pc+4) in both cases. It would be significantly more complicated if it were two instructions ahead, which is how it used to be described to make it easy to remember. If it were two instructions ahead, the offset would sometimes be pc+4, sometimes pc+6 and sometimes pc+8, and the logic would have to decode two instructions in order to know how the PC was offset for the first of the two. So sticking with pc+4, as it has always been for Thumb mode, is the sane way to do it.
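To check the pc+4 claim against the listing, the branch offsets can be decoded by hand. The sketch below assumes the unconditional branch encodings as I understand them from the ARM ARM (the 16-bit form 11100:imm11 and the 32-bit form with S, J1, J2, imm10 and imm11); the function names are made up for this example.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def target_b_narrow(addr, hw):
    """Target of a 16-bit unconditional branch (11100:imm11) located at addr."""
    assert hw >> 11 == 0b11100
    offset = sign_extend((hw & 0x7FF) << 1, 12)
    return addr + 4 + offset  # PC reads as instruction address + 4

def target_b_wide(addr, hw1, hw2):
    """Target of a 32-bit unconditional branch (b.w) located at addr."""
    assert hw1 >> 11 == 0b11110 and (hw2 & 0xD000) == 0x9000
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 EOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 EOR S)
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    return addr + 4 + sign_extend(imm, 25)  # also address + 4, not + 8

narrow = target_b_narrow(0x0, 0xe005)         # "0: e005 b.n e"
wide = target_b_wide(0x6, 0xf000, 0xb802)     # "6: f000 b802 b.w e"
print(hex(narrow), hex(wide))
```

Both calls come out at 0xe, the address of `hello`, only if the PC is taken as the instruction address plus 4 in both the 16-bit and the 32-bit case.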

Disassembly of a mixed ARM/Thumb2 ELF file

I'm trying to disassemble an ELF executable which I compiled using arm-linux-gnueabihf to target thumb-2. However, ARM instruction encoding is making me confused while debugging my disassembler. Let's consider the following instruction:
mov.w fp, #0
I disassembled it as a Thumb-2 instruction using objdump and Hopper. The instruction appears in memory as 4ff0000b, which means that it's actually 0x0b00f04f (little endian). Therefore, the binary encoding of the instruction is:
0000 1011 0000 0000 1111 0000 0100 1111
According to the ARM architecture manual, it seems like ALL Thumb-2 instructions should start with 111[10|01|11]. Therefore, the above encoding doesn't correspond to any Thumb-2 instruction. Further, it doesn't match any of the encodings found in section A8.8.102 (page 484).
Am I missing something?
I think you're missing the subtle distinction that wide Thumb-2 encodings are not 32-bit words like ARM encodings, they are a pair of 16-bit halfwords (note the bit numbering above the ARM ARM encoding diagram). Thus whilst the halfwords themselves are little-endian, they are still stored in 'normal' order relative to each other. If the bytes in memory are 4ff0000b, then the actual instruction encoded is f04f 0b00.
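The difference between the two readings of those four bytes can be shown with Python's `struct` module. This is only an illustration of the byte ordering; the variable names are made up for the example.

```python
import struct

raw = bytes.fromhex("4ff0000b")  # bytes as they appear in memory

# Read as one little-endian 32-bit word (the misleading interpretation).
(as_word,) = struct.unpack("<I", raw)

# Read as two consecutive little-endian 16-bit halfwords (how Thumb-2 is laid out).
hw1, hw2 = struct.unpack("<HH", raw)

print(f"word:      {as_word:08x}")
print(f"halfwords: {hw1:04x} {hw2:04x}")
print(f"hw1[15:11] = {hw1 >> 11:05b}")
```

The halfword reading gives f04f 0b00, and bits [15:11] of the first halfword are 11110, which is one of the prefixes that mark a 32-bit Thumb-2 encoding.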
Thumb-2 instructions are extensions to the Thumb instruction set: formerly undefined instructions, some of which are now defined. ARM is a completely different instruction set. If the toolchain has not left you clues as to which code is Thumb and which is ARM, then the only way to figure it out is to start with an assumption at an entry point and disassemble in execution order from there, and even then you might not figure out some of the code.
You cannot distinguish ARM instructions from Thumb or Thumb plus Thumb-2 extensions simply by bit pattern. Also remember that ARM instructions are aligned on 4-byte boundaries, whereas Thumb instructions are aligned on 2-byte boundaries, and the second halfword of a Thumb-2 extension doesn't have to be in the same 4-byte-aligned word as its first halfword, making this all that much more fun. (Thumb plus Thumb-2 is a variable-length instruction set made from multiples of 16-bit values.)
If all of your code is Thumb and there are no ARM instructions in there, then you still have the problem you would have with any variable-length instruction set, and to do it right you have to walk the code in execution order. For example, it would not be hard to embed a data value in .text that looks like the first half of a Thumb-2 extension, and follow that with a real Thumb-2 extension, causing your disassembler to go off the rails. This is the elementary variable-length disassembly problem (and an elementary way to defeat simple disassemblers).
Take four 16-bit words A, B, C, D.
Suppose C + D form a Thumb-2 instruction (which is known by decoding C), A is, say, a Thumb instruction, and B is a data value that resembles the first half of a Thumb-2 extension. Decoding linearly through RAM, A is decoded as the Thumb instruction, B and C are decoded together as a Thumb-2 extension, and D, which is actually the second half of a Thumb-2 extension, is now decoded as the first 16 bits of an instruction. All bets are off as to how that decodes, or whether it causes all or many of the following instructions to be decoded wrong.
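The A, B, C, D scenario can be reproduced with a toy linear sweep that only looks at instruction lengths. The concrete halfword values are picked for illustration (0xbf00 is a nop, 0xf000 0xb802 is a b.w encoding, and B is a data value that happens to equal 0xf000); the function names are made up for this sketch.

```python
def is_wide(hw):
    """True if hw looks like the first halfword of a 32-bit Thumb-2 encoding."""
    return (hw >> 11) in (0b11101, 0b11110, 0b11111)

def linear_sweep(halfwords):
    """Indices that a naive linear decoder treats as instruction starts."""
    starts, i = [], 0
    while i < len(halfwords):
        starts.append(i)
        i += 2 if is_wide(halfwords[i]) else 1
    return starts

A = 0xbf00              # real 16-bit instruction (nop)
B = 0xf000              # data that resembles the first half of a Thumb-2 extension
C, D = 0xf000, 0xb802   # real 32-bit instruction (b.w)

naive = linear_sweep([A, B, C, D])
real = [0, 2]           # true instruction starts: A at index 0, C+D at index 2
print("naive:", naive, "real:", real)
```

The naive sweep pairs B with C and then treats D as the start of a new instruction, so it reports starts at indices 0, 1 and 3 instead of 0 and 2.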
So start off by looking to see whether the ELF tells you something. If not, then you have to make passes through the code in execution order (you have to make an assumption as to an entry point), following all the possible branches and linear execution, to mark each 16-bit unit as the first or an additional halfword of an instruction. The unmarked units cannot necessarily be classified as instruction or data, and care must be taken.
And yes, it is possible to play other games to defeat disassemblers, such as intentionally branching into the second half of a Thumb-2 instruction that is hand-crafted to be a valid Thumb instruction or the beginning of a Thumb-2 one.
With fixed-length instruction sets like ARM and MIPS, you can decode linearly; some data decodes as strange or undefined instructions, but your disassembler doesn't go off the rails and fail to do its job. With variable-length instruction sets, disassembly is at best a guess; the only way to truly decode is to execute the instructions the same way the processor would.

Instruction disassembler ARM. [ARM/Thumb mode]

I would like to ask you how to determine in which ISA (ARM/Thumb/Thumb-2) an instruction is encoded?
First of all, I tried to do it following the instructions here (section 4.5.5).
However, when I use readelf -s ./arm_binary, and arm_binary was built in release mode, it appears that there is no .symtab in the binary. And anyway, I don't understand how to use this command to find the type for the instructions.
Secondly, I know the other way to differentiate is to look at the least significant bit of the address that execution branches to. If it is odd, the target is Thumb code; if it is even, ARM. But how can I do this without loading the file into memory? When I parse the sections of the file and find the executable section, all that I have is its start offset in the file, and that file offset is always even, because instructions are 2 or 4 bytes in size...
Finally, the last way to check is to detect BX Rm, extract the value from Rm, and then check whether that address is even or odd. But this may be difficult, because I would need to emulate the whole program.
So what is the correct way to identify the ISA for disassembly?
Thank you for your attention and I hope you will help me.
I don't believe it's possible to tell, in a mixed mode binary, without inspecting the instructions as you describe.
If the whole file is one ISA or the other, then you can determine the ISA of the entry point by running this:
readelf -h ./arm_binary
and checking whether the entry point address is even (ARM) or odd (Thumb).
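The same check can be done without readelf by reading e_entry straight from the ELF header. The sketch below is a minimal illustration that handles only 32-bit ARM ELF files; the function name `entry_point_mode` is made up for this example, and bit 0 of the entry address is taken as the Thumb indicator.

```python
import struct

EM_ARM = 40  # e_machine value for ARM

def entry_point_mode(elf_header):
    """Return ("thumb" or "arm", entry address with bit 0 cleared) for an ELF32 ARM header."""
    if elf_header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf_header[4] != 1:
        raise ValueError("only 32-bit ELF is handled in this sketch")
    endian = "<" if elf_header[5] == 1 else ">"  # EI_DATA: 1 = little endian
    (e_machine,) = struct.unpack_from(endian + "H", elf_header, 18)
    if e_machine != EM_ARM:
        raise ValueError("not an ARM ELF file")
    (e_entry,) = struct.unpack_from(endian + "I", elf_header, 24)
    mode = "thumb" if e_entry & 1 else "arm"
    return mode, e_entry & ~1
```

Something like `entry_point_mode(open("./arm_binary", "rb").read(52))` would be enough, since e_entry lives in the first 52 bytes. As noted above, this only tells you the state at the entry point, not the state of every function in a mixed-mode binary.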
However, what I would do is simply disassemble it both ways, and see what looks right. As long as you start the disassembly at the start of a function (or any 4-byte boundary), then this will work fine. Most code will produce nonsense when disassembled in the wrong ISA.

arm (bare metal): call binary file as function

I have the AT91Bootloader for an AT91SAM9 ARM controller. I need to add some extra hardware initialization, but I only have the compiled .bin file.
I loaded the bin file into memory and tried to call it:
((void (*)())0x00005000)();
But I got no results. Please use as little assembler as possible. I have been introduced to assembler before, but I cannot understand ARM assembler due to its complexity. How can I make a call from the middle of the bootloader, execute the bin file (it will be in some memory region, 0x00005000 for example), and then return to the bootloader and continue executing its own code?
If ARM asm is "too complex", you will find it very difficult to debug any problems you're having. Basic* ARM assembly is one of the least complex assembly languages I've come across.
Your code ought to work (though I would not use a hard-coded address there) provided the ".bin" is of the correct format. Common issues:
The entry point should be ARM code; some compilers default to Thumb. It's possible (if a little tricky) to make Thumb code work.
The entry point needs to be at the start of the file. Without disassembling, it's hard to tell if you've done this correctly.
The linker will insert "thunks" (a.k.a. "stubs") where necessary. A quirk in some linkers means that the thunk can be placed before the entry point. You can work around this by using --stub-group-size=-1 (docs here).
* Ignoring things like Thumb/VFP/NEON which you don't need to get started.
ARM assembly is one of the simpler ones, very straightforward. If you want to continue to do bare metal, you are going to need to learn at least some assembly, for example to understand Alexey's comment.
The instruction you are looking for is BX; it branches to an address. The assembly you need to branch to the code your bootloader downloaded is:
.globl tramp
tramp:
bx r0
The C prototype is
void tramp ( unsigned int address );
As mentioned in the comments, the program needs to be compiled for the address you are running it from and/or it needs to be position independent; otherwise it won't work. You also need to build the application with the proper entry point: if it is a raw binary and you branch to the address where the binary was loaded, the binary needs to be able to be started that way, by having the first word in the binary be the entry point for execution.
Also understand that an elf format file contains the data you want to load, but as a whole is not the data you want to load. It is a "binary file" yes but to run the program contained in it you need to parse and extract the loadable portions and load them in the right places.
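To make "parse and extract the loadable portions" a little more concrete, here is a minimal host-side Python sketch that lists the PT_LOAD segments of a 32-bit little-endian ELF image. It is only an illustration of the idea, not what a bootloader would actually run; the function name `loadable_segments` is made up for this example, and the choice of p_paddr as the load address is an assumption that suits typical bare-metal images.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments

def loadable_segments(image):
    """Return [(load_address, file_bytes, mem_size)] for each PT_LOAD segment (ELF32, little endian)."""
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF image")
    (e_phoff,) = struct.unpack_from("<I", image, 28)              # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)   # entry size and count
    segments = []
    for n in range(e_phnum):
        off = e_phoff + n * e_phentsize
        p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz = struct.unpack_from("<6I", image, off)
        if p_type == PT_LOAD:
            # Bytes p_offset..p_offset+p_filesz go to p_paddr; the rest up to p_memsz is zero-filled.
            segments.append((p_paddr, image[p_offset:p_offset + p_filesz], p_memsz))
    return segments
```

Each returned tuple says where a chunk of the file belongs in memory, which is the information a raw .bin has already flattened away.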
If you don't know what those terms mean, use Google and/or search SO; the answers are there.

Resources