ARM Assembly loop using PC? - loops

I am currently learning ARM assembly and I have some questions. When reading the docs, I found that register 15 is the program counter, which stores the address of the next instruction, and that when an instruction is done it is incremented by 4 bytes (or 2 in Thumb mode).
So, my question is: if I run an instruction that sets PC to itself minus 4 bytes, would it return to the previous instruction? And would it then do that over and over again, making an infinite loop?
Thanks, and sorry if it is an obvious question.
Regards,
Pedro.

You have to look at this on an instruction-by-instruction basis, because for some instructions modifying the PC is unpredictable. For those where it is legal, modifying the program counter essentially causes a jump to the address you write to it. You don't have to worry about the two-instructions-ahead thing when writing the PC (and when reading it, the offset is 8 and 4 bytes, not 4 and 2: two instructions ahead).

Yes - a jump/branch instruction is exactly what you're describing - it's an instruction which modifies the PC. If you arrange the result of the jump to put the program counter back where it was then, yes, you'll loop on the spot.

Note that the value read from the PC is not really the address of the next instruction but the address of the current instruction +4 (in Thumb mode) or +8 (in ARM mode). So in ARM this is 2 instructions later, but in Thumb it may not be, as instructions can be 16-bit or 32-bit.
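To make the arithmetic concrete, here is a minimal sketch in C, assuming ARM state and an instruction of the form sub pc, pc, #imm. The function names pc_as_operand and sub_pc_target are made up for illustration and do not come from any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Value an instruction sees when it reads r15 (PC) as an operand:
   its own address + 8 in ARM state, + 4 in Thumb state. */
uint32_t pc_as_operand(uint32_t insn_addr, int thumb)
{
    assert((insn_addr & 1u) == 0);   /* instruction addresses are even */
    return insn_addr + (thumb ? 4u : 8u);
}

/* Address where execution continues after the ARM-state instruction
   "sub pc, pc, #imm" located at insn_addr (hypothetical helper). */
uint32_t sub_pc_target(uint32_t insn_addr, uint32_t imm)
{
    return pc_as_operand(insn_addr, 0) - imm;
}
```

Under these assumptions, subtracting 4 lands on the instruction after the sub, so nothing loops; subtracting 8 lands on the sub itself, which is the infinite loop the question describes.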

Related

Branch and ARM program counter

My understanding is the ARM program counter points to two instructions ahead of the currently executing instruction.
How does this work with conditional branching or even a plain branch?
If you are executing op1, have a branch at op2 and then op3, does the PC point to op3? Or does it point to the next instruction contiguous from op2?
How can you do PC relative addressing with branch instructions present? Do you need to add nops?
The PC register in ARM points to two instructions ahead of the current instruction in the address space, not in the flow of execution. So the PC points to the instruction next to op2 in the given example. Whether the subsequent instruction is branch or not is irrelevant to encoding.
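As a rough illustration of why no nops are needed, the sketch below computes the target of an ARM-state B/BL from its own address and its 24-bit immediate field. It assumes the classic A32 encoding (signed word offset relative to the branch address + 8); arm_branch_target is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an ARM-state B/BL located at branch_addr.
   imm24 is the raw 24-bit immediate field of the instruction. */
uint32_t arm_branch_target(uint32_t branch_addr, uint32_t imm24)
{
    assert((imm24 >> 24) == 0);          /* only 24 bits are encoded */

    uint32_t offset = imm24 << 2;        /* word offset -> byte offset */
    if (imm24 & 0x800000u)               /* sign-extend the 26-bit result */
        offset |= 0xFC000000u;

    /* PC reads as branch_addr + 8, regardless of what the following
       instructions are; unsigned wraparound handles negative offsets. */
    return branch_addr + 8u + offset;
}
```

With an immediate of 0xFFFFFE (that is, -2 words) the target equals the branch's own address, which is how a branch-to-self is encoded.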

Understanding Cortex-M assembly LDR with pc offset

I'm looking at the disassembly code for this piece of C code:
#define GPIO_PORTF_DATA_R (*((volatile unsigned long *)0x400253FC))
int main(void){
// Initialization code
while(1) {
SW1 = GPIO_PORTF_DATA_R&0x10; // Read PF4 into SW1
// Other code
SW2 = GPIO_PORTF_DATA_R&0x01;
}
}
The assembly for that SW1= line is (sorry can't copy code):
https://imgur.com/dnPHZrd
Here are my questions:
At the first line, PC = 0x00000A56, and PC + 92 = 0x00000AB2, which is not equal to 0x00000AB4, the number shown. Why?
I did a bit of research on SO and found out that PC actually points to the Next Next instruction to be executed.
When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.
However 0x00000AB4 - 0x00000A56 = 0x5E = 94, neither does it match 92+8 or 92+4. Where did I get wrong?
Reference:
Strange behaviour of ldr [pc, #value]
Why does the ARM PC register point to the instruction after the next one to be executed?
LDR Rd,-Label vs LDR Rd,[PC+Offset]
From ARM documentation:
Operation
address = (PC[31:2] << 2) + (immed_8 * 4)
Rd = Memory[address, 4]
The PC is 0xA56+4 because of the two instructions ahead, and this is Thumb, so 4 bytes.
((0xA5A>>2)<<2) + (0x17*4)
or
(0x00000A5A&0xFFFFFFFC) + (0x17<<2)
0xA58+92=0xAB4
This is an LDR so it is a word-based address ideally. Because the thumb instruction can be on a non-word aligned address, you start off by adding two instructions of course (thumb2 complicates this but add four for thumb). Then zero the lower two bits (LDR) the offset is in words so need to convert that to bytes, times four. This makes the encoding make more sense if you think about each part of it. In arm mode the PC is already word aligned so that step is not required (and in arm mode you have more bits for the immediate so it is byte-based not word-based), making the offset encoding between arm and thumb possibly confusing.
The various documents will show the math in different ways but it is the same math nevertheless. The PC is the only confusing part, especially for thumb. For ARM you add 8, two ahead, for thumb it is basically 4 because the execution cannot tell if there is a thumb2 coming, and it would break a great many things if they had attempted that. So add 4 for the two ahead, for thumb. Since thumb is compressed they do not use a byte offset but instead a word offset giving 4 times the range. Likewise this and/or other instructions can only look forward not back so unsigned offset. This is why you will get alignment errors when assembling things in thumb that in arm would just be unaligned (and you get what you get there depending on architecture and settings). Thumb cannot encode any address for an instruction like this.
For understanding instruction encoding, in particular pc based addressing, it is best to go back to the early ARM ARM (before the armv5 one but if not then just get the armv5 one) as well as the armv6-m and armv7-m and full sized armv7-ar. And look at the pseudo-code for each. The older one generally has the best pseudo-code, but sometimes they leave out the masking of lower bits of the address. No document is perfect, they have bugs just like everything else. Naturally the architecture tied to the core you are using is the official document for the IP the chip vendor used (even down to the specific version of the TRM as these can vary in incompatible ways from one to the next). But if that document is not perfectly clear you can sometimes get an idea from others that, upon inspection, have compatible instructions, architectural features.
You missed a key part of the rules for Thumb mode, quoted in one of the questions you linked (Why does the ARM PC register point to the instruction after the next one to be executed?):
For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.
(0xA56 + 4) & -4 = 0xA58 is the location that PC-relative things are relative to during execution of that ldr r0, [PC, #92]
((0xA56 + 4) & -4) + 92 = 0xab4, the location the disassembler calculated.
It's equivalent to do 0xA56 & -4 = 0xa54 then +4 + 92, because +4 doesn't modify bit #1; you can think of clearing it before or after adding that +4. But you can't clear the bit after adding the PC-relative offset; that can be unaligned for other instructions like ldrb. (Thumb-mode ldr encodes an offset in words to make better use of the limited number of bits, so the scaled offset and thus the final load address always have bits[1:0] clear.)
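The same calculation can be written as a small C helper. This is only a sketch for the 16-bit Thumb encoding ldr Rd, [pc, #imm8*4]; the name thumb_ldr_literal_address is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address loaded by a 16-bit Thumb "ldr Rd, [pc, #imm8*4]"
   located at insn_addr. imm8 is the raw 8-bit field (a word count). */
uint32_t thumb_ldr_literal_address(uint32_t insn_addr, uint32_t imm8)
{
    assert((insn_addr & 1u) == 0 && imm8 <= 0xFFu);

    /* PC reads as insn_addr + 4, then bits [1:0] are cleared. */
    uint32_t base = (insn_addr + 4u) & ~3u;

    /* The offset is encoded in words, so scale it to bytes. */
    return base + imm8 * 4u;
}
```

For the instruction in the question (address 0xA56, immediate field 0x17) this yields 0xAB4. The same load placed at 0xA54 yields 0xAB4 as well, which a "relative to the next instruction" rule would not predict.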
(Thanks to Raymond Chen for spotting this; I had also missed it initially!)
Also note that your debugger shows you a PC value when stopped at a breakpoint, but that's the address of the instruction you're stopped at. (Because that's how ARM exceptions work, I assume, saving the actual instruction to return to, not some offset.) During execution of the instruction, PC-relative stuff follows different rules. And the debugger doesn't "cook" this value to show what PC will be during its execution.
The rule is not "relative to the end of this / start of next instruction". Answers and comments stating that rule happen to get the right answer in this case, but would get the wrong answer in other Thumb cases like in LDR Rd,-Label vs LDR Rd,[PC+Offset] where the PC-relative load instruction happens to start at a 4-byte aligned address so bit #1 of PC is already cleared.
Your LDR is at address 0xA56 where bit #1 is set, so the rounding down has an effect. And your ldr instruction used a 2-byte encoding, not a Thumb2 32-bit instruction like you might need for a larger offset. Both of these things mean that round-down + 4 happens to be the address of the next instruction, rather than 2 instructions later or the middle of this instruction.
Since the program counter points to the next instruction, when it executes the LDR at address 0x00000A56, the program counter will be holding the address of the next instruction, which is 0x00000A58.
0x0A58 + 0x5C (decimal 92) == 0x00000AB4

ARM11 Emulator in C

I'm trying to write a C program that simulates execution of an ARM binary file.
So what it does right now, we fetch instructions from binary file into an array of uint32_t, which I then decode and execute.
The problem is that I use program counter only to access ints from array and then I increment it. But for the branch instruction, that takes offset, extends it to 32 bits and adds it to PC, PC should be 8 bytes ahead of the instruction that is being executed.
So pipeline effect should take place. That is basically:
32bit (4byte) instruction is fetched from memory
instruction is decoded
decoded instruction is executed
So when the instruction is being executed at the top of the pipeline, the instruction being fetched is two instructions further in memory. Therefore PC is 8 bytes greater than the address of the instruction being executed.
Does anybody have any idea how to implement that pipeline easily?
I guess I need to redo my memory storage for the instruction as it is just an array of a fixed size right now.
My thought was to align memory, and then after each instruction add 4 to the PC, and access next instruction by using a pointer to the first element in an array and adding PC to that. If that would work, could somebody show me how that would look like?
You don't need to simulate a pipeline. ARM has not used a two-ahead pipeline in hardware for a while; the hardware itself simulates the two-ahead behaviour as well.
What I did in mine was keep the PC one instruction ahead any time it is modified, then on the fetch I fetch at pc-4 and then add 4. Any instruction that can modify the PC had to have an "if pc then pc+=4". And when I compute a branch destination I had to add another 4. Why I did it this way, I don't know.
Alternatively you can keep the pc pointing at the current instruction. And every time the pc is read you add 8. Probably easier that way. I abstract my instruction fetch, memory read/write and register read/write using functions and within the read_register() function I would add the if r15 then result+=8; and then return.
Again, why I didn't do that, I don't remember; I had to hack at it to get it working. In reality mine was a Thumb (v1) simulator, not ARM, so it was +2 and +4, not +4 and +8. And there was no ARM mode at all (supported by my simulator), so only a few instructions in Thumb can actually modify the PC, which made it easier to target only those with the if reg==15.
If you really want to do an ARM11 simulator you need to think about Thumb mode and then deal with the PC being either 4 ahead or 8 ahead, and remember to strip the lsbit off the PC in Thumb mode when it is modified by the handful of instructions that have to have the lsbit set to switch to or stay in Thumb mode. Fortunately the ARM11 does not support Thumb2; with that it gets really ugly, as it has variable instruction length and you have to be two instructions ahead, not just 4 or 8 bytes (it can be 4, 6 or 8 bytes ahead, and you have to decode the next two instructions to find out in Thumb mode). If you don't want to support Thumb mode you could simply look for the lsbit being set in a bx or pop and then declare that you don't support Thumb mode and bail out.
Your choices are to either simulate a pipeline, or you have to insert one or some if reg==15 then... lines of code, and you have to do it everywhere the pc is used (routing all register reads to one function is a very easy way to do this). I should go fix my simulator to do this. If you end up supporting thumb mode then you could simply say if in thumb mode then add 4 else add 8 in this read register function.
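A minimal sketch of the "route all register reads through one function" approach might look like the following. The struct layout and the names struct cpu, read_register and fetch_arm are assumptions made for this example, not taken from any existing emulator, and memory is assumed to be a word array starting at address 0.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t r[16];  /* r[15] holds the address of the instruction being executed */
    int thumb;       /* non-zero when executing in Thumb state */
};

/* All register reads go through here, so the pipeline offset
   is applied in exactly one place. */
uint32_t read_register(const struct cpu *c, unsigned n)
{
    assert(n < 16);
    if (n == 15)
        return c->r[15] + (c->thumb ? 4u : 8u);
    return c->r[n];
}

/* Fetch the 32-bit ARM instruction at the current PC from a word array. */
uint32_t fetch_arm(const struct cpu *c, const uint32_t *mem)
{
    return mem[c->r[15] / 4u];
}
```

With this layout the main loop can fetch with fetch_arm, execute, and then add 4 to r[15] unless the instruction wrote the PC itself; a branch handler computes its destination from read_register(c, 15) plus the decoded offset.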

Where does ARM read program instructions from after Register 14?

How I understand the basic workings of the ARM architecture is such;
There are 15 main registers with the 15th (r15) being the Program Counter (PC).
If the program counter points to a specific register, then how can you have a Program which runs more than ~14 lines?
Obviously this is not true, but I don't understand how you can incorporate a big program with just 15 registers? What am I missing?
The program counter points to memory, not another register.
Registers don't store the program code. Program code is in main memory, and the Program Counter points to the location in memory of the next instruction.
The other registers are high-speed locations for storing temporary, or frequently accessed, values during the processing of the application.
In the simplest form, you have Program (Instruction memory), Data memory, Stack Memory, and Registers.
ARM instructions are stored in the Instruction memory, they are a sequence of commands which tell the processor what to do. They are never stored in the registers of the processor. The program counter only points to that instruction, that instruction is simply a command which in the basic form has an opcode (operation code) and variables/literals ..
So what happens is that the instruction is read from memory (fetched) from the location pointed to by the program counter. It is not loaded into the registers, but the control unit where it is decoded (that is to know what operation to do, i.e. add, sub, mov etc) and where to read/store its inputs and outputs.
So where are the inputs/outputs to operate on and store? The ARM architecture is a load/store architecture, it means it operates on data loaded into its registers, that is R1, R2 .. R7 ..etc .. where the registers could be thought of as temporary variables where all inputs and outputs are stored. Registers are used because they are so fast and operate at the same speed of the processor frequency not as memory which is slower.
Now the question is, how to populate these registers with values in the first place?
Those values could be stored on the Data Memory or Stack memory, so there are instructions to copy them to these registers, followed by instructions to operate on them and store the value in the registers, then followed by other instructions to copy the result back to Memory. Some instructions could also load a register with a constant.
Instruction 1 // Copy Variable X into R1 from memory
Instruction 2 // Copy Variable Y into R2 from memory
ADD R3, R1, R2 // add them together
Instruction 3 // Copy back the result into Memory
I tried to make it as simple as possible, there are so many details to cover. Needs books :)

Write jump instruction in c

To preface this, yes this is a project to take control of an executable externally. No, I do not have any malicious intents with this, the end result of this project won't be anything useful anyway. I am writing this in cygwin on a 32-bit installation of XP.
What I need to do is change the first few bits of a COM file to be a jump instruction so that on execution, it will jump to the very end of the COM file. I have looked in Assembler manuals to find what the bytes of that command would be so that I can just hard code it in C, but have had no luck.
First Question: Can I do this in C? It seems to me like I could just insert OpCodes in the beginning of any COM file so that it would execute that instead of the COM file.
Second Question: does someone know where I can find a resource for OpCodes so that I can insert them in my file? Or, does anyone know what the bytes would be for a Jump instruction?
If you have any question about the authenticity of this, feel free to ask.
The Intel® 64 and IA-32 Architectures Software Developer Manual Volume 2A Instruction Set Reference explains the encoding of the JMP instruction (real mode is a subset of IA-32).
For a 16-bit near jump (within the current code segment) you'd use 0xE9 followed by the 16-bit relative offset to jump to, low byte first. If your jump is the first bytes of the COM file then the offset will be relative to address 0x103: the first instruction of a COM file is always loaded at address 0x100, and the jump is relative to the instruction following the 3-byte jump.
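As a rough sketch of how those three bytes could be produced in C, the hypothetical helper below encodes a jump placed at the very start of a COM file to a given offset inside that file. The name encode_com_jump and the idea of passing the target as a file offset are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..2] with "JMP rel16" (opcode 0xE9) for a jump located at
   file offset 0 of a COM file, targeting target_file_offset.
   Load address is 0x100, so target = 0x100 + target_file_offset and the
   displacement is measured from 0x103 (the byte after the jump). */
void encode_com_jump(uint8_t out[3], uint16_t target_file_offset)
{
    assert(out != NULL);

    /* (0x100 + offset) - 0x103 == offset - 3, with 16-bit wraparound. */
    uint16_t rel = (uint16_t)(target_file_offset - 3u);

    out[0] = 0xE9;                    /* near jump, 16-bit displacement */
    out[1] = (uint8_t)(rel & 0xFFu);  /* low byte first (little-endian) */
    out[2] = (uint8_t)(rel >> 8);
}
```

To jump to code appended at the end of the file, the target offset would be the original file length; the three original bytes overwritten by the jump need to be saved somewhere if the program is meant to keep working.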
On XP there should be debug.exe. Simply start it, start writing code with 'a'
type jmp ff00, and dis/[u]nassemble the result with 'u' if the corresponding hex dump was not shown.
Notice first that your program is necessarily operating system, ABI, and machine instruction set specific. (e.g. it won't run under Linux/x86-64 or Linux/PowerPC)
You could write in C the machine instructions as a sequence of bytes. Which bytes you have to write (i.e. the encoding of the appropriate jump instructions) is left to you!!!!!
Of course, that is not portable C. But you could basically do a memcpy with some appropriate source byte zone.
Maybe libraries like asmjit or GNU lightning might inspire you.
You probably cannot use them directly, but studying their code could help you.
See also x86 wikipedia pages for more references.
