ARM11 Emulator in C

I'm trying to write a C program that simulates execution of an ARM binary file.
Right now it fetches instructions from the binary file into an array of uint32_t, which I then decode and execute.
The problem is that I use program counter only to access ints from array and then I increment it. But for the branch instruction, that takes offset, extends it to 32 bits and adds it to PC, PC should be 8 bytes ahead of the instruction that is being executed.
So pipeline effect should take place. That is basically:
32bit (4byte) instruction is fetched from memory
instruction is decoded
decoded instruction is executed
So when the instruction is being executed at the top of the pipeline, the instruction being fetched is two instructions further in memory. Therefore PC is 8 bytes greater than the address of the instruction being executed.
Does anybody have any idea how to implement that pipeline easily?
I guess I need to redo my memory storage for the instruction as it is just an array of a fixed size right now.
My thought was to align memory, and then after each instruction add 4 to the PC, and access next instruction by using a pointer to the first element in an array and adding PC to that. If that would work, could somebody show me how that would look like?

You don't need to simulate a pipeline. ARM hardware has not used a two-ahead pipeline in a while; the cores simulate the two-ahead PC as well.
What I did in mine was keep the pc one instruction ahead any time it is modified, then on the fetch I fetch at pc-4 and then add 4. Any instruction that can modify the pc had to have an if pc then pc+=4. And when I compute a branch destination I had to add another 4. Why I did it this way, I don't know.
Alternatively you can keep the pc pointing at the current instruction. And every time the pc is read you add 8. Probably easier that way. I abstract my instruction fetch, memory read/write and register read/write using functions and within the read_register() function I would add the if r15 then result+=8; and then return.
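A minimal sketch of that second approach, assuming ARM mode only. The names cpu_t, read_register, fetch and branch_target are hypothetical and not taken from the simulator described above. The PC holds the byte address of the instruction being executed, and every read of r15 adds 8:

```c
#include <stdint.h>

/* Hypothetical CPU state: r[15] is the PC and holds the byte address
   of the instruction currently being executed. */
typedef struct {
    uint32_t r[16];
} cpu_t;

/* All register reads go through here, so the +8 lives in one place. */
static uint32_t read_register(const cpu_t *cpu, unsigned n)
{
    uint32_t value = cpu->r[n & 15u];
    if ((n & 15u) == 15u)
        value += 8;                  /* ARM mode: PC reads as current + 8 */
    return value;
}

/* Fetch from a word array indexed by the byte address in the PC. */
static uint32_t fetch(const cpu_t *cpu, const uint32_t *mem)
{
    return mem[cpu->r[15] >> 2];
}

/* Target of an ARM B/BL: the low 24 bits are a signed word offset, so
   shift left by 2, sign-extend, and add to the PC as read (current + 8). */
static uint32_t branch_target(const cpu_t *cpu, uint32_t insn)
{
    uint32_t offset = (insn & 0x00FFFFFFu) << 2;   /* 26-bit byte offset */
    if (offset & 0x02000000u)
        offset |= 0xFC000000u;                     /* sign-extend bit 25 */
    return read_register(cpu, 15) + offset;
}
```

With this layout the main loop would fetch, execute, and then add 4 to r[15] unless the instruction wrote the PC itself.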
Again, why I didn't do that, I don't remember; I had to hack at it to get it working. In reality mine was a thumb(v1) simulator, not arm, so it was +2 and +4, not +4 and +8. And there was no arm mode at all (supported by my simulator), and only a few instructions in thumb can actually modify the pc, so that made it easier to target only those with the if reg==15.
If you really want to do an arm11 simulator you need to think about thumb mode and then deal with the pc being either 4 ahead or 8 ahead, and remember to strip the lsbit off the pc in thumb mode when it is modified by the handful of instructions that need the lsbit set to switch to or stay in thumb mode. Fortunately the arm11 does not support thumb2. With thumb2 it gets really ugly, as instructions are variable length and you have to be two instructions ahead, not just 4 or 8 bytes (it can be 4, 6 or 8 bytes ahead, and in thumb mode you have to decode the next two instructions to find out). If you don't want to support thumb mode you could simply look for the lsbit being set in a bx or pop, then declare that you don't support thumb mode and bail out.
Your choices are to either simulate a pipeline, or you have to insert one or some if reg==15 then... lines of code, and you have to do it everywhere the pc is used (routing all register reads to one function is a very easy way to do this). I should go fix my simulator to do this. If you end up supporting thumb mode then you could simply say if in thumb mode then add 4 else add 8 in this read register function.
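As a rough sketch of the Thumb-aware variant, assuming a hypothetical cpu_t with a thumb flag (these names are illustrative only). The read function picks +4 or +8, and a BX-style PC write strips the lsbit or refuses when Thumb is not supported:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state; r[15] is the address of the current instruction. */
typedef struct {
    uint32_t r[16];
    bool thumb;          /* true while executing Thumb instructions */
} cpu_t;

/* Single place where the "two ahead" adjustment is applied. */
static uint32_t read_register(const cpu_t *cpu, unsigned n)
{
    n &= 15u;
    if (n != 15u)
        return cpu->r[n];
    return cpu->r[15] + (cpu->thumb ? 4u : 8u);
}

/* PC write from a BX-like instruction: bit 0 selects Thumb state and is
   stripped from the stored address. Returns false, leaving the state
   untouched, if Thumb is requested but the simulator does not support it. */
static bool write_pc_interworking(cpu_t *cpu, uint32_t target,
                                  bool thumb_supported)
{
    bool want_thumb = (target & 1u) != 0;
    if (want_thumb && !thumb_supported)
        return false;                /* caller can report and bail out */
    cpu->thumb = want_thumb;
    cpu->r[15] = target & ~UINT32_C(1);
    return true;
}
```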

Related

Understanding Cortex-M assembly LDR with pc offset

I'm looking at the disassembly code for this piece of C code:
#define GPIO_PORTF_DATA_R (*((volatile unsigned long *)0x400253FC))
int main(void){
// Initialization code
while(1) {
SW1 = GPIO_PORTF_DATA_R&0x10; // Read PF4 into SW1
// Other code
SW2 = GPIO_PORTF_DATA_R&0x01;
}
}
The assembly for that SW1= line is (sorry can't copy code):
https://imgur.com/dnPHZrd
Here are my questions:
At the first line, PC = 0x00000A56, and PC + 92 = 0x00000AB2, which is not equal to 0x00000AB4, the number shown. Why?
I did a bit of research on SO and found out that PC actually points to the Next Next instruction to be executed.
When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.
However 0x00000AB4 - 0x00000A56 = 0x5E = 94, neither does it match 92+8 or 92+4. Where did I get wrong?
Reference:
Strange behaviour of ldr [pc, #value]
Why does the ARM PC register point to the instruction after the next one to be executed?
LDR Rd,-Label vs LDR Rd,[PC+Offset]
From ARM documentation:
Operation
address = (PC[31:2] << 2) + (immed_8 * 4)
Rd = Memory[address, 4]
The pc is 0xA56+4 because of two instructions ahead and this is thumb so 4 bytes.
((0xA5A>>2)<<2) + (0x17*4)
or
(0x00000A5A&0xFFFFFFFC) + (0x17<<2)
0xA58 + 92 = 0xAB4
This is an LDR so it is a word-based address ideally. Because the thumb instruction can be on a non-word aligned address, you start off by adding two instructions of course (thumb2 complicates this but add four for thumb). Then zero the lower two bits (LDR) the offset is in words so need to convert that to bytes, times four. This makes the encoding make more sense if you think about each part of it. In arm mode the PC is already word aligned so that step is not required (and in arm mode you have more bits for the immediate so it is byte-based not word-based), making the offset encoding between arm and thumb possibly confusing.
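The same arithmetic as a small C sketch, for the 16-bit Thumb encoding only. The function name is hypothetical; the inputs are the instruction address and the 8-bit immediate field from the encoding:

```c
#include <stdint.h>

/* Address loaded by a 16-bit Thumb "LDR Rd, [PC, #imm]".
   insn_addr: address of the LDR itself.
   imm8:      the encoded immediate, counted in words. */
static uint32_t thumb_ldr_literal_address(uint32_t insn_addr, uint32_t imm8)
{
    uint32_t pc   = insn_addr + 4;          /* PC as read in Thumb state */
    uint32_t base = pc & ~UINT32_C(3);      /* clear bits [1:0]          */
    return base + (imm8 & 0xFFu) * 4u;      /* word offset to bytes      */
}
```

For the instruction in the question, thumb_ldr_literal_address(0xA56, 0x17) evaluates to 0xAB4, matching the disassembly.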
The various documents will show the math in different ways but it is the same math nevertheless. The PC is the only confusing part, especially for thumb. For ARM you add 8, two ahead, for thumb it is basically 4 because the execution cannot tell if there is a thumb2 coming, and it would break a great many things if they had attempted that. So add 4 for the two ahead, for thumb. Since thumb is compressed they do not use a byte offset but instead a word offset giving 4 times the range. Likewise this and/or other instructions can only look forward not back so unsigned offset. This is why you will get alignment errors when assembling things in thumb that in arm would just be unaligned (and you get what you get there depending on architecture and settings). Thumb cannot encode any address for an instruction like this.
For understanding instruction encoding, in particular pc based addressing, it is best to go back to the early ARM ARM (before the armv5 one but if not then just get the armv5 one) as well as the armv6-m and armv7-m and full sized armv7-ar. And look at the pseudo-code for each. The older one generally has the best pseudo-code, but sometimes they leave out the masking of lower bits of the address. No document is perfect, they have bugs just like everything else. Naturally the architecture tied to the core you are using is the official document for the IP the chip vendor used (even down to the specific version of the TRM as these can vary in incompatible ways from one to the next). But if that document is not perfectly clear you can sometimes get an idea from others that, upon inspection, have compatible instructions, architectural features.
You missed a key part of the rules for Thumb mode, quoted in one of the questions you linked (Why does the ARM PC register point to the instruction after the next one to be executed?):
For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.
(0xA56 + 4) & -4 = 0xA58 is the location that PC-relative things are relative to during execution of that ldr r0, [PC, #92]
((0xA56 + 4) & -4) + 92 = 0xab4, the location the disassembler calculated.
It's equivalent to do 0xA56 & -4 = 0xa54 then +4 + 92, because +4 doesn't modify bit #1; you can think of clearing it before or after adding that +4. But you can't clear the bit after adding the PC-relative offset; that can be unaligned for other instructions like ldrb. (Thumb-mode ldr encodes an offset in words to make better use of the limited number of bits, so the scaled offset and thus the final load address always have bits[1:0] clear.)
(Thanks to Raymond Chen for spotting this; I had also missed it initially!)
Also note that your debugger shows you a PC value when stopped at a breakpoint, but that's the address of the instruction you're stopped at. (Because that's how ARM exceptions work, I assume, saving the actual instruction to return to, not some offset.) During execution of the instruction, PC-relative stuff follows different rules. And the debugger doesn't "cook" this value to show what PC will be during its execution.
The rule is not "relative to the end of this / start of next instruction". Answers and comments stating that rule happen to get the right answer in this case, but would get the wrong answer in other Thumb cases like in LDR Rd,-Label vs LDR Rd,[PC+Offset] where the PC-relative load instruction happens to start at a 4-byte aligned address so bit #1 of PC is already cleared.
Your LDR is at address 0xA56 where bit #1 is set, so the rounding down has an effect. And your ldr instruction used a 2-byte encoding, not a Thumb2 32-bit instruction like you might need for a larger offset. Both of these things mean round-down + 4 happens to be the address of the next instruction, rather than 2 instructions later or the middle of this instruction.
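To make the difference between the two rules concrete, here is a small sketch (hypothetical function names, 16-bit Thumb encodings assumed) comparing the documented base with the "address of the next instruction" shortcut:

```c
#include <stdint.h>

/* Documented rule: (current instruction + 4) with bits [1:0] cleared. */
static uint32_t aligned_pc_base(uint32_t insn_addr)
{
    return (insn_addr + 4) & ~UINT32_C(3);
}

/* The shortcut some answers use: start of the next instruction,
   assuming the current one is a 2-byte Thumb encoding. */
static uint32_t next_insn_base(uint32_t insn_addr)
{
    return insn_addr + 2;
}
```

At 0xA56 both give 0xA58, which is why the shortcut appears to work here. At a 4-byte aligned address such as 0xA54 the documented rule still gives 0xA58 while the shortcut gives 0xA56.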
Since the program counter points to the next instruction, when it executes the LDR at address 0x00000A56, the program counter will be holding the address of the next instruction, which is 0x00000A58.
0x0A58 + 0x5C (decimal 92) == 0x00000AB4

ARM Assembly loop using PC?

I am currently learning ARM assembly and I have some questions. When reading the docs, I found that register 15 is the program counter, which stores the address of the next instruction, and when an instruction is done, it is incremented by 4 bytes (or 2 in Thumb mode).
So, my question is, if I run an instruction that changes PC by itself less 4 bytes, would it return to the instruction before, won't it? Then back and over and over again so it will be an infinite loop?
Thanks, and sorry if it is an obvious question.
Regards,
Pedro.
You have to look on an instruction by instruction basis, as for some instructions modifying the PC is unpredictable, but where it is legal, modifying the program counter essentially causes a jump to the address you save in the program counter. You don't have to worry about the two instructions ahead thing (it is 8 and 4 bytes, not 4 and 2, two instructions ahead).
Yes - a jump/branch instruction is exactly what you're describing - it's an instruction which modifies the PC. If you arrange the result of the jump to put the program counter back where it was then, yes, you'll loop on the spot.
Note that this is not really the address of the next instruction but the address of the current instruction +4 (In Thumb mode) or +8 (In ARM mode). So in ARM this is 2 instructions later, but in Thumb it may not be (As instructions can be 16-bit or 32-bit)
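A short sketch of what this means for the question, assuming ARM state and an instruction of the form "sub pc, pc, #imm" that both reads and writes the PC (the function name is hypothetical):

```c
#include <stdint.h>

/* Where "sub pc, pc, #imm" lands in ARM state, for an instruction
   located at insn_addr. The PC operand reads as insn_addr + 8. */
static uint32_t arm_sub_pc_pc_imm(uint32_t insn_addr, uint32_t imm)
{
    uint32_t pc_as_read = insn_addr + 8;
    return pc_as_read - imm;
}
```

For an instruction at 0x1000, subtracting 4 lands on 0x1004 (the next instruction, so no loop), subtracting 8 lands on 0x1000 (the instruction itself, an infinite loop), and subtracting 12 lands on 0x0FFC (the previous instruction).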

Big empty space in memory?

I'm very new to embedded programming (started yesterday, actually) and I've noticed something I think is strange. I have a very simple program doing nothing but return 0.
int main() {
return 0;
}
When I run this in IAR Embedded Workbench I have a memory view showing me the program's memory. I've noticed that there is some content in memory, then a big block of empty space, and then content again (I suck at explaining :P so here is an image of the memory)
Please help me understand this a little more than I do now. I don't really know what to search for because I'm so new to this.
The first two lines are the 8 interrupt vectors, expressed as 32-bit instructions with the highest byte last. That is, read them in groups of 4 bytes, with the highest byte last, and then convert to an instruction via the usual method. The first few vectors, including the reset at memory location 0, turn out to be LDR instructions, which load an immediate address into the PC register. This causes the processor to jump to that address. (The reset vector is also the first instruction to run when the device is switched on.)
You can see the structure of an LDR instruction here, or at many other places via an internet search. If we write the reset vector 18 f0 95 e5 as e5 95 f0 18, then we see that the PC register is loaded with the address located at an offset of 0x20.
So the next two lines are memory locations referred to by instructions in the first two lines. The reset vector sends the PC to 0x00000080, which is where the C runtime of your program starts. (The other vectors send the PC to 0x00000170 near the end of your program. What this instruction is is left to the reader.)
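The byte reordering and field extraction described above can be sketched in C. The helper names are hypothetical; the field positions are those of the ARM single data transfer (LDR/STR) immediate encoding, where bits [15:12] are the destination register and bits [11:0] the offset:

```c
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored lowest byte first,
   as they appear in the memory view. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* 12-bit immediate offset of an LDR/STR with immediate addressing. */
static uint32_t ldr_imm12(uint32_t insn)
{
    return insn & 0xFFFu;
}

/* Destination register of an LDR/STR (15 means the PC). */
static unsigned ldr_rd(uint32_t insn)
{
    return (insn >> 12) & 0xFu;
}
```

For the bytes 18 f0 95 e5 this gives the word 0xE595F018, destination register 15 and offset 0x18. Assuming the load is PC-relative, as the answer describes, an instruction at address 0 reads the PC as 8, and 8 + 0x18 = 0x20.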
Typically, the C runtime is code added to the front of your program that loads the global variables into RAM from flash, and sets the uninitialized RAM to 0. Your program starts after that.
Your original question was: why have such a big gap of unused flash? The answer is that flash memory is not really at a premium, so we can waste a little, and that having extra space there allows for forward-compatibility. If we need to increase the vector table size, then we don't need to move the code around. In fact, this interrupt model has been changed in the new ARM Cortex processors anyway.
Physical (not virtual) memory addresses map to physical circuits. The lowest addresses often map to registers, not RAM arrays. In the interest of consistency, a given address usually maps to the same functionality on different processors of the same family, and missing functionality appears as a small hole in the address mapping.
Furthermore, RAM is assigned to a contiguous address range, after all the I/O registers and housekeeping functions. This produces a big hole between all the registers and the RAM.
Alternately, as @Martin suggests, it may represent uninitialized and read-only Flash memory as -- bytes. Unlike truly unassigned addresses, access to this is unlikely to produce an exception, and you might even be able to make them "reappear" using appropriate Flash controller commands.
On a modern desktop-class machine, virtual memory hides all this from you, and even parts of the physical address map may be configurable. Many embedded-class processors allow configuration to the extent of specifying the location of the interrupt vector table.
UncleO is right but here is some additional information.
The project's linker command file (*.icf for IAR EW) determines where sections are located in memory. (Look under Project->Options->Linker->Config to identify your linker configuration file.) If you view the linker command file with a text editor you may be able to identify where it locates a section named .intvec (or similar) at address 0x00000000. And then it may locate another section (maybe .text) at address 0x00000080.
You can also see these memory sections identified in the .map file, along with their locations. (Ensure "Generate linker map file" is checked under Project->Options->Linker->List.) The map file is an output from the build, however, and it's the linker command file that determines the locations.
So that space in memory is there because the linker command file instructed it to be that way. I'm not sure whether that space is necessary but it's certainly not a problem. You might be able to experiment with the linker command file and move that second section around. But the exception table (a.k.a. interrupt vector table) must be located at 0x00000000. And you'll want to ensure that the reset vector points to the new location of the startup code if you move it.

AT91 Bootstrap + Bare Metal Application

I am currently trying to understand how AT91 and a bare metal application can work together. I'll try to describe what I have:
IAR as development environment
A simple application which I can download via debugger to SRAM and which toggles some LEDs (working!)
Using SAM-BA I can write this application to SRAM and it will start correctly (LEDs are toggling)
My hardware platform is the ATSAMA5D3x-EK
Now I would like this application to first run the AT91 bootstrap to initialize all the low level hardware (like DDR-RAM), then jump to my application and run it. I have not been able to do that yet successfully. I am able to start the pre-built uboot binary though so I assume it's not the copying or jump that are failing but my application is setup incorrectly.
As far as I understand, if I jump to an application (I assume this is some sort of "LDR pc, appstart_address") the operation at address appstart_address gets executed.
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt vectors, whereas the first instruction is usually some sort of "LDR pc, =main". Are these required if my application is copied to RAM and executed from there? I somehow have the feeling that after copying my application to RAM, the address pointers do not match anymore (although they should be relative - is that correct at all?)
So my questions basically boil down to:
What happens after AT91 has initialized the hardware and jumps to my application
Do I need to setup my application in some specific way? Do I need to tell the linker or any other component that it will be relocated to some other memory location (at91 bootstrap copies it to 0x2600 0000 whereas 0x2000 0000 is the start address of DDR).
Does anyone know of a good tutorial which explains exactly this step (the jump from at91 bootstrap to my application)?
One more question which I probably can answer myself:
Is it safe to assume that I will not need to execute the instructions in board_startup.s at the beginning of my application which enable The floating point unit, setup the sys stack pointer and so on. I would say that the hardware itself has already been setup by AT91 Bootstrap and therefore there is no need for such setup.
After thinking about a few things it comes down to this:
Does it make sense to tell the linker that it should link main to address 0x0 (because this is where bootstrap will jump to) - how would I do that?
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt
vectors, whereas the first instruction is usually some sort of "LDR
pc, =main". Are these required if my application is copied to RAM and
executed from there? I somehow have the feeling that after copying my
application to RAM, the address pointers do not match anymore
(although they should be relative - is that correct at all?)
The first 8 WORDS are exception entry points yes. Of which one is undefined so 7 real ones...
The reset vector does not want to go straight to main implying C code, you have not setup the stack or anything that you need to do to call C code. Also the reset vector is often close enough to use a branch b instead of a ldr pc, but since you only have one word/instruction to get out of the exception table then it either needs to be a branch or a ldr pc,something.
if your binary is position dependent then you build it for that position, you can then place it in non-volatile storage, copy and run if you like there is no problem with that. if you build it for its non-volatile address but you run it in a different address space and it is not position independent then you are right it simply wont work.
What happens after AT91 has initialized the hardware and jumps to my
application
your application runs
Do I need to setup my application in some specific way? Do I need to
tell the linker or any other component that it will be relocated to
some other memory location (at91 bootstrap copies it to 0x2600 0000
whereas 0x2000 0000 is the start address of DDR).
either build it position independent or link it for the address where it will run.
Does anyone know of a good tutorial which explains exactly this step
(the jump from at91 bootstrap to my application)?
I assume that when you say at91 bootstrap (you need to use a more precise term; at91 is a long-lived family of devices) you really mean either some Atmel part-specific code or IAR part-specific code. And the answer to your question is in their examples or documentation. You need to demonstrate what you found, examples, etc. before posting a question like that.
Is it safe to assume that I will not need to execute the instructions
in board_startup.s at the beginning of my application which enable The
floating point unit, setup the sys stack pointer and so on. I would
say that the hardware itself has already been setup by AT91 Bootstrap
and therefore there is no need for such setup.
if you are relying on someone else's code to, for example, set up ddr, then it is probably a safe bet that they set up the stack. fpu, that's another story. But if that file name is specific to their project and is something they call/use then, well, they called it or used it. Again this is specific to this magic AT91 Bootstrap thing which you have not demonstrated that you looked at or through or read about. Please, do some more research on the topic, show what you tried, etc. For example it should be quite trivial after this bootstrap code to read the registers that enable the fpu and/or just use it and see what you see. That is an easy way to tell if it had been run. Alternatively insert an infinite loop in that code and re-build; if the code hangs at the infinite loop, then they are running it (careful not to brick your board with such a move; in theory SAM-BA will let you re-load).
Does it make sense to tell the linker that it should link main to
address 0x0 (because this is where bootstrap will jump to) - how would
I do that?
The exception table for this processor is at a well known location (possibly one of two depending on strapping). the exception handlers need to be in the right place for the processor to boot properly. Generally it is the linker that does the final arranging of code and it is linker specific as to how you tell the linker where to put things so the answer is in the documentation for the linker and also either somewhere in the project it specifies this information (linker script, makefile, etc) or a default is used either global default or some variable or command line option tells one of the tools where to look for this information. so how you do it is read the docs and do what the docs say.

Write jump instruction in c

To preface this, yes this is a project to take control of an executable externally. No, I do not have any malicious intents with this, the end result of this project won't be anything useful anyway. I am writing this in cygwin on a 32-bit installation of XP.
What I need to do is change the first few bits of a COM file to be a jump instruction so that on execution, it will jump to the very end of the COM file. I have looked in Assembler manuals to find what the bytes of that command would be so that I can just hard code it in C, but have had no luck.
First Question: Can I do this in C? It seems to me like I could just insert OpCodes in the beginning of any COM file so that it would execute that instead of the COM file.
Second Question: does someone know where I can find a resource for OpCodes so that I can insert them in my file? Or, does anyone know what the bytes would be for a Jump instruction?
If you have any question about the authenticity of this, feel free to ask.
The Intel® 64 and IA-32 Architectures Software Developer Manual Volume 2A Instruction Set Reference explains the encoding of the JMP instruction (real mode is a subset of IA-32).
For a 16-bit near jump (within the current code segment) you'd use 0xE9 followed by the 16-bit relative offset to jump to. If your jump is the first bytes of the COM file then the offset will be relative to address 0x103: the first instruction of a COM file is always loaded at address 0x100, and the jump is relative to the instruction following the 3-byte jump.
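A minimal sketch of that encoding in C, assuming the jump occupies the first three bytes of the COM image. The function name is hypothetical; the target is the 16-bit offset within the segment, so the end of a file of size n would be 0x100 + n:

```c
#include <stdint.h>

/* Encode "JMP rel16" (opcode 0xE9) placed at the start of a COM image.
   target: offset within the segment to jump to (image loads at 0x100).
   out:    receives the 3 instruction bytes, offset stored low byte first. */
static void encode_com_jump(uint16_t target, uint8_t out[3])
{
    /* Relative to the instruction after the jump, i.e. 0x100 + 3. */
    uint16_t rel = (uint16_t)(target - 0x103u);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFFu);
    out[2] = (uint8_t)(rel >> 8);
}
```

For a 0x200-byte file, the end is at 0x300, so the bytes come out as E9 FD 01.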
On XP there should be debug.exe. Simply start it, begin assembling with 'a', type jmp ff00, and disassemble the result with 'u' ([u]nassemble) if the corresponding hex dump was not shown.
Notice first that your program is necessarily operating system, ABI, and machine instruction set specific. (e.g. it won't run under Linux/x86-64 or Linux/PowerPC)
You could write in C the machine instructions as a sequence of bytes. Which bytes you have to write (i.e. the encoding of the appropriate jump instructions) is left to you!!!!!
Of course, that is not portable C. But you could basically do a memcpy with some appropriate source byte zone.
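One way the memcpy idea could look, as a sketch only. The function name and the choice to save the overwritten bytes are assumptions, and the image is taken to be already loaded into a buffer:

```c
#include <stdint.h>
#include <string.h>

/* Overwrite the first code_len bytes of an in-memory file image with
   machine code, keeping a copy of the original bytes in saved.
   Returns 0 on success, -1 if the image is too short. */
static int patch_start(uint8_t *image, size_t image_len,
                       const uint8_t *code, size_t code_len,
                       uint8_t *saved)
{
    if (code_len > image_len)
        return -1;
    memcpy(saved, image, code_len);   /* remember what was there   */
    memcpy(image, code, code_len);    /* drop in the new bytes     */
    return 0;
}
```

The saved bytes matter if the code at the jump target is meant to run the original program afterwards, since the first instructions of the file have been replaced.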
Maybe libraries like asmjit or GNU lightning might inspire you.
You probably cannot use them directly, but studying their code could help you.
See also x86 wikipedia pages for more references.
