How to automatically increment the offset in a MIPS lw instruction?

I need to iterate through the array below. However the values of the array need to be stored at 0x10010080, but the program needs to start at 0x10010000. Is there a way for me to create a loop that automatically increments the offset by 4, so that the next lw instruction is:
lw $t0, 132($s0)
.data 0x10010080
array: .word 0x10010008, 0x1001000C, 0x10010006, 0x1001000D, 0x10010002
.text
li $s0, 0x10010000
# store the value 0x10010008 in t0
lw $t0, 128($s0)

MIPS has no automatic increment. Even processors that do (e.g. ARM) require you to request the increment, which suggests that it is not really automatic. Since MIPS cannot even request an increment as part of a dereference, simply increase the value of the pointer with a separate instruction.
As always, if you cannot do it in one instruction, then use a sequence of one or more.
As another point, asking for an instruction that does 128($s0) to become an instruction that does 132($s0) is called self-modifying code. Older processors supported this concept for two reasons: (1) it was necessary because of missing instructions (instructions that take constants but not variables), and (2) older architectural designs didn't care about code being modified like data. Newer processors provide most of those missing concepts in instruction form, and their more modern designs don't like code being modified on the fly: it hampers performance or simply doesn't work without cache synchronization (which is itself a performance cost).
If you want to access a different location from the first, then use normal/regular array indexing or any other pointer arithmetic to refer to the (new) desired location.
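A rough C analogue of the separate-instruction approach, offered as a sketch rather than a definitive translation: the pointer plays the role of $s0 and is advanced by one word after each load. The function name sum_words and the choice to sum the elements are assumptions made only for illustration; the comments show the MIPS instructions each step loosely corresponds to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walks `count` words starting at `base` and returns
   their sum (wrapping modulo 2^32). */
uint32_t sum_words(const uint32_t *base, size_t count)
{
    const uint32_t *p = base;            /* like: la    $s0, array      */
    const uint32_t *end = base + count;  /* address one past the array  */
    uint32_t sum = 0;

    while (p != end) {
        sum += *p;                       /* like: lw    $t0, 0($s0)     */
        p++;                             /* like: addiu $s0, $s0, 4     */
    }
    return sum;
}
```

The offset in the load stays fixed at 0; only the base register changes from one iteration to the next.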

Related

Expanding or avoiding addiu in MIPS

I've implemented a program (a fully-connected layer) in C, which needs to be compiled to MIPS to run on a specific microprocessor in order to test the microprocessor's function. Since the ADDIU instruction is not part of this processor's instruction set, I am editing the C program to produce fewer ADDIU instructions at compile time and trying to edit the remaining ones out of the MIPS code (ADD and ADDU are allowed). However, I am brand new to MIPS and want to make sure my edits don't change the function of the program. Is there an expansion for ADDIU using other instructions? If not, any ideas for how I can change the logic of my program to avoid using them?
I am developing a test for a specific microprocessor with a limited MIPS instruction set. Many of the problematic instructions in the compiled code can be expanded to use only instructions in the set, so I will edit the compiled code to include those expansions. However, ADDIU doesn't seem to have an expansion according to the expansion guides I've seen.
I've already gotten rid of some ADDIU instructions by storing commonly-used values as constants so I can refer to variables instead of literals in the rest of the C code, resulting in ADDU instructions (which are allowed). The ADDIU instructions which I'm having trouble editing out occur in the following places:
Manipulating or accessing the values of the stack and frame pointers. I've thought about hard-coding the addends as constants, but I'm not sure if that's even possible or if it would change the values in question.
e.g. addiu $sp,$sp,-3840
e.g. addiu $3,$fp,52
Accessing the high/low parts of 32-bit integers separately using %hi and %lo and adding them together
e.g. lui $2,%hi(output_layer.3511)
addiu $2,$2,%lo(output_layer.3511)
Note: output_layer is an array of 32-bit ints.
Addiu instructions that occur when I compile the "mod" function in C (expanding the mod function to get the remainder "the hard way" didn't help) e.g. fracPart = currentInput % 256; in C
compiles to
lw $3,40($fp)
li $2,-2147483648 # 0xffffffff80000000
ori $2,$2,0xff
and $2,$3,$2
bgez $2,$L17
nop
addiu $2,$2,-1
li $3,-256 # 0xffffffffffffff00
or $2,$2,$3
addiu $2,$2,1
$L17:
sw $2,48($fp)
The goal is working MIPS code which contains only instructions in the instruction set of this particular microprocessor, which does not include ADDIU.
addiu and addi are almost identical. The only difference is that addi raises an exception when the addition overflows, while addiu does not.
So you can replace addiu by addi wherever the addition cannot overflow.
Manipulating or accessing the values of the stack and frame pointers. I've thought about hard-coding the addends as constants, but I'm not sure if that's even possible or if it would change the values in question.
There is no problem replacing addiu by addi here. No sane software creates addresses in sp/fp that could overflow in this situation.
Accessing the high/low parts of 32-bit integers separately using %hi and %lo and adding them together
You can use addi, but people generally use an ori for this operation.
lui $2,%hi(output_layer.3511)
ori $2,$2,%lo(output_layer.3511)
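In either case the 16 LSB are cleared by the lui. Note, however, that addi and addiu sign-extend their immediate while ori zero-extends it, and %hi is normally adjusted to compensate for that sign extension. The three forms therefore give the same result only when bit 15 of the low half is clear; if it may be set, keep an adding form.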
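Whether the addiu and ori forms agree can be checked with a small C model. This is a sketch under the assumption that %hi is the rounded high half (address plus 0x8000, shifted right by 16), as GNU-style toolchains define it for pairing with a sign-extending add; all function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed %hi: rounded so that adding the sign-extended low half
   restores the address. */
uint32_t hi_adjusted(uint32_t addr) { return (addr + 0x8000u) >> 16; }

/* Assumed %lo: the low 16 bits of the address. */
uint32_t lo16(uint32_t addr) { return addr & 0xFFFFu; }

/* Sign-extend a 16-bit value held in the low half of v. */
uint32_t sext16(uint32_t v) { return (v & 0x8000u) ? (v | 0xFFFF0000u) : v; }

/* Model of: lui $2,%hi(x) ; addiu $2,$2,%lo(x) */
uint32_t via_addiu(uint32_t addr)
{
    uint32_t upper = hi_adjusted(addr) << 16;
    return upper + sext16(lo16(addr));
}

/* Model of: lui $2,%hi(x) ; ori $2,$2,%lo(x) */
uint32_t via_ori(uint32_t addr)
{
    uint32_t upper = hi_adjusted(addr) << 16;
    return upper | lo16(addr);
}
```

Under that assumption via_addiu returns the original address for every input, while via_ori does so only when bit 15 of the low half is clear; for 0x1001C000 it yields 0x1002C000.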
Addiu instructions that occur when I compile the "mod" function in C (expanding the mod function to get the remainder "the hard way" didn't help) e.g. fracPart = currentInput % 256; in C compiles to
lw $3,40($fp)
li $2,-2147483648 # 0xffffffff80000000
ori $2,$2,0xff
and $2,$3,$2
bgez $2,$L17
nop
addiu $2,$2,-1
li $3,-256 # 0xffffffffffffff00
or $2,$2,$3
addiu $2,$2,1
$L17:
sw $2,48($fp)
This code seems very strange. Why not replace the two lines (li+ori) by
li $2, 0xffffffff800000ff
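The last part (after bgez) is only executed for strictly negative numbers. For most of them it is equivalent to an or with 0xffffffffffffff00; the pair of addiu only matters when the low 8 bits are zero (for example -256), where it makes the result 0 instead of -256.
The second addiu can also be replaced by addi, but the first cannot in general: when $2 holds 0x80000000, addi $2,$2,-1 overflows and traps.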
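To see what the two addiu instructions contribute, the compiled sequence can be emulated in C and compared against the % operator. This is only a sketch on 32-bit values; the function names and the keep_addiu_pair switch are hypothetical and exist purely for the comparison.

```c
#include <stdint.h>

/* Reinterpret a 32-bit pattern as a signed value without relying on
   implementation-defined conversions. */
int32_t to_signed(uint32_t v)
{
    return (v <= 0x7FFFFFFFu) ? (int32_t)v : -(int32_t)(~v) - 1;
}

/* Emulates the sequence generated for `x % 256`.
   keep_addiu_pair = 0 drops the two addiu instructions. */
int32_t mod256_sequence(int32_t x, int keep_addiu_pair)
{
    uint32_t r = (uint32_t)x & 0x800000FFu;   /* li + ori + and        */
    if ((r & 0x80000000u) == 0)               /* bgez $2,$L17          */
        return to_signed(r);
    if (keep_addiu_pair) r -= 1u;             /* addiu $2,$2,-1        */
    r |= 0xFFFFFF00u;                         /* li $3,-256 ; or       */
    if (keep_addiu_pair) r += 1u;             /* addiu $2,$2,1         */
    return to_signed(r);                      /* value stored by sw    */
}
```

With the pair kept, the result matches x % 256 for negative multiples of 256 as well; without it, an input of -256 comes out as -256 rather than 0.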
EDIT:
If addi is not available, you can copy the immediate into a free register and then perform an add/addu with this register. In most MIPS conventions, $1 is used by the assembler to store temporaries and is never used by compilers, so you can freely use it (provided you do not use macros that may use this register).
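A systematic translation of addiu can be
addiu $d, $s, imm
## ->
ori $1, $0, imm
addu $d, $s, $1
This works as written when imm is non-negative. ori zero-extends its immediate, so for a negative imm first set the upper half with lui $1, 0xffff and then use ori $1, $1, imm & 0xffff. addu is used rather than add so that, like addiu, the sequence never traps on overflow.
Both ori and addu are real instructions and $1 can be used safely. In some assemblers, you must use $at (assembler temporary) instead of $1.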
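The expansion can be sanity-checked with a small C model before editing the generated assembly. This is a sketch with made-up function names; it assumes a lui into the scratch register is added for negative immediates, because ori alone zero-extends.

```c
#include <stdint.h>

/* Reference behaviour of addiu: add the sign-extended 16-bit immediate,
   wrapping modulo 2^32 and never trapping. */
uint32_t addiu_ref(uint32_t s, int16_t imm)
{
    return s + (uint32_t)(int32_t)imm;
}

/* Expansion without addiu: build the sign-extended immediate in a scratch
   register ($1 / $at), then add it with a non-trapping add. */
uint32_t addiu_expanded(uint32_t s, int16_t imm)
{
    uint32_t at = (imm < 0) ? 0xFFFF0000u : 0u;  /* lui $1, 0xffff (negative imm only) */
    at |= (uint32_t)(uint16_t)imm;               /* ori $1, $1, imm & 0xffff           */
    return s + at;                               /* addu $d, $s, $1                    */
}

/* The ori-only form, for comparison: correct only for imm >= 0. */
uint32_t addiu_ori_only(uint32_t s, int16_t imm)
{
    uint32_t at = (uint32_t)(uint16_t)imm;       /* ori $1, $0, imm                    */
    return s + at;                               /* addu $d, $s, $1                    */
}
```

For the stack adjustment addiu $sp,$sp,-3840 from the question, the ori-only form would add 0xF100 instead of subtracting 3840, which is why the sign of the immediate matters.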
I work in a lab which develops new microprocessors, so this micro is not commercially available.
According to my understanding based on your statement, the compiler is also under development. You should discuss with the team developing the compiler about this issue, so they can take your needs into account.

Indexed addressing in x86 assembly - is something like mov array[ebx],eax a valid instruction?

My understanding is that indexed addressing essentially produces an address whose offset is the number in the brackets. Is this understanding correct? But I also understand that this address is dereferenced somehow. I don't understand exactly how this works. The book Assembly Language for Intel-Based Computers shows a lot of instructions of the form mov reg,array[reg] that move data from an indexed array element into a register. But I need a way to move data back from the register into the array. How can I do this? Do I use the opposite of that, which would be mov array[reg],reg? Or would this dereference array[reg] and move the data into the address given by the value stored in that array element?
For example, suppose the array index is 3 and it's stored in register EBX, and I want to move the value stored in EAX into that array element. The value currently stored in that array element is 500 hex. If I use the instruction mov array[ebx],eax, will this instruction move the value in EAX into array[3], or will it move it into memory location 500 hex? And if it's the latter case, what instruction can I use to avoid this effect and do what I actually want to do, which is move the data into array[3]?
Note: The syntax I am using is for MASM. I do not have MASM installed on my machine, and it's not really an option since I'm using Ubuntu. But the book I'm reading is written for MASM, so I'm learning MASM first, just to get a feel for how x86 assembly language works. I'm not assembling any programs, but I'd like to understand them.

How to identify a loop on the instruction level?

Can a direct branch instruction with a lower target address than the address of the branch instruction itself be considered the beginning of a loop? Is this condition sufficient, or are there other situations (compiler optimizations, etc.) in which similar behaviour is shown on the instruction level?
What other approach would you recommend? The obvious one is storing a list of target addresses encountered and if a target address is taken by the same instruction more than once, it means it's the beginning of a loop. The downside of that is that it takes up memory of storing all the addresses and time for checks.

Where does ARM read program instructions from after Register 14?

My understanding of the basic workings of the ARM architecture is as follows:
There are 15 main registers with the 15th (r15) being the Program Counter (PC).
If the program counter points to a specific register, then how can you have a Program which runs more than ~14 lines?
Obviously this is not true, but I don't understand how you can incorporate a big program with just 15 registers? What am I missing?
The program counter points to memory, not another register.
Registers don't store the program code. Program code is in main memory, and the Program Counter points to the location in memory of the next instruction.
The other registers are high-speed locations for storing temporary, or frequently accessed, values during the processing of the application.
In the simplest form, you have Program (Instruction memory), Data memory, Stack Memory, and Registers.
ARM instructions are stored in the Instruction memory. They are a sequence of commands which tell the processor what to do, and they are never stored in the registers of the processor. The program counter only points to an instruction; that instruction is simply a command which in its basic form has an opcode (operation code) and variables/literals.
So what happens is that the instruction is read (fetched) from the memory location pointed to by the program counter. It is not loaded into the registers but into the control unit, where it is decoded (to determine what operation to do, i.e. add, sub, mov etc.) along with where to read its inputs and store its outputs.
So where are the inputs/outputs to operate on and store? The ARM architecture is a load/store architecture, which means it operates on data loaded into its registers (R1, R2, ... R7, etc.), where the registers can be thought of as temporary variables in which all inputs and outputs are stored. Registers are used because they are fast and operate at the speed of the processor, unlike memory, which is slower.
Now the question is, how to populate these registers with values in the first place?
Those values could be stored on the Data Memory or Stack memory, so there are instructions to copy them to these registers, followed by instructions to operate on them and store the value in the registers, then followed by other instructions to copy the result back to Memory. Some instructions could also load a register with a constant.
Instruction 1 // Copy Variable X into R1 from memory
Instruction 2 // Copy Variable Y into R2 from memory
ADD R3, R1, R2 // add them together
Instruction 3 // Copy back the result into Memory
I tried to make it as simple as possible, there are so many details to cover. Needs books :)
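The separation between program memory and registers can be sketched as a toy fetch/decode/execute loop in C. Everything here is hypothetical: the opcodes, the struct layout and the run function are a made-up machine, not ARM's real encoding, and the program counter is modelled as an index into the instruction array.

```c
#include <stddef.h>
#include <stdint.h>

enum opcode { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

/* One instruction of the made-up machine. */
struct insn { enum opcode op; int rd, ra, rb; size_t addr; };

/* Runs `program` until OP_HALT. Instructions live in program memory;
   the registers only ever hold data. */
void run(const struct insn *program, int32_t *data)
{
    int32_t reg[16] = {0};
    size_t pc = 0;                       /* position in program memory */

    for (;;) {
        struct insn i = program[pc++];   /* fetch, then advance the PC */
        switch (i.op) {                  /* decode and execute */
        case OP_LOAD:  reg[i.rd] = data[i.addr];          break;
        case OP_ADD:   reg[i.rd] = reg[i.ra] + reg[i.rb]; break;
        case OP_STORE: data[i.addr] = reg[i.rd];          break;
        case OP_HALT:  return;
        }
    }
}
```

The program can be arbitrarily long because pc just walks through memory; only the values being worked on have to fit in the sixteen registers.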

Is 2 pass on the source file necessary for assembler and linker?

I have heard many times that the assembler and linker need to traverse their input files at least 2 times. Is this really necessary? Why can't it be done in one pass?
The assembler translates a symbolic assembler language into a binary representation.
In the input language (assembler), labels are symbolic too.
In the binary output language they are typically a distance in bytes, relative to the current position or some other fixed point (e.g. jump so many bytes ahead or back).
The first pass just determines the offset from the start of the code or some other fixed point of all assembler instructions to fixate the position of the labels.
This makes it possible to calculate the correct jump distances for branch instructions in the second pass.
A one-pass assembler would be possible, but you would only be able to jump to labels you had already declared ("back"), not forward.
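The two passes described above can be sketched in C for a deliberately tiny input language. The struct, the fixed 4-byte jump size and the assemble function are all assumptions chosen for illustration, not how any particular assembler is implemented.

```c
#include <stddef.h>
#include <string.h>

/* A statement is either a label definition (is_label != 0) or a jump to
   `name`. Every jump is assumed to occupy 4 bytes; labels occupy none. */
struct stmt { int is_label; const char *name; };

#define JUMP_SIZE 4
#define MAX_LABELS 32

/* Fills offsets[i] with (target address - jump address) for each jump.
   Returns 0 on success, -1 for an undefined label or too many labels. */
int assemble(const struct stmt *src, size_t n, long *offsets)
{
    const char *names[MAX_LABELS];
    long addrs[MAX_LABELS];
    size_t nlabels = 0;
    long addr = 0;

    /* Pass 1: record the address of every label. */
    for (size_t i = 0; i < n; i++) {
        if (src[i].is_label) {
            if (nlabels == MAX_LABELS) return -1;
            names[nlabels] = src[i].name;
            addrs[nlabels++] = addr;
        } else {
            addr += JUMP_SIZE;
        }
    }

    /* Pass 2: all labels are known now, so forward jumps resolve too. */
    addr = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].is_label) { offsets[i] = 0; continue; }
        size_t k = 0;
        while (k < nlabels && strcmp(names[k], src[i].name) != 0) k++;
        if (k == nlabels) return -1;     /* unknown label */
        offsets[i] = addrs[k] - addr;
        addr += JUMP_SIZE;
    }
    return 0;
}
```

A single pass could only fill in the backward jump; the forward one needs the label table that the first pass builds.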
One example when this is necessary is when two functions call each other.
int sub_a(int v);
int sub_b(int v);
int sub_a(int v) {
int u = v;
if ( 0 < u ) {
u = sub_b( v - 1 );
}
return u - 1;
}
int sub_b(int v) {
int u = v;
if ( 0 < u ) {
u = sub_a( v - 1 );
}
return u - 1;
}
It is then necessary to do a two-pass scan, as any ordering of the functions will have a dependency on a function that hasn’t been scanned.
It may even take more than two.
here:
...
jmp outside
...
jmp there
...
jmp here
...
there:
This happens in particular for instruction sets that have some form of a near jump and some form of a far jump. The assembler doesn't always want to waste a far jump on every branch/jmp. Take the code above, for example: when it gets to the jmp here line it knows how many instructions are between the here label and the jump to here, so it can make a pretty good estimate of whether it needs to encode that as a near or far jump. Normally the far version of a jump takes more bytes to implement, causing all the instructions and labels that follow to shift.
When it encounters the jmp there instruction it does not know whether the jump is near or far and has to come back later on a separate pass (through the data). When it encounters the label there it could go back and look to see if up to this point there has been a reference to it, and patch up that reference. That is another pass through the data, pass 2. Or you just make one complete pass through the source code, then start to go back and forth through the data more times.
Let's say the jump to outside does not resolve to a label. Now the assembler has to respond in a way that depends on the instruction set. Take the msp430, where a far jump simply means an absolute address in memory, covering all of memory space with no segments or anything like that: you could simply assume a far jump and leave the address for the linker to fill in later. On some instruction sets, like ARM, you have to allocate some memory within near reach of the instruction, often hiding things behind unconditional branches (this can be a bad thing and fail). Basically you need to allocate a place where the whole address of the external item can be stored, encode the instruction to load from that near memory location, and let the linker fill in the address later.
Back to here and there. Suppose that on the first pass you assumed all of the unknown jumps were near and computed addresses based on that. Suppose too that here was exactly 128 bytes from the jmp here instruction, for an instruction set whose near jump has a reach of only 128 bytes, so you assume jmp here is also near. To make this painful, suppose that when there was found, the distance from jmp there to there was 127 bytes, your maximum near jump forward. But outside is not found! It has to be far, so you need to burn some more bytes. Now the distance from here to jmp here is too great and that jump needs more bytes, and then jmp there is too far and needs more bytes as well. How many passes through the data did it take to figure those three things out? More than two. One pass to start. The second pass marks outside as far and still assumes jmp there is near; when it gets to jmp here it discovers that this has to be a far jump, which changes the address of there. On the third pass it discovers that jmp there needs to be far, and that affects everything after that instruction. For this simple code that is it: everything is resolved.
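The cascade can be sketched in C as a loop that lays out the code, grows any jump that cannot reach, and repeats until a pass changes nothing. The sizes (2-byte near, 4-byte far), the 127-byte reach, the item layout and the relax function are all made-up assumptions, and the pass structure is simplified compared with the walk-through above.

```c
#include <stddef.h>

enum kind { LABEL, JUMP, CODE };

/* target: index of the LABEL item a JUMP goes to, or -1 if external.
   size: bytes occupied (CODE: set by the caller; JUMP: set by relax). */
struct item { enum kind kind; int target; long size; };

#define NEAR_SIZE 2
#define FAR_SIZE 4
#define NEAR_REACH 127   /* made-up maximum distance for a near jump */

/* Starts with every jump near, then repeats layout passes, growing jumps
   that cannot reach, until a pass changes nothing. Returns the number of
   passes; addr[] (n entries) receives the final address of each item. */
int relax(struct item *it, size_t n, long *addr)
{
    int passes = 0, changed;

    for (size_t i = 0; i < n; i++) {
        if (it[i].kind == JUMP) it[i].size = NEAR_SIZE;
        else if (it[i].kind == LABEL) it[i].size = 0;
    }

    do {
        long a = 0;
        changed = 0;
        passes++;
        for (size_t i = 0; i < n; i++) {   /* lay out with current sizes */
            addr[i] = a;
            a += it[i].size;
        }
        for (size_t i = 0; i < n; i++) {   /* grow jumps that cannot reach */
            if (it[i].kind != JUMP || it[i].size == FAR_SIZE) continue;
            long d = (it[i].target < 0) ? NEAR_REACH + 1
                                        : addr[it[i].target] - addr[i];
            if (d < 0) d = -d;
            if (d > NEAR_REACH) { it[i].size = FAR_SIZE; changed = 1; }
        }
    } while (changed);
    return passes;
}
```

Because jumps only ever grow, the loop must terminate. With a layout where jmp here and jmp there both start at exactly the maximum near distance and outside is external, this sketch needs four passes: three that each grow one jump and a final one that confirms nothing changed.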
Think about a bubble sort. You keep looping through the data doing swaps until you have a flag that says "I made no changes on that last pass", indicating everything is resolved and we are done. You have to play the same game with an assembler. For instruction sets like ARM you need to do things like try to find places to tuck away addresses and constants/immediates that don't encode into a single instruction. That is, if the assembler wants to do that work for you. It could just as easily declare an error and say the destination is too far for the instruction chosen. ARM assemblers allow you to be lazy and do things like:
ldr r0,=0x1234567
...
ldr r1,=lab7
...
lab7:
The assembler looks at that = and knows it has to determine: can I encode that constant/immediate in the instruction (changing your ldr to a mov for you), or do I need to find a place wedged in your code to put the word, and then encode the instruction with a near address offset?
Even without dealing with near and far, simply resolving addresses in the outside, there, here example above takes two passes. The first pass reads everything; jmp here happens to know where here is on the first pass. But you have to make a second pass through the program (not necessarily from the disk, you can keep the info in memory), because there might be a jump to here that precedes the here: label. The second pass will find jmp outside and know there is no outside label in the program, marking it as unresolved or external depending on the rules of the assembler. The second pass resolves jmp there as being a known label, and it doesn't touch jmp here because that was resolved on the first pass. This is your classic two-pass problem/solution.
The linker has the same problem. It has to pass through all the sources; think of each object as a complicated line in source code. It finds all the labels, both the ones defined in the objects and the ones not resolved in them. If it finds the need for an "outside" label in the second file out of 10, it has to pass through all 10 files, then go back through the data, either on file or in memory, to resolve all the forward-referenced labels. It won't know on the first occurrence of jmp outside that there is no outside label. Only on the second pass, when it finds jmp outside and looks through the list it keeps of found labels (that could be considered a third pass), does it find no outside label and declare an error.

Resources