Using Nested For Loops and an Array in MIPS - c

This is a homework assignment. I've written the whole program myself and run through it in the debugger, and everything plays out the way I mean it to EXCEPT for this line:
sw $t1, counter($a3)
The assignment is to convert this snippet of C code to MIPS
for(i = 0; i < a; i++) {
for(j = 0; j < b; j++) {
C[2 * i] = i - j; } }
All the registers change values the way they should in my program except for $a3 - It never changes.
Changes: An array needed to be declared and "pointed to" by a register and a label can't be used for an offset in the manner I started with
EDIT: Here's the finished, working code

Recap answer from the comments
Your $a3 register is supposed to be loaded with the address of an array defined in the .data section.
One big problem with your code is how you constructed your loops. The best way is to translate your loops step by step, one loop at a time. Also, remember that:
for( i = 0; i < a; i++ )
{
loop_content;
}
is equivalent to:
i = 0;
while( i < a )
{
loop_content;
i++;
}
which is easier to translate into assembly. The condition just has to be negated, as you need an "exit" condition and not a "continue" condition as in a while loop. Your code will be much clearer, easier to understand, and less error-prone.
Your "out of range" error comes from here: sw $t1, counter($a3). Here counter is a label, and therefore an address. Thus counter($a3) computes "$a3 (=0x10010008) + address of counter (=0x100100f8)", giving 0x20020100, which is clearly not what you want (and is nonsense).
Oh, and in the MIPS instruction sw $r, offset($a), offset MUST be a 16-bit CONSTANT. Here you use a 32-bit address, and the assembler kindly translates sw $t1, counter($a3) into $x = $a3 + counter; sw $t1, 0($x), which is why you may see a sw with 0 as the offset.
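As a rough illustration of the step-by-step translation described above, here is the nested loop rewritten in C with explicit labels and negated conditions, which maps almost line for line onto branches in assembly. The function name fill and the parameter c (standing in for the array whose base address would sit in $a3) are assumptions made for this sketch, not part of the original program.

```c
#include <assert.h>

/* Hypothetical helper: c plays the role of the array base held in $a3. */
static void fill(int *c, int a, int b) {
    int i, j;
    assert(c != 0);
    i = 0;
outer_test:
    if (i >= a) goto outer_done;   /* negated "i < a": the exit condition */
    j = 0;
inner_test:
    if (j >= b) goto inner_done;   /* negated "j < b" */
    *(c + 2 * i) = i - j;          /* base + index; in MIPS the byte offset is 2*i*4 */
    j++;
    goto inner_test;
inner_done:
    i++;
    goto outer_test;
outer_done:
    return;
}
```

Each label corresponds to a MIPS label, each if ... goto to a conditional branch, and the address computation to an add on the base register followed by sw with a constant offset of 0.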

Related

How to deal with multiple line loop-like blocks in bison for 3 address code generation?

I'm trying to create a compiler and one of the cases that I want to tackle is something like this:
i = 3
a = 0
repeat i - 1 do
a = a + i
done
The expected output would be:
i = 3
a = 0
$t1 := i SUBI 1
$t2 := 0
LINE_NUMBER: $t3 := a ADDI i
$t2 := $t2 ADDI 1
IF $t2 LTI $t1 GOTO LINE_NUMBER
But no matter what I try, my code won't even run if I add one of these blocks to my test file. The idea is to avoid backpatching: just manually capture the line I'm at once I detect a REPEAT expr DO ENDLINE line, and then emit the comparison.
As of now I have a basic 3-address code generator that uses C code to handle simple operations such as a = a + 2, and an emet function that sends the PUT, HALT, PARAM and generally generates the instructions.
I tried adding things like this to my y file:
REPEAT expr do '\n' input DONE {}
iterative : REPEAT expr DO ENDLINE sentences_list DONE ENDLINE {
salt = ln_inst;
emet(NULL,0,NULL,"END\n",NULL);
}
REPEAT expr capture DO ENDLINE sentences_list DONE ENDLINE {}
capture : DO { line_number = current_line }
The main idea is for a program to be made of a list_sentences, with each sentence having one possible type (assignment, operation, or this iterative block, which can itself contain a list of sentences as the loop body).
If needed, I will post my lex and .y files for easier understanding, since I'm not sure I'm getting my point across.
Any help will be greatly appreciated.
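One way the "capture the line, no backpatching" idea could look in plain C is sketched below. Everything here is hypothetical: emit, new_temp, repeat_begin, repeat_end and struct repeat_ctx are names invented for the illustration and are not the emet function or grammar from the question. In a Bison grammar, repeat_begin would typically run from a mid-rule action placed after DO, and repeat_end from the action at the end of the rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINES 64
#define LINE_LEN  64

static char code[MAX_LINES][LINE_LEN]; /* code[k] holds instruction number k + 1 */
static int ln_inst = 0;                /* instructions emitted so far */
static int next_temp = 1;              /* index of the next free temporary */

/* State captured when the loop header is seen (all names are assumptions). */
struct repeat_ctx { int limit_temp; int counter_temp; int body_line; };

static int new_temp(void) { return next_temp++; }

/* Append one instruction to the output buffer. */
static void emit(const char *text) {
    assert(ln_inst < MAX_LINES);
    snprintf(code[ln_inst], LINE_LEN, "%s", text);
    ln_inst++;
}

/* After "REPEAT expr DO": create the counter and remember where the body starts. */
static struct repeat_ctx repeat_begin(int limit_temp) {
    struct repeat_ctx ctx;
    char buf[LINE_LEN];
    ctx.limit_temp = limit_temp;
    ctx.counter_temp = new_temp();
    snprintf(buf, sizeof buf, "$t%d := 0", ctx.counter_temp);
    emit(buf);
    ctx.body_line = ln_inst + 1;       /* line number of the first body instruction */
    return ctx;
}

/* After "DONE": increment the counter and jump back to the captured line. */
static void repeat_end(struct repeat_ctx ctx) {
    char buf[LINE_LEN];
    snprintf(buf, sizeof buf, "$t%d := $t%d ADDI 1", ctx.counter_temp, ctx.counter_temp);
    emit(buf);
    snprintf(buf, sizeof buf, "IF $t%d LTI $t%d GOTO %d",
             ctx.counter_temp, ctx.limit_temp, ctx.body_line);
    emit(buf);
}
```

Because the jump target is a backward reference, it is already known when the conditional GOTO is emitted, which is why no backpatching is needed for this loop shape.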

Why does the following loop unrolling lead to a wrong result?

I am currently trying to optimize some MIPS assembler that I've written for a program that triangulates a 24x24 matrix. My current goal is to utilize delayed branching and manual loop unrolling to try and cut down on the cycles. Note: I am using 32-bit single precision for all the matrix arithmetic.
Part of the algorithm involves the following loop that I'm trying to unroll (N will always be 24)
...
float inv = 1/A[k][k]
for (j = k + 1; j < N; j++) {
/* divide by pivot element */
A[k][j] = A[k][j] * inv;
}
...
I want
...
float inv = 1/A[k][k]
for (j = k + 1; j < N; j +=2) {
/* divide by pivot element */
A[k][j] = A[k][j] * inv;
A[k][j + 1] = A[k][j + 1] * inv;
}
...
but it generates an incorrect result and I don't know why. The interesting thing is that the version with loop unrolling generates the first row of the matrix correctly but the remaining rows incorrectly. The version without loop unrolling correctly triangulates the matrix.
Here is my attempt at doing it.
...
# No loop unrolling
loop_2:
move $a3, $t2 # column number b = j (getelem A[k][j])
jal getelem # Addr of A[k][j] in $v0 and val in $f0
addiu $t2, $t2, 1 ## j += 1
mul.s $f0, $f0, $f2 # Perform A[k][j] * inv
bltu $t2, 24, loop_2 # if j < N, jump to loop_2
swc1 $f0, 0($v0) ## Perform A[k][j] := A[k][j] * inv
# The matrix triangulates without problem with this original code.
...
...
# One loop unrolling
loop_2:
move $a3, $t2 # column number b = j (getelem A[k][j])
jal getelem # Addr of A[k][j] in $v0 and val in $f0
addiu $t2, $t2, 2 ## j += 2
lwc1 $f1, 4($v0) # $f1 <- A[k][j + 1]
mul.s $f0, $f0, $f2 # Perform A[k][j] * inv
mul.s $f1, $f1, $f2 # Perform A[k][j+1] * inv
swc1 $f0, 0($v0) # Perform A[k][j] := A[k][j] * inv
bltu $t2, 24, loop_2 # if j < N, jump to loop_2
swc1 $f1, 4($v0) ## Perform A[k][j + 1] := A[k][j + 1] * inv
# The first row in the resulting matrix is correct, but the remaining ones not when using this once unrolled loop code.
...
The unrolled C loop condition is buggy.
j < N; j +=2 can start the loop body with j = N-1,
accessing the array at A[k][N-1] (fine) and A[k][N] (not fine).
One common method is j < N-1, or in general j < N-(unroll-1). But for unsigned N, you also have to separately check N >= unroll before starting the loop, because N-1 could wrap to a huge unsigned value.
Keeping the j < limit form is generally good for C compilers, versus j + 1 < N, which is a separate thing they'd have to calculate. The latter can also stop a compiler from proving that the loop isn't infinite for unsigned counts (like size_t), because unsigned arithmetic is well-defined as wrapping around, so N = UINT_MAX could lead to the condition always being true depending on the starting point. (e.g. j = UINT_MAX-2 makes UINT_MAX-1 < UINT_MAX, and j+=2 makes 0 < UINT_MAX, also true.) So it's a similar problem to using j <= limit for unsigned counters. Compilers really like to know when a loop is potentially infinite. For some, it disables auto-vectorization if the trip-count isn't calculable ahead of the first iteration.
If j was starting at 0, you could get away with a sloppy condition if N was guaranteed to be a multiple of the unroll factor. But here it's different, as Nate points out. (j starts at k + 1, so with N = 24 the element count N - k - 1 is odd whenever k is even.)
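A minimal C sketch of the j < N-1 condition with a one-element cleanup step follows. The function name scale_row_unrolled and its parameters are assumptions made for the illustration, and the counters are signed int, so the unsigned wrap-around caveat does not arise here.

```c
#include <assert.h>

/* Hypothetical helper: scales row[start..n-1] by inv, two elements per iteration. */
static void scale_row_unrolled(float *row, int start, int n, float inv) {
    int j;
    assert(row != 0 && start >= 0);
    /* j < n - 1 guarantees that both row[j] and row[j + 1] are in range. */
    for (j = start; j < n - 1; j += 2) {
        row[j]     = row[j]     * inv;
        row[j + 1] = row[j + 1] * inv;
    }
    /* Cleanup: at most one element is left when the element count is odd. */
    if (j < n) {
        row[j] = row[j] * inv;
    }
}
```

With the original j < n condition, the last iteration for an odd element count would also scale row[n], which in a 2D matrix stored row by row is the first element of the next row.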
Efficiency of your MIPS asm
Generally the point of loop unrolling is performance. A non-inline call to a helper function inside the loop is kind of defeating the purpose.
jal getelem I assume does a bunch of multiplies and stuff to redo the indexing with a pointer and two integers? Notice that you're scanning along contiguous memory in one row, so you can just increment a pointer.
Calculate an end-pointer to compare against, so your MIPS loop can look like
# some checking outside the loop, maybe with a bxx to the end of it.
looptop: # do{
lwc1 $f2, 0($t0)
lwc1 $f3, 4($t0)
addiu $t0, $t0, 4*2 # p+=2 advance by 8 bytes, 2 floats
...
swc1 something, -8($t0) # p was already advanced, so store relative to the new p
swc1 something, -4($t0)
bne $t0, $t1, looptop # }while(p!=endp)
# maybe another condition to check if you should run one last iteration.
MIPS bltu is only a pseudo-instruction (sltu/bnez); that's why it's better to calculate an exact end-pointer so you can use a single machine instruction as the loop branch.
And yes, this might mean rounding the iteration count down to a multiple of 2 to ensure correctness. Or doing a scalar iteration and rounding up to a multiple of 2. e.g. x++ / x&=-2;
With software pipelining, e.g. doing a load and divide but not a store yet, you could maybe let the rounding-up have the loop redo that element if odd. (If the chance of a branch mispredict costs more than an FP multiply and a redundant store.) Haven't fully thought this through, but it's a similar idea to SIMD doing a first unaligned vector, then a potentially-partially-overlapping aligned vector. (SIMD vectorization is like unrolling, but then you roll back up into a single instruction that does 4 elements, for example.)
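The end-pointer idea, combined with doing one scalar iteration first when the count is odd, can be sketched in C as below. The name scale_range and its signature are assumptions for the illustration; it is not a translation of the asker's getelem-based code.

```c
#include <assert.h>

/* Hypothetical helper: scales the floats in [p, endp) by inv. */
static void scale_range(float *p, float *endp, float inv) {
    assert(p != 0 && p <= endp);
    /* Peel one scalar iteration so the remaining count is a multiple of 2. */
    if ((endp - p) % 2 != 0) {
        *p *= inv;
        p++;
    }
    /* Unrolled by 2; the loop branch is a plain pointer comparison. */
    while (p != endp) {
        p[0] *= inv;
        p[1] *= inv;
        p += 2;
    }
}
```

For row k of the 24x24 matrix, the call would be something like scale_range(&A[k][k + 1], &A[k][0] + 24, inv), so the per-element index arithmetic disappears from the loop.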

Recursive Fibonacci MIPS

I started to read MIPS to understand better how my C++ and C code works under the computer skin. I started with a recursive function, a Fibonacci function.
The C code is:
int fib(int n) {
if(n == 0) { return 0; }
if(n == 1) { return 1; }
return (fib(n - 1) + fib(n - 2));
}
MIPS code:
fib:
addi $sp, $sp, -12
sw $ra, 8($sp)
sw $s0, 4($sp)
add $v0, $zero, $zero
beq $a0, $zero, end
addiu $v0, $zero, 1
addiu $t0, $zero, 1
beq $a0, $t0, end
addiu $a0, $a0, -1
sw $a0, 0($sp)
jal fib #fib(n-1)
add $s0, $v0, $zero
lw $a0, 0($sp)
addiu $a0, $a0, -1
jal fib #fib(n-2)
add $v0, $v0, $s0
end:
lw $s0, 4($sp)
lw $ra, 8($sp)
addi $sp, $sp, 12
jr $ra
When n>1, execution proceeds until it reaches the first jal instruction. What happens next? Does it return to the fib label, ignoring the code below (so the fib(n-2) call is never executed)? If that happens, the $sp pointer decreases by 3 words again and the cycle repeats until n<=1. I can't understand how this works once the first jal instruction is reached.
Can you follow how the recursion works in C?
In some sense, recursion has two components: the forward part and the backward part.  In the forward part, a recursive algorithm computes things before the recursion, and in the backward part, a recursive algorithm computes things after the recursion completes.  In between the two parts, there is the recursion.
See this answer: https://stackoverflow.com/a/71551098/471129
Fibonacci is just slightly more complicated as it performs recursion twice, not just once as in the above list printing example.
However, the principles are the same:  There is work done before the recursion, and work done after (either of which can be degenerate).  The before part happens as code in front of the recursion executes, and the recursion builds up stack frames that are placeholders for work after the recursion yet to be completed.  The after part happens as the stack frames are released and the code after the recursive call is executed.
In any given call chain, the forward part goes until n is 0 or 1, then the algorithm starts returning back to the stacked callers, for whom the backward part kicks in, unwinding stack frames until it returns to the original caller (perhaps main) rather than to some recursive fib caller. Again, this is complicated by the use of two recursive invocations rather than one as in simpler examples.
With fib, the work done before is to count down (by -1 or -2) until reaching 0 or 1.  The work done after the recursion is to sum the two prior results.  The recursion itself effectively suspends an invocation or activation of fib with current values, to be resumed when a recursive call completes.
Recursion in the MIPS version works the same way; however, operations that are implicit in C are spread out over several machine code instructions.
Suggest single stepping over a call to fib(2) as a very small example that may help you see what's going on there. Suggest first doing this in C: single step until the outer fib call has fully completed and returned to the calling test function (e.g. main).
To make the C version just a bit easier to view in the debugger you might use this version:
int fib(int n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
int fm1 = fib(n-1);
int fm2 = fib(n-2);
int result = fm1 + fm2;
return result;
}
With that equivalent C version, you'll be able to inspect fm1, fm2, and result during single stepping.  That will make it easier to follow.
Next, do the same in the assembly version.  Debug single step to watch execution of fib(2), and draw parallels with the equivalents in C.
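If a debugger is not at hand, a small instrumented variant can make the forward and backward parts visible as numbers. The sketch below is an illustration only; fib_traced and the three counters are hypothetical names. depth mirrors the number of stack frames that are live at once, which is what the $sp adjustments track in the assembly version.

```c
#include <assert.h>

static int calls;      /* total number of activations of fib_traced */
static int depth;      /* frames currently live (forward part increases it) */
static int max_depth;  /* deepest nesting reached */

static int fib_traced(int n) {
    int fm1, fm2;
    assert(n >= 0);
    calls++;
    depth++;                        /* like addi $sp, $sp, -12 on entry */
    if (depth > max_depth) max_depth = depth;
    if (n < 2) {                    /* base cases: fib(0) = 0, fib(1) = 1 */
        depth--;
        return n;
    }
    fm1 = fib_traced(n - 1);        /* this activation is suspended here... */
    fm2 = fib_traced(n - 2);        /* ...and resumes to make the second call */
    depth--;                        /* like addi $sp, $sp, 12 before jr $ra */
    return fm1 + fm2;
}
```

For fib_traced(2) there are three activations in total but never more than two live at once: the first jal returns to the instruction after it, and only then is the second call made.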
There's another way to think about recursion, which is to ignore the recursion and pretend that the recursive call is to some unrelated function implementation that just happens to yield the proper results of the recursive function; here's such a non-recursive function:
int fib(int n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
int fm1 = fibX(n-1); // calls something else that computes fib(n-1)
int fm2 = fibX(n-2); // "
int result = fm1 + fm2;
return result;
}
With this code, and the assumption that fibX simply works correctly to return proper results, you can focus strictly on the logic of one level, namely, the body of this fib, without considering the recursion at all.
Note that we can do the same in assembly language — though the opportunities for errors / typos are always much larger than in the C, since you still have to manipulate stack frames and preserve critical storage for later use after the calling.
The code you've posted has a transcription error, making it different from the C version.  It is doing the C equivalent of:
return fib(n-1) + fib(n-1);

MIPS assembly: Why can main exit without deallocating its stack space?

I have a question about a university exercise that I don't understand. We have to translate from C to MIPS assembly. In main I have to allocate 400 bytes for the a[100] array, but in the solution my professor does not deallocate it at the end of the function. Why is that? Are there cases in which I don't need to deallocate memory by moving the stack pointer back?
Here's the code in C:
int idamax(int n, float * dx, int incx) {
float dmax;
int i, ix, itemp;
if (n < 1) return (-1);
if (n == 1) return (0);
if (incx != 1) {
ix = 1;
dmax = fabs(dx[0]);
ix = ix + incx;
for (i = 1; i < n; i++) {
if (dmax < fabs(dx[ix])) {
itemp = i;
dmax = fabs(dx[ix]);
}
ix = ix + incx;
}
} else {
itemp = 0;
dmax = fabs(dx[0]);
for (i = 1; i < n; i++) {
if (dmax < fabs(dx[i])) {
itemp = i;
dmax = fabs(dx[i]);
}
}
}
return (itemp);
}
int main() {
float a[100];
int l, k, n = 100, lda = 10;
for (k = 0; k < n; ++k) a[k] = (float)((k * k * k) % 100);
k = 4;
l = idamax(n - lda * k - k, &a[lda * k + k], 1) + k;
print_int(l);
exit;
}
Main assembly code:
main:
#______CALL_FRAME______
# 100 float: 400B
#______Total 400B
addi $sp,$sp,-400
add $t9,$sp,$0 #&a
addi $t0, $0, 100 #n=100
addi $t1, $0, 10 #lda=10
#l in t2, k in t3
add $t3, $0, $0 #k=0
main_forini:
slt $t5,$t3,$t0 #k<?n
beq $t5,$0,main_forend
mult $t3, $t3 #k*k
mflo $t5
mult $t3, $t5
mflo $t5 #k*k*k
div $t5,$t0 #()%n
mfhi $t5
mtc1 $t5,$f0
cvt.s.w $f1,$f0 #(float)()
sll $t5,$t3,2 #k*4
add $t5,$t5,$t9 #&a[k]
swc1 $f1,0($t5) #a[k]=()
addi $t3, $t3, 1 #++k
j main_forini
main_forend:
addi $t3,$0,4 #k=4
mult $t1,$t3 #lda*k
mflo $t5
add $t5,$t5,$t3 #lda*k+k
sub $a0,$t0,$t5 #a0=n-lda*k-k
sll $t5,$t5,2
add $a1,$t5,$t9 #a1=&a[lda*k+k]
addi $a2,$0,1 #a2=1
jal idamax
addi $a0,$v0,4 #a0=l=retval+k
addi $v0,$0,1 #print_int
syscall
addi $v0,$0,10 #exit
syscall
Execution of main never reaches the bottom of the function so cleanup of the stack never needs to happen; exit() is a "noreturn" function.
If main did want to return with jr $ra instead of making an exit system call, you would need to restore the stack pointer along with other call-preserved registers. Otherwise you'd be violating the calling convention that main's caller expects main to follow.
(Updated since you added asm to the question that uses a MARS system call: that main is probably not a function if it's the top of your code: $ra isn't a valid return address on entry so it couldn't return. IMO don't call it main if it's not a function.)
The OS doesn't care where the user-space stack pointer is pointing when the process makes an exit system call, so there's no need for main to clean up before exiting.
(In a "normal" C implementation, the exit() function would compile to a jal exit or a simple tailcall j exit. But you're compiling by hand for the MARS simulator which has no C library, so you inline system calls instead of calling wrapper functions.)
Also note that ISO C exit(int) takes an arg, like MARS exit2 (syscall/$v0=17). In fact you didn't even call exit() as a function, you just wrote exit; in C which evaluates the exit as a function pointer without calling it or doing anything with that value.
Typically C main is called by CRT startup code that might for example run C library init functions and put argc and an argv[] pointer in the right registers. So main is usually not the actual process entry point from the OS, especially not in a hosted implementation. (i.e. compiled C programs run under an OS, rather than being their own kernel like a freestanding program.)
If you're just translating this for the MARS or SPIM simulators or something, then there is no C library or any code beyond what you write, so what you're writing is what would normally be called _start, not main.
In C main is a function, but in MARS you can't jr $ra from the top-level entry point so the entry point is not a function. Thus don't call it main.
In ISO C it's even legal for main to call itself recursively, or other functions to call main. That can only work if main truly is a function that cleans up the stack and returns properly. But that means it can't also be the process entry point that needs to make an exit system call. To run a program with a crazy recursive main that eventually does a C return statement (or falls off the end of main), main pretty much has to be compiled to a real function that can return with jr $ra. So it has to be a function that you jal main to from your _start entry point.
There are two possible answers here.
The first answer is that main is the first and last function of your program. The OS will clean up afterwards.
The second answer would be for other functions that use stack memory. Stack memory is generally freed by restoring the stack frame of the calling function (which main doesn't have, hence the exception).

Simple Loop In Assembly (Branch Unconditionally)

I'm trying to create a simple loop in assembly to perform an instruction until a certain condition is met. For example, I want to implement this C code in assembly:
int compute_sum(int n)
{
int i = 2;
int sum = 0;
while(i <= n)
{
sum = sum + i;
i = i + 4;
}
return sum;
}
The outline I made for myself is this:
/ ADD (compute sum)
/ Increment to keep track of # times passed through loop
/ SNA (skip if difference between user input and number is < 0)
/ BUN xxx (repeat)
I read in the user input and have its decimal representation, but I do not know the address that should follow BUN so that the instructions are repeated. These are all done in simple computer instructions.
You might want to practice getting into the correct mindset by using C without structured conditions (ie using labels and gotos):
i = 2;
sum = 0;
loop:
if (i > n) goto finished;
sum = sum + i;
i = i + 4;
goto loop;
finished:
This is perfectly valid C (albeit archaic) but shows what you need to do at the simplest level. Compare i with n and branch to finished if greater; otherwise do the work and branch unconditionally back to the loop label.
If the assembler language you are using does not have an unconditional branch then you can set the flag and branch (eg SEC, BCS loop) or count on the idea that i will not overflow and when you add 4, branch on no overflow - just make sure it doesn't fail catastrophically if this is not the case.
So, in assembler (which shares the label syntax), you would have:
loop:
cmp i, n ; Or register equivalents
bgt finished
....
add i, 4
bvc loop
finished:
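For reference, the label-and-goto form can be wrapped into a complete function so it can be compiled and checked before being translated by hand. This is only a sketch: the declarations, the return value, and the overflow guard are additions for the illustration, with the guard making explicit the "i will not overflow" assumption mentioned above.

```c
#include <assert.h>
#include <limits.h>

static int compute_sum(int n) {
    int i = 2;
    int sum = 0;
    assert(n <= INT_MAX - 4);       /* keeps i = i + 4 from overflowing */
loop:
    if (i > n) goto finished;       /* conditional branch out of the loop */
    sum = sum + i;
    i = i + 4;
    goto loop;                      /* unconditional branch back (BUN loop) */
finished:
    return sum;
}
```

For n = 10 the loop adds 2, 6 and 10; for n = 1 the body never runs. In the assembly version, the address after BUN is simply the address (or label) of the instruction that performs the comparison.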

Resources