Jump to Address after Branching in For Loop in MIPS - c

I am trying to write a program that checks whether each of the low 16 bits of an integer is a one or a zero. I chose to implement this by shifting right one bit at a time, 16 times in total, and checking whether the lowest bit at each step is zero or non-zero. If the lowest bit is a 1, I increment a counter.
I made some code in C that represents a non-user input version of my code.
int j = 100;
int checker = 0;
int count = 0;
for (int i = 0; i < 16; i++) {
    checker = j & 0x1;
    if (checker > 0)
        count++;
    j = (j >> 1);
}
My code in MIPS:
.data
userprompt: .asciiz "Enter positive integer: "
newline: .asciiz "\n"
.text
.globl main
main:
li $v0, 4 # System call: Display string
la $a0, userprompt # Load string userprompt for output
syscall
li $v0, 5 # System call: Read integer
syscall
move $s0, $v0 # Store integer from v0 to s0
move $s1, $s0 # s1 = s0
li $t0, 0 # t0 = 0
jal chk_zeros # Run function: chk_zeroes
li $v0, 1 # System call: Print integer
move $a0, $t2 # Store integer from t2 to a0
syscall
li $v0, 10 # System call: quit
syscall
chk_zeros:
bgt $t0, 15, exitchk # Exit once t0 > 15
addi $t0, $t0, 1 # Add one to t0
andi $t1, $s1, 0x1 # Check if lowest bit is non-zero, store in t1
bgtz $t1, chk_zerosadd # If t1 > 0
j chk_zeros
chk_zerosadd:
addi $t2, $t2, 1 # Add one to t2
jr $ra # Return to after the if statement (does not work!)
exitchk:
jr $ra
What I am having trouble with is making chk_zerosadd return to the instruction after the branch. The jr $ra in chk_zerosadd seems to return me to my main function instead.

bgtz doesn't place the next PC address into the return address register, so jr $ra won't return to the instruction after the branch. $ra still holds the address stored by jal chk_zeros, which is why you end up back in main. MIPS does have branch-and-link instructions (bltzal and bgezal), but the classic instruction set has no greater-than-zero variant, and linking here would overwrite the $ra you need to get back to main. The simpler fix is to rearrange your code so that you branch over the add, instead of branching to it, like this:
andi $t1, $s1, 0x1 # Check if first bit is non-zero, store in t1
beq $t1, $zero, chk_zerosskipadd # Jump if $t1 is zero
addi $t2, $t2, 1 # Add one to t2
chk_zerosskipadd:
# continue execution...
srl $s1, $s1, 1 # j = (j >> 1);
j chk_zeros
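As a rough illustration, the same control flow can be sketched in C with a goto that skips the increment, mirroring the beq over the addi. The function name count_low16_ones and the label skip_add are made up for this sketch and do not appear in the question.

```c
#include <stdint.h>

/* Counts the set bits among the low 16 bits of j.
 * Hypothetical helper; the goto mirrors "branch over the add". */
int count_low16_ones(uint32_t j)
{
    int count = 0;
    for (int i = 0; i < 16; i++) {
        uint32_t checker = j & 0x1;   /* andi $t1, $s1, 0x1 */
        if (checker == 0)             /* beq  $t1, $zero, skip_add */
            goto skip_add;
        count++;                      /* addi $t2, $t2, 1 */
skip_add:
        j >>= 1;                      /* srl  $s1, $s1, 1 */
    }
    return count;
}
```

Because the increment is skipped rather than called, nothing ever needs to "return" to the loop, and $ra stays untouched until the final jr $ra.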

Related

Square Root MIPS assembly

Hi, I need help coding in MIPS. I have to find the floor of the square root of the input n. I have the C version of the code:
uint32_t isqrt(uint32_t n) {
if(n<2) return n;
uint32_t s = isqrt(n >> 2) << 1;
uint32_t l = s + 1;
if (l * l > n)
return s;
else
return l;
}
Apparently it calls itself recursively.
Right now my function looks something like this. I'm not really sure what I am doing wrong:
#isqrt
isqrt:
#prologue
subu $sp, $sp, 12
sw $ra, 8($sp)
sw $s0, 4($sp)
sw $s1, 0($sp)
#if(n<2) n Branch if Greater Than 2
blt $a0, 2, lt2 #if(n<2) return n;
#else uint32_t small = isqrt(n >> 2) << 1;
srl $s0, $a0, 2 # small = isqrt(n >> 2)
jal isqrt
sll $s0, $s0, 1 # then << 1
add $s1, $s0, 1 # large = small + 1
li $s3, 0
mul $s3, $s1, $s1 # large = large * large
#if large * large > n return small else return large
bgt $s3, $s0, small # if l * l > n return small
move $v0, $t1 # else return large
lt2:
move $v0, $a0;
j end
small:
move $v0, $s0
j end
end:
lw $ra, 8($sp)
lw $s0, 4($sp)
lw $s1, 0($sp)
addi $sp, $sp, 12
jr $ra
Good effort, but there are many, many errors. I suggest trying a more methodical approach. Check your choice of registers, your register numbers, and your prologue/epilogue code for proper handling of the various register kinds.
That code is doing if (n>=2) return n; — the exact opposite of the C code.
The called function isqrt expects a formal parameter in $a0, but that code is passing the actual argument in $t0, so there's a mismatch.
The called function isqrt provides the return value in $v0, but when calling that recursively, expects it to have been provided in $t0, another mismatch.
The function uses $s0, but fails to ensure its value is call-preserved as it should have been.
The following code:
li $t0, 2 # small = isqrt(n >> 2) << 1
srl $t0, $t0, 2
generates 2>>2, which is clearly different from the n>>2 in the C code.
The following code:
equal:
move $v0, $t1
j end
is doing return l*l; where the C code wants return l;
I'd suggest starting with an analysis of what registers to use for what variables — this analysis requires examining liveness across the (recursive) function call.  Start that analysis on the C version initially.  When you're finished with that, you'll have a mapping of register numbers for all the C variables.  Then write prologue/epilogue that handle those registers as required.  Then translate the C code line by line into assembly paying careful attention to what variables are in what registers, and what expressions you're trying to recreate in assembly.
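One way to start that analysis is to annotate the C version directly. The sketch below is the question's algorithm with comments recording which values are live across the recursive call. The register names in the comments are one possible assignment, chosen here as an assumption, and not the only valid mapping.

```c
#include <stdint.h>

/* Floor square root, same algorithm as the question's C code.
 * Register names in the comments are an assumed mapping. */
uint32_t isqrt(uint32_t n)            /* n arrives in $a0 */
{
    if (n < 2)
        return n;                     /* base case: no call, nothing to save */

    /* n is needed again after the recursive call (for the comparison
     * below), so it is live across the call: keep it in a call-preserved
     * register such as $s0, saved and restored in prologue/epilogue. */
    uint32_t s = isqrt(n >> 2) << 1;  /* argument n>>2 in $a0, result in $v0 */

    /* s and l are created after the call, so temporaries would do. */
    uint32_t l = s + 1;
    if (l * l > n)
        return s;
    else
        return l;                     /* return l, not l*l */
}
```

With this mapping only $ra and one saved register need stack slots, which keeps the prologue and epilogue short.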

Converting a C function to MIPS assembly

I'm currently learning MIPS Assembly and I am attempting to convert the following C function into MIPS Assembly:
int count (int a[], int n, int x)
{
int res = 0;
int i = 0;
int j = 0;
int loc[];
for(i = 0; i != n; i++)
if(a[i] == x)
{
res = res + 1;
loc [j] = i;
j = j + 1;
}
return res, loc;
}
I've succeeded in converting most of it, and I believe I have successfully returned res (a value of 1), though I'm uncertain about returning loc (I also get a value of 1, and I don't think that's correct). However, I am having difficulty with this program and I am unsure as to how I can ensure that loc is returning the correct value or how to even code it to do so.
Here is my Assembly code:
.data
a: .word 5,6,7,8,9,10
n: .word
x: .word
res: .word 0
i: .word 0
jj: .word 0
loc: .space 40
.text
main:
la $s0, a
lw $s1, res
lw $s2, x
lw $t0, i
lw $t1, jj
lw $t2, n
la $s3, loc
li $t4, 6
start:
sll $t3, $t0, 2
add $t5, $t3, $s0
lw $t4, 0($t5)
beq $t0, $t4, start
addi $t0, $t0, 1
beq $t0, $t2, exit
addi $s1, $s1, 1
sll $t7, $t1, 2
add $t6, $s3, $t7
sw $t0, 0($t6)
addi $t1, $t1, 1
exit:
li $v0, 1
add $a0, $s1, $zero
syscall
li $v0, 1
add $a1, $s3, $zero
syscall
Any help, pointers, or suggestions would be very much appreciated.
EDIT: I've revised my code and now receive 0 for the res return and "268501028" for loc. Not sure where this number is coming from.
.data
a: .word 5,6,7,8,9,10
n: .word #n
x: .word #x
res: .word 0
i: .word 0
jj: .word 0
loc: .space 40
.text
main:
la $s0, a
lw $s1, res
lw $s2, x
lw $t0, i
lw $t1, jj
lw $t2, n
la $s3, loc
li $t4, 6
start:
beq $t0, $t2, exit #for(i = 0; i != n; i++)
bne $s0, $s2, else #if(a[i] == x)
j start
else:
addi $s1, $s1, 1 #res = res + 1;
sw $t0, ($t1) #loc [j] = i;
addi $t1, $t1, 1 #j = j+1
addi $t0, $t0, 1 #Increment i
addi $s3, $s3, 4 #Setting next element for loc
addi $s0, $s0, 4 #Setting next element for a
j start
exit:
li $v0, 1
move $a0, $s1
syscall
li $v0, 1
move $a0, $s3
syscall
Okay, there were a few bugs. I've annotated the source and added "BUG:" to highlight them. I then created a cleaned up and corrected version.
Here's your original code--no bug fixes, just annotations [please pardon the gratuitous style cleanup]:
# int
# count(int a[], int n, int x)
# {
# int res = 0;
# int i = 0;
# int j = 0;
# int loc[n];
#
# for (i = 0; i != n; i++) {
# if (a[i] == x) {
# res = res + 1;
# loc[j] = i;
# j = j + 1;
# }
# }
#
# return res, loc;
# }
.data
a: .word 5,6,7,8,9,10
n: .word
x: .word
res: .word 0
i: .word 0
jj: .word 0
loc: .space 40
nl: .asciiz "\n"
.text
.globl main
main:
la $s0,a
lw $s1,res
lw $s2,x
lw $t0,i
lw $t1,jj
lw $t2,n
la $s3,loc
li $t4,6 # BUG: extraneous (gets trashed below)
start:
sll $t3,$t0,2 # get i << 2
add $t5,$t3,$s0 # get &a[i]
lw $t4,0($t5) # fetch it
# BUG: we're comparing a[i] against i but we want to compare against x
# _and_ we want to flip the sense of the branch
beq $t0,$t4,start # is it a match? if yes, loop
addi $t0,$t0,1 # increment i
beq $t0,$t2,exit # i == n? if no, loop. if yes, exit
# BUG: i was already incremented above, so the value stored below is i+1
addi $s1,$s1,1 # res += 1
sll $t7,$t1,2 # get jj << 2 (byte offset)
add $t6,$s3,$t7 # get &loc[jj]
sw $t0,0($t6) # set it to i
addi $t1,$t1,1 # jj += 1
# BUG: we should loop here and _not_ fall through
exit:
# print res (with newline)
li $v0,1
add $a0,$s1,$zero
syscall
li $v0,4
la $a0,nl
syscall
# print _address_ of loc[0]
# BUG: if we care to print anything, we should print the _values_ of the
# whole array
li $v0,1
# BUG: this should be a0 and _not_ a1
###add $a1,$s3,$zero
add $a0,$s3,$zero
syscall
li $v0,4
la $a0,nl
syscall
li $v0,10 # exit program
syscall
Here's the cleaned up and corrected version. I had to do a bit of restructuring and simplification to make it work, so it may seem a bit "alien" at first. However, I tried to retain your register usage where possible.
I also increased the size of the a array and added a user prompt for the x value:
# int
# count(int a[], int n, int x)
# {
# int i = 0;
# int j = 0;
# int loc[n];
#
# for (i = 0; i != n; i++) {
# if (a[i] == x) {
# loc[j] = i;
# j += 1;
# }
# }
#
# return j, loc;
# }
.data
a: .word 5,6,7,8,9,10
.word 5,6,7,8,9,10
.word 5,6,7,8,9,10
.word 5,6,7,8,9,10
.word 5,6,7,8,9,10
ae:
loc: .space 1000
prompt: .asciiz "Enter x value: "
msgnl: .asciiz "\n"
msgj: .asciiz "j: "
msgloc: .asciiz "loc: "
.text
# main -- main program
#
# RETURNS [sort of as this is a main program]:
# s1 -- j value (count of elements in "loc")
# loc -- filled in indexes into "a" array of matches to x
#
# registers:
# s0 -- a (base address of "a" array)
# t2 -- n (number of elements in "a" array)
#
# s2 -- x (value to match)
# t0 -- i (current index into "a" array)
# s3 -- loc (base address of "loc" array)
# s1 -- j (current index into "loc" array)
#
# t6 -- quick temporary [reusable]
# t7 -- used in array offset/index calculations [reusable]
.globl main
main:
# prompt for x value
li $v0,4 # syscall: print string
la $a0,prompt
syscall
# read in x value
li $v0,5 # syscall: read integer
syscall
move $s2,$v0
# get address of "a" array and compute length
la $s0,a # get &a[0]
la $t2,ae # get address of &a[n]
sub $t2,$t2,$s0 # get number of bytes in a
srl $t2,$t2,2 # get number of words in a (i.e. n)
li $t0,0 # i = 0
li $s1,0 # j = 0
la $s3,loc # base address of loc array
# main matching loop
loop:
sll $t7,$t0,2 # get i << 2
add $t7,$t7,$s0 # get &a[i]
lw $t6,0($t7) # fetch from it
bne $t6,$s2,next # a[i] == x? if no, advance to next element
# add new "i" value to loc array
sll $t7,$s1,2 # get j << 2
add $t7,$s3,$t7 # get &loc[j]
sw $t0,0($t7) # store i into loc
addi $s1,$s1,1 # j += 1
next:
addi $t0,$t0,1 # i += 1
blt $t0,$t2,loop # i < n? if yes, loop (or, we're done)
# done with calculation/fill loop
done:
la $s6,msgj # get prefix string
move $s7,$s1 # get j
jal prtnum # pretty print the number
blez $s1,exit # bug out if _no_ values in loc
# prepare to print all values of loc
la $t6,loc # base address of "loc"
li $t7,0 # initial index
# loop and print all values of loc
prtlocloop:
la $s6,msgloc # prefix string
lw $s7,($t6) # get loc[...]
jal prtnum # pretty print the number
add $t6,$t6,4 # increment address
add $t7,$t7,1 # increment index
blt $t7,$s1,prtlocloop # done? if no, loop
exit:
li $v0,10 # exit program
syscall
# prtnum -- print a number with a prefix string on a single line
#
# arguments:
# s6 -- prefix string
# s7 -- value to print
#
# registers:
# v0 -- syscall number [trashed]
# a0 -- syscall argument [trashed]
prtnum:
li $v0,4 # syscall: print string
move $a0,$s6 # string to print
syscall
li $v0,1 # syscall: print integer
move $a0,$s7 # value to print
syscall
li $v0,4 # syscall: print string
la $a0,msgnl
syscall
jr $ra # return
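For comparison, the pseudo-C in the header comment can be written as valid C. C cannot return two values, so this sketch passes loc as an output array and returns the count. The name count_matches and the use of size_t are choices made for this illustration, not part of the original code.

```c
#include <stddef.h>

/* Stores the index of every element of a[0..n-1] equal to x into loc[]
 * and returns how many were stored. loc must have room for n entries.
 * Hypothetical name; loc is an out-array because C returns one value. */
size_t count_matches(const int a[], size_t n, int x, size_t loc[])
{
    size_t j = 0;
    for (size_t i = 0; i != n; i++) {
        if (a[i] == x) {
            loc[j] = i;
            j += 1;
        }
    }
    return j;
}
```

This corresponds to the assembly above: the return value plays the role of $s1, and the caller-provided loc array plays the role of the loc buffer in .data.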
UPDATE:
What exactly is the difference between print and prtnum?
print is the label for the top of the loop that prints the values in loc. prtnum is a subroutine/function that does the printing of a single number.
I added prtnum to demonstrate the use of a function and to avoid needless replication of some code.
Can they not be properly merged?
Sure, with some caveats. I did a slight/cosmetic edit to try to make things clearer. In particular, I renamed print: to prtlocloop: to try and make its role clearer.
The syscall(1) for "print integer" just prints the integer but does not add any whitespace or newline to separate them (i.e. it's exactly like printf("%d",a0)). So, we need something.
Originally, I just had the syscall(print_integer). With that, we get one "very long" number. Then, I added syscall(4) to print a newline. This was fine except the output was a bit confusing as to which value was j and which were the loc values.
(1) So, I added the "prefix" string. So, that became three syscalls for each number.
(2) This was used in two places: To print j and to print the loc values.
Same code in two or more places. That's the standard criterion for "split out code to function" in any language. It's a design/style choice [so there is no absolute answer].
So, with (1) and (2), I moved it to the prtnum function. Actually, I wrote the prtnum function first because I already knew the structure, and added the prefix argument after the output "looked ugly" [to me].
When I first coded it, I used "j: " for j and used a " " prefix for loc. It still looked a little funky. So, I changed the prefix to "loc: " to be consistent.
Could it be inlined? Sure. But, in addition to printing the number itself, we still have to add a separator. So, we need two syscalls per number to do it.
The separator could be a space if we want to put all numbers on the same output line. Fine for short vectors. This would require a slight change to the code as it exists now and we'd have to add a final output of newline to close the line. For longer arrays [that might not fit on a single line], one per line is [probably] tidier.
We only had to print j and loc. If the problem stated that we had to print a, then j, and then loc, I would have gone the other way.
I would have changed prtlocloop into another function (e.g. prtarray), that would loop on the given array and call prtnum for each element.
The first step was getting the calculation loop correct. The second was the printing. But, sometimes, they have to be done together. (i.e.) How can you debug something that you can't see?
So, with calculation correct, you are free to recode the output printing in any way you choose. The prtnum was just my way. But, it is by no means the only way.
Beyond the basic mechanics of working with the asm instructions, the choices are just like in any other language [notably C]. Comment well, choose the simplest and most effective way to architect/split the code, use descriptive variable names, etc. Comments should show "intent", the "what/why". The asm instructions are the "how".
Side note: Some OPs have had serious difficulty understanding how sll [which you already understand] works. They just didn't "get" the fact that a left shift by 2 was like a multiply by 4 and converts an index value into byte/address offset. So, you may already be ahead of the game ...
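To make the shift-as-multiply point concrete, here is a small C sketch that computes the byte address by hand, the way the sll/add/lw sequence does. The helper name load_word is made up, and the code assumes 4-byte elements (int32_t).

```c
#include <stdint.h>

/* Returns a[i] by computing the byte address manually.
 * Assumes 4-byte elements; load_word is a hypothetical name. */
int32_t load_word(const int32_t *a, uint32_t i)
{
    uint32_t offset = i << 2;                 /* sll: index * 4 = byte offset */
    const unsigned char *addr =
        (const unsigned char *)a + offset;    /* add: base + byte offset */
    return *(const int32_t *)addr;            /* lw: fetch the word */
}
```

In ordinary C the compiler does this scaling for you when you write a[i]; in assembly it has to be spelled out.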
Yesterday, I gave an answer for a mips question where I went the other way and recommended inlining two functions. The problem was to calculate sin(x) using a Taylor series expansion [summation of terms] of the form: (-1)**n * x**(2n+1)/factorial(2n+1).
With inlining, it was possible to reuse partial results from the previous term in the series without having to recalculate each term from scratch. This would not have been [conveniently] possible with multiple functions.
I didn't write the mips code, but I wrote the C/pseudo-code (see "mips program to calculate sin(x)"). The resulting mips code would [probably] have been simpler and would definitely run faster.
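A minimal sketch of the reuse idea, assuming the standard series sin(x) = x - x**3/3! + x**5/5! - ...: each term is derived from the previous one, so no separate power or factorial function is needed. The name taylor_sin and the terms parameter are inventions for this example and are not taken from the linked answer.

```c
/* Approximates sin(x) with the first `terms` terms of its Taylor series.
 * Each term is computed from the previous one instead of recomputing
 * the power and the factorial from scratch. Illustrative sketch only. */
double taylor_sin(double x, int terms)
{
    double term = x;      /* n = 0 term: x**1 / 1! */
    double sum = 0.0;
    for (int n = 0; n < terms; n++) {
        sum += term;
        /* next term = -term * x*x / ((2n+2) * (2n+3)) */
        term = -term * x * x / ((2.0 * n + 2.0) * (2.0 * n + 3.0));
    }
    return sum;
}
```

Split into separate pow() and factorial() functions, every term would repeat work that the previous term already did.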

C language to MIPS. Fibonacci Number

I was trying to convert this piece of code into MIPS instructions. Let's say that a is in $a0, b is in $a1, n is in $a2, and the result is in $v0. To end the program, call "jr $ra" to return to the subroutine caller.
int fib_iter(int a, int b, int n) {
    if (n == 0)
        return b;
    else
        return fib_iter(a+b, a, n-1);
}
For simplicity, we just ignore the stack frame for this one.
And this is the MIPS code I converted:
bne $a1, $zero, ELISEIF // if b != 0 go to ELSEIF
lw $v0, $0($a1) // load b to result if n == 0
j DONE // done
ELSEIF:
lw $at, $0($a0) // temp = a
add $a0, $a0, $a1 // a = a + b
add $a1, $zero, $zero // clear b
lw $a1, $0($at) // b = a
sub $a2, $a2, $1 // n = n - 1
jr $ra // call the subroutine caller
Done:
what to put??
Please point out my errors(there might be a a lot since I am new to this)
Thanks for your time for helping me and I appreciate that
lw $v0, $0($a1) will do $v0 = $a1[0] (a load from memory) instead of $v0 = $a1. To do the latter, use move $v0, $a1.
Also, $at is reserved for the assembler, which uses it when expanding pseudoinstructions. That means a pseudoinstruction may silently overwrite it, so do not use it unless you are sure that you have not used any pseudoinstruction. $t0 to $t9 are temporary registers; use any one of them.
Here is the corrected code:
FIB:
bne $a2, $zero, ELSE # if n != 0 go to ELSE
move $v0, $a1 # n == 0: result = b
jr $ra # end of recursion, return to the caller
ELSE:
move $t0, $a0 # temp = a
add $a0, $a0, $a1 # a = a + b
move $a1, $t0 # b = old a
addi $a2, $a2, -1 # n = n - 1
j FIB # tail call: jump back to FIB instead of jal
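Jumping back to FIB with updated argument registers turns the tail call into a loop, which is why no stack frame is needed. The C sketch below shows both forms side by side; fib_loop is a hypothetical name introduced only for this comparison.

```c
/* Tail-recursive form, as in the question. */
int fib_iter(int a, int b, int n)
{
    if (n == 0)
        return b;
    else
        return fib_iter(a + b, a, n - 1);
}

/* Same computation with the tail call turned into a loop, which is what
 * "update $a0..$a2, then j FIB" does. Hypothetical name fib_loop. */
int fib_loop(int a, int b, int n)
{
    while (n != 0) {
        int t = a;    /* temp = a */
        a = a + b;    /* a = a + b */
        b = t;        /* b = old a */
        n = n - 1;    /* n = n - 1 */
    }
    return b;
}
```

Calling either function as fib_iter(1, 0, n) or fib_loop(1, 0, n) yields the n-th Fibonacci number.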

C Programming to MIPS Assembly (for Loops)

I'm trying to convert this C code to MIPS assembly and I am unsure if it is correct. Can someone help me, please?
Question : Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D
C Code :
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
My Attempt at MIPS ASSEMBLY
add $t0, $t0, $zero # i = 0
add $t1, $t1, $zero # j = 0
L1 : slt $t2, $t0, $s0 # i<a
beq $t2, $zero, EXIT # if $t2 == 0, Exit
add $t1, $zero, $zero # j=0
addi $t0, $t0, 1 # i ++
L2 : slt $t3, $t1, $s1 # j<b
beq $t3, $zero, L1, # if $t3 == 0, goto L1
add $t4, $t0, $t1 # $t4 = i+j
muli $t5, $t1, 4 # $t5 = $t1 * 4
sll $t5, $t5, 2 # $t5 << 2
add $t5, $t5, $s2 # D + $t5
sw $t4, $t5($s2) # store word $t4 in addr $t5(D)
addi $t0, $t1, 1 # j ++
j L2 # goto L2
EXIT :
add $t0, $t0, $zero # i = 0
Nope, that leaves $t0 unmodified, holding whatever garbage it did before. Perhaps you meant to use addi $t0, $zero, 0?
Also, MIPS doesn't have 2-register addressing modes (for integer load/store), only 16-bit-constant ($reg). $t5($s2) isn't legal. You need a separate addu instruction, or better a pointer-increment.
(You should use addu instead of add for pointer math; it's not an error if address calculation crosses from the low half to high half of address space.)
In C, it's undefined behaviour for another thread to be reading an object while you're writing it, so we can optimize away the actual looping of the outer loop. Unless the type of D is _Atomic int *D or volatile int *D, but that isn't specified in the question.
The inner loop writes the same elements every time regardless of the outer loop counter, so we can optimize away the outer loop and only do the final outer iteration, with i = a-1. Unless a <= 0, then we must skip the outer loop body, i.e. do nothing.
Optimizing away all but the last store to every location is called "dead store elimination". The stores in earlier outer-loop iterations are "dead" because they're overwritten with nothing reading their value.
You normally want to put the loop condition at the bottom of the loop, so the loop branch is a bne $t0, $t1, top_of_loop for example. (MIPS has bne as a native hardware instruction; blt is only a pseudo-instruction unless the 2nd register is $zero.) So we want to optimize j<b to j!=b because we know we're counting upward.
Put a conditional branch before the loop to check if it might need to run zero times. e.g. blez $s1, after_loop to skip the inner loop body if b <= 0.
An idiomatic for(i=0 ; i<a ; i++) loop in asm looks like this in C (or some variation on this).
if(a<=0) goto end_of_loop;
int i=0;
do{ ... }while(++i != a);
Or if i isn't used inside the loop, then i=a and do{}while(--i). (i.e. add -1 and use bnez). Although MIPS can branch just as efficiently on i!=a as it can on i!=0, unlike most architectures with a FLAGS register where counting down saves a compare instruction.
D[4*j] means we stride by 16 bytes in a word array. Separately using a multiply by 4 and a shift by 2 is crazy redundant. Just keep a pointer in a separate register and increment it by 16 every iteration, like a C compiler would.
We don't know the type of D, or any of the other variables for that matter. If any of them are narrow unsigned integers, we might need to implement 8 or 16-bit truncation/wrapping.
But your implementation assumes they're all int or unsigned, so let's do that.
I'm assuming a MIPS without branch-delay slots, like MARS simulates by default.
i+j starts out (with j=0) as a-1 on the last outer-loop iteration that sets the final value. It runs up to j=b-1, so the max value is a-1 + b-1.
Simplifying the problem down to the values we need to store, and the locations we need to store them in, before writing any asm, means the asm we do write is a lot simpler and easier to debug.
You could check the validity of most of these transformations by doing them in C source and checking with a unit test in C.
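As a sketch of such a check, the two functions below implement the loops as written and the simplified single loop, so a test can compare the arrays they produce. The names fill_naive and fill_optimized are made up here, and the code assumes D is an int array with at least 4*b elements so the pointer never steps past one-past-the-end.

```c
/* Reference version: the loops exactly as written in the question. */
void fill_naive(int *D, int a, int b)
{
    for (int i = 0; i < a; i++)
        for (int j = 0; j < b; j++)
            D[4 * j] = i + j;
}

/* Transformed version: only the last outer iteration (i = a-1) survives,
 * and a pointer strides by 4 elements instead of indexing with 4*j.
 * Assumes D has at least 4*b elements. */
void fill_optimized(int *D, int a, int b)
{
    if (a <= 0 || b <= 0)
        return;
    int tmp = a - 1;          /* i + j with i = a-1, j = 0 */
    int tmp_end = a - 1 + b;  /* one past the largest value stored */
    int *p = D;
    do {
        *p = tmp;
        tmp++;
        p += 4;               /* 4 ints = 16 bytes when int is 4 bytes */
    } while (tmp != tmp_end);
}
```

Filling two arrays with a sentinel value, running one function on each, and comparing them element by element over a range of a and b (including zero and negative values) exercises the early-out paths as well as the main loop.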
# int a: $s0
# int b: $s1
# int *D: $s2
# Pointer to D[4*j] : $t0
# int i+j : $t1
# int a-1 + b : $t2 loop bound
blez $s0, EXIT # if(a<=0) goto EXIT
blez $s1, EXIT # if(b<=0) goto EXIT
# now we know both a and b loops run at least once so there's work to do
addiu $t1, $s0, -1 # tmp = a-1 // addu because the C source doesn't do this operation, so we must not fault on signed overflow here. Although that's impossible because we already excluded negatives
addu $t2, $t1, $s1 # tmp_end = a-1 + b // one past the max we store
addu $t0, $s2, $zero # p = D // to avoid destroying the D pointer? Otherwise increment it.
inner: # do {
sw $t1, ($t0) # *p = tmp (the i+j value)
addiu $t1, $t1, 1 # tmp++;
addiu $t0, $t0, 16 # p += 4, i.e. 4*sizeof(*D) bytes # could go in the branch-delay slot
bne $t1, $t2, inner # }while(tmp != tmp_end)
EXIT:
We could have done the increment first, before the store, and used a-2 and a+b-2 as the initializer for tmp and tmp_end. On some real pipelined/superscalar MIPS CPUs, that might be better to avoid putting the increment right before the bne that reads it. (After moving the pointer-increment into the branch-delay slot). Of course you'd actually unroll to save work, e.g. using sw $t1, 16($t0) and 32($t0) / 48($t0).
Again on a real MIPS with branch delays, you'd move some of the init of $t0..2 to fill the branch delay slots from the early-out blez instructions, because they couldn't be adjacent.
So as you can see, your version was over-complicated to say the least. Nothing in the question said we have to transliterate each C expression to asm separately, and the whole point of C is the "as-if" rule that allows optimizations like this.
This similar C code compiles and translates to MIPS:
#include <stdio.h>
main()
{
int a,b,i,j=5;
int D[50];
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
}
Result:
.file 1 "Ccode.c"
# -G value = 8, Cpu = 3000, ISA = 1
# GNU C version cygnus-2.7.2-970404 (mips-mips-ecoff) compiled by GNU C version cygnus-2.7.2-970404.
# options passed: -msoft-float
# options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
# -fpcc-struct-return -fcommon -fverbose-asm -fgnu-linker -msoft-float
# -meb -mcpu=3000
gcc2_compiled.:
__gnu_compiled_c:
.text
.align 2
.globl main
.ent main
main:
.frame $fp,240,$31 # vars= 216, regs= 2/0, args= 16, extra= 0
.mask 0xc0000000,-4
.fmask 0x00000000,0
subu $sp,$sp,240
sw $31,236($sp)
sw $fp,232($sp)
move $fp,$sp
jal __main
li $2,5 # 0x00000005
sw $2,28($fp)
sw $0,24($fp)
$L2:
lw $2,24($fp)
lw $3,16($fp)
slt $2,$2,$3
bne $2,$0,$L5
j $L3
$L5:
.set noreorder
nop
.set reorder
sw $0,28($fp)
$L6:
lw $2,28($fp)
lw $3,20($fp)
slt $2,$2,$3
bne $2,$0,$L9
j $L4
$L9:
lw $2,28($fp)
move $3,$2
sll $2,$3,4
addu $4,$fp,16
addu $3,$2,$4
addu $2,$3,16
lw $3,24($fp)
lw $4,28($fp)
addu $3,$3,$4
sw $3,0($2)
$L8:
lw $2,28($fp)
addu $3,$2,1
sw $3,28($fp)
j $L6
$L7:
$L4:
lw $2,24($fp)
addu $3,$2,1
sw $3,24($fp)
j $L2
$L3:
$L1:
move $sp,$fp # sp not trusted here
lw $31,236($sp)
lw $fp,232($sp)
addu $sp,$sp,240
j $31
.end main

MIPS recursion call in loop, preserving loop variable

I'm converting the following recursive java program to MIPS asm. The algorithm computes all the possible ordering/permutations of the numbers. But the recursive call is in the for loop. I need to preserve the variable 'i' in my MIPS version but I don't know exactly where to add that. My algorithm is correct, it's just that my $t0 (which is 'i') never gets reset to 0. I just can't figure out how/where to preserve it on the stack or when to take it off the stack. Any help appreciated.
import java.util.Arrays;
public class Test
{
private static void swap(int[] v, int i, int j)
{
int t = v[i];
v[i] = v[j];
v[j] = t;
}
public void permute(int[] v, int n)
{
if (n == 1)
System.out.println(Arrays.toString(v));
else
{
for (int i = 0; i < n; i++)
{
permute(v, n-1);
if (n % 2 == 1)
swap(v, 0, n-1);
else
swap(v, i, n-1);
}
}
}
public static void main(String[] args)
{
int[] ns = {1, 2, 3, 4};
new Test().permute(ns, ns.length);
}
}
And the MIPS function:
Note: I am permuting strings, not integers, but the algorithm is the same.
#----------------------------------------------
# anagram - Prints all the permutations of
# the given word
# a0 - the word to compute the anagrams
# s0 - n, the length of the word
# a1 - n - 1 (length-1)
#----------------------------------------------
anagram:
addi $sp, $sp, -16
sw $a0, 0($sp)
sw $a1, 4($sp)
sw $s0, 8($sp)
sw $ra, 12($sp)
add $s0, $a1, $zero # this is n
addi $a1, $s0, -1 # n-1
beq $s0, 1, printS
init: move $t0, $zero # t0 = i = 0
logic: slt $t1, $t0, $s0 # Set t1 = 1 if t0 < length
beqz $t1, endAnagram # if it's zero, it's the end of the loop
jal anagram
li $t2, 2
div $s0, $t2
mfhi $t3
beqz $t3, even # if even branch to even, otherwise it will go to odd
odd: # swap the n-1 char with the first
add $t4, $a0, $zero
add $t5, $a0, $a1
lb $t6, 0($t4) # first char
lb $t7, 0($t5) # n-1 char
sb $t7, 0($t4) # swap the two
sb $t6, 0($t5)
j inc # skip the even section
even: # swap the ith char with n-1 char
add $t4, $a0, $t0 # ith char
add $t5, $a0, $a1 # n-1 char
lb $t6, 0($t4) # ith char
lb $t7, 0($t5) # n-1 char
sb $t7, 0($t4) # swap the two
sb $t6, 0($t5)
inc: addi $t0, $t0, 1 # t0++;
j logic
endAnagram:
# reset stack pointers
lw $a0, 0($sp)
lw $a1, 4($sp)
lw $s0, 8($sp)
lw $ra, 12($sp)
addi $sp, $sp, 16 # adjust stack
jr $ra
printS: # print string and jump to return
jal printString # calls printString function which prints the string
j endAnagram
$t0 is not preserved across subroutine calls according to convention, and you seem to follow that convention. As such, you have two choices:
1. Store i in a register that is preserved across calls, in which case you need to save and restore that register yourself in the prologue/epilogue. You already do this for $s0.
2. Save $t0 yourself on the stack around the subroutine call.
In both cases, you will need additional space for your locals, so change addi $sp, $sp, -16 to addi $sp, $sp, -20 (along with the matching code in the epilogue too). If you choose option #1, use for example $s1 to store i. Add code to save and restore $s1 just like you do for $s0. If you choose option #2, add code around the jal anagram that writes $t0 to stack before the jal, and reloads it after.
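For comparison, here is a rough C translation of the Java permute(). In C the local i lives in each call's own stack frame, which is exactly what the extra stack slot provides in the MIPS version. The len parameter and the returned count are additions for this sketch and are not part of the original program.

```c
#include <stdio.h>

static void swap(int v[], int i, int j)
{
    int t = v[i];
    v[i] = v[j];
    v[j] = t;
}

/* C translation of the Java permute(). Returns how many orderings it
 * printed. The local i belongs to this call's frame, so the recursive
 * call cannot disturb it; in MIPS that is what keeping i in a saved $s
 * register (or spilling $t0 around the jal) achieves. */
int permute(int v[], int len, int n)
{
    if (n == 1) {
        for (int k = 0; k < len; k++)
            printf("%d ", v[k]);
        printf("\n");
        return 1;
    }
    int printed = 0;
    for (int i = 0; i < n; i++) {
        printed += permute(v, len, n - 1);   /* i must survive this call */
        if (n % 2 == 1)
            swap(v, 0, n - 1);
        else
            swap(v, i, n - 1);
    }
    return printed;
}
```

If i were a single shared variable instead (the equivalent of an unsaved $t0), every nested call would overwrite the outer loop's counter, which matches the symptom described in the question.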

Resources