MIPS assembly: Why can main exit without deallocating its stack space?

I have a question about a university exercise that I don't understand. We have to translate from C to MIPS assembly. In main I have to allocate 400 bytes for the a[100] array, but in the solution my professor does not deallocate it at the end of the function. Why is that? Are there cases in which I don't need to deallocate memory by moving the stack pointer back?
Here's the code in C:
int idamax(int n, float * dx, int incx) {
float dmax;
int i, ix, itemp;
if (n < 1) return (-1);
if (n == 1) return (0);
if (incx != 1) {
ix = 1;
dmax = fabs(dx[0]);
ix = ix + incx;
for (i = 1; i < n; i++) {
if (dmax < fabs(dx[ix])) {
itemp = i;
dmax = fabs(dx[ix]);
}
ix = ix + incx;
}
} else {
itemp = 0;
dmax = fabs(dx[0]);
for (i = 1; i < n; i++) {
if (dmax < fabs(dx[i])) {
itemp = i;
dmax = fabs(dx[i]);
}
}
}
return (itemp);
}
int main() {
float a[100];
int l, k, n = 100, lda = 10;
for (k = 0; k < n; ++k) a[k] = (float)((k * k * k) % 100);
k = 4;
l = idamax(n - lda * k - k, &a[lda * k + k], 1) + k;
print_int(l);
exit;
}
Main assembly code:
main:
#______CALL_FRAME______
# 100 float: 400B
#______Totale 400B
addi $sp,$sp,-400
add $t9,$sp,$0 #&a
addi $t0, $0, 100 #n=100
addi $t1, $0, 10 #lda=10
#l in t2, k in t3
add $t3, $0, $0 #k=0
main_forini:
slt $t5,$t3,$t0 #k<?n
beq $t5,$0,main_forend
mult $t3, $t3 #k*k
mflo $t5
mult $t3, $t5
mflo $t5 #k*k*k
div $t5,$t0 #()%n
mfhi $t5
mtc1 $t5,$f0
cvt.s.w $f1,$f0 #(float)()
sll $t5,$t3,2 #k*4
add $t5,$t5,$t9 #&a[k]
swc1 $f1,0($t5) #a[k]=()
addi $t3, $t3, 1 #++k
j main_forini
main_forend:
addi $t3,$0,4 #k=4
mult $t1,$t3 #lda*k
mflo $t5
add $t5,$t5,$t3 #lda*k+k
sub $a0,$t0,$t5 #a0=n-lda*k-k
sll $t5,$t5,2
add $a1,$t5,$t9 #a1=&a[lda*k+k]
addi $a2,$0,1 #a2=1
jal idamax
addi $a0,$v0,4 #a0=l=retval+k
addi $v0,$0,1 #print_int
syscall
addi $v0,$0,10 #exit
syscall

Execution of main never reaches the bottom of the function so cleanup of the stack never needs to happen; exit() is a "noreturn" function.
If main did want to return with jr $ra instead of making an exit system call, you would need to restore the stack pointer along with other call-preserved registers. Otherwise you'd be violating the calling convention that main's caller expects main to follow.
(Updated since you added asm to the question that uses a MARS system call: that main is probably not a function if it's the top of your code: $ra isn't a valid return address on entry so it couldn't return. IMO don't call it main if it's not a function.)
The OS doesn't care where the user-space stack pointer is pointing when the process makes an exit system call, so there's no need for main to clean up before exiting.
(In a "normal" C implementation, the exit() function would compile to a jal exit or a simple tailcall j exit. But you're compiling by hand for the MARS simulator which has no C library, so you inline system calls instead of calling wrapper functions.)
Also note that ISO C exit(int) takes an arg, like MARS exit2 (syscall/$v0=17). In fact you didn't even call exit() as a function, you just wrote exit; in C which evaluates the exit as a function pointer without calling it or doing anything with that value.
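To make the point about the bare exit; statement concrete, here is a minimal sketch. The function name falls_through is made up for illustration, and the (void) cast is only there to silence the "statement with no effect" warning most compilers would give for a plain exit;.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 to show that execution continues past
   a statement that merely names exit without calling it. */
int falls_through(void)
{
    (void)exit;   /* like `exit;`: evaluates to a function pointer, value discarded */
    return 1;     /* still reached, because exit() was never called */
}
```

An actual call would be written exit(0); and would never return to the caller.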
Typically C main is called by CRT startup code that might for example run C library init functions and put argc and an argv[] pointer in the right registers. So main is usually not the actual process entry point from the OS, especially not in a hosted implementation. (i.e. compiled C programs run under an OS, rather than being their own kernel like a freestanding program.)
If you're just translating this for the MARS or SPIM simulators or something, then there is no C library or any code beyond what you write, so what you're writing is what would normally be called _start, not main.
In C main is a function, but in MARS you can't jr $ra from the top-level entry point so the entry point is not a function. Thus don't call it main.
In ISO C it's even legal for main to call itself recursively, or other functions to call main. That can only work if main truly is a function that cleans up the stack and returns properly. But that means it can't also be the process entry point that needs to make an exit system call. To run a program with a crazy recursive main that eventually does a C return statement (or falls off the end of main), main pretty much has to be compiled to a real function that can return with jr $ra. So it has to be a function that you jal main to from your _start entry point.

There are two possible answers here.
The first answer is that main is the first and last function of your program. The OS will clean up afterwards.
The second answer would be for other functions that use stack memory. Stack memory is generally freed by restoring the stack frame of the calling function (which main doesn't have, hence the exception).

Related

Sum of N natural numbers using recursion in C

#include<conio.h>
#include<math.h>
int sum(int n);
int main()
{
printf("sum is %d", sum(5));
return 0;
}
//recursive function
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n
int sumN=sumNm1+n;
}
Here I don't understand how this code works once n==1 becomes true,
How this code backtracks itself afterwards..?

The code needs a return statement in the case where n is not 1:
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n-1
int sumN=sumNm1+n;
return sumN;
}
or more simply:
int sum(int n)
{
if(n==1)
{
return 1;
}
return n + sum(n-1);
}
How this code backtracks itself afterwards..?
When a function is called, the program saves information about how to get back to the calling context. When a return statement is executed, the program uses the saved information to return to the calling context.
This is usually implemented via a hardware stack, a region of memory set aside for this purpose. There is a stack pointer that points to the active portion of the stack memory. When main calls sum(5), a return address into main is pushed onto the stack, and the stack pointer is adjusted to point to memory that is then used for the local variables in sum. When sum calls sum(n-1), which is sum(4), a return address into sum is pushed onto the stack, and the stack pointer is adjusted again. This continues for sum(3), sum(2), and sum(1). For sum(1), the function returns 1, and the saved return address is used to go back to the previous execution of sum, for sum(2), and the stack pointer is adjusted in the reverse direction. Then the returned value 1 is added to its n, and 3 is returned. The saved address is used to go back to the previous execution, and the stack pointer is again adjusted in the reverse direction. This continues until the original sum(5) is executing again. It returns 15 and uses the saved address to go back to main.
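The call-and-return sequence described above can be made visible with a small instrumented version of sum. This is only a sketch: the globals depth and max_depth and the printed trace are additions for illustration, not part of the original program.

```c
#include <stdio.h>

int depth = 0;      /* number of sum() calls currently active */
int max_depth = 0;  /* deepest nesting reached so far */

int sum(int n)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    /* indent by nesting level so the trace shows the stack growing */
    printf("%*senter sum(%d)\n", 2 * depth, "", n);

    int result = (n == 1) ? 1 : n + sum(n - 1);

    /* printed on the way back, as each saved return address is used */
    printf("%*sleave sum(%d) = %d\n", 2 * depth, "", n, result);
    depth--;
    return result;
}
```

Calling sum(5) prints five "enter" lines with increasing indentation, followed by five "leave" lines in the reverse order, which is the backtracking the question asks about.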

How this code backtracks itself afterwards..?
It is not guaranteed to work.
Any success is due to undefined behavior (UB).
The biggest mistake is not compiling with a well enabled compiler.
int sum(int n)
{
if(n==1)
{
return 1;
}
int sumNm1=sum(n-1); //sum of 1 to n
int sumN=sumNm1+n; // !! What, no warning?
} // !! What, no warning?
A well enabled compiler generates warnings something like the below.
warning: unused variable 'sumN' [-Wunused-variable]
warning: control reaches end of non-void function [-Wreturn-type]
Save time and enable all compiler warnings. You get faster feedback to code troubles than posting on SO.
int sumN=sumNm1+n;
return sumN; // Add
}

As pointed out in the comments, the problem is that you don't return the value you compute from within the function (Undefined Behavior). You calculate it correctly (but in a clumsy way, using 2 unneeded variables). If you add a return sumN; statement at the end of the function, things should be fine.
Also, the type chosen for the return value is not the best one. You should choose:
An unsigned type (as we are talking about natural numbers), otherwise half of its interval would be simply wasted (on negative values which won't be used)
One that's as large as possible (uint64_t). Note that this only allows larger values to be computed, but does not eliminate the possibility of an overflow, so you should also be careful when choosing the input type (uint32_t)
More details on recursion: [Wikipedia]: Recursion (it also contains an example very close to yours: factorial).
Example:
main00.c:
#include <stdint.h>
#include <stdio.h>
#if defined(_WIN32)
# define PC064U_FMT "%llu"
# define PC064UX_FMT "0x%016llX"
#else
# define PC064U_FMT "%lu"
# define PC064UX_FMT "0x%016lX"
#endif
uint64_t sum(uint32_t n) // Just 3 lines of code
{
if (n < 2)
return n;
return n + sum(n - 1);
}
uint64_t sum_gauss(uint32_t n)
{
if (n == (uint32_t)-1)
return (uint64_t)(n - 1) / 2 * n + n;
return n % 2 ? (uint64_t)(n + 1) / 2 * n : (uint64_t)n / 2 * (n + 1);
}
uint64_t sum_acc(uint32_t n, uint64_t acc)
{
if (n == 0)
return acc;
return sum_acc(n - 1, acc + n);
}
int main()
{
uint32_t numbers[] = { 0, 1, 2, 3, 5, 10, 254, 255, 1000, 100000, (uint32_t)-2, (uint32_t)-1 };
for (size_t i = 0; i < sizeof(numbers) / sizeof(numbers[0]); ++i) {
uint64_t res = sum_gauss(numbers[i]);
printf("\nsum_gauss(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
res = sum_acc(numbers[i], 0);
printf(" sum_acc(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
res = sum(numbers[i]);
printf(" sum(%u): "PC064U_FMT" ("PC064UX_FMT")\n", numbers[i], res, res);
}
printf("\nDone.\n\n");
return 0;
}
Notes:
I added Gauss's formula (sum_gauss) to calculate the same thing using just simple arithmetic operations (and thus is waaay faster)
Another thing about recursion: although it's a nice technique (very useful for learning), it's not so practical (because each function call eats up stack), and if a function calls itself many times, the stack will eventually run out (StackOverflow). This can be worked around with an optimization that relies on an accumulator (check [Wikipedia]: Tail call or [SO]: What is tail call optimization?). I added sum_acc to illustrate this
Didn't consider necessary to also add the iterative variant (as it would only be a simple for loop)
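For completeness, the iterative variant mentioned in the last note might look like the sketch below. The name sum_iter is an assumption (it is not in main00.c); the types match the other functions.

```c
#include <stdint.h>

/* Iterative sum of 1..n: constant stack usage, no recursion. */
uint64_t sum_iter(uint32_t n)
{
    uint64_t acc = 0;
    /* 64-bit counter so that i <= n also terminates for n == UINT32_MAX */
    for (uint64_t i = 1; i <= n; ++i)
        acc += i;
    return acc;
}
```

It should agree with sum_gauss and sum_acc for every input, but unlike sum it cannot exhaust the stack.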
Output:
(qaic-env) [cfati#cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q074798666]> ~/sopr.sh
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
[064bit prompt]> ls
main00.c vs2022
[064bit prompt]> gcc -O2 -o exe main00.c
[064bit prompt]> ./exe
sum_gauss(0): 0 (0x0000000000000000)
sum_acc(0): 0 (0x0000000000000000)
sum(0): 0 (0x0000000000000000)
sum_gauss(1): 1 (0x0000000000000001)
sum_acc(1): 1 (0x0000000000000001)
sum(1): 1 (0x0000000000000001)
sum_gauss(2): 3 (0x0000000000000003)
sum_acc(2): 3 (0x0000000000000003)
sum(2): 3 (0x0000000000000003)
sum_gauss(3): 6 (0x0000000000000006)
sum_acc(3): 6 (0x0000000000000006)
sum(3): 6 (0x0000000000000006)
sum_gauss(5): 15 (0x000000000000000F)
sum_acc(5): 15 (0x000000000000000F)
sum(5): 15 (0x000000000000000F)
sum_gauss(10): 55 (0x0000000000000037)
sum_acc(10): 55 (0x0000000000000037)
sum(10): 55 (0x0000000000000037)
sum_gauss(254): 32385 (0x0000000000007E81)
sum_acc(254): 32385 (0x0000000000007E81)
sum(254): 32385 (0x0000000000007E81)
sum_gauss(255): 32640 (0x0000000000007F80)
sum_acc(255): 32640 (0x0000000000007F80)
sum(255): 32640 (0x0000000000007F80)
sum_gauss(1000): 500500 (0x000000000007A314)
sum_acc(1000): 500500 (0x000000000007A314)
sum(1000): 500500 (0x000000000007A314)
sum_gauss(100000): 5000050000 (0x000000012A06B550)
sum_acc(100000): 5000050000 (0x000000012A06B550)
sum(100000): 5000050000 (0x000000012A06B550)
sum_gauss(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum_acc(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum(4294967294): 9223372030412324865 (0x7FFFFFFE80000001)
sum_gauss(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
sum_acc(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
sum(4294967295): 9223372034707292160 (0x7FFFFFFF80000000)
Done.
As seen in the image above, the simple implementation (sum) failed while the other 2 passed (for a certain (big) input value). Not sure though why it didn't also fail on Linux (WSL), most likely one of the optimizations (from -O2) enabled tail-end-recursion (or increased the stack?).

If I understand your question correctly, you're more interested in how recursion actually works, than in the error produced by the missing return statement (see any of the other answers).
So here's my personal guide to understanding recursive functions.
If you know about mathematical induction, this might help you understand how recursion works (at least it did for me). You prove a base case, assume the statement holds for some fixed value, and prove it for the following number. In programming we do a very similar thing.
Firstly, identify your base cases, i.e. some input to the function that you know what the output is. In your example this is
if(n==1)
{
return 1;
}
Now, we need to find a way to compute the value for any given input from "smaller" inputs; in this case sum(n) = sum(n-1) +n.
How does backtracking work after the base case has been reached?
To understand this, picture the function call sum(2).
We first find that 2 does not match our base case, so we recursively call the function with sum(2-1). You can imagine this recursive call as the function called with sum(2) halting until sum(1) has returned a result. Now sum(1) is the "active" function, and we find that it matches our base case, so we return 1. This is now returned to where sum(2) has waited for the result, and this function now can compute 2 + sum(1), because we got the result from the recursive call.
The same happens for every recursive call that is made.
Interested in a bit more low-level explanation?
In assembly (MIPS), your code would look something like this:
sum:
addi $t0, $0, 1 # store 1 in $t0
beq $a0, $t0, base # IF n == 1: GOTO base
# ELSE:
# prepare for recursive call:
addi $sp, $sp, -8 # make room for two words (the stack grows downward)
sw $a0, 0($sp) # save n on the stack
sw $ra, 4($sp) # save our return address on the stack
addi $a0, $a0, -1 # n = n-1
jal sum # call sum with reduced n
# this is executed AFTER the recursive call
lw $a0, 0($sp) # reload the original n this instance of sum got
lw $ra, 4($sp) # reload where to return to
addi $sp, $sp, 8 # reset stack pointer
add $v0, $a0, $v0 # add our n to the result of sum(n-1)
jr $ra # return to wherever sum() was called
base: # this is only executed when the base case is reached
add $v0, $0, $t0 # write 1 as the return value of the base case
jr $ra # return to caller
Every time the function is about to call itself, we temporarily store the argument the current invocation got ($a0) and its return address ($ra) on the stack. The stack is basically LIFO storage, and we access the top of it through the stack pointer $sp. The MIPS stack conventionally grows toward lower addresses, so before the recursive call we make room for two words by decrementing the stack pointer (addi $sp, $sp, -8) and then store what we need there.
When this is done, we manipulate the argument we got (the first function argument is passed in $a0 in MIPS, so we need to overwrite the argument we got). We write n-1 as the argument for our recursive call and 'jump and link' (jal) to the beginning of the function. This jumps to the provided label (the start of our function) and saves the address at which to resume in $ra, so we can return here after the function call. For every recursive call we make, the stack grows, because we store our data there, so we need to remember to reset it later on.
Once a function call gets the argument 1, the program jumps to base, where we simply write 1 into the designated return register ($v0) and jump back to the instruction we were called from.
That is the instruction right after the jal that jumped back to the beginning. Since the called function provided its result in $v0, we can simply add our argument to $v0 and return. However, we first need to recover the argument and the return address from the stack. We also add 8 back to the stack pointer, so that it is in the exact position where it was when the function was called. All recursive calls therefore work together to compute the overall result; every individual call has its own storage on the stack, and it tidies up before returning, so that all the other calls can access their respective data.
The takeaway is: When calling a function recursively, execution jumps back to the beginning of the function with altered arguments. However, every individual function call handles its own set of variables (temporarily stored on the stack). After a recursive call returns a value, the next innermost recursive call becomes active again, reloads all its variables and computes the next result.
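The save-then-restore pattern of the assembly can also be modelled in C with an explicit array used as a stack. This is a sketch under assumptions: sum_explicit_stack and MAX_DEPTH are made-up names, and the array index stands in for $sp (it only models the LIFO order, not real addresses or return addresses).

```c
#define MAX_DEPTH 64   /* assumed capacity of the model stack */

/* Computes 1 + 2 + ... + n the way the recursive code does:
   push every n on the way down, pop and add on the way back up. */
int sum_explicit_stack(int n)
{
    int stack[MAX_DEPTH];
    int sp = 0;                       /* index of the next free slot */

    if (n < 1 || n > MAX_DEPTH)
        return -1;                    /* outside what this sketch supports */

    while (n != 1) {                  /* forward part: "call" sum(n-1) */
        stack[sp++] = n;              /* save our n, like sw $a0 */
        n--;
    }

    int result = 1;                   /* base case: sum(1) == 1 */

    while (sp > 0)                    /* backward part: the calls return */
        result += stack[--sp];        /* restore n, like lw $a0, then add */

    return result;
}
```

For n = 5 the values 5, 4, 3, 2 are pushed, the base case yields 1, and popping adds 2, 3, 4, 5 in that order.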

If this program were implemented correctly, it would work like this: When n is 1, the function returns 1. When n is 2, the function calls itself for n is 1, so it gets 1, and then adds n (i.e., 2) to it. When n is 3, the function calls itself for n is 2, so it gets 3, and then adds n (i.e., 3) to it. And so on.

Why does the following loop unrolling lead to a wrong result?

I am currently trying to optimize some MIPS assembler that I've written for a program that triangulates a 24x24 matrix. My current goal is to utilize delayed branching and manual loop unrolling to try and cut down on the cycles. Note: I am using 32-bit single precision for all the matrix arithmetic.
Part of the algorithm involves the following loop that I'm trying to unroll (N will always be 24)
...
float inv = 1/A[k][k]
for (j = k + 1; j < N; j++) {
/* divide by pivot element */
A[k][j] = A[k][j] * inv;
}
...
I want
...
float inv = 1/A[k][k]
for (j = k + 1; j < N; j +=2) {
/* divide by pivot element */
A[k][j] = A[k][j] * inv;
A[k][j + 1] = A[k][j + 1] * inv;
}
...
but it generates the incorrect result and I don't know why. The interesting thing is that the version with loop unrolling generates the first row of matrix correctly but the remaining ones incorrect. The version without loop unrolling correctly triangulates the matrix.
Here is my attempt at doing it.
...
# No loop unrolling
loop_2:
move $a3, $t2 # column number b = j (getelem A[k][j])
jal getelem # Addr of A[k][j] in $v0 and val in $f0
addiu $t2, $t2, 1 ## j += 1
mul.s $f0, $f0, $f2 # Perform A[k][j] * inv
bltu $t2, 24, loop_2 # if j < N, jump to loop_2
swc1 $f0, 0($v0) ## Perform A[k][j] := A[k][j] * inv
# The matrix triangulates without problem with this original code.
...
...
# One loop unrolling
loop_2:
move $a3, $t2 # column number b = j (getelem A[k][j])
jal getelem # Addr of A[k][j] in $v0 and val in $f0
addiu $t2, $t2, 2 ## j += 2
lwc1 $f1, 4($v0) # $f1 <- A[k][j + 1]
mul.s $f0, $f0, $f2 # Perform A[k][j] * inv
mul.s $f1, $f1, $f2 # Perform A[k][j+1] * inv
swc1 $f0, 0($v0) # Perform A[k][j] := A[k][j] * inv
bltu $t2, 24, loop_2 # if j < N, jump to loop_2
swc1 $f1, 4($v0) ## Perform A[k][j + 1] := A[k][j + 1] * inv
# The first row in the resulting matrix is correct, but the remaining ones not when using this once unrolled loop code.
...

The unrolled C loop condition is buggy.
j < N; j +=2 can start the loop body with j = N-1,
accessing the array at A[k][N-1] (fine) and A[k][N] (not fine).
One common method is j < N-1, or in general j < N-(unroll-1). But for unsigned N, you also have to separately check N >= unroll before starting the loop, because N-1 could wrap to a huge unsigned value.
Keeping the j < limit form is generally good for C compilers, compared with j + 1 < N, which is a separate value they'd have to calculate. The latter can also stop a compiler from proving that the loop isn't infinite for unsigned counts (like size_t), because unsigned arithmetic is well-defined as wrapping around, so N = UINT_MAX could lead to the condition always being true depending on the starting point. (e.g. j = UINT_MAX-2 makes UINT_MAX-1 < UINT_MAX, and j+=2 makes 0 < UINT_MAX, also true.) So it's a similar problem to using j <= limit for unsigned counters. Compilers really like to know when a loop is potentially infinite. For some, auto-vectorization is disabled if the trip-count isn't calculable ahead of the first iteration.
If j was starting at 0, you can get away with a sloppy condition if N was guaranteed to be a multiple of the unroll factor. But here it's different, as Nate points out.
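A minimal C sketch of the j < N-1 condition with the unsigned guard and a cleanup step is shown below. The function name scale_row and its parameters are assumptions for illustration; row stands for A[k], start for k + 1, and n for N.

```c
#include <stddef.h>

/* Multiply row[start..n-1] by inv, two elements per iteration. */
void scale_row(float *row, size_t start, size_t n, float inv)
{
    size_t j = start;

    if (n >= 2) {                     /* guard: n - 1 would wrap for n == 0 */
        for (; j < n - 1; j += 2) {   /* both j and j + 1 are in bounds here */
            row[j]     *= inv;
            row[j + 1] *= inv;
        }
    }

    if (j < n)                        /* one leftover element if the count was odd */
        row[j] *= inv;
}
```

With start = k + 1 the number of elements alternates between odd and even from row to row, which is why the cleanup step is needed even though N itself is even.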
efficiency of your MIPS asm
Generally the point of loop unrolling is performance. A non-inline call to a helper function inside the loop is kind of defeating the purpose.
jal getelem I assume does a bunch of multiplies and stuff to redo the indexing with a pointer and two integers? Notice that you're scanning along contiguous memory in one row, so you can just increment a pointer.
Calculate an end-pointer to compare against, so your MIPS loop can look like
# some checking outside the loop, maybe with a bxx to the end of it.
looptop: # do{
lwc1 $f2, 0($t0)
lwc1 $f3, 4($t0)
addiu $t0, $t0, 4*2 # p+=2 advance by 8 bytes, 2 floats
...
swc1 something, -8($t0) # negative offsets: $t0 was already advanced past both elements
swc1 something, -4($t0)
bne $t0, $t1, looptop # }while(p!=endp)
# maybe another condition to check if you should run one last iteration.
MIPS bltu is only a pseudo-instruction (sltu/bnez); that's why it's better to calculate an exact end-pointer so you can use a single machine instruction as the loop branch.
And yes, this might mean rounding the iteration count down to a multiple of 2 to ensure correctness. Or doing a scalar iteration and rounding up to a multiple of 2. e.g. x++ / x&=-2;
With software pipelining, e.g. doing a load and divide but not a store yet, you could maybe let the rounding-up have the loop redo that element if odd. (If the chance of a branch mispredict costs more than an FP multiply and a redundant store.) Haven't fully thought this through, but it's a similar idea to SIMD doing a first unaligned vector, then a potentially-partially-overlapping aligned vector. (SIMD vectorization is like unrolling, but then you roll back up into a single instruction that does 4 elements, for example.)
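The pointer-increment idea with an exact end-pointer, plus one scalar iteration to make the remaining count a multiple of 2, could be sketched in C as follows. scale_range and its parameters are hypothetical names; the sketch assumes p <= endp and that both point into the same row.

```c
#include <stddef.h>

/* Multiply every float in [p, endp) by inv, two per iteration. */
void scale_range(float *p, float *endp, float inv)
{
    /* Odd element count: handle one element up front so the rest is even. */
    if ((endp - p) % 2 != 0) {
        *p *= inv;
        p++;
    }

    /* The remaining count is even, so p lands exactly on endp and a plain
       equality test is enough (a single bne in the MIPS version). */
    while (p != endp) {
        p[0] *= inv;
        p[1] *= inv;
        p += 2;                       /* advance by 2 floats, 8 bytes */
    }
}
```

For the question's loop the call would be something like scale_range(&A[k][k + 1], &A[k][N], inv), with no per-element index calculation.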

Recursive Fibonacci MIPS

I started to read MIPS to understand better how my C++ and C code works under the computer skin. I started with a recursive function, a Fibonacci function.
The C code is:
int fib(int n) {
if(n == 0) { return 0; }
if(n == 1) { return 1; }
return (fib(n - 1) + fib(n - 2));
}
MIPS code:
fib:
addi $sp, $sp, -12
sw $ra, 8($sp)
sw $s0, 4($sp)
addi $v0, $zero, $zero
beq $a0, $zero, end
addiu $v0, $zero, 1
addiu $t0, $zero, 1
beq $a0, $t0, end
addiu $a0, $a0, -1
sw $a0, 0($sp)
jal fib #fib(n-1)
addi $s0, $v0, $zero
lw $a0, 0($sp)
addiu $a0, $a0, -1
jal fib #fib(n-2)
add $v0, $v0, $s0
end:
lw $s0, 4($sp)
lw $ra, 8($sp)
addi $sp, $sp, 12
jr $ra
When n>1, execution proceeds until it reaches the first jal instruction. What happens next? Does it return to the fib label, ignoring the code below (so the fib(n-2) call would never be executed)? If that happens, the $sp pointer decreases by 3 words again and the cycle continues until n<=1. I can't understand how this works once the first jal instruction is reached.

Can you follow how the recursion works in C?
In some sense, recursion has two components: the forward part and the backward part.  In the forward part, a recursive algorithm computes things before the recursion, and in the backward part, a recursive algorithm computes things after the recursion completes.  In between the two parts, there is the recursion.
See this answer: https://stackoverflow.com/a/71551098/471129
Fibonacci is just slightly more complicated as it performs recursion twice, not just once as in the above list printing example.
However, the principles are the same:  There is work done before the recursion, and work done after (either of which can be degenerate).  The before part happens as code in front of the recursion executes, and the recursion builds up stack frames that are placeholders for work after the recursion yet to be completed.  The after part happens as the stack frames are released and the code after the recursive call is executed.
In any given call chain, the forward part goes until n is 0 or 1, then the algorithm starts returning back to the stacked callers, for whom the backward part kicks in unwinding stack frames until it returns to the original caller (perhaps main) rather than to some recursive fib caller. Again, complicated by use of two recursive invocations rather than one as in simpler examples.
With fib, the work done before is to count down (by -1 or -2) until reaching 0 or 1.  The work done after the recursion is to sum the two prior results.  The recursion itself effectively suspends an invocation or activation of fib with current values, to be resumed when a recursive call completes.
Recursion in the MIPS version is the same; however, function operations that are implicit in C are spread out over several machine code instructions.
I suggest single stepping over a call to fib(2) as a very small example that may help you see what's going on there. I suggest first doing this in C: single step until the outer fib call has fully completed and returned to the calling test function (e.g. main).
To make the C version just a bit easier to view in the debugger you might use this version:
int fib(int n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
int fm1 = fib(n-1);
int fm2 = fib(n-2);
int result = fm1 + fm2;
return result;
}
With that equivalent C version, you'll be able to inspect fm1, fm2, and result during single stepping.  That will make it easier to follow.
Next, do the same in the assembly version.  Debug single step to watch execution of fib(2), and draw parallels with the equivalents in C.
There's another way to think about recursion, which is to ignore the recursion, pretending that the recursive call is to some unrelated function implementation that just happens to yield the proper results of the recursive function; here's such a non-recursive function:
int fib(int n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
int fm1 = fibX(n-1); // calls something else that computes fib(n-1)
int fm2 = fibX(n-2); // "
int result = fm1 + fm2;
return result;
}
With this code, and the assumption that fibX simply works correctly to return proper results, you can focus strictly on the logic of one level, namely, the body of this fib, without considering the recursion at all.
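To try this out, fibX needs some body. The sketch below supplies a hypothetical iterative one (it is an assumption for illustration, not something from the question), so that the one-level fib can be compiled and stepped through on its own.

```c
/* Hypothetical stand-in for "something else that computes Fibonacci":
   a plain loop, no recursion anywhere. */
int fibX(int n)
{
    int a = 0, b = 1;                 /* fib(0) and fib(1) */
    for (int i = 0; i < n; i++) {
        int t = a + b;
        a = b;
        b = t;
    }
    return a;                         /* fib(n) */
}

/* The single level under study: same logic as the recursive fib. */
int fib(int n)
{
    if (n == 0) { return 0; }
    if (n == 1) { return 1; }
    int fm1 = fibX(n - 1);            /* trusted to be correct */
    int fm2 = fibX(n - 2);            /* trusted to be correct */
    int result = fm1 + fm2;
    return result;
}
```

Once this level is understood, replacing fibX with fib again changes nothing about the logic of a single invocation.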
Note that we can do the same in assembly language — though the opportunities for errors / typos are always much larger than in the C, since you still have to manipulate stack frames and preserve critical storage for later use after the calling.
The code you've posted has a transcription error, making it different from the C version.  It is doing the C equivalent of:
return fib(n-1) + fib(n-1);

C to MIPS Code translation problem with nested procedure

I need to translate a piece of C code
int main(){
int a, b, result;
if(a == b)
result = a*b;
else
result = assess(a, b);
return result;
}
int assess(int a, int b){
if(b<a)
return upgrade(a, b);
else
return demote(a, b);
}
int upgrade(int a, int b)
{return 4*(a+b);}
int demote(int a, int b)
{return 4*(b-a);}
a and b will be tested with a=8 b=8, a=3 b=5, and a=5 b=3.
Here is what I tried:
.text
main:
add $s0,$s0,5
add $s1,$s1,3
add $s3,$s3,0
beq $s0,$s1,Resultmul
bne $s0,$s1,assess
li $v0, 10
syscall
assess:
addi $sp,$sp,-8
sw $s3,0($sp)
sw $ra,4($sp)
jal upgrade
lw $ra,4($sp)
add $sp,$sp,4
jr $ra
Resultmul :
mul $s3,$s1,$s0
li $v0, 10
syscall
upgrade:
add $s3,$s0,$s1
mul $s3,$s3,4
jr $ra
demote:
sub $v0,$s1,$s0
mul $v0,$v0,4
jr $ra
But it gets stuck at jr $ra in the assess procedure. It would be great if someone could fix this issue.

You are branching to assess instead of calling it like a function via jal. Thus, there is no proper value in $ra upon entry to assess for it to use upon completion to return to main.
You are (almost) properly saving $ra and restoring it later, but it never had a good value in the first place, so the save & restore (which will be needed) doesn't help yet.
You should pop as many bytes off the stack as you push — you're pushing 8 but popping only 4.
You are also not restoring $s3 though you do save it.
You might consider $ra as a parameter passed to a function, and inspect its value upon function entry and during function execution to see where it becomes incorrect. The value passed to the callee should be the address of the return point in the caller, i.e. a code address.
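When checking a corrected assembly version, it can help to have the expected results for the three test pairs. The sketch below restates the question's C functions; compute is a made-up name for the body of main with a and b turned into parameters.

```c
int upgrade(int a, int b) { return 4 * (a + b); }
int demote(int a, int b)  { return 4 * (b - a); }

int assess(int a, int b)
{
    if (b < a)
        return upgrade(a, b);
    else
        return demote(a, b);
}

/* Hypothetical wrapper: the logic of the question's main. */
int compute(int a, int b)
{
    if (a == b)
        return a * b;
    return assess(a, b);
}
```

For the given inputs this yields 64 for a=8 b=8, 8 for a=3 b=5, and 32 for a=5 b=3, which is what the register holding result should contain at the end of the MIPS program.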

Using Nested For Loops and an Array in MIPS

This is a homework assignment, I've written the whole program myself, run through it in the debugger, and everything plays out the way I mean it to EXCEPT for this line:
sw $t1, counter($a3)
The assignment is to convert this snippet of C code to MIPS
for(i = 0; i < a; i++) {
for(j = 0; j < b; j++) {
C[2 * i] = i - j; } }
All the registers change values the way they should in my program except for $a3 - It never changes.
Changes: An array needed to be declared and "pointed to" by a register and a label can't be used for an offset in the manner I started with
EDIT: Here's the finished, working code

Recap answer from the comments
Your $a3 register, is supposed to be loaded with the address of an array defined in the .data section.
One big problem with your code is how you constructed your loops. The best way is to translate your loops step by step, and one loop at a time. Also, remember that :
for( i = 0; i < a; i++ )
{
loop_content;
}
Is equivalent to :
i = 0;
while( i < a )
{
loop_content;
i++;
}
This is easier to translate into assembly. The condition just has to be negated, as you need an "exit" condition, and not a "continue" condition as in a while loop. Your code will be much clearer and easier to understand (and less error prone).
Your "out of range" error comes from here : sw $t1, counter($a3). Here counter is a label, and therefore an address. Thus counter($a3) is doing "$a3 (=0x10010008) + address of counter (=0x100100f8)", giving 0x20020100, which is clearly not what you want (and nonsensical).
Oh, and in the sw $r, offset($a) MIPS instruction, offset MUST be a 16-bit CONSTANT. Here, you use a 32-bit address; the assembler kindly translates sw $t1, counter($a3) to $x = $a3 + counter; sw $t1, 0($x), which is why you may see a sw with 0 as offset.
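The for-to-while rewriting with a negated exit condition can be taken one step further in C by using goto, which gives code that maps almost line by line onto branches and labels. This is a sketch: the function name fill and the label names are assumptions, and C is passed as a pointer, much as $a3 would hold the array's base address.

```c
/* Same behavior as:
   for (i = 0; i < a; i++)
       for (j = 0; j < b; j++)
           C[2 * i] = i - j;                                        */
void fill(int *C, int a, int b)
{
    int i, j;

    i = 0;
outer_test:
    if (i >= a) goto outer_end;       /* negated "i < a": the exit condition */
    j = 0;
inner_test:
    if (j >= b) goto inner_end;       /* negated "j < b" */
    C[2 * i] = i - j;                 /* address = base + (2 * i) * 4 bytes */
    j++;
    goto inner_test;
inner_end:
    i++;
    goto outer_test;
outer_end:
    return;
}
```

Each if (...) goto corresponds to one conditional branch, and each plain goto to a j instruction.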