I am trying to write an algorithm that finds the trace of an n-by-n square matrix A. The trace of an n-by-n square matrix A is defined as the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of A. The main idea is that, at this level, multi-dimensional arrays are stored as one-dimensional arrays, and the multi-dimensional indexing (for a matrix with m rows and n columns) is converted to one-dimensional indexing. As I am unfamiliar with MIPS, my attempts to turn this into code have been unsuccessful; my latest attempt is below.
I have set the registers to the following:
$a0 = base address of array (matrix), a
$a1 = n, number of rows and columns
$v0 = trace
$t0 = i
trace: move $v0, $zero
move $t0, $zero
bne $t0,$a1,end
sll $t1,$a0,4
add $t1,$t1,$t0
sll $t1,$t1,2
add $t2,$t1,$a0
lw $t0,0($t1)
addi $sp, $sp, 8
sw $t1,0($t0)
j trace
end: jr $ra
Unfortunately, the answer does not come out as desired. The algorithm should have the following form:
trace = 0
for i = 0 to n-1
trace = trace + a[i,i]
end for
I have added some comments indicating suspicious behaviour:
trace: move $v0, $zero
move $t0, $zero
loop: bne $t0,$a1,end # This leaves the loop while i != n; it should probably be beq
sll $t1,$a0,4 # t1 = a0 * 16, but a0 is the base address; this should probably be $t0
add $t1,$t1,$t0
sll $t1,$t1,2
add $t2,$t1,$a0
lw $t0,0($t1) # This loads from the offset in $t1 rather than the address in $t2, and overwrites i
addi $sp, $sp, 8 # What are you doing with $sp? Perhaps this should be add $v0,$v0,$t0
sw $t1,0($t0) # What are you storing here?
j trace # This should probably jump to loop, or the code will never end
end: jr $ra
Also note that the sll by 4 multiplies by 16, so it assumes that your matrix has 16 columns.
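The indexing that the sll/add sequence is trying to compute can be sketched in C. This is only an illustration, assuming a row-major n-by-n matrix of 32-bit ints; the function name trace_flat is made up for the example. Element a[i,i] is element i*n + i of the flat array, so its byte offset is (i*n + i)*4, and a shift by 4 can stand in for the multiply only when n is 16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: trace of an n-by-n row-major matrix stored flat. */
int trace_flat(const int *a, int n)
{
    assert(a != NULL && n >= 0);
    int trace = 0;
    for (int i = 0; i < n; i++) {
        /* a[i,i] is element i*n + i of the one-dimensional array */
        trace += a[(size_t)i * (size_t)n + (size_t)i];
    }
    return trace;
}
```

In assembly, the same loop needs a counter that is incremented once per iteration and an accumulator that is separate from the counter, which is where the attempt above goes wrong.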
I am doing a homework problem for a computer design and architecture class in which we are required to translate a simple C for loop into MIPS using the MARSim. I implemented the for loop step by step and initialized (if that is the proper term) the variables in memory, yet when I assemble and run it, it throws this error: question1.asm line 11: Runtime exception at 0x00400008: address out of range 0x00000000
I've looked at line 11: lw $t1, 0($a1),
and as I understand it, this should work properly. As I understand it, here we are loading b[i], the word at the address in $a1, into $t1.
Here is the C we have to reproduce:
for (i=0; i<=100; i++) { a[i] = b[i] + C ; }
Here is my attempt:
# t0 = i
# t1 = b[i]
# t2 = a[i]
# t3 = 101 (the end value of i)
# s0 = c
# $a0 = a
# $a1 = b
begin:
addi $t3, $zero, 101 #loop terminate value
add $t0, $zero, $zero # set our counter to zero
loop: lw $t1, 0($a1) # set t1 to b[i]
add $t1, $t1, $s0 # B[i] + c
sw $t1, 0($a0) #store $t1 into a[i]
addi $t0, $t0, 1 # loop increment
addi $a0,$a0,4 # increment a0 to point to the next element of the array (4 bytes)
addi $a1, $a1, 4 # likewise with b[i]
beq $t0, $s0, finish
finish:
The MIPS code I've written should mimic the action of the C code without error. However, on the first iteration of the loop it reports an access to the out-of-range address 0x00000000. Could someone shed some light on what I am doing wrong? I would really appreciate a thorough explanation so I can understand this better for my class. Thanks for your assistance and much love.
Cheers.
How do I write a function that takes elements from two vectors and, with the stack configured properly, passes the operands to another function via the stack? I defined my vectors in the data portion of my code.
I don't know how to take the elements (1, 2, 3, 4, 5, 6) from their vector and pass them as operands via the stack. Does anyone have any programming insight on this topic? I know an example of pushing onto the stack is
Sub sp, sp, 4
Li s0, 0x12345678
Sw s0, 0(sp)
And an example of popping from the stack is
Lw s0, 0(sp)
Add sp, sp, 4
The part I'm confused about is how to get my array elements onto the stack when passing them. We were told specifically not to use registers to pass the parameters; otherwise I would just load each number into a register and pass them that way. This is the second time that I am taking the class. I have the textbook. I have seen mips run. I have watched tutorials on MIPS on YouTube. (#sys call is a term for just do it. #mad props to that guy) I guess what I am trying to say is that any helpful input would be appreciated.
I rewrote my code so it's more readable for others.
.globl main #tells assembler that there is a global routine called main
.data
array1: .word 2, 4, 6, 8, 10, -12 #define array 1(6x1 vector)
array2: .word -1, 3, 5, 7, 9, 11 #define array 2(6x1 vector)
.text
main: # I am trying to configure the stack properly and pass parameters
# (vector elements) to dot_product function
addiu $sp, $sp, -32 # Allocating space on the stack.
# I need to pass two operands via stack onto num mult
# and multiply them together
sw $t1, array1+0x00($sp) #placing first element of array at Mem[Sp]
sw $t2, array1+0x04($sp) #placing second element of array at Mem[Sp+4]
sw $t3, array1+0x08($sp) #placing third element of array at Mem[Sp+8]
sw $t4, array1+0x0C($sp) #placing fourth element of array at Mem[Sp+12]
sw $t5, array1+0x10($sp) #placing fifth element of array at Mem[Sp+16]
sw $t6, array1+0x04($sp) #placing sixth element of array at Mem[Sp+20]
sw $ra, 24($sp) #saving return address
#Should I have an equivalent amount of lw so I
#de-allocate space on the stack? If so, then what
#is the purpose of allocating space on the stack
# in the first place. I don't see the point.
nop
jal num_mult #Call num_mult Function. I don't know why this is here. I just
# placed so others knew that I was inept at programming.
nop
dot_product: # takes elements from each vector and after configuring stack
# appropriately passes two operands via stack to num_mult
# calculate sum of RETURNED values from each call to num_mult
addi $t7, $v0, $v1 # $v0 and $v1 are return values from a function call
# This is my running sum calculator
mov $v0, $t7 # This returns the sum in $v0
addi $v1, $zero, 0 # Places 0 in $v1
bgt $t0, 32767, else # dot_product tests checks for overflow
addi $v0, $zero, 0 #if overflow, then dot_product will return zero in $v0
addi $v1, $zero, -1 #reg and -1 in $v1 reg
else:
addi $v0, 0, $t0 # if no overflow then dot_product will return product
addi $v1, 0, -1 # in $v0 reg and zero in $v1 reg
#dot_product should check for overflow in running sum calculation
#process. If overflow in running sum, dot_product should return zero
#too.
num_mult:
mult $a0, $a1 # takes two operands passed via stack and multiplies them
# together
mflo $t0
bgt $t0, 32767, else1 # num_mult tests result to determine if product can
# be accurately represented in 32 bits
addi $v0, $zero, 0 # if overflow, then num_mult will return zero in $v0
addi $v1, $zero, -1 # reg and -1 in $v1 reg
else1:
addi $v0, 0, $t0 # if no overflow then num_mult will return product in
addi $v1, 0, -1 # $v0 reg and zero in $v1 reg
end:
nop
b end #teacher asked for a 'spin' loop but this is all I have.
nop
The biggest problem that I have is understanding two things: how to take elements from each vector and, after configuring the stack appropriately, pass two operands via the stack to num_mult; and how to calculate the sum of the values returned from each call to num_mult.
I will throw it in the compiler tomorrow. I figured that I would hang my dirty laundry up so others could eagerly make their comments. Thanks in advance.
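The arithmetic and the return convention described in the comments above can be sketched in C. This is a rough model only: C cannot show the stack-passing part, and the result_t struct and fits_in_32 helper are invented here to stand in for the $v0/$v1 pair and the overflow test. It assumes "overflow" means the product or the running sum does not fit in a signed 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the $v0/$v1 pair: value, and status (0 = ok, -1 = overflow). */
typedef struct { int32_t value; int32_t status; } result_t;

static int fits_in_32(int64_t x) { return x >= INT32_MIN && x <= INT32_MAX; }

/* Multiplies two operands; returns {0, -1} if the product needs more than 32 bits. */
result_t num_mult(int32_t x, int32_t y)
{
    int64_t wide = (int64_t)x * (int64_t)y;
    result_t r = {0, -1};
    if (fits_in_32(wide)) { r.value = (int32_t)wide; r.status = 0; }
    return r;
}

/* Sums the products returned by num_mult; returns {0, -1} on any overflow. */
result_t dot_product(const int32_t *u, const int32_t *v, size_t n)
{
    assert(u != NULL && v != NULL);
    result_t fail = {0, -1};
    int64_t sum = 0;
    for (size_t k = 0; k < n; k++) {
        result_t p = num_mult(u[k], v[k]);
        if (p.status != 0) return fail;
        sum += p.value;
        if (!fits_in_32(sum)) return fail;
    }
    result_t ok = {(int32_t)sum, 0};
    return ok;
}
```

For array1 and array2 as defined in the .data section above, dot_product returns a value of 54 with status 0. In the assembly version, each call to num_mult would store the two operands at offsets from $sp before the jal, and num_mult would load them back with lw.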
Just started learning assembly for one of my classes and I am a bit confused over this code segment. This is from a textbook question which asks you to translate from MIPS instructions to C. The rest of the question is in the attached image.
For the MIPS assembly instructions above, what is the corresponding
C statement? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
sll $t0, $s0, 2 # $t0 = f * 4
add $t0, $s6, $t0 # $t0 = &A[f]
sll $t1, $s1, 2 # $t1 = g * 4
add $t1, $s7, $t1 # $t1 = &B[g]
lw $s0, 0($t0) # f = A[f]
addi $t2, $t0, 4
lw $t0, 0($t2)
add $t0, $t0, $s0
sw $t0, 0($t1)
I have a basic understanding of some MIPS instructions but frankly, the stuff with arrays confuses me a bit. Could anyone here point me in the right direction? Thanks!
It's been a while since I last wrote MIPS assembly. However, from what I can understand from the first few instructions:
sll $t0, $s0, 2 # t0 = 4 * f
add $t0, $s6, $t0 # t0 = &A[f]
s0 holds the index f at which you want to access array A. Since you multiply f by 4, A is an array of some datatype with 4 bytes length (probably an int). s6 is holding the array address, because to access the address of A[f] you would essentially do (in pseudocode)
address_of_A[f] = base_address_of(A) + offset_of_type_int(f)
The same stuff in principle happens in the next 2 instructions, but this time for array B. After that:
lw $s0, 0($t0) # f = A[f]
addi $t2, $t0, 4 # t2 = t0 + 4
The first load is obvious, s0 gets the value at address t0, which is of course A[f]. The second increments t0 by 4 and stores it to t2, which means that t2 now contains the address &A[f+1], since we know that array A contains 4-byte data.
The last lw command:
lw $t0, 0($t2)
loads the value at address $t2 into $t0, so $t0 is now A[f+1].
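Putting the whole sequence together, including the final add and sw that are not walked through above, one possible C reading is sketched below. It assumes 4-byte int elements and that the fourth instruction is meant to put &B[g] in $t1, as its comment says. The function name translated and the pointer parameter for f are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper; f is passed by pointer because the asm overwrites $s0. */
void translated(int *f, int g, const int *A, int *B)
{
    assert(f != NULL && A != NULL && B != NULL);
    const int *p = &A[*f];  /* $t0 = &A[f], computed from the original f */
    *f = p[0];              /* lw $s0, 0($t0)   : f = A[f]               */
    B[g] = p[1] + *f;       /* lw/add/sw        : B[g] = A[f+1] + A[f]   */
}
```

So the net effect, in terms of the original value of f, is roughly B[g] = A[f] + A[f+1], with f itself replaced by A[f] as a side effect.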
I'm trying to convert this C code to MIPS assembly and I am unsure if it is correct. Can someone help me, please?
Question : Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D
C Code :
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
My Attempt at MIPS ASSEMBLY
add $t0, $t0, $zero # i = 0
add $t1, $t1, $zero # j = 0
L1 : slt $t2, $t0, $s0 # i<a
beq $t2, $zero, EXIT # if $t2 == 0, Exit
add $t1, $zero, $zero # j=0
addi $t0, $t0, 1 # i ++
L2 : slt $t3, $t1, $s1 # j<b
beq $t3, $zero, L1, # if $t3 == 0, goto L1
add $t4, $t0, $t1 # $t4 = i+j
muli $t5, $t1, 4 # $t5 = $t1 * 4
sll $t5, $t5, 2 # $t5 << 2
add $t5, $t5, $s2 # D + $t5
sw $t4, $t5($s2) # store word $t4 in addr $t5(D)
addi $t0, $t1, 1 # j ++
j L2 # goto L2
EXIT :
add $t0, $t0, $zero # i = 0
Nope, that leaves $t0 unmodified, holding whatever garbage it did before. Perhaps you meant to use addi $t0, $zero, 0?
Also, MIPS doesn't have 2-register addressing modes (for integer load/store), only a 16-bit constant offset plus a register, written imm16($reg). $t5($s2) isn't legal. You need a separate addu instruction, or better a pointer-increment.
(You should use addu instead of add for pointer math; it's not an error if address calculation crosses from the low half to high half of address space.)
In C, it's undefined behaviour for another thread to be reading an object while you're writing it, so we can optimize away the actual looping of the outer loop. Unless the type of D is _Atomic int *D or volatile int *D, but that isn't specified in the question.
The inner loop writes the same elements every time regardless of the outer loop counter, so we can optimize away the outer loop and only do the final outer iteration, with i = a-1. Unless a <= 0, then we must skip the outer loop body, i.e. do nothing.
Optimizing away all but the last store to every location is called "dead store elimination". The stores in earlier outer-loop iterations are "dead" because they're overwritten with nothing reading their value.
You normally want to put the loop condition at the bottom of the loop, so the loop branch is a bne $t0, $t1, top_of_loop for example. (MIPS has bne as a native hardware instruction; blt is only a pseudo-instruction unless the 2nd register is $zero.) So we want to optimize j<b to j!=b because we know we're counting upward.
Put a conditional branch before the loop to check if it might need to run zero times, e.g. blez $s1, after_loop to skip the inner loop body if b <= 0.
An idiomatic for(i=0 ; i<a ; i++) loop in asm looks like this in C (or some variation on this).
if(a<=0) goto end_of_loop;
int i=0;
do{ ... }while(++i != a);
Or if i isn't used inside the loop, then i=a and do{}while(--i). (i.e. add -1 and use bnez). Although MIPS can branch just as efficiently on i!=a as it can on i!=0, unlike most architectures with a FLAGS register where counting down saves a compare instruction.
D[4*j] means we stride by 16 bytes in a word array. Separately using a multiply by 4 and a shift by 2 is crazy redundant. Just keep a pointer in a separate register and increment it by 16 every iteration, like a C compiler would.
We don't know the type of D, or any of the other variables for that matter. If any of them are narrow unsigned integers, we might need to implement 8 or 16-bit truncation/wrapping.
But your implementation assumes they're all int or unsigned, so let's do that.
I'm assuming a MIPS without branch-delay slots, like MARS simulates by default.
i+j starts out (with j=0) as a-1 on the last outer-loop iteration that sets the final value. It runs up to j=b-1, so the max value is a-1 + b-1.
Simplifying the problem down to the values we need to store, and the locations we need to store them in, before writing any asm, means the asm we do write is a lot simpler and easier to debug.
You could check the validity of most of these transformations by doing them in C source and checking with a unit test in C.
# int a: $s0
# int b: $s1
# int *D: $s2
# Pointer to D[4*j] : $t0
# int i+j : $t1
# int a-1 + b : $t2 loop bound
blez $s0, EXIT # if(a<=0) goto EXIT
blez $s1, EXIT # if(b<=0) goto EXIT
# now we know both a and b loops run at least once so there's work to do
addiu $t1, $s0, -1 # tmp = a-1 // addu because the C source doesn't do this operation, so we must not fault on signed overflow here. Although that's impossible because we already excluded negatives
addu $t2, $t1, $s1 # tmp_end = a-1 + b // one past the max we store
add $t0, $s2, $zero # p = D // to avoid destroying the D pointer? Otherwise increment it.
inner: # do {
sw $t1, ($t0) # *p = tmp; // tmp holds i+j
addiu $t1, $t1, 1 # tmp++;
addiu $t0, $t0, 16 # p += 4; // 16 bytes = 4*sizeof(*D); could go in the branch-delay slot
bne $t1, $t2, inner # }while(tmp != tmp_end)
EXIT:
We could have done the increment first, before the store, and used a-2 and a+b-2 as the initializer for tmp and tmp_end. On some real pipelined/superscalar MIPS CPUs, that might be better to avoid putting the increment right before the bne that reads it. (After moving the pointer-increment into the branch-delay slot). Of course you'd actually unroll to save work, e.g. using sw $t1, 16($t0) and 32($t0) / 48($t0).
Again on a real MIPS with branch delays, you'd move some of the init of $t0..2 to fill the branch delay slots from the early-out blez instructions, because they couldn't be adjacent.
So as you can see, your version was over-complicated to say the least. Nothing in the question said we have to transliterate each C expression to asm separately, and the whole point of C is the "as-if" rule that allows optimizations like this.
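The claim that only the last outer iteration matters can be checked in C before trusting the asm. The sketch below assumes D is an array of int and uses made-up names (fill_reference, fill_optimized, same_result) plus a small fixed buffer chosen just for the test. It compares the loop nest from the question against the pointer-increment form used in the asm.

```c
#include <assert.h>
#include <string.h>

enum { MAX_B = 8, D_LEN = 4 * MAX_B };  /* buffer size chosen for this test only */

/* The loop nest exactly as written in the question. */
void fill_reference(int *D, int a, int b)
{
    for (int i = 0; i < a; i++)
        for (int j = 0; j < b; j++)
            D[4 * j] = i + j;
}

/* The simplified form: only the last outer iteration, striding 4 ints at a time. */
void fill_optimized(int *D, int a, int b)
{
    if (a <= 0 || b <= 0) return;  /* the two blez early-outs */
    int tmp = a - 1;               /* i+j with i = a-1, j = 0 */
    int tmp_end = a - 1 + b;       /* one past the largest value stored */
    int *p = D;
    do {
        *p = tmp;
        tmp++;
        p += 4;
    } while (tmp != tmp_end);
}

/* Returns 1 if both versions leave identical buffers for this (a, b). */
int same_result(int a, int b)
{
    assert(b <= MAX_B);
    int ref[D_LEN], opt[D_LEN];
    memset(ref, -1, sizeof ref);   /* same sentinel fill in both buffers */
    memset(opt, -1, sizeof opt);
    fill_reference(ref, a, b);
    fill_optimized(opt, a, b);
    return memcmp(ref, opt, sizeof ref) == 0;
}
```

Calling same_result over a range of small a and b values, including zero and negative ones, is a cheap way to catch an off-by-one in the loop bound.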
This similar C code compiles and translates to MIPS:
#include <stdio.h>
main()
{
int a,b,i,j=5;
int D[50];
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
}
Result:
.file 1 "Ccode.c"
# -G value = 8, Cpu = 3000, ISA = 1
# GNU C version cygnus-2.7.2-970404 (mips-mips-ecoff) compiled by GNU C version cygnus-2.7.2-970404.
# options passed: -msoft-float
# options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
# -fpcc-struct-return -fcommon -fverbose-asm -fgnu-linker -msoft-float
# -meb -mcpu=3000
gcc2_compiled.:
__gnu_compiled_c:
.text
.align 2
.globl main
.ent main
main:
.frame $fp,240,$31 # vars= 216, regs= 2/0, args= 16, extra= 0
.mask 0xc0000000,-4
.fmask 0x00000000,0
subu $sp,$sp,240
sw $31,236($sp)
sw $fp,232($sp)
move $fp,$sp
jal __main
li $2,5 # 0x00000005
sw $2,28($fp)
sw $0,24($fp)
$L2:
lw $2,24($fp)
lw $3,16($fp)
slt $2,$2,$3
bne $2,$0,$L5
j $L3
$L5:
.set noreorder
nop
.set reorder
sw $0,28($fp)
$L6:
lw $2,28($fp)
lw $3,20($fp)
slt $2,$2,$3
bne $2,$0,$L9
j $L4
$L9:
lw $2,28($fp)
move $3,$2
sll $2,$3,4
addu $4,$fp,16
addu $3,$2,$4
addu $2,$3,16
lw $3,24($fp)
lw $4,28($fp)
addu $3,$3,$4
sw $3,0($2)
$L8:
lw $2,28($fp)
addu $3,$2,1
sw $3,28($fp)
j $L6
$L7:
$L4:
lw $2,24($fp)
addu $3,$2,1
sw $3,24($fp)
j $L2
$L3:
$L1:
move $sp,$fp # sp not trusted here
lw $31,236($sp)
lw $fp,232($sp)
addu $sp,$sp,240
j $31
.end main
Ok, so I have to convert the following C code segment to MIPS Assembly.
f = k + A[5]
The question tells me that f is stored in register $s3, k is in $s2 and the base address of array A is $s4. This is what I put as my answer:
add $s3, $s2, $s4
Is this correct? Do I have to do anything special with the 5 in the array? I'm very new to MIPS, so any and all help is VERY much appreciated.
Are you working on this for homework? If so, are you actually writing out an executable program or just responding to a list of questions?
Either way yes, you do need to account for the 5 in the array. The question is telling you that $s4 points to the base address of the array, not the 5th index.
hint: A[0] would be at the same address as the base of the array.
Try this out. (Off the top of my head). Remember each index is scaled by 4, the size of a word in bytes.
li $t2, 5 # $t2 = 5, the index
add $t2, $t2, $t2 # $t2 = 10
add $t2, $t2, $t2 # $t2 = 20 = 5 * 4, the byte offset
add $t1, $t2, $s4 # $t1 = &A[5]
lw $t4, 0($t1) # load A[5] into $t4
add $s3, $s2, $t4 # f = k + A[5]
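The address arithmetic can be spelled out in C to show where the 5 ends up. This sketch assumes 4-byte elements (int32_t); the function name compute_f is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: f = k + A[5], with the byte offset made explicit. */
int32_t compute_f(int32_t k, const int32_t *A)
{
    assert(A != NULL);
    const char *base = (const char *)A;           /* base address, as in $s4   */
    size_t offset = 5 * sizeof(int32_t);          /* 5 * 4 = 20 bytes          */
    const int32_t *elem = (const int32_t *)(base + offset);  /* &A[5]          */
    return k + *elem;                             /* same as k + A[5]          */
}
```

Because the index is a constant, the offset can also be folded into the load itself, along the lines of lw $t0, 20($s4) followed by add $s3, $s2, $t0.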