I'm Trying to convert this C code to MIPS assembly and I am unsure if it is correct. Can someone help me? Please
Question : Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D
C Code :
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
My Attempt at MIPS ASSEMBLY
add $t0, $t0, $zero # i = 0
add $t1, $t1, $zero # j = 0
L1 : slt $t2, $t0, $s0 # i<a
beq $t2, $zero, EXIT # if $t2 == 0, Exit
add $t1, $zero, $zero # j=0
addi $t0, $t0, 1 # i ++
L2 : slt $t3, $t1, $s1 # j<b
beq $t3, $zero, L1, # if $t3 == 0, goto L1
add $t4, $t0, $t1 # $t4 = i+j
muli $t5, $t1, 4 # $t5 = $t1 * 4
sll $t5, $t5, 2 # $t5 << 2
add $t5, $t5, $s2 # D + $t5
sw $t4, $t5($s2) # store word $t4 in addr $t5(D)
addi $t0, $t1, 1 # j ++
j L2 # goto L2
EXIT :
add $t0, $t0, $zero # i = 0 Nope, that leaves $t0 unmodified, holding whatever garbage it did before. Perhaps you meant to use addi $t0, $zero, 0?
Also, MIPS doesn't have 2-register addressing modes (for integer load/store), only 16-bit-constant ($reg). $t5($s2) isn't legal. You need a separate addu instruction, or better a pointer-increment.
(You should use addu instead of add for pointer math; it's not an error if address calculation crosses from the low half to high half of address space.)
In C, it's undefined behaviour for another thread to be reading an object while you're writing it, so we can optimize away the actual looping of the outer loop. Unless the type of D is _Atomic int *D or volatile int *D, but that isn't specified in the question.
The inner loop writes the same elements every time regardless of the outer loop counter, so we can optimize away the outer loop and only do the final outer iteration, with i = a-1. Unless a <= 0, then we must skip the outer loop body, i.e. do nothing.
Optimizing away all but the last store to every location is called "dead store elimination". The stores in earlier outer-loop iterations are "dead" because they're overwritten with nothing reading their value.
You normally want to put the loop condition at the bottom of the loop, so the loop branch is a bne $t0, $t1, top_of_loop for example. (MIPS has bne as a native hardware instruction; blt is only a pseudo-instruction unless the 2nd register is $zero.) So we want to optimize j<b to j!=b because we know we're counting upward.
Put a conditional branch before the loop to check if it might need to run zero times. e.g. blez $s0, after_loop to skip the inner loop body if b <= 0.
An idiomatic for(i=0 ; i<a ; i++) loop in asm looks like this in C (or some variation on this).
if(a<=0) goto end_of_loop;
int i=0;
do{ ... }while(++i != a);
Or if i isn't used inside the loop, then i=a and do{}while(--i). (i.e. add -1 and use bnez). Although MIPS can branch just as efficiently on i!=a as it can on i!=0, unlike most architectures with a FLAGS register where counting down saves a compare instruction.
D[4*j] means we stride by 16 bytes in a word array. Separately using a multiply by 4 and a shift by 2 is crazy redundant. Just keep a pointer in a separate register an increment it by 16 every iteration, like a C compiler would.
We don't know the type of D, or any of the other variables for that matter. If any of them are narrow unsigned integers, we might need to implement 8 or 16-bit truncation/wrapping.
But your implementation assumes they're all int or unsigned, so let's do that.
I'm assuming a MIPS without branch-delay slots, like MARS simulates by default.
i+j starts out (with j=0) as a-1 on the last outer-loop iteration that sets the final value. It runs up to j=b-1, so the max value is a-1 + b-1.
Simplifying the problem down to the values we need to store, and the locations we need to store them in, before writing any asm, means the asm we do write is a lot simpler and easier to debug.
You could check the validity of most of these transformations by doing them in C source and checking with a unit test in C.
# int a: $s0
# int b: $s1
# int *D: $s2
# Pointer to D[4*j] : $t0
# int i+j : $t1
# int a-1 + b : $t2 loop bound
blez $s0, EXIT # if(a<=0) goto EXIT
blez $s1, EXIT # if(b<=0) goto EXIT
# now we know both a and b loops run at least once so there's work to do
addiu $t1, $s0, -1 # tmp = a-1 // addu because the C source doesn't do this operation, so we must not fault on signed overflow here. Although that's impossible because we already excluded negatives
addu $t2, $t1, $s1 # tmp_end = a-1 + b // one past the max we store
add $t0, $s2, $zero # p = D // to avoid destroying the D pointer? Otherwise increment it.
inner: # do {
sw $t1, ($t0) # tmp = i+j
addiu $t1, $t1, 1 # tmp++;
addiu $t0, $t0, 16 # 4*sizeof(*D) # could go in the branch-delay slot
bne $t1, $t2, inner # }while(tmp != tmp_end)
EXIT:
We could have done the increment first, before the store, and used a-2 and a+b-2 as the initializer for tmp and tmp_end. On some real pipelined/superscalar MIPS CPUs, that might be better to avoid putting the increment right before the bne that reads it. (After moving the pointer-increment into the branch-delay slot). Of course you'd actually unroll to save work, e.g. using sw $t1, 16($t0) and 32($t0) / 48($t0).
Again on a real MIPS with branch delays, you'd move some of the init of $t0..2 to fill the branch delay slots from the early-out blez instructions, because they couldn't be adjacent.
So as you can see, your version was over-complicated to say the least. Nothing in the question said we have to transliterate each C expression to asm separately, and the whole point of C is the "as-if" rule that allows optimizations like this.
This similar C code compiles and translates to MIPS:
#include <stdio.h>
main()
{
int a,b,i,j=5;
int D[50];
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
}
Result:
.file 1 "Ccode.c"
# -G value = 8, Cpu = 3000, ISA = 1
# GNU C version cygnus-2.7.2-970404 (mips-mips-ecoff) compiled by GNU C version cygnus-2.7.2-970404.
# options passed: -msoft-float
# options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
# -fpcc-struct-return -fcommon -fverbose-asm -fgnu-linker -msoft-float
# -meb -mcpu=3000
gcc2_compiled.:
__gnu_compiled_c:
.text
.align 2
.globl main
.ent main
main:
.frame $fp,240,$31 # vars= 216, regs= 2/0, args= 16, extra= 0
.mask 0xc0000000,-4
.fmask 0x00000000,0
subu $sp,$sp,240
sw $31,236($sp)
sw $fp,232($sp)
move $fp,$sp
jal __main
li $2,5 # 0x00000005
sw $2,28($fp)
sw $0,24($fp)
$L2:
lw $2,24($fp)
lw $3,16($fp)
slt $2,$2,$3
bne $2,$0,$L5
j $L3
$L5:
.set noreorder
nop
.set reorder
sw $0,28($fp)
$L6:
lw $2,28($fp)
lw $3,20($fp)
slt $2,$2,$3
bne $2,$0,$L9
j $L4
$L9:
lw $2,28($fp)
move $3,$2
sll $2,$3,4
addu $4,$fp,16
addu $3,$2,$4
addu $2,$3,16
lw $3,24($fp)
lw $4,28($fp)
addu $3,$3,$4
sw $3,0($2)
$L8:
lw $2,28($fp)
addu $3,$2,1
sw $3,28($fp)
j $L6
$L7:
$L4:
lw $2,24($fp)
addu $3,$2,1
sw $3,24($fp)
j $L2
$L3:
$L1:
move $sp,$fp # sp not trusted here
lw $31,236($sp)
lw $fp,232($sp)
addu $sp,$sp,240
j $31
.end main
Related
I am doing a homework problem for the computer design and architecture class where we are required to implement a simple for loop in C, into MIPS using the MARSim. I implemented the for loop step by step, initialized (if that is the proper term) the variables in memory, yet when I assemble and run it throws this error: question1.asm line 11: Runtime exception at 0x00400008: address out of range 0x00000000
I've looked at line 11 : lw $t1, 0($a1),
and as I understand it, this should work properly. As I understand here we are setting our t1 value to a1 (b[i]).
Here is the C we have to reproduce:
for (i=0; i<=100; i++) { a[i] = b[i] + C ; }
Here is my attempt:
# t0 = i
# t1 = b[i]
# t2 = a[i]
# t3 = 101 (the end value of i)
# s0 = c
# $a0 = a
# $a1 = b
begin:
addi $t3, $zero, 101 #loop terminate value
add $t0, $zero, $zero # set our counter to zero
loop: lw $t1, 0($a1) # set t1 to b[i]
add $t1, $t1, $s0 # B[i] + c
sw $t1, 0($a0) #store $t1 into a[i]
addi $t0, $t0, 1 # loop increment
addi $a0,$a0,4 # increment a0 to point to the next block in the array (4 bits)
addi $a1, $a1, 4 # likewise with b[i]
beq $t0, $s0, finish
finish:
The MIPS code I've written, should mimic the action of the C code w/o error. However on the first iteration of the loop it states it is accessing an out of range address 0x00000000. Could someone shine some light on what I am doing wrong? I would really appreciate a thorough explanation so I can understand this better for my class. Thanks for your assistance and much love.
Cheers.
I'm very new to MIPs and trying to create a for loop for an assignment.
for (int i=1; i < 16; i+=2)
{
A[i] = A[i] + B[3*i]
}
With the current code I have when I try to load the value of A[i] it says fetch address not aligned on word boundary.
Here is my code:
main:
li $t0, 1 # Starting index of t0=i
lw $s7, aSize # Loop bound
la $s0, A # &A
la $s6, endA # &endA
la $s1, B # &B
loop:
#TODO: Write the loop code
addi $t3, $zero, 3 # $t3 = 3
mul $t4, $t0,$t3 # $t4 = i * 3
sll $t4, $t4, 2 # $t4 into a byte offset
add $s1, $s1, $t4 # $s1 = &B[i*3]
add $s0, $s0, $t0 # $s0 = &A[i]
lw $t1, 0($s0) # value of A[i]
lw $t2, 0($s1) # value of B[i * 3]
add $t2, $t1, $t2 # A[i] + B[i]
sw $t2, 0($s0) # A[i] = A[i] + B[i]
addi $s0, $s0, 2
addi $s1, $s1, 2
addi $t0, $t0, 1 #i++
bne $t0, $s7, loop
I'm very new to MIPs so not sure whats going on or where to even look. I appreciate any help.
When you do:
mul $t4, $t0,$t3 # $t4 = i * 3
You are calculating the array index [as you would in c].
But, before you can add that to the base address of the array, you need to convert this index into a byte offset. That is, you have to multiply it by 4. This can be done [as in c] with a shift left of 2.
So, after the mul, do:
sll $t4,$t4,2
You have to do this multiply/shift for all index values before adding them in.
UPDATE:
Okay that makes sense. I added that in but I'm still getting that word boundary error on the line "lw $t1, 0($s0)"
You're not showing the definition of A or B, so there could be an alignment issue.
When you do:
add $s1, $s1, $t4 # $s1 = &B[i*3]
You are modifying the original/base value of &B[0]. That's not what you want. Use a different register for the final address value (i.e. leave $s1 unchanged throughout the loop)
Do something like:
add $s3, $s1, $t4 # $s3 = &B[i*3]
lw $t2, 0($s3) # value of B[i * 3]
Adjust other similar register usage in a similar manner (i.e. you have a similar problem with the A array)
I've coded up a cleaned up version. I've not assembled nor tested it, but I think it will get you closer. This may have an off-by-one error [it's hard to tell without the entire program] as I'm not sure what aSize is.
main:
li $t0,1 # Starting index of t0=i
lw $s7,aSize # Loop bound
la $s0,A # &A
la $s6,endA # &endA
la $s1,B # &B
addi $t3,$zero,3 # $t3 = 3
loop:
# TODO: Write the loop code
mul $t4,$t0,$t3 # $t4 = i * 3
sll $t4,$t4,2 # $t4 into a byte offset
add $s3,$s1,$t4 # $s3 = &B[i*3]
sll $t4,$t0,2 # $t4 into a byte offset
add $s2,$s0,$t4 # $s2 = &A[i]
lw $t1,0($s2) # value of A[i]
lw $t2,0($s3) # value of B[i * 3]
add $t2,$t1,$t2 # A[i] + B[i]
sw $t2,0($s2) # A[i] = A[i] + B[i]
addi $t0,$t0,2 # i += 2
bne $t0,$s7,loop
C statement
A= A? C[0] : B;
Is is correct to write in assembly instruction this way?
Assuming $t1=A, $t2=B, $s1=base address of Array C:
beq $t1, $0, ELSE
lw $t1, 0($s1)
ELSE: add $t1, $t2, $0
No, it doesn't seem correct because add $t1, $t2, $0 will be executed even if $t1 != $0.
I hope that this works (not tested):
beq $t1, $0, ELSE
sll $0, $0, 0 # NOP : avoid the instruction after branch being executed
lw $t1, 0($s1)
j END
sll $0, $0, 0 # NOP : avoid the instruction after branch being executed
ELSE: add $t1, $t2, $0
END:
This code assumes that the elements of C are 4-byte long each.
You can avoid an unconditional j. Instead of structuring this as an if/else, always do the A=B (because copying a register is cheaper than jumping) and then optionally do the load.
On a MIPS with branch-delay slots, the delay slot actually helps us:
# $t1=A, $t2=B, $s1=base
beq $t1, $zero, noload
move $t1, $t2 # branch delay: runs always
lw $t1, 0($s1)
noload:
# A= A? C[0] : B;
On a MIPS without branch-delay slots (like MARS or SPIM in their default config):
# MIPS without branch-delay slots
# $t1=A, $t2=B, $s1=base
move $t3, $t1 # tmp=A
move $t1, $t2 # A=B
beq $t3, $zero, noload # test the original A
lw $t1, 0($s1)
noload:
# $t1 = A= A ? C[0] : B;
If we can clobber B and rearrange registers, we can save an insn without a branch-delay:
# MIPS without branch-delay slots
# $t1=A, $t2=B, $s1=base
beq $t1, $zero, noload
lw $t2, 0($s1)
noload:
# $t2 = A. B is "dead": we no longer have it in a register
A move $t3, $t2 before the BEQ could save B in another register before this sequence. Ending up with your variables in different registers can save instructions, but makes it harder to keep track of things. In a loop, you can get away with this if you're unrolling the loop because the 2nd copy of the code can re-shuffle to get registers back the way they need to be for the top of the loop.
move x, y is a pseudo-instruction for or x, y, $zero or ori x, y, 0. Or addu x, y, $zero. Implement it however you like or let your assembler do it for you.
Write a MIPS segment for the C statement
x=5; y=7;
Land(x,y,z) // z=x &y as procedure call
if (z>x) y=x+1
I am trying to code a program that checks if the 16 bits in an integer is a one or zero. I chose to implement this by shifting right one bit 15 times and checking if the first bit in each shift is a zero or non zero. Then, if the first bit is a 1, I increment an integer.
I made some code in C that represents a non-user input version of my code.
int j = 100;
int checker = 0;
int count = 0;
for (i=0; i<16; i++) {
checker = j & 0x1;
if (checker > 0)
count++;
j = (j >> 1);
}
My code in MIPS:
.data
userprompt: .asciiz "Enter positive integer: "
newline: .asciiz "\n"
.text
.globl main
main:
li $v0, 4 # System call: Display string
la $a0, userprompt # Load string userprompt for output
syscall
li $v0, 5 # System call: Read integer
syscall
move $s0, $v0 # Store integer from v0 to s0
move $s1, $s0 # s1 = s0
li $t0, 0 # t0 = 0
jal chk_zeros # Run function: chk_zeroes
li $v0, 1 # System call: Read integer
move $a0, $t2 # Store integer from t2 to a0
syscall
li $v0, 10 # System call: quit
syscall
chk_zeros:
bgt $t0, 15, exitchk # t0 <= 15
addi $t0, $t0, 1 # Add one to t0
andi $t1, $s1, 0x1 # Check if first bit is non-zero, store in t1
bgtz $t1, chk_zerosadd # If t1 >= 0
j chk_zeros
chk_zerosadd:
addi $t2, $t2, 1 # Add one to t2
jr $ra # Return to after the if statement (does not work!)
exitchk:
jr $ra
What I am having trouble with is making chk_zerosadd return to after the branching statement. jr $ra seems to return me to my main function in chk_zerosadd.
bgtz doesn't place the next PC address into the return address register, so jr $ra won't return to the instruction after the branching statement. You can either use bgtzal (branch if greater than zero and link), which will give you the behaviour you are looking for, or else you can re-arrange your code so that you branch over the add, instead of branching to it, like this:
andi $t1, $s1, 0x1 # Check if first bit is non-zero, store in t1
beq $t1, chk_zerosskipadd # Jump if $t1 is zerp
addi $t2, $t2, 1 # Add one to t2
chk_zerosskipadd:
# continue execution...
srl $s1, $s1, 1 # j = (j >> 1);
j chk_zeros
Just started learning assembly for one of my classes and I am a bit confused over this code segment. This is from a textbook question which asks you to translate from MIPS instructions to C. The rest of the question is in the attached image.
For the MIPS assembly instructions above, what is the corresponding
C statement? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
sll $t0, $s0, 2 # $t0 = f * 4
add $t0, $s6, $t0 # $t0 = &A[f]
sll $t1, $s1, 2 # $t1 = g * 4
add $t0, $s6, $t0 # $t1 = &B[g]
lw $s0, 0($t0) # f = A[f]
addi $t2, $t0, 4
lw $t0, 0($t2)
add $t0, $t0, $s0
sw $t0, 0($t1)
I have a basic understanding of some MIPS instructions but frankly, the stuff with arrays confuses me a bit. Could anyone here point me in the right direction? Thanks!
It's been a while since I last wrote MIPS assembly. However, from what I can understand from the first few instructions:
sll $t0, $s0, 2 # t0 = 4 * f
add $t0, $s6, $t0 # t0 = &A[f]
s0 holds the index f at which you want to access array A. Since you multiply f by 4, A is an array of some datatype with 4 bytes length (probably an int). s6 is holding the array address, because to access the address of A[f] you would essentially do (in pseudocode)
address_of_A[f] = base_address_of(A) + offset_of_type_int(f)
The same stuff in principle happens in the next 2 instructions, but this time for array B. After that:
lw $s0, 0($t0) # f = A[f]
addi $t2, $t0, 4 # t2 = t0 + 4
The first load is obvious, s0 gets the value at address t0, which is of course A[f]. The second increments t0 by 4 and stores it to t2, which means that t2 now contains the address &A[f+1], since we know that array A contains 4-byte data.
The last lw command:
lw $t0, 0($t2)
stores the value at address $t2 on $t0, so $t0 is now A[f+1].