I'm learning assembly, and I've been given this code and asked to work out what it does. I'm stepping through it in QtSpim, so I can see the value in each register, but I still don't understand what the code is for.
If you can spot the pattern and what this code does, can you tell me how you worked it out, and at which line you recognized the pattern? Thanks!
.text
.globl main
.text
main:
li $s0, 0x00BEEF00 ##given $s0= 0x00BEEF00
Init:
li $t0, 0x55555555
li $t1,0x33333333
li $t2,0x0f0f0f0f
li $t3,0x00ff00ff
li $t4,0x0000ffff
Step1: and $s1, $s0, $t0
srl $s0,$s0,1
and $s2,$s0,$t0
add $s0,$s1,$s2
Step2: and $s1,$s0,$t1
srl $s0,$s0,2
and $s2,$s0,$t1
add $s0,$s1,$s2
Step3: and $s1,$s0,$t2
srl $s0,$s0,4
and $s2,$s0,$t2
add $s0,$s1,$s2
Step4: and $s1,$s0,$t3
srl $s0,$s0,8
and $s2,$s0,$t3
add $s0,$s1,$s2
Step5:
and $s1,$s0,$t4
srl $s0,$s0,16
and $s2,$s0,$t4
add $s0,$s1,$s2
End:
andi $s0,$s0,0x003f
This is a population count, aka popcount, aka Hamming Weight. The final result in $s0 is the number of 1 bits in the input. This is an optimized implementation that gives the same result as shifting each bit separately to the bottom of a register and adding it to a total. See https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
This implementation works by building up from 2-bit accumulators to 4-bit, 8-bit, and 16-bit using SWAR to do multiple narrow adds that don't carry into each other with one add instruction.
Notice how it masks every other bit, then every pair of bits, then every group of 4 bits, and uses a shift to bring the other half down to line up for an add. In C, each step looks like this (with a different shift count and mask each time):
(x & mask) + ((x>>1) & mask)
Repeating this with a larger shift and a different mask eventually gives you the sum of all the bits (treating them all as having place-value of 1), i.e. the number of set bits in the input.
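To make the pattern concrete, here is a minimal C sketch of the same five mask/shift/add steps. The function name popcount_swar is just an illustrative choice, and using unsigned arithmetic is an assumption that sidesteps any signed-overflow concerns.

```c
#include <stdint.h>

/* SWAR popcount: mirrors Step1..Step5 and End of the MIPS code.
 * popcount_swar is a hypothetical name used only for this sketch. */
uint32_t popcount_swar(uint32_t x)
{
    /* Step1: add adjacent bits -> sixteen 2-bit sums */
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    /* Step2: add adjacent 2-bit sums -> eight 4-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* Step3: add adjacent 4-bit sums -> four 8-bit sums */
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    /* Step4: add adjacent 8-bit sums -> two 16-bit sums */
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    /* Step5: add the two 16-bit halves */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);
    /* End: the count is at most 32, so 6 bits are enough */
    return x & 0x3Fu;
}
```

For the input in the question, 0x00BEEF00, the bytes 0xBE and 0xEF have 6 and 7 set bits, so the result is 13.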
So the GNU C representation of this is __builtin_popcount(x).
(Except that compilers will actually use a more efficient popcount: either a lookup table applied to each byte separately, or a bithack that starts this way but then multiplies by a number like 0x01010101 to horizontally sum the 4 bytes into the high byte of the result, because a multiply is effectively a series of shifts and adds. See "How to count the number of set bits in a 32-bit integer?")
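For comparison, a sketch of the multiply-based bithack in C. This is the commonly quoted form of the trick, not necessarily what any particular compiler emits, and popcount_mul is a made-up name.

```c
#include <stdint.h>

/* Popcount using a multiply for the final horizontal sum.
 * popcount_mul is a hypothetical name used only for this sketch. */
uint32_t popcount_mul(uint32_t x)
{
    /* 2-bit sums, using subtract instead of two masks */
    x = x - ((x >> 1) & 0x55555555u);
    /* 4-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* 8-bit sums; each is at most 8, so one mask after the add is enough */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* Multiplying by 0x01010101 adds all four bytes into the top byte */
    return (x * 0x01010101u) >> 24;
}
```

The multiply replaces the last two mask/shift/add steps, which is a win on machines with a fast multiplier.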
But the MIPS code in the question is broken: it needs to use addu to avoid faulting. If you try to popcount 0xC0000000, the first add will have both inputs = 0x40000000, thus producing signed overflow and faulting.
IDK why anyone uses the add instruction on MIPS. The normal binary add instruction is called addu.
The add-with-trapping-on-signed-overflow instruction is add, which is rarely what you want even if your numbers are signed. You might as well just forget it exists and use addu / addiu.
For this assignment I need to be able to make an array of size n based on a user-entered value from zero to fifty. What I've done so far is below. If you have any advice on the question overall, that would be very helpful too.
a) Prompt the user for an integer in the range of 0 to 50. If the user inputs 0 the program stops.
b) Otherwise, the program stores the numbers from 0 up to the input value into an array of words in memory, i.e. initializes the array with values from 0 up to N where N is the value that user has inputted.
c) The program then adds the value of all items of the array together (up to N) by loading them from the main memory then add them up, then prints out the sum with the message "The sum of integers from 0 to N is:". For example, if the user gave 5 as the input the program prints out "The sum of integers from 0 to 5 is 15".
Submit your work as a zip file as specified in the syllabus to eLearning by the due date.
.data
userPrompt : .asciiz "enter in an integer from zero to fifty "
zeroMessage : .asciiz " you have entered a zero , the program will close "
incorrectEntry : .asciiz " you have entered in a value greater than 50 ,
this is an incorrect value"
InputVal : .word
upperLim : .word 50
Array : .space InputVal
.text
main:
addi $t7 , $zero , 50
li $v0, 4 # load for printing of strings
la $a0, userPrompt
syscall
# take in user input and move the read in number to a temp
li $v0, 5
la $t0 , InputVal
syscall
# Store int A into memory
move $t0 , $v0
beq $t0 , $0 , numbersEqual
la $t1 , upperLim
li $v0 , 1
move $a0 , $t1
syscall
slt $t3 ,$t0 , $t1
sw $t0 , InputVal
#beq $t3 , $0 , ELSE
ELSE :
li $v0 , 4
la $a0 , incorrectEntry
syscall
li $v0 , 10
syscall
numbersEqual:
li $v0 , 4
la $a0 , zeroMessage
syscall
li $v0 , 10
syscall
Assembly language is a symbolic language for machine code, and machine code is what your CPU can execute.
After you run the assembler on your source, you get machine code, which is more or less final. Then you execute that machine code.
The .word 0x12345678 is not a CPU instruction but a directive for your assembler: it tells it to reserve a whole word of memory at that place in the machine code and store the value 0x12345678 there. I'm not sure what .word without a value does, or whether it will reserve even one word, so you should write InputVal: .word 0 to be sure.
In the final machine code there is no ".word", since that's not a CPU instruction. There will be just those 4 bytes with their respective values forming the word, at some address that the assembler knew as the symbol InputVal. (The symbol is not part of the machine code in textual form either: any instruction using this memory address has only the address itself encoded in it as a number, and the executable may contain some sort of relocation table so the OS can patch addresses after loading the machine code into memory and before executing it.)
Now this may sound like stating the obvious, but it's important to understand the difference between what is available while the CPU is executing your machine code and what is available while the assembler is processing your source (when the code is not running yet).
Array : .space InputVal will not work as you wish, because InputVal is a symbol for a memory address. So the .space directive will either reserve a bazillion bytes (the numeric value of the address of InputVal) or, more likely, assembly will fail. What you want is the content of memory at InputVal, but that's not known yet: the code hasn't run and the user hasn't entered anything, it's still just the assembly step. Since the value is not known, you might just as well write Array: .space 0, but that will reserve no space.
Unless you want to delve into dynamic memory allocation, there's a simple trick to resolve this situation, which will work in your particular case. If you read the task, N is valid only when the entered value is from 0 to 50 (anything else is a user error and you can exit). So for the maximum N=50 you will need an array of 51 values, and you will never need more.
So you can avoid all the dynamic memory allocation (at runtime) by simply doing:
.align 2 # align to 2^2 = 4 bytes, so the reserved space is word-aligned
Array: .space (51*4)
This will reserve 204 bytes (51 words) of memory in the data area for your machine code, with symbol Array pointing at the first byte of it.
Now you have memory reserved, and you can store values into it and use it. If you change your mind at runtime and want to use 52 words of memory, you are out of luck with such code. Then you need either to code dynamic allocation, or to increase the size of the fixed buffer in the source and assemble again.
Also, your code always has 51 words available in that Array, so it's up to your code to use only the first N+1 of them. If the user enters N=5, then you should work only over 6 words (24 bytes), ignoring the remaining reserved words.
So each loop in your code will be like for (i = 0; i <= N; ++i) Array[i] = i;, where N is the value entered by the user at runtime (store it in the memory reserved at InputVal, so you can access it whenever you need it), not known at assembly time.
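As a rough C model of what the finished MIPS program has to do, here is a sketch with a fixed 51-word buffer. MAX_N and fill_and_sum are names made up for this example, and returning -1 for out-of-range input is an assumption standing in for the "print an error and exit" path.

```c
#include <stdint.h>

#define MAX_N 50                    /* largest valid input, from the task */

static int32_t Array[MAX_N + 1];    /* like Array: .space (51*4) */

/* Hypothetical helper: stores 0..n into Array, then reloads and sums them.
 * Returns -1 when n is outside 0..MAX_N (assumed error convention). */
int32_t fill_and_sum(int32_t n)
{
    if (n < 0 || n > MAX_N)
        return -1;

    /* first loop: initialize only the N+1 words actually needed */
    for (int32_t i = 0; i <= n; ++i)
        Array[i] = i;

    /* second loop: load each word back from memory and accumulate */
    int32_t sum = 0;
    for (int32_t i = 0; i <= n; ++i)
        sum += Array[i];

    return sum;
}
```

With n = 5 this touches only the first 6 words and returns 15, matching the example in the assignment.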
So I have an array that was previously filled with 1's or 0's. Whenever I try to assemble this code, I get a syntax error. Could someone explain what this syntax error is? I'm having trouble understanding why you can't access the array like that. $t1 is a counter for the index, which increments up through 100.
slti $t7, prim_flag($t1), 1 # checks if prim_flag ($t1) < 1 stores 1 if so stores 0 if not
beq $t7, 0, print_numbers # checks if the value in $t7 is 0, if so jump to end_game
and the array:
.data
test: .asciiz "Printing numbers:"
test_2: .asciiz "Before loop"
space: .asciiz " "
done: .asciiz "\n Done printing the array"
numbers:
.word 0:210
numbers_size:
.word 210
prim_flag:
.word 1:210
The only valid operand combination for slti is register,register,immediate. You're trying to use register,memory,immediate, and there's simply no such version of slti in the MIPS instruction set.
Practically every time you need to perform an operation on some data in memory in MIPS assembly, you first have to load that data into a register using lb/lh/lw; then you can perform the operation you need on that register; and finally write some result back to memory if necessary.
Also note that the constant to the left of the parentheses in prim_flag($t1) is an offset, not the base address. The base address is the part that's inside the parentheses, and has to be a register. And since the offset has to fit in 16 bits due to how MIPS instructions are encoded, it's possible that prim_flag won't fit. So you might have to load the address of prim_flag into some register, then add that register plus $t1 and store the sum in a third register, and then read from memory using that last register as the base address.
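The load-then-operate sequence can be spelled out in C with the address arithmetic made explicit. This is only a sketch: flag_is_less_than_one is a made-up name, the register names in the comments are arbitrary picks, and it assumes the index counts words, so it has to be scaled by 4 to become a byte offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if flags[index] < 1, else 0.
 * Each line is annotated with a roughly equivalent MIPS instruction;
 * the registers named in the comments are illustrative only. */
int flag_is_less_than_one(const int32_t *flags, size_t index)
{
    const unsigned char *base = (const unsigned char *)flags;     /* la   $t2, prim_flag  */
    size_t byte_offset = index * sizeof(int32_t);                 /* sll  $t3, $t1, 2     */
    const int32_t *addr = (const int32_t *)(base + byte_offset);  /* addu $t3, $t2, $t3   */
    int32_t value = *addr;                                        /* lw   $t4, 0($t3)     */
    return value < 1;                                             /* slti $t7, $t4, 1     */
}
```

Only the last step is the slti from the question; everything before it exists to get the array element into a register first.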
I am trying to write a CPU simulator, but it doesn't function as expected when a bne instruction is encountered: bne behaves the same as beq. beq seems to be working fine though:
Mux2_32(mbranchAddress, pcPlus4, branchAddress, AND2_1(zero, branch));
Mux2_32(pc, mbranchAddress, jumpAddress, jump);
if(!strcmp(opcode, "000101")&& !strcmp(branch, "1")){ /*bne instruction, ("000101" is the opcode for bne)*/
Mux2_32(mbranchAddress, pcPlus4, branchAddress, AND2_1(NOT_1(zero), branch));
Mux2_32(pc, mbranchAddress, jumpAddress, jump);
}
"branch" is the flag raised when the instruction is a branch instruction. zero is the single bit alu output
MUX2_32(a, b, c, d) works as follows:
a=b if d=0
a=c if d=1
where a, b and c are 32 bits long and d is a single bit.
Could someone please point out why beq instruction works fine but bne doesn't. Thanks
C does not support binary number constants. 000101 is an octal number with value 65... and '000101' is a 64 bit wide multichar constant. You need to use hex numbers, that is opcode 000101 in hex is 0x5...
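A sketch of what comparing opcodes as numbers could look like. It assumes the instruction word is held as a 32-bit integer rather than as a string of '0'/'1' characters like in the question, and opcode_of / branch_taken are names invented here.

```c
#include <stdint.h>

/* MIPS primary opcodes, written in hex: 000100b = 0x04, 000101b = 0x05 */
enum { OP_BEQ = 0x04, OP_BNE = 0x05 };

/* The primary opcode is the top 6 bits (31..26) of the instruction word. */
unsigned opcode_of(uint32_t instr)
{
    return (instr >> 26) & 0x3Fu;
}

/* Hypothetical helper: zero is the ALU's "operands were equal" flag.
 * beq branches when it is set, bne branches when it is clear. */
int branch_taken(unsigned opcode, int zero)
{
    if (opcode == OP_BEQ)
        return zero != 0;
    if (opcode == OP_BNE)
        return zero == 0;
    return 0;   /* not a conditional branch */
}
```

Note that writing 000101 directly in C source gives the octal constant with decimal value 65, not 5.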
I'm trying to learn emulation programming. I've done a CHIP-8 emulator, under 40 instructions, and lived because of my music. I'm now hoping to do something a bit more complex, like an SNES. The problem I'm encountering is the sheer number of CPU instructions. Looking through the wiki.SuperFamicom.org 65c816 instruction listing, it looks like a pain in the rear. And I've seen notes here and there on various internet pages that the CPU is the easiest part of an emulator to implement.
Under the assumption that it was so hard because I was doing it wrong, I looked around and found a simple implementation: SNES Emulator in 15 minutes, which is about 900 lines of code. Easy enough to work through.
So then, from the SNES Emulator in 15 minutes source, I found where the CPU instructions are. It looks a lot simpler than what I was thinking. I don't really understand it, but it's a few lines of code as opposed to a large mass of code. The first thing I notice is that the instructions only have 1 implementation each. If you look at the table in SuperFamicom then you see that it has
ADC #const
ADC (_db_),X
ADC (_db_,X)
ADC addr
ADC long
...
And the emulator source for (I think) ALL of those is:
// Note: op 0x100 means "NMI", 0x101 means "Reset", 0x102 means "IRQ". They are implemented in terms of "BRK".
// User is responsible for ensuring that WB() will not store into memory while Reset is being processed.
unsigned addr=0, d=0, t=0xFF, c=0, sb=0, pbits = op<0x100 ? 0x30 : 0x20;
// Define the opcode decoding matrix, which decides which micro-operations constitute
// any particular opcode. (Note: The PLA of 6502 works on a slightly different principle.)
const unsigned o8 = op / 32, o8m = 1u << (op%32);
// Fetch op'th item from a bitstring encoded in a data-specific variant of base64,
// where each character transmits 8 bits of information rather than 6.
// This peculiar encoding was chosen to reduce the source code size.
// Enum temporaries are used in order to ensure compile-time evaluation.
#define t(w8,w7,w6,w5,w4,w3,w2,w1,w0) if( \
(o8<1?w0##u : o8<2?w1##u : o8<3?w2##u : o8<4?w3##u : \
o8<5?w4##u : o8<6?w5##u : o8<7?w6##u : o8<8?w7##u : w8##u) & o8m)
t(0,0xAAAAAAAA,0x00000000,0x00000000,0x00000000,0xAAAAA2AA,0x00000000,0x00000000,0x00000000) { c = t; t += A + P.C; P.V = (c^t) & (A^t) & 0x80; P.C = t & 0x100; }
In short, my General question:
Condensing the phenomenal cosmic power of CPU instructions into an itty bitty piece of code
Questions specific to the SNES emulator in 15 minutes source (portion posted above):
How does t(0, 0xAAAAAAAA, 0x00000000, ....) parse the instruction? I see the if statement, but I don't know where the numbers for any of the arguments come from, or what they mean to the overall code.
Why o8 = op / 32 and o8m = 1u << (op%32)?
The opcodes for ADC include ADC #const, which has a 2 byte operand, and ADC addr, which has a 3 byte operand. Does the code t(0, 0xAAAAAAAA, ...) implement both cases?
While I'm asking:
what do the dp, _dp_ and sr that appear in ADC dp, ADC (_dp_) and ADC sr,S mean?
what is the difference between ADC (_dp_,X) and ADC dp,X? (probably redundant given the question above.)
I can't answer all of this, but dp stands for Direct Page, meaning that the instruction takes a single-byte operand which is a memory address within the Direct Page. Direct Page addressing is an extension of the Zero Page addressing mode of the 6502, where the single-byte addresses referred to memory locations $00 through $FF. The 16-bit derivatives of the 6502 have a configuration register which basically relocates the Zero Page to an alternate location.
In the wiki page you linked to, some of the dp in the table have underscores on them, and the others are in italics. I assume that they are all intended to be italic, and the wiki markup isn't working. A quick check of the Edit link supports this assumption (in the wiki source, they all have underscores). So don't read anything into that.
In 6502 assembly and derivatives of it, ADC dp,X means... let's take a concrete example instead... ADC $10,X means to add $10 to the value in register X to obtain an address, then load a value from that address and add it to the accumulator. ADC ($10,X) adds an extra level of indirection: add $10 to X to obtain an address, load a value from that address, interpret the loaded value as another address, and load the value from that address and add it to the accumulator. Parenthesized operands always add a level of indirection.
Note that the available modes include (dp,X) and (dp),Y and the placement of the parentheses relative to the comma and register is significant. With (dp),Y the value of Y is added to the first loaded value to get the address to use in the second load.
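The three operand forms can be sketched in C as follows. This models the plain 6502 Zero Page (addresses wrap within $00-$FF) and ignores the relocatable Direct Page register of the 16-bit parts; mem is assumed to be a 64 KiB array, and all function names are invented for this example.

```c
#include <stdint.h>

/* Read a little-endian 16-bit pointer stored in the zero page.
 * The second byte wraps around inside the zero page. */
static uint16_t read16_zp(const uint8_t *mem, uint8_t zp)
{
    return (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
}

/* ADC $10,X : operand is at address (zp + X), no indirection */
uint8_t load_zp_x(const uint8_t *mem, uint8_t zp, uint8_t x)
{
    return mem[(uint8_t)(zp + x)];
}

/* ADC ($10,X) : add X first, then follow the pointer found there */
uint8_t load_ind_x(const uint8_t *mem, uint8_t zp, uint8_t x)
{
    uint16_t addr = read16_zp(mem, (uint8_t)(zp + x));
    return mem[addr];
}

/* ADC ($10),Y : follow the pointer first, then add Y to it */
uint8_t load_ind_y(const uint8_t *mem, uint8_t zp, uint8_t y)
{
    uint16_t addr = (uint16_t)(read16_zp(mem, zp) + y);
    return mem[addr];
}
```

Each function returns the byte that would then be added to the accumulator; the only difference between the two indirect forms is whether the index register is applied before or after the pointer is fetched.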
As for that emulator... code golf doesn't lead to enhanced readability! I don't think the portion you've posted is actually understandable by itself, and I don't feel like tracking down and reading the rest of it. But the key concept in the t macro is bitstring. Its arguments are a series of 9 bitmasks, each 32 bits long, for a total of 288 bits. Every possible opcode (256 of them), plus the 3 pseudo-opcodes mentioned in the first comment, is therefore represented by a single bit in this 288-bit-long bitstring, with 29 bits left over.
That explains the construction of o8 and o8m. The opcode value is split into a high portion, op / 32 (which selects one of the 9 arguments supplied to t; 3 bits are enough for the 256 real opcodes), and a 5-bit low portion, op % 32 (which selects a single bit from the selected argument). The big ?: chain does the first selection and the combination of & and 1 << ... does the second selection.
And then, oh look we have a variable called t too. It's not related to the macro. Giving them the same name was just cruel.
Maybe I can figure out what that bitstring is doing. When the opcode is a low number, o8 (the high bits) will be 0, so the ?: chain will use w0, which is the last argument to the macro. As the opcode increases, the selected argument moves leftward through the argument list to w1, then w2... and the o8m selector likewise starts at the right and moves left (& (1<<0) is the rightmost bit, & (1<<1) is the next one, etc.) and the if condition will be true when the selected bit is 1. Values are:
0, # opcodes $100 and up
0xAAAAAAAA, # opcodes $E0 to $FF
0x00000000, # opcodes $C0 to $DF
0x00000000, # opcodes $A0 to $BF
0x00000000, # opcodes $80 to $9F
0xAAAAA2AA, # opcodes $60 to $7F
0x00000000, # opcodes $40 to $5F
0x00000000, # opcodes $20 to $3F
0x00000000 # opcodes $00 to $1F
or in binary
0, # opcodes $100 and up
0b10101010101010101010101010101010, # opcodes $E0 to $FF
0b00000000000000000000000000000000, # opcodes $C0 to $DF
0b00000000000000000000000000000000, # opcodes $A0 to $BF
0b00000000000000000000000000000000, # opcodes $80 to $9F
0b10101010101010101010001010101010, # opcodes $60 to $7F
0b00000000000000000000000000000000, # opcodes $40 to $5F
0b00000000000000000000000000000000, # opcodes $20 to $3F
0b00000000000000000000000000000000 # opcodes $00 to $1F
Reading each line from right to left, the 1's are in positions corresponding to these opcodes: $61 $63 $65 $67 $69 $6D $6F $71 $73 $75 $77 $79 $7B $7D $7F $E1 $E3 $E5 $E7 $E9 $EB $ED $EF $F1 $F3 $F5 $F7 $F9 $FB $FD $FF
Hmm... that sort of resembles the list of ADC and SBC opcodes, but some of them are wrong.
Oh (I finally gave up and looked at some more of the emulator code) that's a NES emulator, not a SNES emulator, so it only has 6502 opcodes.
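The bitstring lookup described above can be written without the macro. This is only a sketch of the idea, not the emulator's actual code: opcode_in_set and EXAMPLE_SET are invented names, and the table is stored in index order (w0 first), which is the reverse of the macro's argument order.

```c
#include <stdint.h>

/* Hypothetical helper: words[0] covers opcodes $00-$1F, words[1] covers
 * $20-$3F, ... words[7] covers $E0-$FF, words[8] covers $100 and up.
 * Returns 1 if the bit for op is set, else 0. */
int opcode_in_set(const uint32_t words[9], unsigned op)
{
    unsigned o8  = op / 32;            /* which 32-bit word */
    uint32_t o8m = 1u << (op % 32);    /* which bit inside that word */

    if (o8 > 8)
        return 0;                      /* outside the 288-bit string */
    return (words[o8] & o8m) != 0;
}

/* The nine values from the question's t(...) line, lowest opcodes first. */
const uint32_t EXAMPLE_SET[9] = {
    0x00000000u, 0x00000000u, 0x00000000u, 0xAAAAA2AAu,
    0x00000000u, 0x00000000u, 0x00000000u, 0xAAAAAAAAu,
    0u
};
```

For instance, opcode_in_set(EXAMPLE_SET, 0x69) is 1 while opcode_in_set(EXAMPLE_SET, 0x6B) is 0, which matches the gap in the list of opcodes above.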
I am trying to copy some words from memory and save them to another memory address using assembly.
I am trying to write the code for it but I am not sure about some of the parts. I will briefly describe what I want to do.
The source address, destination address and the number of words to copy are input arguments of the function.
From your description it sounds like a regular memcpy, except that you specify the number of words to copy rather than the number of bytes. Not sure where the whole stack buffer idea comes from(?).
Something like this would copy the words from the source to the destination address:
sll $a2,$a2,2
addu $a2,$a1,$a2 # $a2 = address of first byte past the dest buffer
Loop:
lw $t0,0($a0)
sw $t0,0($a1)
addiu $a0,$a0,4
addiu $a1,$a1,4
bne $a1,$a2,Loop
nop
EDIT: If your source and destination buffers are not aligned on word boundaries you need to use lb/sb instead to avoid data alignment exceptions.
EDIT: added nops after branches
So think about how you would do this in C...At a low level.
unsigned int *src,*dst;
unsigned int len;
unsigned int temp;
...
//assume *src, and *dst and len are filled in by this point
top:
temp=*src;
*dst=temp;
src++;
dst++;
len--;
if(len) goto top;
You are mixing too many things; focus on one plan. First off, you said you had a source and a destination address in two registers, so why is the stack involved? You are not copying or using the stack, you are using the two addresses.
It is correct to multiply by 4 to get the number of bytes, but if you copy one word at a time you don't need to count bytes, just words. This assumes the source and destination addresses are aligned, or that your target doesn't require alignment (if they are unaligned, then do everything a byte at a time).
So what does this look like in assembly? You can convert this to MIPS yourself; this is pseudocode:
rs is the source register $a0, rd is the destination register $a1 and rx is the length register $a2, rt the temp register. Now if you want to load a word from memory use the load word (lw) instruction, if you want to load a byte do an lb (load byte).
top:
branch_if_equal rx,0,done
nop
load_word rt,(rs)
store_word rt,(rd)
add rs,rs,4
add rd,rd,4
subtract rx,rx,1
branch top
nop
done:
Now if you copy bytes at a time instead of words then
shift_left rx,2
top:
branch_if_equal rx,0,done
nop
load_byte rt,(rs)
store_byte rt,(rd)
add rs,rs,1
add rd,rd,1
subtract rx,rx,1
branch top
nop
done:
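Both pseudocode loops map directly onto C. A minimal sketch, assuming the length is always given in words as in the question (copy_words and copy_bytes are made-up names):

```c
#include <stddef.h>
#include <stdint.h>

/* Word-at-a-time copy: both pointers must be word-aligned.
 * Checks the count before each iteration, so nwords == 0 copies nothing. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords != 0) {
        *dst++ = *src++;    /* load_word / store_word, then bump both by 4 */
        nwords--;
    }
}

/* Byte-at-a-time copy: works for any alignment.
 * The length is still given in words, so convert it to bytes first. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t nwords)
{
    size_t nbytes = nwords * 4;   /* same as shift_left rx,2 */
    while (nbytes != 0) {
        *dst++ = *src++;    /* load_byte / store_byte, then bump both by 1 */
        nbytes--;
    }
}
```

Testing the count at the top of the loop, as the pseudocode does, is what makes a length of zero safe.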