A 100x100 array A of integers, one byte each, is located at address A. Write a program segment to compute the sum of the minor diagonal, i.e.
SUM = ΣA[i,99-i], where i=0...99
This is what I have so far:
LEA A, A0
CLR.B D0
CLR.B D1
ADDA.L #99, D0
ADD.B (A0), D1
ADD.B #1, D0
BEQ Done
ADDA.L #99,A0
BRA loop
There are quite a few issues in this code, including (but not limited to):
You use 'loop' and 'Done', but those labels are not defined anywhere in the code
You are summing 100 bytes into D1 with byte-sized additions, so the result is almost certain to overflow (the sum can reach 100 * 255 = 25500, so the target needs at least 16 bits: use .w or .l operations)
I may be wrong, but the 'minor diagonal' A[i,99-i] runs between the bottom-left and top-right corners (A[99,0] ... A[0,99]), while your code starts at A itself, i.e. A[0,0], which is not on that diagonal
On the performance side:
You should use the 'quick' instructions of the 68000 (moveq, addq, subq) where possible
Decrement-and-branch (dbra), as mentioned by JasonD, is more efficient than add/beq
Since your code was close to a solution, here is a variant (untested, but I hope it works):
lea A+99*100,a0 ; Points to the first column of the last row
moveq #0,d0 ; Start with Sum=0
moveq #100-1,d1 ; 100 iterations
Loop
moveq #0,d2 ; Clear register long
move.b (a0),d2 ; Read the byte
add.l d2,d0 ; Long add
lea -99(a0),a0 ; Move one row up and one column right
dbra d1,Loop ; Decrement d1 and branch to Loop until d1 gets negative
Done
; d0 now contains the sum
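For comparison, here is a minimal C sketch of the same walk, assuming the array is stored row-major with one unsigned byte per element, as the 68000 code above assumes. The function name `minor_diagonal_sum` is hypothetical. It starts at the first column of the last row, steps back 99 bytes (one row up, one column right) each time, and accumulates into a 32-bit sum so nothing overflows.

```c
#include <stdint.h>

#define N 100 /* number of rows and of columns */

/* Hypothetical helper: returns the sum of A[i][N-1-i] for i = 0..N-1, where
   base points to a row-major N x N array of unsigned bytes. */
uint32_t minor_diagonal_sum(const uint8_t *base)
{
    const uint8_t *p = base + (N - 1) * N; /* first column of the last row */
    uint32_t sum = 0;                       /* wide enough for 100 * 255 */

    for (int i = 0; i < N; i++) {
        sum += *p;     /* zero-extend the byte and add, like moveq/move.b/add.l */
        p -= N - 1;    /* one row up, one column right (lea -99(a0),a0) */
    }
    return sum;
}
```

The zero-extended add into a wide accumulator is the same design choice as the `moveq #0,d2` / `move.b` / `add.l` sequence: it avoids both overflow and sign-extension of bytes above 127.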
I have an Arduino MEGA program written in C that populates an array with random integers, then calls a bubble sort algorithm written in inline assembly. The sorted integers are then converted into binary, and eight LEDs are lit, each LED corresponding to one bit of the binary number.
First, the global variable declarations:
const byte arraySize = 10;
volatile byte randomNums[arraySize];
volatile byte limit = arraySize-1;
volatile byte counter = 1;
volatile byte iteration = 1;
Next, the main program loop (for simplicity, I will omit the binary conversion and LED code).
void loop() {
for (int i = 0; i < arraySize; i++) {
randomNums[i] = random(255);
}
// asm inline bubble sort here
}
Lastly, the bubble sort algorithm in inline assembly:
asm volatile(
" lds r20, (limit) ; position before end of array \n"
" lds r21, (counter) ; counter for loop is defined i=0 \n"
" lds r22, (iteration) ; counter set for iteration of sort algorithm k=1 \n"
" mov r21, r27 ; i=1 \n"
" mov r22, r28 ; iteration number in r28=k \n"
" lds r23, (randomNums) ; point to beginning of array by r23 'element' \n"
" mov r23, r24 "
" add 1, r24 ; point to 'neighbour' \n"
" check%=: mov r23, r25 ; get 'element' and place in r25 \n"
" mov r24, r26 ; get 'neighbour' in array in r26 \n"
" cp r25, r26 ; compare both values \n"
" brge swap%= ; swap the numbers \n"
" add 1, r23 ; increment pointer r23 \n"
" add 1, r24 ; increment pointer r24 \n"
" add 1, r27 ; increment loop counter \n"
" eor r27, (limit) ; xor check if not exceeding array size \n"
" brne check%= "
" swap%=: mov r23, r24 ; swap content where index r23 is pointing to where index r24 is pointing \n"
" mov r26, r23 ; move greater number to position after smaller number \n"
" add 1, r23 ; increment pointer r23 \n"
" add 1, r24 ; increment pointer r24 \n"
" add 1, r28 ; increment loop counter \n"
" cp r28, (arraySize) ; check not exceeding array capacity \n"
" ret "
::: "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28"); // clobbered registers
The program works (save for the sorting) when the assembly is commented out. When I compile the program with the assembly, I get the following obscure errors.
C:\Users\USERNAME\AppData\Local\Temp\cc2d2uAp.s: Assembler messages:
C:\Users\USERNAME\AppData\Local\Temp\cc2d2uAp.s:1963: Error: garbage at end of line
C:\Users\USERNAME\AppData\Local\Temp\cc2d2uAp.s:1971: Error: constant value required
C:\Users\USERNAME\AppData\Local\Temp\cc2d2uAp.s:1972: Error: garbage at end of line
C:\Users\USERNAME\AppData\Local\Temp\cc2d2uAp.s:1977: Error: constant value required
lto-wrapper.exe: fatal error: E:\Program Files (x86)\Arduino\hardware\tools\avr/bin/avr-gcc returned 1 exit status
compilation terminated.
e:/program files (x86)/arduino/hardware/tools/avr/bin/../lib/gcc/avr/7.3.0/../../../../avr/bin/ld.exe: error: lto-wrapper failed
collect2.exe: error: ld returned 1 exit status
exit status 1
Error compiling for board Arduino Mega ADK.
I've searched online and looked at the AVR assembly manual, and I can't figure out what these errors mean or where they are occurring. I'll add that I'm new to AVR assembly and to inline assembly.
The "garbage at end of line" problems are caused by missing \n characters at the end of some lines. Three lines lack one, but I suspect the third (the final ret) is okay because it's the last line.
The other two problems are with the following lines. Neither eor nor cp accepts a memory operand; both only work on two registers:
eor r27, (limit)
cp r28, (arraySize)
The distance between those two lines matches the difference between the reported error lines (1977 - 1971 = 6), once you take into account that the following two lines are treated as one because the first lacks a \n:
" brne check%= "
" swap%=: mov r23, r24 ; swap content ... \n"
As an aside (as pointed out by Jester in the comments), AVR doesn't have an add-immediate instruction (only subi/sbci, plus adiw for register pairs), so I'd be circumspect about instructions of the form:
add 1, r27
There are several possible reasons why this might be accepted:
The problem could just be hidden by your current errors;
It could be a deficiency in the inline assembler that treats it as add r1, r27;
It could be a feature in the inline assembler in that it could be generating code to emulate this operation.
On that last point, I've seen this sort of thing done before, such as a "multi-register push":
push r1, r7, r42
which just assembles to three single-register pushes.
It may be that the assembler is smart enough to turn add 1, r27 into inc r27. Since the only immediate value you're adding is 1, this is a possibility.
This also points to a possible solution should it turn out to be a deficiency, with the instruction really being assembled as add r1, r27 (which adds r27 into r1, since AVR instructions put the destination first). Just turn those add instructions into inc instructions. The same destination-first order applies to mov, so mov r21, r27 copies r27 into r21, not the other way around.
Commenting on your code logic rather than just the syntax: I'm not sure you've separated the pointer and content concepts correctly. It looks like r23/r24 are meant to be the addresses of cells in the randomNums array, but the instruction:
lds r23, (randomNums)
will load the first value of that array into the register. So, consider:
                    +---+---+---+---+
randomNums # 0x1000 | 4 | 2 | 3 | 1 |
                    +---+---+---+---+
The instruction you use would leave r23 holding the value 4 rather than the address 0x1000. To load the address itself as an immediate value you would use ldi, once for each byte, since a data address is 16 bits and each register holds only 8.
Even after fixing that, extracting the values using:
mov r23, r25
mov r24, r26
won't work because you're transferring the addresses, not using those addresses to get the values for comparison.
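Loading values indirectly through an address is done with the ld instruction (and storing with st), using one of the pointer register pairs X (R26/27), Y (R28/29) or Z (R30/31), e.g. ld r25, Z. The lpm instruction only reads program memory (flash), so it won't work for an array in RAM like randomNums.
Additionally, you branch to swap but then return from there, which means that even if you had the right addresses set up, you would swap at most one pair of elements. (A ret inside inline assembly also returns from the enclosing C function, skipping whatever code the compiler generated after the asm statement.)
One fix is to call swap as a subroutine rather than branching to it, and modify it so that it only swaps and returns, removing the register manipulations and comparison; those belong back in the main code.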
The other (preferable in my opinion) is to treat it like an if statement and just skip over the code that swaps, something like:
cp r26, r25 ; compare neighbour with element (AVR has no brle).
brsh noswap%= ; skip swap if already ordered (element <= neighbour, unsigned).
; swap (r25), (r26) ; actual code to swap goes here.
noswap%=:
inc r23 ; carry on with loop.
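To see the intended logic without the register bookkeeping, here is a minimal C sketch of a bubble sort using the same "skip over the swap" structure. It assumes the array holds unsigned bytes (as randomNums does), so the comparison is unsigned; on AVR that corresponds to brsh/brlo rather than the signed brge/brlt. The name `bubble_sort_u8` is hypothetical, and this is a reference for the algorithm rather than a translation of the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sorts n unsigned bytes into ascending order. */
void bubble_sort_u8(volatile uint8_t *a, size_t n)
{
    if (n < 2) return;
    /* Each pass bubbles the largest remaining value to the end, so the
       inner loop's limit shrinks by one after every pass. */
    for (size_t limit = n - 1; limit > 0; limit--) {
        for (size_t i = 0; i < limit; i++) {
            uint8_t element = a[i];       /* load the values, not the addresses */
            uint8_t neighbour = a[i + 1];
            if (element <= neighbour) {
                continue;                 /* already ordered: skip the swap */
            }
            a[i] = neighbour;             /* swap the two values */
            a[i + 1] = element;
        }
    }
}
```

The parameter is `volatile` only so that a `volatile byte` array like randomNums can be passed without a cast.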
I'm writing code to find the last 0 in an array.
Basically, I need to put a new value on the "top" of each array: if the array has only zeros, the value goes at the end, and if it finds a non-zero value, the value goes into the last 0 before it (I'm treating my arrays as piles).
So far my subroutine mostly works, but sometimes it overwrites a value I don't want it to (instead of getting the first value different from 0, it takes the next one). Here's the code I've been using to get the "top" of the array:
TOP:
xor ecx,ecx
xor ebx,ebx
TOP_FOR:
mov bx,word[eax+ecx*2] ;eax has the pointer of the array
cmp ecx,n ;n is the array's length
je END_TOP
inc ecx
cmp bx,0
je TOP_FOR
;here I get the address of the first value different
END_TOP: ;from 0, but in my code I need the last 0, so
dec ecx ;I decrease ecx (the result of this subroutine)
ret
For example,
If I pass the array 0,2 I expect ecx = 0, but with that input I actually get 1.
With the array 1,2 I get 0 (which is what I want).
With the array 0,0 I get 1 (again, what I want).
Edit: I tried starting the loop at n-1, and it gives me even weirder results:
TOP:
xor ecx,ecx
;xor ebx,ebx
mov ecx,n-1
TOP_FOR:
;mov bx,word[eax+ecx*2]
cmp word[eax+ecx*2],0
je END_TOP
dec ecx
cmp ecx,0
jne TOP_FOR
END_TOP:
ret
Your logic is totally backwards. Your cmp/je loop condition leaves the loop when you find the first non-zero. (And you've already incremented ECX after loading, but before checking it.)
So after your loop, ECX = index of the element after the first non-zero element.
You have at least 2 options:
remember the last-seen 0 in another register, and use it at the end of the loop
loop backwards, starting with ECX = n-1, and exit the loop on the first zero. (Or on dec ecx producing 0.)
One of these is obviously more efficient and easier than the other. :P
I'll leave it up to you to solve the off-by-1 problems, but you probably want the ecx < n or ecx >= 0 check at the bottom of the loop, e.g. dec ecx / jge TOP_FOR, i.e. a do{}while(--i >= 0) loop.
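As a reference for the second option, here is a minimal C sketch that scans backwards from n-1, stops at the first zero, and keeps the bounds check at the bottom of the loop. It assumes the pile's zeros sit before its non-zero values and that elements are 16-bit words; returning -1 when there is no zero at all, and the name `last_zero_index`, are assumptions for illustration, not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: index of the highest-indexed 0 in a[0..n-1],
   scanning downwards from n-1; returns -1 if no element is 0. */
long last_zero_index(const int16_t *a, long n)
{
    if (n <= 0) return -1;
    long i = n - 1;              /* start at the last element */
    do {
        if (a[i] == 0) {
            return i;            /* first zero seen from the top */
        }
    } while (--i >= 0);          /* like dec ecx / jge TOP_FOR */
    return -1;
}
```

With the question's examples this gives 0 for 0,2 and 1 for 0,0; for 1,2 it returns -1 under the convention assumed here.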
Also, normally EBX is a call-preserved register. You don't need to use it at all, though. cmp word [eax + ecx*2], 0 works fine.
Also, in your current code you read 2 bytes past the end of the array, potentially faulting if the array ends right at the end of a page. (You don't use that value, though, so it's not a correctness problem otherwise.) You use ECX as an index before checking whether it's too large! That problem goes away if you do a memory-immediate cmp after the bounds check instead of loading first.
Also, normally a pointer-increment is more efficient. After the loop you can subtract and right-shift to get an index.
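A minimal C sketch of the pointer-increment idea, assuming the zeros come first in each pile and elements are 16-bit words: the pointer advances over the leading zeros (checking the bound before each load), and afterwards the byte distance is shifted right by 1 to turn it into an element index. The name `last_leading_zero` and the -1 result for "no leading zero" are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the leading zeros of a[0..n-1] with a pointer
   increment, then converts the pointer back into an index. */
long last_leading_zero(const int16_t *a, size_t n)
{
    const int16_t *p = a;
    const int16_t *end = a + n;
    while (p != end && *p == 0) {  /* bounds check happens before the load */
        p++;                       /* advances by 2 bytes */
    }
    /* Subtract, then shift right by 1: bytes -> element index (2 bytes each). */
    long first_nonzero = (long)(((const char *)p - (const char *)a) >> 1);
    return first_nonzero - 1;      /* -1 if a[0] is already non-zero */
}
```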
I want to copy a 5x5 matrix of bits to a peripheral. The problem I'm having is that I can't get the line counter to still be 0 when the column loop starts. In a high-level language it would look like this (very simple):
for (line=0;line<4;line++)
for (column=0;column<4;column++)
R2 = line
R3 = column
line_cicle:
CMP R2, 4
JZ end
ADD R2,1
column_cicle:
; do stuff that is not depend of the end of a line
CMP R3, 4
JZ line_cicle
; do stuff that is depend of the end of a line
ADD R3, 1
JMP column_cicle
That ADD R2,1 is what's messing things up, but where do I put it so that the line counter doesn't start at 1?
I don't really understand what you're saying/doing with your proposed assembly implementation. Why are you initializing the register to 1 when your loop is supposed to start at 0?
But a for loop nested in another for loop is a relatively simple thing to write, so let's start over and just take things one step at a time, starting from the high-level C code:
for (line=0;line<4;line++)
for (column=0;column<4;column++)
Here is the first (outer) for loop:
xor eax, eax ; line = 0
.LineLoop:
; Do something with line (EAX).
; ...
inc eax ; ++line
cmp eax, 4
jb .LineLoop ; keep looping if line < 4
; We are now finished with the loop.
Now, of course, an optimizing compiler probably wouldn't generate this code. This is a very small loop (it only goes around 4 times), so the loop overhead is likely to be substantial compared to the code executed inside it on each iteration. A compiler would therefore likely unroll the loop 4 times, producing code that is not only faster but more readable. However, I digress…we were writing loops. :-)
We have the outer loop, and we need the inner loop. Of course, the inner loop is basically the same thing as the outer loop, just with a different variable. Here is the inner loop:
xor edx, edx ; column = 0
.ColumnLoop:
; Do something with column (EDX).
; ...
inc edx ; ++column
cmp edx, 4
jb .ColumnLoop ; keep looping if column < 4
; We are now finished with the loop.
Simple enough, right? I just changed the variable/register and the label name. The last task is to nest them. It turns out that is simple, too. The inner loop's code just gets stuck right in the outer loop's code, right there where I said Do something with line (EAX), since the inner loop is going to do something with line—it's going to loop through all of the columns associated with that line. It is another copy-paste job:
xor eax, eax ; line = 0
.LineLoop:
xor edx, edx ; column = 0
.ColumnLoop:
; Do something with line (EAX) and column (EDX).
; ...
inc edx ; ++column
cmp edx, 4
jb .ColumnLoop ; keep looping if column < 4
inc eax ; ++line
cmp eax, 4
jb .LineLoop ; keep looping if line < 4
; We are now finished with both loops.
Remember that you can choose different registers for your loop counters. I just arbitrarily chose EAX and EDX. If you are going to call a function inside the body of the loop that does something with the line and column, and that function expects its parameters to be passed in different registers, then you might as well use those registers as your loop counters.
Note that there is a slightly more efficient way to write this code that eliminates the cmp instructions. Instead of starting from 0 and counting up (which requires a comparison to see if we've reached the end yet), we can start from the end and count down. Then we just take advantage of the fact that the dec instruction sets the zero flag (ZF) when the result is 0, branching directly on that flag instead of having to do a comparison. The code is easier to understand than the explanation:
mov eax, 4 ; line = 4
.LineLoop:
mov edx, 4 ; column = 4
.ColumnLoop:
; Do something with line (EAX) and column (EDX).
; ...
dec edx ; --column
jnz .ColumnLoop ; keep looping if column > 0
dec eax ; --line
jnz .LineLoop ; keep looping if line > 0
; We are now finished with both loops.
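The only issue with this is that you are looping backwards over the lines and columns, and the counters run from 4 down to 1 rather than from 0 up to 3 (subtract 1 if you need zero-based indices). This is usually not a problem, though.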
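In C terms, the two assembly versions correspond roughly to the following sketch. `visit` is a hypothetical stand-in for the loop body that just counts how often each (line, column) pair is seen, which makes it easy to check that the count-down version covers exactly the same pairs once 1 is subtracted from each counter.

```c
#define LINES 4
#define COLUMNS 4

/* Hypothetical loop body: counts how often each (line, column) pair is visited. */
static void visit(int count[LINES][COLUMNS], int line, int column)
{
    count[line][column]++;
}

/* Counting up: line and column take the values 0..3 (the inc/cmp/jb version). */
void loops_up(int count[LINES][COLUMNS])
{
    int line = 0;                      /* xor eax, eax */
    do {
        int column = 0;                /* xor edx, edx */
        do {
            visit(count, line, column);
        } while (++column < COLUMNS);  /* inc edx / cmp edx, 4 / jb */
    } while (++line < LINES);          /* inc eax / cmp eax, 4 / jb */
}

/* Counting down: the counters take the values 4..1 (the dec/jnz version),
   so 1 is subtracted to turn them into zero-based indices. */
void loops_down(int count[LINES][COLUMNS])
{
    int line = LINES;                  /* mov eax, 4 */
    do {
        int column = COLUMNS;          /* mov edx, 4 */
        do {
            visit(count, line - 1, column - 1);
        } while (--column != 0);       /* dec edx / jnz */
    } while (--line != 0);             /* dec eax / jnz */
}
```

The do/while form mirrors the assembly, which tests the condition at the bottom of each loop; that is safe here because both loops are known to run at least once.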
I have to add two 3*3 arrays of words and store the result in another array. Here is my code:
.data
a1 WORD 1,2,3
WORD 4,2,3
WORD 1,4,3
a2 WORD 4, 3, 8
WORD 5, 6, 8
WORD 4, 8, 9
a3 WORD 9 DUP(0)
.code
main PROC
mov eax,0;
mov ebx,0;
mov ecx,0;
mov edx,0;
mov edi,0;
mov esi,0;
mov edi,offset a1
mov esi,offset a2
mov ebx, offset a3
mov ecx,LENGTHOF a2
LOOP:
mov eax,[esi]
add eax,[edi]
mov [ebx], eax
inc ebx
inc esi
inc edi
call DumpRegs
loop LOOP
exit
main ENDP
END main
But this sums all the elements of a1 and a2. How do I add them row by row and column by column? I want to store the sum of each row in another one-dimensional array (same for columns).
The definition
a1 WORD 1,2,3
WORD 4,2,3
WORD 1,4,3
will assemble to these bytes (in hex):
01 00 02 00 03 00 04 00 02 00 03 00 01 00 04 00 03 00
Memory is addressable by bytes, so if you find each element above and count its displacement from the first one (the first one is displaced by 0 bytes, i.e. its address is a1+0), you should see a pattern for calculating the displacement of a particular [y][x] element (x is the column number 0-2, y is the row number 0-2; which one you call column or row is up to you, but usually people consider consecutive elements in memory to be "a row").
Pay attention to the byte sizes of the basic types; you are mixing them up everywhere (for example, loading a WORD element into the 32-bit EAX and advancing the pointers by 1 byte instead of 2). Reread a lesson or tutorial on how qword/dword/word/byte differ, how to adjust your instructions to work with the correct memory size, how to calculate the address correctly, and what the size of EAX is and how to use its smaller parts.
If you have trouble figuring it out on your own:
displacement = (y * 3 + x) * 2, where *2 is because each element is a WORD occupying two bytes, and y * 3 because a single row is 3 elements long.
In assembly instructions, that may be achieved, for example, as follows.
If [x,y] is in [eax,ebx], this calculation can be done as:
lea esi,[ebx+ebx*2] ; esi = y*3
lea esi,[esi+eax] ; esi = y*3+x
mov ax,[a1+esi*2] ; loads the [x,y] element from a1
Once you know how to calculate the address of a particular element, you can either write a loop that does the full calculation before each element load, or just work out in your head how the addresses differ: compute the address of the first element (start of the row/column), then use a mov plus two adds with hardcoded offsets for the next two elements (a loop over 3 elements is more trouble than writing the unrolled code), repeat this for all three columns/rows, and store the results.
By the way, is that call DumpRegs not producing what you expected? It's a rather tedious way to debug code anyway; it may be worth spending a moment to get a debugger working.
I couldn't help myself and wrote it, as it's such a fun, short piece of code (but you will regret it later if you just copy it without dissecting it down to the atoms and understanding fully how it works):
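The displacement formula can be checked with a minimal C sketch, assuming row-major storage of 16-bit words as in the question. `word_at` reads an element through the raw byte offset (y * 3 + x) * 2 to show that the formula lands on the right element; `add_and_sum` and both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 3
#define COLS 3

/* Reads element [y][x] of a row-major matrix of 16-bit words using the
   byte displacement (y * COLS + x) * 2. */
static uint16_t word_at(const void *base, int y, int x)
{
    uint16_t value;
    memcpy(&value, (const unsigned char *)base + (y * COLS + x) * 2, sizeof value);
    return value;
}

/* Hypothetical helper: element-wise a3 = a1 + a2, then the row and column
   sums of a3, all as 16-bit words. */
void add_and_sum(const uint16_t a1[ROWS * COLS], const uint16_t a2[ROWS * COLS],
                 uint16_t a3[ROWS * COLS],
                 uint16_t row_sums[ROWS], uint16_t column_sums[COLS])
{
    for (int i = 0; i < ROWS * COLS; i++) {
        a3[i] = (uint16_t)(a1[i] + a2[i]);  /* one word per element, 2 bytes apart */
    }
    for (int y = 0; y < ROWS; y++) {
        row_sums[y] = 0;
        for (int x = 0; x < COLS; x++)
            row_sums[y] = (uint16_t)(row_sums[y] + word_at(a3, y, x));
    }
    for (int x = 0; x < COLS; x++) {
        column_sums[x] = 0;
        for (int y = 0; y < ROWS; y++)
            column_sums[x] = (uint16_t)(column_sums[x] + word_at(a3, y, x));
    }
}
```

With the question's a1 and a2, the row sums come out as 21, 28, 29 and the column sums as 19, 25, 34.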
column_sums: DW 0, 0, 0
row_sums: DW 0, 0, 0
...
; column sums
lea esi,[a3] ; already summed elements of a1 + a2
lea edi,[column_sums]
mov ecx,3 ; three columns to sum
sum_column:
mov ax,[esi] ; first element of column
add ax,[esi+6] ; 1 line under first
add ax,[esi+12] ; 2 lines under
mov [edi],ax ; store result
add esi,2 ; next column, first element
add edi,2 ; next result
dec ecx
jnz sum_column
; row sums
lea esi,[a3] ; already summed elements of a1 + a2
lea edi,[row_sums]
mov ecx,3 ; three rows to sum
sum_row:
mov ax,[esi] ; first element of row
add ax,[esi+2] ; +1 column
add ax,[esi+4] ; +2 column
mov [edi],ax ; store result
add esi,6 ; next row, first element
add edi,2 ; next result
dec ecx
jnz sum_row
...
(I didn't debug it, so bugs are possible. Also, this expects a3 to contain correct element sums, which your original code will not produce, so you have to fix that first; this code contains a lot of hints on how to fix each problem in the original.)
Now I feel guilty about taking the fun of writing this away from you... never mind, I'm sure you can find a few more tasks to practice on. The question is whether you got the principle of it. If not, ask which part is confusing and explain how you currently understand it.
No no no no, this top answer is so terrible...
First, we have a big memory access issue.
Change your array access to: memtype PTR [memAddr + index*memSize]
The index must be in a dword-sized (32-bit) register, and the scale factor (memSize) can be 1, 2, 4 or 8, so an expression written like the one above is valid x86 addressing.
memtype = BYTE, WORD or DWORD (without it, the size is taken from the other operand, e.g. mov eax,[esi] is a dword access because EAX is 32 bits)
index = position in the array
memSize: BYTE = 1, WORD = 2, DWORD = 4
If you do not do this, every memory access in your code is a dword access, so you may read past the end of the array, and you will most definitely not get the correct values because each access mixes data from different elements (a dword = word + word, so when you only want one word you have to use WORD PTR; otherwise you get both words combined, and who knows what that value will be).
Your element type is WORD, but you're also trying to put it in a dword register. You can use the word-sized part of EAX (AX) instead, or use movzx to zero-extend it into EAX if you want to use the whole register.
Next, accessing the array in different orders (by row and by column):
This part should be fairly obvious if you have done any basic coding; I think the memory-size problem above is the main issue.
It's just normal array indexing, starting from 0,
so you access element [row * colSize + col], where colSize is the number of columns (then scale by the element size to get the byte address),
and how to advance your loop should be fairly self-explanatory.
I'm having another issue with addition on the 6502...
I am attempting to add two n-byte integers to produce an n-byte result. I'm not sure I understand the 6502 as well as I should for this project, so any feedback on my current code would be extremely helpful.
I know I am supposed to be using INX (increment the x register) and DEY (decrement the y register) but I am unsure of the placement of the opcodes.
Description:
Add two n-byte integers using absolute indexed addressing
The addends start at memory locations $xxxx, $yyyy, answer is at $zzzz
Byte length of the integers is at $AAAA (¢—>256)
START = $0500
CLC
____
loop LDA $0400, x
ADC $0410, x
STA $0412, x
____
BNE loop
BRK
LDA, ADC, and STA are outside the loop (first time using loops in assembly)
EDIT:
Variables
A1 = $0600
B1 = $0700
B2 = $0800
Z1 = $0900
[START] = $0500
CLC 18
LDX AE
LDY A1 AC
loop: LDA B1, x BD
ADC B2, x 7D
STA Z1, x 9D
INX E8
DEY 88
BNE loop D0
;Adding two n-byte integers using absolute indexed addressing
;The addends start at memory locations $xxxx, $yyyy, answer is at $zzzz
;Byte length of the integers is at $AAAA (¢—>256)
CLC
LDX #0 ; start at the beginning
LDY $AAAA ; load length into Y
loop: LDA $xxxx, X ; load first operand
ADC $yyyy, x ; add second operand
STA $zzzz, x ; store result
INX ; go on to next byte
DEY ; count how many are left
BNE loop ; if more, do more
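This loop works because INX and DEY do not touch the carry flag, so the carry out of each ADC flows into the next byte. A minimal C sketch of the same multi-byte addition, assuming the integers are stored least-significant byte first (which is what walking X upwards from 0 implies); the name `add_n_bytes` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: z = x + y for n-byte little-endian integers.
   Returns the final carry (1 if the sum does not fit in n bytes). */
unsigned add_n_bytes(const uint8_t *x, const uint8_t *y, uint8_t *z, size_t n)
{
    unsigned carry = 0;                      /* CLC */
    for (size_t i = 0; i < n; i++) {         /* i plays the role of X; n the count in Y */
        unsigned sum = x[i] + y[i] + carry;  /* LDA x,X / ADC y,X */
        z[i] = (uint8_t)sum;                 /* STA z,X keeps the low 8 bits */
        carry = sum >> 8;                    /* carry into the next byte */
    }
    return carry;
}
```

For example, adding 0x00FFFF and 0x000001 as 3-byte little-endian values gives 0x010000 with no final carry.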