I'm trying to figure out a way to implement the Fibonacci sequence using a 68HC11 IDE that uses a Motorola as11 assembler.
I've done it using 2-byte unsigned variables in little-endian format; now I'm attempting to change it to use 4-byte variables in big-endian format.
My pseudo-code (written in C):
RESULT = 1;
PREV = 1;
COUNT = N;
while (COUNT > 2) {
NEXT = RESULT + PREV;
PREV = RESULT;
RESULT = NEXT;
COUNT--;
}
I'll include some of my current assembly code. Note that COUNT is a 1-byte unsigned int, and PREV, NEXT, and RESULT are 2-byte unsigned ints. N is unsigned and set to 10.
ORG $C000
LDD #1
STD RESULT
STD PREV
LDAA N
STAA COUNT
WHILE LDAA COUNT
CMPA #2
BLS ENDWHILE
LDD RESULT
ADDD PREV
STD NEXT
LDD RESULT
STD PREV
LDD NEXT
STD RESULT
DEC COUNT
BRA WHILE
ENDWHILE
DONE BRA DONE
END
The issue I'm having is altering this code (other than the obvious variable changes and declarations) now that N starts at 40 instead of 10. Would changing my pseudo-code to use pointers let me map it more directly onto big-endian variables? Since this is in little-endian, I assume I have to alter some of the branches. Yes, this is an assignment for class; I'm not looking for the code, just some guidance.
Thank you!
(Your problem description is a bit vague as to what your actual problem is, so I may be guessing a bit.)
BTW, 68HC11 is big-endian.
The 68HC11's widest accumulator is the 16-bit D register (A and B combined), so as soon as your result overflows 16 bits, you need to do the math in pieces.
I suppose you mean that by changing N from 10 to 40 your fibonacci number becomes too big to be stored in a 16-bit variable.
Whether or not you use pointers is irrelevant to your problem, as you can solve it either way. For example, you can use a pointer to tell your routine where to store the result.
Depending on your maximum expected result, you need to adjust your routine. I will assume you won't need more than a 32-bit result (N=47 => 2971215073).
Here's a partially tested but unoptimized possibility (using ASM11 assembler):
STACKTOP equ $1FF
RESET_VECTOR equ $FFFE
org $100 ;RAM
result rmb 4
org $d000 ;ROM
;*******************************************************************************
; Purpose: Return the Nth fibonacci number in result
; Input : X -> 32-bit result
; : A = Nth number to calculate
; Output : None
; Note(s):
GetFibonacci proc
push ;macro to save D, X, Y
;--- define & initialize local variables
des:4 ;allocate 4 bytes on stack
tmp## equ 5 ;5,Y: temp number
ldab #1
pshb
clrb
pshb:3
prev## equ 1 ;1,Y: previous number (initialized to 1)
psha
n## equ 0 ;0,Y: N
;---
tsy ;Y -> local variables
clra
clrb
std ,x
std prev##,y
ldd #1
std 2,x
std prev##+2,y
Loop## ldaa n##,y
cmpa #2
bls Done##
ldd 2,x
addd prev##+2,y
std tmp##+2,y
ldaa 1,x
adca prev##+1,y
staa tmp##+1,y
ldaa ,x
adca prev##,y
staa tmp##,y
ldd ,x
std prev##,y
ldd 2,x
std prev##+2,y
ldd tmp##,y
std ,x
ldd tmp##+2,y
std 2,x
dec n##,y
bra Loop##
Done## ins:9 ;de-allocate all locals from stack
pull ;macro to restore D, X, Y
rts
;*******************************************************************************
; Test code
;*******************************************************************************
Start proc
ldx #STACKTOP ;setup our stack
txs
ldx #result
ldaa #40 ;Nth fibonacci number to get
bsr GetFibonacci
bra * ;check 'result' for answer
org RESET_VECTOR
dw Start
I am practicing using arrays and loops, and I am trying to have the user enter fewer than 100 characters at the console to fill up my array. The user can press ENTER whenever they are done entering however many characters they want, and the program will print out what they entered.
The program works but I am wondering how the program checks to see if the user press ENTER.
I have it so the program adds #-10 to the input character, and ENTER is x0A, which is 10 in decimal. I'm assuming that once the program detects this, the result is 0, which is false and exits the loop. That is my thought process.
Also, how would I change my code to make it so I can have the exit character be anything?
.orig x3000
LD R1,DATA_PTR ;load the memory address of array into R1
DO_WHILE_LOOP
GETC ;read characters into R0
OUT ;print R0 onto console as ASCII
STR R0,R1, #0 ;stores into memory location in R1
ADD R1,R1, #1 ;increment to next memory address
ADD R0,R0,#-10 ;subtract 10 from the input character to check whether it is ASCII #10
BRp DO_WHILE_LOOP
LD R0,newline
OUT
LD R1,DATA_PTR
DO_WHILE_LOOP2
LDR R0,R1,#0 ;load the character at address R1 into R0
OUT ;print
ADD R2,R0,#0 ;move R0 to R2
LD R0,newline ;newline
OUT ;print
ADD R1,R1,#1 ;increment
ADD R2,R2,#-10 ;check if printed character is enter ASCII #10
BRp DO_WHILE_LOOP2 ;if not print next character(loop)
HALT
;Data
DATA_PTR .FILL ARRAY ;DATA_PTR gets the beginning of the ARRAY
newline .FILL x0A
ARRAY .BLKW #100
.END
I'm assuming once the program detects this the result is 0 which is false and exits the loop.
It's not that the result 0 here has the meaning of "false"; rather, the difference between the input character and 10 is 0, meaning the character was exactly 0xA (10 decimal).
NB: the use of BRp can probably be considered a bug, though with the usual simulators I've had trouble entering a character whose ASCII value is smaller than 10.
In high-level language terms, what it is saying is:
do { ... } while ( in > 10 );
Though using BRnp would mean:
do { ... } while ( in != 10 );
which is more specific to newline.
If you want a different terminal character, change the value subtracted to the value of another character.
LC-3 does not offer subtraction, but it can "add" a negative number. However, the immediate form of ADD cannot add a number smaller than -16. So, if you want to check for an ASCII character whose value is larger than 16, you'll have to use the register form of ADD instead, with another instruction to load that register with the value, usually from a labeled constant declared with .FILL and the (negated) value you want.
Instead of:
ADD R2,R2,#-10 ;check if printed character is enter ASCII #10
BRp DO_WHILE_LOOP2
Do something like the following:
LD R3, value ; load the negated value to compare against
ADD R2, R2, R3 ; subtract them
BRnp DO_WHILE_LOOP2
...
...
value .FILL #-65 ; letter A, negated.
In LC-3 the ADD instruction sets condition codes.
There are three condition codes: N for negative, Z for zero, and P for positive. If you add zero to some register, the addition sets those flags (condition codes) as follows: N if the original value was negative, Z if it was zero, P if it was positive; that is, < 0, = 0, > 0.
If we use ADD to add a non-zero value X, negated (that is, -X), to a register value V, we get flags that tell us:
N = (V < X) i.e. N is true if V < X,
Z = (V = X) i.e. Z is true if V = X, and,
P = (V > X) i.e. P is true if V > X
(all ignoring possibilities for overflow).
The BR instruction can then test flags as follows:
To change the flow of control on a given relation, use:
| Relation | Idea | Opcode |
|----------|-------|--------|
| < | N | BRn |
| >= | not N | BRzp |
| = | Z | BRz |
| != | not Z | BRnp |
| > | P | BRp |
| <= | not P | BRnz |
I am currently working on a project that involves bare-metal programming on an STM8 microcontroller using the SDCC compiler on Linux. The memory in the chip is quite limited, so I'm trying to keep things really lean. I have gotten by with 8-bit and 16-bit variables, and things have gone well. But recently I ran into a problem where I really needed a float variable. So I wrote a function that takes a 16-bit value, converts it to a float, does the math I need, and returns an 8-bit number. This caused my final compiled code on the MCU to grow from 1198 bytes to 3462 bytes. I understand that floating point is memory-intensive and that many functions may need to be called to handle a floating-point number, but it seems crazy for the program size to increase by that much. I would like some help understanding why this is and what happened exactly.
Specs: MCU stm8151f2
Compiler: SDCC with --opt-code-size option
int roundNo(uint16_t bit_input)
{
float num = (((float)bit_input) - ADC_MIN)/124.0;
return num < 0 ? num - 0.5 : num + 0.5;
}
To determine why the code is so large on your particular tool chain, you would need to look at the generated assembly code, and see what FP support calls it makes, then look at the map file to determine the size of each of those functions.
As an example, on Godbolt for AVR using GCC 5.4.0 with -Os (Godbolt does not support STM8 or SDCC, so this is for comparison as an 8-bit architecture), your code generates 6364 bytes compared to 4081 bytes for an empty function. So the additional code required for the function body is 2283 bytes. Accounting for the fact that you are using both a different compiler and a different architecture, these are not that different from your results. See in the generated code (below) the rcalls to subroutines such as __divsf3: these are where the bulk of the code will be, and I suspect FP division is by far the largest contributor.
roundNo(unsigned int):
push r12
push r13
push r14
push r15
mov r22,r24
mov r23,r25
ldi r24,0
ldi r25,0
rcall __floatunsisf
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,lo8(69)
rcall __subsf3
ldi r18,0
ldi r19,0
ldi r20,lo8(-8)
ldi r21,lo8(66)
rcall __divsf3
mov r12,r22
mov r13,r23
mov r14,r24
mov r15,r25
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,0
rcall __ltsf2
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,lo8(63)
sbrs r24,7
rjmp .L6
mov r25,r15
mov r24,r14
mov r23,r13
mov r22,r12
rcall __subsf3
rjmp .L7
.L6:
mov r25,r15
mov r24,r14
mov r23,r13
mov r22,r12
rcall __addsf3
.L7:
rcall __fixsfsi
mov r24,r22
mov r25,r23
pop r15
pop r14
pop r13
pop r12
ret
You need to perform the same analysis on the code generated by your tool chain to answer your question. No doubt SDCC is capable of generating an assembly listing and a map file which will allow you to determine exactly what code and FP support is being generated and linked.
Ultimately though your use of FP in this case is entirely unnecessary:
int roundNo(uint16_t bit_input)
{
int s = (bit_input - ADC_MIN) ;
s += s < 0 ? -62 : 62 ;
return s / 124 ;
}
At Godbolt 2283 bytes compared to an empty function. Still somewhat large, but the issue there most likely is that the AVR lacks a DIV instruction so calls __divmodhi4. STM8 has a DIV for 16 bit dividend and 8 bit divisor, so it will likely be significantly smaller (and faster) on your target.
OK, a version of fixed point that actually works:
#include <stdint.h>
#include <stdio.h>
// Assume a 28.4 format for math, so FIXED must be 32 bits (int is only 16 bits
// on SDCC/STM8). 12.4 can be used, but roundoff may occur.
typedef int32_t FIXED;
// Input should be a literal float or an integer expression. (For a literal float,
// the multiply here will be handled by the compiler and not generate FP asm code.)
// The argument is parenthesized so that TO_FIXED(a - b) scales the whole difference.
#define TO_FIXED(x) ((FIXED)((x) * 16))
// Takes a fixed and converts to an int - should turn into a right shift by 4
// (plus a sign fix-up for negative values).
#define TO_INT(x) ((int)((x) / 16))
const uint16_t ADC_MIN = 32768;
int roundNo(uint16_t bit_input)
{
FIXED num = TO_FIXED((FIXED)bit_input - ADC_MIN) / 124;
num += num < 0 ? TO_FIXED(-0.5) : TO_FIXED(0.5);
return TO_INT(num);
}
int main()
{
printf("%d", roundNo(0));
return 0;
}
Note that we are using some 32-bit values here, so the code will be bigger than with your current 16-bit values. With care, though, it could possibly be converted back to a 12.4 (16-bit int) format if roundoff and overflow are managed carefully.
Or go grab a better, full-featured fixed-point library from the web :)
(Update) After writing this, I noticed that @Clifford mentioned that your microcontroller supports a DIV instruction natively, in which case doing this is redundant. Anyway, I will leave it as a concept that can be applied in cases where DIV is implemented as an extern call, or where DIV takes too many cycles and the goal is to make the calculation faster.
Anyway, shifting and adding is likely to be faster than division, if you ever need to squeeze some extra cycles. So if you start from the fact that 124 is almost equal to 4096/33 (the error factor is 0.00098, i.e. 0.098%, so less than 1 in 1000), you can implement the division with a single multiplication with 33 and a shift by 12 bits (division by 4096). Furthermore, 33 is 32+1, meaning multiplying by 33 is equal to shifting left by 5 and adding the input again.
Example: you want to divide 5000 by 124, and 5000/124 is approx. 40.323. What we will be doing is:
5,000 << 5 = 160,000
160,000 + 5,000 = 165,000
165,000 >> 12 = 40
Note that this only works for positive numbers. Also note that, if you're really doing lots of multiplications all over the code, then having a single extern mul or div function might result in smaller overall code in the long run, especially if the compiler is not particularly good at optimizing. And if the compiler can just emit a DIV instruction here, then the only thing you can get is a tiny bit of speed improvement, so don't bother with this.
#include <stdint.h>
#define ADC_MIN 2048
uint16_t roundNo(uint16_t bit_input)
{
// input too low, return zero
if (bit_input < ADC_MIN)
return 0;
bit_input -= (ADC_MIN - 62);
uint32_t x = bit_input;
// this gets us x = x * 33
x <<= 5;
x += bit_input;
// this gets us x = x / 4096
x >>= 12;
return (uint16_t)x;
}
With GCC for AVR and size optimizations, all calls to extern mul or div functions are gone, but it seems AVR doesn't support shifting multiple bits in a single instruction (the compiler emits loops that shift 5 times and 12 times respectively). I don't know what your compiler will do.
If you also need to handle the bit_input < ADC_MIN case, I would handle this part separately, i.e.:
#include <stdint.h>
#include <stdbool.h>
#define ADC_MIN 2048
int16_t roundNo(uint16_t bit_input)
{
// if subtraction would result in a negative value,
// handle it properly
bool negative = (bit_input < ADC_MIN);
bit_input = negative ? (ADC_MIN - bit_input) : (bit_input - ADC_MIN);
// we are always positive from this point on
bit_input += 62; // add half of 124 so the division below rounds to nearest
uint32_t x = bit_input;
x <<= 5;
x += bit_input;
x >>= 12;
return negative ? -(int16_t)x : (int16_t)x;
}
I'm trying to learn HCS12 assembly language, but there are not enough examples on the internet. I've tried to write the code, without success, and I'm stuck. It's absolutely not homework. Can someone write it in HCS12 assembly language with comments? I want code because I really want to read it step by step. By the way, is there a simpler way to define an array?
;The array arr will be located at $1500 and the contents {2, 5, 6, 16, 100, 29, 60}
sum = 0;
for i = 0 : 6
x = arr[i];
if( x < 50 )
sum = sum + x
end
My try:
Entry:
;2,5,6,16,100,39,60
LDAA #2
STAA $1500
LDAA #5
STAA $1501
LDAA #6
STAA $1502
LDAA #16
STAA $1503
LDAA #100
STAA $1504
LDAA #39
STAA $1505
LDAA #60
STAA $1506
CLRA ; 0 in accumulator A
CLRB ; 0 in accumulator B
ADDB COUNT ; B accumulator has 6
loop:
;LDAA 1, X+ ; 1500 should be x because it should increase up to 0 from 6
; A accumulator has 2 now
BLO 50; number less than 50
;ADDA
DECB
BNE loop
Below is one possible way to implement your specific FOR loop.
It's mostly for the HC11, which is source-level compatible with the HCS12, so it should also assemble correctly for the HCS12. However, the HCS12 has some extra instructions and addressing modes (e.g., indexed auto-increment) that can make the code a bit shorter and even more readable. Anyway, I haven't actually tried this, but it should be OK.
BTW, your code shows a fundamental misunderstanding of certain instructions. For example, BLO 50 does not mean branch if the accumulator is below 50. It means check the appropriate CCR (Condition Code Register) flags, which should already have been set by some previous instruction, and branch to address 50 (obviously not what you intended) if they indicate the value was lower than the target. To compare a register to a value or some memory location, you must use the CMPx instructions (e.g., CMPA).
;The array arr will be located at $1500 and the contents {2, 5, 6, 16, 100, 29, 60}
org $1500 ;(somewhere in ROM)
arr fcb 2,5,6,16,100,29,60 ;as bytes (use dw if words)
org $100 ;wherever your RAM is
;sum = 0;
sum rmb 2 ;16-bit sum
org $8000 ;wherever your ROM is
;for i = 0 : 6
clrb ;B is your loop counter (i)
stb sum ;initialize sum to zero (MSB)
stb sum+1 ; -//- (LSB)
ForLoop cmpb #6 ;compare against terminating value
bhi ForEnd ;if above, exit FOR loop
; x = arr[i];
ldx #arr ;register X now points to array
abx ;add offset to array element (byte size assumed)
ldaa ,x ;A is your target variable (x)
;;;;;;;;;;;;;;;;;;; ldaa b,x ;HCS12 only version (for the above two HC11-compatible lines)
inx ;X points to next value for next iteration
;;;;;;;;;;;;;;;;;;; ldaa 1,x+ ;HCS12 only version (for the above two HC11-compatible lines)
; if( x < 50 )
cmpa #50
bhs EndIf
; sum = sum + x
adda sum+1
staa sum+1
ldaa sum
adca #0
staa sum
EndIf
incb ;(implied i = i + 1 at end of loop)
bra ForLoop
;end
ForEnd
The above assumes your array is constant, so it is placed somewhere in ROM at assembly time. If your array is dynamic, it should be located in RAM, and you would need to use code to load it (similar to how you did). However, for efficiency, a loop is usually used when loading (copying) multiple values from one location to another. This is both more readable and more efficient in terms of needed code memory.
Hope this helps.
Edited: Forgot to initialize SUM to zero.
Edited: Unlike in the HC08, a CLRA in HC11 clears the Carry so the sequence CLRA, ADCA is wrong. Replaced with correct one: LDAA, ADCA #0
NOTE This is a theoretical question. I'm happy with the performance of my actual code as it is. I'm just curious about whether there is an alternative.
Is there a trick to do an integer division of a constant value, which is itself an integer power of two, by an integer variable value, without having to do an actual divide operation?
// The fixed value of the numerator
#define SIGNAL_PULSE_COUNT 0x4000UL
// The division that could use a neat trick.
uint32_t signalToReferenceRatio(uint32_t referenceCount)
{
// Promote the numerator to a 64 bit value, shift it left by 32 so
// the result has an adequate number of bits of precision, and divide
// by the denominator (the reference count).
return (uint32_t)((((uint64_t)SIGNAL_PULSE_COUNT) << 32) / referenceCount);
}
I've found several (lots) of references for tricks to do division by a constant, both integer and floating point. For example, the question What's the fastest way to divide an integer by 3? has a number of good answers including references to other academic and community materials.
Given that the numerator is constant, and it's an integer power of two, is there a neat trick that could be used in place of doing an actual 64 bit division; some kind of bit-wise operation (shifts, AND, XOR, that kind of stuff) or similar?
I don't want any loss of precision (beyond a possible half bit due to integer rounding) greater than that of doing the actual division, as the precision of the instrument relies on the precision of this measurement.
"Let the compiler decide" is not an answer, because I want to know if there is a trick.
Extra, Contextual Information
I'm developing a driver on a 16 bit data, 24 bit instruction word micro-controller. The driver does some magic with the peripheral modules to obtain a pulse count of a reference frequency for a fixed number of pulses of a signal frequency. The required result is a ratio of the signal pulses to the reference pulse, expressed as an unsigned 32 bit value. The arithmetic for the function is defined by the manufacturer of the device for which I'm developing the driver, and the result is processed further to obtain a floating point real-world value, but that's outside the scope of this question.
The microcontroller I'm using has a Digital Signal Processor that has a number of division operations I could use, and I'm not afraid to do so if necessary. There would be some minor challenges to overcome with this approach, beyond putting together the assembly instructions to make it work, such as the DSP already being used for a PID function in a BLDC driver ISR, but nothing I can't manage.
You cannot use clever mathematical tricks to not do a division, but you can of course still use programming tricks if you know the range of your reference count:
Nothing beats a pre-computed lookup table in terms of speed.
There are fast approximate square root algorithms (probably already in your DSP), and you can improve the approximation by one or two Newton-Raphson iterations. If doing the computation with floating-point numbers is accurate enough for you, you can probably beat a 64bit integer division in terms of speed (but not in clarity of code).
You mentioned that the result will be converted to floating-point later; it might be beneficial not to compute the integer division at all, but to use your floating-point hardware.
I worked out a Matlab version, using fixed point arithmetic.
This method assumes that an integer version of log2(x) can be calculated efficiently, which is true for the dsPIC30/33F and TI C6000, which have instructions to detect the most significant 1 of an integer.
For this reason, this code has a strong ISA dependency, cannot be written in portable/standard C, and can be improved using instructions like multiply-and-add and multiply-and-shift, so I won't try translating it to C.
nrdiv.m
function [ y ] = nrdiv( q, x, lut)
% assume q>31, lut = 2^31/[1,1,2,...255]
p2 = ceil(log2(x)); % available in TI C6000, instruction LMBD
% available in Microchip dsPIC30F/33F, instruction FF1L
if p2<8
pre_shift=0;
else
pre_shift=p2-8;
end % shr = (p2-8)>0?(p2-8):0;
xn = shr(x, pre_shift); % xn = x>>pre_shift;
y = shr(lut(xn), pre_shift); % y = lut[xn]>>pre_shift;
y = shr(y * (2^32 - y*x), 30); % basic iteration
% step up from q31 to q32
y = shr(y * (2^33 - y*x), (64-q)); % step up from q32 to desired q
if q>39
y = shr(y * (2^(1+q) - y*x), (q)); % when q>40, additional
% iteration is required,
end % no step up is performed
end
function y = shr(x, r)
y=floor(x./2^r); % simulate operator >>
end
test.m
test_number = (2^22-12345);
test_q = 48;
lut_q31 = round(2^31 ./ [1,[1:1:255]]);
display(sprintf('tested 2^%d/%d, diff=%f\n',test_q, test_number,...
nrdiv( 39, (2^22-5), lut_q31) - 2^39/(2^22-5)));
sample output
tested 2^48/4181959, diff=-0.156250
reference:
Newton–Raphson division
A little late but here is my solution.
First some assumptions:
Problem:
X=N/D, where N is a constant and a power of 2.
All 32 bit unsigned integers.
X is unknown but we have a good estimate
(previous but no longer accurate solution).
An exact solution is not required.
Note: due to integer truncation this is not an accurate algorithm!
An iterative solution is okay (improves with each loop).
Division is much more expensive than multiplication:
For 32bit unsigned integer for Arduino UNO:
'+/-' ~0.75us
'*' ~3.5us
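'/' ~36us
We seek to replace the division. Basically, let's start with Newton's method:
$$X_{\text{new}} = X_{\text{old}} - \frac{f(X_{\text{old}})}{f'(X_{\text{old}})}$$
where f(X) = 0 for the solution we seek. Choosing f(X) = 1/X - D/N (so f'(X) = -1/X^2) and solving, I get:
$$X_{\text{new}} = \frac{X_{\text{old}}\,(C - X_{\text{old}}\,D)}{N}$$
where C = 2N.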
First trick:
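Now that the numerator (constant) has become a divisor (constant), one solution (which does not require N to be a power of 2) is:
$$X_{\text{new}} = \left(X_{\text{old}}\,(C - X_{\text{old}}\,D) \cdot A\right) \gg M$$
where C = 2N, and A and M are constants chosen so that multiplying by A and shifting right by M approximates dividing by N (look for dividing-by-a-constant tricks).
Or, staying with Newton's method and taking N = 2^M:
$$X_{\text{new}} = \left(X_{\text{old}}\,(C - X_{\text{old}}\,D)\right) \gg M$$
where C = 2N = 2 << M, and M is the power.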
So I have 2 '*' (7.0us), a '-' (0.75us) and a '>>' (0.75us?) or 8.5us total (rather than 36us), excluding other overheads.
Limitations:
As the data type is 32 bit unsigned, 'M' should not exceed 15 else there will be problems with overflow (you can probably get around this using a 64bit intermediate data type).
N>D (else the algorithm blows up! at least with unsigned integer)
Obviously the algorithm will also work with signed and float data types.
#include <stdio.h>
int main(void)
{
unsigned long c,d,m,x,prev;
// x=n/d where n=1<<m
m=15;
c=2UL<<m; // C = 2N
d=10;
x=10; // initial estimate
// iterate until the estimate stops changing (for d=10 it should settle at 3276, i.e. 32768/10 truncated)
do
{
prev=x;
x=x*(c-d*x)>>m;
printf("%lu\n",x);
} while (x!=prev);
return(0);
}
Having tried many alternatives, I ended up doing normal binary long division in assembly language. However, the routine does use a few optimisations that bring the execution time down to an acceptable level.
/*
* Converts the reference frequency count for a specific signal frequency
* to a ratio.
* Xs = Ns * 2^32 / Nr
* Where:
* 2^32 is a constant scaling so that the maximum accuracy can be achieved.
* Ns is the number of signal counts (fixed at 0x4000 by hardware).
* Nr is the number of reference counts, passed in W1:W0.
* @param W1:W0 The number of reference frequency pulses.
* @return W1:W0 The scaled ratio.
*/
.align 2
.global _signalToReferenceRatio
.type _signalToReferenceRatio, #function
; This is the position of the most significant bit of the fixed Ns (0x4000).
.equ LOG2_DIVIDEND, 14
.equ DIVISOR_LIMIT, LOG2_DIVIDEND+1
.equ WORD_SIZE, 16
_signalToReferenceRatio:
; Create a dividend, MSB-aligned with the divisor, in W2:W3 and place the
; number of iterations required for the MSW in [W14] and the LSW in [W14+2].
LNK #4
MUL.UU W2, #0, W2
FF1L W1, W4
; If MSW is zero the argument is out of range.
BRA C, .returnZero
SUBR W4, #WORD_SIZE, W4
; Find the number of quotient MSW loops.
; This is effectively 1 + log2(dividend) - log2(divisor).
SUBR W4, #DIVISOR_LIMIT, [W14]
BRA NC, .returnZero
; Since the SUBR above is always non-negative and the C flag set, use this
; to set bit W3<W5> and the dividend in W2:W3 = 2^(16+W5) = 2^log2(divisor).
BSW.C W3, W4
; Use 16 quotient LSW loops.
MOV #WORD_SIZE, W4
MOV W4, [W14+2]
; Set up W4:W5 to hold the divisor and W0:W1 to hold the result.
MOV.D W0, W4
MUL.UU W0, #0, W0
.checkLoopCount:
; While the bit count is non-negative ...
DEC [W14], [W14]
BRA NC, .nextWord
.alignQuotient:
; Shift the current quotient word up by one bit.
SL W0, W0
; Subtract divisor from the current dividend part.
SUB W2, W4, W6
SUBB W3, W5, W7
; Check if the dividend part was less than the divisor.
BRA NC, .didNotDivide
; It did divide, so set the LSB of the quotient.
BSET W0, #0
; Shift the remainder up by one bit, with the next zero in the LSB.
SL W7, W3
BTSC W6, #15
BSET W3, #0
SL W6, W2
BRA .checkLoopCount
.didNotDivide:
; Shift the next (zero) bit of the dividend into the LSB of the remainder.
SL W3, W3
BTSC W2, #15
BSET W3, #0
SL W2, W2
BRA .checkLoopCount
.nextWord:
; Test if there are any LSW bits left to calculate.
MOV [++W14], W6
SUB W6, #WORD_SIZE, [W14--]
BRA NC, .returnQ
; Decrement the remaining bit counter before writing it back.
DEC W6, [W14]
; Move the working part of the quotient up into the MSW of the result.
MOV W0, W1
BRA .alignQuotient
.returnQ:
; Return the quotient in W0:W1.
ULNK
RETURN
.returnZero:
MUL.UU W0, #0, W0
ULNK
RETURN
.size _signalToReferenceRatio, .-_signalToReferenceRatio
I'm trying to copy array A into array N and then print the array (to test that it has worked) but all it outputs is -1
Here is my code:
ORG $1000
START: ; first instruction of program
clr.w d1
movea.w #A,a0
movea.w #N,a2
move.w #6,d2
for move.w (a0)+,(a2)+
DBRA d2,for
move.w #6,d2
loop
move.l (a2,D2),D1 ; get number from array at index D2
move.b #3,D0 ; display number in D1.L
trap #15
dbra d2,loop
SIMHALT ; halt simulator
A dc.w 2,2,3,4,5,6
N dc.l 6
END START ; last line of source
Why is -1 the only output? If there is a better solution for this, that would be very helpful.
Since I don't have access to whatever assembler/simulator you're using, I can't actually test it, but here a few things (some of which are already noted in the comments):
dc.l declares a single long; you want ds.l (or similar) to allocate storage for 6 longs
dbra branches until the operand is equal to -1, so you'll probably want to turn
movw #loop_times, d0
loop
....
dbra d0, loop
into
movw #loop_times-1, d0
loop
....
dbra d0, loop
(this works as long as loop_times is > 0, otherwise you'll have to check the condition before entering the loop)
Your display loop has a few problems: 1. On entry, a2 points past the end of the N array. 2. Even fixing that, the way you're indexing it will cause problems. On the first entry you're trying to fetch a 4-byte long from address a2 + 6, then a long from a2 + 5...
What you want is to fetch longs from address a2 + 0, a2 + 4 .... One way of doing that:
move.w #6-1, d2 ; note the -1
movea.l #N, a2
loop
move.l (a2)+,D1 ; get next number from array
; use d1 here
dbra d2,loop
As already pointed out, your new array is only 4 bytes in size; you should change
dc.l 6 to ds.w 6
Also, you work on 7 elements, since DBRA counts down to -1.
Second, and that's why you get -1 everywhere, you use A2 as the pointer to the new array, but you do not reset it to point at the first word of the new array. Since you increased it by one word per element during the copy, after the for loop has completed, A2 points to the first word after the array.
Your simulator outputting more than one number with your display loop indicates that your simulator does not emulate an MC68000, a real MC68000 would take a trap at "MOVE.L (A2,D2),D1" as soon as the sum of A2+D2 is odd - the 68000 does not allow W/L sized accesses to odd addresses (MC68020 and higher do).
A cleaned MC68000 compatible code could look like this:
lea A,a0
lea N,a2
moveq #5,d2
for move.w (a0)+,(a2)+
dbra d2,for
lea N,a2
moveq #5,d2
loop
move.w (a2)+,D1 ; get number (16 bits only)
ext.l d1 ; make the number 32 bits
moveq #3,D0 ; display number in D1.L
trap #15
dbra d2,loop
It probably contains some instructions you haven't encountered yet.