Harmonic Average Yields incorrect Result - c

As a homework assignment we are required to calculate the harmonic mean using an assembly program being driven by a C program.
We are using 64-bit linux machines and are required to use 64-bit floating point numbers.
I am new to Assembly. I apologize for any bad coding practices or if my code is just flat out wrong.
The problem is that the function returns only the last number entered, printed in floating-point format. I do not know where the error is, although I suspect the addDen function.
For example, entering the numbers 5, 6, 7, 8 produces 8.0000.
Here is my code for the assembly program:
;Assembly function that computes the harmonic mean
;of an array of 64-bit floating-point numbers.
;Retrieves input using a C program.
;
;Harmonic mean is defined as n/((1/x1) + (1/x2) + ... + (1/xn))
;
; expects:
; RDI - address of array
; RSI - length of the array
; returns
; XMM0 - the harmonic average of array's values
global harmonicMean
section .data
Zero dd 0.0
One dd 1.0
section .text
harmonicMean:
push rbp
mov rbp, rsp ;C prologue
movss xmm10, [Zero] ;Holds tally of denominator
cvtsi2ss xmm0, rsi ;Take length and put it into xmm0 register
.whileLoop:
cmp rsi, 0 ;Is the length of array 0?
je .endwhile
call addDen ;Compute a denominator value and add it to sum
add rdi, 4 ;Add size of float to address
dec rsi ;Decrease the length
jmp .whileLoop
.endwhile:
divss xmm0, xmm10
leave
ret
;Calculates a number in the denominator
addDen:
push rdi
movss xmm8, [One]
movss xmm9, [rdi]
divss xmm8, xmm9
addss xmm10, xmm8
pop rdi
ret
In order to recreate the logic error, i will also include my driver program:
/*
* Harmonic Mean Driver
* Tyler Weaver
* 03-12-2014
*/
#include<stdio.h>
#define ARRAYSIZE 4
double harmonicMean(double *, unsigned);
int main(int argc, char **argv) {
int i;
double ary[ARRAYSIZE];
double hm;
printf("Enter %d f.p. values: ", ARRAYSIZE);
for (i = 0; i < ARRAYSIZE; i++) {
scanf(" %lf", &ary[i]);
}
hm = harmonicMean(ary, ARRAYSIZE);
printf("asm: harmonic mean is %lf\n", hm);
return 0;
}
Any help will be much appreciated!

Yes there seems to be float vs double confusion. You pass in a double array, but pretty much all of the asm code expects floats: you use the ss instructions and you assume size 4 and you return a float too.
– Jester
There was an issue with floats and doubles! I really appreciate both of your responses. I was confused because the instructor had told us to use floats in our assembly program, but he had used doubles in an example driver. I spoke with the instructor and he has fixed his instructions. Thank you again! – Tyler Weaver

Here is the algorithm. The listing below is Fortran, but it reads much like pseudocode.
My suggestion is to write this program in C,
then have the compiler output the corresponding assembly,
and use that output as a guide when writing your own program.
! ----------------------------------------------------------
! This program reads a series of input data values and
! computes their arithmetic, geometric and harmonic means.
! Since the geometric mean requires taking the n-th root, all
! input data items must be positive (a special requirement of
! this program, although it is not absolutely necessary).
! If an input item is not positive, it should be ignored.
! Since some data items may be ignored, this program also
! checks to see if no data items remain!
! ----------------------------------------------------------
PROGRAM ComputingMeans
IMPLICIT NONE
REAL :: X
REAL :: Sum, Product, InverseSum
REAL :: Arithmetic, Geometric, Harmonic
INTEGER :: Count, TotalNumber, TotalValid
Sum = 0.0 ! for the sum
Product = 1.0 ! for the product
InverseSum = 0.0 ! for the sum of 1/x
TotalValid = 0 ! # of valid items
READ(*,*) TotalNumber ! read in # of items
DO Count = 1, TotalNumber ! for each item ...
READ(*,*) X ! read it in
WRITE(*,*) 'Input item ', Count, ' --> ', X
IF (X <= 0.0) THEN ! if it is non-positive
WRITE(*,*) 'Input <= 0. Ignored' ! ignore it
ELSE ! otherwise,
TotalValid = TotalValid + 1 ! count it in
Sum = Sum + X ! compute the sum,
Product = Product * X ! the product
InverseSum = InverseSum + 1.0/X ! and the sum of 1/x
END IF
END DO
IF (TotalValid > 0) THEN ! are there valid items?
Arithmetic = Sum / TotalValid ! yes, compute means
Geometric = Product**(1.0/TotalValid)
Harmonic = TotalValid / InverseSum
WRITE(*,*) 'No. of valid items --> ', TotalValid
WRITE(*,*) 'Arithmetic mean --> ', Arithmetic
WRITE(*,*) 'Geometric mean --> ', Geometric
WRITE(*,*) 'Harmonic mean --> ', Harmonic
ELSE ! no, display a message
WRITE(*,*) 'ERROR: none of the input is positive'
END IF
END PROGRAM ComputingMeans

Related

C Float with a basic integer value giving different results

Okay, I have a simple question. I am exploring the largest numbers that the various data types can hold, and I was trying things like long int, double, float, etc.
But even a simple assignment such as float t = 123456789 gives me 123456792 as output.
Here's the code:
#include <stdio.h>
int main()
{
int x = 1234567891 ;
long int y = 9034567891234567899;
long long int z = 9034567891234567891;
float t = 123456789 ;
printf("%i \n%li \n%lli \n%f \n ",x,y,z,t);
}
and the output I'm getting is
1234567891
9034567891234567899
9034567891234567891
123456792.000000
I'm coding on Linux and using gcc. What could be the problem?
For clarity, if you give a larger number such as
float t = 123456789123456789
it gets the first 9 digits right, but the remaining digits are rounded in a way I did not expect:
1234567890519087104.000000
I could have understood it if I were working with fractional values like 0.00123, but these are plain integers, used only to find the limits of float.
As a visual and experiential learner, I would recommend taking a good look at how a floating-point number is represented in bits, with the help of an online converter such as https://www.h-schmidt.net/FloatConverter/IEEE754.html
Value: 123456789
Hexadecimal representation: 0x4ceb79a3
Binary representation: 01001100111010110111100110100011
sign (0) : +1
exponent(10011001) : 2^26
mantissa(11010110111100110100011): 1.8396495580673218
Value actually stored in float: 1.8396495580673218 * 2^26 = 123456792
Error due to conversion: 3
Here is a closer look on how the compiler actually does its job: https://gcc.godbolt.org/z/C4YyKe
int main()
{
float t = 123456789;
}
main:
push rbp
mov rbp, rsp
movss xmm0, DWORD PTR .LC0[rip]
movss DWORD PTR [rbp-4], xmm0
mov eax, 0
pop rbp
ret
.LC0:
.long 1290500515 //(0x4CEB79A3)
For your adventure seeking the largest value of each data type, you can explore the standard header files float.h and limits.h.
To find the largest contiguous integer value that can be round-tripped from integer to float to integer, the following experiment could be used:
#include <stdio.h>
int main()
{
long i = 0 ;
float fint = 0 ;
while( i == (long)fint )
{
i++ ;
fint = (float)i ;
}
printf( "Largest integer representable exactly by float = %ld\n", i - 1 ) ;
return 0;
}
However, the experiment is largely unnecessary, since the value is predictably 2^24 (16777216): a float has 23 stored mantissa bits plus one implicit leading bit, giving 24 bits of precision.

Why is using a third variable faster than an addition trick?

When computing fibonacci numbers, a common method is mapping the pair of numbers (a, b) to (b, a + b) multiple times. This can usually be done by defining a third variable c and doing a swap. However, I realised you could do the following, avoiding the use of a third integer variable:
b = a + b; // b2 = a1 + b1
a = b - a; // a2 = b2 - a1 = b1, Ta-da!
I expected this to be faster than using a third variable, since in my mind this new method should only have to consider two memory locations.
So I wrote the following C programs comparing the processes. These mimic the calculation of fibonacci numbers, but rest assured I am aware that they will not calculate the correct values due to size limitations.
(Note: I realise now that it was unnecessary to make n a long int, but I will keep it as it is because that is how I first compiled it)
File: PlusMinus.c
// Using the 'b=a+b;a=b-a;' method.
#include <stdio.h>
int main() {
long int n = 1000000; // Number of iterations.
long int a,b;
a = 0; b = 1;
while (n--) {
b = a + b;
a = b - a;
}
printf("%lu\n", a);
}
File: ThirdVar.c
// Using the third-variable method.
#include <stdio.h>
int main() {
long int n = 1000000; // Number of iterations.
long int a,b,c;
a = 0; b = 1;
while (n--) {
c = a;
a = b;
b = b + c;
}
printf("%lu\n", a);
}
When I run the two with GCC (no optimisations enabled) I notice a consistent difference in speed:
$ time ./PlusMinus
14197223477820724411
real 0m0.014s
user 0m0.009s
sys 0m0.002s
$ time ./ThirdVar
14197223477820724411
real 0m0.012s
user 0m0.008s
sys 0m0.002s
When I run the two with GCC with -O3, the assembly outputs are equal. (I suspect I had confirmation bias when stating that one just outperformed the other in previous edits.)
Inspecting the assembly for each, I see that PlusMinus.s actually has one less instruction than ThirdVar.s, but runs consistently slower.
Question
Why does this time difference occur? Not only at all, but also why is my addition/subtraction method slower contrary to my expectations?
Why does this time difference occur?
There is no time difference when compiled with optimizations (under recent versions of gcc and clang). For instance, gcc 8.1 for x86_64 compiles both to:
Live at Godbolt
.LC0:
.string "%lu\n"
main:
sub rsp, 8
mov eax, 1000000
mov esi, 1
mov edx, 0
jmp .L2
.L3:
mov rsi, rcx
.L2:
lea rcx, [rdx+rsi]
mov rdx, rsi
sub rax, 1
jne .L3
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
add rsp, 8
ret
Not only at all, but also why is my addition/subtraction method slower contrary to my expectations?
Adding and subtracting could be slower than just moving. However, on most architectures (e.g. an x86 CPU) the cost is basically the same (1 cycle plus the memory latency), so this does not explain it.
The real problem is, most likely, the dependencies between the data. See:
b = a + b;
a = b - a;
To compute the second line, you have to have finished computing the value of the first. If the compiler uses the expressions as they are (which is the case under -O0), that is what the CPU will see.
In your second example, however:
c = a;
a = b;
b = b + c;
You can compute both the new a and b at the same time, since they do not depend on each other. And, in a modern processor, those operations can actually be computed in parallel. Or, putting it another way, you are not "stopping" the processor by making it wait on a previous result. This is called Instruction-level parallelism.

How do I make scanf() scan in integers one digit at a time?

I want to scan in a date in the form of mm/dd, so I've written this code:
#include <stdio.h>
int main (void)
{
int start_date[4];
scanf("%d%d/%d%d", &start_date[0], &start_date[1], &start_date[2], &start_date[3]);
printf("%d%d%d%d\n", start_date[0], start_date[1], start_date[2], start_date[3]);
return 0;
}
But when I enter the following for example:
04/20
scanf() reads this
4-419644000
How do I make it so that when I print out each element in start_date, I get this:
0420
when I enter the input from earlier?
Reading one digit at a time
Use %1d:
int n = scanf("%1d%1d/%1d%1d", &start_date[0], &start_date[1], &start_date[2], &start_date[3]);
if (n != 4) { …handle error… }
Note that this will accept both:
19/96
and
1 9/
9
6
as valid inputs. If you need the digits to be contiguous, you have to work harder. In general, it is usually best to read a line with fgets() (or POSIX's getline()) and then parse the line with sscanf(). You can also consider checking the length of the string, etc.
What went wrong?
Incidentally, you said:
But when I enter the following for example:
04/20
scanf() reads this
4-419644000
What happens here is that the 04 is read by the first %d; the second %d fails because / doesn't match a number, so nothing is written to &start_date[1] (or the other two items), and scanf() returns 1. Since you didn't check the return value, you weren't aware of the problem. Note that the check should be as I showed (n != 4 where 4 is the number of items you expect to be converted). Checking for EOF would not work correctly; there wasn't an EOF on the file, but there was a conversion failure. Since you printed an uninitialized variable, the value you got was indeterminate ('undefined behaviour' in the jargon); a largish negative number is reasonable (as is any other value whatsoever, or a crash, or …). In fact, you have three indeterminate numbers smushed together, of which only the first is negative. Avoid undefined behaviour; always check that your input operations succeed — and don't use the results if they fail. (Or, at least, be very cautious about using the results if they fail; you need to be sure you know what's going on.)
The %d would consume the string "04". If you really want to read the input digit by digit, try %c, then convert each character to its digit value:
char start_date[4];
scanf("%c%c/%c%c", &start_date[0], &start_date[1], &start_date[2], &start_date[3]);
start_date[0] -= '0';
start_date[1] -= '0';
start_date[2] -= '0';
start_date[3] -= '0';
Alternatively, you can read mm and dd as integers and do arithmetic on them:
int mm, dd;
scanf("%d/%d", &mm, &dd);
int m0 = mm % 10;
int m1 = mm / 10;
int d0 = dd % 10;
int d1 = dd / 10;
Change the data type to char[], so your code looks like this:
#include <stdio.h>
int main (void)
{
char start_date[4];
scanf("%c%c/%c%c", &start_date[0], &start_date[1], &start_date[2], &start_date[3]);
printf("%c%c%c%c\n", start_date[0], start_date[1], start_date[2], start_date[3]);
return 0;
}
If you want to convert each character to an integer, subtract the ASCII code of '0' from it.
I think you should use strptime() instead. It's purpose-built for parsing standard date formats such as yours. Just read the entire "word" as a string and feed it to strptime().
This was too long for a comment, but you can see why this is happening here:
0x080484a7 <+58>: call 0x8048360 <__isoc99_scanf@plt>
0x080484ac <+63>: mov 0x2c(%esp),%ebx
0x080484b0 <+67>: mov 0x28(%esp),%ecx
0x080484b4 <+71>: mov 0x24(%esp),%edx
0x080484b8 <+75>: mov 0x20(%esp),%eax
0x080484bc <+79>: mov %ebx,0x10(%esp)
...
End of assembler dump.
(gdb) break *main+79
Breakpoint 4 at 0x80484bc
(gdb) r
Starting program: /root/test/testcode
04/21
Breakpoint 4, 0x080484bc in main ()
(gdb) i r
eax 0x4 4
ecx 0x80484fb 134513915
edx 0xb7fff000 -1207963648
ebx 0xb7fcc000 -1208172544
esp 0xbffff720 0xbffff720
ebp 0xbffff758 0xbffff758
esi 0x0 0
edi 0x0 0
eip 0x80484bc 0x80484bc <main+79>
eflags 0x286 [ PF SF IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) c
Continuing.
4-1207963648134513915-1208172544
Here's what's happening:
mov 0x2c(%esp),%ebx
mov 0x28(%esp),%ecx
mov 0x24(%esp),%edx
mov 0x20(%esp),%eax
These four mov statements are preparation for the printf() call. Each %d corresponds to one of the registers: eax ebx ecx edx
You can see that it is moving four bytes into each register. This is the reason you see gibberish. Each %d getting printed is expecting a 4-byte integer.
I set a breakpoint after all four of those mov statements, so let's look at what they contain:
eax 0x4 4
ecx 0x80484fb 134513915
edx 0xb7fff000 -1207963648
ebx 0xb7fcc000 -1208172544
This doesn't look right at all, but it does indeed correspond to what's on the stack:
0xbffff740: 0x00000004 0xb7fff000 0x080484fb 0xb7fcc000
These are the four values that wind up in the registers.
The first one is actually your 04 combined. It interpreted 04 as being within the bounds of %d, which is a 4-byte integer on my machine.
Since you tried to grab four %d's, the program continues as it should and prints out four %ds. In other words, it prints out four 4-byte integers.
The reason this is happening is what Jonathan Leffler explained: scanf is gobbling up the 04 into the first %d, then breaking on the /
You still have a printf() of %d%d%d%d though, so the program dutifully prints out the remaining 4-byte decimal values. Since scanf failed, the remaining 3 %ds contain whatever happened to be on the stack.
Unless you have another reason for using an array to store the individual digits, I suggest you just store the month and day as variables.
Also, because you know that there is a separator, accommodate it in your code. It is simpler.
This is the code you are looking for.
#include <stdio.h>
int main (void)
{
int month, day;
char sep;
scanf("%d%c%d", &month, &sep, &day);
printf("%d of %d\n", day, month);
return 0;
}
Explanation for the printing (Printing leading 0's in C):
In a format such as "%05d", the 0 indicates the padding character and the 5 gives the minimum field width of the integer.
Example 1: "%02d" (useful for dates) pads with a zero only when the number has a single digit, e.g. 06 instead of 6.
Example 2: "%03d" pads a one-digit number with two zeros and a two-digit number with one zero, e.g. 7 becomes 007 and 17 becomes 017.
You could also have read the data as a string using %5s and then extracted the fields from the character array. Here is a good resource with examples: https://cplusplus.com/reference/cstdio/scanf/
I do feel that would be overkill for your problem, so I believe the example code above is sufficient.

Trick to divide a constant (power of two) by an integer

NOTE This is a theoretical question. I'm happy with the performance of my actual code as it is. I'm just curious about whether there is an alternative.
Is there a trick to do an integer division of a constant value, which is itself an integer power of two, by an integer variable value, without having to do an actual divide operation?
// The fixed value of the numerator
#define SIGNAL_PULSE_COUNT 0x4000UL
// The division that could use a neat trick.
uint32_t signalToReferenceRatio(uint32_t referenceCount)
{
// Promote the numerator to a 64 bit value, shift it left by 32 so
// the result has an adequate number of bits of precision, and divide
// by the denominator.
return (uint32_t)((((uint64_t)SIGNAL_PULSE_COUNT) << 32) / referenceCount);
}
I've found several (lots) of references for tricks to do division by a constant, both integer and floating point. For example, the question What's the fastest way to divide an integer by 3? has a number of good answers including references to other academic and community materials.
Given that the numerator is constant, and it's an integer power of two, is there a neat trick that could be used in place of doing an actual 64 bit division; some kind of bit-wise operation (shifts, AND, XOR, that kind of stuff) or similar?
I don't want any loss of precision (beyond a possible half bit due to integer rounding) greater than that of doing the actual division, as the precision of the instrument relies on the precision of this measurement.
"Let the compiler decide" is not an answer, because I want to know if there is a trick.
Extra, Contextual Information
I'm developing a driver on a 16 bit data, 24 bit instruction word micro-controller. The driver does some magic with the peripheral modules to obtain a pulse count of a reference frequency for a fixed number of pulses of a signal frequency. The required result is a ratio of the signal pulses to the reference pulse, expressed as an unsigned 32 bit value. The arithmetic for the function is defined by the manufacturer of the device for which I'm developing the driver, and the result is processed further to obtain a floating point real-world value, but that's outside the scope of this question.
The micro-controller I'm using has a Digital Signal Processor that has a number of division operations that I could use, and I'm not afraid to do so if necessary. There would be some minor challenges to overcome with this approach, beyond the putting together the assembly instructions to make it work, such as the DSP being used to do a PID function in a BLDC driver ISR, but nothing I can't manage.
You cannot use clever mathematical tricks to not do a division, but you can of course still use programming tricks if you know the range of your reference count:
Nothing beats a pre-computed lookup table in terms of speed.
There are fast approximate square root algorithms (probably already in your DSP), and you can improve the approximation by one or two Newton-Raphson iterations. If doing the computation with floating-point numbers is accurate enough for you, you can probably beat a 64bit integer division in terms of speed (but not in clarity of code).
You mentioned that the result will be converted to floating-point later, it might be beneficial to not compute the integer division at all, but use your floating point hardware.
I worked out a Matlab version, using fixed point arithmetic.
This method assumes that an integer version of log2(x) can be calculated efficiently, which is true for dsPIC30/33F and TI C6000, which have instructions to detect the most significant 1 of an integer.
For this reason, the code has a strong ISA dependency and cannot be written in portable, standard C. It could be improved using instructions like multiply-and-add and multiply-and-shift, so I won't try translating it to C.
nrdiv.m
function [ y ] = nrdiv( q, x, lut)
% assume q>31, lut = 2^31/[1,1,2,...255]
p2 = ceil(log2(x)); % available in TI C6000, instruction LMBD
% available in Microchip dsPIC30F/33F, instruction FF1L
if p2<8
pre_shift=0;
else
pre_shift=p2-8;
end % shr = (p2-8)>0?(p2-8):0;
xn = shr(x, pre_shift); % xn = x>>pre_shift;
y = shr(lut(xn), pre_shift); % y = lut[xn]>>pre_shift;
y = shr(y * (2^32 - y*x), 30); % basic iteration
% step up from q31 to q32
y = shr(y * (2^33 - y*x), (64-q)); % step up from q32 to desired q
if q>39
y = shr(y * (2^(1+q) - y*x), (q)); % when q>40, additional
% iteration is required,
end % no step up is performed
end
function y = shr(x, r)
y=floor(x./2^r); % simulate operator >>
end
test.m
test_number = (2^22-12345);
test_q = 48;
lut_q31 = round(2^31 ./ [1,[1:1:255]]);
display(sprintf('tested 2^%d/%d, diff=%f\n',test_q, test_number,...
nrdiv( 39, (2^22-5), lut_q31) - 2^39/(2^22-5)));
sample output
tested 2^48/4181959, diff=-0.156250
reference:
Newton–Raphson division
A little late but here is my solution.
First some assumptions:
Problem:
X=N/D where N is a constant and a power of 2.
All 32 bit unsigned integers.
X is unknown but we have a good estimate
(previous but no longer accurate solution).
An exact solution is not required.
Note: due to integer truncation this is not an accurate algorithm!
An iterative solution is okay (improves with each loop).
Division is much more expensive than multiplication:
For 32bit unsigned integer for Arduino UNO:
'+/-' ~0.75us
'*' ~3.5us
'/' ~36us
We seek to replace the division. Basically, let's start with Newton's method:
Xnew = Xold - f(Xold)/f'(Xold)
where f(X)=0 for the solution we seek.
Solving this with f(X) = 1/X - D/N, I get:
Xnew = Xold*(C - Xold*D)/N
where C=2*N
First trick:
Now that the numerator (a constant) has become a divisor (a constant), one solution here (which does not require N to be a power of 2) is:
Xnew = Xold*(C - Xold*D)*A>>M
where C=2*N, and A and M are constants (look for divide-by-a-constant tricks).
Or (staying with Newton's method, with N a power of 2):
Xnew = Xold*(C - Xold*D)>>M
where C=2<<M and M is the power (N=1<<M).
So I have 2 '*' (7.0us), a '-' (0.75us) and a '>>' (0.75us?) or 8.5us total (rather than 36us), excluding other overheads.
Limitations:
As the data type is 32 bit unsigned, 'M' should not exceed 15 else there will be problems with overflow (you can probably get around this using a 64bit intermediate data type).
N>D (else the algorithm blows up, at least with unsigned integers).
Obviously the algorithm will also work with signed and float data types.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int main(void)
{
unsigned long c,d,m,x;
// x=n/d where n=1<<m
m=15;
c=2<<m;
d=10;
x=10;
while (true)
{
x=x*(c-d*x)>>m;
printf("%lu\n",x);
getchar();
}
return(0);
}
Having tried many alternatives, I ended up doing normal binary long division in assembly language. However, the routine does use a few optimisations that bring the execution time down to an acceptable level.
/*
* Converts the reference frequency count for a specific signal frequency
* to a ratio.
* Xs = Ns * 2^32 / Nr
* Where:
* 2^32 is a constant scaling so that the maximum accuracy can be achieved.
* Ns is the number of signal counts (fixed at 0x4000 by hardware).
* Nr is the number of reference counts, passed in W1:W0.
* @param W1:W0 The number of reference frequency pulses.
* @return W1:W0 The scaled ratio.
*/
.align 2
.global _signalToReferenceRatio
.type _signalToReferenceRatio, @function
; This is the position of the most significant bit of the fixed Ns (0x4000).
.equ LOG2_DIVIDEND, 14
.equ DIVISOR_LIMIT, LOG2_DIVIDEND+1
.equ WORD_SIZE, 16
_signalToReferenceRatio:
; Create a dividend, MSB-aligned with the divisor, in W2:W3 and place the
; number of iterations required for the MSW in [W14] and the LSW in [W14+2].
LNK #4
MUL.UU W2, #0, W2
FF1L W1, W4
; If MSW is zero the argument is out of range.
BRA C, .returnZero
SUBR W4, #WORD_SIZE, W4
; Find the number of quotient MSW loops.
; This is effectively 1 + log2(dividend) - log2(divisor).
SUBR W4, #DIVISOR_LIMIT, [W14]
BRA NC, .returnZero
; Since the SUBR above is always non-negative and the C flag set, use this
; to set bit W3<W5> and the dividend in W2:W3 = 2^(16+W5) = 2^log2(divisor).
BSW.C W3, W4
; Use 16 quotient LSW loops.
MOV #WORD_SIZE, W4
MOV W4, [W14+2]
; Set up W4:W5 to hold the divisor and W0:W1 to hold the result.
MOV.D W0, W4
MUL.UU W0, #0, W0
.checkLoopCount:
; While the bit count is non-negative ...
DEC [W14], [W14]
BRA NC, .nextWord
.alignQuotient:
; Shift the current quotient word up by one bit.
SL W0, W0
; Subtract divisor from the current dividend part.
SUB W2, W4, W6
SUBB W3, W5, W7
; Check if the dividend part was less than the divisor.
BRA NC, .didNotDivide
; It did divide, so set the LSB of the quotient.
BSET W0, #0
; Shift the remainder up by one bit, with the next zero in the LSB.
SL W7, W3
BTSC W6, #15
BSET W3, #0
SL W6, W2
BRA .checkLoopCount
.didNotDivide:
; Shift the next (zero) bit of the dividend into the LSB of the remainder.
SL W3, W3
BTSC W2, #15
BSET W3, #0
SL W2, W2
BRA .checkLoopCount
.nextWord:
; Test if there are any LSW bits left to calculate.
MOV [++W14], W6
SUB W6, #WORD_SIZE, [W14--]
BRA NC, .returnQ
; Decrement the remaining bit counter before writing it back.
DEC W6, [W14]
; Move the working part of the quotient up into the MSW of the result.
MOV W0, W1
BRA .alignQuotient
.returnQ:
; Return the quotient in W0:W1.
ULNK
RETURN
.returnZero:
MUL.UU W0, #0, W0
ULNK
RETURN
.size _signalToReferenceRatio, .-_signalToReferenceRatio

Fibonacci sequence on 68HC11 using 4-byte numbers

I'm trying to figure out a way to implement the Fibonacci sequence using a 68HC11 IDE that uses a Motorola as11 assembler.
I've done it using 2-byte unsigned values in little-endian format; now I'm attempting to change it to use 4-byte variables in big-endian format.
My pseudo-code (which is written in C):
RESULT = 1;
PREV = 1;
COUNT = N;
WHILE(COUNT > 2){
NEXT = RESULT + PREV;
PREV = RESULT;
RESULT = NEXT;
COUNT--;
}
I'll include some of my current assembly code. Please note that count is set to unsigned int at 1-byte, and prev, next, and result are unsigned ints at 2 bytes. N is unsigned, set to 10.
ORG $C000
LDD #1
STD RESULT
STD PREV
LDAA N
STAA COUNT
WHILE LDAA COUNT
CMPA #2
BLS ENDWHILE
LDD RESULT
ADDD PREV
STD NEXT
LDD RESULT
STD PREV
LDD NEXT
STD RESULT
DEC COUNT
BRA WHILE
ENDWHILE
DONE BRA DONE
END
The issue I'm having is how to alter this (other than the obvious variable changes/declarations), given that N will now begin at 40, not 10. Would altering my pseudo-code to include pointers allow me to implement it 1 to 1 better with big-endian? Since this is in little-endian, I assume I have to alter some of the branches. Yes, this is an assignment for class; I'm not looking for the code, just some guidance would be nice.
Thank you!
(Your problem description is a bit vague as to what your actual problem is, so I may be guessing a bit.)
BTW, 68HC11 is big-endian.
The 68HC11 has a 16-bit accumulator, so as soon as your result overflows this, you need to do math operations in pieces.
I suppose you mean that by changing N from 10 to 40 your fibonacci number becomes too big to be stored in a 16-bit variable.
The use or not of pointers is irrelevant to your problem as you can solve it both with or without them. For example, you can use a pointer to tell your routine where to store the result.
Depending on your maximum expected result, you need to adjust your routine. I will assume you won't need to go over 32-bit result (N=47 => 2971215073).
Here's a partially tested but unoptimized possibility (using ASM11 assembler):
STACKTOP equ $1FF
RESET_VECTOR equ $FFFE
org $100 ;RAM
result rmb 4
org $d000 ;ROM
;*******************************************************************************
; Purpose: Return the Nth fibonacci number in result
; Input : X -> 32-bit result
; : A = Nth number to calculate
; Output : None
; Note(s):
GetFibonacci proc
push ;macro to save D, X, Y
;--- define & initialize local variables
des:4 ;allocate 4 bytes on stack
tmp## equ 5 ;5,Y: temp number
ldab #1
pshb
clrb
pshb:3
prev## equ 1 ;1,Y: previous number (initialized to 1)
psha
n## equ 0 ;0,Y: N
;---
tsy ;Y -> local variables
clra
clrb
std ,x
std prev##,y
ldd #1
std 2,x
std prev##+2,y
Loop## ldaa n##,y
cmpa #2
bls Done##
ldd 2,x
addd prev##+2,y
std tmp##+2,y
ldaa 1,x
adca prev##+1,y
staa tmp##+1,y
ldaa ,x
adca prev##,y
staa tmp##,y
ldd ,x
std prev##,y
ldd 2,x
std prev##+2,y
ldd tmp##,y
std ,x
ldd tmp##+2,y
std 2,x
dec n##,y
bra Loop##
Done## ins:9 ;de-allocate all locals from stack
pull ;macro to restore D, X, Y
rts
;*******************************************************************************
; Test code
;*******************************************************************************
Start proc
ldx #STACKTOP ;setup our stack
txs
ldx #result
ldaa #40 ;Nth fibonacci number to get
bsr GetFibonacci
bra * ;check 'result' for answer
org RESET_VECTOR
dw Start
