Avoid Overflow when Calculating π by Evaluating a Series Using 16-bit Arithmetic? - c

I'm trying to write a program that calculates decimal digits of π to 1000 digits or more.
To practice low-level programming for fun, the final program will be written in assembly for an 8-bit CPU that has no multiplication or division instructions and only performs 16-bit additions. To ease the implementation, it is desirable to use only 16-bit unsigned integer operations and an iterative algorithm. Speed is not a major concern, and fast multiplication and division are beyond the scope of this question, so please don't consider those issues.
Before implementing it in assembly, I'm still trying to figure out a usable algorithm in C on my desktop computer. So far, I have found that the following series is reasonably efficient and relatively easy to implement.
The formula is derived from the Leibniz series using a convergence acceleration technique. To derive it, see Computing the Digits in π, by Carl D. Offner (https://cs.umb.edu/~offner/files/pi.pdf), pages 19-26. The final formula is shown on page 26. (The formula I initially wrote had some typos, which have since been fixed.) The constant term 2 at the greatest term is explained on page 54. The paper describes an advanced iterative algorithm as well, but I didn't use it here.
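For reference, the series that the implementations in this question evaluate appears to be the nested form below. It is read off from the inner loop of the C code (multiply by i, divide by 2 * i + 1, constant term 2), so treat it as a reconstruction rather than a quotation from the paper:

```latex
\pi = 2 + \frac{1}{3}\left(2 + \frac{2}{5}\left(2 + \frac{3}{7}\left(2 + \cdots
      + \frac{i}{2i+1}\left(2 + \cdots\right)\right)\right)\right)
```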
If one evaluates the series using many (e.g. 5000) terms, it's possible to get thousands of digits of π easily, and I found that this series is also easy to evaluate iteratively using this algorithm:
Algorithm
1. First, rearrange the formula to obtain its constant terms from an array.
2. Fill the array with 2 to start the first iteration, so that the new formula resembles the original one.
3. Let carry = 0.
4. Start from the greatest term. Obtain one term (2) from the array, multiply the term by PRECISION to perform a fixed-point division by 2 * i + 1, and save the remainder as the new term in the array. Then add the next term. Now decrement i, go to the next term, and repeat until i == 1. Finally, add the final term x_0.
5. Because 16-bit integers are used, PRECISION is 10, hence 2 decimal digits are obtained, but only the first digit is valid. Save the second digit as carry. Show the first digit plus carry.
6. x_0 is the integer 2; it should not be added in successive iterations, so clear it.
7. Go to step 4 to calculate the next decimal digit, until we have all the digits we want.
Implementation 1
Translating this algorithm to C:
#include <stdio.h>
#include <stdint.h>
#define N 2160
#define PRECISION 10
uint16_t terms[N + 1] = {0};
int main(void)
{
/* initialize the initial terms */
for (size_t i = 0; i < N + 1; i++) {
terms[i] = 2;
}
uint16_t carry = 0;
for (size_t j = 0; j < N / 4; j++) {
uint16_t numerator = 0;
uint16_t denominator;
uint16_t digit;
for (size_t i = N; i > 0; i--) {
numerator += terms[i] * PRECISION;
denominator = 2 * i + 1;
terms[i] = numerator % denominator;
numerator /= denominator;
numerator *= i;
}
numerator += terms[0] * PRECISION;
digit = numerator / PRECISION + carry;
carry = numerator % PRECISION;
printf("%01u", digit);
/* constant term 2, only needed for the first iteration. */
terms[0] = 0;
}
putchar('\n');
}
The code can calculate π to 31 decimal digits, until it makes an error.
31415926535897932384626433832794
10 <-- wrong
Sometimes digit + carry is greater than 9, so it needs an extra carry. If we are very unlucky, there may even be a double carry, triple carry, etc. We use a ring buffer to store the last 4 digits. If an extra carry is detected, we output a backspace to erase the previous digit, perform a carry, and reprint the digits. This is just an ugly solution for the proof of concept, and it is irrelevant to my question about overflow, but for completeness, here it is. Something better will be implemented in the future.
Implementation 2 with Repeated Carry
#include <stdio.h>
#include <stdint.h>
#define N 2160
#define PRECISION 10
#define BUF_SIZE 4
uint16_t terms[N + 1] = {0};
int main(void)
{
/* initialize the initial terms */
for (size_t i = 0; i < N + 1; i++) {
terms[i] = 2;
}
uint16_t carry = 0;
uint16_t digit[BUF_SIZE];
int8_t idx = 0;
for (size_t j = 0; j < N / 4; j++) {
uint16_t numerator = 0;
uint16_t denominator;
for (size_t i = N; i > 0; i--) {
numerator += terms[i] * PRECISION;
denominator = 2 * i + 1;
terms[i] = numerator % denominator;
numerator /= denominator;
numerator *= i;
}
numerator += terms[0] * PRECISION;
digit[idx] = numerator / PRECISION + carry;
/* over 9, needs at least one carry op. */
if (digit[idx] > 9) {
for (int i = 1; i <= 4; i++) {
if (i > 3) {
/* allow up to 3 consecutive carry ops */
fprintf(stderr, "ERROR: too many carry ops!\n");
return 1;
}
/* erase a digit */
putchar('\b');
/* carry */
digit[idx] -= 10;
idx--;
if (idx < 0) {
idx = BUF_SIZE - 1;
}
digit[idx]++;
if (digit[idx] < 10) {
/* done! reprint the digits */
for (int j = 0; j <= i; j++) {
printf("%01u", digit[idx]);
idx++;
if (idx > BUF_SIZE - 1) {
idx = 0;
}
}
break;
}
}
}
else {
printf("%01u", digit[idx]);
}
carry = numerator % PRECISION;
terms[0] = 0;
/* put an element to the ring buffer */
idx++;
if (idx > BUF_SIZE - 1) {
idx = 0;
}
}
putchar('\n');
}
Great, now the program can correctly calculate 534 digits of π before it makes an error.
3141592653589793238462643383279502884
1971693993751058209749445923078164062
8620899862803482534211706798214808651
3282306647093844609550582231725359408
1284811174502841027019385211055596446
2294895493038196442881097566593344612
8475648233786783165271201909145648566
9234603486104543266482133936072602491
4127372458700660631558817488152092096
2829254091715364367892590360011330530
5488204665213841469519415116094330572
7036575959195309218611738193261179310
5118548074462379962749567351885752724
8912279381830119491298336733624406566
43086021394946395
22421 <-- wrong
16-bit Integer Overflow
It turns out, during the calculation of the largest terms at the beginning, the error term gets quite large, since the divisors at the beginning are in the range of ~4000. When evaluating the series, numerator actually starts to overflow in the multiplication immediately.
The integer overflow is insignificant when calculating the first 500 digits, but starts to get worse and worse, until it gives an incorrect result.
Changing uint16_t numerator = 0 to uint32_t numerator = 0 can solve this problem and calculate π to 1000+ digits.
However, as I mentioned before, my target platform is an 8-bit CPU that only has 16-bit operations. Is there a trick to solve the 16-bit integer overflow issue that I'm seeing here, using only one or more uint16_t? If it's not possible to avoid multiple-precision arithmetic, what is the simplest method to implement it here? I know that somehow I need to introduce an extra 16-bit "extension word", but I'm not sure how to implement it.
And thanks in advance for your patience to understand the long context here.
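One way to stay within 16 bits, sketched here under stated assumptions rather than offered as a definitive answer, is to never form the oversized sum at all. Write the numerator at term i as a + b, with a = terms[i] * PRECISION and b = q * (i + 1), where q is the quotient handed down from the previous term. Then (a + b) / d equals a / d + b / d + (a % d + b % d) / d, and (a + b) % d equals (a % d + b % d) % d. Since terms[i] <= 2 * i, an induction gives q <= 20, so both a and b stay at or below 20 * (N + 1); that fits a uint16_t as long as N <= 3275. The names pi_digits16 and out are hypothetical, the asserts exist only to check the bounds on a desktop build, and carries are propagated through the output buffer instead of a ring buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 2160          /* number of series terms; this sketch assumes N <= 3275 */
#define PRECISION 10

static uint16_t terms[N + 1];

/* Writes ndigits decimal digits of pi (3, 1, 4, ...) into out[]. */
void pi_digits16(uint8_t *out, size_t ndigits)
{
    for (size_t i = 0; i <= N; i++)
        terms[i] = 2;

    uint16_t carry = 0;
    for (size_t j = 0; j < ndigits; j++) {
        uint16_t q = 0;                     /* quotient handed down from term i + 1 */
        for (uint16_t i = N; i > 0; i--) {
            uint16_t d = (uint16_t)(2 * i + 1);
            /* desktop-only checks: both products must fit in 16 bits */
            assert((uint32_t)terms[i] * PRECISION <= UINT16_MAX);
            assert((uint32_t)q * (i + 1u) <= UINT16_MAX);
            uint16_t a = (uint16_t)(terms[i] * PRECISION);
            uint16_t b = (uint16_t)(q * (i + 1));
            uint16_t r = (uint16_t)(a % d + b % d);   /* < 2 * d, no overflow */
            q = (uint16_t)(a / d + b / d + r / d);    /* == (a + b) / d */
            terms[i] = (uint16_t)(r % d);             /* == (a + b) % d */
        }
        uint16_t numerator = (uint16_t)(q + terms[0] * PRECISION);
        out[j] = (uint8_t)(numerator / PRECISION + carry);
        carry = (uint16_t)(numerator % PRECISION);
        terms[0] = 0;                       /* constant term 2: first pass only */

        /* a digit above 9 ripples into the digits already stored */
        for (size_t k = j; k > 0 && out[k] > 9; k--) {
            out[k] -= 10;
            out[k - 1]++;
        }
    }
}
```

Because a later carry can still change the most recent digit or two, it seems safest to compute a few digits more than are printed. For more than roughly 800 digits (N above 3275) the two products no longer fit, and a real two-word numerator would be needed.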

Take a look at related QA:
Baking-Pi Challenge - Understanding & Improving
It uses the Bailey–Borwein–Plouffe formula (see Wikipedia: Bailey–Borwein–Plouffe_formula), which is better suited to integer arithmetic.
The real challenge however would be:
How do I convert a very long binary number to decimal?.
As you probably want to print the number in decimal ...
Also if you need carry in higher level language than asm take a look at this:
Cant make value propagate through carry
You can modify it to handle as many carry bits as you need (if still less than the data type bit-width).
[Edit1] BBP example in C++/VCL
I used this formula (taken from Wiki page linked above):
converted to fixed point...
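Judging from the coefficients used in pi_BBP(), the formula in question is presumably the single-fraction (polynomial) form of the BBP series:

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^{k}} \cdot
      \frac{120k^{2} + 151k + 47}{512k^{4} + 1024k^{3} + 712k^{2} + 194k + 15}
```

In fixed point, the factor 1/16^k becomes a left shift of the numerator that shrinks by 4 bits per term, which is what the sh counter in the code does.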
//---------------------------------------------------------------------------
AnsiString str_hex2dec(const AnsiString &hex)
{
char c;
AnsiString dec="",s;
int i,j,l,ll,cy,val;
int i0,i1,i2,i3,sig;
sig=+1; l=hex.Length();
if (l) { c=hex[l]; if (c=='h') l--; if (c=='H') l--; }
i0=0; i1=l; i2=0; i3=l;
for (i=1;i<=l;i++) // scan for parts of number
{
char c=hex[i];
if (c=='-') sig=-sig;
if ((c=='.')||(c==',')) i1=i-1;
if ((c>='0')&&(c<='9')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
if ((c>='A')&&(c<='F')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
if ((c>='a')&&(c<='f')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
}
l=0; s=""; if (i0) for (i=i0;i<=i1;i++)
{
c=hex[i];
if ((c>='0')&&(c<='9')) c-='0';
else if ((c>='A')&&(c<='F')) c-='A'-10;
else if ((c>='a')&&(c<='f')) c-='a'-10;
for (cy=c,j=1;j<=l;j++)
{
val=(s[j]<<4)+cy;
s[j]=val%10;
cy =val/10;
}
while (cy>0)
{
l++;
s+=char(cy%10);
cy/=10;
}
}
if (s!="")
{
for (j=1;j<=l;j++) { c=s[j]; if (c<10) c+='0'; else c+='A'-10; s[j]=c; }
for (i=l,j=1;j<i;j++,i--) { c=s[i]; s[i]=s[j]; s[j]=c; }
dec+=s;
}
if (dec=="") dec="0";
if (sig<0) dec="-"+dec;
if (i2)
{
dec+='.';
s=hex.SubString(i2,i3-i2+1);
l=s.Length();
for (i=1;i<=l;i++)
{
c=s[i];
if ((c>='0')&&(c<='9')) c-='0';
else if ((c>='A')&&(c<='F')) c-='A'-10;
else if ((c>='a')&&(c<='f')) c-='a'-10;
s[i]=c;
}
ll=((l*1234)>>10); // num of decimals to compute
for (cy=0,i=1;i<=ll;i++)
{
for (cy=0,j=l;j>=1;j--)
{
val=s[j];
val*=10;
val+=cy;
s[j]=val&15;
cy=val>>4;
}
dec+=char(cy+'0');
for (;;)
{
if (!l) break;;
if (s[l]) break;
l--;
}
if (!l) break;;
}
}
return dec;
}
//---------------------------------------------------------------------------
AnsiString pi_BBP() // https://en.wikipedia.org/wiki/Bailey–Borwein–Plouffe_formula
{
const int N=100; // 32*N bit uint arithmetics
int sh;
AnsiString s;
uint<N> pi,a,b,k,k2,k3,k4;
for (pi=0,sh=(N<<5)-8,k=0;sh>=0;k++,sh-=4)
{
k2=k*k;
k3=k2*k;
k4=k3*k;
a =k2* 120;
a+=k * 151;
a+= 47;
b =k4* 512;
b+=k3*1024;
b+=k2* 712;
b+=k * 194;
b+= 15;
a<<=sh;
pi+=a/b;
}
pi<<=4;
s=pi.strhex();
s=s.Insert(".",2);
return str_hex2dec(s);
}
//---------------------------------------------------------------------------
The code uses VCL AnsiString, which is a self-allocating string, and my uint<N> template, which provides unsigned integer arithmetic of 32*N bit width based on my ALU32. As you can see, you only need big-integer division, addition and multiplication for this (all the other stuff is doable on normal integers).
Here is the decimal result versus a 1000-digit π reference:
ref: 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989
BPP: 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912983367336244065664308602139494639522473719070217986094370277053921717629317675238467481846766940513200056812714526356082778577134275778960917363717872146844090122495343014654958537105079227968925892354201995611212902196086403441815981362977477130996051870721134999999837297804995105973173281609631859502445945534690830264252230825334468503526193118817101000313783875288658753320838142061717766914730359825349042875546873115956286388235378759375195778185778048187
The computed bigint value is exported to a hex string and then converted to decimal using str_hex2dec from the link above. The number of iterations depends on the target bit width.
The code is not optimized yet...

What about implementing 32-bit arithmetic?
For an addition, add the two high-order words (16 bits each), then the two low-order words, test whether the low-order addition carried, and carry into the high-order result if necessary.
If you can predict when overflow will occur, you can switch from 16-bit to 32-bit arithmetic when necessary.
C has no direct access to the CPU's carry flag, so testing the flag itself requires inline assembly or an intrinsic function. For unsigned operands, however, a carry can be detected in portable C by checking whether the wrapped sum is smaller than one of the operands.
Otherwise, you can be inspired by this answer: https://codereview.stackexchange.com/a/37178/39646
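The two-word addition described above can be sketched in portable C as follows. The type u16pair and the function add_u16pair are hypothetical names chosen for this illustration; the carry is detected by comparing the wrapped low word with one of its operands, which maps directly onto an add / add-with-carry pair in assembly.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit words (hypothetical helper type). */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} u16pair;

/* Adds two 32-bit values using only 16-bit words; wraps modulo 2^32. */
u16pair add_u16pair(u16pair a, u16pair b)
{
    u16pair r;
    r.lo = (uint16_t)(a.lo + b.lo);          /* low words, wraps modulo 2^16 */
    uint16_t carry = (uint16_t)(r.lo < a.lo); /* 1 if the low addition wrapped */
    r.hi = (uint16_t)(a.hi + b.hi + carry);  /* high words plus carry */
    return r;
}
```

For example, adding {hi = 0x0001, lo = 0xFFFF} and {hi = 0x0000, lo = 0x0001} wraps the low word to 0 and carries into the high word, giving {hi = 0x0002, lo = 0x0000}.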

There is a trick:
Consider using an array for the numerators and another array for the denominators. Each position would represent the number of times that number is multiplied to get the actual number.
An example:
(1 * 2 * 3 * 7 * 7) / (3 * 6 * 8)
Would be represented as:
num[] = {1, 1, 1, 0, 0, 0, 2};
denom[] = {0, 0, 1, 0, 0, 1, 0, 1};
Then consider factorizing into prime numbers every number before storing it, so that you have lower numbers. Now you will need another array to store all the primes:
primes[] = {2, 3, 5, 7};
num[] = {1, 1, 0, 2};
denom[] = {4, 2, 0, 0};
This will allow you to store unimaginably big numbers, but sooner or later you will want to transform them back into numbers, so you will want to simplify first. To do that, accumulate factors[i] += num[i] - denom[i] for every field in the arrays, for every fraction in the series. You will want to simplify after each iteration, so that you minimize the overflow risk.
factors[] = {-3, -1, 0, 2};
When you need the number, just do num *= pow(primes[i], factors[i]); if the factor is positive, or num /= pow(primes[i], -factors[i]); if it is negative, for every field in the arrays. (Do nothing if it is 0.)
num and denom are temporary arrays used to store a fraction, the array where the result is being stored is factors. Remember to memset the temporary arrays before every use.
This explanation is useful for any big fraction. To adapt it to your specific problem, you may need to use an integer power function, and also multiply by 10^something to turn the decimal part into an integral part. That is your mission, should you accept it :)
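A minimal sketch of the exponent bookkeeping, assuming the same four primes as in the example; accumulate and NPRIMES are hypothetical names. Each factor of the numerator is folded in with sign +1 and each factor of the denominator with sign -1, so the factors array ends up holding num[i] - denom[i] directly, without the temporary arrays.

```c
#define NPRIMES 4

static const unsigned primes[NPRIMES] = {2, 3, 5, 7};

/* Adds sign * (exponent of each prime in n) to factors[].
 * Returns the part of n that is not covered by primes[] (1 if none).
 * n == 0 has no factorization and is returned unchanged. */
unsigned accumulate(unsigned n, int sign, int factors[NPRIMES])
{
    if (n == 0)
        return 0;
    for (int i = 0; i < NPRIMES; i++) {
        while (n % primes[i] == 0) {
            n /= primes[i];
            factors[i] += sign;
        }
    }
    return n;
}
```

Feeding 1, 2, 3, 7, 7 with sign +1 and 3, 6, 8 with sign -1 reproduces factors[] = {-3, -1, 0, 2} from the example. A return value other than 1 signals a prime outside the table, which a real implementation would have to handle.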

Related

How to calculate the log2 of integer in C as precisely as possible with bitwise operations

I need to calculate the entropy and due to the limitations of my system I need to use restricted C features (no loops, no floating point support) and I need as much precision as possible. From here I figure out how to estimate the floor log2 of an integer using bitwise operations. Nevertheless, I need to increase the precision of the results. Since no floating point operations are allowed, is there any way to calculate log2(x/y) with x < y so that the result would be something like log2(x/y)*10000, aiming at getting the precision I need through arithmetic integer?
You will base an algorithm on the formula
log2(x/y) = K*(-log(x/y));
where
K = -1.0/log(2.0); // you can precompute this constant before run-time
a = (y-x)/y;
-log(x/y) = a + a^2/2 + a^3/3 + a^4/4 + a^5/5 + ...
If you write the loop correctly—or, if you prefer, unroll the loop to code the same sequence of operations looplessly—then you can handle everything in integer operations:
(y^N*(1*2*3*4*5*...*N)) * (-log(x/y))
= y^(N-1)*(2*3*4*5*...*N)*(y-x) + y^(N-2)*(1*3*4*5*...*N)*(y-x)^2 + ...
Of course, ^, the power operator, binding tighter than *, is not a C operator, but you can implement that efficiently in the context of your (perhaps unrolled) loop as a running product.
The N is an integer large enough to afford desired precision but not so large that it overruns the number of bits you have available. If unsure, then try N = 6 for instance. Regarding K, you might object that that is a floating-point number, but this is not a problem for you because you are going to precompute K, storing it as a ratio of integers.
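As a complement, here is a fixed-point variant of the same series, which is not the exact ratio-of-integers form described above. It keeps a = (y - x) / y as a Q30 fraction and builds a^n as a running product. The function name, the Q30 format, the 30-term cutoff and the rounded constant 14427 (about 10000 / ln 2) are all assumptions of this sketch; it requires 0 < x <= y, and the loops could be unrolled where loops are not allowed.

```c
#include <stdint.h>

#define FRAC_BITS 30          /* fixed-point fraction bits (assumed choice) */
#define SERIES_TERMS 30       /* a < 0.5, so a^31 is below the Q30 resolution */
#define INV_LN2_X10000 14427  /* round(10000 / ln 2), precomputed */

/* Returns approximately log2(x / y) * 10000, for 0 < x <= y. */
int32_t log2_ratio_x10000(uint32_t x, uint32_t y)
{
    int32_t whole = 0;
    uint64_t xs = x;

    /* Double x until y / 2 < xs <= y; each doubling offsets log2 by 1. */
    while (2 * xs <= y) {
        xs *= 2;
        whole--;
    }

    /* a = (y - xs) / y in Q30, with 0 <= a < 0.5 */
    uint64_t a = ((y - xs) << FRAC_BITS) / y;

    /* sum = a + a^2/2 + a^3/3 + ... = -ln(xs / y), in Q30 */
    uint64_t power = a, sum = 0;
    for (unsigned n = 1; n <= SERIES_TERMS; n++) {
        sum += power / n;
        power = (power * a) >> FRAC_BITS;
    }

    /* Convert ln to log2, scale by 10000, round to nearest. */
    uint64_t frac = (sum * INV_LN2_X10000 + (1ull << (FRAC_BITS - 1))) >> FRAC_BITS;
    return whole * 10000 - (int32_t)frac;
}
```

For x = 1 and y = 8 the series part is exactly zero and the result is -30000. For 5 and 7 the result should land within a unit or two of -4854, matching the floating-point value quoted in the sample run.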
SAMPLE CODE
This is a toy code but it works for small values of x and y such as 5 and 7, thus sufficing to prove the concept. In the toy code, larger values can silently overflow the default 64-bit registers. More work would be needed to make the code robust.
#include <stddef.h>
#include <stdlib.h>
// Your program will not need the below headers, which are here
// included only for comparison and demonstration.
#include <math.h>
#include <stdio.h>
const size_t N = 6;
const long long Ky = 1 << 10; // denominator of K
// Your code should define a precomputed value for Kx here.
int main(const int argc, const char *const *const argv)
{
// Your program won't include the following library calls but this
// does not matter. You can instead precompute the value of Kx and
// hard-code its value above with Ky.
const long long Kx = lrintl((-1.0/log(2.0))*Ky); // numerator of K
printf("K == %lld/%lld\n", Kx, Ky);
if (argc != 3) exit(1);
// Read x and y from the command line.
const long long x0 = atoll(argv[1]);
const long long y = atoll(argv[2]);
printf("x/y == %lld/%lld\n", x0, y);
if (x0 <= 0 || y <= 0 || x0 > y) exit(1);
// If 2*x <= y, then, to improve accuracy, double x repeatedly
// until 2*x > y. Each doubling offsets the log2 by 1. The offset
// is to be recovered later.
long long x = x0;
int integral_part_of_log2 = 0;
while (1) {
const long long trial_x = x << 1;
if (trial_x > y) break;
x = trial_x;
--integral_part_of_log2;
}
printf("integral_part_of_log2 == %d\n", integral_part_of_log2);
// Calculate the denominator of -log(x/y).
long long yy = 1;
for (size_t j = N; j; --j) yy *= j*y;
// Calculate the numerator of -log(x/y).
long long xx = 0;
{
const long long y_minus_x = y - x;
for (size_t i = N; i; --i) {
long long term = 1;
size_t j = N;
for (; j > i; --j) {
term *= j*y;
}
term *= y_minus_x;
--j;
for (; j; --j) {
term *= j*y_minus_x;
}
xx += term;
}
}
// Convert log to log2.
xx *= Kx;
yy *= Ky;
// Restore the aforementioned offset.
for (; integral_part_of_log2; ++integral_part_of_log2) xx -= yy;
printf("log2(%lld/%lld) == %lld/%lld\n", x0, y, xx, yy);
printf("in floating point, this ratio of integers works out to %g\n",
(1.0*xx)/(1.0*yy));
printf("the CPU's floating-point unit computes the log2 to be %g\n",
log2((1.0*x0)/(1.0*y)));
return 0;
}
Running this on my machine with command-line arguments of 5 7, it outputs:
K == -1477/1024
x/y == 5/7
integral_part_of_log2 == 0
log2(5/7) == -42093223872/86740254720
in floating point, this ratio of integers works out to -0.485279
the CPU's floating-point unit computes the log2 to be -0.485427
Accuracy would be substantially improved by N = 12 and Ky = 1 << 20, but for that you need either thriftier code or more than 64 bits.
THRIFTIER CODE
Thriftier code, wanting more effort to write, might represent numerator and denominator in prime factors. For example, it might represent 500 as [2 0 3], meaning (2^2)(3^0)(5^3).
Yet further improvements might occur to your imagination.
AN ALTERNATE APPROACH
For an alternate approach, though it might not meet your requirements precisely as you have stated them, @phuclv has given the suggestion I would be inclined to follow if your program were mine: work the problem in reverse, guessing a value c/d for the logarithm and then computing 2^(c/d), presumably via a Newton-Raphson iteration. Personally, I like the Newton-Raphson approach better. See sect. 4.8 here (my original).
MATHEMATICAL BACKGROUND
Several sources including mine already linked explain the Taylor series underlying the first approach and the Newton-Raphson iteration of the second approach. The mathematics unfortunately is nontrivial, but there you have it. Good luck.

Recursively calculate integer sqrts with C

I adapted some python code I found here to calculate the sqrt of a number if it exists as an integer using bitwise operations. here is my code.
int ft_sqrt(int nb){
int smallcandidate;
int largecandidate;
if (nb < 0){
return (0);
}else if (nb < 2){
return (nb);
}else{
smallcandidate = ft_sqrt(nb >> 2) << 1;
largecandidate = smallcandidate + 1;
if (largecandidate * largecandidate > nb){
return (smallcandidate);
}
else{
return (largecandidate);
}
}
}
This works for every number i've tested (within the bounds of what an integer can hold), except for 3. Why is this? and how can i fix it?
Sorry, but you would be better off using an iterative function. As you can see, your recursion is a final recursion, which can be collapsed into a while loop. Your algorithm is:
#include <stdio.h>

unsigned isqrt(unsigned x)
{
    unsigned quot = 1, mean = x; /* isqrt must be between these two */
    /* we begin with extreme numbers and for each pair of (quot,mean),
     * the first, below the square root, and the other above, we get
     * mean value of the two (lesser than previous) and the
     * quotient (above the prev. value, but still less than the
     * square root, so closer to it) to get a better approach */
    while (quot < mean) {
        /* same value as (mean + quot) >> 1, but the sum cannot
         * wrap around when x is close to UINT_MAX */
        mean = quot + ((mean - quot) >> 1);
        quot = x / mean;
    }
    /* on exit quot >= mean: the two either met or, because of
     * integer truncation, crossed over; mean is the truncated
     * square root, so that is what we return. */
    return mean;
}

int main(void) /* main test function, eliminate to use the above. */
{
    unsigned n;
    while (scanf("%u", &n) == 1) {
        printf("isqrt(%u) ==> %u\n", n, isqrt(n));
    }
    return 0;
}
EDIT
This algorithm is based on the fact that the geometric mean of two positive numbers never exceeds their arithmetic mean. So we take two approximations (the source number and 1, whose geometric mean is the square root), then we calculate their arithmetic mean (so the value obtained is between both, and so closer to the geometric mean), then we divide the original number by the arithmetic mean, so that both approximations multiply to the original number (and their geometric mean is, again, the square root). As in each loop the arithmetic mean gets closer to the geometric mean, so must the quotient, leading to two numbers that are closer to the square root. We continue the algorithm until both numbers are equal (a / sqrt(a) = sqrt(a), and (sqrt(a) + sqrt(a))/2 = sqrt(a)) or, due to rounding errors, they cross over (this is what happens with integers).

Getting the amount of decimals a number has in c?

I am trying to get the number of decimals a number has in C: 0.0001 -> 4 decimals, 3,54235 -> 5 decimals, and so on (in other words, the number of digits after the decimal comma). Our teacher says it can be done in two ways: using a string and not using a string. I figured I would go ahead without using a string because I have NO experience with strings.
So this is what I came up with
int funzione1(float decimals){
int x=1,c=0,y=1;
while (x!=0){
if((decimals - y) > 0){
y = y / 10;
c++;
}else{
decimals = decimals - y;
}
if(decimals == 0)
x=0;
}
return c-1;
}
When calling the function it should return the amount of decimals I figured, but it does not, actually it gets stuck in an infinite loop.
the Idea behind this code was to for every number in the "string" of numbers to get them to 0 and then check if the total number was 0
3.456 c=0
0.456 c=1
0.056 c=2
0.006 c=3
0.000 return c
But that leaves me with two problems. 1. How to determine the number of digits before the comma: for a number like 5564.34234 this code will not work, because it will count to 8 before the full number is a solid 0, and therefore not return the right number of decimals. 2. The code I designed isn't working; it just gets stuck in an infinite loop. I don't know where the infiniteness of the loop is created.
How do i get this code to work?
PS. I found this article about this problem in Java: How to find out how many decimals a number has? but it is using strings and I would not like that because of the fact that I don't know how to use strings.
edit: Here is another piece of code I tried, which failed really badly, giving an output of 50 when you enter a number higher than 1 and 0 if the number is lower than 0 (I don't get it, not a little bit). Anyway, here is the code:
int funzione1(float decimals){
int i=0;
while(decimals!=((int)decimals)){
i++;
decimals=decimals*10;
}
return i;
}
If you don't care about rounding then you don't need to count the number of decimal places, you can just count the number of binary places. This is because 10 contains 2 as a factor exactly once so 10^n and 2^n have the same number of 2s as factors. The fastest way to count the number of binary places is to get the exponent of the floating point number.
e.g. binary 0.001 is 0.125, which takes 3 decimal places to represent, and binary 0.0001 is 0.0625, which takes 4.
You can either get the fractional part of the value and keep multiplying by 2 and removing the integer as people have suggested doing with 10 (it will give you the same answer).
Or you can have a bit more fun over optimising the solution (the places function does most of the work):
#include <math.h>
int saturateLeft (unsigned int n) {
n |= (n << 1);
n |= (n << 2);
n |= (n << 4);
n |= (n << 8);
n |= (n << 16);
return n;
}
int NumberOfSetBits(int i)
{
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
int places (double num) {
int exponent;
float mantissa = frexp (num, &exponent);
/* The answer we are looking for is given by the
(number of bits used by mantissa) - the exponent.
*/
unsigned intMantissa = scalbnf (mantissa, 32);
/* Could also be got by doing:
intMantissa = *(unsigned *)&mantissa << 9;
*/
/* To work out how many bits the mantissa covered we
need no gaps in the mantissa, this removes any gaps.
*/
intMantissa = saturateLeft (intMantissa);
int bitCount = NumberOfSetBits (intMantissa);
/* bitCount could also be found like this:
intMantissa = ~intMantissa;
int bitCount = 32 - ilogb (intMantissa) - 1;
*/
int result = bitCount - exponent;
if (result < 0)
return 0;
return result;
}
The bitCounting algorithm was found here.
Your best bet would be to read the input as string and just count the digits after '.'. Floating point numbers are not exact representation i.e. the decimal values are stored in binary internally and may not exactly represent the true decimal value. However every binary representation is some decimal number with finite digits.
Have a look at this answer in SO.
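The string-based suggestion can be sketched in a few lines, assuming the number is available as text exactly as the user typed it and that '.' is the decimal separator (the question writes 3,54235 with a comma, which would need a different separator). The function name count_decimals is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts the digits that directly follow the first '.' in text.
 * Returns 0 when there is no '.' at all. */
size_t count_decimals(const char *text)
{
    const char *dot = strchr(text, '.');
    size_t count = 0;

    if (dot != NULL) {
        for (const char *p = dot + 1; isdigit((unsigned char)*p); p++)
            count++;
    }
    return count;
}
```

Reading the input with fgets or scanf("%s") and passing it straight to this function sidesteps the binary rounding problem entirely, because the text is never converted to a float.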
Here's an idea:
Start with a floating point number, say a = 3.0141589
Make the part before the decimal point 0 by subtracting the integral part, leaving 0.0141589
In a loop, multiply a by 10 and save the integral part, this gives you a list of digits from 0 to 9.
From this list, derive the number of decimals
There are some interesting details in this algorithm for you to find out, and I won't spoil the fun or surprises waiting for you.
Consider my string-based solution:
#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <math.h>
#include <float.h>
int countDecimals(double x)
{
int cnt;
char * ptr;
char str[32] = {0}; // room for sign, 15 digits, point, exponent and terminator
// take fractional part from x
double ip;
double fp = modf (x , &ip);
// printf("%lg + %.15lg\n", ip, fp); // uncomment for debugging
// to be sure that Decimal-point character is '.'
setlocale(LC_NUMERIC, "C");
// make string from number's fractional part asking maximum digits (DBL_DIG)
snprintf(str, sizeof str, "%.*g", DBL_DIG, fp); // bounded write, cannot overrun str
// find point
if( ptr = strchr(str, '.') )
{
// calculate length of string after point
// length - (point position from beginning) - point = number of decimals
cnt = strlen(str) - (ptr - str) - 1;
}
else
{
cnt = 0;
}
return cnt;
}
int main (void)
{
double fn;
printf("Enter a float number: ");
scanf("%lf", &fn);
printf("Your number has %d decimals.\n" , countDecimals(fn) );
return 0;
}
But you should remember about possible rounding and accuracy errors.
Note: I have used double type, for float function modff should be used instead of modf, and FLT_DIG instead of DBL_DIG

sprintf code consuming lot of space (embedded)

I have written the following function for sprintf
#include <reg52.h>
char output_arr[7];
void print_to_arr(float val)
{
int integer, fractional, temp = 0, i, ni = 2;
float fract;
integer = (int)val;
fract = val - (float)integer;
fract = fract *1000;
fractional = (int)fract;
for(i=0; i<=2; i++)
{
temp = integer%10;
integer = integer/10;
output_arr[ni - i] = temp+48;
}
temp = 0;
output_arr[3] = '.';
for(i=0; i<=2; i++)
{
temp = fractional%10;
fractional = fractional/10;
output_arr[(ni+4) - i] = temp+48;
}
temp = 0;
}
and for this the code size is
Program Size: data=36.0 xdata=0 code=1112
Can anyone please help me reduce the code size to the minimum possible value?
Performing floating point operations on a device with no floating point hardware, and especially an 8 bit device will itself introduce a significant amount of code (every operation must be supported by a function rather than a single instruction). Most stdio implementations for constrained embedded targets have an option to omit floating point support for exactly that reason.
Even without floating-point, on an 8 bit target integer modulo (%) and divide (/) are often expensive. In this case 8051 only has an 8 bit divide operation, so for larger types divide is synthesized from multiple instructions.
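If the value can be kept as integers throughout, for example a whole part and a count of thousandths, both floating point and the division helpers can be avoided. The following is only a sketch under that assumption: print_fixed_to_arr and put3 are hypothetical names, both arguments must be in the range 0 to 999, and the array is one byte larger than in the question so that it can hold a terminating NUL.

```c
char output_arr[8];   /* "ddd.ddd" plus terminating NUL */

/* Writes v (0..999) as three ASCII digits using repeated subtraction,
 * so no divide or modulo routine has to be linked in. */
static void put3(char *dst, unsigned int v)
{
    char hundreds = '0', tens = '0';

    while (v >= 100) { v -= 100; hundreds++; }
    while (v >= 10)  { v -= 10;  tens++; }
    dst[0] = hundreds;
    dst[1] = tens;
    dst[2] = (char)('0' + v);
}

void print_fixed_to_arr(unsigned int whole, unsigned int milli)
{
    put3(&output_arr[0], whole);
    output_arr[3] = '.';
    put3(&output_arr[4], milli);
    output_arr[7] = '\0';
}
```

Calling print_fixed_to_arr(12, 345) leaves "012.345" in output_arr. Whether this is actually smaller than the float version on a given compiler would have to be measured.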
If you really must have floating point support, then the following may have a small impact on size. The changes include minimisation of floating point operations, eradication of unnecessary variables, the use of initialisation rather than assignment of the variable initial values, and removal of the loop invariant ni + 4 operation. These may have no impact, or they may make it easier for the optimiser to generate smaller code - they may make things worse; you'll have to experiment.
void print_to_arr(float val)
{
    int integer = (int)val ;
    /* subtract the integer part before scaling, so that the scaled
       value stays below 1000 and fits a 16-bit int */
    int fractional = (int)((val - (float)integer) * 1000.0f) ;
    int i ;

    for( i = 0; i < 3; i++, integer /= 10 )
    {
        output_arr[2 - i] = integer % 10 + '0' ;
    }

    output_arr[3] = '.';

    for( i = 0; i < 3; i++, fractional /= 10 )
    {
        output_arr[6 - i] = (fractional % 10) + '0' ;
    }
}
Note that the implementation does not round the final fractional digit, and fails completely for negative values. Fixing these issues requires yet more code.

How can you easily calculate the square root of an unsigned long long in C?

I was looking at another question (here) where someone was looking for a way to get the square root of a 64 bit integer in x86 assembly.
This turns out to be very simple. The solution is to convert to a floating point number, calculate the sqrt and then convert back.
I need to do something very similar in C however when I look into equivalents I'm getting a little stuck. I can only find a sqrt function which takes in doubles. Doubles do not have the precision to store large 64bit integers without introducing significant rounding error.
Is there a common math library that I can use which has a long double sqrt function?
There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.
The rounding error is at most one part in 2^53. This causes an error in the square root of at most one part in 2^54. The sqrt itself has a rounding error of less than one part in 2^53, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2^32, so an error of three parts in 2^54 is less than .00000072.
For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.
Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:
uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
++y;
(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2^32.)
Function sqrtl(), taking a long double, is part of C99.
Note that your compilation platform does not have to implement long double as 80-bit extended-precision. It is only required to be as wide as double, and Visual Studio implements it as a plain double. GCC and Clang do compile long double to 80-bit extended-precision on Intel processors.
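Because the width of long double varies between platforms, a program can check it instead of assuming it. The sketch below uses the standard LDBL_MANT_DIG and FLT_RADIX macros from <float.h>; the function name is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: returns 1 when long double has at least 64
 * binary significand digits, i.e. when every 64-bit unsigned integer
 * converts to long double without rounding. */
int long_double_is_exact_for_u64(void)
{
    return FLT_RADIX == 2 && LDBL_MANT_DIG >= 64;
}
```

Where this returns 0 (for instance when long double is the same as double), sqrtl() offers no extra precision over sqrt().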
Yes, the standard library has sqrtl() (since C99).
If you only want to calculate sqrt for integers, a divide-and-conquer approach (a binary search) should find the result in about 32 iterations:
uint64_t mysqrt (uint64_t a)
{
    uint64_t min=0;
    //uint64_t max=1<<32;
    uint64_t max=((uint64_t) 1) << 32; //chux' bugfix
    while(1)
    {
        if (max <= 1 + min)
            return min;
        uint64_t sqt = min + (max - min)/2;
        uint64_t sq = sqt*sqt;
        if (sq == a)
            return sqt;
        if (sq > a)
            max = sqt;
        else
            min = sqt;
    }
}
Debugging is left as exercise for the reader.
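The commented-out line in the code above, 1<<32, shifts a plain int. On platforms where int is 32 bits wide (an assumption, though a common one), shifting by the full width of the type is undefined behavior in C, which is why the operand has to be widened first. A small sketch of two equivalent ways to write it, with made-up function names:

```c
#include <stdint.h>

/* Widen the operand with a cast before shifting. */
uint64_t two_pow_32_cast(void)
{
    return (uint64_t)1 << 32;
}

/* Or use the UINT64_C macro from <stdint.h> to get a 64-bit constant. */
uint64_t two_pow_32_macro(void)
{
    return UINT64_C(1) << 32;
}
```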
Here we collect several observations in order to arrive at a solution:
1. In standard C (C99 and later), it is guaranteed that non-negative integers have the bit representation one would expect for any base-2 number.
   ----> Hence, we can trust bit manipulation on this type of number.
2. If x has an unsigned integer type, then x >> 1 == x / 2 and x << 1 == x * 2.
   (!) But: bit operations are very likely to be faster than their arithmetic counterparts.
3. sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
4. If we consider truncated logarithms and base-2 exponentials for integers, we can obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "==" is informal notation meaning almost equal to (in the sense of a good approximation).
5. If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
6. The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they can be a very good starting point for any algorithm that searches for the square root of x.
7. The Newton algorithm for square roots can work well for integers, if we have a good first approximation to the real solution.
http://en.wikipedia.org/wiki/Integer_square_root
The final algorithm needs some mathematical verification to be completely sure that it always works properly, but I will not do that right now. Instead, I will show you the final program:
#include <stdio.h>  /* For printf()... */
#include <stdint.h> /* For uintmax_t... */
#include <math.h>   /* For sqrt() .... */

int IntLog2(uintmax_t n) {
    if (n == 0) return -1; /* Error */
    int L;
    for (L = 0; n >>= 1; L++)
        ;
    return L; /* It takes < 64 steps for long long */
}

uintmax_t IntExp2(int n) {
    if (n < 0)
        return 0; /* Error */
    uintmax_t E;
    for (E = 1; n-- > 0; E <<= 1)
        ;
    return E; /* It takes < 64 steps for long long */
}

uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }
uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }

int main(void) {
    uintmax_t N = 947612934; /* Try here your number! */
    uintmax_t sqrtn  = IntSqrtDn(N), /* 1st approx. to sqrt(N) by below */
              sqrtn0 = IntSqrtUp(N); /* 1st approx. to sqrt(N) by above */
    /* The following means while( abs(sqrt-sqrt0) > 1) { stuff... } */
    /* However, we take care of subtractions on unsigned arithmetic, just in case... */
    while ( (sqrtn > sqrtn0 + 1) || (sqrtn0 > sqrtn+1) )
        sqrtn0 = sqrtn, sqrtn = (sqrtn0 + N/sqrtn0) / 2; /* Newton iteration */
    /* %ju is the printf conversion for uintmax_t */
    printf("N==%ju, sqrt(N)==%g, IntSqrtDn(N)==%ju, IntSqrtUp(N)==%ju, sqrtn==%ju, sqrtn*sqrtn==%ju\n\n",
           N, sqrt((double)N), IntSqrtDn(N), IntSqrtUp(N), sqrtn, sqrtn*sqrtn);
    return 0;
}
The last value stored in sqrtn is the integer square root of N, or a value within 1 of it: for N = 8, for example, the loop stops at 3 although the integer square root is 2.
The last line of the program just prints all the values, for verification purposes.
So, you can try different values of N and see what happens.
If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.
Remark: It is necessary to verify that the condition abs(sqrtn-sqrtn0)<=1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.
Remark2: In the initialization statements, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This saves us some calculations.
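Regarding the first remark: a well-known way to make the integer Newton iteration terminate reliably is to start from an estimate that is not below the true root and to stop as soon as the estimate no longer decreases. The sketch below follows that idea and reuses the power-of-two starting point from the observations above; the function names are made up for illustration and this is not the program shown earlier.

```c
#include <stdint.h>

/* Floor of log2(n) for n > 0, by repeated shifting. */
static int floor_log2_u64(uint64_t n)
{
    int l = 0;
    while (n >>= 1)
        l++;
    return l;
}

/* Hypothetical helper: floor(sqrt(n)) by Newton's method on integers. */
uint64_t isqrt_newton(uint64_t n)
{
    if (n == 0)
        return 0;

    /* A power of two strictly greater than sqrt(n); never exceeds 2^32. */
    uint64_t x0 = (uint64_t)1 << (floor_log2_u64(n) / 2 + 1);
    uint64_t x1 = (x0 + n / x0) / 2;

    /* Starting above the root, the estimates decrease strictly until
     * they reach floor(sqrt(n)); the first non-decrease ends the loop. */
    while (x1 < x0) {
        x0 = x1;
        x1 = (x0 + n / x0) / 2;
    }
    return x0;
}
```

With this stopping rule, isqrt_newton(8) returns 2 and isqrt_newton(947612934) returns 30783.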
// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
    uint64_t q = 0, b = 1, r = v;
    for( b <<= 62; b > 0 && b > r; b >>= 2);
    while( b > 0 ) {
        uint64_t t = q + b;
        q >>= 1;
        if( r >= t ) {
            r -= t;
            q += b;
        }
        b >>= 2;
    }
    return q;
}
The for loop may be optimized by using the clz (count leading zeros) machine instruction.
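As a rough illustration of that optimization, the sketch below replaces the initial for loop with the __builtin_clzll builtin. This assumes a GCC- or Clang-compatible compiler, since the builtin is not part of standard C; the function name isqrt_clz and the unsigned signature are choices made here, not part of the original answer.

```c
#include <stdint.h>

/* Hypothetical variant of the digit-by-digit square root above.
 * Assumes GCC or Clang for __builtin_clzll. */
uint64_t isqrt_clz(uint64_t v)
{
    if (v == 0)
        return 0;               /* __builtin_clzll(0) is undefined */

    uint64_t q = 0, r = v;
    int top = 63 - __builtin_clzll(v);        /* index of highest set bit */
    uint64_t b = (uint64_t)1 << (top & ~1);   /* largest power of 4 <= v */

    while (b > 0) {
        uint64_t t = q + b;
        q >>= 1;
        if (r >= t) {
            r -= t;
            q += b;
        }
        b >>= 2;
    }
    return q;
}
```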

Resources